Hello world, my name is Kyra Matzdorf and I am the newest member of the StatSoft Marketing team. I have been here for a couple of weeks now and am only starting to understand some of what goes on around here. One phrase I keep hearing is "data mining," a term that was completely foreign to me. So, to get a better grasp of the idea, I am going to watch a series produced by StatSoft called the "Data Mining" series. I will watch each video, assuredly a few times, and then write about it, in hopes of easing the average non-statistician's journey into understanding data mining. I also hope to show that data mining can be applicable and helpful in solving problems in many business settings.
The first session in the series is called "Data Mining Overview." In it, the presenter gives examples of how data mining can solve business problems. The main thing I learned from this session is that data mining really can be applied in many industries. I also learned about the three basic types of data mining applications.
The first type of data mining application is classification. It is used when the variable you want to predict is "categorical in nature," meaning it takes one of a fixed set of labels. Financial institutions are a good example of an industry that benefits from classification. Say, for example, that a bank has collected a wealth of information about its customers. The bank could then classify customers, based on previous credit records, into "good" or "bad" credit risk categories. With this model, the bank could decide whether, and how much, credit to extend to a given customer. This kind of predictive model could save financial institutions time, effort, and money.
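To make the bank example a little more concrete, here is a minimal Python sketch of one way a classifier could work. It is not the method shown in the session; it uses a simple "nearest neighbors" vote, and the customer records, the two features (late payments and debt-to-income ratio), and the function name `classify` are all made up for illustration.

```python
import math

# Hypothetical past customers (invented for illustration):
# (late payments in the past year, debt-to-income ratio) -> known credit label
history = [
    ((0, 0.10), "good"),
    ((1, 0.20), "good"),
    ((0, 0.30), "good"),
    ((5, 0.60), "bad"),
    ((7, 0.80), "bad"),
    ((4, 0.70), "bad"),
]

def classify(customer, k=3):
    """Label a new customer by majority vote among the k most similar past customers."""
    # Sort past customers by straight-line distance to the new customer.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], customer))[:k]
    votes = [label for _, label in nearest]
    # Return the label that appears most often among the k neighbors.
    return max(set(votes), key=votes.count)

print(classify((0, 0.15)))  # prints "good"
print(classify((6, 0.75)))  # prints "bad"
```

The output is a category, not a number, which is what makes this a classification problem. A real model would use far more records and would usually rescale the features so that one of them does not dominate the distance.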
Regression is the second type of data mining application, and it is best applied when the variable in question is continuous, that is, a number on a scale instead of a category. A few examples are predicting a measurement in a manufacturing process, predicting revenue in dollars, or predicting a decrease in cholesterol after taking medication. At a beverage plant, a predictive model can determine which variables influence the bottling process and, ultimately, help the process run more smoothly. One can see how a regression model could save time and money in many different industries as well.
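As a rough illustration of predicting a continuous value such as revenue, the sketch below fits a straight line through a handful of points using ordinary least squares. The numbers, the choice of ad spend as the input variable, and the names `fit_line` and `predict` are assumptions made for this example, not something taken from the session.

```python
# Hypothetical data (invented for illustration):
# monthly ad spend and revenue, both in thousands of dollars.
ad_spend = [1, 2, 3, 4, 5]
revenue = [12.1, 13.9, 16.2, 17.8, 20.0]

def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance-like and variance-like sums determine the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(ad_spend, revenue)

def predict(x):
    """Predict revenue for a given ad spend using the fitted line."""
    return intercept + slope * x

print(round(predict(6), 2))  # estimated revenue at an ad spend of 6
```

Unlike the classification case, the answer here is a number on a scale, so the model can estimate values it has never seen, such as the revenue for a spend level outside the recorded data.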
The third type of data mining application is called clustering and is "used when there is no specific variable of interest. The goal is finding groups of similar cases based on the variables that were recorded." A marketing firm could benefit from clustering customers based on interests, needs, demographics, and so on. This would allow the firm to target its marketing campaigns more effectively. A medical firm could also benefit from cluster analysis by grouping patients with similar symptoms and then treating them more effectively.
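The idea of finding groups of similar cases can be sketched with a bare-bones version of k-means, one common clustering technique (the session may well use something different). The customer data, the two features (age and monthly spend), the starting centroids, and the function name `k_means` are all hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Group points around centroids, repeating until the centroids stop moving."""
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Step 2: move each centroid to the average of its assigned points.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing changed, so the grouping is stable
        centroids = new_centroids
    return clusters, centroids

# Hypothetical customers (invented for illustration): (age, monthly spend in dollars)
customers = [(22, 40), (25, 35), (24, 45), (58, 210), (61, 190), (60, 220)]

# Start from two of the customers as initial guesses for the group centers.
clusters, centers = k_means(customers, [customers[0], customers[-1]])
for group in clusters:
    print(group)
```

Notice that no "right answer" column is supplied: the algorithm is only told how many groups to look for and discovers the groupings from the recorded variables themselves.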
Whether through classification, regression, or clustering, almost any industry can benefit from the use of data mining. Overall, data mining simply helps in finding value and meaning within heaps of information. To a business, this means greater effectiveness and a healthier bottom line. Throughout this series, I hope to change the perception of data mining from a scary, foreign term to an easy-to-understand tool that one might apply to his or her own life.
Image Credit: http://www.flickr.com/photos/brizo_the_scot/5087682344/