CRISP: Data Mining Session 2

by Kyra on Tuesday, June 25, 2013 2:20 PM

Hello fellow statistical newbs, as well as the better versed. In the last entry we reviewed “Learning Data Mining: Session 1.” In that session, Jennifer taught us that data mining projects can be either supervised learning, where a specific target variable is used, as in classification or regression projects, or unsupervised learning, such as clustering (although unsupervised learning can encompass more than just clustering). Overall, we learned that data mining helps find meaning and value within heaps of information. This series will continue to walk through the entire data mining process.

In the video for this second session, Jennifer discusses the standard data mining process known as CRISP. CRISP stands for Cross Industry Standard Process for Data Mining (often abbreviated CRISP-DM). Until CRISP, there was no standardized process, leaving people to come up with their own. “The pioneers of the field collaborated to make a standardized process. CRISP is applicable in any industry, using any data mining software,” says Jennifer Thompson. The CRISP process has helped to make data mining projects faster, more efficient, and more cost effective.

STATISTICA Data Miner offers three tools: Data Miner Recipes, Data Miner Workspace, and interactive Data Miner. Data Miner Recipes lays out the steps of a data mining project using CRISP. Data Miner Workspace provides a structured workflow for data mining projects using CRISP. Interactive Data Miner is a dialog-driven approach to CRISP.

Next, the video walks through the steps of the CRISP process.

The first step, business understanding, is critical. During this step, the goals of the project are defined. One needs to determine what can be learned from the data. What questions can be answered? What business objectives can be met? It is important to define a clear plan in order to gauge the success of the project.

The next step is data understanding: accessing, collecting, and exploring data. This step requires professionals in the field. With a clear understanding of the business goals, the data are explored. The key here is to find relationships in the data that inform business understanding. Goals and hypotheses for the project are defined by looking at the interrelationship between business and data understanding.

Data preparation is the next step and the most time-consuming part of the process, sometimes taking up to 80% of the project effort. Data preparation includes cleaning the data (for example, re-coding outliers and handling missing values) and taking out unnecessary data.
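To make the cleaning step a little more concrete, here is a minimal sketch in Python. It is not how STATISTICA does it. The data, the function names, and the choices of "fill with the median" and "clamp to a plausible range" are all assumptions made up for illustration.

```python
from statistics import median

def fill_missing(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]

def recode_outliers(values, lower, upper):
    """Re-code values outside [lower, upper] to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]

# Made-up customer ages: one value is missing and one is implausible.
ages = [34, None, 29, 41, 250, 38]

# The 0-100 range is an assumed business rule, not something from the video.
cleaned = recode_outliers(fill_missing(ages), lower=0, upper=100)
print(cleaned)
```

In a real project, the right way to handle missing values and outliers depends on what was learned in the business and data understanding steps.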

After the data are collected, explored, and cleaned, they must be modeled. There are many ways to model your data, which will be discussed in future sessions. Once the model(s) have been created, they must then be reviewed. Evaluation of the models is necessary to determine which best reflects the business goals of the project. In this phase, we also determine how to use our models.
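As a rough illustration of the evaluation idea, the sketch below compares two candidate models on data held back for testing and keeps the one that scores better. Everything in it is hypothetical: the "models" are simple spending thresholds, the holdout rows are invented, and accuracy stands in for whatever measure actually reflects the business goal.

```python
def accuracy(model, rows):
    """Share of rows where the model's prediction matches the known label."""
    correct = sum(1 for feature, label in rows if model(feature) == label)
    return correct / len(rows)

# Invented holdout data: (monthly spend, did the customer leave?)
holdout = [(20, True), (35, True), (50, False),
           (80, False), (45, True), (90, False)]

# Two hypothetical candidate models: predict "leaves" below a spend threshold.
candidates = {
    "threshold_40": lambda spend: spend < 40,
    "threshold_48": lambda spend: spend < 48,
}

# Evaluate every candidate on the same holdout data and keep the best one.
scores = {name: accuracy(model, holdout) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The same comparison works with any number of candidates. The important design choice is that every model is judged on the same data and against the same measure.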

By the deployment phase, we should have the model that best meets our business objective. Deployment uses the model to score new data and make final predictions. The next session will focus on examples of how one can use CRISP.

 

Author
Kyra

My name is Kyra Matzdorf and I am the new marketing assistant here at StatSoft. I have no background in analytics or statistics, but rather in writing and communications. So, with this blog, I intend to delve into a variety of topics and find a way to relate them back to statistics. Hopefully, these entries will help broaden the horizons for statistics to the masses. I am excited to learn more about the uses and values of statistical analysis, as well as how I might apply it to my own life. Hopefully your eyes will be opened to a new view on the subject as well!
