Written by: STATISTICA 10/6/2009 2:09 PM
I am still reading The Handbook of Statistical Analysis and Data Mining Applications. I am not reading straight through the book; instead, I am jumping around to sections that interest me.
Currently I am reading a section titled "WHAT IS DATA MINING?" (page 17, if you have your own book). It is a very short definition at 1.5 pages.
But I wanted a more approachable definition. So I talked to a couple of StatSoft statisticians. I asked google.com for its thoughts. Then I did some of my own thinking.
Data is like water. It is everywhere, but it isn't very usable for drinking. First you need to clean it: boil and filter.
Regular statistical analysis is like a dam holding back the water. It forces the water to fit into a particular shape. The data should have a linear fit like below.
Some water might spill over the dam and be lost. Maybe the new dam has created flood conditions that never existed before. Forcing the data into a shape produces unexpected and unwelcome results.
Data mining is living without the dam. The water (the data) goes where it wants to, and in doing so it reveals hidden patterns.
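To make the dam picture a little more concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the data is made up (a U-shaped curve), and the "no dam" approach is a simple nearest-neighbour average that I picked as a stand-in. It is not a method from the book or from STATISTICA. The idea is only to compare a forced straight line with a prediction that follows the data.

```python
# Toy data (made up for illustration): a U-shaped relationship, y = (x - 5)^2.
xs = list(range(11))
ys = [(x - 5) ** 2 for x in xs]


def mean(values):
    return sum(values) / len(values)


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    centre = mean(actual)
    ss_tot = sum((a - centre) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """The dam: an ordinary least squares straight line (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def neighbour_predict(xs, ys, i, k=2):
    """No dam: predict point i from the mean of its k nearest other points."""
    others = [j for j in range(len(xs)) if j != i]
    others.sort(key=lambda j: abs(xs[j] - xs[i]))
    return mean([ys[j] for j in others[:k]])


slope, intercept = fit_line(xs, ys)
line_pred = [slope * x + intercept for x in xs]
knn_pred = [neighbour_predict(xs, ys, i) for i in range(len(xs))]

r2_line = r_squared(ys, line_pred)
r2_knn = r_squared(ys, knn_pred)

print("straight line R^2:", round(r2_line, 3))
print("nearest neighbours R^2:", round(r2_knn, 3))
```

On this made-up data the straight line comes out flat and explains none of the variation, while the neighbour-based prediction tracks the U shape much more closely. Real data is rarely this tidy, so treat it as a picture of the idea and not as a benchmark.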
/aw
Image Credit for "Surface This Way" Sign: http://www.flickr.com/photos/pagedooley/ / CC BY 2.0