In predictive analytics, several steps are necessary before building models. In addition to understanding the needs and goals of the project, you must prepare the data. Real-world data comes with real-world problems, as I was reminded while reading a blog post recently. These data issues often sabotage the success of a project. So, is your data ready?
Preparing data for analysis is a tricky process, and there is no one-size-fits-all way to do it. Instead, there are several valid approaches to choose from, depending on the challenges at hand. These challenges include, but are certainly not limited to, missing values, data entry errors, points outside the valid range, outliers, impossible combinations, duplicate records, invariant data, and problems with data structure and organization.
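Several of the issues listed above can be flagged with a simple summary pass over the data. The sketch below is a plain-Python illustration, not STATISTICA's tooling. It assumes records stored as a list of dicts with `None` marking a missing value. The function names, the sample records, the 0 to 120 valid range for age, and the use of the 1.5 times IQR rule for outliers are all hypothetical choices made for illustration.

```python
from statistics import quantiles

# Hypothetical sample records; None marks a missing value.
COLUMNS = ["id", "age", "income", "country"]
rows = [
    {"id": 1, "age": 34, "income": 52000, "country": "US"},
    {"id": 2, "age": None, "income": 48000, "country": "US"},  # missing age
    {"id": 3, "age": 29, "income": 51000, "country": "US"},
    {"id": 3, "age": 29, "income": 51000, "country": "US"},    # duplicate record
    {"id": 4, "age": 212, "income": 50000, "country": "US"},   # impossible age
    {"id": 5, "age": 41, "income": 49000, "country": "US"},
    {"id": 6, "age": 38, "income": 950000, "country": "US"},   # extreme income
]

def missing_counts(rows, columns):
    """Count missing (None) values per column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}

def duplicate_rows(rows, columns):
    """Return indices of rows that repeat an earlier row on all columns."""
    seen, dups = set(), []
    for i, r in enumerate(rows):
        key = tuple(r.get(c) for c in columns)
        if key in seen:
            dups.append(i)
        else:
            seen.add(key)
    return dups

def out_of_range(rows, column, lo, hi):
    """Return indices of non-missing values outside the assumed valid range [lo, hi]."""
    return [i for i, r in enumerate(rows)
            if r.get(column) is not None and not lo <= r[column] <= hi]

def invariant_columns(rows, columns):
    """Return columns whose non-missing values never vary."""
    return [c for c in columns
            if len({r.get(c) for r in rows if r.get(c) is not None}) <= 1]

def iqr_outliers(rows, column, k=1.5):
    """Flag values beyond k * IQR from the quartiles (one common outlier rule)."""
    values = [r[column] for r in rows if r.get(column) is not None]
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return out_of_range(rows, column, q1 - k * spread, q3 + k * spread)

if __name__ == "__main__":
    print("missing:", missing_counts(rows, COLUMNS))
    print("duplicates:", duplicate_rows(rows, COLUMNS))
    print("age out of range:", out_of_range(rows, "age", 0, 120))
    print("invariant:", invariant_columns(rows, COLUMNS))
    print("income outliers:", iqr_outliers(rows, "income"))
```

Each check only flags suspect rows; deciding whether to drop, correct, or keep them is still a judgment call that depends on the project.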
These issues can be found with graphs and summary tables, as seen here:
STATISTICA offers easy-to-use tools for dealing with outliers and various other data issues.
These videos are part of a 35-video series on data mining and predictive analytics.