Written by: STATISTICA News 3/28/2011 7:10 AM
Recently I sat down with Dean Abbott and he took a moment to discuss his use of STATISTICA.
1. What is your background (career, education, etc.)?
My undergraduate degree is in Mathematics of Computation from Rensselaer Polytechnic Institute, with a minor in Electrical Engineering (1985). I have a master's degree in Applied Mathematics from the University of Virginia (1987). I began working at a small aerospace engineering company in Stanardsville, VA (close to Charlottesville) while still in grad school, mostly working out optimal control equations (variational calculus) on an early smart-bomb project. However, the most interesting part of that application was using Polynomial Networks to encode the optimal guidance actions: my first predictive analytics application. In fact, several people at that company went on to Data Mining fame, most notably John Elder.
2. How do you use STATISTICA? Which modules?
Originally, about 4 years ago, I used STATISTICA for building neural networks for response modeling, and enjoyed its flexibility and automation. This led to using multiple neural networks to build ensembles of neural networks (deployed using STATISTICA's PMML code). I've also used the decision trees (including Random Forests), Support Vector Machines, Linear and Logistic Regression. Later, I used the Text and Document Mining module to extract unstructured data and make it structured. I actually look for excuses to try new algorithms I haven't yet tried!
3. What are some interesting consulting projects that you’ve used STATISTICA for?
Most recently, I've used STATISTICA in a new way for me, which is as a statistics tool. I'm working with small data (dozens of records) and was asked to compute the sample size required to achieve power levels of 0.8 (alpha of 0.05). I hadn't done this before, but found it very easy to do. In fact, I just finished writing a VB script to compute the power and sample size for 60+ variables--much easier than running the calculations for each variable, one at a time! I'm new to VB scripting, but once I had a framework, it was easy to modify the code to do what I needed.
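For readers curious what such a batch calculation looks like, here is a minimal sketch in Python (not the VB script described above, which is not shown in this interview). It assumes a two-sided, two-sample comparison of means and uses the standard normal approximation; the interview does not say which test was actually used. The function name, variable names, and effect sizes are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical variables and effect sizes, for illustration only.
effect_sizes = {"var_a": 0.2, "var_b": 0.5, "var_c": 0.8}

# Compute every variable in one pass instead of one at a time.
required_n = {name: sample_size_per_group(d) for name, d in effect_sizes.items()}

for name, n in required_n.items():
    print(f"{name}: {n} per group")
```

Because it relies on the normal approximation, this sketch can come out a record or two below what an exact t-based calculation reports, so treat the numbers as rough planning figures.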
4. What types of customers do you consult with (in other words what is your specialty)?
Broadly speaking, I help customers solve complex data mining problems: sometimes by defining data mining processes for them, sometimes by helping them build data (with data mining in mind), sometimes by building models, and usually by teaching them how to do it themselves along the way. Customers include the public sector, such as the Department of Defense and the Department of the Treasury, as well as the private sector, including both Fortune 500 and small companies. Recently, I've had several projects involving risk analysis and fraud detection, but these things go in cycles; I don't work in any single vertical market exclusively.
5. What advantages do you see in STATISTICA compared to other general tools?
STATISTICA has a comprehensive set of functions available to the analyst at a very attractive entry price. I alluded to the breadth of algorithms already, but it is much more than just a bag of methods. There are comprehensive sets of visualization, sampling, data prep, and core statistical functions available--far too many to describe succinctly here. And, because it has a friendly GUI, these are accessible to a new user in a way that is not possible with code/programming-based tools. There are few tools on the market that cover everything I do, from ensembles of neural networks and trees to text mining, power computations, factor analysis, data visualization, etc.
6. What are STATISTICA's strengths?
In addition to what I described in the answer above, I very much appreciate the responsiveness of StatSoft technical support. I had a specific need for a specific kind of chart. Tech support helped me write custom Visual Basic code that let me create the graph I needed, and then this feature was added to the next release of STATISTICA.
7. Would you recommend STATISTICA to a colleague or friend? Why?
I recommend STATISTICA and have done so on many occasions. Even for customers who already have a tool they use, STATISTICA will certainly complement what they can do and will likely provide functionality their tool lacks.