Written by: STATISTICA News 8/30/2010 2:33 PM
John Elder is a seasoned Data Miner. He recommends STATISTICA to his clients and uses it so much that he is a partner with StatSoft. He founded Elder Research, Inc., a leading consulting company in data mining, predictive analytics, text mining, and optimization. He also co-wrote the book The Handbook of Statistical Analysis and Data Mining Applications with one of StatSoft’s own, Gary Miner, Ph.D. Dr. Elder took the time to tell us about himself and how he has come to admire our software.
• What is your background? I'm an Engineer. I love algorithms and real-world problem-solving. I worked five years in a small, innovative aerospace consulting firm as #2 to a hard-driving leader. I loved one thing we did, known as "Inductive Modeling" or crafting models to generalize a set of specific cases - what's now called "data mining". To focus on that, I went to grad school at UVA in Charlottesville, where I was a townie [I'd moved here kicking and screaming from the center of the universe (DC), but I came to love it here]. I had a blast at UVA, where they let me craft a Ph.D. in Systems Engineering drawing from multiple fields to concentrate on data mining and optimization. I then enjoyed a two-year post-doc back at my undergraduate alma mater, Rice University, where I wrote a lot and connected with leaders in academia. In 1995, I returned to Charlottesville to concentrate on the consulting that was increasingly being requested, and started up Elder Research, Inc. (ERI).
• Has the company grown since then? Yes, though we’re just two dozen people, we're the largest consultancy in data mining. Growth isn’t fast as it's an apprenticeship process. We look for people who are strong technically and who are also humble, honest, take pride in their work, and serve others. Clients have to get value from their consultants, but they also want to like them. It’s a wonderful team to work with.
• What led you to become a partner with StatSoft? I'd long admired the software as having great power at a reasonable price. ERI knows DM tools, as we have the world's best lab of commercial and academic mining software. We often recommend STATISTICA to clients.
And, I got to know StatSoft much better after working closely with Gary Miner (and independent consultant Bob Nisbet) to write The Handbook of Statistical Analysis and Data Mining Applications.
• Does that book have lots of STATISTICA examples? Yes, it's written directly for practitioners, not academics, so a large chunk of it – and more on the included DVD and associated website – consists of step-by-step examples on real problems. Most of those use STATISTICA Data Miner, though other leading tools are also demonstrated. We were pleased that the book won the PROSE Award for Mathematics last year (and amused, as one doesn't often hear "prose" and "mathematics" in the same breath).
• What types of projects do you find the most interesting or challenging? ERI initially worked primarily on Wall Street, helping to build quantitative models for hedge funds. We had some strong successes, which put us on the map. After the tragedy of 9/11, we dove hard into National Security tasks. All along, we’ve had commercial clients, especially in fraud detection and increasingly in text mining and social network analysis. All of these are fascinating areas! Though focusing on a vertical might make more business sense, the variety we see is very refreshing and it helps ERI retain great people. We also often learn techniques in one domain that can be fruitfully ported to another.
• What kinds of challenges do projects encounter the most? I have written and spoken on the “Top 10 Data Mining Mistakes” – the primary problems plaguing projects (see Chapter 20 of the Handbook). Two main themes are:
1) There is much you need to know about a project that’s not in the data. You have to listen carefully to the people working the challenge the old way, and learn all you can. Use good sense and uncover what hoops the data went through to get to you.
2) Out-of-sample performance is all that matters. It’s so easy to let information from the future corrupt your experiment. The evaluation data has to be a complete and utter surprise to a model built on the training data. Otherwise, you’re going to think your performance is better than it really is – a formula for disaster.
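Editor's illustration of the second theme: a minimal sketch, not taken from the interview or the Handbook, of one common way to keep evaluation data a surprise. It assumes records carry a date, splits them at a cutoff so the held-out rows all come later than the training rows, and fits a simple scaling step on the training rows only. The function names (`temporal_split`, `fit_scaler`, `apply_scaler`) and the toy data are hypothetical.

```python
from datetime import date
from statistics import mean, pstdev


def temporal_split(rows, cutoff):
    """Rows dated before `cutoff` are for training; the rest are held out."""
    train = [r for r in rows if r["day"] < cutoff]
    test = [r for r in rows if r["day"] >= cutoff]
    return train, test


def fit_scaler(train_rows):
    """Estimate mean and standard deviation from the training rows only."""
    values = [r["x"] for r in train_rows]
    return mean(values), pstdev(values) or 1.0  # avoid dividing by zero


def apply_scaler(rows, scaler):
    """Standardize `x` using statistics that were fitted elsewhere."""
    mu, sigma = scaler
    return [(r["x"] - mu) / sigma for r in rows]


# Toy data (hypothetical): one observation per day for ten days.
rows = [{"day": date(2010, 1, d), "x": float(d)} for d in range(1, 11)]

train, test = temporal_split(rows, cutoff=date(2010, 1, 8))

# The scaler never sees the held-out rows, so nothing from the
# "future" leaks into how the training data is prepared.
scaler = fit_scaler(train)
train_x = apply_scaler(train, scaler)
test_x = apply_scaler(test, scaler)
```

Fitting the scaler on all ten rows instead would quietly let the held-out period influence the training inputs, which is one small way evaluation results can end up looking better than they should.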
• How can an analyst best succeed? Well, the best way to learn is to dive in, but inspiration and guidance from others can also help. Let me plug my course that’s coming up soon: Tools for Discovering Patterns in Data: Extracting Value from Tables, Text, and Links. It’s September 13-14 in Charlottesville, Virginia. I focus very much on practical suggestions. I give an overview of the field and concentrate on three key technologies: re-sampling, visualization, and ensemble models (on the third, Giovanni Seni and I wrote a slim book this year: Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions).
The good news is that STATISTICA excels at having multiple different modeling techniques, with beautiful visualization. The DVD in the Handbook (free in the course) comes with a trial version of STATISTICA Data Miner, so you can quickly try what you’re learning.
With such great tools, and such interesting projects, it’s a fantastic time to be analyzing and predicting!