Performance of STATISTICA on Large Data Sets and Computationally Intensive Analyses

1. Performance of STATISTICA compared to competing data analysis applications

One of the significant differentiators of the STATISTICA family of data analysis software is its performance on large data sets and computationally intensive applications, such as analyses requiring recursive access to data or complex data management and database query operations.

For example, in a recent carefully designed and conducted comparison of competing analytic software packages performed on a quad-core 64-bit machine running under a 64-bit Microsoft Windows operating system, STATISTICA outperformed other widely used data analysis packages by a wide margin:

  • Basic descriptive statistics for 30 variables (fields) and 9,000,000 rows or cases (data file size approximately 2.2 gigabytes) were computed in approximately 3 seconds; the two major competing packages in the data analysis/BI market required between 4.5 seconds (on the computing platform that purportedly also takes advantage of multiple processors) and 37 seconds.
  • Correlation matrices for 500 variables (fields) with 1,000,000 rows or cases (data file size approximately 4 gigabytes) were computed in approximately 5 seconds; competing computing platforms required 20 to 65 seconds to perform the same task.
  • Basic data management operations commonly required in data mining (predictive modeling) work (e.g., subsetting of data) execute 3 to 4 times faster in STATISTICA.
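
As a rough plausibility check on the figures above, the sketch below recomputes the raw data sizes and the speed ratios implied by the reported timings. It assumes that each value is stored as an 8-byte double-precision number and that a gigabyte means 10^9 bytes; neither assumption is stated in the benchmark description, and the function names are illustrative only.

```python
# Assumption (not stated in the benchmark): 8 bytes per stored value.
BYTES_PER_VALUE = 8


def raw_size_gb(rows, columns, bytes_per_value=BYTES_PER_VALUE):
    """Raw size of a rows x columns numeric table in decimal gigabytes."""
    return rows * columns * bytes_per_value / 1e9


def speedup(competitor_seconds, statistica_seconds):
    """How many times longer the competing package took."""
    return competitor_seconds / statistica_seconds


# Descriptive statistics benchmark: 9,000,000 rows x 30 variables.
print(raw_size_gb(9_000_000, 30))
# Correlation matrix benchmark: 1,000,000 rows x 500 variables.
print(raw_size_gb(1_000_000, 500))

# Ratios implied by the reported timings (approximate, since the
# timings themselves are given as approximate values).
print(speedup(4.5, 3), speedup(37, 3))   # descriptive statistics
print(speedup(20, 5), speedup(65, 5))    # correlation matrices
```

Under these assumptions the computed sizes come out at about 2.16 and 4.0 gigabytes, in line with the reported file sizes, and the competing packages took roughly 1.5 to 12 times as long for descriptive statistics and 4 to 13 times as long for correlation matrices.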

Read more about the performance comparison of STATISTICA Version 9 on multi-core 64-bit machines with current 64-bit releases of SAS (Version 9.2) and PASW (formerly SPSS) Statistics Version 18, covering basic data management, basic statistics, and aggregation operations.

2. The performance optimization technology used in STATISTICA

The current version of STATISTICA software, including STATISTICA Data Miner, takes full advantage of state-of-the-art hardware and software technologies, as well as proprietary performance optimization technologies developed at StatSoft. STATISTICA is available as a native 64-bit application, and most STATISTICA computational (statistical) routines, as well as the key predictive modeling algorithms available in STATISTICA Data Miner, take full advantage of multi-processor computing platforms.

Shown below are performance benchmark data collected as part of the STATISTICA and STATISTICA Data Miner software validation and release process. Each analysis was repeated multiple times on 64-bit computers with 1, 2, 3, or 4 processors (and otherwise identical hardware configurations). STATISTICA was designed to take advantage of available hardware resources to achieve maximum performance for complex predictive modeling analyses (e.g., regression trees, stochastic gradient boosting, or random forests), as well as common statistical analyses (e.g., computing correlation coefficients).

[Benchmark graphs, one per analysis: regression tree; stochastic gradient boosted tree; complex random forest; correlation matrix]

Performance of Predictive Modeling Algorithms

STATISTICA Data Miner contains multithreaded implementations of Classification and Regression Trees, CHAID, stochastic gradient boosting of trees (Boosted Trees), Random Forests (voting trees), and others, as well as multithreaded implementations of traditional generalized linear modeling techniques (e.g., logit regression). The performance of these predictive modeling algorithms on modern 64-bit multi-core hardware and 64-bit operating system platforms is spectacular, and as of this writing not matched by any other general software platform for predictive modeling (see also the graphs shown above). Analyses with hundreds of variables and millions of cases will complete in minutes.

Data Buffering and Storage

In part, the unmatched performance of STATISTICA and STATISTICA Data Miner computational algorithms was achieved through carefully redesigned intelligent data access, storage, and buffering methods. Data can be read asynchronously in multiple threads that service different parallel computations for a single analysis (e.g., classification and regression trees). Complete data arrays are never stored explicitly in memory, so there are no limitations on file sizes; instead, the available memory is used intelligently to buffer the data (read by multiple threads) and make them available for computation.
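
The general idea of computing on buffered chunks rather than on a full in-memory array can be illustrated with a minimal Python sketch. This is not StatSoft's proprietary implementation; it is an assumed, simplified scheme in which each chunk of rows is reduced to running sums, the partial results are merged by addition, and a correlation matrix is derived at the end. All names (`iter_chunks`, `partial_sums`, `streaming_correlation`) and the default chunk size and worker count are hypothetical. The threads only show the structure: in CPython they will not speed up pure-Python arithmetic, and the sum-of-products formula used here is less numerically robust than production-grade methods.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from math import sqrt


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows from any iterable of rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def partial_sums(chunk):
    """Reduce one chunk to (n, column sums, upper-triangle cross-product sums)."""
    k = len(chunk[0])
    sums = [0.0] * k
    cross = [[0.0] * k for _ in range(k)]
    for row in chunk:
        for i in range(k):
            sums[i] += row[i]
            for j in range(i, k):
                cross[i][j] += row[i] * row[j]
    return len(chunk), sums, cross


def streaming_correlation(rows, chunk_size=1000, workers=4):
    """Pearson correlation matrix built from chunk summaries.

    At most `workers` chunks are buffered at a time; the full data set
    is never materialized. Assumes no column is constant.
    """
    chunks = iter_chunks(rows, chunk_size)
    n, sums, cross, k = 0, None, None, 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            # Read a bounded batch of chunks, then summarize them in threads.
            batch = list(islice(chunks, workers))
            if not batch:
                break
            for cn, cs, cc in pool.map(partial_sums, batch):
                if sums is None:
                    k = len(cs)
                    sums = [0.0] * k
                    cross = [[0.0] * k for _ in range(k)]
                # Merging partial results is plain element-wise addition.
                n += cn
                for i in range(k):
                    sums[i] += cs[i]
                    for j in range(i, k):
                        cross[i][j] += cc[i][j]
    if n == 0:
        raise ValueError("no rows supplied")
    corr = [[1.0] * k for _ in range(k)]
    for i in range(k):
        var_i = cross[i][i] - sums[i] ** 2 / n
        for j in range(i + 1, k):
            var_j = cross[j][j] - sums[j] ** 2 / n
            cov = cross[i][j] - sums[i] * sums[j] / n
            corr[i][j] = corr[j][i] = cov / sqrt(var_i * var_j)
    return corr
```

Because `rows` can be any iterable, for example a generator that reads records from a file, memory use in this sketch is bounded by `chunk_size * workers` rows plus the k x k accumulators, regardless of how many rows the source contains.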


Using these technologies, STATISTICA data analysis and STATISTICA Data Miner software has leapfrogged the competition.

Copyright © 2013 by StatSoft Inc.