Performance of STATISTICA on Large Data Sets and Computationally Intensive Analyses

1. Performance of STATISTICA compared to competing data analysis applications

One of the significant differentiators of the STATISTICA family of data analysis software is its performance on large data sets and computationally intensive applications, such as analyses requiring recursive access to data or complex data management and database query operations.

For example, in a recent carefully designed and conducted comparison of competing analytic software packages performed on a quad-core 64-bit machine running under a 64-bit Microsoft Windows operating system, STATISTICA outperformed other widely used data analysis packages by a wide margin:

  • Basic descriptive statistics for 30 variables (fields) and 9,000,000 rows or cases (data file size approximately 2.2 gigabytes) were computed in approximately 3 seconds; the two major competing packages in the data analysis/BI market required between 4.5 seconds (on the computing platform that purportedly also takes advantage of multiple processors) and 37 seconds.
  • Correlation matrices for 500 variables (fields) with 1,000,000 rows or cases (data file size approximately 4 gigabytes) were computed in approximately 5 seconds; competing computing platforms required 20 to 65 seconds to perform the same task.
  • Basic data management operations as they are commonly required in data mining (predictive modeling) work (e.g., sub-setting of data) execute 3 to 4 times faster in STATISTICA.
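The quoted file sizes and timing ratios can be sanity-checked with a little arithmetic. The sketch below assumes each value is stored as an 8-byte double and that a gigabyte means 10^9 bytes; neither is stated in the comparison itself, and the helper names are purely illustrative.

```python
BYTES_PER_VALUE = 8  # assumption: every value is stored as an 8-byte double


def file_size_gb(variables: int, rows: int) -> float:
    """Approximate raw data size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return variables * rows * BYTES_PER_VALUE / 1e9


def speedup(competitor_seconds: float, statistica_seconds: float) -> float:
    """How many times longer the competing package took."""
    return competitor_seconds / statistica_seconds


# Data sizes implied by the variable and row counts quoted above.
descriptives_gb = file_size_gb(30, 9_000_000)
correlations_gb = file_size_gb(500, 1_000_000)

# Timing ratios implied by the quoted run times (fastest and slowest competitor).
descriptives_range = (speedup(4.5, 3), speedup(37, 3))
correlations_range = (speedup(20, 5), speedup(65, 5))

print(f"descriptive statistics: {descriptives_gb:.2f} GB, "
      f"{descriptives_range[0]:.1f}x to {descriptives_range[1]:.1f}x")
print(f"correlation matrix: {correlations_gb:.2f} GB, "
      f"{correlations_range[0]:.1f}x to {correlations_range[1]:.1f}x")
```

Under these assumptions the computed sizes (about 2.16 and 4 gigabytes) agree with the approximate file sizes quoted in the list.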

Read more: Performance comparison of STATISTICA Version 9 on multi-core 64-bit machines with current 64-bit releases of SAS (Version 9.2) and PASW (formerly SPSS) Statistics Version 18, covering basic data management, basic statistics, and aggregation operations.

2. The performance optimization technology used in STATISTICA

The current version of STATISTICA software, including STATISTICA Data Miner, takes full advantage of state-of-the-art hardware and software technologies, as well as proprietary performance optimization technologies developed at StatSoft. STATISTICA is available as a native 64-bit application, and most STATISTICA computational (statistical) routines, as well as the key predictive modeling algorithms available in STATISTICA Data Miner, make use of multi-processor computing platforms.

Shown below are performance benchmark data collected as part of the STATISTICA and STATISTICA Data Miner software validation and release process. Each analysis was repeated multiple times on 64-bit computers with 1, 2, 3, or 4 processors (and otherwise identical hardware configurations). STATISTICA was designed to take advantage of available hardware resources to achieve maximum performance for complex predictive modeling analyses (e.g., via regression trees, stochastic gradient boosting, or random forests analyses), as well as common statistical analyses (e.g., computing correlation coefficients).

[Figures: benchmark graphs for four analyses (regression tree, stochastic gradient boosted tree, complex random forest, correlation matrix)]

Performance of Predictive Modeling Algorithms

STATISTICA Data Miner contains multithreaded implementations of Classification and Regression Trees, CHAID, stochastic gradient boosting of trees (Boosted Trees), Random Forests (voting trees), and others, as well as multithreaded implementations of traditional generalized linear modeling techniques (e.g., logit regression). The performance of these predictive modeling algorithms on modern 64-bit multi-core hardware and 64-bit operating system platforms is, as of this writing, not matched by any other general software platform for predictive modeling (see also graphs shown above). Analyses with hundreds of variables and millions of cases will complete in minutes.
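Benchmarks of the kind described earlier, where the same analysis is timed on 1 to 4 processors, are commonly summarized as speedup (single-processor time divided by multi-processor time) and parallel efficiency (speedup divided by the number of processors). The sketch below shows that calculation; the function name is illustrative, and the timings are placeholders chosen for the example, not measurements from the STATISTICA benchmarks.

```python
def scaling_table(seconds_by_processors: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (processors, speedup, efficiency) rows relative to the 1-processor run."""
    baseline = seconds_by_processors[1]
    rows = []
    for processors in sorted(seconds_by_processors):
        speedup = baseline / seconds_by_processors[processors]
        efficiency = speedup / processors  # 1.0 would be perfect linear scaling
        rows.append((processors, speedup, efficiency))
    return rows


# Placeholder timings for illustration only; NOT results from the benchmarks above.
placeholder_seconds = {1: 120.0, 2: 60.0, 3: 48.0, 4: 40.0}

for processors, speedup, efficiency in scaling_table(placeholder_seconds):
    print(f"{processors} processor(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency that stays close to 100% as processors are added indicates that an algorithm parallelizes well; a falling value points to serial sections or contention for shared resources such as disk access.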

Data Buffering and Storage

In part, the performance of STATISTICA and STATISTICA Data Miner computational algorithms was achieved through carefully redesigned intelligent data access, storage, and buffering methods. Data can be read asynchronously in multiple threads servicing different parallel computations for a single analysis (e.g., classification and regression trees). Complete data arrays are never stored explicitly in memory, so there are no limitations on file sizes; yet, the available memory is used intelligently to buffer the data (read by multiple threads) to make them available for computation.
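The general idea of buffered, asynchronous data access can be illustrated with a small producer/consumer sketch. This is not StatSoft's proprietary implementation: it is a minimal example, using only the Python standard library, in which a background thread reads fixed-size chunks into a bounded buffer while the main thread accumulates the running sums needed for a Pearson correlation. All names (`read_chunks`, `streaming_correlation`, the chunk and buffer sizes) are assumptions made for the illustration.

```python
import math
import queue
import threading
from typing import Iterable, Iterator

Row = tuple[float, float]
_DONE = object()  # sentinel marking the end of the data stream


def read_chunks(rows: Iterable[Row], chunk_size: int) -> Iterator[list[Row]]:
    """Yield rows in fixed-size chunks (stands in for reading blocks of a file)."""
    chunk: list[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def streaming_correlation(rows: Iterable[Row], chunk_size: int = 1000,
                          max_buffered_chunks: int = 4) -> float:
    """Pearson correlation computed with only a few chunks held in memory."""
    buffer: queue.Queue = queue.Queue(maxsize=max_buffered_chunks)

    def reader() -> None:
        # Background thread: put() blocks while the buffer is full,
        # so memory use stays bounded regardless of the data size.
        for chunk in read_chunks(rows, chunk_size):
            buffer.put(chunk)
        buffer.put(_DONE)

    threading.Thread(target=reader, daemon=True).start()

    n = 0
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
    while True:
        chunk = buffer.get()
        if chunk is _DONE:
            break
        for x, y in chunk:
            n += 1
            sum_x += x
            sum_y += y
            sum_xx += x * x
            sum_yy += y * y
            sum_xy += x * y

    covariance = n * sum_xy - sum_x * sum_y
    variance_x = n * sum_xx - sum_x ** 2
    variance_y = n * sum_yy - sum_y ** 2
    return covariance / math.sqrt(variance_x * variance_y)


# Usage: the generator is consumed lazily, so the full data set never sits in memory.
data = ((float(i), 2.0 * i + 1.0) for i in range(10_000))
print(streaming_correlation(data, chunk_size=256))
```

The sketch omits error handling in the reader thread, and the single-pass sum formula can lose precision on data with a large mean relative to its spread; a production implementation would likely use a more numerically stable update.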


Using these technologies, STATISTICA data analysis and STATISTICA Data Miner software has leapfrogged the competition.

Copyright (c) 2012 www.statsoft.com