STATISTICA 12 New Features

Better. Bigger. Faster.

I have been extremely happy with version 12’s overall performance and stability, even with very large data sets.
Manager, Strategic Marketing and Pricing Fortune 100 Company

An explosive combination of Big Data growth, digital storage capabilities, and technological advances has forever altered the modern business analytics landscape. The application of analytic tools and decision making is no longer limited to the realm of data scientists, computer programmers, engineers, and the like. Rather, analytics are now being integrated into day-to-day tasks across all departments, utilized by project managers, business analysts, predictive modelers, customer agents, and executive leaders who need access to sensible, actionable information. These are people who need visual user interfaces to create, consume, and share KPIs, graphs, reports, slide presentations, and more.

An example of the new workspace in version 12.

To meet these changes head on, we made STATISTICA even faster, more flexible, and more functional than ever:

  • We boosted the Big Data performance of the entire product line.
  • We added a visual user interface to write SQL queries with the new Advanced Query Builder in all products.
  • We reinvented the visual analytic workspace in STATISTICA Enterprise and Data Miner for a more intuitive user experience, with greater visual workflow and storage capabilities to help users understand and communicate their findings.
  • We strengthened the predictive/prescriptive capabilities of Decisioning Platform®.
  • We introduced the highly flexible Reporting Tables product that enables users to visually build tables of summary statistics and use them in presentations and other reports.
  • We developed new nodes, such as the practical Data Health Check® that facilitates cleanup of a large number of variables.

With the rollout of STATISTICA 12 in April 2013, StatSoft builds on its nearly 30-year legacy of exceeding customer expectations, furnishing this ever-growing business landscape with a host of relevant features and performance improvements that will make our analytic solutions even faster, more accessible, and more effective for business leaders and power users alike.

We fit into your IT world better than any alternative. Whether handling medium data or Big Data, STATISTICA 12 takes greater advantage of existing data warehouses and IT tools than ever before, helping move businesses even closer and faster to meaningful ROI.

Advanced Query Builder

Advanced Query Builder (AQB) makes it possible for even non-technical staff to write complex queries to retrieve data. It has a new visual user interface to build queries (dragging, dropping, nesting, selecting). The application's parsing engine determines the current context.

the new advanced query builder dialog

Offering features usually found only in specialized applications, AQB can graphically build left, right, and full outer joins; build queries with aggregate functions; build complex queries involving unions and minus operations; graphically represent complex SQL queries and ER diagrams; and switch the SQL dialect when the universal default is not practical.
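To make the query types named above concrete, here is a minimal sketch of the kind of SQL a visual builder might generate. It is not AQB output: the tables, columns, and data are hypothetical, and Python's built-in sqlite3 module stands in for a real data source. It also illustrates why dialect matters: the "minus" operation is spelled MINUS in some dialects (for example Oracle) and EXCEPT in SQLite.

```python
import sqlite3

# Hypothetical sample tables, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'US'), (2, 'Bolt', 'EU'), (3, 'Corex', 'EU');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Left outer join with aggregate functions: every customer appears,
# including customers that have no orders.
totals = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT OUTER JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
""").fetchall()

# "Minus" operation (EXCEPT in the SQLite dialect): customers with no orders.
no_orders = conn.execute("""
    SELECT id FROM customers
    EXCEPT
    SELECT customer_id FROM orders
""").fetchall()

# Union: US customers combined with customers that placed a small order.
us_or_small = conn.execute("""
    SELECT name FROM customers WHERE region = 'US'
    UNION
    SELECT c.name FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount < 100
    ORDER BY name
""").fetchall()
```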

Spreadsheet Improvements

New File Format for Better Support of Big Data

STATISTICA now features a new data file format that is optimized for Big Data by supporting variable storage length for text variables. When text variables include sparsely populated columns, the space occupied by those values is automatically optimized, reducing spreadsheet sizes sufficiently to produce significant performance improvements.

Spreadsheet “Virtual Variables”

Spreadsheets now use virtual variables that can be specified by formula and evaluated at run time, requiring no real storage. These virtual variables are added or deleted behind the scenes without needing to rewrite entire spreadsheet data sections, so users will notice only enhanced performance. New data is held in a separate vector on disk and is reunited with the original spreadsheet when data is saved. This significantly improves performance, especially when you add transformed variables to large spreadsheets.
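The idea of a variable that is defined by a formula and evaluated only when read can be sketched as follows. This is a conceptual illustration, not the STATISTICA file format or API; the Sheet class and its method names are hypothetical.

```python
from typing import Callable, Dict, List


class Sheet:
    """Toy spreadsheet: stored columns plus formula-defined virtual columns."""

    def __init__(self, columns: Dict[str, List[float]]):
        self._stored = columns
        self._virtual: Dict[str, Callable[[Dict[str, float]], float]] = {}

    def add_virtual(self, name: str, formula: Callable[[Dict[str, float]], float]) -> None:
        # Registering a formula does not touch or rewrite the stored data.
        self._virtual[name] = formula

    def value(self, name: str, row: int) -> float:
        if name in self._virtual:
            # Evaluate the formula at read time from the stored values of this case.
            case = {col: vals[row] for col, vals in self._stored.items()}
            return self._virtual[name](case)
        return self._stored[name][row]

    def stored_cells(self) -> int:
        # Number of cells that actually occupy storage.
        return sum(len(vals) for vals in self._stored.values())


sheet = Sheet({"height_cm": [170.0, 180.0], "weight_kg": [65.0, 81.0]})
sheet.add_virtual("bmi", lambda c: c["weight_kg"] / (c["height_cm"] / 100) ** 2)
bmi_second_case = sheet.value("bmi", 1)
```

Adding the "bmi" column leaves the number of stored cells unchanged, which is the property the paragraph above describes.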

Increase in Text Labels

Text Label support in spreadsheets has now been increased to millions of distinct labels with significant performance improvements for name/value lookup. This makes Text Labels a good choice for text fields with large numbers of distinct values, inheriting all the performance benefits from a fixed storage size of the numeric value and avoiding duplication of repeated values.
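The benefit described above comes from storing each distinct text once and keeping only a fixed-size numeric code per case. A minimal sketch of that technique (dictionary encoding) follows; the class and attribute names are hypothetical and do not reflect STATISTICA internals.

```python
class LabeledColumn:
    """Toy text-label column: one integer code per case, one entry per distinct label."""

    def __init__(self):
        self.codes = []      # fixed-size numeric value stored for each case
        self.label_of = []   # code -> text label (each distinct text stored once)
        self.code_of = {}    # text label -> code (hash lookup by name)

    def append(self, text):
        code = self.code_of.get(text)
        if code is None:
            # First time this text is seen: assign the next free code.
            code = len(self.label_of)
            self.code_of[text] = code
            self.label_of.append(text)
        self.codes.append(code)

    def text(self, row):
        return self.label_of[self.codes[row]]


city = LabeledColumn()
for value in ["Tulsa", "Paris", "Tulsa", "Tulsa"]:
    city.append(value)
```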

Aggregate Function in OLE DB Provider for STATISTICA Spreadsheets

The OLE DB provider now allows for the utilization of aggregate functions such as average, count, max, min, or sum.

Importing Text Files Using Auto-Fixed Importing Variable Operations

This enhancement lets you take blocks of data that contain fixed-length pieces of information and specify the fixed lengths used to import variable-specific information.

the new text file import prompt

STATISTICA now has the option for a Fixed import setting.
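As an illustration of what a fixed-length import does, the sketch below splits each line of text into variables using a list of field widths. The function name, field widths, variable names, and sample data are assumptions made for this example; they are not STATISTICA settings.

```python
def parse_fixed_width(lines, widths, names):
    """Split each line into fields of the given widths and strip padding."""
    records = []
    for line in lines:
        record, start = {}, 0
        for name, width in zip(names, widths):
            record[name] = line[start:start + width].strip()
            start += width
        records.append(record)
    return records


# Hypothetical layout: 4-character id, 10-character name, 3-character age.
raw = [
    "0001Smith     042",
    "0002Nakamura  037",
]
rows = parse_fixed_width(raw, widths=[4, 10, 3], names=["id", "name", "age"])
```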

Data Visualization

Several new options have been added to provide additional features and tools for visualizing data.

  • "Orthogonal regression" fit type is now supported in 2D scatterplots
  • Points on graphs can now be annotated
  • New options in compound graphs improve visual appearance by controlling the scaling display
  • A new data file can be created by brushing the points to be included
  • Date and time support was added for “meaningful time intervals” in graph scales
  • Now you can modify the margins of all plots in an original graph (e.g., multi-graph layout)
  • Create Pareto charts more easily
  • We added a new graph type, the parallel coordinate plot, which shows multiple variables, side-by-side, on comparable scales, thus making it easier to compare values across variables (see below).
    an example of the new graph type parallel coordinate plot
    Each Y-axis corresponds to a variable in a STATISTICA spreadsheet and can be defined according to standalone values or two-sided values (e.g., range boundaries, upper and lower limits, etc.)
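For readers unfamiliar with the orthogonal regression fit type listed above: it minimizes perpendicular distances from the points to the line rather than vertical distances. The sketch below uses the standard closed-form solution for a straight line; it is a textbook formulation and is not confirmed to match the STATISTICA implementation.

```python
import math


def orthogonal_fit(xs, ys):
    """Return (slope, intercept) of the line minimizing perpendicular distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is not determined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = orthogonal_fit([0, 1, 2, 3], [1, 3, 5, 7])
# Swapping the roles of x and y describes the same line (reciprocal slope),
# which is not true of ordinary least squares in general.
swapped_slope, _ = orthogonal_fit([1, 3, 5, 7], [0, 1, 2, 3])
```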

Statistics

False Discovery Rate

False Discovery Rate (FDR) and Qvalues were added. FDR performs the Benjamini and Hochberg method, and Qvalues performs the method described in the 2002 Storey paper.
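The Benjamini and Hochberg procedure can be sketched in a few lines. The function below returns BH-adjusted p-values, one common way of presenting the method; the function name is an assumption, and the output format of the STATISTICA option may differ. Storey's q-value method is not shown.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
# Tests whose adjusted p-value is at or below the chosen FDR level are declared discoveries.
discoveries = [i for i, p in enumerate(adjusted) if p <= 0.025]
```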

New Distributions

New distributions were added to the Probability Distribution Calculator, STATISTICA Visual Basic functions, and spreadsheet functions. These are for hypergeometric distributions (inverse, cumulative, prob) and the inverse Poisson and inverse binomial distributions.
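The three hypergeometric quantities mentioned above (prob, cumulative, inverse) can be sketched as follows. The function names and argument order are hypothetical and are not the STATISTICA function signatures; the inverse is taken here to mean the smallest count whose cumulative probability reaches p, which is an assumption about the convention.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes) when drawing n items without replacement
    from a population of N items that contains K successes."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_cdf(k, N, K, n):
    """P(at most k successes)."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))


def hypergeom_inverse(p, N, K, n):
    """Smallest k with P(at most k successes) >= p (assumed convention)."""
    cumulative = 0.0
    for k in range(0, min(n, K) + 1):
        cumulative += hypergeom_pmf(k, N, K, n)
        if cumulative >= p:
            return k
    return min(n, K)


prob_one = hypergeom_pmf(1, N=10, K=4, n=3)
cum_one = hypergeom_cdf(1, N=10, K=4, n=3)
median_count = hypergeom_inverse(0.5, N=10, K=4, n=3)
```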

Stepwise Model Builder (STATISTICA Advanced)

Stepwise Model Builder provides control over model building and gives the modeler a “what-if” environment. This is useful when regulation or a company’s standard practices limit which variables can be used to build models. For example, a bank cannot discriminate based on age or gender.

Negative Binomial Distribution (STATISTICA Advanced)

This new option is available within GLZ. It enables you to specify the Negative Binomial as the distribution for the response variable. This specific form is referred to as the Poisson-Gamma mixture form and is the discrete analog to the continuous gamma distribution.
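For reference, the Poisson-Gamma mixture form can be written as a probability function with mean mu and dispersion alpha, so that the variance is mu + alpha * mu^2. The sketch below uses this common parameterization; the parameter names are assumptions, and the parameterization used inside GLZ may differ.

```python
from math import exp, lgamma, log


def negbin_pmf(y, mu, alpha):
    """Negative binomial probability of count y with mean mu and dispersion alpha.

    Arises when a Poisson mean is itself gamma distributed
    (gamma shape 1/alpha, gamma mean mu).
    """
    r = 1.0 / alpha
    log_p = (
        lgamma(y + r) - lgamma(y + 1) - lgamma(r)
        + r * log(r / (r + mu))
        + y * log(mu / (r + mu))
    )
    return exp(log_p)


mu, alpha = 3.0, 0.5
probs = [negbin_pmf(y, mu, alpha) for y in range(500)]
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
```

With alpha close to zero the variance approaches the mean, which is the Poisson case; larger alpha values allow for overdispersed counts.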

Quality Control Charts (STATISTICA Quality Control)

Quality Control now includes options that can set the background color for in control, out of control, and out of warning lines on quality control graphs.

Other

Microsoft Office 2010 Style Toolbars

STATISTICA now uses the Office 2010 style toolbars. The Help menu has been moved to the File tab.

Search Facility

Now you can search for modules by name, select a module, and start it. This feature indexes all available ribbon bar options and displays them alphabetically. Typing in the search box will start restricting the list to those entries that match any of the words from the ribbon bar option. Pressing ENTER will open the selected module’s dialog box.

High Resolution DPI 120 Supported

Starting with the release of Windows Vista and the greater availability of very high resolution monitors, Microsoft made it much easier to change the DPI setting. In Windows 7, themes come with a default of 120 DPI for high resolution. This resolution is now supported in STATISTICA.

Data Miner Workspace Enhancements

The Workspace has been upgraded to include a large number of new features to improve usability and performance, especially with respect to handling very large data sets.

an example of the new Data Miner workspace

A new system of nodes has been introduced with enhancements of the user interface to closely resemble the user interface in the respective modules. The previous nodes are still offered and supported for backwards compatibility.

Enhanced Ability to Import Excel files

STATISTICA now has the ability to import Excel files using the nomenclature of Excel spreadsheets: letters for columns and numbers for cases.

an example of the new import process for excel

This functionality is not only available interactively, but is also translated to the Workspace utilizing the new Import Excel node.

an example of the new import process for excel

You can use this node to import Excel data directly from a spreadsheet into a Workspace.

Analytic Enhancements

Data Health Check®

The Data Health Check node is new in STATISTICA 12 and is available to all STATISTICA Data Miner users. This node detects common data issues for each variable, completes basic data cleaning, and generates a report that can be used in deciding how to further clean the data. The Data Health Check node is especially useful for exploring a large number of variables automatically.

Construction of Trees, Sensitivity Analysis

This new “sensitivity” option enables you to learn more detail about a specific node. You can then use this knowledge to redefine the splits of the proposed tree in an expert way.

Ordered Twoing Criterion

This is an option to treat categorical dependent variables in order. It is useful when categories represent levels (low, medium, high).
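One textbook formulation of ordered twoing (from the CART literature) evaluates a candidate split by grouping the ordered classes into two contiguous "superclasses" at every possible cut point and keeping the best two-class separation. The sketch below follows that formulation; it is an assumption that this matches the STATISTICA option, and the function name and sample counts are hypothetical.

```python
def ordered_twoing(left_counts, right_counts):
    """Ordered twoing value of a split.

    left_counts / right_counts: number of cases per class in each child node,
    with classes listed in their natural order (e.g., low, medium, high).
    """
    n_left, n_right = sum(left_counts), sum(right_counts)
    if n_left == 0 or n_right == 0:
        raise ValueError("both child nodes must contain cases")
    n = n_left + n_right
    p_left, p_right = n_left / n, n_right / n
    best = 0.0
    cum_left = cum_right = 0
    # Only contiguous groupings {classes up to j} vs. {classes after j} are allowed.
    for j in range(len(left_counts) - 1):
        cum_left += left_counts[j]
        cum_right += right_counts[j]
        a = cum_left / n_left      # share of the lower superclass in the left child
        b = cum_right / n_right    # share of the lower superclass in the right child
        best = max(best, p_left * p_right * (a - b) ** 2)
    return best


# Classes in order: low, medium, high.
value = ordered_twoing(left_counts=[8, 2, 0], right_counts=[0, 3, 7])
perfect = ordered_twoing(left_counts=[5, 5, 0], right_counts=[0, 0, 10])
```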

Predictor Screening

This is a new method for analyzing predictors that was added to Feature Selection. This functionality can be used as a quick, first look at a predictor to provide a basic set of statistics.

Data Access Enhancements

Teradata Code Deployment (STATISTICA Data Miner with Code Generator)

User-defined functions can now be defined for the Teradata database, which allows for in-database scoring.

Enterprise Workspace Enhancements

The Workspace has been upgraded to include a large number of new features to improve usability and performance, especially with respect to handling very large data sets.

an example of the new enterprise workspace

A new system of nodes has been introduced with enhancements of the user interface to closely resemble the user interface in the respective modules. The previous nodes are still offered and supported for backwards compatibility.

Enhanced Ability to Import Excel files

STATISTICA now has the ability to import Excel files using the nomenclature of Excel spreadsheets: letters for columns and numbers for cases.

an example of the new import process for excel

This functionality is not only available interactively, but is also translated to the Workspace utilizing the new Import Excel node.

an example of the new import process for excel

You can use this node to import Excel data directly from a spreadsheet into a Workspace.

Analytic Enhancements

Data Health Check®

The Data Health Check node is new in STATISTICA 12 and is available to all STATISTICA Enterprise users. This node detects common data issues for each variable, completes basic data cleaning, and generates a report that can be used in deciding how to further clean the data. The Data Health Check node is especially useful for exploring a large number of variables automatically.

Reporting

A new enhancement is the selection of spreadsheet cells into dynamic tags, which allows inserting the value of a particular cell into the text of a report and can be used for both text (including paragraph text strings) and numeric values.

Individual workbook items can be specified as dynamic tags, making it possible for these items to be included in reports.

Additionally, STATISTICA now supports an expanded list of keyword tags, including workflow name, SDMS version numbers, and more.

Quality Control Charts

STATISTICA Enterprise now supports full color and pattern control for the elements of QC charts, in the same manner that these options are supported in the interactive usage of STATISTICA. These controls are accessible from inside the Enterprise Manager application.

Data Access Enhancements

SVB Data Configurations

With SVB Data Configurations, you can access nontraditional databases that don’t have an ODBC or OLE DB provider. As an example, a large text file can be thought of as a database if you need to retrieve its data. As a text file, however, it does not have an ODBC or OLE DB provider. But with an SVB Data Configuration, it is possible to access this text file as a database and make its data available to STATISTICA. If you want to execute different queries based on predetermined conditions, those conditions can also be coded into the SVB Data Configuration.

General Document Store

Files can now be saved/opened within the Enterprise System View, so STATISTICA documents and other document types can be stored within Enterprise Manager and shared among users outside a file share. The Enterprise System View is the default destination for saving reports. Additionally, standard STATISTICA Enterprise permissions and SDMS versioning are supported.

SVB and SVX code can be stored within Enterprise using the General Document store. Now all the places in Enterprise that use SVB can reference the stored code; changing the code in one place can simultaneously implement that change in SVB Analysis Configurations, SVB Data Configurations, Workspace node code, and Secondary SVB Programs within Enterprise.

Browser Support (STATISTICA Enterprise Server)

Support is provided for all mainstream browsers: Internet Explorer, Chrome, Firefox, Safari, and Opera. This makes it possible for you to use STATISTICA Enterprise Server from your iPad or laptop.

Workbook Supported (STATISTICA Enterprise Server)

Workbooks can now be shared easily with others through the STATISTICA Enterprise Server Portal. After a file is published, a Download from Server link (URL) will be provided.

Versioning Support (STATISTICA Enterprise Compliance Edition)

STATISTICA Enterprise Compliance Edition is an integration of STATISTICA Enterprise with a highly scalable document management system that enables you to securely manage documents of any kind, and it is designed to ensure compliance with FDA 21 CFR Part 11 regulations, Sarbanes-Oxley legislation, as well as ISO 9000, 9001, and 14001 documentation requirements. New functionality provides for easy version comparison and opening of previous versions of documents.

Version Comparison

Now when SDMS integration is enabled, you can compare different versions of SDMS objects in Enterprise Manager. Each versionable Enterprise object will have a text representation:

  • Data Configuration – list of query, data types, and OLE DB column properties
  • IQC Analysis Configuration – summary of QC settings/parameters
  • SVB Analysis Configuration – SVB text and properties
  • Rules object – text representation of rules
  • PMML object – PMML representation of model
  • Workflow – text detailing all contained nodes and parameters

Open Previous Version

For those versionable objects that can be opened directly in Enterprise, including Workspaces, PMML, and Rules objects, STATISTICA will allow a specified previous version of the object to be opened as a read-only object.

Labels (STATISTICA Web Data Entry)

Labels are used with the Data Entry product. Labels can now be stored in one or more system folders. Customers will find it easier to manage Labels with this new option.

Calibration Tests

Calibration Tests is a tool that makes it possible to compare the forecast probability of default (PD) with the realized PD that eventually occurs.
A typical use case in financial institutions is to divide customers into segments of like customers, realizing that each separate segment will have a certain number of customers who meet credit obligations and a certain number who will not. Based upon the model the financial institution has agreed upon, each segment has a forecast PD. After the model has been used for a period of time, the accuracy of the model must be tested. Performing such tests is very easy in STATISTICA, which even includes a built-in "traffic light approach" described in a popular reference on guidelines in credit risk management (Oesterreichische Nationalbank, 2004).
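One common way to compare a forecast PD with realized defaults in a segment is a one-sided binomial test: how likely is it to see at least the observed number of defaults if the forecast PD were correct? The sketch below pairs that test with a simple three-color outcome. The color thresholds (5% and 1%) are illustrative assumptions; they are not taken from the cited reference or from STATISTICA.

```python
from math import comb


def binomial_tail(n, k, pd):
    """P(at least k defaults among n customers) if the true default probability is pd."""
    return sum(comb(n, i) * pd ** i * (1 - pd) ** (n - i) for i in range(k, n + 1))


def traffic_light(n, defaults, forecast_pd, yellow=0.05, red=0.01):
    """Classify a segment; the yellow/red cutoffs are illustrative assumptions."""
    p_value = binomial_tail(n, defaults, forecast_pd)
    if p_value < red:
        return "red"      # realized defaults are very unlikely under the forecast PD
    if p_value < yellow:
        return "yellow"   # forecast PD may be too low; review the model
    return "green"        # realized defaults are consistent with the forecast PD


# Hypothetical segment: 100 customers with a forecast PD of 2%.
as_expected = traffic_light(100, 2, 0.02)
elevated = traffic_light(100, 6, 0.02)
far_above = traffic_light(100, 10, 0.02)
```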

Rules

STATISTICA Scorecard is now integrated with STATISTICA Decisioning Platform. This tool can now generate rules for batch scoring or live scoring.

Versioning Support

STATISTICA Compliance Edition is an integration of STATISTICA with a highly scalable document management system that enables you to securely manage documents of any kind, and it is designed to ensure compliance with FDA 21 CFR Part 11 regulations, Sarbanes-Oxley legislation, as well as ISO 9000, 9001, and 14001 documentation requirements. New functionality provides for easy version comparison and opening of previous versions of documents.

Version Comparison

Now when SDMS integration is enabled, you can compare different versions of SDMS objects. Each versionable object will have a text representation:

  • Data Configuration – list of query, data types, and OLE DB column properties
  • IQC Analysis Configuration – summary of QC settings/parameters
  • SVB Analysis Configuration – SVB text and properties
  • Rules object – text representation of rules
  • PMML object – PMML representation of model
  • Workflow – text detailing all contained nodes and parameters

Weight of Evidence

This new product is important to anyone engaged in binary prediction (yes/no). This tool automates the time-consuming task of binning predictors.

an example of the new weight of evidence feature

Two methods are used:

  • Optimal
  • Interpreted (e.g., observed risk of prediction probability)
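For context, once a predictor has been binned, the weight of evidence (WoE) of each bin is usually computed as the log ratio of the bin's share of one outcome to its share of the other. The sketch below shows that calculation plus the related information value. It does not reproduce the Optimal or Interpreted binning methods listed above, and the sign convention ("good" over "bad") and function names are assumptions.

```python
from math import log


def weight_of_evidence(bins):
    """bins: list of (goods, bads) counts per bin. Returns one WoE value per bin."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woe = []
    for goods, bads in bins:
        if goods == 0 or bads == 0:
            raise ValueError("each bin needs at least one good and one bad case")
        woe.append(log((goods / total_good) / (bads / total_bad)))
    return woe


def information_value(bins):
    """Sum over bins of (share of goods - share of bads) * WoE."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    return sum(
        (g / total_good - b / total_bad) * w
        for (g, b), w in zip(bins, weight_of_evidence(bins))
    )


# Hypothetical predictor with two bins: (goods, bads).
example_bins = [(80, 20), (20, 80)]
woe_values = weight_of_evidence(example_bins)
iv = information_value(example_bins)
```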

Rules Builder

Every organization has rules that govern its behavior. Consistently applying these rules to analytic projects or reports is a common challenge. Rules Builder solves this problem.

Business users, developers, or modelers find it easy to create, maintain, share, and re-use sets of rules. A “rule set” for data transformation could be created and then used by one or thousands of analytic projects. Role-based security controls access to these rules.

an example of the new rules builder dialogue

Rules Builder has the ability to conditionally execute models with pre-scoring segment rules and then apply post-scoring policy rules. Rules can retrieve reason codes for individual predictions, which can be critical for many industries, such as banking or insurance. For example, banks are required to state why a loan application was denied.
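The flow described above (segment rules choose a model, the model scores the case, policy rules make the decision and record reason codes) can be sketched as follows. Everything in this example is hypothetical: the field names, thresholds, reason codes, and stand-in models are invented for illustration and do not represent Rules Builder syntax.

```python
def score_application(applicant, models):
    """Apply a pre-scoring segment rule, score, then apply post-scoring policy rules."""
    # Pre-scoring segment rule: pick which model to execute.
    segment = "thin_file" if applicant["months_of_history"] < 12 else "established"
    probability = models[segment](applicant)

    # Post-scoring policy rules: each rule that fires records a reason code.
    decision, reasons = "approve", []
    if probability > 0.20:
        decision = "decline"
        reasons.append("R01: predicted default risk above policy limit")
    if applicant["debt_to_income"] > 0.45:
        decision = "decline"
        reasons.append("R02: debt-to-income ratio above policy limit")
    return {"segment": segment, "probability": probability,
            "decision": decision, "reasons": reasons}


# Stand-in models that return fixed probabilities, for illustration only.
models = {
    "thin_file": lambda a: 0.30,
    "established": lambda a: 0.05,
}
result = score_application({"months_of_history": 48, "debt_to_income": 0.50}, models)
```

Here the application is declined by a policy rule even though the model score is low, and the reason code states why.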

The execution of rules can be visually traced with sample data to aid in troubleshooting complex scenarios.

STATISTICA Reporting Tables (optional)

Businesses are challenged to:

  • Summarize large amounts of data into formats that are easily understood
  • Easily emphasize particular data segments (e.g., only report on Oklahoma and France)
an example of the new reporting tables

STATISTICA Reporting Tables (an optional product to be purchased separately for Version 12) automatically sorts and summarizes data based on specifications made while developing the table. The tables are generated interactively by visually dragging and dropping variables into the appropriate four sections of the Reporting Tables dialog box (Layers, Column Label, Row Label, and Sigma). As the tables are customized, they can be previewed, and final results can be generated with the click of a button.

an example of the new reporting tables

Options are available for processing Multiple Response Categories, Crosstable Groups, and Conditional Formatting.

 

Contact Us

StatSoft, Inc
2300 East 14th Street
Tulsa, Oklahoma, 74104
(918) 749-1119
info@statsoft.com