STATISTICA 10 (released November 2010) delivers further significant performance improvements by automatically taking advantage of 64-bit CPU technology (when it is available on the hardware in use) and by highly optimized multithreading.
Many STATISTICA functions throughout data management and analysis (C&RT, CHAID, General Linear Models, etc.) that ran in single-threaded mode in version 9 are now multithreaded, so they can take advantage of multiple cores or processors.
Other New Features:
STATISTICA 10 input and output are now integrated with Microsoft SharePoint, the fastest-growing standard for data exchange and integration. STATISTICA documents can be checked in to and checked out of SharePoint directly from the STATISTICA user interface. To the best of our knowledge, STATISTICA 10 is currently the only analytics or data mining application that offers this seamlessly integrated functionality.
STATISTICA now directly imports native Office 2007 and 2010 files, including formatting information. This new technology improves both the speed and the fault tolerance of imports from Excel 2007 and 2010 into STATISTICA spreadsheets, and Excel 2007/2010 import/export now handles formatted cell text.
STATISTICA Query can now retrieve data from OLAP cube providers such as the Microsoft OLE DB Provider for Analysis Services or SAP Business Warehouse. MDX queries can be built in a drag-and-drop environment, or the MDX code can be entered directly. (This feature is currently offered as a Beta release.)
The STATISTICA PI Connector is now easier to install and manage: it is distributed as part of STATISTICA 10, so a separate installer is no longer necessary.
The STATISTICA Graph display technology has been substantially upgraded to automatically detect and use hardware graphics acceleration, which is now available not only in high-end but also in many mid-range video display controllers in both desktop and laptop computers.
The resulting output is generated faster and supports more advanced smoothing and gradient display options. All STATISTICA Graphs have an improved appearance thanks to new gradient/fill colors and smoother rendering of lines, curves, and surfaces.
Also, all STATISTICA Graph windows (both stand-alone and integrated into workbooks) now feature interactive graphics controls (a bar with sliders and other controls placed at the bottom of the graph window), which enable you to interactively adjust these new display features. The benefits include not only a vastly improved appearance of all graphs, but also new analytic and exploratory options, such as tools to reveal hidden trends by gradually desaturating dense displays and to rotate 3D graphs vertically and horizontally.
You can now interact directly with the graph scaling: hover the mouse pointer over the axis labels near the end of the axis and drag left or right to change the scale. Interactive Scaling is a powerful graphical exploratory technique that reveals hidden trends by stretching or compressing selected parts of the display.
You can also pan the graph left or right directly from the axis by hovering the mouse pointer over the axis labels near the center of the axis. Interactive Panning is a powerful graphical exploratory technique that helps you explore trends hidden in the data.
STATISTICA 10 supports transparency (interactively controlled with on-screen sliders) for plot areas and for desaturating overlapping markers (requires Windows Vista SP2 or Windows 7). Transparency control is a powerful graphical exploratory technique for revealing trends hidden in dense concentrations of data points, especially in scatterplots and scatterplot matrices generated from extremely large data sets.
The goal is to find the density level that best uncovers patterns obscured by a large number of random points (white noise), which otherwise create an “ink blot” effect. Making plot areas transparent also lets portions of the plot overlap while remaining visible.
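Why transparency reveals density can be seen with a little arithmetic. The sketch below assumes standard "over" alpha compositing of identically colored markers; it illustrates the general principle and is not STATISTICA's rendering code. With n overlapping markers of opacity alpha, the combined coverage is 1 - (1 - alpha)^n, so fully opaque markers saturate after a single point, while a low alpha keeps dense regions distinguishable from sparse ones.

```python
def stacked_opacity(alpha: float, n: int) -> float:
    """Coverage of n identical markers, each with opacity alpha, drawn on top of
    one another with standard "over" alpha compositing (assumed model)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Each additional marker lets through only (1 - alpha) of what was visible below it.
    return 1.0 - (1.0 - alpha) ** n


# Opaque markers (alpha = 1.0): 1 point and 50 points look the same.
# Semi-transparent markers (alpha = 0.05): coverage keeps growing with the point count.
for n in (1, 5, 20, 50):
    print(n, round(stacked_opacity(1.0, n), 3), round(stacked_opacity(0.05, n), 3))
```

Lowering alpha moves the saturation point toward denser regions, which is what the transparency slider adjusts interactively.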
Reference lines can be added to graphs much more easily in STATISTICA 10 through dedicated Reference Lines options, accessible in the Graph Options dialog.
Text can now be interactively edited on-screen (by simply clicking and typing in the edits), without a need to open the editor window. The graph text editor controls are still available and support the more advanced editing options.
A large number of usability improvements have been implemented in STATISTICA 10 to enhance user comfort and the “touch and feel” of the application, drawing on the latest ergonomics and human-factors research on (1) reducing eye strain and (2) improving the efficiency of human-computer interaction. STATISTICA 10 offers a better and more efficient user interface, achieved through completely redesigned display technology and new iconography.
All ribbon bars have been updated with completely redesigned symbols (the traditional pull-down menu interface, or classic menus, is still supported for compatibility). STATISTICA Visual Basic macros can now be added to the STATISTICA Ribbon Bars.
The STATISTICA Data Miner workspace now offers larger (and visually optimized) icons. Other new features to improve this user interface have also been implemented.
The STATISTICA Ribbon Bar can now be programmatically controlled. Developers can now customize the ribbon bar through API (Application Programming Interface) calls. This is particularly useful for creating STATISTICA Add-Ins.
The STATISTICA Distribution & Simulation module and functionality introduced in version 9 have been further refined and enhanced. STATISTICA 10 makes it easier to generate simulated data from specific distributions with Design Simulation.
You can now find the distribution that best fits each variable and then use that information, together with the correlation structure of the data, to simulate a specified number of cases. Instead of waiting to accrue the required data, you can fit theoretical distributions to the observed data, simulate from those distributions, and draw conclusions from the simulation. Data can also be simulated using the correlations among variables. This functionality is extremely useful for “what-if” analyses and is becoming increasingly accepted and adopted across industries.
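The "best fit" step can be sketched as fitting a few candidate distributions by maximum likelihood and comparing an information criterion. The text does not say which candidates or which criterion Design Simulation uses; the candidate set (normal, exponential, lognormal), the use of AIC, and the function name `fit_candidates` are assumptions for illustration only.

```python
import numpy as np


def fit_candidates(x):
    """Fit candidate distributions by maximum likelihood and return their AIC values
    (lower is better). Exponential and lognormal need strictly positive data."""
    x = np.asarray(x, dtype=float)
    n = x.size
    aic = {}

    # Normal: the MLEs are the sample mean and the 1/n standard deviation.
    mu, sigma = x.mean(), x.std()
    loglik = -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum((x - mu) ** 2) / (2 * sigma**2)
    aic["normal"] = 2 * 2 - 2 * loglik          # AIC = 2k - 2 ln L, k = 2 parameters

    if np.all(x > 0):
        # Exponential: the MLE of the rate is 1 / mean.
        lam = 1.0 / x.mean()
        loglik = n * np.log(lam) - lam * x.sum()
        aic["exponential"] = 2 * 1 - 2 * loglik

        # Lognormal: fit a normal to log(x); the change of variables adds -sum(log x).
        lx = np.log(x)
        m, s = lx.mean(), lx.std()
        loglik = (-0.5 * n * np.log(2 * np.pi * s**2)
                  - np.sum((lx - m) ** 2) / (2 * s**2)
                  - lx.sum())
        aic["lognormal"] = 2 * 2 - 2 * loglik

    return aic


# Made-up data drawn from a lognormal distribution.
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=1.0, sigma=0.5, size=5_000)
aic = fit_candidates(sample)
best = min(aic, key=aic.get)
print(best)  # lognormal
```

The fitted parameters of the winning distribution would then feed the simulation step.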
For example, consider a company that builds machines with precision parts. Knowledge about these machines and parts can be used to generate data, and the simulated data are then analyzed for reliability. Below is a correlation matrix for the defect rate and sample completion times for these precision parts. This correlation is estimated from previous processes and from information about this specific process. The means and standard deviations are estimates as well, because production runs have not yet begun. Using the Design Simulation tool, data are simulated from the theoretical distribution chosen for each variable, its parameter values, and the correlation; the user has the flexibility to choose the exact distribution for each variable. The resulting data are shown in the scatterplot. The correlation between the variables (-0.45) is preserved in the simulated data, as are the specified distributions and parameters. These data can be used before production begins to learn more about the process.
Another example is the US Food and Drug Administration (FDA) Quality by Design initiative, in which multivariate simulation is used to determine the expected outcomes of pharmaceutical manufacturing processes.
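The precision-parts example can be sketched with a multivariate normal draw built from a Cholesky factor. This is one common approach and not necessarily what Design Simulation does internally; it also restricts both variables to normal marginals, whereas the tool lets you choose a distribution per variable. Only the -0.45 correlation comes from the example; the means and standard deviations below are hypothetical placeholders.

```python
import numpy as np


def simulate_correlated_normal(means, sds, corr, n_cases, seed=None):
    """Draw n_cases rows from a multivariate normal distribution with the given
    means, standard deviations, and correlation matrix."""
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    corr = np.asarray(corr, dtype=float)
    # Covariance = D R D with D = diag(sds).
    cov = corr * np.outer(sds, sds)
    # Cholesky factor L with L @ L.T == cov (raises LinAlgError if cov is not positive definite).
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_cases, len(means)))   # independent N(0, 1) draws
    return means + z @ chol.T                        # each row now has covariance cov


# Columns: defect rate (%) and completion time (minutes); the values are hypothetical.
corr = [[1.0, -0.45],
        [-0.45, 1.0]]
data = simulate_correlated_normal(means=[2.0, 30.0], sds=[0.5, 4.0],
                                  corr=corr, n_cases=100_000, seed=1)
print(np.corrcoef(data, rowvar=False)[0, 1])   # close to -0.45
print(data.mean(axis=0), data.std(axis=0))     # close to the planning values
```

For non-normal marginals, a common extension is to push each column through the normal CDF and then through the inverse CDF of the chosen distribution (a Gaussian copula), which approximately but not exactly preserves the Pearson correlation.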
A comprehensive and highly scalable implementation of the Cox Proportional Hazards Models (a powerful modeling technique for lifetime data) has been added to STATISTICA 10. Applications of this new module include:
The Cox Proportional Hazards Models module allows flexible handling of censored data, categorical predictors, and designs that include interactions and/or nested effects. It supports model-building techniques such as best-subset and stepwise regression. Survival functions can be deployed on new data with STATISTICA Rapid Deployment.
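For readers new to the method, the Cox model is usually estimated by maximizing the partial log-likelihood l(beta) = sum over observed events i of [x_i'beta - log(sum over the risk set R(t_i) of exp(x_j'beta))], where censored cases contribute only through the risk sets. The sketch below does this by Newton-Raphson for continuous covariates, using the Breslow convention for ties. It is a textbook illustration, not STATISTICA's implementation; it omits categorical coding, interactions, nesting, and subset or stepwise selection, and the function name `cox_fit` is hypothetical.

```python
import numpy as np


def cox_fit(times, events, X, max_iter=25, tol=1e-9):
    """Estimate Cox proportional hazards coefficients by Newton-Raphson on the
    partial log-likelihood (Breslow convention for tied event times).

    times  : follow-up time of each case
    events : 1 if the event was observed, 0 if the case is right-censored
    X      : (n_cases, n_predictors) matrix of covariates
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    X = np.asarray(X, dtype=float)
    beta = np.zeros(X.shape[1])
    # at_risk[i, j] is True when case j is still under observation at time times[i].
    at_risk = times[None, :] >= times[:, None]
    for _ in range(max_iter):
        w = np.exp(X @ beta)                      # relative hazard of every case
        grad = np.zeros_like(beta)
        hess = np.zeros((beta.size, beta.size))
        # Only observed events add terms; censored cases enter through the risk sets.
        for i in np.flatnonzero(events):
            r = at_risk[i]
            wr, Xr = w[r], X[r]
            s0 = wr.sum()
            xbar = (wr @ Xr) / s0                 # risk-weighted covariate mean
            grad += X[i] - xbar
            hess -= (Xr * wr[:, None]).T @ Xr / s0 - np.outer(xbar, xbar)
        step = np.linalg.solve(hess, -grad)       # Newton step: -H^{-1} g
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated check with known coefficients (all values are made up for illustration).
rng = np.random.default_rng(42)
n = 1000
X = rng.standard_normal((n, 2))
true_beta = np.array([0.7, -0.5])
event_time = rng.exponential(scale=1.0 / (0.1 * np.exp(X @ true_beta)))
censor_time = rng.exponential(scale=20.0, size=n)
times = np.minimum(event_time, censor_time)
events = (event_time <= censor_time).astype(int)
beta_hat = cox_fit(times, events, X)
print(beta_hat)   # should be near [0.7, -0.5]
```

The dense n-by-n risk-set matrix keeps the sketch short but does not scale; a production implementation would sort by time and accumulate the risk-set sums incrementally.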
Numerous smaller improvements were made to the computation of descriptive statistics, often yielding significant speed gains for large data volumes. For example, the multithreading of by-group statistics, including percentile computations, has been further improved to achieve very fast performance on very large data volumes.
In STATISTICA 10, the STATISTICA MSPC Online option makes it easier to deploy multivariate analysis (PCA, PLS) models to STATISTICA Enterprise for real-time updating, monitoring, and interactive drill-down from component scores to contribution plots and univariate charts.
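A by-group percentile computation can be parallelized along the lines sketched below: sort once by group so every group becomes one contiguous slice, then compute each group's percentiles in a pool of worker threads. This is a generic illustration, not STATISTICA's engine; the function name, the group labels, and the data are hypothetical. Whether Python threads actually speed this up depends on how much of the NumPy work runs outside the global interpreter lock, so a process pool is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def group_percentiles(values, groups, q=(25, 50, 75), max_workers=4):
    """Return {group: array of percentiles q} for each distinct group label,
    computing the groups in parallel worker threads."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    # Sort once by group label so that each group occupies one contiguous slice.
    order = np.argsort(groups, kind="stable")
    sorted_groups = groups[order]
    sorted_values = values[order]
    keys, starts = np.unique(sorted_groups, return_index=True)
    ends = np.append(starts[1:], len(sorted_groups))

    def one_group(k):
        chunk = sorted_values[starts[k]:ends[k]]
        return keys[k], np.percentile(chunk, q)   # linear interpolation (NumPy default)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one_group, range(len(keys))))


# Hypothetical data: one measurement per row, tagged with the plant that produced it.
rng = np.random.default_rng(0)
plants = rng.choice(["plant_A", "plant_B", "plant_C"], size=200_000)
measurements = rng.normal(100.0, 15.0, size=plants.size)
summary = group_percentiles(measurements, plants)
for key, (p25, p50, p75) in summary.items():
    print(key, round(p25, 2), round(p50, 2), round(p75, 2))
```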
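The drill-down path (component scores, then contribution plots, then individual variables) rests on a few standard PCA monitoring statistics: Hotelling's T^2 inside the model plane, the squared prediction error (SPE) outside it, and per-variable contributions to each. The sketch below computes them for one new observation. It is not STATISTICA's MSPC code; contribution formulas vary between references, and this uses one common definition whose contributions sum exactly to T^2 and SPE. The class name `PCAMonitor` and the data are hypothetical.

```python
import numpy as np


class PCAMonitor:
    """Minimal PCA model for multivariate process monitoring (hypothetical helper)."""

    def __init__(self, reference, n_components):
        reference = np.asarray(reference, dtype=float)
        self.mean = reference.mean(axis=0)
        self.std = reference.std(axis=0, ddof=1)
        z = (reference - self.mean) / self.std
        # Rows of vt are principal directions; s**2 / (n - 1) are the component variances.
        _, s, vt = np.linalg.svd(z, full_matrices=False)
        self.P = vt[:n_components].T                       # loadings, (n_vars, k)
        self.var = s[:n_components] ** 2 / (len(z) - 1)    # score variances

    def diagnose(self, x):
        """Scores, T^2, SPE, and per-variable contributions for one observation."""
        z = (np.asarray(x, dtype=float) - self.mean) / self.std
        t = z @ self.P                                     # component scores
        t2 = float(np.sum(t**2 / self.var))
        residual = z - t @ self.P.T                        # part not explained by the model
        spe = float(residual @ residual)
        t2_contrib = z * (self.P @ (t / self.var))         # sums to t2
        spe_contrib = residual**2                          # sums to spe
        return t, t2, spe, t2_contrib, spe_contrib


# Made-up in-control reference data: x2 tracks x1 closely, x3 is independent.
rng = np.random.default_rng(7)
x1 = rng.standard_normal(500)
reference = np.column_stack([x1,
                             x1 + 0.1 * rng.standard_normal(500),
                             rng.standard_normal(500)])
monitor = PCAMonitor(reference, n_components=2)
# A new observation that breaks the x1/x2 relationship.
t, t2, spe, t2_contrib, spe_contrib = monitor.diagnose([2.0, -2.0, 0.0])
print(t2, spe)
print(spe_contrib)   # the first two variables dominate the SPE
```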
Profit charts can now be created with STATISTICA’s Rapid Deployment of Predictive Models. The profit chart summarizes the costs and the estimated profit for the current model and can be used in a wide variety of data mining applications as one of the tools for evaluating models.
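The arithmetic behind a profit chart can be sketched as follows: rank cases by model score, act on the top k, and compute profit as revenue per correctly targeted case times the hits captured, minus a cost per targeted case. The text does not describe STATISTICA's exact cost model; the revenue and cost figures, the data, and the function name `profit_curve` are hypothetical.

```python
import numpy as np


def profit_curve(scores, actual, revenue_per_hit, cost_per_case):
    """Cumulative profit when acting on the top-k cases ranked by model score,
    for k = 1..n. Returns (fraction_of_cases_targeted, cumulative_profit)."""
    scores = np.asarray(scores, dtype=float)
    actual = np.asarray(actual, dtype=int)
    order = np.argsort(-scores, kind="stable")    # highest score first
    hits = np.cumsum(actual[order])               # positives captured in the top k
    k = np.arange(1, len(scores) + 1)
    profit = revenue_per_hit * hits - cost_per_case * k
    return k / len(scores), profit


# Hypothetical scored cases: 1 = responder, 0 = non-responder.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
actual = [1, 1, 0, 1, 0]
fraction, profit = profit_curve(scores, actual, revenue_per_hit=10.0, cost_per_case=3.0)
best = int(np.argmax(profit))
print(profit)                       # [ 7. 14. 11. 18. 15.]
print(fraction[best], profit[best]) # 0.8 18.0
```

Plotting `profit` against `fraction` gives the chart; its peak suggests how deep into the ranked list it pays to act.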
ROC curves can now be created with STATISTICA’s Rapid Deployment of Predictive Models. They are another useful tool for evaluating model quality: an ROC curve plots the “true” positive rate against the “false” positive rate across classification thresholds. ROC curves are used in many fields, such as medicine, quality control, and psychology. Side note: the ROC curve method has its roots in the early days of radar technology during World War II, when radar operators were evaluated on their ability to distinguish “true” signals (airplanes) from “false” signals (birds). ROC curves are used in data mining today for similar reasons.
In response to recent trends in text mining, where enormous data sets are being submitted for exploration and modeling, the main computational engine of STATISTICA Text Miner has been substantially redesigned and further optimized to improve its scalability and performance. The internal database handling procedures have been redesigned, and the module can now handle extremely large data sets very efficiently through extensive use of multithreading.
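The points of an ROC curve and the area under it (AUC) can be computed directly from model scores and known outcomes, as in the generic sketch below (not STATISTICA's implementation). A case is predicted positive when its score is at or above the threshold; tied scores move the curve together, and the AUC uses the trapezoidal rule. The function names and data are illustrative.

```python
import numpy as np


def roc_curve(scores, labels):
    """False positive rate and true positive rate at every distinct score threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")
    s, y = scores[order], labels[order]
    # Keep only the last position of each run of tied scores.
    last_of_run = np.r_[np.diff(s) != 0, True]
    tp = np.cumsum(y)[last_of_run]
    fp = np.cumsum(1 - y)[last_of_run]
    tpr = np.r_[0.0, tp / y.sum()]
    fpr = np.r_[0.0, fp / (len(y) - y.sum())]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


# Illustrative scores: 1 = "true" signal, 0 = "false" signal.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
labels = [1, 1, 0, 1, 0, 0]
fpr, tpr = roc_curve(scores, labels)
print(auc(fpr, tpr))   # 8/9: 8 of the 9 positive/negative pairs are ranked correctly
```

The AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which is why it is a convenient single-number summary of the curve.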
STATISTICA 10 provides two new deployment options: Java and C#. The C# option can also generate code in a form that can be incorporated directly into a SQL Server user-defined function, which can then be called from a stored procedure to score the model directly inside the database. The Java code can be used the same way within Oracle user-defined functions. Note that this capability requires additional licensing. The main advantage of this deployment method is performance: in-database deployment can run an order of magnitude faster than external processing.
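The idea of scoring inside the database through a user-defined function can be illustrated with Python's built-in `sqlite3` module, whose `Connection.create_function` registers a Python function callable from SQL. This swaps in SQLite and Python for the SQL Server (C#) and Oracle (Java) targets named above, does not reproduce STATISTICA-generated code, and does not demonstrate the performance claim. The table, columns, and logistic-regression coefficients are hypothetical.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients, as if exported from a trained model.
INTERCEPT = -3.0
COEF_INCOME = 0.00002
COEF_AGE = 0.03


def score(income, age):
    """Predicted probability from the (hypothetical) logistic regression model."""
    eta = INTERCEPT + COEF_INCOME * income + COEF_AGE * age
    return 1.0 / (1.0 + math.exp(-eta))


conn = sqlite3.connect(":memory:")
conn.create_function("score", 2, score)   # name used in SQL, number of arguments, callable
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, income REAL, age REAL)")
conn.executemany("INSERT INTO customers (income, age) VALUES (?, ?)",
                 [(40000, 25), (85000, 52), (120000, 40)])

# The model is evaluated inside the query, row by row, next to the data.
rows = conn.execute(
    "SELECT id, score(income, age) AS p FROM customers ORDER BY p DESC").fetchall()
for row_id, p in rows:
    print(row_id, round(p, 3))
```

Because the scoring function is called from SQL, it can be combined with filters, joins, and stored logic without first exporting the data to an external scoring process.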
The scorecard-builder wizard is now fully integrated into the STATISTICA solution platform and includes further improvements.
STATISTICA Scorecard is a dedicated solution for developing, evaluating, and monitoring scorecards including steps for Feature Selection, Attribute Building, Scorecard Building, Cutoff Point Selection, Reject Inference, and Population Stability.
The program can build "traditional" regression-based scorecards and lets you compare their quality with that of data mining (predictive modeling) based scorecards. Scorecard also supports various specialized analyses and graphical exploration tools for scoring new cases and evaluating model accuracy. For more details, see STATISTICA Credit Scoring.
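Two quantities commonly used in the Attribute Building and Population Stability steps are the weight of evidence (WoE) of each attribute bin, ln(%good / %bad), and the population stability index, PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the shares of the development and current populations in bin i. The sketch below shows these standard formulas; the text does not specify how STATISTICA Scorecard computes them (for example, how it handles empty bins), so treat it as an illustration with hypothetical counts.

```python
import math


def weight_of_evidence(goods, bads):
    """WoE per attribute bin: ln(share of all goods / share of all bads).
    goods[i] and bads[i] are counts in bin i; all counts must be positive."""
    total_good, total_bad = sum(goods), sum(bads)
    return [math.log((g / total_good) / (b / total_bad)) for g, b in zip(goods, bads)]


def population_stability_index(expected_counts, actual_counts):
    """PSI between the development (expected) and current (actual) populations,
    given counts per score bin; all counts must be positive."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    psi = 0.0
    for e_n, a_n in zip(expected_counts, actual_counts):
        e, a = e_n / e_total, a_n / a_total
        psi += (a - e) * math.log(a / e)
    return psi


# Hypothetical counts per bin.
print(weight_of_evidence(goods=[80, 20], bads=[20, 80]))          # [ln 4, -ln 4]
print(population_stability_index([100, 200, 300, 400], [150, 200, 250, 400]))
```

A PSI of zero means the current population is distributed across bins exactly as the development sample was; larger values indicate drift that may call for recalibrating the scorecard.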
Additional significant performance improvements have been achieved for various predictive modeling algorithms when working with very large data sets. For example, all modeling performed via Generalized Linear Models (e.g., Logistic Regression) now takes advantage of multi-core processors and can handle very large data volumes. Similar scalability and performance improvements have been achieved for the C&RT and CHAID algorithms.
Application navigation in the STATISTICA 10 Enterprise Manager application is simpler and more efficient with the new ribbon bar.
Data configurations are now available for selection from the STATISTICA System View, allowing the user to “explore” a data configuration from within the STATISTICA user interface, without needing to use Enterprise Manager.
The Database Migration tool is updated for the STATISTICA 10 Enterprise database schema, and is now available directly within STATISTICA Enterprise. It can be run by an administrator to copy configurations from one database to another database.
STATISTICA 10 makes it easier to publish macros to STATISTICA Enterprise. This provides a simpler way to create SVB Analysis Configurations and works for R scripts as well as SVB macros. To use this new option, create the macro in STATISTICA, switch to the Enterprise tab, and click Deploy Macro.
Enterprise Manager now allows more flexibility in defining the names of STATISTICA Enterprise configurations. Names now need to be unique only within the same System View folder.
Analysis Configurations that are set to auto-update will now also auto-update when run in a Web browser; the user can adjust the auto-update interval from the browser, or initiate a manual update. The implementation uses the latest web technologies to update the image on the graph without needing to reload the web page (i.e., no “flashing” of the web page).
Quality Control Charts now support interactive brushing when run in a Web browser. Causes, Actions, and Comments can now be assigned, and points included or excluded, through the web interface. The implementation uses the latest web technologies to update the image on the graph without needing to reload the web page (i.e., no “flashing” of the web page).
STATISTICA Web Data Entry enables users to define data entry screens for entering data via Web browsers and storing/managing these data in the STATISTICA Enterprise database.
STATISTICA 10 Web Data Entry includes numerous enhancements, such as:
A new and improved version of STATISTICA Live Score is released with STATISTICA 10. STATISTICA Live Score is STATISTICA Server software within the STATISTICA Data Analysis and Data Mining Platform.
Data are aggregated and cleaned and models are trained and validated using the STATISTICA Data Miner software. Once the models are validated, they are deployed to the STATISTICA Live Score Server.
STATISTICA Live Score provides multi-threaded, efficient, and platform-independent scoring of data from line-of-business applications. Some examples of the use of STATISTICA Live Score include:
Hundreds of STATISTICA Visual Basic examples have been added to the Help.