STATISTICA Extract, Transform, & Load (ETL)


The enhanced STATISTICA Extract, Transform, and Load (STATISTICA ETL) features extend STATISTICA Enterprise to support highly specialized data warehouses that integrate time-stamped parameter data for multiple process steps with quality, rework, and outcome data, providing a complete advanced process monitoring solution.

Summary: Ask Yourself...

Given your current databases and process monitoring methods, can you quickly determine how different process steps affected measured quality an hour ago, yesterday, or last week? Whether changes in trends have occurred? Whether the relationships between certain process parameters are starting to drift?

  • STATISTICA ETL is the most advanced solution available today for creating data warehouses that support comprehensive views of your data, with tools to extract actionable information that quickly yields a significant return on investment from your existing data collection equipment, tools, and IT infrastructure.

  • With STATISTICA ETL, deployed inside STATISTICA Enterprise, you can quickly:

      • Set up standard control charting and process capability computations and monitoring

      • Compute control charts and process capability across multiple processes and from diverse data sources

      • Apply advanced process monitoring techniques to your whole process, such as neural-network-based virtual sensors, advanced pattern recognition methods, sensitive change-point detection algorithms that tell you that something "is about to go wrong" before it does, or the most advanced data mining algorithms and methods available today for efficient root cause detection in complex data
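
As a rough illustration of the standard computations mentioned above, the following Python sketch computes Shewhart individuals (X) chart limits and the Cp and Cpk capability indices. It estimates short-term sigma from the average moving range (d2 = 1.128 for spans of two points), which is one common convention; the function names are hypothetical, and this is not STATISTICA's API or necessarily its default method.

```python
import statistics

D2_SPAN_2 = 1.128  # bias-correction constant d2 for moving ranges of two points


def sigma_within(values: list[float]) -> float:
    """Estimate short-term (within) sigma from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.mean(moving_ranges) / D2_SPAN_2


def individuals_limits(values: list[float]) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for a Shewhart individuals (X) chart."""
    center = statistics.mean(values)
    half_width = 3 * sigma_within(values)
    return center - half_width, center, center + half_width


def capability(values: list[float], lsl: float, usl: float) -> tuple[float, float]:
    """Return (Cp, Cpk) against lower/upper specification limits."""
    mu = statistics.mean(values)
    sigma = sigma_within(values)
    cp = (usl - lsl) / (6 * sigma)                 # potential capability (spread only)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)    # penalizes an off-center mean
    return cp, cpk
```

Cpk falls below Cp when the process mean is off-center between the specification limits; for perfectly centered data the two are equal.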

Conclusion: If you need to manage and optimize complex processes, you need STATISTICA ETL.

Overview of Functionality

STATISTICA ETL combines STATISTICA's efficient processing of data from standard databases (Microsoft SQL®, Oracle®) and specialized process databases (e.g., OSI PI®) with its powerful data processing capabilities for filtering, aggregation, and analysis.

STATISTICA ETL is the ideal solution for:

  • Building enterprise analysis platforms that will integrate process historians with quality control and advanced process monitoring systems


  • Creating specialized data warehouses that align and validate time-stamped data (e.g., batch-time data, as commonly collected in various process industries) with outcome (e.g., assay) data


  • Building data warehouses for ad-hoc and automated root cause analysis for complex manufacturing processes (e.g., chemical or pharmaceutical manufacturing, power generation, mining, etc.)


  • Creating 21 CFR Part 11-compliant data warehouses for validated reporting on complex processes


  • Any data warehousing application that requires specialized data validation, pre-processing, aggregation, standardization, or merging of unconventional data, and thus cannot be built with off-the-shelf standard database tools

Extract Data

STATISTICA Enterprise provides a secure platform for efficiently managing multiple connections to various types of databases, including process databases (e.g., via the specialized STATISTICA OSI PI Connector). STATISTICA Enterprise stores the metadata describing the queried tables, such as control limits, specification limits, and valid data ranges. See STATISTICA Enterprise for more details.
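
To illustrate how such metadata can drive validation, here is a minimal Python sketch. The `TagMetadata` record and the `classify`/`validate` helpers are hypothetical stand-ins for whatever STATISTICA Enterprise actually stores; the sketch assumes each variable has a physically valid range (readings outside it are treated as bad data, e.g., a sensor fault) plus lower and upper specification limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagMetadata:
    """Hypothetical per-variable metadata: valid range and specification limits."""
    valid_low: float
    valid_high: float
    lsl: float  # lower specification limit
    usl: float  # upper specification limit


def classify(value: float, meta: TagMetadata) -> str:
    # Outside the valid range: treat as a bad reading (e.g., sensor fault)
    if not meta.valid_low <= value <= meta.valid_high:
        return "invalid"
    # A valid reading, but outside specification
    if not meta.lsl <= value <= meta.usl:
        return "out_of_spec"
    return "ok"


def validate(readings: list[tuple[str, float]],
             metadata: dict[str, TagMetadata]) -> list[tuple[str, float, str]]:
    """Tag each (variable, value) reading with a status derived from its metadata."""
    return [(tag, value, classify(value, metadata[tag])) for tag, value in readings]
```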

Transform Data

The STATISTICA ETL module provides unique capabilities for processing and merging data, in particular process data that are difficult to manage using standard database tools.

Aggregation, alignment, and replication of time-stamped data. To monitor ongoing continuous processes, such as chemical or pharmaceutical manufacturing, power generation, or refining, critical process parameters must be recorded into a process "historian" at regular time intervals. Dedicated high-performance databases, such as OSIsoft's PI database, are typically deployed to provide efficient high-frequency data recording. However, to make such data useful for analysis, e.g., for root-cause analysis or process monitoring, the data must be aggregated and aligned, for example, with outcome data.

  • STATISTICA ETL provides simple tools to automate the process of aligning time-stamped process data with other data sources, such as process data collected at different time intervals, or only collected once per part, ID, batch, etc.
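
One common way to align high-frequency process data with records collected at coarser intervals is an "as-of" join: each outcome record receives the most recent process reading taken at or before its time stamp. The Python sketch below assumes both inputs are lists of `(timestamp, value)` pairs and that the process readings are sorted by time; `asof_align` is a hypothetical helper, not a STATISTICA function.

```python
import bisect
from datetime import datetime
from typing import Optional


def asof_align(process: list[tuple[datetime, float]],
               outcomes: list[tuple[datetime, object]]
               ) -> list[tuple[datetime, object, Optional[float]]]:
    """Attach to each outcome the last process value recorded at or before it."""
    times = [t for t, _ in process]  # assumes process readings are sorted by time
    aligned = []
    for t, outcome in outcomes:
        # Index of the last process reading with time stamp <= t (-1 if none)
        i = bisect.bisect_right(times, t) - 1
        aligned.append((t, outcome, process[i][1] if i >= 0 else None))
    return aligned
```

An outcome recorded before the first process reading gets `None`, so it can be flagged rather than matched to unrelated data.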

Automatic stacking, unstacking, and normalization of batch-time data for batch processes. The manufacture of pharmaceuticals and chemicals often involves processing batches of materials through multiple steps, where the maturation of the batch is recorded at each step. The resulting data, recorded in a laboratory information management system (LIMS), consist of time-stamped process data organized by batch ID. To make such data usable for analysis, the time stamps must be transformed into elapsed-within-process-step times, and the data must be normalized so that a comparable number of elapsed-time recordings is available for each batch.

  • STATISTICA ETL provides efficient tools for processing of batch-time data, to achieve equal batch "lengths", and for unstacking such data to make them available for subsequent analyses and process monitoring of the maturation process (see also STATISTICA MSPC for details).
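
The following Python sketch shows one simple way to normalize batch-time data: time stamps are converted to elapsed time within the step, each batch is resampled by linear interpolation to the same number of equally spaced points along its duration, and the result is unstacked into one wide row per batch. Linear interpolation over a rescaled time axis is an assumption made for illustration; the source does not say which normalization STATISTICA ETL uses, and the function names are hypothetical.

```python
def normalize_batch(records: list[tuple[float, float]], n_points: int) -> list[float]:
    """Resample one batch to n_points equally spaced fractions of its duration.

    records: (time_stamp_in_seconds, value) pairs sorted by time, at least two.
    """
    t0 = records[0][0]
    elapsed = [t - t0 for t, _ in records]  # elapsed-within-step time
    values = [v for _, v in records]
    total = elapsed[-1]
    resampled = []
    j = 0
    for k in range(n_points):
        target = total * k / (n_points - 1)
        # Advance to the segment [elapsed[j], elapsed[j + 1]] containing target
        while j < len(elapsed) - 2 and elapsed[j + 1] < target:
            j += 1
        t_a, t_b = elapsed[j], elapsed[j + 1]
        frac = 0.0 if t_b == t_a else (target - t_a) / (t_b - t_a)
        resampled.append(values[j] + frac * (values[j + 1] - values[j]))
    return resampled


def unstack(batches: dict[str, list[tuple[float, float]]], n_points: int):
    """Return a header and one wide row per batch: batch_id, t0, t1, ..."""
    header = ["batch_id"] + [f"t{k}" for k in range(n_points)]
    rows = [[batch_id] + normalize_batch(recs, n_points)
            for batch_id, recs in sorted(batches.items())]
    return header, rows
```

For example, `unstack({"A": [(0, 0.0), (10, 10.0)], "B": [(100, 1.0), (120, 3.0), (140, 5.0)]}, 3)` turns a 10-second batch and a 40-second batch into the comparable rows `["A", 0.0, 5.0, 10.0]` and `["B", 1.0, 3.0, 5.0]`.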

Aggregating data using robust statistics. Aggregating real process data (e.g., one-minute time-stamped data that must be aligned with hourly data) usually requires aggregation methods that go far beyond the capabilities of standard database tools. For example, time-stamped data may include outliers or may be very "noisy," thus hiding important trends or changes in trends.

  • STATISTICA ETL provides numerous tools and methods for aggregating and/or smoothing of data, so that meaningful subsequent process monitoring methods (e.g., for change-point or trend detection) can be applied to robust or smoothed estimates of process averages within aggregated time intervals.
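
To show why robust statistics matter for aggregation, this Python sketch buckets time-stamped readings into hourly intervals and reports the mean, the median, and a trimmed mean for each hour. The 10% trimming proportion and the helper names are illustrative assumptions, not STATISTICA settings.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def trimmed_mean(values: list[float], proportion: float = 0.1) -> float:
    """Mean after dropping the lowest and highest `proportion` of values."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)


def aggregate_hourly(readings: list[tuple[datetime, float]], proportion: float = 0.1
                     ) -> dict[datetime, tuple[float, float, float]]:
    """Return {hour_start: (mean, median, trimmed mean)} for each hour with data."""
    buckets: dict[datetime, list[float]] = defaultdict(list)
    for ts, value in readings:
        # Truncate the time stamp to the start of its hour
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {
        hour: (statistics.mean(v), statistics.median(v), trimmed_mean(v, proportion))
        for hour, v in sorted(buckets.items())
    }
```

With nine readings of 10.0 and one spike of 1000.0 in the same hour, the mean jumps to 109.0 while the median and the 10% trimmed mean both stay at 10.0.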

Aggregation and alignment of multiple varied sources. Complex processes, such as semiconductor or pharmaceutical manufacturing, require complex data storage suited to the specific nature of the process being recorded and monitored. Therefore, multiple separate databases or data sources, such as CSV files created automatically by gages, data from OSI PI, or assay data from a LIMS, commonly must be aggregated and aligned to enable meaningful root cause analysis of problems or comprehensive process monitoring.

  • STATISTICA ETL provides tools for configuring complex data alignment tasks of multiple diverse data sources into a single ETL object, which can be deployed into STATISTICA Enterprise, to be applied ad-hoc or as scheduled ETL tasks, to support a dedicated data warehouse that maintains validated and aligned data for comprehensive process monitoring and optimization.
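
A minimal sketch of merging several sources into one row per batch follows. It assumes three hypothetical inputs keyed by batch ID: gage measurements as CSV text with columns `batch_id,thickness`, aggregated historian values, and LIMS assay results. Rows missing any source are kept but flagged, so incomplete batches can be reviewed rather than silently dropped; the column names and the `merge_sources` function are illustrative only.

```python
import csv
import io


def merge_sources(gage_csv_text: str,
                  historian: dict[str, float],
                  lims: dict[str, float]) -> list[dict]:
    """Merge gage, historian, and LIMS data into one dict per batch ID."""
    gage = {row["batch_id"]: float(row["thickness"])
            for row in csv.DictReader(io.StringIO(gage_csv_text))}
    merged = []
    # Union of batch IDs, so a batch present in any single source is reported
    for batch_id in sorted(set(gage) | set(historian) | set(lims)):
        row = {
            "batch_id": batch_id,
            "thickness": gage.get(batch_id),
            "process_mean": historian.get(batch_id),
            "assay": lims.get(batch_id),
        }
        # Flag whether every source contributed a value for this batch
        row["complete"] = all(row[k] is not None
                              for k in ("thickness", "process_mean", "assay"))
        merged.append(row)
    return merged
```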

The transformation capabilities of STATISTICA ETL go far beyond those available in standard database or querying tools, and allow you to build dedicated, specialized data warehouses to optimize your processes without programming custom applications in-house. STATISTICA ETL is the one-stop solution for creating data warehouses with both simple and sophisticated automated analytic capabilities that allow you to derive the full value from the data you are collecting!

Load Data

The STATISTICA ETL solution will automate the process of validating and aligning multiple diverse data sources into data tables suitable for ad-hoc or automated analyses. When deployed inside the STATISTICA Enterprise framework, data can be written back to dedicated database tables, or to STATISTICA data tables, to provide analysts or process engineers convenient access to real-time performance data, without the need to perform tedious data preprocessing or cleaning before any actionable information can be extracted.
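
As a sketch of the load step, the Python example below writes aligned rows into a database table, using SQLite from the standard library as a stand-in for a dedicated warehouse database. The table name `aligned_batches` and its columns are hypothetical. `INSERT OR REPLACE` keyed on the batch ID makes a re-run of a scheduled task overwrite earlier rows for the same batch instead of duplicating them.

```python
import sqlite3


def load_rows(conn: sqlite3.Connection, rows: list[tuple[str, float, float]]) -> None:
    """Write (batch_id, process_mean, assay) rows into a hypothetical warehouse table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """CREATE TABLE IF NOT EXISTS aligned_batches (
                   batch_id     TEXT PRIMARY KEY,
                   process_mean REAL,
                   assay        REAL
               )"""
        )
        # Re-loading a batch replaces its earlier row instead of duplicating it
        conn.executemany(
            "INSERT OR REPLACE INTO aligned_batches (batch_id, process_mean, assay) "
            "VALUES (?, ?, ?)",
            rows,
        )
```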




©Copyright StatSoft, Inc., 1984-2008. StatSoft, StatSoft logo, and STATISTICA, are trademarks of StatSoft, Inc.