STATISTICA Extract, Transform, & Load

STATISTICA Extract, Transform, & Load (ETL) combines efficient processing of data from standard databases (Microsoft SQL Server, Oracle) and from specialized process databases (e.g., OSIsoft PI, via the optional PI Connector tool) with the powerful STATISTICA data processing capabilities for data filtering, aggregation, and analysis.

If you need to manage and optimize complex processes, you need STATISTICA Extract, Transform, and Load (ETL).

Given your current databases and process monitoring methods, can you quickly determine how different process steps affected measured quality an hour ago, yesterday, or last week?

Can you quickly determine whether changes in trends have occurred, or whether the relationships between certain process parameters are starting to drift?

STATISTICA ETL can be combined with the capabilities of STATISTICA Enterprise for a complete advanced statistical process monitoring solution. This solution can support highly specialized data warehouses that can integrate time-stamped parameter data for multiple process steps with quality, rework, and outcome data.

  • STATISTICA ETL is the most advanced solution available today for creating data warehouses to support comprehensive views of your data, with tools to extract actionable information that will quickly create a significant return on investment from your existing data collection equipment, tools, and IT infrastructure.
  • With STATISTICA ETL, deployed inside STATISTICA Enterprise, you can quickly:
    • Set up standard control charting and process capability computations and monitoring
    • Compute charts and process capability across multiple processes and from diverse data sources
    • Apply advanced process monitoring techniques to your whole process, such as neural-network-based virtual sensors, advanced pattern recognition methods, and sensitive change-point detection algorithms that will tell you that something "is about to go wrong" before it goes wrong, as well as advanced data mining algorithms and methods for efficient root cause detection in complex data
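
As a rough illustration of the control charting and process capability computations mentioned above, the following Python sketch computes three-sigma limits and the Cp and Cpk indices for a handful of measurements. It is not STATISTICA code: the function names, the data, and the specification limits are invented for illustration, and the overall sample standard deviation is used as a simple stand-in for the within-subgroup estimates normally used in practice.

```python
from statistics import mean, stdev

def control_limits(values, k=3.0):
    """Return (lower limit, center line, upper limit) at k standard deviations.
    Assumption: sigma is the overall sample standard deviation."""
    center = mean(values)
    sigma = stdev(values)
    return center - k * sigma, center, center + k * sigma

def capability(values, lsl, usl):
    """Return (Cp, Cpk) for the given lower and upper specification limits."""
    mu, sigma = mean(values), stdev(values)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk

# Hypothetical measurements and specification limits
measurements = [9.0, 11.0, 9.0, 11.0, 10.0]
lcl, center, ucl = control_limits(measurements)
cp, cpk = capability(measurements, lsl=4.0, usl=13.0)

print(f"LCL={lcl:.2f} center={center:.2f} UCL={ucl:.2f}")
print(f"Cp={cp:.2f} Cpk={cpk:.2f}")
```

Cpk is lower than Cp here because the process mean sits closer to the upper specification limit than to the lower one.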

STATISTICA ETL - An Ideal Solution For

  • Building enterprise analysis platforms that will integrate process historians with quality control and advanced process monitoring systems
  • Creating specialized data warehouses that will align and validate time-stamped data (e.g., batch-time data, as commonly collected in various process industries) with outcome data (e.g., assay data)
  • Building data warehouses for ad-hoc and automated root cause analysis for complex manufacturing processes (e.g., chemical or pharmaceutical manufacturing, power generation, mining, etc.)
  • Creating 21 CFR Part 11 compliant data warehouses for validated reporting on complex processes
  • Any data warehousing application that requires specialized data validation, pre-processing, aggregation, standardization, or merging of unconventional data, and thus cannot be built with off-the-shelf standard database tools

Additional STATISTICA ETL Information

  • Extract Data
  • Transform Data
    • Aggregate Time Stamped Data
    • Aggregate Batch Time Data
    • Aggregate with Robust Statistics
    • Aggregate from Multiple Data Sources
  • Load Data

Extract, Transform, and Load

STATISTICA ETL provides capabilities to extract, transform, and load data.

Extract Data

STATISTICA Enterprise provides a secure platform for efficiently managing multiple database connections to various types of databases, including process databases (e.g., via the specialized STATISTICA PI Connector). STATISTICA Enterprise stores the metadata describing the tables that are queried, such as control limits, specification limits, and valid data ranges. See STATISTICA Enterprise for more details.
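
To make the role of such metadata concrete, here is a minimal Python sketch of how valid data ranges and specification limits might be used to screen incoming values. The `VariableMetadata` record, its field names, and the `classify` helper are hypothetical and do not reflect how STATISTICA Enterprise actually stores or applies its metadata.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical metadata record for one queried column."""
    name: str
    valid_min: float   # values outside [valid_min, valid_max] are treated as invalid
    valid_max: float
    lower_spec: float  # specification limits for the measured characteristic
    upper_spec: float

def classify(value, meta):
    """Label a reading as 'invalid', 'out_of_spec', or 'ok' using its metadata."""
    if not (meta.valid_min <= value <= meta.valid_max):
        return "invalid"
    if not (meta.lower_spec <= value <= meta.upper_spec):
        return "out_of_spec"
    return "ok"

# Hypothetical metadata for a pH measurement
ph = VariableMetadata("pH", valid_min=0.0, valid_max=14.0, lower_spec=6.5, upper_spec=7.5)
labels = [classify(v, ph) for v in (7.0, 8.2, 42.0)]
print(labels)
```

Keeping the limits in one metadata record, rather than in each query or report, means every downstream analysis screens the data the same way.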

Transform Data

The STATISTICA ETL module provides unique capabilities for processing and merging data, in particular process data that are difficult to manage using standard database tools.

Aggregation, alignment, and replication of time-stamped data

To monitor ongoing continuous processes, such as chemical or pharmaceutical manufacturing, power generation, refining, and so on, critical process parameters must be recorded into a process "historian" at regular time intervals. Dedicated high-performance databases, such as OSIsoft's PI database, are typically deployed to provide efficient high-frequency data recording. However, before such data can be used for analyses, e.g., root-cause analyses or process monitoring, they must be aggregated and aligned, for example, with outcome data.

  • STATISTICA ETL provides simple tools to automate the process of aligning time-stamped process data with other data sources, such as process data collected at different time intervals, or only collected once per part, ID, batch, etc.
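
The idea of aligning high-frequency process data with less frequent outcome data can be sketched in a few lines of Python. This is only an illustration under simple assumptions (one-minute readings, hourly outcomes keyed by the start of the hour); the function names and data are hypothetical and are not part of STATISTICA ETL.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def hour_bucket(ts):
    """Truncate a timestamp to the start of its clock hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def aggregate_hourly(readings):
    """Average (timestamp, value) readings within each clock hour."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[hour_bucket(ts)].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}

def align(readings, outcomes):
    """Pair each hourly outcome with the mean of the readings from that hour.
    Hours without any readings are skipped."""
    hourly = aggregate_hourly(readings)
    return [
        (hour, hourly[hour], outcome)
        for hour, outcome in sorted(outcomes.items())
        if hour in hourly
    ]

# Hypothetical data: two hours of one-minute temperature readings
start = datetime(2013, 5, 20, 8, 0)
readings = [(start + timedelta(minutes=i), 100.0 + (i // 60)) for i in range(120)]
# Hypothetical hourly quality measurements, keyed by the start of the hour
outcomes = {start: 0.91, start + timedelta(hours=1): 0.87}

aligned = align(readings, outcomes)
for hour, mean_temp, quality in aligned:
    print(hour.isoformat(), mean_temp, quality)
```

Each output row carries one aggregated process value and one outcome value for the same hour, which is the shape needed for root-cause or monitoring analyses.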

Automatic stacking and unstacking, and normalizing of batch-time data, for batch processes

The manufacture of pharmaceuticals and chemicals often involves processing batches of materials through multiple steps, where some maturation of the batch is recorded in each step. The resulting data, recorded into a laboratory information management system (LIMS), consist of time-stamped process data organized by batch ID. To make such data available for analysis, the time stamps must be transformed into times elapsed within each process step, and the data normalized so that a comparable number of elapsed-time recordings is available for each batch.

  • STATISTICA ETL provides efficient tools for processing batch-time data to achieve equal batch "lengths," and for unstacking such data to make them available for subsequent analyses and process monitoring of the maturation process (see also STATISTICA MSPC for details).
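
A minimal Python sketch of the batch-time transformation described above follows: time stamps are converted to elapsed seconds within each batch, each batch is resampled to the same number of points, and the result is unstacked into one row per batch. Linear interpolation is an assumption chosen for simplicity, and all names and data are hypothetical rather than taken from STATISTICA ETL.

```python
from datetime import datetime

def to_elapsed(records):
    """Group (batch_id, timestamp, value) records by batch and convert
    timestamps to seconds elapsed since the first record of each batch."""
    batches = {}
    for batch_id, ts, value in sorted(records, key=lambda r: (r[0], r[1])):
        batches.setdefault(batch_id, []).append((ts, value))
    return {
        batch_id: [((ts - rows[0][0]).total_seconds(), v) for ts, v in rows]
        for batch_id, rows in batches.items()
    }

def resample(points, n):
    """Linearly interpolate (elapsed, value) points onto n evenly spaced
    positions between the first and last elapsed time.
    Assumes n >= 2 and at least two points with distinct times."""
    t0, t1 = points[0][0], points[-1][0]
    out = []
    j = 0
    for k in range(n):
        t = t0 + (t1 - t0) * k / (n - 1)
        # advance to the segment [points[j], points[j + 1]] that contains t
        while j < len(points) - 2 and points[j + 1][0] < t:
            j += 1
        (ta, va), (tb, vb) = points[j], points[j + 1]
        out.append(va + (vb - va) * (t - ta) / (tb - ta))
    return out

# Hypothetical stacked data: batch A sampled every 10 minutes,
# batch B sampled every 5 minutes and started at a different time
records = [
    ("A", datetime(2013, 5, 20, 8, 0), 1.0),
    ("A", datetime(2013, 5, 20, 8, 10), 2.0),
    ("A", datetime(2013, 5, 20, 8, 20), 3.0),
    ("B", datetime(2013, 5, 20, 14, 30), 2.0),
    ("B", datetime(2013, 5, 20, 14, 35), 4.0),
    ("B", datetime(2013, 5, 20, 14, 40), 6.0),
    ("B", datetime(2013, 5, 20, 14, 45), 8.0),
    ("B", datetime(2013, 5, 20, 14, 50), 10.0),
]

# Unstacked result: one row per batch, each with the same number of columns
unstacked = {b: resample(p, 3) for b, p in to_elapsed(records).items()}
print(unstacked)
```

After this step, batches that were recorded at different rates and different clock times can be compared column by column.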

Aggregating data using robust statistics

The aggregation of real process data (e.g., time-stamped one-minute-interval data to align with hourly data) usually requires the application of aggregation methods that go far beyond the capabilities of standard database tools. For example, time-stamped data may include outliers, or may be very "noisy," thus hiding important trends, or changes in trends.

  • STATISTICA ETL provides numerous tools and methods for aggregating and/or smoothing data, so that subsequent process monitoring methods (e.g., for change-point or trend detection) can be applied to robust or smoothed estimates of process averages within aggregated time intervals.
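
The effect of robust aggregation is easy to see in a small Python example. The sketch below compares the ordinary mean with the median and a trimmed mean for one hour of one-minute readings that contain a single spike. The data and the `trimmed_mean` helper are hypothetical; these are generic robust statistics, not necessarily the methods STATISTICA ETL applies.

```python
from statistics import mean, median

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return mean(kept)

# Hypothetical one-minute readings for a single hour: a steady signal at 50
# with one spurious spike, as a faulty sensor might produce
readings = [50.0] * 59 + [5000.0]

plain = mean(readings)
robust_median = median(readings)
robust_trimmed = trimmed_mean(readings, proportion=0.1)

print(plain, robust_median, robust_trimmed)
```

A single outlier more than doubles the plain hourly average, while the median and the trimmed mean still report the underlying level of 50.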

Aggregation and alignment of multiple varied sources

Complex processes, such as semiconductor or pharmaceutical manufacturing, require complex data storage suited to the specific nature of the process that is to be recorded and monitored. It is therefore common that multiple separate databases or data sources, such as CSV files created automatically from gauges, data from OSIsoft PI, assay data from a LIMS, etc., must be aggregated and aligned to enable meaningful root cause analyses of problems or comprehensive process monitoring.

  • STATISTICA ETL provides tools for configuring complex data alignment tasks of multiple diverse data sources into a single ETL object, which can be deployed into STATISTICA Enterprise, to be applied ad-hoc or as scheduled ETL tasks, to support a dedicated data warehouse that maintains validated and aligned data for comprehensive process monitoring and optimization.
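
As a simplified picture of aligning several sources, the Python sketch below joins a gauge CSV export, per-batch historian aggregates, and LIMS assay results on a shared batch ID, and reports batches that are missing from any source. The column names, batch IDs, and values are invented for illustration, and an inner join on batch ID is only one of many possible alignment rules.

```python
import csv
import io

# Hypothetical gauge export, as an automatically created CSV file might look
GAUGE_CSV = """batch_id,thickness_mm
B001,1.02
B002,0.98
B003,1.05
"""

# Hypothetical per-batch aggregates from a process historian
historian = {"B001": {"mean_temp_c": 181.5}, "B002": {"mean_temp_c": 179.0}}

# Hypothetical assay results from a LIMS
lims = {
    "B001": {"assay_pct": 99.1},
    "B002": {"assay_pct": 98.4},
    "B003": {"assay_pct": 97.9},
}

def load_gauge(text):
    """Parse the gauge CSV into {batch_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["batch_id"]: {"thickness_mm": float(row["thickness_mm"])}
        for row in reader
    }

def align_sources(*sources):
    """Inner-join any number of {batch_id: fields} mappings on batch_id.
    Returns the aligned rows and the batch IDs rejected as incomplete."""
    all_ids = set().union(*sources)
    common = set.intersection(*(set(s) for s in sources))
    rows = []
    for batch_id in sorted(common):
        row = {"batch_id": batch_id}
        for source in sources:
            row.update(source[batch_id])
        rows.append(row)
    return rows, sorted(all_ids - common)

rows, incomplete = align_sources(load_gauge(GAUGE_CSV), historian, lims)
print(rows)
print("incomplete batches:", incomplete)
```

Reporting the incomplete batches, rather than silently dropping them, keeps the alignment step auditable.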

The transformation capabilities of STATISTICA ETL go far beyond those available in standard database or querying tools, and allow you to build dedicated, specialized data warehouses to optimize your processes without the need to program custom applications in-house. STATISTICA ETL is a one-stop solution for creating data warehouses with automated analytic capabilities, from simple to sophisticated, that allow you to derive the full value from the data you are collecting.

Load Data

The STATISTICA ETL solution will automate the process of validating and aligning multiple diverse data sources into data tables suitable for ad-hoc or automated analyses. When deployed inside the STATISTICA Enterprise framework, data can be written back to dedicated database tables, or to STATISTICA data tables, to provide analysts or process engineers convenient access to real-time performance data, without the need to perform tedious data preprocessing or cleaning before any actionable information can be extracted.
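
Writing aligned data back to a dedicated database table can be sketched with Python's built-in `sqlite3` module. The table name, columns, and rows below are hypothetical, and an in-memory SQLite database stands in for whatever warehouse database would be used in a real deployment.

```python
import sqlite3

# Hypothetical rows that have already been validated and aligned
aligned_rows = [
    ("B001", 181.5, 99.1),
    ("B002", 179.0, 98.4),
]

conn = sqlite3.connect(":memory:")  # stand-in for a dedicated warehouse database
conn.execute(
    """CREATE TABLE IF NOT EXISTS batch_summary (
           batch_id    TEXT PRIMARY KEY,
           mean_temp_c REAL NOT NULL,
           assay_pct   REAL NOT NULL
       )"""
)
# INSERT OR REPLACE makes the load idempotent: rerunning a scheduled task
# updates existing batches instead of duplicating them
conn.executemany(
    "INSERT OR REPLACE INTO batch_summary VALUES (?, ?, ?)", aligned_rows
)
conn.commit()

loaded = conn.execute(
    "SELECT batch_id, mean_temp_c, assay_pct FROM batch_summary ORDER BY batch_id"
).fetchall()
print(loaded)
```

Keying the table on the batch ID lets the same load run ad hoc or on a schedule without creating duplicate rows.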

STATISTICA Extract, Transform, & Load is compatible with Windows XP, Windows Vista, and Windows 7.

Minimum System Requirements

  • Operating System: Windows XP or above
  • RAM: 512 MB
  • Processor Speed: 500 MHz

Recommended System Requirements

  • Operating System: Windows 7
  • RAM: 1 GB
  • Processor Speed: 2.0 GHz, 64-bit, dual core

Native 64-bit versions and highly optimized multiprocessor versions are available.

 


Copyright © 2013 by StatSoft Inc.