# *STATISTICA Multivariate Statistical Process Control*

*STATISTICA Multivariate Statistical Process Control* (*MSPC*) is a complete solution for multivariate statistical process control, deployed within a scalable, secure analytics software platform.

Modern automated production processes typically measure large numbers of variables. Examples of products include biochemicals, cement, fertilizers, food, paint, perfume, pharmaceuticals, petroleum products, polymers, pulp, and semiconductors.

Common goals for the production process are to reduce product variability and increase quality. Finding problems sooner rather than later has the potential to save money.

However, standard quality control charting techniques (e.g., Shewhart charts such as X-bar and R charts) are applicable only to single variables. When they are applied to modern production processes with hundreds of important variables that need to be monitored, the criteria typically applied to univariate charts lead to a large number of false alarms, and in many cases to nearly constant, perpetual alarms.
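The false-alarm problem can be illustrated with a little arithmetic. The sketch below assumes that each chart uses 3-sigma limits (a per-sample false-alarm rate of about 0.0027 for normally distributed data) and that the charts are statistically independent. Neither assumption comes from the text above, and real process variables are usually correlated, so the numbers are indicative only. The function name `prob_any_false_alarm` is made up for this illustration.

```python
def prob_any_false_alarm(alpha: float, n_charts: int) -> float:
    """Probability that at least one of n_charts independent charts
    signals on a given sample while the process is actually in control."""
    return 1.0 - (1.0 - alpha) ** n_charts


# alpha = 0.0027 is the nominal false-alarm rate of 3-sigma limits
# under a normal distribution (an assumption of this sketch).
for n_charts in (1, 10, 100, 500):
    p = prob_any_false_alarm(0.0027, n_charts)
    print(f"{n_charts:4d} charts: P(at least one false alarm) = {p:.3f}")
```

Under these assumptions, monitoring 100 variables separately gives roughly a 24% chance of at least one false alarm at every sampling instant.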

To rectify these shortcomings, methods were developed to monitor multiple variables simultaneously, using multivariate statistical procedures.

*STATISTICA*'s *MSPC* capabilities enable you to:

- Apply univariate and multivariate statistical methods for quality control, predictive modeling, and data reduction to complex manufacturing processes
- Determine the most critical process, raw materials, and environment factors and their optimal settings for delivering products of the highest quality
- Monitor the process characteristics interactively or automatically during production stages, rather than waiting for final testing
- Build, evaluate, and deploy predictive models based on the known outcomes from historical data

*MSPC* Features

- Offline Analyses vs Online Analyses
- *MSPC* Deployment (Optional)
- Principal Components Analysis (PCA)
- Partial Least Squares (PLS)
- Batch-Wise Multi-Way Partial Least Squares (BMPLS)
- Time-Wise Multi-Way Principal Component Analysis (TMPCA) and Time-Wise Multi-Way Partial Least Squares (TMPLS)

## Process Analytical Technology

The goal of Process Analytical Technology (PAT) is to understand and control the manufacturing process, which is consistent with our current drug quality system: *quality cannot be tested into products; it should be built-in or should be by design.*

*STATISTICA MultiStream* for Pharmaceutical Industries is the solution package for PAT applications. Validation packages (IQ/OQ/PQ) and Validation Services are available for purchase.

*STATISTICA* can support different modes for employing *MSPC* techniques.

## Offline Analyses

- Historical analysis, data exploration, data visualization, predictive model building and evaluation, model deployment to monitoring server

## Online Analyses

- Interactive Monitoring with Dashboard summary displays and automatically updating results
- Automated Monitoring with rules, alarm events, and configurable actions

## *MSPC* Deployment

Deployment is an option that enables you to apply existing models created in *STATISTICA MSPC* to new data in order to make further predictions. You can save models in C/C++, Visual Basic, and PMML formats. However, *MSPC* accepts only PMML for deployment.

## Principal Components Analysis (PCA)

The aim of Principal Components Analysis (PCA) is to reduce the dimensionality of a set of variables while trying to preserve as much information contained in the data as possible.

Equally important applications of PCA include data diagnostics, on both the observation and the variable level. The observation level helps us to detect outliers, while the variable level provides us with insight into how the variables contribute to the observations and relate (correlate) to one another.
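As a rough illustration of observation-level diagnostics, the sketch below computes two statistics commonly used with PCA models: Hotelling's T² (distance within the model plane) and the squared prediction error, SPE (distance from the model plane). The text above does not say which statistics *STATISTICA* reports, so treat the choice, the function name `pca_diagnostics`, and the synthetic data as assumptions. In practice the model is usually fitted on in-control reference data. Here it is fitted on all rows for brevity.

```python
import numpy as np


def pca_diagnostics(X, n_components):
    """Return Hotelling T^2 and SPE for every row of X (illustrative only)."""
    X = np.asarray(X, dtype=float)
    # Autoscale: zero mean and unit variance per variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                     # loadings (variables x components)
    T = Z @ P                                   # scores (observations x components)
    score_var = s[:n_components] ** 2 / (Z.shape[0] - 1)
    t2 = np.sum(T ** 2 / score_var, axis=1)     # distance inside the model
    spe = np.sum((Z - T @ P.T) ** 2, axis=1)    # distance to the model
    return t2, spe


# Synthetic data: three variables driven by one common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(50, 1))
X = factor @ np.ones((1, 3)) + 0.05 * rng.normal(size=(50, 3))
X[10] = [3.0, -3.0, 3.0]                        # row that breaks the correlation pattern

t2, spe = pca_diagnostics(X, n_components=1)
print("row with largest SPE:", int(np.argmax(spe)))
```

The altered row is not extreme in any single variable, but it violates the correlation structure, which is what the SPE picks up.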

These diagnostic features of STATISTICA PCA are particularly useful for process monitoring and quality control, as they provide us with effective and convenient analytic and graphic tools for detecting abnormalities that may arise during the development phase of a product. PCA data diagnostics also play an important role in batch processing, where the quality of the end product can only be ensured through constant monitoring during its production phase.

The Nonlinear Iterative Partial Least Squares (NIPALS) algorithm can be used within PCA.
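For readers unfamiliar with NIPALS, the following is a minimal textbook-style sketch of how it extracts principal components one at a time. It is not *STATISTICA*'s implementation. The function name `nipals_pca`, the convergence tolerance, and the starting vector are choices made for this illustration.

```python
import numpy as np


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Extract principal components one at a time with NIPALS."""
    E = np.asarray(X, dtype=float)
    E = E - E.mean(axis=0)                  # residual matrix, starts as centered data
    n_rows, n_cols = E.shape
    T = np.zeros((n_rows, n_components))    # scores
    P = np.zeros((n_cols, n_components))    # loadings
    for a in range(n_components):
        # Start from the residual column with the largest variance.
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)           # regress columns of E on t
            p /= np.linalg.norm(p)          # normalize the loading vector
            t_new = E @ p                   # regress rows of E on p
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        E = E - np.outer(t, p)              # deflate: remove the extracted component
    return T, P


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
T, P = nipals_pca(X, n_components=2)
print(np.round(P, 3))
```

Because components are extracted sequentially, only as many as are needed have to be computed, which is convenient when the number of variables is large.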

## Partial Least Squares (PLS)

*Partial Least Squares (PLS)* (also known as Projection to Latent Structures) is a popular method for modeling industrial applications. It was developed by Wold in the 1960s as an econometric technique, but its usefulness was soon recognized in many areas of science and engineering, including *Multivariate Statistical Process Control (MSPC)* in general and chemical engineering in particular.

In many ways, *PLS* can be regarded as a substitute for the method of multiple regression, especially when the number of predictor variables is large. In such cases, there is seldom enough data to construct a reliable regression model that can be used for predicting the **dependent data Y** from the **predictor variables X**. Instead, we get a model that can perfectly fit the training data while performing poorly on unseen samples. This problem is known as over-fitting (Bishop 1995).

*PLS* alleviates this problem by adopting the "often correct" assumption that, although there might be a large number of predictor variables, the data may actually be much simpler and can be modeled with the aid of just a handful of components (also known as latent variables).

The Nonlinear Iterative Partial Least Squares (NIPALS) algorithm can be used within PLS.
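To make the idea of a few latent components concrete, here is a minimal sketch of NIPALS-style PLS for a single response variable (often called PLS1). It is a textbook formulation, not *STATISTICA*'s implementation. The function name `pls1_nipals` and the synthetic data (40 correlated predictors, 30 samples, 2 hidden factors) are assumptions made for this illustration.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit PLS with one response; return regression coefficients and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # centered residuals of X and y
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))    # weights
    P = np.zeros((n_vars, n_components))    # X loadings
    q = np.zeros(n_components)              # y loadings
    for a in range(n_components):
        w = E.T @ f                         # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = E @ w                           # latent score vector
        tt = t @ t
        p = E.T @ t / tt
        q[a] = f @ t / tt
        E = E - np.outer(t, p)              # deflate X
        f = f - q[a] * t                    # deflate y
        W[:, a], P[:, a] = w, p
    coef = W @ np.linalg.solve(P.T @ W, q)  # coefficients in the original X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# More predictors (40) than samples (30), but only two underlying factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
X = latent @ rng.normal(size=(2, 40)) + 0.01 * rng.normal(size=(30, 40))
y = latent @ np.array([1.5, -2.0]) + 0.01 * rng.normal(size=30)

coef, intercept = pls1_nipals(X, y, n_components=2)
y_hat = X @ coef + intercept
print("correlation between fitted and observed y:", np.corrcoef(y_hat, y)[0, 1])
```

Ordinary least squares has no unique solution for this data set, since there are more predictors than samples. Two PLS components are enough here because the data were generated from two factors.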

## Batch-Wise Multi-Way Partial Least Squares (BMPLS)

Although the *PLS* method is extremely useful for tackling *MSPC* problems, it is strictly applicable to 2-dimensional data. Thus, 3-dimensional batch data must be transformed into a 2-dimensional matrix before it can be analyzed.

This can be achieved using the method of unfolding batch-wise. The 3-dimensional matrix is unfolded in the direction of the batches.
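A small sketch of batch-wise unfolding, assuming the batch data are stored as an array of shape (batches, time points, variables) and that every batch has the same number of time points. The array layout and the function name `batch_wise_unfold` are assumptions made for this illustration. After unfolding, each row holds one complete batch, so an ordinary 2-dimensional method such as PLS can be applied.

```python
import numpy as np


def batch_wise_unfold(data):
    """Unfold (batches, times, variables) into (batches, times * variables)."""
    n_batches, n_times, n_vars = data.shape
    # Row b contains all variables at time 0, then all variables at time 1, ...
    return data.reshape(n_batches, n_times * n_vars)


n_batches, n_times, n_vars = 4, 5, 3
data = np.arange(n_batches * n_times * n_vars, dtype=float).reshape(
    n_batches, n_times, n_vars
)
unfolded = batch_wise_unfold(data)
print(data.shape, "->", unfolded.shape)
```

With this layout, the value of variable `j` at time `k` in batch `b` ends up in row `b`, column `k * n_vars + j`.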

## Time-Wise Multi-Way Principal Component Analysis (TMPCA) and Time-Wise Multi-Way Partial Least Squares (TMPLS)

Batch processes are by nature time based. Not only do the trajectories of the batch variables vary in time, but so do the correlations among them. Therefore, any monitoring system should implicitly include this dynamic time dependency.

As a result, *PCA* and *PLS* models built on time-wise unfolding are particularly sensitive, not only to the quality of a batch as a whole but also to the time-dependent conditions under which the batch evolved.
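The text does not spell out the layout produced by time-wise unfolding. The sketch below assumes it means stacking the time points of all batches as rows, with the process variables as columns, so that each row is one observation of one batch at one point in time. This reading, the (batches, time points, variables) array layout, and the function name `time_wise_unfold` are assumptions made for this illustration.

```python
import numpy as np


def time_wise_unfold(data):
    """Unfold (batches, times, variables) into (batches * times, variables)."""
    n_batches, n_times, n_vars = data.shape
    # Rows are ordered batch by batch: all time points of batch 0, then batch 1, ...
    return data.reshape(n_batches * n_times, n_vars)


n_batches, n_times, n_vars = 4, 5, 3
data = np.arange(n_batches * n_times * n_vars, dtype=float).reshape(
    n_batches, n_times, n_vars
)
unfolded = time_wise_unfold(data)
print(data.shape, "->", unfolded.shape)
```

Under this assumption, a model built on the unfolded matrix can score a running batch one time point at a time, without waiting for the batch to finish.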