This is a continuation of Predictive Analytics - Solve a Critical Quality Problem. A biopharmaceutical manufacturing company was scrapping about 30% of its batches, which is very expensive. The company's engineers tried to solve the problem with various techniques.
But it was not until they started using predictive analytics (also known as data mining) that they uncovered actionable process improvements. These improvements are predicted to lower the scrap rate from around 30% to 5%.
How were these improvements discovered?
The Data Mining Approach for Root Cause Analysis: Data mining is a broad term used in a variety of ways, often interchangeably with terms such as "predictive modeling" or "advanced analytics."
Here, it means the application of the latest data-driven analytics to build models of a phenomenon, such as a manufacturing process, based on historical data. In a nutshell, advances in computing hardware over the last 10-15 years have brought a great leap forward in the flexibility and ease of building models and in the amount of data that can be used efficiently.
Data mining has changed the world of analytics
... in a good way.
Companies that embrace these changes and learn to apply them will benefit.
Data mining begins with the definition and aggregation of the relevant data. In this case, it was the last 12 months of all the data from the manufacturing process, including:
- raw materials characteristics
- process parameters across the unit operation for each batch
- product quality outcomes on the critical-to-quality responses on which they based their judgment about whether to release the batch or scrap it
Once the relevant data were gathered, StatSoft consultants sat down with the engineering team before we began the model building process. This is a critical step and one that you should consider as you adopt data mining.
We asked the engineers questions such as:
- Which factors can you control, and which ones can you not control?
- Which factors are easy to control, and which ones are difficult or expensive to control?
The rationale is that data mining is not an academic exercise when applied to manufacturing. It is being done to improve the process, and that requires action as the end result. A model that is accurate but based solely on parameters that are impossible or expensive to tweak is impractical (which is a nice way of saying useless).
With this information in hand, the next step in the data mining process is model building. In short, many types of data mining models are fitted to the data to determine which one gives the best goodness of fit, for example the smallest residuals between predicted and actual values.
Various methods are employed to ensure that the best models are selected. For example, a random hold-out sample of the historical data is set aside, and each model is judged on how well it predicts those batches. This helps protect against overfitting, where a model gets so good at predicting one set of historical data that it is really bad at predicting the outcomes for other batches.
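To make the hold-out idea concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the data are synthetic (one made-up process parameter and one made-up quality response), and the three candidate "models" are deliberately simple stand-ins, not the model types used in the actual project. The point is only to show why candidates are compared on batches they have not seen.

```python
import random

def train_test_split(rows, holdout_fraction=0.3, seed=0):
    """Randomly split rows into a training set and a hold-out set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def fit_mean(train):
    """Baseline: always predict the average training outcome."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Ordinary least squares on a single predictor."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(train):
    """1-nearest-neighbour: memorises the training batches exactly."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def mse(model, rows):
    """Mean squared residual between predicted and actual values."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

# Synthetic batches (assumption): a linear relationship plus random noise.
rng = random.Random(42)
batches = []
for _ in range(400):
    x = rng.uniform(0, 10)
    batches.append((x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)))

train, holdout = train_test_split(batches)
candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}

scores = {}
for name, fit in candidates.items():
    model = fit(train)
    scores[name] = {"train": mse(model, train), "holdout": mse(model, holdout)}

# Select the candidate with the smallest error on the unseen batches.
best = min(scores, key=lambda name: scores[name]["holdout"])

for name, s in scores.items():
    print(f"{name:8s} train MSE={s['train']:.3f}  hold-out MSE={s['holdout']:.3f}")
print("selected:", best)
```

The nearest-neighbour candidate reproduces its own training batches perfectly, so on training error alone it would look like the winner. Its hold-out error tells a different story, which is exactly the trap the hold-out sample is meant to catch.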
A major advantage of data mining is that you don't need to make assumptions ahead of time about the nature of the data and the nature of the relationships between the predictors and the responses. Traditional least squares linear modeling, such as what is taught in Six Sigma classes on the analytic tools, does require such assumptions.
For Root Cause Analysis, most data mining techniques provide importance plots or similar ways to see very quickly which raw materials and process parameters are the major predictors of the outcomes, and, just as valuable, which factors don't matter.
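One common way to produce such an importance ranking is permutation importance: shuffle one predictor at a time in the hold-out data and measure how much the prediction error grows. The sketch below is an illustration under stated assumptions, not the method used in the project. The feature names (temperature, purity, stir_speed), the synthetic data, and the small k-nearest-neighbour model are all hypothetical.

```python
import random

# Hypothetical predictor names, chosen only for illustration.
FEATURES = ["temperature", "purity", "stir_speed"]

def knn_predict(train_X, train_y, x, k=5):
    """Average the outcomes of the k training batches closest to x."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return sum(train_y[i] for i in order[:k]) / k

def mse(train_X, train_y, X, y):
    """Mean squared error of the k-NN predictions on the batches in X."""
    errors = [(knn_predict(train_X, train_y, x) - t) ** 2 for x, t in zip(X, y)]
    return sum(errors) / len(errors)

def permutation_importance(train_X, train_y, X, y, repeats=3, seed=0):
    """Average increase in hold-out error when one predictor is shuffled."""
    rng = random.Random(seed)
    baseline = mse(train_X, train_y, X, y)
    importance = {}
    for j, name in enumerate(FEATURES):
        increases = []
        for _ in range(repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(train_X, train_y, shuffled, y) - baseline)
        importance[name] = sum(increases) / repeats
    return importance

# Synthetic batches (assumption): temperature matters most, purity matters
# somewhat, and stir_speed has no effect on the outcome at all.
rng = random.Random(1)
X = [[rng.random() for _ in FEATURES] for _ in range(300)]
y = [3.0 * t + 1.0 * p + rng.gauss(0, 0.1) for t, p, _ in X]

train_X, train_y = X[:200], y[:200]
hold_X, hold_y = X[200:], y[200:]

importance = permutation_importance(train_X, train_y, hold_X, hold_y)
ranking = sorted(importance, key=importance.get, reverse=True)

for name in ranking:
    print(f"{name:12s} importance={importance[name]:.3f}")
```

A predictor whose shuffling barely changes the error is one of the factors that "don't matter", which is often as useful to the engineers as knowing the top of the list.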
At this point in the data mining process, StatSoft consultants sat down with the engineering team to review the most important parameters. Typically, there is an active discussion with comments from the engineers such as:
- that can't be
- I don't see how that parameter would be relevant
The conversation gradually transforms over the course of an hour to:
- I could see how those parameters could interact with the ones later in the process to impact the product quality
Data mining methods are really good at modeling large amounts of data from lots of parameters, a typical situation in manufacturing. Humans are good at thinking about a few factors at a time and interpreting a limited time window of data.
As shown above, the two approaches complement each other, with the results from data mining as important insights about the manufacturing process that can then be evaluated, validated, and utilized by the engineering team to determine:
- Now, what do we do to improve the process? What are the priorities?
The company then planned to implement process improvements that are predicted to lower the scrap rate of batches from ~30% to ~5%!
Note: To get from root cause analysis to process improvements, the models were used for optimization (another data mining technique).
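As a rough illustration of what "using the models for optimization" can look like, the sketch below searches a grid of settings for the controllable factors and keeps the combination with the lowest predicted scrap rate, while holding an uncontrollable factor fixed. It is a sketch under stated assumptions only: predicted_scrap_rate is an invented stand-in for a fitted model, and the factor names, ranges, and coefficients are hypothetical, not taken from the project.

```python
from itertools import product

def predicted_scrap_rate(temperature, ph, raw_material_lot_purity):
    """Stand-in for a fitted model; the formula is invented for illustration."""
    rate = 0.05 + 0.02 * (temperature - 37.0) ** 2 + 0.5 * (ph - 7.0) ** 2
    rate += 0.3 * (1.0 - raw_material_lot_purity)
    return min(max(rate, 0.0), 1.0)  # keep the prediction in [0, 1]

# Factors the engineers said they can adjust (hypothetical names and ranges).
controllable_grid = {
    "temperature": [35.0, 36.0, 37.0, 38.0, 39.0],
    "ph": [6.6, 6.8, 7.0, 7.2, 7.4],
}

# A factor they cannot adjust is held at its observed value.
fixed = {"raw_material_lot_purity": 0.95}

names = list(controllable_grid)
best_settings, best_rate = None, float("inf")

# Exhaustive grid search: evaluate the model at every combination of settings.
for values in product(*(controllable_grid[n] for n in names)):
    settings = dict(zip(names, values))
    rate = predicted_scrap_rate(**settings, **fixed)
    if rate < best_rate:
        best_settings, best_rate = settings, rate

print("best settings:", best_settings)
print(f"predicted scrap rate: {best_rate:.3f}")
```

Restricting the search to controllable factors reflects the earlier conversation with the engineers: a recommended setting is only useful if someone can actually act on it.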
Next blog: Considerations for the Application of Data Mining.
This article was first published in Z Consulting’s Elevate Manufacturing newsletter for January 2011.