### Glossary Index

###### 2

- 2D Bar/Column Plots
- 2D Box Plots
- 2D Box Plots - Box Whiskers
- 2D Box Plots - Boxes
- 2D Box Plots - Columns
- 2D Box Plots - Error Bars
- 2D Box Plots - Whiskers
- 2D Categorized Detrended Probability Plots
- 2D Categorized Half-Norm. Probability Plots
- 2D Categorized Normal Probability Plots
- 2D Detrended Probability Plots
- 2D Histograms
- 2D Histograms - Double-Y
- 2D Histograms - Hanging Bars
- 2D Line Plots
- 2D Line Plots - Aggregated
- 2D Line Plots - Double-Y
- 2D Line Plots - Multiple
- 2D Line Plots - Regular
- 2D Line Plots - XY Trace
- 2D Range Plots - Error Bars
- 2D Matrix Plots
- 2D Matrix Plots - Columns
- 2D Matrix Plots - Lines
- 2D Matrix Plots - Scatterplot
- 2D Normal Probability Plots
- 2D Probability-Probability Plots
- 2D Probability-Probability Plots - Categorized
- 2D Quantile-Quantile Plots
- 2D Quantile-Quantile Plots - Categorized
- 2D Scatterplot
- 2D Scatterplot - Categorized Ternary Graph
- 2D Scatterplot - Double-Y
- 2D Scatterplot - Frequency
- 2D Scatterplot - Multiple
- 2D Scatterplot - Regular
- 2D Scatterplot - Voronoi
- 2D Sequential/Stacked Plots
- 2D Sequential/Stacked Plots - Area
- 2D Sequential/Stacked Plots - Column
- 2D Sequential/Stacked Plots - Lines
- 2D Sequential/Stacked Plots - Mixed Line
- 2D Sequential/Stacked Plots - Mixed Step
- 2D Sequential/Stacked Plots - Step
- 2D Sequential/Stacked Plots - Step Area
- 2D Ternary Plots - Scatterplot

###### 3

- 3D Bivariate Histogram
- 3D Box Plots
- 3D Box Plots - Border-style Ranges
- 3D Box Plots - Double Ribbon Ranges
- 3D Box Plots - Error Bars
- 3D Box Plots - Flying Blocks
- 3D Box Plots - Flying Boxes
- 3D Box Plots - Points
- 3D Categorized Plots - Contour Plot
- 3D Categorized Plots - Deviation Plot
- 3D Categorized Plots - Scatterplot
- 3D Categorized Plots - Space Plot
- 3D Categorized Plots - Spectral Plot
- 3D Categorized Plots - Surface Plot
- 3D Deviation Plots
- 3D Range Plot - Error Bars
- 3D Raw Data Plots - Contour/Discrete
- 3D Scatterplots
- 3D Scatterplots - Ternary Graph
- 3D Space Plots
- 3D Ternary Plots
- 3D Ternary Plots - Categorized Scatterplot
- 3D Ternary Plots - Categorized Space
- 3D Ternary Plots - Categorized Surface
- 3D Ternary Plots - Categorized Trace
- 3D Ternary Plots - Contour/Areas
- 3D Ternary Plots - Contour/Lines
- 3D Ternary Plots - Deviation
- 3D Ternary Plots - Space
- 3D Trace Plots

###### A

- Aberration, Minimum
- Abrupt Permanent Impact
- Abrupt Temporary Impact
- Accept-Support Testing
- Accept Threshold
- Activation Function (in Neural Networks)
- Additive Models
- Additive Season, Damped Trend
- Additive Season, Exponential Trend
- Additive Season, Linear Trend
- Additive Season, No Trend
- Adjusted means
- Aggregation
- AID
- Akaike Information Criterion (AIC)
- Algorithm
- Alpha
- Anderson-Darling Test
- ANOVA
- Append a Network
- Append Cases and/or Variables
- Application Programming Interface (API)
- Arrow
- Assignable Causes and Actions
- Association Rules
- Asymmetrical Distribution
- AT&T Runs Rules
- Attribute (attribute variable)
- Augmented Product Moment Matrix
- Autoassociative Network
- Automatic Network Designer

###### B

- B Coefficients
- Back Propagation
- Bagging (Voting, Averaging)
- Balanced ANOVA Design
- Banner Tables
- Bar/Column Plots, 2D
- Bar Dev Plot
- Bar Left Y Plot
- Bar Right Y Plot
- Bar Top Plot
- Bar X Plot
- Bartlett Window
- Basis Functions
- Batch algorithms in *STATISTICA Neural Net*
- Bayesian Information Criterion (BIC)
- Bayesian Networks
- Bayesian Statistics
- Bernoulli Distribution
- Best Network Retention
- Best Subset Regression
- Beta Coefficients
- Beta Distribution
- Bimodal Distribution
- Binomial Distribution
- Bivariate Normal Distribution
- Blocking
- Bonferroni Adjustment
- Bonferroni Test
- Boosting
- Boundary Case
- Box Plot/Medians (Block Stats Graphs)
- Box Plot/Means (Block Stats Graphs)
- Box Plots, 2D
- Box Plots, 2D - Box Whiskers
- Box Plots, 2D - Boxes
- Box Plots, 2D - Whiskers
- Box Plots, 3D
- Box Plots, 3D - Border-Style Ranges
- Box Plots, 3D - Double Ribbon Ranges
- Box Plots, 3D - Error Bars
- Box Plots, 3D - Flying Blocks
- Box Plots, 3D - Flying Boxes
- Box Plots, 3D - Points
- Box-Ljung Q Statistic
- Breakdowns
- Breaking Down (Categorizing)
- Brown-Forsythe Homogeneity of Variances
- Brushing
- Burt Table

###### C

- Canonical Correlation
- Cartesian Coordinates
- Casewise Missing Data Deletion
- Categorical Dependent Variable
- Categorical Predictor
- Categorized Graphs
- Categorized Plots, 2D - Detrended Prob. Plots
- Categorized Plots, 2D - Half-Normal Prob. Plots
- Categorized Plots, 2D - Normal Prob. Plots
- Categorized Plots, 2D - Prob.-Prob. Plots
- Categorized Plots, 2D - Quantile Plots
- Categorized Plots, 3D - Contour Plot
- Categorized Plots, 3D - Deviation Plot
- Categorized Plots, 3D - Scatterplot
- Categorized Plots, 3D - Space Plot
- Categorized Plots, 3D - Spectral Plot
- Categorized Plots, 3D - Surface Plot
- Categorized 3D Scatterplot (Ternary graph)
- Categorized Contour/Areas (Ternary graph)
- Categorized Contour/Lines (Ternary graph)
- Categorizing
- Cauchy Distribution
- Cause-and-Effect Diagram
- Censoring (Censored Observations)
- Censoring, Left
- Censoring, Multiple
- Censoring, Right
- Censoring, Single
- Censoring, Type I
- Censoring, Type II
- CHAID
- Characteristic Life
- Chernoff Faces (Icon Plots)
- *Chi*-square Distribution
- Circumplex
- City-Block (Manhattan) Distance
- Classification
- Classification (in Neural Networks)
- Classification and Regression Trees
- Classification by Labeled Exemplars (in NN)
- Classification Statistics (in Neural Networks)
- Classification Thresholds (in Neural Networks)
- Classification Trees
- Class Labeling (in Neural Networks)
- Cluster Analysis
- Cluster Diagram (in Neural Networks)
- Cluster Networks (in Neural Networks)
- Coarse Coding
- Codes
- Coding Variable
- Coefficient of Determination
- Coefficient of Variation
- Column Sequential/Stacked Plot
- Columns (Box Plot)
- Columns (Icon Plot)
- Common Causes
- Communality
- Complex Numbers
- Conditional Probability
- Conditioning (Categorizing)
- Confidence Interval
- Confidence Interval for the Mean
- Confidence Interval vs. Prediction Interval
- Confidence Limits
- Confidence Value (Association Rules)
- Confusion Matrix (in Neural Networks)
- Conjugate Gradient Descent (in Neural Net)
- Continuous Dependent Variable
- Contour/Discrete Raw Data Plot
- Contour Plot
- Control, Quality
- Cook's Distance
- Correlation
- Correlation, Intraclass
- Correlation (Pearson r)
- Correlation Value (Association Rules)
- Correspondence Analysis
- Cox-Snell Gen. Coefficient Determination
- Cpk, Cp, Cr
- CRISP
- Cross Entropy (in Neural Networks)
- Cross Verification (in Neural Networks)
- Cross-Validation
- Crossed Factors
- Crosstabulations
- C-SVM Classification
- Cubic Spline Smoother
- "Curse" of Dimensionality

###### D

- Daniell (or Equal Weight) Window
- Data Mining
- Data Preparation Phase
- Data Reduction
- Data Rotation (in 3D space)
- Data Warehousing
- Decision Trees
- Degrees of Freedom
- Deleted Residual
- Denominator Synthesis
- Dependent t-test
- Dependent vs. Independent Variables
- Deployment
- Derivative-Free Funct. Min. Algorithms
- Design, Experimental
- Design Matrix
- Desirability Profiles
- Detrended Probability Plots
- Deviance
- Deviance Residuals
- Deviation
- Deviation Assign. Algorithms (in Neural Net)
- Deviation Plot (Ternary Graph)
- Deviation Plots, 3D
- DFFITS
- DIEHARD Suite of Tests & Randm. Num. Gen.
- Differencing (in Time Series)
- Dimensionality Reduction
- Discrepancy Function
- Discriminant Function Analysis
- Distribution Function
- DOE
- Document Frequency
- Double-Y Histograms
- Double-Y Line Plots
- Double-Y Scatterplot
- Drill-Down Analysis
- Drilling-down (Categorizing)
- Duncan's test
- Dunnett's test
- DV

###### E

- Effective Hypothesis Decomposition
- Efficient Score Statistic
- Eigenvalues
- Ellipse, Prediction Area and Range
- EM Clustering
- Endogenous Variable
- Ensembles (in Neural Networks)
- Enterprise Resource Planning (ERP)
- Enterprise SPC
- Enterprise-Wide Software Systems
- Entropy
- Epoch in (Neural Networks)
- Eps
- EPSEM Samples
- ERP
- Error Bars (2D Box Plots)
- Error Bars (2D Range Plots)
- Error Bars (3D Box Plots)
- Error Bars (3D Range Plots)
- Error Function (in Neural Networks)
- Estimable Functions
- Euclidean Distance
- Euler's e
- Exogenous Variable
- Experimental Design
- Explained Variance
- Exploratory Data Analysis
- Exponential Distribution
- Exponential Family of Distributions
- Exponential Function
- Exponentially Weighted Moving Avg. Line
- Extrapolation
- Extreme Values (in Box Plots)
- Extreme Value Distribution

###### F

- F Distribution
- FACT
- Factor Analysis
- Fast Analysis of Shared Multidimensional Information (FASMI)
- Feature Extraction (vs. Feature Selection)
- Feature Selection
- Feedforward Networks
- Fisher LSD
- Fixed Effects (in ANOVA)
- Free Parameter
- Frequencies, Marginal
- Frequency Scatterplot
- Frequency Tables
- Function Minimization Algorithms

###### G

- g2 Inverse
- Gains Chart
- Gamma Coefficient
- Gamma Distribution
- Gaussian Distribution
- Gauss-Newton Method
- General ANOVA/MANOVA
- General Linear Model
- Generalization (in Neural Networks)
- Generalized Additive Models
- Generalized Inverse
- Generalized Linear Model
- Genetic Algorithm
- Genetic Algorithm Input Selection
- Geometric Distribution
- Geometric Mean
- Gibbs Sampler
- Gini Measure of Node Impurity
- Gompertz Distribution
- Goodness of Fit
- Gradient
- Gradient Descent
- Gradual Permanent Impact
- Group Charts
- Grouping (Categorizing)
- Grouping Variable
- Groupware

###### H

- Half-Normal Probability Plots
- Half-Normal Probability Plots - Categorized
- Hamming Window
- Hanging Bars Histogram
- Harmonic Mean
- Hazard
- Hazard Rate
- Heuristic
- Heywood Case
- Hidden Layers (in Neural Networks)
- High-Low Close
- Histograms, 2D
- Histograms, 2D - Double-Y
- Histograms, 2D - Hanging Bars
- Histograms, 2D - Multiple
- Histograms, 2D - Regular
- Histograms, 3D Bivariate
- Histograms, 3D - Box Plots
- Histograms, 3D - Contour/Discrete
- Histograms, 3D - Contour Plot
- Histograms, 3D - Spikes
- Histograms, 3D - Surface Plot
- Hollander-Proschan Test
- Hooke-Jeeves Pattern Moves
- Hosmer-Lemeshow Test
- HTM
- HTML
- Hyperbolic Tangent (tanh)
- Hyperplane
- Hypersphere

###### I

- Icon Plots
- Icon Plots - Chernoff Faces
- Icon Plots - Columns
- Icon Plots - Lines
- Icon Plots - Pies
- Icon Plots - Polygons
- Icon Plots - Profiles
- Icon Plots - Stars
- Icon Plots - Sun Rays
- Increment vs Non-Increment Learning Algr.
- Independent Events
- Independent t-test
- Independent vs. Dependent Variables
- Industrial Experimental Design
- Inertia
- Inlier
- In-Place Database Processing (IDP)
- Interactions
- Interpolation
- Interval Scale
- Intraclass Correlation Coefficient
- Invariance under Constant Scale Factor (ICSF)
- Invariance Under Change of Scale (ICS)
- Inverse Document Frequency
- Ishikawa Chart
- Isotropic Deviation Assignment
- Item and Reliability Analysis
- IV

###### J

###### K

###### L

- Lack of Fit
- Lambda Prime
- Laplace Distribution
- Latent Semantic Indexing
- Latent Variable
- Layered Compression
- Learned Vector Quantization (in Neural Net)
- Learning Rate (in Neural Networks)
- Least Squares (2D graphs)
- Least Squares (3D graphs)
- Least Squares Estimator
- Least Squares Means
- Left and Right Censoring
- Levenberg-Marquardt Algorithm (in Neural Net)
- Levene's Test for Homogeneity of Variances
- Leverage values
- Life Table
- Life, Characteristic
- Lift Charts
- Likelihood
- Lilliefors test
- Line Plots, 2D
- Line Plots, 2D - Aggregated
- Line Plots, 2D (Case Profiles)
- Line Plots, 2D - Double-Y
- Line Plots, 2D - Multiple
- Line Plots, 2D - Regular
- Line Plots, 2D - XY Trace
- Linear (2D graphs)
- Linear (3D graphs)
- Linear Activation function
- Linear Modeling
- Linear Units
- Lines (Icon Plot)
- Lines (Matrix Plot)
- Lines Sequential/Stacked Plot
- Link Function
- Local Minima
- Locally Weighted (Robust) Regression
- Logarithmic Function
- Logistic Distribution
- Logistic Function
- Logit Regression and Transformation
- Log-Linear Analysis
- Log-Normal Distribution
- Lookahead (in Neural Networks)
- Loss Function
- LOWESS Smoothing

###### M

- Machine Learning
- Mahalanobis Distance
- Mallow's CP
- Manifest Variable
- Mann-Scheuer-Fertig Test
- MANOVA
- Marginal Frequencies
- Markov Chain Monte Carlo (MCMC)
- Mass
- Matching Moments Method
- Matrix Collinearity
- Matrix Ill-Conditioning
- Matrix Inverse
- Matrix Plots
- Matrix Plots - Columns
- Matrix Plots - Lines
- Matrix Plots - Scatterplot
- Matrix Rank
- Matrix Singularity
- Maximum Likelihood Loss Function
- Maximum Likelihood Method
- Maximum Unconfounding
- MD (Missing data)
- Mean
- Mean/S.D. Algorithm (in Neural Networks)
- Mean, Geometric
- Mean, Harmonic
- Mean Substitution of Missing Data
- Means, Adjusted
- Means, Unweighted
- Median
- Meta-Learning
- Method of Matching Moments
- Minimax
- Minimum Aberration
- Mining, Data
- Missing values
- Mixed Line Sequential/Stacked Plot
- Mixed Step Sequential/Stacked Plot
- Mode
- Model Profiles (in Neural Networks)
- Models for Data Mining
- Monte Carlo
- Multi-Pattern Bar
- Multicollinearity
- Multidimensional Scaling
- Multilayer Perceptrons
- Multimodal Distribution
- Multinomial Distribution
- Multinomial Logit and Probit Regression
- Multiple Axes in Graphs
- Multiple Censoring
- Multiple Dichotomies
- Multiple Histogram
- Multiple Line Plots
- Multiple Scatterplot
- Multiple R
- Multiple Regression
- Multiple Response Variables
- Multiple-Response Tables
- Multiple Stream Group Charts
- Multiplicative Season, Damped Trend
- Multiplicative Season, Exponential Trend
- Multiplicative Season, Linear Trend
- Multiplicative Season, No Trend
- Multivar. Adapt. Regres. Splines MARSplines
- Multi-way Tables

###### N

- Nagelkerke Gen. Coefficient Determination
- Naive Bayes
- Neat Scaling of Intervals
- Negative Correlation
- Negative Exponential (2D graphs)
- Negative Exponential (3D graphs)
- Neighborhood (in Neural Networks)
- Nested Factors
- Nested Sequence of Models
- Neural Networks
- Neuron
- Newman-Keuls Test
- N-in-One Encoding
- Noise Addition (in Neural Networks)
- Nominal Scale
- Nominal Variables
- Nonlinear Estimation
- Nonparametrics
- Non-Outlier Range
- Nonseasonal, Damped Trend
- Nonseasonal, Exponential Trend
- Nonseasonal, Linear Trend
- Nonseasonal, No Trend
- Normal Distribution
- Normal Distribution, Bivariate
- Normal Fit
- Normality Tests
- Normalization
- Normal Probability Plots
- Normal Probability Plots (Computation Note)
- n Point Moving Average Line

###### O

- ODBC
- Odds Ratio
- OLE DB
- On-Line Analytic Processing (OLAP)
- One-Off (in Neural Networks)
- One-of-N Encoding (in Neural Networks)
- One-Sample t-Test
- One-Sided Ranges (Error Bars, Range Plots)
- One-Way Tables
- Operating Characteristic Curves
- Ordinal Multinomial Distribution
- Ordinal Scale
- Outer Arrays
- Outliers
- Outliers (in Box Plots)
- Overdispersion
- Overfitting
- Overlearning (in Neural Networks)
- Overparameterized Model

###### P

- Pairwise Del. Missing Data vs Mean Subst.
- Pairwise MD Deletion
- Parametric Curve
- Pareto Chart Analysis
- Pareto Distribution
- Part Correlation
- Partial Correlation
- Partial Least Squares Regression
- Partial Residuals
- Parzen Window
- Pearson Correlation
- Pearson Curves
- Pearson Residuals
- Penalty Functions
- Percentiles
- Perceptrons (in Neural Networks)
- Pie Chart
- Pie Chart - Counts
- Pie Chart - Multi-Pattern Bar
- Pie Chart - Values
- Pies (Icon Plots)
- PMML (Predictive Model Markup Language)
- PNG Files
- Poisson Distribution
- Polar Coordinates
- Polygons (Icon Plots)
- Polynomial
- Population Stability Report
- Portable Network Graphics Files
- Positive Correlation
- Post hoc Comparisons
- Post Synaptic Potential (PSP) Function
- Posterior Probability
- Power (Statistical)
- Power Goal
- Ppk, Pp, Pr
- Prediction Interval Ellipse
- Prediction Profiles
- Predictive Data Mining
- Predictive Mapping
- Predictive Model Markup Language (PMML)
- Predictors
- PRESS Statistic
- Principal Components Analysis
- Prior Probabilities
- Probability
- Probability Plots - Detrended
- Probability Plots - Half-Normal
- Probability Plots - Normal
- Probability-Probability Plots
- Probability-Probability Plots - Categorized
- Probability Sampling
- Probit Regression and Transformation
- PROCEED
- Process Analysis
- Process Capability Indices
- Process Performance Indices
- Profiles, Desirability
- Profiles, Prediction
- Profiles (Icon Plots)
- Pruning (in Classification Trees)
- Pseudo-Components
- Pseudo-Inverse Algorithm
- Pseudo-Inverse-Singular Val. Decomp. NN
- PSP (Post Synaptic Potential) Function
- Pure Error
- p-Value (Statistical Significance)

###### Q

###### R

- R Programming Language
- Radial Basis Functions
- Radial Sampling (in Neural Networks)
- Random Effects (in Mixed Model ANOVA)
- Random Forests
- Random Num. from Arbitrary Distributions
- Random Numbers (Uniform)
- Random Sub-Sampling in Data Mining
- Range Ellipse
- Range Plots - Boxes
- Range Plots - Columns
- Range Plots - Whiskers
- Rank
- Rank Correlation
- Ratio Scale
- Raw Data, 3D Scatterplot
- Raw Data Plots, 3D - Contour/Discrete
- Raw Data Plots, 3D - Spikes
- Raw Data Plots, 3D - Surface Plot
- Rayleigh Distribution
- Receiver Oper. Characteristic Curve
- Receiver Oper. Characteristic (in Neural Net)
- Rectangular Distribution
- Regression
- Regression (in Neural Networks)
- Regression, Multiple
- Regression Summary Statistics (in Neural Net)
- Regular Histogram
- Regular Line Plots
- Regular Scatterplot
- Regularization (in Neural Networks)
- Reject Inference
- Reject Threshold
- Relative Function Change Criterion
- Reliability
- Reliability and Item Analysis
- Representative Sample
- Resampling (in Neural Networks)
- Residual
- Resolution
- Response Surface
- Right Censoring
- RMS (Root Mean Squared) Error
- Robust Locally Weighted Regression
- ROC Curve
- ROC Curve (in Neural Networks)
- Root Cause Analysis
- Root Mean Square Standardized Effect (RMSSE)
- Rosenbrock Pattern Search
- Rotating Coordinates, Method of
- r (Pearson Correlation Coefficient)
- Runs Tests (in Quality Control)

###### S

- Sampling Fraction
- Scalable Software Systems
- Scaling
- Scatterplot, 2D
- Scatterplot, 2D - Categorized Ternary Graph
- Scatterplot, 2D - Double-Y
- Scatterplot, 2D - Frequency
- Scatterplot, 2D - Multiple
- Scatterplot, 2D - Regular
- Scatterplot, 2D - Voronoi
- Scatterplot, 3D
- Scatterplot, 3D - Raw Data
- Scatterplot, 3D - Ternary Graph
- Scatterplot Smoothers
- Scheffe's Test
- Score Statistic
- Scree Plot, Scree Test
- S.D. Ratio
- Semi-Partial Correlation
- SEMMA
- Sensitivity Analysis (in Neural Networks)
- Sequential Contour Plot, 3D
- Sequential/Stacked Plots, 2D
- Sequential/Stacked Plots, 2D - Area
- Sequential/Stacked Plots, 2D - Column
- Sequential/Stacked Plots, 2D - Lines
- Sequential/Stacked Plots, 2D - Mixed Line
- Sequential/Stacked Plots, 2D - Mixed Step
- Sequential/Stacked Plots, 2D - Step
- Sequential/Stacked Plots, 2D - Step Area
- Sequential Surface Plot, 3D
- Sets of Samples in Quality Control Charts
- Shapiro-Wilks' W test
- Shewhart Control Charts
- Short Run Control Charts
- Shuffle, Back Propagation (in Neural Net)
- Shuffle Data (in Neural Networks)
- Sigma Restricted Model
- Sigmoid Function
- Signal Detection Theory
- Simple Random Sampling (SRS)
- Simplex Algorithm
- Single and Multiple Censoring
- Singular Value Decomposition
- Six Sigma (DMAIC)
- Six Sigma Process
- Skewness
- Slicing (Categorizing)
- Smoothing
- SOFMs (Self-Organizing Feature Maps, Kohonen Networks)
- Softmax
- Space Plots 3D
- SPC
- Spearman R
- Special Causes
- Spectral Plot
- Spikes (3D graphs)
- Spinning Data (in 3D space)
- Spline (2D graphs)
- Spline (3D graphs)
- Split Selection (for Classification Trees)
- Splitting (Categorizing)
- Spurious Correlations
- SQL
- Square Root of the Signal to Noise Ratio (f)
- Stacked Generalization
- Stacking (Stacked Generalization)
- Standard Deviation
- Standard Error
- Standard Error of the Mean
- Standard Error of the Proportion
- Standardization
- Standardized DFFITS
- Standardized Effect (Es)
- Standard Residual Value
- Stars (Icon Plots)
- Stationary Series (in Time Series)
- STATISTICA Advanced Linear/Nonlinear
- STATISTICA Automated Neural Networks
- STATISTICA Base
- STATISTICA Data Miner
- STATISTICA Data Warehouse
- STATISTICA Document Management System
- STATISTICA Enterprise
- STATISTICA Enterprise/QC
- STATISTICA Enterprise Server
- STATISTICA Enterprise SPC
- STATISTICA Monitoring and Alerting Server
- STATISTICA MultiStream
- STATISTICA Multivariate Stat. Process Ctrl
- STATISTICA PI Connector
- STATISTICA PowerSolutions
- STATISTICA Process Optimization
- STATISTICA Quality Control Charts
- STATISTICA Sequence Assoc. Link Analysis
- STATISTICA Text Miner
- STATISTICA Variance Estimation Precision
- Statistical Power
- Statistical Process Control (SPC)
- Statistical Significance (p-value)
- Steepest Descent Iterations
- Stemming
- Steps
- Stepwise Regression
- Stiffness Parameter (in Fitting Options)
- Stopping Conditions
- Stopping Conditions (in Neural Networks)
- Stopping Rule (in Classification Trees)
- Stratified Random Sampling
- Stub and Banner Tables
- Studentized Deleted Residuals
- Studentized Residuals
- Student's t Distribution
- Sum-Squared Error Function
- Sums of Squares (Type I, II, III (IV, V, VI))
- Sun Rays (Icon Plots)
- Supervised Learning (in Neural Networks)
- Support Value (Association Rules)
- Support Vector
- Support Vector Machine (SVM)
- Suppressor Variable
- Surface Plot (from Raw Data)
- Survival Analysis
- Survivorship Function
- Sweeping
- Symmetrical Distribution
- Symmetric Matrix
- Synaptic Functions (in Neural Networks)

###### T

- Tables
- Tapering
- t Distribution (Student's)
- Tau, Kendall
- Ternary Plots, 2D - Scatterplot
- Ternary Plots, 3D
- Ternary Plots, 3D - Categorized Scatterplot
- Ternary Plots, 3D - Categorized Space
- Ternary Plots, 3D - Categorized Surface
- Ternary Plots, 3D - Categorized Trace
- Ternary Plots, 3D - Contour/Areas
- Ternary Plots, 3D - Contour/Lines
- Ternary Plots, 3D - Deviation
- Ternary Plots, 3D - Space
- Text Mining
- THAID
- Threshold
- Time Series
- Time Series (in Neural Networks)
- Time-Dependent Covariates
- Tolerance (in Multiple Regression)
- Topological Map
- Trace Plots, 3D
- Trace Plot, Categorized (Ternary Graph)
- Training/Test Error/Classification Accuracy
- Transformation (Probit Regression)
- Trellis Graphs
- Trimmed Means
- t-Test (independent & dependent samples)
- Tukey HSD
- Tukey Window
- Two-State (in Neural Networks)
- Type I, II, III (IV, V, VI) Sums of Squares
- Type I Censoring
- Type II Censoring
- Type I Error Rate

###### U

###### V

###### W

###### X

###### Y

###### Z

Machine Learning. Machine learning, computational learning theory, and similar terms are often used in the context of Data Mining to denote the application of generic model-fitting or classification algorithms for predictive data mining. Unlike traditional statistical data analysis, which is usually concerned with the estimation of population parameters by statistical inference, the emphasis in data mining (and machine learning) is usually on the accuracy of prediction (predicted classification), regardless of whether or not the "models" or techniques used to generate the predictions are interpretable or open to simple explanation. Good examples of techniques often applied to predictive data mining are neural networks and meta-learning techniques such as boosting. These methods usually involve the fitting of very complex "generic" models that are not related to any reasoning or theoretical understanding of underlying causal processes; instead, these techniques can be shown to generate accurate predictions or classifications in cross-validation samples.

Mahalanobis Distance. We can think of the independent variables (in a regression equation) as defining a multidimensional space in which each observation can be plotted. Also, we can plot a point representing the means for all independent variables. This "mean point" in the multidimensional space is also called the centroid. The *Mahalanobis distance* is the distance of a case from the centroid in the multidimensional space, defined by the correlated independent variables (if the independent variables are uncorrelated, it is the same as the simple Euclidean distance). Thus, this measure provides an indication of whether or not an observation is an outlier with respect to the independent variable values. See also, standard residual value, deleted residual and Cook’s distance.
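The contrast between the Euclidean and Mahalanobis distances from the centroid can be sketched in a few lines. This is an illustrative example, not STATISTICA code; the simulated predictors and the test case are made up:

```python
# Hypothetical illustration: Mahalanobis vs. Euclidean distance of one
# case from the centroid of two correlated "independent variables".
import numpy as np

rng = np.random.default_rng(0)
# Two strongly correlated predictors
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.45 * rng.normal(size=500)
X = np.column_stack([x1, x2])

centroid = X.mean(axis=0)                       # the "mean point"
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

def mahalanobis(case, centroid, cov_inv):
    d = case - centroid
    return float(np.sqrt(d @ cov_inv @ d))

case = np.array([2.0, -2.0])                    # runs against the correlation
euclid = float(np.linalg.norm(case - centroid))
mahal = mahalanobis(case, centroid, cov_inv)
# Because the case contradicts the correlation pattern, its Mahalanobis
# distance is far larger than its Euclidean distance suggests.
print(euclid, mahal)
```

Note how a case that is unremarkable on each variable separately can still be an extreme multivariate outlier once the correlation structure is taken into account.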

Mallow's CP. If *p* regressors are selected from a set of *k*, *Cp* is defined as:

Cp = Σ(y - y_p)^2 / s^2 - n + 2p

where

y_p is the predicted value of y from the p regressors,

s^2 is the residual mean square after regression on the complete set of k regressors, and

n is the sample size.

The model is then chosen to give a minimum value of the criterion, or a value that is acceptably small. It is essentially a special case of the Akaike Information Criterion. Mallow's CP is used in *General Regression Models (GRM)* as the criterion for choosing the best subset of predictor effects when a best subset regression analysis is being performed. This measure of the quality of fit for a model tends to be less dependent (than the *R-square*) on the number of effects in the model, and hence it tends to find the best subset that includes only the important predictors of the respective dependent variable. See Best Subset Regression Options in GRM for further details.
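The formula can be sketched directly; the following is an illustrative computation on simulated data, not code from GRM, and the variable names are assumptions:

```python
# Mallow's Cp for candidate regressor subsets, following
# Cp = SSE_p / s^2 - n + 2p, with p counting the intercept.
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 4
X = rng.normal(size=(n, k))
# The "true" model uses only the first two regressors
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def sse(X, y):
    """Residual sum of squares from a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

# s^2: residual mean square after regression on the complete set of k
s2 = sse(X, y) / (n - k - 1)

def mallows_cp(subset):
    p = len(subset) + 1                    # parameters incl. intercept
    return sse(X[:, subset], y) / s2 - n + 2 * p

# For the full model, Cp equals p exactly by construction; dropping an
# important regressor inflates Cp sharply.
print(mallows_cp([0, 1]), mallows_cp([0]), mallows_cp([0, 1, 2, 3]))
```

A subset whose Cp is close to p (and small) is a good candidate; note that the full model always satisfies Cp = p, so a small Cp alone does not justify adding every regressor.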

Manifest Variable. A manifest variable is a variable that is directly observable or measurable. In path analysis diagrams used in structural modeling (see Path Diagram), manifest variables are usually represented by enclosing the variable name within a square or a rectangle.

Mann-Scheuer-Fertig Test. This test, proposed by Mann, Scheuer, and Fertig (1973), is described in detail in, for example, Dodson (1994) or Lawless (1982). The null hypothesis for this test is that the population follows the Weibull distribution with the estimated parameters. Nelson (1982) reports this test to have reasonably good power, and this test can be applied to Type II censored data. For computational details refer to Dodson (1994) or Lawless (1982); the critical values for the test statistic have been computed based on Monte Carlo studies, and have been tabulated for *n* (sample sizes) between 3 and 25; for *n* greater than 25, this test is not computed.

The *Mann-Scheuer-Fertig test* is used in Weibull and Reliability/Failure Time Analysis; see also, Hollander-Proschan Test and Anderson-Darling Test.

Map-Reduce. As a general approach, when analyzing hundreds of terabytes of data, or petabytes of data, it is not feasible to extract the data to another location for analysis; the process of moving data across wires to a separate server or servers (for parallel processing) would take too long and require too much bandwidth. Instead, the analytic computations must be performed physically close to where the data are stored. It is easier to bring the analytics to the data than the data to the analytics.

Map-reduce algorithms, i.e., data processing algorithms designed according to this pattern, do exactly that. A central component of the algorithm maps sub-computations to different locations in the distributed file system, and the results computed at the individual nodes are then combined in the reduce step. In short, to compute a count, the algorithm would compute subtotals within each node, in parallel across the distributed file system, and pass the subtotals to the reduce step, where they are added up.
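The counting example can be sketched as follows. This is a toy single-machine simulation of the pattern; in a real map-reduce framework each chunk would live on a different node and the map step would run there in parallel:

```python
# Toy illustration of the map/reduce counting pattern: each simulated
# "node" computes a subtotal over its own chunk of data (the map step),
# and the subtotals are then combined (the reduce step).
from functools import reduce

data = list(range(1000))

# Partition the data across 4 simulated nodes
chunks = [data[i::4] for i in range(4)]

def map_step(chunk):
    """Per-node work: count the even records in this node's chunk."""
    return sum(1 for x in chunk if x % 2 == 0)

def reduce_step(a, b):
    """Combine two subtotals into one."""
    return a + b

subtotals = [map_step(c) for c in chunks]   # runs node-local in practice
total = reduce(reduce_step, subtotals)
print(total)   # 500 even numbers in 0..999
```

The key property is that only the small subtotals, not the raw data, ever travel between nodes.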

Marginal Frequencies. In a Multi-way table, the values in the margins of the table are simply one-way (frequency) tables for all values in the table. They are important in that they help us to evaluate the arrangement of frequencies in individual columns or rows. The differences between the distributions of frequencies in individual rows (or columns) and in the respective margins inform us about the relationship between the crosstabulated variables. For more information on Marginal frequencies, see the Crosstabulations section of Basic Statistics.

Markov Chain Monte Carlo (MCMC). The term *"Monte Carlo method"* (suggested by John von Neumann and S. M. Ulam, in the 1940s) refers to simulation of processes, using random numbers. The term *Monte Carlo* (a city long known for its gambling casinos) derived from the fact that "numbers of chance" (i.e., *Monte Carlo* simulation methods) were used in order to solve some of the integrals of the complex equations involved in the design of the first nuclear bombs (integrals of quantum dynamics). By generating large samples of random numbers from, for example, mixtures of distributions, the integrals of these (complex) distributions can be approximated from the (generated) data.

Complex equations with difficult to solve integrals are often involved in Bayesian Statistics Analyses. For a simple example of the *MCMC* method for generating bivariate normal random variables, see the description of the Gibbs Sampler.

For a detailed discussion of *MCMC* methods, see Gilks, Richardson, and Spiegelhalter (1996). See also the description of the Gibbs Sampler, and Bayesian Statistics (Analysis).
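As a concrete sketch of the Gibbs sampler mentioned above for generating bivariate normal random variables (an illustrative example; the correlation value, draw count, and burn-in are arbitrary choices):

```python
# Minimal Gibbs sampler for a standard bivariate normal with
# correlation rho: alternately draw each variable from its full
# conditional distribution given the other.
import numpy as np

rng = np.random.default_rng(42)
rho = 0.8
n_draws = 20000

x = np.empty(n_draws)
y = np.empty(n_draws)
x[0], y[0] = 0.0, 0.0
sd = np.sqrt(1 - rho ** 2)           # conditional standard deviation

for t in range(1, n_draws):
    x[t] = rng.normal(rho * y[t - 1], sd)   # x | y ~ N(rho*y, 1-rho^2)
    y[t] = rng.normal(rho * x[t], sd)       # y | x ~ N(rho*x, 1-rho^2)

# Discard burn-in and check the sample correlation of the chain
burn = 1000
r = np.corrcoef(x[burn:], y[burn:])[0, 1]
print(r)   # close to rho = 0.8
```

Even though each draw uses only simple univariate normals, the chain's stationary distribution is the full bivariate normal, which is exactly the appeal of MCMC for distributions whose joint density is hard to sample directly.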

Mass. The term *mass* in correspondence analysis is used to denote the entries in the two-way table of relative frequencies (i.e., each entry is divided by the sum of all entries in the table). Note that the results from correspondence analysis are still valid if the entries in the table are not frequencies, but some other measure of correspondence, association, similarity, confusion, etc. Since the sum of all entries in the table of relative frequencies is equal to 1.0, we could say that the table of relative frequencies shows how one unit of mass is distributed across the cells of the table. In the terminology of correspondence analysis, the row and column totals of the table of relative frequencies are called the row mass and column mass, respectively.
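A minimal numeric illustration of row and column masses (the frequency table here is made up for the example):

```python
# "Mass" in correspondence analysis: divide a two-way frequency table
# by its grand total; the row and column sums of the resulting table
# of relative frequencies are the row and column masses.
import numpy as np

freq = np.array([[20, 30],
                 [10, 40]])
P = freq / freq.sum()            # relative frequencies, sums to 1.0
row_mass = P.sum(axis=1)         # how the unit of mass splits by row
col_mass = P.sum(axis=0)         # ... and by column
print(row_mass, col_mass)        # [0.5 0.5] and [0.3 0.7]
```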

Matching Moments Method. This method can be employed to determine parameter estimates for a distribution (see *Quantile-Quantile Plots*, *Probability-Probability Plots*, and *Process Analysis*). The method of matching moments sets the distribution moments equal to the data moments and solves to obtain estimates for the distribution parameters. For example, for a distribution with two parameters, the first two moments of the distribution (the mean and variance of the distribution, i.e., μ and σ^2, respectively) would be set equal to the first two moments of the data (the sample mean and variance, i.e., the unbiased estimators x-bar and s^2, respectively) and solved for the parameter estimates. Alternatively, you could use the Maximum Likelihood Method to estimate the parameters. For more information, see Hahn and Shapiro, 1994.
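For instance, for the two-parameter gamma distribution (mean kθ, variance kθ^2), equating moments gives closed-form estimates θ = s^2 / x-bar and k = x-bar / θ. A sketch on simulated data (illustrative only, not STATISTICA code):

```python
# Method-of-moments fit for a two-parameter gamma distribution:
# set mean = k*theta and variance = k*theta^2 equal to the sample
# mean and variance, then solve for k and theta.
import numpy as np

rng = np.random.default_rng(7)
true_k, true_theta = 3.0, 2.0
data = rng.gamma(true_k, true_theta, size=50000)

xbar = data.mean()
s2 = data.var(ddof=1)            # unbiased sample variance

theta_hat = s2 / xbar            # from variance/mean = theta
k_hat = xbar / theta_hat         # from mean = k * theta
print(k_hat, theta_hat)          # close to 3.0 and 2.0
```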

Matrix Collinearity, Multicollinearity. This term is used in the context of correlation matrices or covariance matrices, to describe the condition when one or more variables from which the respective matrix was computed are linear functions of other variables; as a consequence such matrices cannot be inverted (only the generalized Inverse can be computed). See also Matrix Singularity for additional details.

Matrix Ill-Conditioning. *Matrix ill-conditioning* is a general term used to describe a rectangular matrix of values which is unsuitable for use in a particular analysis.

This occurs perhaps most frequently in applications of linear multiple regression when the matrix of *correlations* for the predictors is *singular* and thus the regular matrix inverse cannot be computed. In some modules (e.g., in *Factor Analysis*) this problem is dealt with by issuing a warning, artificially lowering all *correlations* in the correlation matrix by adding a small constant to the diagonal elements of the matrix, and then restandardizing it. This procedure will usually yield a matrix for which the regular matrix inverse can be computed.

Note that in many applications of the general linear model and the generalized linear/nonlinear model, matrix singularity is not abnormal (e.g., when the overparameterized model is used to represent effects for *categorical predictor variables*) and is dealt with by computing a generalized inverse rather than the regular *matrix inverse*.

Another example of matrix ill-conditioning is intransitivity of the correlations in a correlation matrix. If in a correlation matrix variable *A* is highly positively correlated with *B*, *B* is highly positively correlated with *C*, and *A* is highly negatively correlated with *C*, this "impossible" pattern of correlations signals an error in the elements of the matrix. See also matrix singularity, matrix inverse, generalized inverse.

Matrix Inverse. The *regular inverse* of a rectangular matrix of values is an extension of the concept of a numeric reciprocal. For a *nonsingular matrix* **A**, its *inverse* (denoted by a superscript of -1) is the unique matrix that satisfies

A^{-1}A = AA^{-1} = I

No such regular inverse exists for singular matrices, but generalized inverses (an infinite number of them) can be computed for any singular matrix. See also matrix singularity, generalized inverse.
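A minimal numerical sketch of the defining identity (a hypothetical numpy example, not from the source):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # nonsingular: rows/columns are linearly independent

A_inv = np.linalg.inv(A)

# The regular inverse satisfies A^{-1}A = AA^{-1} = I
I = np.eye(2)
print(np.allclose(A_inv @ A, I), np.allclose(A @ A_inv, I))
```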

Matrix Plots. Matrix graphs summarize the relationships between several variables in a matrix of true *X-Y* plots. The most common type of matrix plot is the scatterplot, which can be considered to be the graphical equivalent of the correlation matrix.

Matrix Plots - Columns. In this type of Matrix Plot, columns represent projections of individual data points onto the *X-axis* (showing the distribution of the maximum values), arranged in a matrix format. Histograms representing the distribution of each variable are displayed along the diagonal of the matrix (in square matrices, see example below) or along the edges (in rectangular matrices).

Matrix Plots - Lines. In this type of Matrix Plot, a matrix of *X-Y* (i.e., nonsequential) line plots (similar to a scatterplot matrix) is produced in which individual points are connected by a line in the order of their appearance in the data file. Histograms representing the distribution of each variable are displayed along the diagonal of the matrix (in square matrices) or along the edges (in rectangular matrices, see example below).

Matrix Plots - Scatterplot. In this type of Matrix Plot, 2D Scatterplots are arranged in a matrix format (values of the column variable are used as *X* coordinates, values of the row variable represent the *Y* coordinates). Histograms representing the distribution of each variable are displayed along the diagonal of the matrix (in square matrices, see example below) or along the edges (in rectangular matrices).

See also, Data Reduction.

Matrix Rank. The column (or row) *rank* of a rectangular matrix of values (e.g., a sums of squares and cross-products matrix) is equal to the number of linearly independent columns (or rows) of elements in the matrix. If there are no columns that are linearly dependent on other columns, then the rank of the matrix is equal to the number of its columns and the matrix is said to have full (column) *rank*. If the *rank *is less than the number of columns, the matrix is said to have reduced (column) *rank* and is *singular*. See also matrix singularity.

Matrix Singularity. A rectangular matrix of values (e.g., a sums of squares and cross-products matrix) is *singular* if the elements in a column (or row) of the matrix are linearly dependent on the elements in one or more other columns (or rows) of the matrix. For example, if the elements in one column of a matrix are *1, -1, 0*, and the elements in another column of the matrix are *2, -2, 0*, then the matrix is singular because *2* times each of the elements in the first column is equal to each of the respective elements in the second column. Such matrices are also said to suffer from multicollinearity problems, since one or more columns are linearly related to each other.

A unique, regular matrix inverse cannot be computed for singular matrices, but generalized inverses (an infinite number of them) can be computed for any singular matrix. See also, matrix inverse.
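The columns from the singularity example above can be checked numerically; this hypothetical numpy sketch shows the reduced rank and a generalized (Moore-Penrose) inverse:

```python
import numpy as np

# The glossary's example: the second column is 2 times the first,
# so the columns are linearly dependent and the matrix is singular
M = np.array([[ 1.0,  2.0],
              [-1.0, -2.0],
              [ 0.0,  0.0]])

print(np.linalg.matrix_rank(M))  # 1: reduced column rank

# A generalized inverse exists even for singular matrices
M_plus = np.linalg.pinv(M)
# The defining property M (M+) M = M still holds
print(np.allclose(M @ M_plus @ M, M))
```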

Maximum Likelihood Loss Function. A common alternative to the least squares loss function is to maximize the likelihood or log-likelihood function (or to minimize the negative log-likelihood function; the term maximum likelihood was first used by Fisher, 1922a). These functions are typically used when fitting non-linear models. In most general terms, the likelihood function is defined as:

L = F(Y,Model) = ∏_{i=1}^{n} {p[y_{i}, Model Parameters(x_{i})]}
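Because the product form underflows numerically, the log-likelihood is summed instead; a hypothetical numpy sketch for a normal model (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=500)

def neg_log_likelihood(mu, sigma, x):
    # -log L = -sum of log p(x_i | mu, sigma) under a normal model
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu)**2 / (2 * sigma**2))

# The negative log-likelihood is smaller (better) at parameters near the truth
print(neg_log_likelihood(5.0, 2.0, data) < neg_log_likelihood(0.0, 2.0, data))
```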

Maximum Likelihood Method. The method of maximum likelihood (the term first used by Fisher, 1922a) is a general method of estimating parameters of a population by values that maximize the *likelihood* (*L*) of a sample. The *likelihood L* of a sample of *n* observations *x_{1}, x_{2}, ..., x_{n}* is the joint probability function *p*(*x_{1}, x_{2}, ..., x_{n}*) when *x_{1}, x_{2}, ..., x_{n}* are discrete random variables. If *x_{1}, x_{2}, ..., x_{n}* are continuous random variables, then the *likelihood L* of a sample of *n* observations, *x_{1}, x_{2}, ..., x_{n}*, is the joint density function *f*(*x_{1}, x_{2}, ..., x_{n}*).

Let *L* be the likelihood of a sample, where *L* is a function of the parameters θ_{1}, θ_{2}, ..., θ_{k}. Then the maximum likelihood estimators of θ_{1}, θ_{2}, ..., θ_{k} are the values of θ_{1}, θ_{2}, ..., θ_{k} that maximize *L*.

Let θ be an element of Ω. If Ω is an open interval, and if *L*(θ) is differentiable and assumes a maximum on Ω, then the MLE will be a solution of the following equation: d*L*(θ)/dθ = 0. For more information, see Bain and Engelhardt (1989) and Neter, Wasserman, and Kutner (1989).

See also, Nonlinear Estimation or Variance Components and Mixed Model ANOVA/ANCOVA.
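A sketch under the exponential model (numpy assumed; not from the source): solving d log *L*/dλ = 0 gives the closed-form MLE λ = 1/x-bar, which can be checked against a grid search over the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=10_000)  # true rate lambda = 0.5

# For f(x) = lam * exp(-lam * x), the log-likelihood is
# n*log(lam) - lam*sum(x); setting its derivative to zero gives:
lam_mle = 1.0 / x.mean()

# Numerical check: the log-likelihood peaks at lam_mle on a fine grid
grid = np.linspace(0.1, 1.5, 2000)
loglik = len(x) * np.log(grid) - grid * x.sum()
lam_grid = grid[np.argmax(loglik)]
print(abs(lam_grid - lam_mle) < 0.01)
```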

Maximum Unconfounding. *Maximum unconfounding* is an experimental design criterion that is subsidiary to the criterion of design resolution. The *maximum unconfounding* criterion specifies that design generators should be chosen such that the maximum number of interactions of less than or equal to the crucial order, given the *resolution*, are unconfounded with all other interactions of the crucial order. It is an alternative to the *minimum aberration* criterion for finding the "best" design of maximum resolution. For discussions of the role of design criteria in experimental design see 2**(k-p) fractional factorial designs and 2**(k-p) Maximally Unconfounded and Minimum Aberration Designs.

MD (Missing data). See Missing values.

Mean. The mean is a particularly informative measure of the "central tendency" of the variable if it is reported along with its confidence intervals. Usually we are interested in statistics (such as the mean) from our sample only to the extent to which they are informative about the population. The larger the sample size, the more reliable its mean. The larger the variation of data values, the less reliable the mean (see also Elementary Concepts).

Mean = (Σx_{i})/n

where

*n* is the sample size.

See also, Descriptive Statistics

Mean/S.D. An algorithm (used in neural networks) to assign linear scaling coefficients for a set of numbers. The mean and standard deviation of the set are found, and scaling factors selected so that these are mapped to desired mean and standard deviation values. See also, Neural Networks.
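A minimal Python sketch of the linear scaling described above (illustrative; not STATISTICA's implementation, numpy assumed):

```python
import numpy as np

def mean_sd_scale(x, target_mean=0.0, target_sd=1.0):
    # Linear coefficients a, b chosen so a*x + b has the desired moments
    a = target_sd / x.std()
    b = target_mean - a * x.mean()
    return a * x + b

x = np.array([10.0, 12.0, 14.0, 18.0])
y = mean_sd_scale(x, target_mean=100.0, target_sd=15.0)
print(y.mean(), y.std())  # 100.0 and 15.0 (up to floating-point error)
```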

Mean Substitution of Missing Data. When you select *Mean Substitution*, the missing data will be replaced by the means for the respective variables during an analysis. See also, Casewise vs. pairwise deletion of missing data
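A minimal numpy sketch of mean substitution (illustrative only), replacing missing values with the mean of the observed values:

```python
import numpy as np

x = np.array([2.0, np.nan, 4.0, 6.0, np.nan])

# nanmean ignores the missing entries; where() substitutes it for them
filled = np.where(np.isnan(x), np.nanmean(x), x)
print(filled)  # [2. 4. 4. 6. 4.]
```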

Median. A measure of central tendency, the *median* (the term first used by Galton, 1882) of a sample is the value for which one-half (50%) of the observations (when ranked) will lie above that value and one-half will lie below that value. When the number of values in the sample is even, the *median* is computed as the average of the two middle values. See also, Descriptive Statistics.

Meta-Learning. The concept of meta-learning applies to the area of predictive data mining, where it is used to combine the predictions from multiple models. It is particularly useful when the types of models included in the project are very different. In this context, this procedure is also referred to as Stacking (Stacked Generalization).

Suppose your data mining project includes tree classifiers, such as C&RT and CHAID, linear discriminant analysis (e.g., GDA), and Neural Networks. Each computes predicted classifications for a crossvalidation sample, from which overall goodness-of-fit statistics (e.g., misclassification rates) can be computed. Experience has shown that combining the predictions from multiple methods often yields more accurate predictions than can be derived from any one method (e.g., see Witten and Frank, 2000). The predictions from different classifiers can be used as input into a meta-learner, which will attempt to combine the predictions to create a final best predicted classification. So, for example, the predicted classifications from the tree classifiers, linear model, and the neural network classifier(s) can be used as input variables into a neural network meta-classifier, which will attempt to "learn" from the data how to combine the predictions from the different models to yield maximum classification accuracy.

We can apply meta-learners to the results from different meta-learners to create "meta-meta"-learners, and so on; in practice, however, such an exponential increase in the amount of data processing, in order to derive an accurate prediction, will yield less and less marginal utility.
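The stacking idea can be sketched in a few lines (a hypothetical numpy example with simulated base-model outputs; a real project would use actual classifier predictions, and the meta-learner could itself be a neural network):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical predicted class probabilities from three base classifiers
# (e.g., a tree, a discriminant model, a neural network) on a validation sample
y_true = rng.integers(0, 2, size=200)
noise = lambda s: np.clip(y_true + rng.normal(0, s, size=200), 0, 1)
base_preds = np.column_stack([noise(0.6), noise(0.5), noise(0.7)])

# Meta-learner: least-squares weights that combine the base predictions
w, *_ = np.linalg.lstsq(base_preds, y_true, rcond=None)
combined = base_preds @ w

acc = lambda p: np.mean((p > 0.5) == y_true)
print(acc(combined), [acc(base_preds[:, i]) for i in range(3)])
```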

Minimax. An algorithm to assign linear scaling coefficients for a set of numbers. The minimum and maximum of the set are found, and scaling factors selected so that these are mapped to desired minimum and maximum values. See also, Neural Networks.
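A minimal Python sketch of minimax scaling (illustrative; not STATISTICA's implementation, numpy assumed):

```python
import numpy as np

def minimax_scale(x, lo=0.0, hi=1.0):
    # Map the observed minimum/maximum onto the desired [lo, hi] range
    scale = (hi - lo) / (x.max() - x.min())
    return lo + scale * (x - x.min())

x = np.array([10.0, 15.0, 20.0])
print(minimax_scale(x))  # min -> 0, midpoint -> 0.5, max -> 1
```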

Minimum Aberration. *Minimum aberration* is an experimental design criterion that is subsidiary to the criterion of design resolution. The *minimum aberration* design is defined as the design of maximum *resolution* "which minimizes the number of words in the defining relation that are of minimum length" (Fries & Hunter, 1980). Less technically, the criterion apparently operates by choosing design generators that produce the smallest number of pairs of confounded interactions of the crucial order. For example, the *minimum aberration* resolution IV design would have the minimum number of pairs of confounded 2-factor interactions. For discussions of the role of design criteria in experimental design see 2**(k-p) fractional factorial designs and 2**(k-p) Maximally Unconfounded and Minimum Aberration Designs.

Missing Values. Values of variables within data sets which are not known. Although such cases that contain missing data are incomplete, they can still be used in data analysis. Various methods exist to substitute missing data (e.g., by mean substitution, various types of interpolations and extrapolations). Also, pairwise deletion of missing data can be used. See also, Pairwise deletion of missing data, Casewise (Listwise) deletion of missing data, Pairwise deletion of missing data vs. mean substitution, and Casewise vs. pairwise deletion of missing data.

Mode. A measure of central tendency, the *mode* (the term first used by Pearson, 1895) of a sample is the value which occurs most frequently in the sample. See also, Descriptive Statistics.

Model Profiles (in Neural Networks). Model profiles are concise text strings indicating the architecture of networks and ensembles. A profile consists of a type code followed by a code giving the number of input and output variables and number of layers and units (networks) or members (ensembles). For time series networks, the number of steps and the lookahead factor are also given. The individual parts of the profile are:

Model Type. The codes are:

Code | Model Type
---|---
MLP | Multilayer Perceptron Network
RBF | Radial Basis Function Network
SOFM | Kohonen Self-Organizing Feature Map
Linear | Linear Network
PNN | Probabilistic Neural Network
GRNN | Generalized Regression Neural Network
PCA | Principal Components Network
Cluster | Cluster Network
Output | Output Ensemble
Conf | Confidence Ensemble

Network architecture. This is of the form I:N-N-N:O, where *I* is the number of input variables, *O* the number of output variables, and *N* the number of units in each layer.

**Example.** 2:4-6-3:1 indicates a network with *2* input variables, *1* output variable, *4* input neurons, *6* hidden neurons, and *3* output neurons.

For a time series network, the steps factor is prepended to the profile, and signified by an "s."

**Example.** s10 1:10-2-1:1 indicates a time series network with steps factor (lagged input) 10.

Ensemble architecture. This is of the form I:[N]:O, where *I* is the number of input variables, *O* the number of output variables, and *N* the number of members of the ensemble.
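The architecture portion of a profile can be parsed mechanically. This hypothetical Python sketch handles network profiles, including the optional time series prefix (it does not cover the bracketed ensemble form):

```python
def parse_profile(profile):
    """Parse a network architecture profile such as '2:4-6-3:1' or 's10 1:10-2-1:1'."""
    steps = None
    if profile.startswith("s"):           # time series prefix, e.g. 's10'
        prefix, profile = profile.split()
        steps = int(prefix[1:])
    n_in, layers, n_out = profile.split(":")
    return {
        "steps": steps,                                # lagged-input steps factor
        "inputs": int(n_in),                           # number of input variables
        "units": [int(u) for u in layers.split("-")],  # units in each layer
        "outputs": int(n_out),                         # number of output variables
    }

print(parse_profile("2:4-6-3:1"))
print(parse_profile("s10 1:10-2-1:1"))
```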

Models for Data Mining. In the business environment, complex data mining projects may require the coordinated efforts of various experts, stakeholders, or departments throughout an entire organization. In the data mining literature, various "general frameworks" have been proposed to serve as blueprints for how to organize the process of gathering data, analyzing data, disseminating results, implementing results, and monitoring improvements.

One such model, CRISP (Cross-Industry Standard Process for data mining), was proposed in the mid-1990s by a European consortium of companies to serve as a non-proprietary standard process model for data mining. This general approach postulates the following (perhaps not particularly controversial) general sequence of steps for data mining projects: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Another approach - the Six Sigma methodology - is a well-structured, data-driven methodology for eliminating defects, waste, or quality control problems of all kinds in manufacturing, service delivery, management, and other business activities. This model has recently become very popular (due to its successful implementations) in various American industries, and it appears to gain favor worldwide. It postulates a sequence of so-called DMAIC steps (Define, Measure, Analyze, Improve, Control) that grew up from the manufacturing, quality improvement, and process control traditions and is particularly well suited to production environments (including "production of services," i.e., service industries).

Another framework of this kind (actually somewhat similar to Six Sigma) is the approach proposed by SAS Institute called SEMMA (Sample, Explore, Modify, Model, Assess), which focuses more on the technical activities typically involved in a data mining project.

All of these models are concerned with the process of how to integrate data mining methodology into an organization, how to "convert data into information," how to involve important stake-holders, and how to disseminate the information in a form that can easily be converted by stake-holders into resources for strategic decision making.

Some software tools for data mining are specifically designed and documented to fit into one of these specific frameworks.

The general underlying philosophy of StatSoft's *STATISTICA* Data Miner is to provide a flexible data mining workbench that can be integrated into any organization, industry, or organizational culture, regardless of the general data mining process-model that the organization chooses to adopt. For example, *STATISTICA* Data Miner can include the complete set of (specific) necessary tools for ongoing company-wide Six Sigma quality control efforts, and users can take advantage of its (still optional) DMAIC-centric user interface for industrial data mining tools. It can equally well be integrated into ongoing marketing research, CRM (Customer Relationship Management) projects, etc. that follow either the CRISP or SEMMA approach - it fits both of them perfectly well without favoring either one. Also, *STATISTICA* Data Miner offers all the advantages of a general data mining oriented "development kit" that includes easy-to-use tools for incorporating into your projects not only such components as custom database gateway solutions, prompted interactive queries, or proprietary algorithms, but also systems of access privileges, workgroup management, and other collaborative work tools that allow you to design large-scale, enterprise-wide systems (e.g., following the CRISP, SEMMA, or a combination of both models) that involve your entire organization. See also Data Mining Techniques.

Monte Carlo. A computer-intensive technique for assessing how a statistic will perform under repeated sampling. In Monte Carlo methods, the computer uses random number simulation techniques to mimic a statistical population. In the *STATISTICA* Monte Carlo procedure, the computer constructs the population according to the user's prescription and then proceeds as follows.

For each Monte Carlo replication, the computer:

- Simulates a random sample from the population,
- Analyzes the sample,
- Stores the results.

After many replications, the stored results will mimic the sampling distribution of the statistic. Monte Carlo techniques can provide information about sampling distributions when exact theory for the sampling distribution is not available.
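The replication loop above can be sketched directly (a hypothetical numpy example estimating the sampling distribution of the median, for which simple exact theory is not available):

```python
import numpy as np

rng = np.random.default_rng(4)

# Mimic the sampling distribution of the median for samples of size 25
# drawn from an exponential population
replications = 5000
medians = np.empty(replications)
for i in range(replications):
    sample = rng.exponential(scale=1.0, size=25)  # 1. simulate a random sample
    medians[i] = np.median(sample)                # 2. analyze it, 3. store the result

# The stored results approximate the statistic's sampling distribution
print(medians.mean(), medians.std())
```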

Multidimensional Scaling. *Multidimensional scaling* (*MDS*) can be considered to be an alternative to factor analysis (see Factor Analysis), and it is typically used as an exploratory method. In general, the goal of the analysis is to detect meaningful underlying dimensions that allow the researcher to explain observed similarities or dissimilarities (distances) between the investigated objects. In factor analysis, the similarities between objects (e.g., variables) are expressed in the correlation matrix. With *MDS,* we can analyze not only correlation matrices but also any kind of similarity or dissimilarity matrix (including sets of measures that are not internally consistent, e.g., do not follow the rule of transitivity). For more information, see the Multidimensional Scaling overview.

Multilayer Perceptrons. Feedforward neural networks having linear PSP functions and (usually) non-linear activation functions.

Multimodal Distribution. A distribution that has multiple modes (thus two or more "peaks").

Multimodality of the distribution in a sample is often a strong indication that the distribution of the variable in the population is not normal. Multimodality of the distribution may provide important information about the nature of the investigated variable (i.e., the measured quality). For example, if the variable represents a reported preference or attitude, then multimodality may indicate that there are several pronounced views or patterns of response in the questionnaire. Often, however, the multimodality may indicate that the sample is not homogeneous and the observations come in fact from two or more "overlapping" distributions. Sometimes, multimodality of the distribution may indicate problems with the measurement instrument (e.g., "gage calibration problems" in the natural sciences, or "response biases" in the social sciences). See also, unimodal distribution, bimodal distribution.

Multinomial Distribution. The multinomial distribution arises when a response variable is categorical in nature, i.e., consists of data describing the membership of the respective cases to a particular category. For example, if a researcher recorded the outcome for the driver in accidents as "uninjured", "injury not requiring hospitalization", "injury requiring hospitalization", or "fatality", then the distribution of the counts in these categories would be multinomial (see Agresti, 1996). The multinomial distribution is a generalization of the binomial distribution to more than two categories.

If the categories for the response variable can be ordered, then the distribution of that variable is referred to as *ordinal multinomial*. For example, if in a survey the responses to a question are recorded such that respondents have to choose from the pre-arranged categories "Strongly agree", "Agree", "Neither agree nor disagree", "Disagree", and "Strongly disagree", then the counts (number of respondents) that endorsed the different categories would follow an ordinal multinomial distribution (since the response categories are ordered with respect to increasing degrees of disagreement).

Specialized methods for analyzing multinomial and ordinal multinomial response variables can be found in *Generalized Linear Models*.

Multinomial Logit and Probit Regression. The multinomial logit and probit regression models are extensions of the standard logit and probit regression models to the case where the dependent variable has more than two categories (e.g., not just *Pass - Fail*, but *Pass*, *Fail*, *Withdrawn*), i.e., when the dependent or response variable of interest follows a multinomial rather than a binomial distribution. When multinomial responses contain rank-order information, they are also called *ordinal multinomial responses* (see ordinal multinomial distribution).

For additional details, see also the discussion of Link Functions, Probit Transformation and Regression, Logit Transformation and Regression, or *Generalized Linear Models*.

Multi-Pattern Bar. Multi-pattern bar plots may be used to represent individual data values of the *X* variable (the same type of data as in pie charts), however, consecutive data values of the *X* variable are represented by the heights of sequential vertical bars, each of a different color and pattern (rather than as pie wedges of different widths).

Multiple Axes in Graphs. An arrangement of axes (coordinate scales) in graphs, where two or more axes are placed parallel to each other, in order to either:

- represent different units in which the variable(s) depicted in the graph can be measured (e.g., the Celsius and Fahrenheit scales of temperature), or

- allow for a comparison of trends or shapes between several plots placed in one graph (e.g., one axis for each plot) which otherwise would be obscured by incompatible measurement units or ranges of values for each variable (that is an extension of the common "double-Y" type of graph).

The latter instance, which requires the appropriate plot legends to be attached to each axis, is illustrated in the graph above.

Multiple Dichotomies. One possible coding scheme that can be used when more than one response is possible from a given question is to code responses using *Multiple dichotomies*. For example, as part of a larger market survey, suppose you asked a sample of consumers to name their three favorite soft drinks. The specific item on the questionnaire may look like this:

**Write down your three favorite soft drinks:
1:__________ 2:__________ 3:__________**

Suppose in the above example we were only interested in *Coke*, *Pepsi*, and *Sprite*. One way to code the data in that case would be as follows:

| | COKE | PEPSI | SPRITE | . . . |
|---|---|---|---|---|
| case 1 | 1 | 1 | 1 | . . . |
| case 2 | | 1 | | . . . |
| case 3 | | | | . . . |
| . . . | | | | |

In other words, one variable was created for each soft drink, then a value of *1* was entered into the respective variable whenever the respective drink was mentioned by the respective respondent. Note that each variable represents a *dichotomy*; that is, only "*1*"s and "*not 1*"s are allowed (we could have entered *1*'s and *0*'s, but to save typing we can also simply leave the *0*'s as blanks or as missing values). When tabulating these variables, we would like to compute the number and percent of respondents (and responses) for each soft drink. In a sense, we "compact" the three variables *Coke*, *Pepsi*, and *Sprite* into a single variable (*Soft Drink*) consisting of *multiple dichotomies*. For more information on *Multiple dichotomies*, see the Multiple Response Tables section of Basic Statistics.
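Tabulating such a coding scheme is straightforward; a hypothetical numpy sketch (counts and percent of respondents per drink, with the 0/1 coding described above):

```python
import numpy as np

# Hypothetical multiple-dichotomy coding: one 0/1 column per soft drink
drinks = ["COKE", "PEPSI", "SPRITE"]
data = np.array([
    [1, 1, 1],   # case 1 named all three drinks of interest
    [0, 1, 0],   # case 2 named Pepsi only
    [0, 0, 0],   # case 3 named none of the three
])

counts = data.sum(axis=0)                       # mentions per drink
pct_of_respondents = 100 * counts / len(data)   # percent of respondents
for name, n, pct in zip(drinks, counts, pct_of_respondents):
    print(name, n, pct)
```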

Multiple Histogram. Multiple histograms present frequency distributions of more than one variable in one 2D graph. Unlike the Double-Y Histograms, the frequencies for all variables are plotted against the same left-*Y* axis.

Also, the values of all examined variables are plotted against a single *X-axis*, which facilitates comparisons between analyzed variables.

Multiple R. The coefficient of multiple correlation (Multiple R) is the positive square root of *R-square* (the coefficient of multiple determination, see Residual Variance and R-Square). This statistic is useful in multivariate regression (i.e., multiple independent variables) when you want to describe the relationship between the variables.

Multiple Regression. The general purpose of *multiple regression* (the term was first used by Pearson, 1908) is to analyze the relationship between several independent or predictor variables and a dependent or criterion variable.

The computational problem that needs to be solved in multiple regression analysis is to fit a straight line (or plane in an *n*-dimensional space, where *n* is the number of independent variables) to a number of points. In the simplest case - one dependent and one independent variable - we can visualize this in a scatterplot (scatterplots are two-dimensional plots of the scores on a pair of variables). It is used as either a hypothesis testing or exploratory method. For more information, see the Multiple Regression overview.

Multiple Response Variables. Coding the responses to *Multiple response variables* is necessary when more than one response is possible from a given question. For example, as part of a larger market survey, suppose you asked a sample of consumers to name their three favorite soft drinks. The specific item on the questionnaire may look like this:

**Write down your three favorite soft drinks:
1:__________ 2:__________ 3:__________**

Thus, the questionnaires returned to you will contain somewhere between 0 and 3 answers to this item. Also, a wide variety of soft drinks will most likely be named. One way to record the various responses would be to use three *multiple response variables* and a coding scheme for the many soft drinks. Then we could enter the respective codes (or alphanumeric labels) into the three variables, in the same way that respondents wrote them down in the questionnaire.

| | Resp. 1 | Resp. 2 | Resp. 3 |
|---|---|---|---|
| case 1 | COKE | PEPSI | JOLT |
| case 2 | SPRITE | SNAPPLE | DR. PEPPER |
| case 3 | PERRIER | GATORADE | MOUNTAIN DEW |
| . . . | . . . | . . . | . . . |

For more information, see the Multiple Response Tables section of Basic Statistics.

Multiple-Response Tables. *Multiple-response tables* are Crosstabulation tables used when the categories of interest are not mutually exclusive. Such tables can accommodate Multiple response variables as well as Multiple dichotomies.

For more information, see the Multiple Response Tables section of Basic Statistics.

Multiple Stream Group Charts. Variable and attribute control charts (see also Quality Control) can be computed for multiple-stream processes (e.g., operators, machines, assembly lines); the resulting *multiple stream group chart* summarizes the measurements for all streams simultaneously. These charts can also be produced for short production runs, and the measurements summarized in *short run group charts*. In addition to the standard parameters for determining the control limits and other characteristics of the control charts, the number of consecutive points *r* from the same process stream (i.e., "runs" of length *r*) to be highlighted in the chart can be specified.

Multiplicative Season, Damped Trend. In this Time Series model, the simple exponential smoothing forecasts are "enhanced" both by a damped trend component (independently smoothed with the single parameter φ; this model is an extension of Brown's one-parameter linear model, see Gardner, 1985, p. 12-13) and a multiplicative seasonal component (smoothed with parameter δ). For example, suppose we wanted to forecast from month to month the number of households that purchase a particular consumer electronics device (e.g., VCR). Every year, the number of households that purchase a VCR will increase, however, this trend will be damped (i.e., the upward trend will slowly disappear) over time as the market becomes saturated. In addition, there will be a seasonal component, reflecting the seasonal changes in consumer demand for VCR's from month to month (demand will likely be smaller in the summer and greater during the December holidays). This seasonal component may be multiplicative, for example, sales during the December holidays may increase by a factor of 1.4 (or 40%) over the average annual sales. To compute the smoothed values for the first season, initial values for the seasonal components are necessary. Also, to compute the smoothed value (forecast) for the first observation in the series, both estimates of *S_{0}* (initial level) and *T_{0}* (initial trend) are necessary. These values are computed as:

T_{0} = (1/φ)*(M_{k}-M_{1})/[(k-1)*p]

where

φ is the smoothing parameter for the trend component

k is the number of complete seasonal cycles

M_{k} is the mean for the last seasonal cycle

M_{1} is the mean for the first seasonal cycle

p is the length of the seasonal cycle

and S_{0} = M_{1}-p*T_{0}/2
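The initial values above can be computed directly from the seasonal cycle means; a hypothetical numpy sketch (the series, k, p, and the φ value are illustrative assumptions):

```python
import numpy as np

# Hypothetical k=3 complete seasonal cycles of monthly data (p=12)
rng = np.random.default_rng(5)
series = 100 + 2 * np.arange(36) + rng.normal(0, 5, size=36)

k, p = 3, 12
phi = 0.9  # assumed trend smoothing parameter

M = series.reshape(k, p).mean(axis=1)  # mean of each seasonal cycle
M1, Mk = M[0], M[-1]

# Initial trend and level, following the formulas above
T0 = (1.0 / phi) * (Mk - M1) / ((k - 1) * p)
S0 = M1 - p * T0 / 2
print(T0, S0)
```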

Multiplicative Season, Exponential Trend. In this Time Series model, the simple exponential smoothing forecasts are "enhanced" both by an exponential trend component (independently smoothed with parameter γ) and a multiplicative seasonal component (smoothed with parameter δ). For example, suppose we wanted to forecast the monthly revenue for a resort area. Every year, revenue may increase by a certain percentage or *factor*, resulting in an exponential trend in overall revenue. In addition, there could be a multiplicative seasonal component, that is, given the respective annual revenue, each year 20% of the revenue is produced during the month of December, that is, during Decembers the revenue grows by a particular (multiplicative) *factor*.

To compute the smoothed values for the first season, initial values for the seasonal components are necessary. Also, to compute the smoothed value (forecast) for the first observation in the series, both estimates of *S_{0}* (initial level) and *T_{0}* (initial trend) are necessary. By default, these values are computed as:

T_{0} = exp{[log(M_{2})-log(M_{1})]/p}

where

M_{2} is the mean for the second seasonal cycle

M_{1} is the mean for the first seasonal cycle

p is the length of the seasonal cycle

and S_{0} = exp{log(M_{1})-p*log(T_{0})/2}

Multiplicative Season, Linear Trend. In this Time Series model, the simple exponential smoothing forecasts are "enhanced" both by a linear trend component (independently smoothed with parameter γ) and a multiplicative seasonal component (smoothed with parameter δ). For example, suppose we were to predict the monthly budget for snow-removal in a community. There may be a trend component (as the community grows, there is an upward trend for the cost of snow removal from year to year). At the same time, there is obviously a seasonal component, reflecting the differential likelihood of snow during different months of the year. This seasonal component could be multiplicative, meaning that given a respective budget figure, it may increase by a *factor* of, for example, 1.4 during particular winter months; or it may be additive (see above), that is, a particular fixed additional amount of money is necessary during the winter months. To compute the smoothed values for the first season, initial values for the seasonal components are necessary. Also, to compute the smoothed value (forecast) for the first observation in the series, both estimates of *S_{0}* (initial level) and *T_{0}* (initial trend) are necessary. By default, these values are computed as:

T_{0} = (M_{k}-M_{1})/((k-1)*p)

where

k is the number of complete seasonal cycles

M_{k} is the mean for the last seasonal cycle

M_{1} is the mean for the first seasonal cycle

p is the length of the seasonal cycle

and S_{0} = M_{1} - T_{0}/2
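These linear-trend start values can likewise be sketched in a few lines (the function name and interface are illustrative):

```python
def initial_values_linear_trend(series, p):
    """Default start values for the multiplicative-season, linear-trend
    model: T0 as the average per-observation change between the first and
    last complete seasonal cycles, S0 as the first-cycle mean less half a
    trend step."""
    k = len(series) // p                      # number of complete seasonal cycles
    m1 = sum(series[:p]) / p                  # mean of the first cycle
    mk = sum(series[(k - 1) * p:k * p]) / p   # mean of the last cycle
    t0 = (mk - m1) / ((k - 1) * p)
    s0 = m1 - t0 / 2
    return s0, t0
```

For a series that rises by exactly one unit per observation, T_{0} = 1 regardless of the cycle length.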

Multiplicative Season, No Trend. This Time Series model is partially equivalent to the simple exponential smoothing model; however, in addition, each forecast is "enhanced" by a multiplicative seasonal component that is smoothed independently (see *The seasonal smoothing parameter δ (delta)* in Time Series Analysis). This model would, for example, be adequate when computing forecasts for monthly expected sales for a particular toy. The level of sales may be stable from year to year, or change only slowly; at the same time, there will be seasonal changes (e.g., greater sales during the December holidays), which again may change slowly from year to year. The seasonal changes may affect the sales in a multiplicative fashion, for example, depending on the respective overall level of sales, December sales may always be greater by a *factor* of 1.4.
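A minimal sketch of this no-trend model: the level is smoothed exponentially while the p seasonal factors are updated independently (here `alpha` and `delta` are illustrative names for the level and seasonal smoothing constants, and `s0`/`seasonal` are assumed start values):

```python
def smooth_multiplicative_season(series, p, alpha, delta, s0, seasonal):
    """Simple exponential smoothing of the level combined with an
    independently smoothed multiplicative seasonal component (no trend).
    Returns one-step-ahead forecasts plus the final level and factors."""
    level = s0
    factors = list(seasonal)          # p initial seasonal factors
    forecasts = []
    for t, x in enumerate(series):
        i = t % p
        forecasts.append(level * factors[i])              # forecast = level * factor
        level = alpha * (x / factors[i]) + (1 - alpha) * level
        factors[i] = delta * (x / level) + (1 - delta) * factors[i]
    return forecasts, level, factors
```

With delta = 0 the seasonal factors stay fixed and the level update reduces to ordinary simple exponential smoothing of the deseasonalized series.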

Multivariate Adaptive Regression Splines (MARSplines). Multivariate adaptive regression splines (or MARSplines for short) is a nonparametric regression procedure which makes no assumption about the underlying functional relationship between the dependent and independent variables. Instead MARSplines constructs this relation from a set of coefficients and basis functions that are entirely "driven" from the regression data. The MARSplines technique has become particularly popular in the area of data mining, because it does not assume or impose any particular type or class of relationship (e.g., linear, logistic, and so on) between the predictor variables and the dependent (outcome) variable of interest.

The general MARSplines model equation (see Hastie et al., 2001, equation 9.19) is given as:

y = f(X) = β_{0} + Σ_{m=1..M} β_{m}h_{m}(X)

where the summation is over the M basis functions (terms) in the model. To summarize, y is predicted as a function of the predictor variables X (and their interactions); this function consists of an intercept parameter (β_{0}) and the weighted (by β_{m}) sum of one or more basis functions h_{m}(X). You may also think of this model as "selecting" a weighted sum of basis functions from the set of (a large number of) basis functions that span all values of each predictor (i.e., that set would consist of one basis function, and one "knot" parameter t, for each distinct value of each predictor variable). The MARSplines algorithm then searches over the space of all inputs and predictor values (knot locations t), as well as interactions between variables. During this search, an increasingly larger number of basis functions is added to the model (selected from the set of possible basis functions), to maximize an overall least squares goodness-of-fit criterion.

For more information about this technique, and how it compares to other methods for nonlinear regression (or regression trees), see Hastie, Tibshirani, and Friedman (2001).
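The basis functions at the heart of MARSplines are hinge functions of the form max(0, ±(x - t)) with knot t. A minimal sketch of evaluating a fitted single-predictor model as a weighted sum of such hinges (function names and the `(coefficient, knot, direction)` layout are illustrative):

```python
def hinge(x, t, direction=1):
    """A MARSplines-style basis function: max(0, +/-(x - t)) with knot t.
    direction = +1 activates for x above the knot, -1 for x below it."""
    return max(0.0, direction * (x - t))

def mars_predict(x, intercept, terms):
    """Evaluate f(x) = b0 + sum_m bm * h_m(x) for one predictor, where
    `terms` is a list of (coefficient, knot, direction) triples."""
    return intercept + sum(b * hinge(x, t, d) for b, t, d in terms)
```

Products of hinges in different predictors would give the interaction terms mentioned above; the fitting search over knot locations is omitted in this sketch.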

Multivariate Statistical Process Control (MSPC). *Multivariate statistical process control* is a methodology for simultaneously monitoring multiple inputs or variables describing a process, for the purpose of ensuring that the overall process is in control. It is an extension of simple univariate (one variable at a time) quality control.

Modern automated production processes typically measure large numbers of variables that describe the process at each stage and across multiple stages. Standard quality control charting techniques (e.g., Shewhart charts, X-bar and R charts, etc.) are applicable only to single variables. Therefore, when applied to modern production processes with hundreds of important variables that need to be monitored, the criteria typically applied to univariate charts will lead to a large number of false alarms, and in many cases to nearly constant, perpetual alarms. Furthermore, this approach will ignore the inherent correlations between variables and, thus, lose important information (e.g., consider a single measure of temperature collected by one sensor drifting out of control while 50 others stay within control, vs. a scenario where all 50 temperature readings begin to slowly drift upward; intuitively, the latter condition would be the more "significant" event).

To rectify these shortcomings, methods have been developed to simultaneously monitor multiple variables, using multivariate statistical procedures such as Principal Component Analysis (PCA) and Partial Least Squares (PLS) methods. In short, these techniques enable you to identify a) when multiple correlated variables start to drift out of control, and b) when the fundamental relationships between variables change (so that the correlations between variables observed when the process was known to be in control are no longer applicable and valid).

A special application of MSPC is commonly found in process monitoring and quality control for industrial batch processing. Batch processes are those where goods are manufactured in "chunks" or batches, such as beer, pharmaceuticals, chemicals, polymers, paint, fertilizers, cement, petroleum products, biochemicals, perfumes, or semiconductors. In those applications, one can define in-control ("good") batches; those batches can be characterized by particular maturing effects, as various measures systematically change over time (e.g., as the alcohol ferments). By building multivariate models (e.g., via *PLS*) describing the relationship of the various variables of interest to time (i.e., to the maturing process) for those good batches, quality control schemes can be derived to detect when a batch deviates from this known "good" multivariate pattern. For details regarding these procedures, see also Nomikos and MacGregor (1995).
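A minimal sketch of the PCA-based monitoring idea described above: fit principal components on in-control reference data, then compute a Hotelling T² statistic for new observations in the reduced space, so that correlated drifts across many variables show up as a single large score (the function and parameter names are illustrative, not from a particular library):

```python
import numpy as np

def hotelling_t2_scores(X_ref, X_new, n_components=2):
    """PCA-based multivariate monitoring sketch: project new observations
    onto the principal axes of the in-control reference data and return
    the Hotelling T^2 statistic for each new observation."""
    mu = X_ref.mean(axis=0)
    Xc = X_ref - mu
    # principal axes from the SVD of the centered reference data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                           # loadings, one column per PC
    var = (s[:n_components] ** 2) / (len(X_ref) - 1)  # score variances
    scores = (X_new - mu) @ P
    return np.sum(scores ** 2 / var, axis=1)          # T^2 per new observation
```

In practice a control limit for T² (e.g., from an F distribution) would flag out-of-control points; a residual (SPE/Q) statistic is typically monitored alongside it to catch observations that break the in-control correlation structure itself.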