### Glossary Index

###### 2

- 2D Bar/Column Plots
- 2D Box Plots
- 2D Box Plots - Box Whiskers
- 2D Box Plots - Boxes
- 2D Box Plots - Columns
- 2D Box Plots - Error Bars
- 2D Box Plots - Whiskers
- 2D Categorized Detrended Probability Plots
- 2D Categorized Half-Norm. Probability Plots
- 2D Categorized Normal Probability Plots
- 2D Detrended Probability Plots
- 2D Histograms
- 2D Histograms - Hanging Bars
- 2D Histograms - Double-Y
- 2D Line Plots
- 2D Line Plots - Aggregated
- 2D Line Plots - Double-Y
- 2D Line Plots - Multiple
- 2D Line Plots - Regular
- 2D Line Plots - XY Trace
- 2D Range Plots - Error Bars
- 2D Matrix Plots
- 2D Matrix Plots - Columns
- 2D Matrix Plots - Lines
- 2D Matrix Plots - Scatterplot
- 2D Normal Probability Plots
- 2D Probability-Probability Plots
- 2D Probability-Probability Plots - Categorized
- 2D Quantile-Quantile Plots
- 2D Quantile-Quantile Plots - Categorized
- 2D Scatterplot
- 2D Scatterplot - Categorized Ternary Graph
- 2D Scatterplot - Double-Y
- 2D Scatterplot - Frequency
- 2D Scatterplot - Multiple
- 2D Scatterplot - Regular
- 2D Scatterplot - Voronoi
- 2D Sequential/Stacked Plots
- 2D Sequential/Stacked Plots - Area
- 2D Sequential/Stacked Plots - Column
- 2D Sequential/Stacked Plots - Lines
- 2D Sequential/Stacked Plots - Mixed Line
- 2D Sequential/Stacked Plots - Mixed Step
- 2D Sequential/Stacked Plots - Step
- 2D Sequential/Stacked Plots - Step Area
- 2D Ternary Plots - Scatterplot

###### 3

- 3D Bivariate Histogram
- 3D Box Plots
- 3D Box Plots - Border-style Ranges
- 3D Box Plots - Double Ribbon Ranges
- 3D Box Plots - Error Bars
- 3D Box Plots - Flying Blocks
- 3D Box Plots - Flying Boxes
- 3D Box Plots - Points
- 3D Categorized Plots - Contour Plot
- 3D Categorized Plots - Deviation Plot
- 3D Categorized Plots - Scatterplot
- 3D Categorized Plots - Space Plot
- 3D Categorized Plots - Spectral Plot
- 3D Categorized Plots - Surface Plot
- 3D Deviation Plots
- 3D Range Plot - Error Bars
- 3D Raw Data Plots - Contour/Discrete
- 3D Scatterplots
- 3D Scatterplots - Ternary Graph
- 3D Space Plots
- 3D Ternary Plots
- 3D Ternary Plots - Categorized Scatterplot
- 3D Ternary Plots - Categorized Space
- 3D Ternary Plots - Categorized Surface
- 3D Ternary Plots - Categorized Trace
- 3D Ternary Plots - Contour/Areas
- 3D Ternary Plots - Contour/Lines
- 3D Ternary Plots - Deviation
- 3D Ternary Plots - Space
- 3D Trace Plots

###### A

- Aberration, Minimum
- Abrupt Permanent Impact
- Abrupt Temporary Impact
- Accept-Support Testing
- Accept Threshold
- Activation Function (in Neural Networks)
- Additive Models
- Additive Season, Damped Trend
- Additive Season, Exponential Trend
- Additive Season, Linear Trend
- Additive Season, No Trend
- Adjusted means
- Aggregation
- AID
- Akaike Information Criterion (AIC)
- Algorithm
- Alpha
- Anderson-Darling Test
- ANOVA
- Append a Network
- Append Cases and/or Variables
- Application Programming Interface (API)
- Arrow
- Assignable Causes and Actions
- Association Rules
- Asymmetrical Distribution
- AT&T Runs Rules
- Attribute (attribute variable)
- Augmented Product Moment Matrix
- Autoassociative Network
- Automatic Network Designer

###### B

- B Coefficients
- Back Propagation
- Bagging (Voting, Averaging)
- Balanced ANOVA Design
- Banner Tables
- Bar/Column Plots, 2D
- Bar Dev Plot
- Bar Left Y Plot
- Bar Right Y Plot
- Bar Top Plot
- Bar X Plot
- Bartlett Window
- Basis Functions
- Batch algorithms in *STATISTICA Neural Net*
- Bayesian Information Criterion (BIC)
- Bayesian Networks
- Bayesian Statistics
- Bernoulli Distribution
- Best Network Retention
- Best Subset Regression
- Beta Coefficients
- Beta Distribution
- Bimodal Distribution
- Binomial Distribution
- Bivariate Normal Distribution
- Blocking
- Bonferroni Adjustment
- Bonferroni Test
- Boosting
- Boundary Case
- Box Plot/Medians (Block Stats Graphs)
- Box Plot/Means (Block Stats Graphs)
- Box Plots, 2D
- Box Plots, 2D - Box Whiskers
- Box Plots, 2D - Boxes
- Box Plots, 2D - Whiskers
- Box Plots, 3D
- Box Plots, 3D - Border-Style Ranges
- Box Plots, 3D - Double Ribbon Ranges
- Box Plots, 3D - Error Bars
- Box Plots, 3D - Flying Blocks
- Box Plots, 3D - Flying Boxes
- Box Plots, 3D - Points
- Box-Ljung Q Statistic
- Breakdowns
- Breaking Down (Categorizing)
- Brown-Forsythe Homogeneity of Variances
- Brushing
- Burt Table

###### C

- Canonical Correlation
- Cartesian Coordinates
- Casewise Missing Data Deletion
- Categorical Dependent Variable
- Categorical Predictor
- Categorized Graphs
- Categorized Plots, 2D - Detrended Prob. Plots
- Categorized Plots, 2D - Half-Normal Prob. Plots
- Categorized Plots, 2D - Normal Prob. Plots
- Categorized Plots, 2D - Prob.-Prob. Plots
- Categorized Plots, 2D - Quantile Plots
- Categorized Plots, 3D - Contour Plot
- Categorized Plots, 3D - Deviation Plot
- Categorized Plots, 3D - Scatterplot
- Categorized Plots, 3D - Space Plot
- Categorized Plots, 3D - Spectral Plot
- Categorized Plots, 3D - Surface Plot
- Categorized 3D Scatterplot (Ternary graph)
- Categorized Contour/Areas (Ternary graph)
- Categorized Contour/Lines (Ternary graph)
- Categorizing
- Cauchy Distribution
- Cause-and-Effect Diagram
- Censoring (Censored Observations)
- Censoring, Left
- Censoring, Multiple
- Censoring, Right
- Censoring, Single
- Censoring, Type I
- Censoring, Type II
- CHAID
- Characteristic Life
- Chernoff Faces (Icon Plots)
- *Chi*-square Distribution
- Circumplex
- City-Block (Manhattan) Distance
- Classification
- Classification (in Neural Networks)
- Classification and Regression Trees
- Classification by Labeled Exemplars (in NN)
- Classification Statistics (in Neural Networks)
- Classification Thresholds (in Neural Networks)
- Classification Trees
- Class Labeling (in Neural Networks)
- Cluster Analysis
- Cluster Diagram (in Neural Networks)
- Cluster Networks (in Neural Networks)
- Coarse Coding
- Codes
- Coding Variable
- Coefficient of Determination
- Coefficient of Variation
- Column Sequential/Stacked Plot
- Columns (Box Plot)
- Columns (Icon Plot)
- Common Causes
- Communality
- Complex Numbers
- Conditional Probability
- Conditioning (Categorizing)
- Confidence Interval
- Confidence Interval for the Mean
- Confidence Interval vs. Prediction Interval
- Confidence Limits
- Confidence Value (Association Rules)
- Confusion Matrix (in Neural Networks)
- Conjugate Gradient Descent (in Neural Net)
- Continuous Dependent Variable
- Contour/Discrete Raw Data Plot
- Contour Plot
- Control, Quality
- Cook's Distance
- Correlation
- Correlation, Intraclass
- Correlation (Pearson r)
- Correlation Value (Association Rules)
- Correspondence Analysis
- Cox-Snell Gen. Coefficient Determination
- Cpk, Cp, Cr
- CRISP
- Cross Entropy (in Neural Networks)
- Cross Verification (in Neural Networks)
- Cross-Validation
- Crossed Factors
- Crosstabulations
- C-SVM Classification
- Cubic Spline Smoother
- "Curse" of Dimensionality

###### D

- Daniell (or Equal Weight) Window
- Data Mining
- Data Preparation Phase
- Data Reduction
- Data Rotation (in 3D space)
- Data Warehousing
- Decision Trees
- Degrees of Freedom
- Deleted Residual
- Denominator Synthesis
- Dependent t-test
- Dependent vs. Independent Variables
- Deployment
- Derivative-Free Funct. Min. Algorithms
- Design, Experimental
- Design Matrix
- Desirability Profiles
- Detrended Probability Plots
- Deviance
- Deviance Residuals
- Deviation
- Deviation Assign. Algorithms (in Neural Net)
- Deviation Plot (Ternary Graph)
- Deviation Plots, 3D
- DFFITS
- DIEHARD Suite of Tests & Randm. Num. Gen.
- Differencing (in Time Series)
- Dimensionality Reduction
- Discrepancy Function
- Discriminant Function Analysis
- Distribution Function
- DOE
- Document Frequency
- Double-Y Histograms
- Double-Y Line Plots
- Double-Y Scatterplot
- Drill-Down Analysis
- Drilling-down (Categorizing)
- Duncan's test
- Dunnett's test
- DV

###### E

- Effective Hypothesis Decomposition
- Efficient Score Statistic
- Eigenvalues
- Ellipse, Prediction Area and Range
- EM Clustering
- Endogenous Variable
- Ensembles (in Neural Networks)
- Enterprise Resource Planning (ERP)
- Enterprise SPC
- Enterprise-Wide Software Systems
- Entropy
- Epoch in (Neural Networks)
- Eps
- EPSEM Samples
- ERP
- Error Bars (2D Box Plots)
- Error Bars (2D Range Plots)
- Error Bars (3D Box Plots)
- Error Bars (3D Range Plots)
- Error Function (in Neural Networks)
- Estimable Functions
- Euclidean Distance
- Euler's e
- Exogenous Variable
- Experimental Design
- Explained Variance
- Exploratory Data Analysis
- Exponential Distribution
- Exponential Family of Distributions
- Exponential Function
- Exponentially Weighted Moving Avg. Line
- Extrapolation
- Extreme Values (in Box Plots)
- Extreme Value Distribution

###### F

- F Distribution
- FACT
- Factor Analysis
- Fast Analysis Shared Multidimensional Info. FASMI
- Feature Extraction (vs. Feature Selection)
- Feature Selection
- Feedforward Networks
- Fisher LSD
- Fixed Effects (in ANOVA)
- Free Parameter
- Frequencies, Marginal
- Frequency Scatterplot
- Frequency Tables
- Function Minimization Algorithms

###### G

- g2 Inverse
- Gains Chart
- Gamma Coefficient
- Gamma Distribution
- Gaussian Distribution
- Gauss-Newton Method
- General ANOVA/MANOVA
- General Linear Model
- Generalization (in Neural Networks)
- Generalized Additive Models
- Generalized Inverse
- Generalized Linear Model
- Genetic Algorithm
- Genetic Algorithm Input Selection
- Geometric Distribution
- Geometric Mean
- Gibbs Sampler
- Gini Measure of Node Impurity
- Gompertz Distribution
- Goodness of Fit
- Gradient
- Gradient Descent
- Gradual Permanent Impact
- Group Charts
- Grouping (Categorizing)
- Grouping Variable
- Groupware

###### H

- Half-Normal Probability Plots
- Half-Normal Probability Plots - Categorized
- Hamming Window
- Hanging Bars Histogram
- Harmonic Mean
- Hazard
- Hazard Rate
- Heuristic
- Heywood Case
- Hidden Layers (in Neural Networks)
- High-Low Close
- Histograms, 2D
- Histograms, 2D - Double-Y
- Histograms, 2D - Hanging Bars
- Histograms, 2D - Multiple
- Histograms, 2D - Regular
- Histograms, 3D Bivariate
- Histograms, 3D - Box Plots
- Histograms, 3D - Contour/Discrete
- Histograms, 3D - Contour Plot
- Histograms, 3D - Spikes
- Histograms, 3D - Surface Plot
- Hollander-Proschan Test
- Hooke-Jeeves Pattern Moves
- Hosmer-Lemeshow Test
- HTM
- HTML
- Hyperbolic Tangent (tanh)
- Hyperplane
- Hypersphere

###### I

- Icon Plots
- Icon Plots - Chernoff Faces
- Icon Plots - Columns
- Icon Plots - Lines
- Icon Plots - Pies
- Icon Plots - Polygons
- Icon Plots - Profiles
- Icon Plots - Stars
- Icon Plots - Sun Rays
- Increment vs Non-Increment Learning Algr.
- Independent Events
- Independent t-test
- Independent vs. Dependent Variables
- Industrial Experimental Design
- Inertia
- Inlier
- In-Place Database Processing (IDP)
- Interactions
- Interpolation
- Interval Scale
- Intraclass Correlation Coefficient
- Invariance Const. Scale Factor ICSF
- Invariance Under Change of Scale (ICS)
- Inverse Document Frequency
- Ishikawa Chart
- Isotropic Deviation Assignment
- Item and Reliability Analysis
- IV

###### J

###### K

###### L

- Lack of Fit
- Lambda Prime
- Laplace Distribution
- Latent Semantic Indexing
- Latent Variable
- Layered Compression
- Learned Vector Quantization (in Neural Net)
- Learning Rate (in Neural Networks)
- Least Squares (2D graphs)
- Least Squares (3D graphs)
- Least Squares Estimator
- Least Squares Means
- Left and Right Censoring
- Levenberg-Marquardt Algorithm (in Neural Net)
- Levene's Test for Homogeneity of Variances
- Leverage values
- Life Table
- Life, Characteristic
- Lift Charts
- Likelihood
- Lilliefors test
- Line Plots, 2D
- Line Plots, 2D - Aggregated
- Line Plots, 2D (Case Profiles)
- Line Plots, 2D - Double-Y
- Line Plots, 2D - Multiple
- Line Plots, 2D - Regular
- Line Plots, 2D - XY Trace
- Linear (2D graphs)
- Linear (3D graphs)
- Linear Activation function
- Linear Modeling
- Linear Units
- Lines (Icon Plot)
- Lines (Matrix Plot)
- Lines Sequential/Stacked Plot
- Link Function
- Local Minima
- Locally Weighted (Robust) Regression
- Logarithmic Function
- Logistic Distribution
- Logistic Function
- Logit Regression and Transformation
- Log-Linear Analysis
- Log-Normal Distribution
- Lookahead (in Neural Networks)
- Loss Function
- LOWESS Smoothing

###### M

- Machine Learning
- Mahalanobis Distance
- Mallow's CP
- Manifest Variable
- Mann-Scheuer-Fertig Test
- MANOVA
- Marginal Frequencies
- Markov Chain Monte Carlo (MCMC)
- Mass
- Matching Moments Method
- Matrix Collinearity
- Matrix Ill-Conditioning
- Matrix Inverse
- Matrix Plots
- Matrix Plots - Columns
- Matrix Plots - Lines
- Matrix Plots - Scatterplot
- Matrix Rank
- Matrix Singularity
- Maximum Likelihood Loss Function
- Maximum Likelihood Method
- Maximum Unconfounding
- MD (Missing data)
- Mean
- Mean/S.D. Algorithm (in Neural Networks)
- Mean, Geometric
- Mean, Harmonic
- Mean Substitution of Missing Data
- Means, Adjusted
- Means, Unweighted
- Median
- Meta-Learning
- Method of Matching Moments
- Minimax
- Minimum Aberration
- Mining, Data
- Missing values
- Mixed Line Sequential/Stacked Plot
- Mixed Step Sequential/Stacked Plot
- Mode
- Model Profiles (in Neural Networks)
- Models for Data Mining
- Monte Carlo
- Multi-Pattern Bar
- Multicollinearity
- Multidimensional Scaling
- Multilayer Perceptrons
- Multimodal Distribution
- Multinomial Distribution
- Multinomial Logit and Probit Regression
- Multiple Axes in Graphs
- Multiple Censoring
- Multiple Dichotomies
- Multiple Histogram
- Multiple Line Plots
- Multiple Scatterplot
- Multiple R
- Multiple Regression
- Multiple Response Variables
- Multiple-Response Tables
- Multiple Stream Group Charts
- Multiplicative Season, Damped Trend
- Multiplicative Season, Exponential Trend
- Multiplicative Season, Linear Trend
- Multiplicative Season, No Trend
- Multivar. Adapt. Regres. Splines MARSplines
- Multi-way Tables

###### N

- Nagelkerke Gen. Coefficient Determination
- Naive Bayes
- Neat Scaling of Intervals
- Negative Correlation
- Negative Exponential (2D graphs)
- Negative Exponential (3D graphs)
- Neighborhood (in Neural Networks)
- Nested Factors
- Nested Sequence of Models
- Neural Networks
- Neuron
- Newman-Keuls Test
- N-in-One Encoding
- Noise Addition (in Neural Networks)
- Nominal Scale
- Nominal Variables
- Nonlinear Estimation
- Nonparametrics
- Non-Outlier Range
- Nonseasonal, Damped Trend
- Nonseasonal, Exponential Trend
- Nonseasonal, Linear Trend
- Nonseasonal, No Trend
- Normal Distribution
- Normal Distribution, Bivariate
- Normal Fit
- Normality Tests
- Normalization
- Normal Probability Plots
- Normal Probability Plots (Computation Note)
- n Point Moving Average Line

###### O

- ODBC
- Odds Ratio
- OLE DB
- On-Line Analytic Processing (OLAP)
- One-Off (in Neural Networks)
- One-of-N Encoding (in Neural Networks)
- One-Sample t-Test
- One-Sided Ranges Error Bars Range Plots
- One-Way Tables
- Operating Characteristic Curves
- Ordinal Multinomial Distribution
- Ordinal Scale
- Outer Arrays
- Outliers
- Outliers (in Box Plots)
- Overdispersion
- Overfitting
- Overlearning (in Neural Networks)
- Overparameterized Model

###### P

- Pairwise Del. Missing Data vs Mean Subst.
- Pairwise MD Deletion
- Parametric Curve
- Pareto Chart Analysis
- Pareto Distribution
- Part Correlation
- Partial Correlation
- Partial Least Squares Regression
- Partial Residuals
- Parzen Window
- Pearson Correlation
- Pearson Curves
- Pearson Residuals
- Penalty Functions
- Percentiles
- Perceptrons (in Neural Networks)
- Pie Chart
- Pie Chart - Counts
- Pie Chart - Multi-Pattern Bar
- Pie Chart - Values
- Pies (Icon Plots)
- PMML (Predictive Model Markup Language)
- PNG Files
- Poisson Distribution
- Polar Coordinates
- Polygons (Icon Plots)
- Polynomial
- Population Stability Report
- Portable Network Graphics Files
- Positive Correlation
- Post hoc Comparisons
- Post Synaptic Potential (PSP) Function
- Posterior Probability
- Power (Statistical)
- Power Goal
- Ppk, Pp, Pr
- Prediction Interval Ellipse
- Prediction Profiles
- Predictive Data Mining
- Predictive Mapping
- Predictive Model Markup Language (PMML)
- Predictors
- PRESS Statistic
- Principal Components Analysis
- Prior Probabilities
- Probability
- Probability Plots - Detrended
- Probability Plots - Normal
- Probability Plots - Half-Normal
- Probability-Probability Plots
- Probability-Probability Plots - Categorized
- Probability Sampling
- Probit Regression and Transformation
- PROCEED
- Process Analysis
- Process Capability Indices
- Process Performance Indices
- Profiles, Desirability
- Profiles, Prediction
- Profiles (Icon Plots)
- Pruning (in Classification Trees)
- Pseudo-Components
- Pseudo-Inverse Algorithm
- Pseudo-Inverse-Singular Val. Decomp. NN
- PSP (Post Synaptic Potential) Function
- Pure Error
- p-Value (Statistical Significance)

###### Q

###### R

- R Programming Language
- Radial Basis Functions
- Radial Sampling (in Neural Networks)
- Random Effects (in Mixed Model ANOVA)
- Random Forests
- Random Num. from Arbitrary Distributions
- Random Numbers (Uniform)
- Random Sub-Sampling in Data Mining
- Range Ellipse
- Range Plots - Boxes
- Range Plots - Columns
- Range Plots - Whiskers
- Rank
- Rank Correlation
- Ratio Scale
- Raw Data, 3D Scatterplot
- Raw Data Plots, 3D - Contour/Discrete
- Raw Data Plots, 3D - Spikes
- Raw Data Plots, 3D - Surface Plot
- Rayleigh Distribution
- Receiver Oper. Characteristic Curve
- Receiver Oper. Characteristic (in Neural Net)
- Rectangular Distribution
- Regression
- Regression (in Neural Networks)
- Regression, Multiple
- Regression Summary Statistics (in Neural Net)
- Regular Histogram
- Regular Line Plots
- Regular Scatterplot
- Regularization (in Neural Networks)
- Reject Inference
- Reject Threshold
- Relative Function Change Criterion
- Reliability
- Reliability and Item Analysis
- Representative Sample
- Resampling (in Neural Networks)
- Residual
- Resolution
- Response Surface
- Right Censoring
- RMS (Root Mean Squared) Error
- Robust Locally Weighted Regression
- ROC Curve
- ROC Curve (in Neural Networks)
- Root Cause Analysis
- Root Mean Square Stand. Effect RMSSE
- Rosenbrock Pattern Search
- Rotating Coordinates, Method of
- r (Pearson Correlation Coefficient)
- Runs Tests (in Quality Control)

###### S

- Sampling Fraction
- Scalable Software Systems
- Scaling
- Scatterplot, 2D
- Scatterplot, 2D - Categorized Ternary Graph
- Scatterplot, 2D - Double-Y
- Scatterplot, 2D - Frequency
- Scatterplot, 2D - Multiple
- Scatterplot, 2D - Regular
- Scatterplot, 2D - Voronoi
- Scatterplot, 3D
- Scatterplot, 3D - Raw Data
- Scatterplot, 3D - Ternary Graph
- Scatterplot Smoothers
- Scheffe's Test
- Score Statistic
- Scree Plot, Scree Test
- S.D. Ratio
- Semi-Partial Correlation
- SEMMA
- Sensitivity Analysis (in Neural Networks)
- Sequential Contour Plot, 3D
- Sequential/Stacked Plots, 2D
- Sequential/Stacked Plots, 2D - Area
- Sequential/Stacked Plots, 2D - Column
- Sequential/Stacked Plots, 2D - Lines
- Sequential/Stacked Plots, 2D - Mixed Line
- Sequential/Stacked Plots, 2D - Mixed Step
- Sequential/Stacked Plots, 2D - Step
- Sequential/Stacked Plots, 2D - Step Area
- Sequential Surface Plot, 3D
- Sets of Samples in Quality Control Charts
- Shapiro-Wilks' W test
- Shewhart Control Charts
- Short Run Control Charts
- Shuffle, Back Propagation (in Neural Net)
- Shuffle Data (in Neural Networks)
- Sigma Restricted Model
- Sigmoid Function
- Signal Detection Theory
- Simple Random Sampling (SRS)
- Simplex Algorithm
- Single and Multiple Censoring
- Singular Value Decomposition
- Six Sigma (DMAIC)
- Six Sigma Process
- Skewness
- Slicing (Categorizing)
- Smoothing
- SOFMs Self-Organizing Maps Kohonen Net
- Softmax
- Space Plots 3D
- SPC
- Spearman R
- Special Causes
- Spectral Plot
- Spikes (3D graphs)
- Spinning Data (in 3D space)
- Spline (2D graphs)
- Spline (3D graphs)
- Split Selection (for Classification Trees)
- Splitting (Categorizing)
- Spurious Correlations
- SQL
- Square Root of the Signal to Noise Ratio (f)
- Stacked Generalization
- Stacking (Stacked Generalization)
- Standard Deviation
- Standard Error
- Standard Error of the Mean
- Standard Error of the Proportion
- Standardization
- Standardized DFFITS
- Standardized Effect (Es)
- Standard Residual Value
- Stars (Icon Plots)
- Stationary Series (in Time Series)
- STATISTICA Advanced Linear/Nonlinear
- STATISTICA Automated Neural Networks
- STATISTICA Base
- STATISTICA Data Miner
- STATISTICA Data Warehouse
- STATISTICA Document Management System
- STATISTICA Enterprise
- STATISTICA Enterprise/QC
- STATISTICA Enterprise Server
- STATISTICA Enterprise SPC
- STATISTICA Monitoring and Alerting Server
- STATISTICA MultiStream
- STATISTICA Multivariate Stat. Process Ctrl
- STATISTICA PI Connector
- STATISTICA PowerSolutions
- STATISTICA Process Optimization
- STATISTICA Quality Control Charts
- STATISTICA Sequence Assoc. Link Analysis
- STATISTICA Text Miner
- STATISTICA Variance Estimation Precision
- Statistical Power
- Statistical Process Control (SPC)
- Statistical Significance (p-value)
- Steepest Descent Iterations
- Stemming
- Steps
- Stepwise Regression
- Stiffness Parameter (in Fitting Options)
- Stopping Conditions
- Stopping Conditions (in Neural Networks)
- Stopping Rule (in Classification Trees)
- Stratified Random Sampling
- Stub and Banner Tables
- Studentized Deleted Residuals
- Studentized Residuals
- Student's t Distribution
- Sum-Squared Error Function
- Sums of Squares (Type I, II, III (IV, V, VI))
- Sun Rays (Icon Plots)
- Supervised Learning (in Neural Networks)
- Support Value (Association Rules)
- Support Vector
- Support Vector Machine (SVM)
- Suppressor Variable
- Surface Plot (from Raw Data)
- Survival Analysis
- Survivorship Function
- Sweeping
- Symmetrical Distribution
- Symmetric Matrix
- Synaptic Functions (in Neural Networks)

###### T

- Tables
- Tapering
- t Distribution (Student's)
- Tau, Kendall
- Ternary Plots, 2D - Scatterplot
- Ternary Plots, 3D
- Ternary Plots, 3D - Categorized Scatterplot
- Ternary Plots, 3D - Categorized Space
- Ternary Plots, 3D - Categorized Surface
- Ternary Plots, 3D - Categorized Trace
- Ternary Plots, 3D - Contour/Areas
- Ternary Plots, 3D - Contour/Lines
- Ternary Plots, 3D - Deviation
- Ternary Plots, 3D - Space
- Text Mining
- THAID
- Threshold
- Time Series
- Time Series (in Neural Networks)
- Time-Dependent Covariates
- Tolerance (in Multiple Regression)
- Topological Map
- Trace Plots, 3D
- Trace Plot, Categorized (Ternary Graph)
- Training/Test Error/Classification Accuracy
- Transformation (Probit Regression)
- Trellis Graphs
- Trimmed Means
- t-Test (independent & dependent samples)
- Tukey HSD
- Tukey Window
- Two-State (in Neural Networks)
- Type I, II, III (IV, V, VI) Sums of Squares
- Type I Censoring
- Type II Censoring
- Type I Error Rate

###### U

###### V

###### W

###### X

###### Y

###### Z

g2 Inverse. A *g2 inverse* is a generalized inverse of a rectangular matrix of values *A* that satisfies both:

AA^{-}A = A and A^{-}AA^{-} = A^{-}
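The two defining conditions can be checked numerically. The sketch below uses NumPy's Moore-Penrose pseudo-inverse, which satisfies both g2 conditions (plus two more); the matrix is an invented example.

```python
import numpy as np

# A small numerical check of the two defining conditions, using the
# Moore-Penrose pseudo-inverse (which is in particular a g2 inverse).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
G = np.linalg.pinv(A)              # a g2 inverse of the 3x2 matrix A

assert np.allclose(A @ G @ A, A)   # AA^{-}A = A
assert np.allclose(G @ A @ G, G)   # A^{-}AA^{-} = A^{-}
print(G.shape)                     # (2, 3): inverse of a 3x2 matrix is 2x3
```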

Gains Chart. The gains chart provides a visual summary of the usefulness of the information provided by one or more statistical models for predicting a binomial (categorical) outcome variable (dependent variable); for multinomial (multiple-category) outcome variables, gains charts can be computed for each category. Specifically, the chart summarizes the utility that one can expect by using the respective predictive models, as compared to using baseline information only.

The gains chart is applicable to most statistical methods that compute predictions (predicted classifications) for binomial or multinomial responses. This and similar summary charts (see Lift Chart) are commonly used in data mining projects when the dependent or outcome variable of interest is binomial or multinomial in nature.

Example. To illustrate how the gains chart is constructed, consider this example. Suppose you have a mailing list of previous customers of your business, and you want to offer to those customers an additional service by mailing an elaborate brochure and other materials describing the service. During previous similar mail-out campaigns, you collected useful information about your customers (e.g., demographic information, previous purchasing patterns) that you could relate to the response rate, i.e., whether the respective customers responded to your mail solicitation and the type of order they placed.

Given the baseline response rate and the cost of the mail-out, sending the offer to all customers would result in a net loss. Hence, you want to use statistical analyses to help you identify the customers who are most likely to respond. Suppose you build such a model based on the data collected in the previous mail-out campaign. You can now select only the 10 percent of the customers from the mailing list who, according to the predictions from the model, are most likely to respond. Next you can compute the number of accurately predicted responses, relative to the total number of responses in the sample; this percentage is the gain due to using the model. Put another way, of those customers likely to respond in the current sample, you can accurately identify ("capture") y percent by selecting from the customer list the top 10% who were predicted by the model with the greatest certainty to respond (where y is the gains value).

Analogous values can be computed for each percentile of the population (customers on the mailing list). You could compute separate gains values for selecting the top 20% of customers who are predicted to be among likely responders to the mail campaign, the top 30%, etc. Hence, the gains values for different percentiles can be connected by a line that will typically ascend slowly and merge with the baseline if all customers (100%) were selected.

If more than one predictive model is used, multiple gains charts can be overlaid in a single plot to provide a graphical summary of the utility of the different models.
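A hypothetical sketch of the computation described above; the scores and responses are simulated for illustration, not taken from any real campaign:

```python
import numpy as np

# Invented data: a model score per customer and the actual response
# (1 = responded). A useful model gives responders higher scores.
rng = np.random.default_rng(0)
n = 1000
score = rng.random(n)
responded = (rng.random(n) < 0.1 + 0.4 * score).astype(int)

# Rank customers by model score, most promising first.
responses_sorted = responded[np.argsort(-score)]
total_responses = responded.sum()

# Gains value at each percentile: percentage of all responders
# captured when contacting only the top p% of customers by score.
for pct in (10, 20, 30):
    captured = responses_sorted[: n * pct // 100].sum()
    print(f"top {pct}%: captures {100.0 * captured / total_responses:.1f}% of responders")
```

Connecting these gains values across all percentiles produces the gains curve, which reaches 100% when the whole list is selected.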

Gamma Coefficient. The *Gamma* statistic is preferable to Spearman *R* or Kendall *tau* when the data contain many tied observations. In terms of the underlying assumptions, *Gamma* is equivalent to Spearman *R* or Kendall *tau*; in terms of its interpretation and computation, it is more similar to Kendall *tau* than Spearman *R*. In short, *Gamma* is also a *probability*; specifically, it is computed as the difference between the probability that the rank ordering of the two variables agree minus the probability that they disagree, divided by 1 minus the probability of ties. Thus, *Gamma* is basically equivalent to Kendall *tau*, except that ties are explicitly taken into account. Detailed discussions of the *Gamma* statistic can be found in Goodman and Kruskal (1954, 1959, 1963, 1972), Siegel (1956), and Siegel and Castellan (1988).
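A minimal sketch of the computation: count concordant and discordant pairs, dropping pairs tied on either variable, so that Gamma = (C - D) / (C + D). The data are invented for illustration.

```python
from itertools import combinations

def gamma_coefficient(x, y):
    """Goodman-Kruskal Gamma: (C - D) / (C + D), where C and D count
    concordant and discordant pairs; tied pairs are excluded."""
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            c += 1        # pair ordered the same way on both variables
        elif s < 0:
            d += 1        # pair ordered oppositely
    return (c - d) / (c + d)

# Heavily tied ranks that never disagree give Gamma = 1:
print(gamma_coefficient([1, 1, 2, 2, 3, 3], [1, 2, 2, 3, 3, 3]))  # 1.0
```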

Gamma Distribution. The Gamma distribution (the term first used by Weatherburn, 1946) is defined as:

f(x) = (x/b)^{c-1} * e^{-x/b} * [1/(b*Γ(c))]

0 ≤ x, b > 0, c > 0

where

Γ (*gamma*) is the *Gamma* function

b is the scale parameter

c is the so-called shape parameter

e is the base of the natural logarithm, sometimes called Euler's e (2.71...)

As the shape parameter c increases (e.g., from 1 to 6), the shape of the *gamma* distribution changes from a steeply decaying, exponential-like curve to a more symmetric, bell-like curve.
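The density above can be evaluated directly with the standard-library Gamma function; a small sketch:

```python
import math

def gamma_pdf(x, b, c):
    """Gamma density with scale b and shape c:
    (x/b)^(c-1) * exp(-x/b) / (b * Gamma(c))."""
    if x < 0:
        return 0.0
    return (x / b) ** (c - 1) * math.exp(-x / b) / (b * math.gamma(c))

# With shape c = 1 the Gamma distribution reduces to the exponential
# distribution with mean b, i.e. f(x) = exp(-x/b) / b.
print(gamma_pdf(2.0, b=2.0, c=1.0))  # exp(-1)/2 ≈ 0.1839
```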

Gaussian Distribution. The normal distribution - a bell-shaped function.

Gauss-Newton Method. The *Gauss-Newton method* is a class of methods for solving nonlinear least-squares problems. In general, this method makes use of the Jacobian matrix J of first-order derivatives of a function F to find the vector of parameter values x that minimizes the residual sums of squares (sum of squared deviations of predicted values from observed values). An improved and efficient version of the method is the so-called Levenberg-Marquardt algorithm. For a detailed discussion of this class of methods, see Dennis & Schnabel (1983).
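A minimal sketch of a Gauss-Newton iteration, using an invented exponential model y = a * exp(b * x) with noise-free data; each step solves a linear least-squares problem built from the Jacobian of the residuals.

```python
import numpy as np

# Invented model y = a * exp(b * x); data generated with a = 2, b = 1.5.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(1.5 * x)

a, b = 1.0, 1.0                      # starting values
for _ in range(30):
    pred = a * np.exp(b * x)
    r = y - pred                     # residuals (observed - predicted)
    # Jacobian of the predictions with respect to (a, b)
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
    # Gauss-Newton step: linear least-squares fit of the residuals
    step, *_ = np.linalg.lstsq(J, r, rcond=None)
    a += step[0]
    b += step[1]

print(a, b)  # should approach the true values 2.0 and 1.5
```

The Levenberg-Marquardt algorithm mentioned above adds a damping term to this step to stabilize the iteration far from the solution.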

General ANOVA/MANOVA. The purpose of *analysis of variance* (*ANOVA*) is to test for significant differences between means by comparing (i.e., analyzing) variances. More specifically, by partitioning the total variation into different sources (associated with the different effects in the design), we are able to compare the variance due to the between-groups (or treatments) variability with that due to the within-group (treatment) variability. Under the null hypothesis (that there are no mean differences between groups or treatments in the population), the variance estimated from the within-group (treatment) variability should be about the same as the variance estimated from between-groups (treatments) variability. For more information, see ANOVA/MANOVA.

General Linear Model. The *general linear model* is a generalization of the *linear regression model*, such that effects can be tested (1) for *categorical predictor variables* as well as for continuous predictor variables and (2) in designs with multiple dependent variables as well as in designs with a single dependent variable. For an overview of the *general linear model*, see the *General Linear Models* overview.

Generalization (in Neural Networks). The ability of a neural network to make accurate predictions when faced with data not drawn from the original training set (but drawn from the same source as the training set).

Generalized Additive Models. *Generalized Additive Models* are generalizations of generalized linear models. In generalized linear models, the transformed dependent variable values are predicted from (i.e., linked to) a linear combination of predictor variables; the transformation is referred to as the link function, and different distributions can be assumed for the dependent variable values. An example of a generalized linear model is the Logit Regression model, where the dependent variable is assumed to be binomial, and the link function is the logit transformation. In *generalized additive models*, the linear function of the predictor values is replaced by an unspecified (non-parametric) function, obtained by applying a scatterplot smoother to the scatterplot of partial residuals (for the transformed dependent variable values). See also, Hastie and Tibshirani, 1990, or Schimek, 2000.

Generalized Inverse. A *generalized inverse* (denoted by a superscript of -) of a rectangular matrix of values *A* is any matrix A^{-} that satisfies

AA^{-}A = A

A *generalized inverse* of a *nonsingular matrix* is unique and is called the regular *matrix inverse*. See also, matrix singularity, matrix inverse.

Generalized Linear Model. The *generalized linear model* is a generalization of the linear regression model such that (1) nonlinear, as well as linear, effects can be tested, (2) for *categorical predictor variables* as well as for continuous predictor variables, using (3) any dependent variable whose distribution follows one of several special members of the exponential family of distributions (e.g., gamma, Poisson, binomial), as well as any normally distributed dependent variable. For an overview of the generalized linear model, see *Generalized Linear Models*.

Genetic Algorithm. A search algorithm which locates optimal binary strings by processing an initially random population of strings using artificial mutation, crossover and selection operators, in an analogy with the process of natural selection (Goldberg, 1989). See also, Neural Networks.

Genetic Algorithm Input Selection. Application of a genetic algorithm to determine an "optimal" set of input variables, by constructing binary masks which indicate which inputs to retain and which to discard (Goldberg, 1989). This method is implemented in *STATISTICA Neural Networks* and can be used as part of a model building process where variables identified as the most "relevant" (in *STATISTICA Neural Networks*) are then used in a traditional model building stage of the analysis (e.g., using a linear regression or nonlinear estimation method).
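A minimal sketch of the ingredients named above (selection, crossover, mutation) on the classic "one-max" toy problem, invented here for illustration: evolve binary strings toward the string of all ones.

```python
import random

random.seed(1)
LENGTH, POP, GENS, MUT = 20, 30, 60, 0.01

def fitness(s):
    return sum(s)                      # number of ones in the string

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]

def pick():
    a, b = random.sample(pop, 2)       # tournament selection of two
    return a if fitness(a) >= fitness(b) else b

for _ in range(GENS):
    nxt = []
    while len(nxt) < POP:
        p1, p2 = pick(), pick()
        cut = random.randrange(1, LENGTH)            # one-point crossover
        child = [bit ^ (random.random() < MUT)       # bit-flip mutation
                 for bit in p1[:cut] + p2[cut:]]
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print(fitness(best))  # should be at or near LENGTH
```

For input selection, each bit of the string would instead encode whether a given input variable is retained, and the fitness would be the quality of a model built from the retained inputs.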

Geometric Distribution. The geometric distribution (the term first used by Feller, 1950) is defined as:

f(x) = p*(1-p)^{x}

where

p is the probability that a particular event (e.g., success) will occur
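A quick numeric sketch of the density above: summing f(x) over x = 0, 1, 2, ... approaches 1, and the distribution's mean (the expected number of failures before the first success) approaches (1-p)/p.

```python
# Geometric pmf f(x) = p * (1 - p)**x for x = 0, 1, 2, ...

def geometric_pmf(x, p):
    return p * (1.0 - p) ** x

p = 0.3
probs = [geometric_pmf(x, p) for x in range(200)]
total = sum(probs)                                     # approaches 1
mean = sum(x * f for x, f in enumerate(probs))         # approaches (1-p)/p
```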

Geometric Mean. The *Geometric Mean* is a "summary" statistic useful when the measurement scale is not linear; it is computed as:

G = (x_{1}*x_{2}*...*x_{n})^{1/n}

where

*n* is the sample size.
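A minimal sketch of the formula above; the equivalent log form exp(mean(log x)) is numerically safer than the direct product for large samples.

```python
import math

# Geometric mean G = (x_1 * x_2 * ... * x_n)**(1/n), computed via logs.

def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

data = [1.0, 10.0, 100.0]
G = geometric_mean(data)   # product is 1000, so G = 1000**(1/3) = 10
```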

Gibbs Sampler. The Gibbs sampler is a popular method used for MCMC (Markov chain Monte Carlo) analyses. It provides an elegant way for sampling from the joint distributions of multiple variables, by applying the notion that: to sample from a joint distribution just sample repeatedly from its one-dimensional conditionals given whatever you've seen at the time.

For example, the values from the joint distribution of two random variables, X and Y, can be easily simulated by the Gibbs sampler that uses their conditional distributions rather than their joint distribution. Starting with an arbitrary choice of X and Y, X is simulated from the conditional distribution of X, given Y, and Y is simulated from conditional distribution of Y, given X. Alternating between two conditional distributions, in the subsequent steps, generates a sample from the correct joint distribution of X and Y; the approximation gets better and better as the length of the Gibbs sampler path increases.
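The alternating scheme just described can be sketched for a standard bivariate normal with correlation rho, where both one-dimensional conditionals are known in closed form: X | Y=y ~ N(rho*y, 1 - rho^2) and Y | X=x ~ N(rho*x, 1 - rho^2). This is an illustrative example, not taken from the source.

```python
import random

# Gibbs sampler for a standard bivariate normal with correlation rho,
# sampling repeatedly from the one-dimensional conditionals.

random.seed(1)
rho = 0.6
sd = (1.0 - rho ** 2) ** 0.5   # conditional standard deviation

x, y = 0.0, 0.0                # arbitrary starting values
xs, ys = [], []
for _ in range(50000):
    x = random.gauss(rho * y, sd)   # draw X from its conditional given Y
    y = random.gauss(rho * x, sd)   # draw Y from its conditional given X
    xs.append(x)
    ys.append(y)

# The sample correlation should approach rho as the path lengthens.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
vx = sum((a - mx) ** 2 for a in xs) / n
vy = sum((b - my) ** 2 for b in ys) / n
r = cov / (vx * vy) ** 0.5
```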

Gini Measure of Node Impurity. According to Breiman, Friedman, Olshen, & Stone (1984), the Gini measure of node impurity at node *t* (which *STATISTICA* uses by default in *GC&RT* and, therefore, Boosted Trees) is defined to be (pp. 28 & 38)

i(t) = Σ_{i≠j} p(i|t)p(j|t) = 1 - Σ_{j} p(j|t)^{2}

where

p(j|t) = p(j,t)/p(t)

and

p(j,t) = π(j)N_{j}(t)/N_{j}

such that

*p* ( *j* | *t* ) is the estimated probability that an observation belongs to group *j* given that it is in node *t*,

*p* ( *j* , *t* ) is the estimated probability that an observation is in group *j* and at node *t*,

*p* ( *t* ) is the estimated probability that an observation is at node *t*, p(t) = Σ_{j} p(j,t),

π ( *j* ) is the prior probability for group *j*,

*N* _{j} ( *t* ) is the number of group *j* members at node *t*,

and *N* _{j} is the size of group *j*.

Therefore, the prior probabilities play a role in every Gini measure computation at every node. However, Breiman et al. also note that, when the prior probabilities are estimated from the data, p(j|t) reduces to the within-node relative frequency N_{j}(t)/N(t). This fact can cause higher misclassification rates in under-represented groups.
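A minimal sketch (not STATISTICA's implementation) of the impurity measure for the data-estimated-priors case, where p(j|t) is simply the within-node relative frequency of group j.

```python
# Gini impurity of a node: i(t) = 1 - sum_j p(j|t)**2.

def gini_impurity(counts):
    """counts[j] = number of group-j observations at the node."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)

pure = gini_impurity([10, 0])   # one group only -> impurity 0.0
mixed = gini_impurity([5, 5])   # 50/50 split    -> impurity 0.5
```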

Gompertz Distribution. The Gompertz distribution is a theoretical distribution of survival times. Gompertz (1825) proposed a probability model for human mortality, based on the assumption that the "average exhaustion of a man's power to avoid death to be such that at the end of equal infinitely small intervals of time he lost equal portions of his remaining power to oppose destruction which he had at the commencement of these intervals" (Johnson, Kotz, & Balakrishnan, 1995, p. 25). The resultant hazard function:

**r(x) = Bc^{x}, for x ≥ 0, B > 0, c ≥ 1**

is often used in survival analysis. See Johnson, Kotz, & Balakrishnan (1995) for additional details.

Goodness of Fit. Various goodness-of-fit summary statistics can be computed for continuous and categorical dependent variables. Most of these statistics are discussed in greater detail in Witten and Frank (2000); in the context of forecasting, these and related statistics are discussed in Makridakis and Wheelwright (1983). Goodness of fit statistics for regression problems (for continuous variables) include:

- Least squares deviation (LSD), mean square error
- Average deviation, mean absolute error
- Relative squared error, mean relative squared error
- Correlation coefficient (Pearson product moment correlation)

Goodness of fit statistics for classification problems (for categorical variables) include:

- Pearson *Chi*-square
- G-square (maximum likelihood *Chi*-square)
- Percent disagreement (misclassification rate)
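Three of the regression statistics listed above can be sketched directly from their definitions; the function names here are illustrative, not from any particular package.

```python
import math

# Mean square error, mean absolute error, and the Pearson product-moment
# correlation between observed and predicted values.

def mean_square_error(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def mean_absolute_error(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def pearson_r(obs, pred):
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
mse = mean_square_error(obs, pred)   # 0.025 for these values
mae = mean_absolute_error(obs, pred) # 0.15 for these values
r = pearson_r(obs, pred)             # close to 1 for this good fit
```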

Gradient. In Structural Equation Modeling the gradient is the vector of first partial derivatives of the discrepancy function with respect to the parameter values. At a local or global minimum, the discrepancy function should be at the bottom of a "valley," where all first partial derivatives are zero, so the elements of the gradient should all be near to zero when a minimum is obtained.

The elements of the gradient, by themselves, can, on occasion, be somewhat unreliable as indicators of when convergence has occurred, especially when the model fit is not good, and the discrepancy function value itself is quite large. For this reason, the gradient is not employed as a convergence criterion by this program.

Gradient Descent. Optimization techniques for non-linear functions (e.g., the error function of a neural network as the weights are varied) that attempt to move incrementally to successively lower points in search space, in order to locate a minimum.
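The idea can be sketched on a one-dimensional quadratic (an illustrative toy problem, not a neural network error function): repeatedly step downhill along the negative gradient.

```python
# Gradient descent on f(w) = (w - 3)**2, whose derivative is f'(w) = 2*(w - 3).

def gradient_descent(start, lr=0.1, steps=100):
    w = start
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # first derivative at the current point
        w -= lr * grad           # move incrementally to a lower point
    return w

w_min = gradient_descent(start=0.0)   # should approach the minimum at w = 3
```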

Gradual Permanent Impact. In Time Series, the gradual permanent impact pattern implies that the increase or decrease due to the intervention is gradual, and that the final permanent impact becomes evident only after some time. This type of intervention can be summarized by the expression:

Impact _{t} = δ*Impact _{t-1} + ω

(for all t ≥ time of impact, else = 0).

Note that this impact pattern is defined by the two parameters δ (*delta*) and ω (*omega*). If δ is near 0 (zero), then the final permanent amount of impact will be evident after only a few more observations; if δ is close to 1, then the final permanent amount of impact will only be evident after many more observations. As long as the δ parameter is greater than 0 and less than 1 (the bounds of system stability), the impact will be gradual and result in an asymptotic change (shift) in the overall mean by the quantity:

Asymptotic change in level = ω/(1-δ)
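A quick numeric check of the recursion, with illustrative parameter values: iterating the impact equation converges to the asymptotic change in level given above.

```python
# Impact_t = delta * Impact_{t-1} + omega converges to omega / (1 - delta)
# when 0 < delta < 1 (the bounds of system stability).

delta, omega = 0.8, 1.0
impact = 0.0
for _ in range(200):
    impact = delta * impact + omega   # gradual accumulation of the impact

asymptote = omega / (1.0 - delta)     # = 5.0 for these parameter values
```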

Group Charts. See Multiple Stream Group Charts.

Grouping (or Coding) Variable. A grouping (or coding) variable is used to identify group membership for individual cases in the data file. Typically, the grouping variable is categorical (i.e., contains either discrete values, e.g., *1, 2, 3*, ...,

Group | Score 1 | Score 2 |
---|---|---|
1 | 383.5 | 4568.4 |
3 | 726.4 | 6752.3 |
2 | 843.7 | 5384.7 |
2 | 729.9 | 6216.9 |

or a few text values, e.g., *MALE, FEMALE*)

Group | Score 1 | Score 2 |
---|---|---|
MALE | 383.5 | 4568.4 |
FEMALE | 726.4 | 6752.3 |
FEMALE | 843.7 | 5384.7 |
MALE | 729.9 | 6216.9 |

and the values are referred to as codes (they can be integer values or integer values with text value equivalents).

Groupware. Software intended to enable a group of users on a network to collaborate on specific projects. Groupware can provide services for communication (such as e-mail), collaborative document development, analysis, reporting, statistical data analysis, scheduling, or tracking. Documents can include text, images, or any other forms of information (e.g., multimedia).