


Variance. The variance (this term was first used by Fisher, 1918a) of a population of values is computed as:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

where
xi   is the i-th value in the population
µ    is the population mean
N    is the population size.
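A minimal Python sketch of the population formula above. The function name `population_variance` and the data values are hypothetical, chosen only for illustration.

```python
def population_variance(values):
    """Mean squared deviation from the population mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("population must contain at least one value")
    mu = sum(values) / n                          # population mean
    return sum((x - mu) ** 2 for x in values) / n  # divide by N, not N - 1

# Hypothetical population of eight values (mean 5, sum of squared deviations 32)
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
```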
The unbiased sample estimate of the population variance is computed as:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

where
x̄   is the sample mean
n   is the sample size.
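For comparison, the sketch below treats the same values as a sample. The only change is the divisor n − 1, which must be parenthesized: dividing by n and then subtracting 1 gives a different and incorrect quantity. `sample_variance` is a hypothetical name; Python's standard `statistics.variance` computes the same n − 1 estimate, and `statistics.pvariance` the population version.

```python
def sample_variance(values):
    """Unbiased sample estimate of the population variance (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n                                # sample mean
    return sum((x - xbar) ** 2 for x in values) / (n - 1)  # note the parentheses

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7, about 4.571
```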

See also, Descriptive Statistics.

Variance Components (in Mixed Model ANOVA). The term variance components is used in the context of experimental designs with random effects to denote the estimated amount of variance that can be attributed to those effects. For example, if we were interested in the effect that the quality of different schools has on academic proficiency, we could select a random sample of schools and estimate the amount of variance in academic proficiency (the component of variance) that is attributable to differences between schools.
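In the simplest case, a balanced one-way random-effects design such as the schools example, one common way to estimate the components is the ANOVA (method-of-moments) estimator: because the expected between-groups mean square equals σ²within + k·σ²between (k observations per group), the between-school component is estimated as (MSbetween − MSwithin)/k. This is only one of several estimation methods (maximum likelihood and REML are others), and the function name and scores below are hypothetical.

```python
def one_way_variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way random-effects design.

    groups: list of equally sized lists, one per randomly sampled level (e.g. school).
    Returns (between_component, within_component).
    """
    a = len(groups)      # number of sampled groups (schools)
    k = len(groups[0])   # observations per group (balanced design assumed)
    if a < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 equally sized groups with at least 2 observations each")
    group_means = [sum(g) / k for g in groups]
    grand_mean = sum(sum(g) for g in groups) / (a * k)
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (a - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (a * (k - 1))
    # E[MS_between] = sigma2_within + k * sigma2_between; solve for sigma2_between.
    # A negative estimate is commonly truncated at zero.
    between = max(0.0, (ms_between - ms_within) / k)
    return between, ms_within

# Hypothetical proficiency scores for three students in each of three sampled schools
schools = [[70, 72, 74], [80, 82, 84], [60, 62, 64]]
between, within = one_way_variance_components(schools)
print(between, within)
```

Here most of the variability lies between schools (about 98.7) rather than within them (4.0).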

See also, Analysis of Variance and Variance Components and Mixed Model ANOVA/ANCOVA.

Variance Inflation Factor (VIF). The diagonal elements of the inverse correlation matrix (i.e., -1 times the diagonal elements of the sweep matrix) for variables that are in the equation are also sometimes called variance inflation factors (VIF; e.g., see Neter, Wasserman, & Kutner, 1985). This terminology reflects the fact that the variances of the standardized regression coefficients can be computed as the product of the residual variance (for the correlation-transformed model) and the respective diagonal elements of the inverse correlation matrix. Equivalently, the diagonal element for predictor j equals 1/(1 − Rj²), where Rj² is the squared multiple correlation of predictor j with the other predictors. If the predictor variables are uncorrelated, the diagonal elements of the inverse correlation matrix are all equal to 1.0; for correlated predictors they are greater than 1.0, and thus represent an "inflation factor" for the variance of the regression coefficients that is due to the redundancy among the predictors.
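The definition can be checked numerically: build the correlation matrix of the predictors, invert it, and read off the diagonal. The sketch below is dependency-free plain Python with a small Gauss-Jordan inversion; the helper names and data are hypothetical. With two predictors correlated at r = 0.8, both VIFs should equal 1/(1 − 0.8²) ≈ 2.78, and uncorrelated predictors should give exactly 1.0.

```python
import math

def correlation(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfectly collinear predictors)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIFs = diagonal of the inverse of the predictors' correlation matrix."""
    k = len(columns)
    r = [[correlation(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(r)
    return [inv[i][i] for i in range(k)]

# Hypothetical predictors with correlation exactly 0.8
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(variance_inflation_factors([x1, x2]))  # both about 2.78
```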

See also, Multiple Regression.

V-fold Cross-validation. In v-fold cross-validation, the data are randomly divided into v subsamples (folds) of approximately equal size. The respective model or prediction method is then fitted v times, each time leaving out one fold and using the remaining v − 1 folds for the analysis, and the fitted model is applied to the held-out fold to compute predicted values, classifications, etc. Summary indices of prediction accuracy are typically computed over the v replications; thus, this technique allows the analyst to evaluate the overall accuracy of the respective prediction model or method on observations that were not used to fit it. This method is commonly used in tree classification and regression.
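A minimal sketch of v-fold cross-validation, assuming a simple least-squares line as the "prediction method" and mean squared error as the accuracy index. Both choices, the helper names, and the data are illustrative assumptions; a tree model or a classification error rate would slot into the same loop. Each observation ends up in exactly one held-out fold.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def v_fold_cv(xs, ys, v=5, seed=0):
    """Return (mean of per-fold MSE, list of folds as index lists)."""
    n = len(xs)
    if not 2 <= v <= n:
        raise ValueError("v must be between 2 and the number of observations")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)            # random assignment to folds
    folds = [idx[f::v] for f in range(v)]       # v folds of nearly equal size
    fold_errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / v, folds

# Hypothetical noisy linear data
xs = list(range(10))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
cv_mse, folds = v_fold_cv(xs, ys, v=5)
print(cv_mse, folds)
```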

Vintage Analysis. A vintage is a group of credit accounts that all originated within a specific time period, usually a year. Vintage analysis is used in credit scoring and refers to the process of monitoring groups of accounts and comparing their performance with that of past groups. The comparisons are made at similar loan ages, which makes it possible to detect deviations from past performance. Typically, a graphical representation is used for this purpose, such as a chart showing the relationship between months on the books and the percentage of delinquent accounts across multiple vintages.
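The table behind such a chart can be sketched as follows, assuming hypothetical account snapshots of the form (vintage year, months on books, delinquent flag). Each cell is the percentage of delinquent accounts for one vintage at one loan age, so vintages can be compared at equal months on books; the record layout and function name are assumptions for illustration.

```python
from collections import defaultdict

def vintage_table(records):
    """Percentage delinquent per vintage and months on books."""
    counts = defaultdict(lambda: [0, 0])  # (vintage, months) -> [delinquent, total]
    for vintage, months_on_books, delinquent in records:
        cell = counts[(vintage, months_on_books)]
        cell[0] += int(delinquent)
        cell[1] += 1
    table = defaultdict(dict)
    for (vintage, months), (bad, total) in counts.items():
        table[vintage][months] = 100.0 * bad / total
    # Sort vintages and loan ages so rows line up for comparison
    return {v: dict(sorted(row.items())) for v, row in sorted(table.items())}

# Hypothetical snapshots: four accounts per vintage observed at 6 and 12 months
records = [
    (2019, 6, False), (2019, 6, True), (2019, 6, False), (2019, 6, False),
    (2019, 12, True), (2019, 12, True), (2019, 12, False), (2019, 12, False),
    (2020, 6, False), (2020, 6, False), (2020, 6, False), (2020, 6, True),
    (2020, 12, True), (2020, 12, True), (2020, 12, True), (2020, 12, False),
]
for vintage, row in vintage_table(records).items():
    print(vintage, row)
```

In this made-up data both vintages look alike at 6 months, but the 2020 vintage is worse at 12 months (75% versus 50% delinquent), which is the kind of deviation the comparison is meant to expose.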

Voronoi. The Voronoi tessellation graph plots the values of two variables X and Y in a scatterplot and then divides the plane into regions, one around each data point, such that each region encloses exactly the area that is closer to its data point than to any other data point.
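Membership in a Voronoi region reduces to a nearest-neighbor query: a location belongs to the region of whichever data point is closest to it (Euclidean distance assumed). The sketch below uses that fact to approximate each region's area inside a bounding box by sampling a regular grid; regions of outer points are unbounded, so the box clips them. The function names and box are hypothetical, and this is only an illustration, not how any particular program draws the tessellation (exact tessellations are usually computed geometrically, e.g. with SciPy's `scipy.spatial.Voronoi`).

```python
import math

def nearest_site(point, sites):
    """Index of the data point whose Voronoi region contains `point`."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def approximate_cell_areas(sites, xmin, xmax, ymin, ymax, steps=200):
    """Approximate each region's area (clipped to the box) by grid sampling."""
    dx = (xmax - xmin) / steps
    dy = (ymax - ymin) / steps
    areas = [0.0] * len(sites)
    for i in range(steps):
        for j in range(steps):
            # Centre of the (i, j) grid cell; credit its area to the nearest site
            p = (xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            areas[nearest_site(p, sites)] += dx * dy
    return areas

# Two hypothetical data points; the box is symmetric about their bisector x = 0.5
sites = [(0.0, 0.0), (1.0, 0.0)]
print(approximate_cell_areas(sites, -1.0, 2.0, -1.0, 1.0))  # each about 3.0
```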

Voronoi Scatterplot. This specialized bivariate scatterplot is more an analytic technique than just a method of graphically presenting data. The solutions it offers help to model a variety of phenomena in the natural and social sciences (e.g., Coombs, 1964; Ripley, 1981). The program divides the space between the individual data points, represented by XY coordinates in 2D space, such that each data point is surrounded by boundaries enclosing only the area that is closer to that "center" data point than to any other data point.

The particular ways in which this method is used depend largely on the specific research area; however, in many areas it is helpful to add further dimensions to this plot by using categorization options (as shown in the example below).

See also, Data Reduction.

Voting. See Bagging.