 ### Glossary Index

###### F

F Distribution. The F distribution (for x > 0) has density function (for ν1 = 1, 2, ...; ν2 = 1, 2, ...):

 f(x) = {Γ[(ν1 + ν2)/2] / [Γ(ν1/2) · Γ(ν2/2)]} · (ν1/ν2)^(ν1/2) · x^((ν1/2) − 1) · [1 + (ν1/ν2)·x]^(−(ν1 + ν2)/2)

0 ≤ x < ∞; ν1 = 1, 2, ...; ν2 = 1, 2, ...

where ν1, ν2 are the degrees of freedom and Γ (gamma) is the Gamma function.
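The density above can be sketched directly with the standard library; the function names here are made up for illustration, and the tail-area helper is only a crude numerical integration, not a production routine.

```python
from math import gamma, pi

def f_pdf(x, v1, v2):
    """Density of the F distribution with (v1, v2) degrees of freedom, x > 0."""
    const = gamma((v1 + v2) / 2) / (gamma(v1 / 2) * gamma(v2 / 2))
    return (const * (v1 / v2) ** (v1 / 2) * x ** (v1 / 2 - 1)
            * (1 + (v1 / v2) * x) ** (-(v1 + v2) / 2))

def f_tail_area(x0, v1, v2, upper=100.0, steps=100_000):
    """Crude trapezoidal estimate of the upper-tail area P(X > x0)."""
    h = (upper - x0) / steps
    ys = [f_pdf(x0 + i * h, v1, v2) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

# Sanity check: for v1 = v2 = 1 the density reduces to 1/(pi * sqrt(x) * (1 + x)),
# so f_pdf(1.0, 1, 1) equals 1 / (2 * pi).
```

For equal degrees of freedom the distribution has median 1 (since 1/X has the same distribution), so `f_tail_area(1.0, 10, 10)` should come out near 0.5.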

FACT. FACT is a classification tree program developed by Loh and Vanichsetakul (1988) that is a precursor of the QUEST program. For discussion of the differences of FACT from other classification tree programs, see A Brief Comparison of Classification Tree Programs.

Factor Analysis. The main applications of factor analytic techniques are: (1) to reduce the number of variables and (2) to detect structure in the relationships between variables, that is, to classify variables. Therefore, factor analysis is applied as a data reduction or (exploratory) structure detection method (the term factor analysis was first introduced by Thurstone, 1931).

For example, suppose we want to measure people's satisfaction with their lives. We design a satisfaction questionnaire with various items; among other things we ask our subjects how satisfied they are with their hobbies (item 1) and how intensely they are pursuing a hobby (item 2). Most likely, the responses to the two items are highly correlated with each other. Given a high correlation between the two items, we can conclude that they are quite redundant.

One can summarize the correlation between two variables in a scatterplot. A regression line can then be fitted that represents the "best" summary of the linear relationship between the variables. If we could define a variable that would approximate the regression line in such a plot, then that variable would capture most of the "essence" of the two items. Subjects' single scores on that new factor, represented by the regression line, could then be used in future data analyses to represent that essence of the two items. In a sense we have reduced the two variables to one factor.
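The two-items-to-one-factor idea can be sketched in a few lines. For two standardized, positively correlated variables, the first principal component direction is (1, 1)/√2, so the factor score is just their scaled sum; the data and function names below are invented for illustration.

```python
from math import sqrt

def standardize(values):
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)  # population SD
    return [(v - mean) / sd for v in values]

def single_factor_scores(item1, item2):
    """Collapse two positively correlated items into one factor score:
    the scaled sum of the standardized items, i.e. their projection
    onto the first principal component direction (1, 1)/sqrt(2)."""
    z1, z2 = standardize(item1), standardize(item2)
    return [(a + b) / sqrt(2) for a, b in zip(z1, z2)]

hobby_satisfaction = [4, 5, 2, 3, 5, 1, 4, 2]   # hypothetical item 1 responses
hobby_intensity    = [5, 4, 1, 3, 5, 2, 4, 1]   # hypothetical item 2 responses
scores = single_factor_scores(hobby_satisfaction, hobby_intensity)
```

A quick check on the construction: the variance of the resulting scores equals 1 + r, where r is the correlation between the standardized items, so the factor captures the shared variation of the two items.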

Factor Analysis is an exploratory method; for information on Confirmatory Factor Analysis, see Structural Equation Modeling. For more information on Factor Analysis, see Factor Analysis.

Feature Extraction (vs. Feature Selection). The terms feature extraction and feature selection are used in the context of predictive data mining, when the goal is to find a good predictive model for some phenomenon of interest based on a large number of predictors. While feature selection methods will attempt to identify the best predictors among the (sometimes thousands of) available predictors, feature extraction techniques attempt to aggregate or combine the predictors in some way to extract the common information contained in them that is most useful for building the model. Typical methods for feature extraction are Factor Analysis and Principal Components Analysis, Correspondence Analysis, Multidimensional Scaling, Partial Least Squares methods, or singular value decomposition, as, for example, used in text mining.

Feature Selection. One of the preliminary stages in the Data Mining process, applicable when the data set includes more variables than could be included (or would be efficient to include) in the actual model-building phase (or even in the initial exploratory operations).

Feedforward Networks. Neural networks with a distinct layered structure, with all connections feeding forwards from inputs towards outputs. Sometimes used as a synonym for multilayer perceptrons.
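A minimal sketch of a feedforward pass, assuming a fully connected layered structure with logistic activations (the specific weights below are arbitrary and for illustration only):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate inputs strictly forwards through the layers.

    Each layer is a (weights, biases) pair, where weights[j] holds the
    incoming weights of output unit j."""
    activation = inputs
    for weights, biases in layers:
        activation = [
            logistic(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

# A 2-input, 2-hidden-unit, 1-output multilayer perceptron
net = [
    ([[0.5, -0.3], [0.8, 0.1]], [0.0, -0.2]),
    ([[1.2, -0.7]], [0.05]),
]
output = forward([1.0, 0.5], net)
```

Because connections only feed forwards, a single left-to-right sweep over the layers computes the output; there are no cycles to iterate over.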

Fisher LSD. This post hoc test (or multiple comparison test) can be used to determine the significant differences between group means in an analysis of variance setting. The Fisher LSD test is considered to be one of the least conservative post hoc tests (for a detailed discussion of different post hoc tests, see Winer, Michels, & Brown, 1991). For more details, see General Linear Models. See also, Post Hoc Comparisons. For a discussion of statistical significance, see Elementary Concepts.

Fixed Effects (in ANOVA). The term fixed effects in the context of analysis of variance is used to denote factors in an ANOVA design with levels that are deliberately arranged by the experimenter, rather than randomly sampled from an infinite population of possible levels (those factors are called random effects). For example, if one were interested in conducting an experiment to test the hypothesis that higher temperature leads to increased aggression, one would probably expose subjects to moderate or high temperatures and then measure subsequent aggression. Temperature would be a fixed effect in this experiment, because the levels of temperature of interest to the experimenter were deliberately set, or fixed, by the experimenter.

A simple criterion for deciding whether or not an effect in an experiment is random or fixed is to ask how one would select (or arrange) the levels for the respective factor in a replication of the study. For example, if one wanted to replicate the study described in this example, one would choose the same levels of temperature from the population of levels of temperature. Thus, the factor "temperature" in this study would be a fixed factor. If instead, one's interest is in how much of the variation of aggressiveness is due to temperature, one would probably expose subjects to a random sample of temperatures from the population of levels of different temperatures. Levels of temperature in the replication study would likely be different from the levels of temperature in the first study, thus temperature would be considered a random effect.
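For a fixed-effects design like the temperature example, the test statistic is the ratio of the between-group to the within-group mean square; a minimal one-way sketch follows, with invented aggression scores (the function name and data are assumptions, not the source's):

```python
def one_way_anova_f(groups):
    """F = MS_between / MS_within for a one-way fixed-effects design."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    return ms_between / ms_within

# Hypothetical aggression scores under two deliberately fixed temperature levels
moderate = [3.1, 2.8, 3.5, 3.0]
high = [4.2, 4.8, 4.0, 4.6]
f_stat = one_way_anova_f([moderate, high])
```

The resulting statistic is referred to an F distribution with (k − 1, n − k) degrees of freedom; whether the factor is fixed or random does not change this one-way computation, but it does change which error term is appropriate in more complex designs.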

Free Parameter. A numerical value in a structural model (see Structural Equation Modeling) that is part of the model, but is not fixed at any particular value by the model hypothesis. Free parameters are estimated by the program using iterative methods. Free parameters are indicated in the PATH1 language with integers placed between dashes on an arrow or a wire. For example, the following paths both have the free parameter 14.

(F1)-14->[X1]

(e1)-14-(e1)

If two different coefficients have the same free parameter number, as in the above example, then both will of necessity be assigned the same numerical value. Simple equality constraints on numerical coefficients are thus imposed by assigning them the same free parameter number.
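The shared-parameter-number mechanism can be pictured as a dictionary lookup; the triples and estimate values below merely mimic the PATH1 notation and are invented for illustration:

```python
# Paths written as (source, target, free_parameter_number); the numbers
# mimic PATH1's "-14->" notation but this data structure is hypothetical.
paths = [
    ("F1", "X1", 14),
    ("e1", "e1", 14),
    ("F1", "X2", 15),
]

# One current estimate per free parameter number
estimates = {14: 0.62, 15: 1.10}

# Resolving coefficients through the shared dictionary enforces the
# equality constraint: every path numbered 14 receives the same value.
coefficients = {(src, dst): estimates[p] for src, dst, p in paths}
```

Updating `estimates[14]` during iterative estimation automatically moves both constrained coefficients together, which is exactly the effect of assigning them the same free parameter number.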

Frequency Tables (One-way Tables). Frequency or one-way tables represent the simplest method for analyzing categorical (nominal) data (see also Elementary Concepts). They are often used as one of the exploratory procedures to review how different categories of values are distributed in the sample. For example, in a survey of spectator interest in different sports, we could summarize the respondents' interest in watching football in a frequency table as follows:

FOOTBALL: "Watching football"

| Category | Count | Cumulative Count | Percent | Cumulative Percent |
|---|---|---|---|---|
| ALWAYS: Always interested | 39 | 39 | 39.00 | 39.00 |
| USUALLY: Usually interested | 16 | 55 | 16.00 | 55.00 |
| SOMETIMS: Sometimes interested | 26 | 81 | 26.00 | 81.00 |
| NEVER: Never interested | 19 | 100 | 19.00 | 100.00 |
| Missing | 0 | 100 | 0.00 | 100.00 |

The table above shows the number, proportion, and cumulative proportion of respondents who characterized their interest in watching football as either (1) Always interested, (2) Usually interested, (3) Sometimes interested, or (4) Never interested. For more information, see the Frequency Tables section of Basic Statistics.
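A frequency table like this is straightforward to build with the standard library; the response codes below reproduce the counts from the survey example (the list itself is reconstructed, not real data):

```python
from collections import Counter

responses = (
    ["ALWAYS"] * 39 + ["USUALLY"] * 16 + ["SOMETIMS"] * 26 + ["NEVER"] * 19
)

order = ["ALWAYS", "USUALLY", "SOMETIMS", "NEVER"]
counts = Counter(responses)
total = len(responses)

cumulative = 0
for category in order:
    count = counts[category]
    cumulative += count
    print(f"{category:8s} {count:5d} {cumulative:5d} "
          f"{100 * count / total:8.2f} {100 * cumulative / total:8.2f}")
```

Each printed row gives the count, cumulative count, percent, and cumulative percent for one category, matching the columns of the table above.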

Function Minimization Algorithms. Algorithms used (e.g., in Nonlinear Estimation) to guide the search for the minimum of a function. For example, in the process of nonlinear estimation, the currently specified loss function is being minimized.
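As one concrete instance of such an algorithm, here is a golden-section search minimizing a least-squares loss for a one-parameter nonlinear-estimation-style problem; the data and function names are made up for illustration:

```python
from math import sqrt

def golden_section_minimize(loss, a, b, tol=1e-8):
    """Bracketed 1-D minimization: repeatedly shrink [a, b] by the golden ratio,
    keeping the subinterval that must contain the minimum."""
    inv_phi = (sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if loss(c) < loss(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Least-squares loss for a one-parameter model y = theta * x (made-up data)
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
loss = lambda theta: sum((y - theta * x) ** 2 for x, y in zip(xs, ys))
theta_hat = golden_section_minimize(loss, 0.0, 10.0)
```

For this quadratic loss the minimizer has the closed form Σxy/Σx² = 28.5/14 ≈ 2.036, so the search result can be checked directly; practical nonlinear estimation uses more sophisticated multivariate methods, but the idea of iteratively shrinking in on the loss minimum is the same.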

g2 Inverse. A g2 inverse is a generalized inverse of a rectangular matrix of values A that satisfies both

AA`A=A

and

A`AA`=A`

The g2 inverse is used to find a solution to the normal equations in the general linear model; refer to General Linear Models for additional details.
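The two defining conditions are easy to verify numerically. Below, a g2 inverse of a simple rank-deficient matrix is constructed by hand (invert the nonzero diagonal entry, leave the rest zero); the matrices are illustrative, not from the source:

```python
def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# A rank-deficient matrix and a g2 inverse constructed by hand
A = [[2.0, 0.0],
     [0.0, 0.0]]
G = [[0.5, 0.0],
     [0.0, 0.0]]

assert matmul(matmul(A, G), A) == A   # first condition:  A A` A = A
assert matmul(matmul(G, A), G) == G   # second condition: A` A A` = A`
```

A matrix satisfying only the first condition is a g1 inverse; requiring both conditions is what distinguishes the g2 inverse used to solve the normal equations.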