### Glossary Index

###### K

Kendall Tau. Kendall tau is equivalent to the Spearman R statistic with regard to the underlying assumptions, and comparable in terms of its statistical power. However, Spearman R and Kendall tau are usually not identical in magnitude because their underlying logic and computational formulas are very different. Siegel and Castellan (1988) express the relationship between the two measures as the inequality:

-1 ≤ 3 * (Kendall tau) - 2 * (Spearman R) ≤ 1

More importantly, Kendall tau and Spearman R imply different interpretations: while Spearman R can be thought of as the regular Pearson product-moment correlation coefficient computed from ranks, Kendall tau represents a probability. Specifically, it is the difference between the probability that the observed data are in the same order for the two variables and the probability that they are in different orders. Kendall (1948, 1975), Everitt (1977), and Siegel and Castellan (1988) discuss Kendall tau in greater detail. Two variants of tau are computed, usually called tau-b and tau-c. These measures differ only with regard to how tied ranks are handled. In most cases these values will be fairly similar, and when discrepancies occur, it is safest to interpret the lower value.
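The probability interpretation can be made concrete with a small sketch. The function below (an illustration, not the tie-corrected tau-b or tau-c) counts concordant and discordant pairs and returns their difference as a proportion of all pairs:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau (tau-a form, ignoring ties): the proportion of concordant
    pairs minus the proportion of discordant pairs."""
    pairs = list(combinations(range(len(x)), 2))
    concordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return (concordant - discordant) / len(pairs)
```

With identical orderings every pair is concordant and tau is 1; with fully reversed orderings tau is -1.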

Kernel Functions. Simple functions (typically Gaussians) that are added together, positioned at known data points, to approximate a sampled distribution (Parzen, 1962). See also, Neural Networks.

k-Means Algorithm (in Neural Networks). The k-means algorithm (Moody and Darken, 1989; Bishop, 1995) assigns radial centers to the first hidden layer in the network if it consists of radial units.

k-means assigns each training case to one of K clusters (where K is the number of radial units), such that each cluster is represented by the centroid of its cases, and each case is nearer to the centroid of its own cluster than to the centroid of any other cluster. It is the centroids that are copied to the radial units. The intention is to discover a set of cluster centers that best represents the natural distribution of the training cases.

Technical Details. k-means is an iterative algorithm. The clusters are formed initially by choosing the first K cases as cluster centers, assigning each subsequent case to the nearest of these, and then calculating the centroid of each cluster.

Subsequently, each case is tested to see whether the center of another cluster is closer than the center of its own cluster; if so, the case is reassigned. If cases are reassigned, the centroids are recalculated and the algorithm repeats.

Caution. There is no formal proof of convergence for this algorithm, although in practice it usually converges reasonably quickly.
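The iterative procedure described above can be sketched in plain Python (function and variable names are illustrative; cases are points given as tuples of coordinates):

```python
import math

def k_means(cases, k, max_iter=100):
    """Iterative k-means: initial clusters are the first K cases; each pass
    reassigns cases to the nearest centroid and recomputes centroids,
    stopping when no case changes cluster."""
    centroids = [list(c) for c in cases[:k]]
    assignment = [None] * len(cases)
    for _ in range(max_iter):
        changed = False
        for i, case in enumerate(cases):
            # reassign the case if another cluster's center is closer
            nearest = min(range(k), key=lambda j: math.dist(case, centroids[j]))
            if assignment[i] != nearest:
                assignment[i] = nearest
                changed = True
        # recalculate each centroid as the mean of its member cases
        for j in range(k):
            members = [cases[i] for i in range(len(cases)) if assignment[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:  # converged: no reassignments this pass
            break
    return centroids, assignment
```

As the Caution above notes, the `max_iter` cap guards against the (rare) case where the loop does not settle.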

k-Nearest Algorithm. An algorithm to assign deviations to radial units. Each deviation is the mean distance to the k-nearest neighbors of the point. See also, Neural Networks.
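A minimal sketch of this deviation assignment (names are illustrative; here the neighbors of each center are sought among the other centers):

```python
import math

def knn_deviations(centers, k):
    """For each radial center, deviation = mean distance to its k nearest
    neighbors among the remaining centers."""
    devs = []
    for i, c in enumerate(centers):
        dists = sorted(math.dist(c, other)
                       for j, other in enumerate(centers) if j != i)
        devs.append(sum(dists[:k]) / k)
    return devs
```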

Kohonen Algorithm (in Neural Networks). The Kohonen algorithm (Kohonen, 1982; Patterson, 1996; Fausett, 1994) assigns centers to a radial hidden layer by attempting to recognize clusters within the training cases. Cluster centers close to one another in pattern-space tend to be assigned to units that are close to each other in the network (topologically ordered).

The Kohonen training algorithm is the algorithm of choice for Self Organizing Feature Map networks. It can also be used to train the radial layer in other network types; specifically, radial basis function, cluster, and generalized regression neural networks.

SOFM networks are typically arranged with the radial layer laid out in two dimensions. From an initially random set of centers, the algorithm tests each training case and selects the nearest center. This center and its neighbors are then updated to be more like the training case.

Over the course of the algorithm, the learning rate (which controls the degree of adaptation of the centers to the training cases) and the size of the neighborhood are gradually reduced.  In the early phases, therefore, the algorithm assigns a rough topological map, with similar clusters of cases located in certain areas of the radial layer. In later phases the topological map is fine-tuned, with individual units responding to small clusters of similar cases.

If the neighborhood is set to zero throughout, the algorithm is a simple cluster-assignment technique. It can also be used on a one-dimensional layer with or without neighborhood definition.

If class labels are available for the training cases, then after Kohonen training, labels can be assigned using class labeling algorithms, and Learning Vector Quantization can be used to improve the positions of the radial exemplars.

Technical Details. The Kohonen update rule for the winning unit and its neighbors is:

w(t+1) = w(t) + h(t) * (x - w(t))

where:

w(t) is the unit's center,

x is the training case,

h(t) is the learning rate.
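As a sketch of the training loop on a one-dimensional layer (the linear decay schedules for the learning rate and neighborhood radius are illustrative assumptions, not a prescribed schedule):

```python
import random

def kohonen_train(cases, n_units, epochs=100, h0=0.5, nbr0=None):
    """1-D Kohonen layer on scalar cases. Each epoch: find the winning
    (nearest) unit for each case and move it and its neighbors toward the
    case by w += h(t) * (x - w). h(t) and the neighborhood radius are
    gradually reduced, as described above."""
    rng = random.Random(0)
    centers = [rng.choice(cases) for _ in range(n_units)]  # random initial centers
    if nbr0 is None:
        nbr0 = n_units // 2
    for t in range(epochs):
        h = h0 * (1 - t / epochs)                # decaying learning rate
        radius = round(nbr0 * (1 - t / epochs))  # shrinking neighborhood
        for x in cases:
            win = min(range(n_units), key=lambda j: abs(centers[j] - x))
            for j in range(max(0, win - radius), min(n_units, win + radius + 1)):
                centers[j] += h * (x - centers[j])
    return centers
```

With the radius fixed at zero this reduces to the simple cluster-assignment technique mentioned above.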

Kohonen Networks. Neural networks based on the topological properties of the human brain, also known as self-organizing feature maps (SOFMs) (Kohonen, 1982; Fausett, 1994; Haykin, 1994; Patterson, 1996).

Kohonen Training. An algorithm which assigns cluster centers to a radial layer by iteratively submitting training patterns to the network, and adjusting the winning (nearest) radial unit center, and its neighbors, towards the training pattern (Kohonen, 1982; Fausett, 1994; Haykin, 1994; Patterson, 1996). See also, Neural Networks.

Kolmogorov-Smirnov Test. The Kolmogorov-Smirnov one-sample test for normality is based on the maximum difference between the sample cumulative distribution and the hypothesized cumulative distribution. If the D statistic is significant, then the hypothesis that the respective distribution is normal should be rejected. For many software programs, the probability values that are reported are based on those tabulated by Massey (1951); those probability values are valid when the mean and standard deviation of the normal distribution are known a priori and not estimated from the data. However, usually those parameters are computed from the actual data. In that case, the test for normality involves a complex conditional hypothesis ("how likely is it to obtain a D statistic of this magnitude or greater, contingent upon the mean and standard deviation computed from the data"), and the Lilliefors probabilities should be used instead (Lilliefors, 1967). Note that in recent years, the Shapiro-Wilk W test has become the preferred test of normality because of its good power properties as compared to a wide range of alternative tests.
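The D statistic itself is easy to compute when the mean and standard deviation are supplied a priori (as the Massey tables assume). A sketch, using only the standard library (function names are illustrative):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_d(sample, mu, sigma):
    """One-sample Kolmogorov-Smirnov D: the maximum gap between the sample
    cumulative distribution and the hypothesized normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mu, sigma)
        # the empirical CDF jumps at each point, so check the gap on both
        # sides of the step
        d = max(d, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return d
```

Converting D to a probability value still requires the Massey (or, when the parameters are estimated from the data, Lilliefors) tables.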

Kronecker Product. The Kronecker (direct) product of two matrices A, with p rows and q columns, and B, with m rows and n columns, is the matrix with pm rows and qn columns given by

A ⊗ B = [aij B]


Kronecker Product matrices have a number of useful properties (for a summary of these properties, see Hocking, 1985).
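As a sketch of the definition (numpy.kron performs the same operation), a plain-Python construction of the block matrix, where each element aij of A multiplies the whole of B:

```python
def kronecker(A, B):
    """Kronecker (direct) product of matrices given as lists of rows:
    a (p x q) A and an (m x n) B yield a (pm x qn) result."""
    p, q = len(A), len(A[0])
    m, n = len(B), len(B[0])
    # entry (i, j) of the result sits in block (i // m, j // n) of A,
    # at position (i % m, j % n) within the copy of B
    return [[A[i // m][j // n] * B[i % m][j % n] for j in range(q * n)]
            for i in range(p * m)]
```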

Kruskal-Wallis Test. The Kruskal-Wallis test is a non-parametric alternative to one-way (between-groups) ANOVA. It is used to compare three or more samples, and it tests the null hypothesis that the different samples in the comparison were drawn from the same distribution or from distributions with the same median. Thus, the interpretation of the Kruskal-Wallis test is basically similar to that of the parametric one-way ANOVA, except that it is based on ranks rather than means. For more details, see Siegel & Castellan, 1988. See also, Nonparametric Statistics.
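The rank-based logic can be sketched as follows (this computes the H statistic with mid-ranks for ties but without the tie correction factor; names are illustrative):

```python
def kruskal_wallis_h(*samples):
    """Kruskal-Wallis H: rank all observations together (ties get the
    average of their ranks), then
    H = 12 / (N * (N + 1)) * sum(Rj**2 / nj) - 3 * (N + 1),
    where Rj is the rank sum of group j and nj its size."""
    pooled = sorted(x for s in samples for x in s)
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    N = len(pooled)
    term = sum(sum(rank[x] for x in s) ** 2 / len(s) for s in samples)
    return 12 / (N * (N + 1)) * term - 3 * (N + 1)
```

Under the null hypothesis, H is referred to a chi-square distribution with (number of groups - 1) degrees of freedom for moderate sample sizes.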

Kurtosis. Kurtosis (the term first used by Pearson, 1905) measures the "peakedness" of a distribution. If the kurtosis is clearly different from 0, then the distribution is either flatter or more peaked than normal; the kurtosis of the normal distribution is 0. Kurtosis is computed as:

Kurtosis = [n*(n+1)*M4 - 3*M2*M2*(n-1)] / [(n-1)*(n-2)*(n-3)*σ⁴]

where:
Mj     is equal to: Σ(xi - Meanx)^j
n      is the valid number of cases
σ⁴     is the standard deviation (sigma) raised to the fourth power
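A direct transcription of the formula, with sigma taken as the sample standard deviation (n - 1 denominator):

```python
def kurtosis(xs):
    """Kurtosis from the moment sums Mj = sum((x - mean)**j), using the
    sample standard deviation for sigma."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m4 = sum((x - mean) ** 4 for x in xs)
    sigma4 = (m2 / (n - 1)) ** 2  # sigma**4: the sample variance squared
    return ((n * (n + 1) * m4 - 3 * m2 * m2 * (n - 1))
            / ((n - 1) * (n - 2) * (n - 3) * sigma4))
```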