Statistical Advisor, Searching for Clusters or Natural Groups

Use the following to explore data and search for structure/patterns/factors/clusters.

Cluster Analysis

The Cluster Analysis chapter discusses techniques for performing tree clustering (joining) based on various distance measures and amalgamation (linkage) rules, k-means clustering, and two-way joining. A related procedure is discussed in the Classification Trees chapter.

In tree clustering (joining), 'objects' are linked together in successive steps, yielding a tree that ultimately joins all objects. The term 'objects' here refers to cases or variables; either can be analyzed with Cluster Analysis. The final tree diagram may reveal clear branches or groupings of objects that are more similar to one another than to objects outside the branch or grouping. Thus, some natural structure among the objects may emerge from the joining analysis.
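The successive joining steps can be illustrated with a minimal sketch in plain Python. It assumes Euclidean distance and the single-linkage (nearest neighbor) amalgamation rule; both are choices made here for illustration, as is the function name `single_linkage` and the toy data. It is not a description of how any particular program implements joining.

```python
from itertools import combinations
from math import dist


def single_linkage(points):
    """Join clusters step by step and return the list of merges (the tree)."""
    # Each cluster is a tuple of object indices; start with one cluster per object.
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(pair):
        # Single linkage (assumed rule): the distance between two clusters
        # is the distance between their two closest members.
        a, b = pair
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters and join them.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Hypothetical data: two tight pairs of objects and one outlying object.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
merges = single_linkage(points)
for a, b, d in merges:
    print(f"join {a} and {b} at distance {d:.2f}")
```

Each entry in `merges` records which two clusters were joined and at what distance. Joins that occur at small distances followed by a jump to much larger distances correspond to the clear branches described above.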

In k-means clustering, one specifies a priori how many clusters to expect. The program then tries to find the best division of objects into the requested number of clusters. Various statistics are provided to help decide whether an adequate clustering was achieved, that is, whether the objects within each cluster are indeed more similar to each other than to objects in different clusters.
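As a rough illustration of this procedure, the sketch below uses the standard assign-and-update iteration (often called Lloyd's algorithm) in plain Python. The Euclidean distance, the caller-supplied starting centers, the function name `k_means`, and the within-cluster sum of squares as the adequacy statistic are all assumptions made for the example; actual programs may initialize differently and report other statistics.

```python
from math import dist


def k_means(points, centers, max_iter=100):
    """Alternate between assigning objects and updating cluster centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # centers stopped moving, so assignments are stable
        centers = new_centers
    # Within-cluster sum of squares: smaller values mean tighter clusters.
    wcss = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, wcss


# Hypothetical data: two well-separated groups of three objects, with k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels, centers, wcss = k_means(points, centers=[points[0], points[3]])
print(labels, centers, round(wcss, 3))
```

Running the sketch with different numbers of starting centers and comparing the resulting `wcss` values is one simple way to judge whether the requested number of clusters fits the data.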

In two-way joining, both cases and variables are clustered simultaneously. Specifically, the program will attempt to form clusters of similar data points (values).