14.7.2. Clustering methods

The clustering method option determines the principle by which an object is assigned to one group or another, that is, the algorithm used to form the clusters.

The goal of any clustering algorithm is to:

Minimize the variability within clusters
Maximize the variability between clusters
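The two goals can be made concrete with a small sketch. It assumes that "variability" is measured as a sum of squared distances, which the text does not specify, and it uses hypothetical coordinates for objects 1 to 6 because the positions in the figure are not given numerically.

```python
def centroid(points):
    """Mean position of a list of (x, y) points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_between(clusters):
    """Return (within-cluster, between-cluster) sums of squared distances."""
    all_points = [p for cluster in clusters for p in cluster]
    overall = centroid(all_points)
    within = sum(squared_distance(p, centroid(c)) for c in clusters for p in c)
    between = sum(len(c) * squared_distance(centroid(c), overall) for c in clusters)
    return within, between

# Hypothetical coordinates (assumption, not taken from the figure).
first_group = [(0, 0), (1, 1), (2, 0)]     # objects 1, 2, 3
second_group = [(6, 0), (8, 1), (7, -1)]   # objects 4, 5, 6

within, between = within_between([first_group, second_group])
print(within, between)
```

A good partition gives a small first value and a large second value; regrouping the same objects badly would raise the first and lower the second by the same amount.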

The differences between the methods are illustrated using the objects shown in the figure (see fig. 472).

Assume that the objects form two groups. The first one consists of objects 1, 2 and 3. The second group consists of objects 4, 5 and 6.

Fig. 472. Object groups

14.7.2.1. Nearest neighbor

In this clustering method, an object joins the group whose nearest member is closest to it.

In this example, object 7 joins the group that contains object 4. Of the two groups, the objects closest to object 7 are objects 3 and 4, and the distance to object 4 is the smaller of the two.
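The nearest neighbor rule can be sketched as follows. The coordinates and the group names "first" and "second" are hypothetical, chosen only so that the result agrees with the example; Euclidean distance is also an assumption.

```python
import math

# Hypothetical coordinates (assumption, not taken from the figure).
groups = {
    "first": {1: (0, 0), 2: (1, 1), 3: (2, 0)},
    "second": {4: (6, 0), 5: (8, 1), 6: (7, -1)},
}
object_7 = (5, 0)

def nearest_neighbor_group(point, groups):
    """Return (group name, member label, distance) for the closest member overall."""
    candidates = [
        (math.dist(point, position), name, label)
        for name, members in groups.items()
        for label, position in members.items()
    ]
    distance, name, label = min(candidates)
    return name, label, distance

group, neighbor, distance = nearest_neighbor_group(object_7, groups)
print(group, neighbor, distance)  # second 4 1.0
```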

14.7.2.2. Furthest neighbor

In this clustering method, an object joins the group whose furthest member is closest to it.

In this example, object 7 joins the group that contains object 5. The objects furthest from object 7 in the two groups are objects 1 and 5, and the distance to object 5 is the smaller of the two.
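A minimal sketch of the furthest neighbor rule, again with hypothetical coordinates and group names and with Euclidean distance assumed: for each group, take the distance to its most remote member, then pick the group where that distance is smallest.

```python
import math

# Hypothetical coordinates (assumption, not taken from the figure).
groups = {
    "first": {1: (0, 0), 2: (1, 1), 3: (2, 0)},
    "second": {4: (6, 0), 5: (8, 1), 6: (7, -1)},
}
object_7 = (5, 0)

def furthest_distance(point, members):
    """Distance from point to the most remote member of one group."""
    return max(math.dist(point, position) for position in members.values())

def furthest_neighbor_group(point, groups):
    """Name of the group whose most remote member is closest to point."""
    return min(groups, key=lambda name: furthest_distance(point, groups[name]))

chosen = furthest_neighbor_group(object_7, groups)
print(chosen)  # second
```

With these coordinates the most remote members are object 1 (distance 5.0) and object 5 (about 3.16), so the second group is chosen.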

14.7.2.3. Centroid

In this clustering method, an object joins the group whose center of gravity is closest to it:

Fig. 473. Object groups

In the example in the figure, object 7 is added to the group containing objects 4, 5 and 6, because the distance to that group's center of gravity (a hypothetical object whose attribute values are the averages over the group) is the smallest.
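The centroid rule can be sketched in the same way. The coordinates and group names are hypothetical, and the center of gravity is assumed to be the plain arithmetic mean of the members' attribute values.

```python
import math

# Hypothetical coordinates (assumption, not taken from the figure).
groups = {
    "first": {1: (0, 0), 2: (1, 1), 3: (2, 0)},
    "second": {4: (6, 0), 5: (8, 1), 6: (7, -1)},
}
object_7 = (5, 0)

def center_of_gravity(members):
    """Average attribute values of a group, as a point."""
    positions = list(members.values())
    return tuple(sum(axis) / len(positions) for axis in zip(*positions))

def centroid_group(point, groups):
    """Name of the group whose center of gravity is closest to point."""
    return min(
        groups,
        key=lambda name: math.dist(point, center_of_gravity(groups[name])),
    )

chosen = centroid_group(object_7, groups)
print(chosen, center_of_gravity(groups[chosen]))  # second (7.0, 0.0)
```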

14.7.2.4. K-means

In this method, the first objects in the sample are taken as the initial cluster centers. Each subsequent object is then assigned to the cluster whose center is closest to it, and the center of the cluster that received the object is recalculated.

This is repeated until all the objects have been processed. A new pass over the objects then begins (starting with the first one). Passes are repeated for as long as the cluster centers keep changing:

Fig. 474. Objects location example

Suppose that objects 1 and 2 are arbitrarily chosen as the cluster centers. Object 3 is added to the cluster whose center is object 1. The center of the first cluster is recalculated (it now lies between objects 1 and 3). Object 4 is added to the second cluster (its center is also recalculated).

After all the analyzed objects have been processed, objects 1 and 3 belong to the first cluster, and the other objects belong to the second cluster (its center might lie inside the triangle formed by objects 4, 7 and 6).

The objects are then processed again and distributed among the clusters (relative to the continuously recalculated cluster centers).

At around the third pass over the objects, object 2, which was originally the center of the second cluster, will most likely be reassigned to the first cluster.

At the end of the algorithm, objects 1, 2, 3 will be assigned to the first cluster. Objects 4, 5, 6, 7 will be assigned to the second cluster.
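The pass-based procedure described above can be sketched as follows. This is an illustration under assumptions, not the product's implementation: the coordinates are hypothetical, distances are Euclidean, a center is the mean of its cluster's members, and the function name and the max_passes safety limit are invented for the example.

```python
import math

def sequential_k_means(points, k, max_passes=100):
    """points: dict label -> (x, y). The first k objects seed the centers."""
    labels = list(points)
    centers = [points[label] for label in labels[:k]]
    assignment = {label: i for i, label in enumerate(labels[:k])}

    def recompute(cluster):
        members = [points[l] for l, c in assignment.items() if c == cluster]
        if members:  # keep the old center if a cluster becomes empty
            centers[cluster] = tuple(sum(a) / len(members) for a in zip(*members))

    for _ in range(max_passes):
        changed = False
        for label in labels:  # one pass over all objects, in order
            nearest = min(range(k), key=lambda c: math.dist(points[label], centers[c]))
            previous = assignment.get(label)
            if nearest != previous:
                assignment[label] = nearest
                recompute(nearest)          # center of the receiving cluster
                if previous is not None:
                    recompute(previous)     # center of the cluster that lost it
                changed = True
        if not changed:  # centers did not move during this pass
            break
    return assignment, centers

# Hypothetical coordinates (assumption, not taken from the figure).
points = {1: (0, 0), 2: (3, 2), 3: (0, 2), 4: (6, 2), 5: (8, 3), 6: (8, 0), 7: (6, 0)}
assignment, centers = sequential_k_means(points, k=2)
print(assignment)  # {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
```

With these coordinates the first pass leaves objects 1 and 3 in the first cluster and all others in the second, and a later pass moves object 2 to the first cluster. The exact pass on which that happens depends on the coordinates.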

14.7.2.5. Data output to a dendrogram

When an algorithm other than k-means is used, the results of cluster analysis are output as a dendrogram (the analysis algorithm must support outputting the distribution of the analyzed objects among clusters):

Fig. 475. Dendrogram
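A dendrogram shows the order in which groups are joined and the distance at which each join happens. The sketch below builds that merge history, assuming bottom-up merging with the nearest neighbor rule and Euclidean distance; the text does not state how the dendrogram is actually built, and the coordinates and function name are hypothetical.

```python
import math
from itertools import combinations

def merge_history(points):
    """Repeatedly merge the two closest clusters; return the list of merges."""
    clusters = [frozenset([label]) for label in points]
    history = []

    def linkage(a, b):
        # Nearest neighbor distance between two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        history.append((sorted(a), sorted(b), round(linkage(a, b), 2)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history

# Hypothetical coordinates (assumption, not taken from the figure).
points = {1: (0, 0), 2: (1, 1), 3: (2, 0), 4: (6, 0), 5: (8, 1), 6: (7, -1), 7: (5, 0)}
history = merge_history(points)
for left, right, distance in history:
    print(left, "+", right, "at distance", distance)
```

Each printed line corresponds to one junction of the dendrogram, and the distance is the height at which the two branches meet. With these coordinates the last merge joins objects 1, 2, 3 with objects 4, 5, 6, 7.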
