14.7. "Clusterization" analysis type
Cluster analysis is a mathematical procedure for multidimensional analysis that groups objects into clusters based on a set of indicators characterizing those objects. Objects are grouped so that those within one cluster are more similar to each other than to objects in other clusters.
This analysis is based on calculating the distance between objects; the objects are then grouped into clusters according to these distances. The distance can be determined in different ways (according to different metrics). The following metrics are available:
- Euclidean metric
- Squared Euclidean metric
- City block metric
- Maximum metric
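As an illustration only, the four metrics can be sketched in Python using their usual textbook definitions. The function names and the sample points are hypothetical, and the sketch assumes that the platform's metrics follow these standard definitions, which this section does not state explicitly.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance: square root of the sum of squared differences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Sum of squared differences (no square root), so large gaps weigh more."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def city_block(a, b):
    """Sum of absolute differences along each indicator."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a, b):
    """Largest absolute difference over all indicators."""
    return max(abs(x - y) for x, y in zip(a, b))


# Two hypothetical objects, each described by two numeric indicators.
p, q = (1, 0), (4, 4)

print(euclidean(p, q))          # 5.0
print(squared_euclidean(p, q))  # 25
print(city_block(p, q))         # 7
print(maximum(p, q))            # 4
```

The same pair of objects is "closer" or "farther" depending on the metric, which is why the choice of metric can change the resulting clusters.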
After determining the distances between objects, one of several algorithms for distributing objects among clusters can be used. The following clustering methods are available:
- Nearest neighbor
- Furthest neighbor
- K-means
- Centroid
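The difference between these methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the platform's implementation: it assumes that "nearest neighbor" and "furthest neighbor" measure the distance between two clusters by their closest and farthest pair of objects, that "centroid" compares cluster centers, and that K-means alternates between assigning objects to the nearest center and recomputing the centers. All function names, the sample points, and the initial centers are hypothetical.

```python
from itertools import product
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_neighbor(c1, c2):
    """Cluster distance = distance between the closest pair of objects."""
    return min(dist(a, b) for a, b in product(c1, c2))


def furthest_neighbor(c1, c2):
    """Cluster distance = distance between the farthest pair of objects."""
    return max(dist(a, b) for a, b in product(c1, c2))


def centroid(cluster):
    """Center of a cluster: the mean of each indicator."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))


def centroid_distance(c1, c2):
    """Cluster distance = distance between the cluster centers."""
    return dist(centroid(c1), centroid(c2))


def k_means(points, centers, iterations=10):
    """Repeatedly assign each point to its nearest center, then move centers."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            groups[idx].append(p)
        # Keep the old center if no point was assigned to it.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


left = [(0, 0), (0, 2)]
right = [(5, 0), (5, 2), (8, 1)]

print(nearest_neighbor(left, right))   # 5.0
print(furthest_neighbor(left, right))  # sqrt(65), about 8.06
print(centroid_distance(left, right))  # 6.0

centers, groups = k_means(left + right, centers=[(0, 0), (8, 1)])
print(centers)  # [(0.0, 1.0), (6.0, 1.0)]
```

The first three functions only define how far apart two existing clusters are, whereas the K-means sketch builds the clusters directly from a chosen number of starting centers.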
Schematically, the functionality of cluster analysis can be presented as follows:

Fig. 468. Cluster analysis layout
The data source is passed to the DataAnalysis object. The data source can be the result of a query, a value table, or a cell area of a spreadsheet document. Source columns are defined as input or unused. Note that all possible column types are contained in the DataAnalysisColumnTypeClusterization system enumeration. This enumeration contains other values besides unused and input, but those values are used when building forecasts.
The analysis is performed in accordance with the set analysis parameters.
We will use the following code fragment as an example illustrating the capability of cluster analysis:
&AtClient
Procedure ClusterAnalysis(Command)
    Result = AnalysisClusterization();
EndProcedure

&AtServerNoContext
Function AnalysisClusterization()
    Analysis = New DataAnalysis;
    Analysis.AnalysisType = Type("DataAnalysisClusterization");

    // Data source: items of the "Legal entities" group.
    Group = Catalogs.Counterparties.FindByDescription("Legal entities");
    Query = New Query;
    Query.Text = "
    |SELECT
    |Counterparties.Ref,
    |Counterparties.RetailShopsCount,
    |Counterparties.VehiclesCount,
    |Counterparties.CompanyOperationTime,
    |Counterparties.ContractSigningTime,
    |Counterparties.ContractType,
    |Counterparties.RelationsTermination
    |FROM
    |Catalog.Counterparties AS Counterparties
    |WHERE
    |(Not Counterparties.IsFolder AND Counterparties.Parent = &Parent)";
    Query.SetParameter("Parent", Group);
    Analysis.DataSource = Query.Execute();

    // Selecting metric.
    Analysis.Parameters.DistanceMeasure.Value = DataAnalysisDistanceMetricType.SquaredEuclidean;
    // Selecting clusterization method.
    Analysis.Parameters.ClusterizationMethod.Value = ClusterizationMethod.KMeans;

    AnalysisResult = Analysis.Execute();

    // Output the analysis result to a spreadsheet document.
    Builder = New DataAnalysisReportBuilder();
    Builder.Template = Undefined;
    Builder.AnalysisType = Type("DataAnalysisClusterization");
    SpreadsheetDoc = New SpreadsheetDocument;
    Builder.Output(AnalysisResult, SpreadsheetDoc);
    Return SpreadsheetDoc;
EndFunction
The query is run against the Counterparties catalog. According to the query condition, only catalog items (not folders) from the Legal entities group are selected.
Executing the above code defines the following initial data analysis settings. Some of them are set explicitly, and the rest take their default values:

Fig. 469. Analysis parameters
The set of columns is determined by the query selection fields. By default, all columns have equal weight. Columns of the Number and Date types are treated as Continuous data; columns of other types are treated as Discrete data. If you need to change the column parameters, you can do so by analogy with the fragment below:
Analysis.ColumnsSetting.VehiclesCount.AdditionalParameters.Weight = 2;
In this line, the weight is increased for the VehiclesCount column.
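This section does not say how the platform applies a column weight internally. As a rough sketch, assuming the weight simply scales a column's contribution to the squared Euclidean distance, its effect could be illustrated in Python as follows. The function name is hypothetical, and the sample values are the (retail shops, vehicles) pairs of two counterparties from the sample data in this section.

```python
def weighted_squared_euclidean(a, b, weights):
    """Squared Euclidean distance with a per-column weight (assumed scheme)."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


# (number of retail shops, number of vehicles)
smith = (1, 0)
gross = (3, 2)

# Equal weights: both columns contribute equally.
print(weighted_squared_euclidean(smith, gross, (1, 1)))  # 8

# Weight 2 for the vehicles column: its difference counts twice as much.
print(weighted_squared_euclidean(smith, gross, (1, 2)))  # 12
```

Under this assumption, raising a column's weight makes differences in that column matter more when objects are assigned to clusters.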
The data selected for the analysis is as follows:
| Counterparty | Number of retail shops | Number of vehicles | Company operation time | Contract signing time | Contract type | Relations condition |
|---|---|---|---|---|---|---|
| Smith CJSC | 1 | 0 | Less than a year | Less than a year | Dealer | Contract violation |
| Furniture CJSC | 15 | 4 | From three to ten years | Less than a year | Distributor | Terminated by counterparty |
| Furniture CJSC | 1 | 10 | From three to ten years | From one to three years | Distributor | Terminated by counterparty |
| Forest LLC | 1 | 1 | From one to three years | Less than a year | Dealer | Terminated by counterparty |
| Shop No. 15 | 1 | 1 | Over ten years | From three to ten years | Permanent partner | Not terminated |
| Gross LLC | 3 | 2 | Less than a year | Less than a year | Permanent partner | Not terminated |
| Consultant LLC | 7 | 3 | From three to ten years | From one to three years | Permanent partner | Terminated by counterparty |
| Trust LLC | 2 | 2 | Over ten years | From three to ten years | Permanent partner | Not terminated |
| Individual Entrepreneur Taylor | 0 | 1 | Less than a year | Less than a year | Dealer | Not terminated |
The analysis produces the following result:

Fig. 470. Cluster analysis result
Note that the analysis returns data on the clusters found (their number, centers, and the distances between them), but not on which objects (in our case, counterparties) belong to which cluster. This is the behavior when the analysis parameters (namely, the TableFillType parameter) are not set explicitly.
To see how objects are distributed among clusters, add the following line of code before executing the analysis (but after setting its type):
Analysis.Parameters.TableFillType.Value = DataAnalysisResultTableFillType.UsedFields;