Maximize K-Clustering Spacing: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
Line 12: Line 12:


=Algorithm=
=Algorithm=
The conceptual pseudocode to achieve k-clustering with maximum possible spacing is:
<font size=-1>
<font size=-1>
  initially each point is a separate cluster
  initially each point is a separate cluster
Line 18: Line 19:
   merge the clusters containing p and q into a single cluster
   merge the clusters containing p and q into a single cluster
</font>
</font>
This algorithm is a [[Clustering_Concepts#Single-Link_Clustering|single-link clustering]] algorithm. It is very similar in structure to the [[Kruskal's Algorithm|Kruskal's algorithm]].
This algorithm is a [[Clustering_Concepts#Single-Link_Clustering|single-link clustering]] algorithm, in that it refines the clustering one point at a time. It is very similar in structure to the [[Kruskal's Algorithm|Kruskal's algorithm]]. A practical implementation that sorts the distances in a pre-processing steps, in a similar manner to how the Kruskal's algorithm does it is:


=Correctness Proof=
=Correctness Proof=
{{External|https://www.coursera.org/learn/algorithms-greedy/lecture/7lWTf/correctness-of-clustering-algorithm}}
{{External|https://www.coursera.org/learn/algorithms-greedy/lecture/7lWTf/correctness-of-clustering-algorithm}}

Revision as of 04:02, 24 October 2021

External

Internal

Overview

This is a greedy algorithm that aims to maximize spacing between any two clusters. By spacing between two clusters we mean the minimum distance between any two separated points, which belong to two distinct clusters:

minseparated p, qd(p,q)

"Good" clustering means that all of the separated points should be as far apart as possible.

Algorithm

The conceptual pseudocode to achieve k-clustering with maximum possible spacing is:

initially each point is a separate cluster
repeat until only k clusters:
  let p,q = closest pair of separated points # determines the current spacing
  merge the clusters containing p and q into a single cluster

This algorithm is a single-link clustering algorithm, in that it refines the clustering one point at a time. It is very similar in structure to the Kruskal's algorithm. A practical implementation that sorts the distances in a pre-processing steps, in a similar manner to how the Kruskal's algorithm does it is:

Correctness Proof

https://www.coursera.org/learn/algorithms-greedy/lecture/7lWTf/correctness-of-clustering-algorithm