Clustering Concepts

From NovaOrdis Knowledge Base

Revision as of 20:29, 23 October 2021

External

Internal

Overview

We talk about "clustering" when we have a set of n "points", which we may think of as points in space in the geometric sense. It is actually quite rare that the underlying problem is intrinsically geometric. Usually the points represent something else we care about (web pages, genome sequence fragments, etc.) and we want to group them into coherent clusters. In machine learning, the same problem is referred to as "unsupervised learning", meaning that the data is unlabeled and we are looking for patterns in data that has not been annotated.

Distance Function

Clustering problems require a similarity measure, which is a function that, for any two objects, returns a numerical result expressing how similar (or dissimilar) those objects are. We refer to this function as the distance function. The distance function d(p,q) returns the distance between the points p and q, and is defined for every pair of points.

The distance function is symmetric: d(p,q) = d(q,p).

Examples of distance functions: Euclidean distance between two points in space, the penalty of the best alignment between two genome fragments, etc.
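
As a minimal sketch of the first example, the Euclidean distance can be written as a small Python function. The name euclidean_distance and the representation of points as tuples of numbers are assumptions made for illustration; they do not come from the article.

```python
import math


def euclidean_distance(p, q):
    """Return the Euclidean distance between two points.

    Points are assumed to be equal-length sequences of numbers
    (a hypothetical representation chosen for this sketch).
    """
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


p = (0.0, 0.0)
q = (3.0, 4.0)

# Symmetry: d(p,q) = d(q,p)
print(euclidean_distance(p, q) == euclidean_distance(q, p))
```

Swapping the arguments only flips the sign of each coordinate difference, which squaring removes, so this particular distance function is symmetric by construction.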

The Clustering Problem

Organizatorium

  • Some well-known clustering algorithms are greedy.
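
One familiar greedy approach is single-link agglomerative clustering: start with every point in its own cluster and repeatedly merge the two closest clusters until k clusters remain. The sketch below is an illustration under stated assumptions, not a method taken from the article: the function name single_link_clustering, the parameter k, and the use of math.dist (Python 3.8+) as the distance function are all choices made here.

```python
import math
from itertools import combinations


def single_link_clustering(points, k, distance=math.dist):
    """Greedily merge the two closest clusters until k clusters remain.

    The distance between two clusters is taken to be the smallest
    distance between any pair of their points (single link).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Start with one singleton cluster per point.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest cluster pair so far
        for i, j in combinations(range(len(clusters)), 2):
            d = min(distance(p, q) for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # Greedy step: merge cluster j into cluster i (i < j always holds).
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(single_link_clustering(points, 2))
```

Each iteration commits to the locally best merge and never revisits it, which is what makes the procedure greedy. This straightforward version recomputes all pairwise cluster distances on every pass, so it is only suitable for small inputs.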