Update app.py
app.py CHANGED
@@ -56,6 +56,16 @@ title = '# Agglomerative Clustering on MNIST'
 
 description = """
 An illustration of various linkage option for [agglomerative clustering](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html) on the digits dataset.
+
+The goal of this example is to show intuitively how the metrics behave, and not to find good clusters for the digits.
+
+What this example shows us is the behavior of "rich getting richer" in agglomerative clustering, which tends to create uneven cluster sizes.
+
+This behavior is pronounced for the average linkage strategy, which ends up with a couple of clusters having few data points.
+
+The case of single linkage is even more pathological, with a very large cluster covering most digits, an intermediate-sized (clean) cluster with mostly zero digits, and all other clusters being drawn from noise points around the fringes.
+
+The other linkage strategies lead to more evenly distributed clusters, which are therefore likely to be less sensitive to random resampling of the dataset.
 """
 
 author = '''
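The added description summarizes how the different linkage strategies behave on the digits data. Below is a minimal sketch of how one might reproduce that comparison with scikit-learn's AgglomerativeClustering; this is not the actual app.py code, just an illustration of the behavior the new text refers to.

```python
# Illustrative sketch only (not the app's code): compare linkage strategies
# for agglomerative clustering on the digits dataset and inspect cluster
# sizes, which is where the "rich getting richer" effect shows up.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

for linkage in ("ward", "average", "complete", "single"):
    model = AgglomerativeClustering(linkage=linkage, n_clusters=10)
    labels = model.fit_predict(X)
    sizes = np.bincount(labels, minlength=10)
    # Average and especially single linkage tend to produce very uneven
    # cluster sizes; ward and complete linkage are more balanced.
    print(f"{linkage:>8}: cluster sizes = {sorted(sizes.tolist(), reverse=True)}")
```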