The goal of this part is to implement different spectral clustering algorithms and to analyze their behavior on a few simple graphs.
Reminder: The eigen()
function gives the spectrum of a
matrix.
Simulate different graphs and analyze the spectrum of the associated \(L_N\) and \(L_{\text{abs}}\) Laplacians. Does the eigengap method allow you to choose a reasonable number of clusters? Check which method finds the correct clustering. Besides, how do we compare the clusters obtained by two different classification algorithms? (We must pay attention to the problem of ``label switching’’).
Some ideas for graphs to simulate:
sample_gnp()
function);sample_sbm()
);Take one of the real graphs seen in the previous practicals (friends or karate graph for example) and apply the clustering algorithms on this data. Interpret your results.
In this part, we analyze the effect of the choice of the similarity graph as well as the effect of the choice of the method on the clustering obtained.
We simulate a dataset of \(n = 100\) points \(x_i\) in \(\mathbb R^2\) by the following instructions:
set.seed(111)
library(mlbench)
n <- 100
simu <- mlbench.spirals(100, 1, 0.025)
plot(simu)
We obtain a data set made up of two nested spirals of points. The challenge is to find the two groups.
To begin, apply the k-means algorithm directly to this data to make a clustering into two groups. Visualize the clustering by representing the points in color. Comment on the result.
Calculate Gaussian similarities \(s_{ij}=\exp\{-\|x_i-x_j\|^2/(2\sigma^2)\}\) with \(\sigma=1\).
Apply the normalized spectral clustering algorithm on this dense graph to search for two classes. Analyze the normalized Laplacian spectrum and visualize the clustering obtained.
Draw the cloud of the new points on which \(k\)-means is applied (these are the lines of the matrix T made up of the first two normalized eigenvectors) and add the initial classes in color. Interpret the results.
Apply the absolute spectral clustering algorithm with two classes. Analyze the spectrum and visualize the clustering obtained.
Do the same for the following similarity graphs: