Part 1: Spectral clustering algorithms

The goal of this part is to implement different spectral clustering algorithms and to analyze their behavior on a few simple graphs.

Exercise 1

Reminder: The eigen() function gives the spectrum of a matrix.

  1. Write an R function that performs normalized spectral clustering. This function takes as argument an adjacency matrix and the desired number of clusters. It can display the eigenvalues of the Laplacian matrix \(L_N\) and returns the resulting clustering.
  2. Simulate a small graph whose communities you know and test your function. We can take for example a graph with several connected components, each being \(G(n,p\) with \(p\) large enough. Then in a second step we make this graph connected.
  3. Do the same for absolute spectral clustering based on \(L_\text{abs}\).

Exercise 2

Simulate different graphs and analyze the spectrum of the associated \(L_N\) and \(L_{\text{abs}}\) Laplacians. Does the eigengap method allow you to choose a reasonable number of clusters? Check which method finds the correct clustering. Besides, how do we compare the clusters obtained by two different classification algorithms? (We must pay attention to the problem of ``label switching’’).

Some ideas for graphs to simulate:

  • a graph \(G(n,p)\) (with a single connected component) (use the sample_gnp() function);
  • a graph with two, three or four communities (try sample_sbm());
  • a star graph;
  • a bi-part graph (complete between the two parts);
  • two stars connected by an edge.

Exercise 3

Take one of the real graphs seen in the previous practicals (friends or karate graph for example) and apply the clustering algorithms on this data. Interpret your results.

Part 2: Similarity graphs

In this part, we analyze the effect of the choice of the similarity graph as well as the effect of the choice of the method on the clustering obtained.

We simulate a dataset of \(n = 100\) points \(x_i\) in \(\mathbb R^2\) by the following instructions:

set.seed(111)
library(mlbench)
n <- 100
simu <- mlbench.spirals(100, 1, 0.025)
plot(simu)

We obtain a data set made up of two nested spirals of points. The challenge is to find the two groups.

  1. To begin, apply the k-means algorithm directly to this data to make a clustering into two groups. Visualize the clustering by representing the points in color. Comment on the result.

  2. Calculate Gaussian similarities \(s_{ij}=\exp\{-\|x_i-x_j\|^2/(2\sigma^2)\}\) with \(\sigma=1\).

    • Apply the normalized spectral clustering algorithm on this dense graph to search for two classes. Analyze the normalized Laplacian spectrum and visualize the clustering obtained.

    • Draw the cloud of the new points on which \(k\)-means is applied (these are the lines of the matrix T made up of the first two normalized eigenvectors) and add the initial classes in color. Interpret the results.

    • Apply the absolute spectral clustering algorithm with two classes. Analyze the spectrum and visualize the clustering obtained.

  3. Do the same for the following similarity graphs:

    • We set \(\varepsilon=q_{0.75}\) the quantile at 75% of the similarities \(\{s_{ij}\}_{i<j}\). Construct the \(\varepsilon\)-neighborhood graph of the points \(\{x_i\}\).
    • We set \(\varepsilon=q_{0.95}\) the quantile at \(95\%\) of the similarities \(\{s_{ij}\}_{i<j}\). Construct the \(\varepsilon\)-neighborhood graph of the points \(\{x_i\}\).
    • We set \(p=2\lfloor \log(n)\rfloor\). Construct the valued graph of the \(p\) nearest mutual neighbors of the points \(\{x_i\}\).
    • We set \(p=2\). Construct the valued graph of the \(p\) nearest mutual neighbors of the points \(\{x_i\}\).
    • We set \(p=2\lfloor \log(n)\rfloor\). Construct the valued graph of the \(p\) nearest simple neighbors of the points \(\{x_i\}\).
    • We set \(p=2\). Construct the valued graph of the \(p\) nearest simple neighbors of the points \(\{x_i\}\).