Simple models and hypothesis testing

In this session we will use very simple random graph models (Erdös-Rényi, configuration model) to test hypotheses about graphs.

1. Adequacy of the Erdös-Rényi model

We start by asking the question of whether the Erdös-Rényi model is indeed appropriate for describing graphs.

We recall that the term ‘Erdös-Rényi model’ actually covers two slightly different versions of the same principle, namely the \(G(n,p)\) model which considers the set of graphs of \(n\) nodes in which each edge appears iid according to a law \(B(p)\) and the model \(G(n,M)\) which considers the set of graphs with \(n\) nodes and \(M\) edges provided with the uniform law. Here, we focus on the \(G(n,p)\) model.

Model \(G(n,p)\)

The sample_gnp() function of the igraph package allows you to simulate an Erdös-Rényi graph. We give as parameters the number \(n\) of nodes and the probability \(p\) of connection between two nodes.

Let’s simulate two different graphs: one weakly connected, and one very connected:

set.seed(1)
n <- 40
p1 <- 0.1
p2 <- 0.7
G1 <- sample_gnp(n, p1)
G2 <- sample_gnp(n, p2)
par(mfrow = c(1, 2))
plot(G1)
plot(G2)

Exercise 1

[Without R] Give an estimator \(\hat p\) of \(p\) in the model \(G(n,p)\), and recall the marginal distribution of the degree \(D_i\) of the node \(i\) in \(G(n,p)\).
Simulate graphs \(G(n,p)\) for different values of \(n\) and \(p\). For each simulated graph, evaluate the estimator \(\hat p\) and compare to \(p\). Additionally, plot the empirical degree distribution of the nodes and superimpose the expected degree distribution under the \(G(n,p)\) model. What do you observe?

Exercise 2

Take one of the real graphs analyzed during the last practical work. What is the estimated value of \(p\)? Plot the empirical distribution of the degrees of the nodes and superimpose the expected degree distribution under the model \(G(n,p)\). What do you notice?

2. Configuration models

This time we consider the family of configuration models.

Exercise 3

We start by fitting a power law to the graph used in the previous exercise. To do this, use the ‘fit_power_law’ function (start by reading the help to understand how it works). Then plot the empirical degree distribution and superimpose the power law with the estimated parameter. Pay attention to the normalization constant!

Exercise 4

We consider the random-degree model \(RD(\underline{d})\) where \(\underline{d}\) is the sequence of degrees of the observed graph. Compare the observed value of the degree \(d_i\) and the average value \(\mathbb{E}(D_i)\) under the model \(RD(\underline{d})\) (for different choices of the multiplicative constant).

3. Tests

We will use configuration models to implement simple tests on the statistics of a graph.

Exercise 4

Write two graph simulation functions under the fixed-degree model \(FD(\underline d)\) with matching and re-branching algorithms.
Take one of your real examples: use the two functions to generate graphs that have the same sequence of degrees as your real graph. We can compare the number of triangles in the two graphs, or any other statistic of your choice. Comment.
We want to test whether the number of occurrences of the triangle in the observed graph is significantly too large or too small compared to the expected value under the null model \(FD(\underline d)\). We use Monte Carlo simulations for this. More precisely, we generate a large number of graphs having the same sequence of degrees and we count the number of triangles for each of them. To generate these graphs, is it reasonable to use the matching and re-branching algorithms that you coded? If not use directly the igraph package. Draw a histogram of this distribution. What do you think of the value observed in your actual graph?

Simple models and hypothesis testing

Catherine Matias

October 2023

1. Adequacy of the Erdös-Rényi model

Model \(G(n,p)\)

Exercise 1

Exercise 2

2. Configuration models

Exercise 3

Exercise 4

3. Tests

Exercise 4