Re-posted from: https://bkamins.github.io/julialang/2024/05/31/waw2024.html
Introduction
Next week I am organizing the WAW2024 conference. The event covers various aspects of theoretical and applied modeling of networks.
As an introduction I want to run a simulation of an example problem. Consider a random graph in which each pair of nodes is connected by an edge with probability p. Next, assume that we repeatedly pick an edge uniformly at random from the graph and remove the two nodes forming this edge, marking them as matched, until no edges are left. The question is: what is the expected fraction of nodes that get matched by this process?
Today, I will investigate this problem using simulation.
The post was written under Julia 1.10.1, Graphs.jl 1.11.0, and DataFrames.jl 1.6.1.
The simulation
Here is a simulator of our greedy matching process. In the simulation we traverse all edges of the graph in a random order.
In the matched vector we keep track of which nodes have already been matched.
using Graphs
using Random
using Statistics
function run_sim(n::Integer, p::Real)
    g = erdos_renyi(n, p)        # Erdős–Rényi random graph on n nodes with edge probability p
    matched = fill(false, n)     # matched[i] is true if node i has already been matched
    for e in shuffle!(collect(edges(g)))      # traverse the edges in a random order
        n1, n2 = e.src, e.dst
        if !(matched[n1] || matched[n2])      # match the edge only if both endpoints are still free
            matched[n1] = true
            matched[n2] = true
        end
    end
    return mean(matched)         # fraction of matched nodes
end
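As a quick sanity check, here is a minimal usage sketch of a single call to run_sim on a smaller graph (the concrete arguments below are illustrative; the returned fraction is random, so exact values will vary between runs):

Random.seed!(1)       # make this single run reproducible
run_sim(1_000, 0.001) # fraction of nodes matched in one simulated graph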
The experiment
Let us now test our simulator for a graph on 10000 nodes and p varying from 0.00001 to 0.1 (on a logarithmic scale).
julia> using DataFrames
julia> df = DataFrame(p=Float64[], rep=Int[], res=Float64[])
0×3 DataFrame
Row │ p rep res
│ Float64 Int64 Float64
─────┴─────────────────────────
julia> ps = [10.0^i for i in -5:-1]
5-element Vector{Float64}:
1.0e-5
0.0001
0.001
0.010000000000000002
0.1
julia> Random.seed!(1234);
julia> @time for p in ps, rep in 1:16
           push!(df, (p, rep, run_sim(10_000, p)))
       end
79.190585 seconds (438.02 M allocations: 14.196 GiB, 7.13% gc time, 0.36% compilation time)
julia> df
80×3 DataFrame
Row │ p rep res
│ Float64 Int64 Float64
─────┼─────────────────────────
1 │ 1.0e-5 1 0.0948
2 │ 1.0e-5 2 0.094
3 │ 1.0e-5 3 0.097
4 │ 1.0e-5 4 0.0892
5 │ 1.0e-5 5 0.0848
6 │ 1.0e-5 6 0.093
⋮ │ ⋮ ⋮ ⋮
76 │ 0.1 12 0.9992
77 │ 0.1 13 0.999
78 │ 0.1 14 0.999
79 │ 0.1 15 0.999
80 │ 0.1 16 0.9988
69 rows omitted
The simulation took a bit over 1 minute, mainly due to the p=0.1 case, which generates a lot of edges in the graph.
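To see why, we can compute the expected number of edges for each value of p (a small illustrative helper, not part of the original code; in an Erdős–Rényi graph the expected number of edges is binomial(n, 2) * p):

# expected number of edges in an Erdős–Rényi graph on n nodes with edge probability p
expected_edges(n, p) = binomial(n, 2) * p
expected_edges.(10_000, ps) # from about 500 edges for p = 1.0e-5 up to about 5 million for p = 0.1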
Let us aggregate the obtained data to get the mean, the standard error, and the range of the results for each value of p:
julia> combine(groupby(df, "p"),
           "p" => (x -> 10_000 * first(x)) => "mean_degree",
           "res" => mean,
           "res" => (x -> std(x) / sqrt(length(x))) => "res_se",
           "res" => extrema)
5×5 DataFrame
Row │ p mean_degree res_mean res_se res_extrema
│ Float64 Float64 Float64 Float64 Tuple…
─────┼───────────────────────────────────────────────────────────────
1 │ 1.0e-5 0.1 0.090975 0.000842986 (0.0848, 0.097)
2 │ 0.0001 1.0 0.499888 0.00190971 (0.4848, 0.5134)
3 │ 0.001 10.0 0.909425 0.000523729 (0.9062, 0.9126)
4 │ 0.01 100.0 0.990162 0.000257694 (0.9888, 0.992)
5 │ 0.1 1000.0 0.999 5.47723e-5 (0.9986, 0.9994)
We can see that the sharp increase in the fraction of matched nodes happens around a mean degree of 1 in the graph. Additionally, we see that even for high p we do not match every node in the greedy matching process. Finally, the obtained results are relatively well concentrated around the mean.
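If you would like to see this transition visually, here is a minimal plotting sketch; it assumes the Plots.jl package, which is not used elsewhere in this post:

using Plots

# mean fraction of matched nodes per value of p, plotted against the mean degree 10_000 * p
agg = combine(groupby(df, "p"), "res" => mean => "res_mean")
plot(10_000 .* agg.p, agg.res_mean;
     xscale=:log10, xlabel="mean degree", ylabel="fraction of matched nodes",
     marker=:circle, legend=false)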
Conclusions
If you want to see how this problem can be solved analytically, I recommend reading this paper. Using the formulas derived there (asymptotically, the expected fraction of matched nodes is c / (c + 1), where c = n*p is the mean degree), we can compare our simulation results with the theory:
julia> (10_000 .* ps) ./ (10_000 .* ps .+ 1)
5-element Vector{Float64}:
0.09090909090909091
0.5
0.9090909090909091
0.9900990099009901
0.999000999000999
Indeed, we see that the match between the simulation and the theory is quite good.
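For a side-by-side comparison, here is a minimal sketch (reusing the df data frame from above) that attaches the asymptotic prediction as an extra column next to the simulated means:

# aggregate the simulation results and add the theoretical prediction c / (c + 1)
agg = combine(groupby(df, "p"), "res" => mean => "res_mean")
agg.theory = (10_000 .* agg.p) ./ (10_000 .* agg.p .+ 1)
agg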
If such problems are interesting to you, I invite you to join us during the WAW2024 conference.