The need for rand speed

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2020/11/20/rand.html

Introduction

Very often when I answer questions on Stack Overflow I learn something
new. Recently when discussing random number generation in this post
I have made an answer using a practice I knew worked from my experience,
but it turned out that I did not really understand why (and thanks to
rafak for a great comment).

Let us start with the conclusion from the discussion and then I will expand on it:

Always explicitly pass random number generator to the rand function in
performance-critical code.

Let us first see a simple example of this rule at work and next try to
understand the reason for this recommendation.

Estimating \(\pi\) using Monte Carlo simulation

Let us write a simple function that approximates \(\pi\) using Monte Carlo
simulation and uses the default global pseudo-random number generator.

function pi_global(n::Int)
    s = 0
    for _ in 1:n
        s += rand()^2 + rand()^2 < 1
    end
    return 4 * s / n
end

The code takes advantage from a well known fact that if we sample a point
\((x,y)\) uniformly from \([0,1,]^2\) square the probability that \(x^2+y^2\) is
less than \(1\) is equal to \(\pi/4\).

We check the runtime of this code:

julia> @time pi_global(10^9)
  6.998930 seconds (19 allocations: 20.188 KiB)
3.141615124

julia> @time pi_global(10^9)
  7.002321 seconds
3.141527116

as you can see on my laptop it is around 7 seconds.

Now let us write a function that takes a MersenneTwister
generator (this is the default pseudo-random number generator in Julia).

using Random

function pi_local(n::Int, rng::MersenneTwister)
    s = 0
    for _ in 1:n
        s += rand(rng)^2 + rand(rng)^2 < 1
    end
    return 4 * s / n
end

Here is its timing:

julia> mt = MersenneTwister();

julia> @time pi_local(10^9, mt)
  2.723634 seconds
3.141526412

julia> @time pi_local(10^9, mt)
  2.734530 seconds
3.141671232

Wow! I would not have expected this.

Now let me reveal that I am on Julia 1.5.3. Interestingly, when I built my
habits of working with rand it was Julia 1.0 time. Let us check these codes on
Julia 1.0.5 (that soon will stop being supported). Here are the results:

julia> function pi_global(n::Int)
           s = 0
           for _ in 1:n
               s += rand()^2 + rand()^2 < 1
           end
           return 4 * s / n
       end
pi_global (generic function with 1 method)

julia> @time pi_global(10^9)
  2.939260 seconds (44.35 k allocations: 2.366 MiB)
3.141632964

julia> @time pi_global(10^9)
  2.891349 seconds (6 allocations: 192 bytes)
3.14153098

julia> using Random

julia> function pi_local(n::Int, rng::MersenneTwister)
           s = 0
           for _ in 1:n
               s += rand(rng)^2 + rand(rng)^2 < 1
           end
           return 4 * s / n
       end
pi_local (generic function with 1 method)

julia> mt = MersenneTwister();

julia> @time pi_local(10^9, mt)
  3.129134 seconds (30.73 k allocations: 1.574 MiB)
3.141618824

julia> @time pi_local(10^9, mt)
  3.115317 seconds (6 allocations: 192 bytes)
3.141620408

We see that there is a huge regression in the performance of rand() between
versions of Julia. Let us understand what is the reason for this.

Digging down the rand() implementation

We switch back to Julia 1.5.3 and will stick to it till the end of this post.

First we do a quick benchmark (I am using the same Julia 1.5.3. session as
above):

julia> using BenchmarkTools

julia> @btime rand()
  4.784 ns (0 allocations: 0 bytes)
0.40836802665975824

julia> @btime rand($mt)
  2.890 ns (0 allocations: 0 bytes)
0.23541608567839556

There is a significant difference in performance indeed. So what does rand()
do that costs so much? Let us see the definition of relevant method for rand
(it is easy to get it by writing @edit rand()):

rand(rng::AbstractRNG=default_rng(), ::Type{X}=Float64) where {X} =
    rand(rng, Sampler(rng, X, Val(1)))

We can see that the only difference between rand() and rand(mt) is that the
former calls default_rng() function (it is from the Random module).

In a similar way as above we dig down to the relevant definition:

const THREAD_RNGs = MersenneTwister[]
@inline default_rng() = default_rng(Threads.threadid())
@noinline function default_rng(tid::Int)
    0 < tid <= length(THREAD_RNGs) || _rng_length_assert()
    if @inbounds isassigned(THREAD_RNGs, tid)
        @inbounds MT = THREAD_RNGs[tid]
    else
        MT = MersenneTwister()
        @inbounds THREAD_RNGs[tid] = MT
    end
    return MT
end
@noinline _rng_length_assert() =  @assert false "0 < tid <= length(THREAD_RNGs)"

function __init__()
    resize!(empty!(THREAD_RNGs), Threads.nthreads()) # ensures that we didn't save a bad object
end

And now we see the reason. In Julia 1.5.3 rand() is tread safe (it was not in
Julia 1.0, and that is the reason of the difference in performance between
versions). Ensuring thread safety must cost something. In this case even
although the code that extracts out the appropriate MersenneTwister instance
from the THREAD_RNGs vector is simple it has a noticeable cost (the reason is
that random number generation itself is extremely well optimized and fast).

Conclusion

Going back to the beginning of this post: remember not to use rand() in your
performance critical code.

Also I think this example shows very nicely how huge benefits we have from the
fact that Julia is mostly written in Julia — it is only a few keystrokes
and we could identify the root cause of the performance puzzle.