By: Tamás K. Papp
Re-posted from: https://tamaspapp.eu/post/common-random-variables/
Code used in this post was written for Julia v0.6.2 and, when applicable, reflects the state of packages used on 2018-03-23. You may need to modify it for different versions.
Motivation
When simulating models with random components, it is frequently advantageous to decompose the structure into
- parameters \(\theta\) that we need to characterize structural relationships,
- noise \(\varepsilon\) that is independent of the parameters,
- a function \(f(\theta, \varepsilon)\) that generates observed data or moments.
Doing this allows us to use common random variables (aka common random numbers), a technique that simply keeps \(\varepsilon\) fixed while \(\theta\) varies. This can help with making \(f\) differentiable, which allows the use of derivative-based optimization algorithms (e.g. for maximum likelihood or MAP) or derivative-based MCMC methods. It is also used to reduce the variance of simulated moments.
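As a minimal sketch of the idea (the names below are hypothetical, not part of the interface developed in this post): once \(\varepsilon\) is drawn and held fixed, \(\theta \mapsto f(\theta, \varepsilon)\) is a deterministic function, so evaluations at nearby parameter values differ only because of \(\theta\), not because of fresh Monte Carlo noise.
# hypothetical illustration of common random variables
f(θ, ϵ) = mean(exp.(θ .+ ϵ))   # simulated moment given noise ϵ
ϵ = randn(1000)                # draw the noise once ...
a = f(1.0, ϵ)                  # ... and reuse it for both evaluations,
b = f(1.0 + 1e-6, ϵ)           # so b - a reflects only the change in θ
# redrawing ϵ inside f would instead swamp b - a with sampling error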
When implementing this technique in Julia, I frequently create a wrapper structure that saves the \(\varepsilon\)s, allocating an `Array` which can be updated in place. Since a redesign of DynamicHMC.jl is coming up which will accommodate simulated likelihood methods in a more disciplined manner, I wanted to explore other options, most importantly StaticArrays.jl.
Here I benchmark the alternatives for Julia v0.6.2 using a simple toy model. TL;DR for the impatient: StaticArrays.jl is about 150x faster, and this does not depend on using immutable or mutable `StaticArray`s, or even on reallocating every time new \(\varepsilon\)s are generated.
Problem setup
The setup is simple: we draw \(M\) observations, and our noise is
\[
\varepsilon_{i,j} \sim N(0, 1)
\qquad
\text{for } i = 1, \dots, M; j = 1, \dots, 7.
\]
Our parameters are \(\mu\) and \(\sigma\), and for each \(i\) we calculate
\[
A_i = \frac17 \sum_{j=1}^7 \exp(\mu + \sigma \varepsilon_{i,j})
\]
which is the sample average after a nonlinear transformation. The \(7\) is a bit accidental; it comes from simplifying an actual problem I am working on. We are interested in the sample average of the \(A_i\). I deliberately refrain from micro-optimizing each version, to reflect how I would approach a real-life problem.
We code the common interface as
using BenchmarkTools
using StaticArrays
using Parameters
# common interface
"Dimension of noise ``ϵ`` for each observation."
const EDIM = 7
"""
Common random variables. The user needs to define
1. `observation_moments`, which should use `observation_moment`,
2. `newcrv = update!(crv)`, which returns new common random variables,
   potentially (but not necessarily) overwriting `crv`.
"""
abstract type CRVContainer end
observation_moment(ϵ, μ, σ) = mean(@. exp(μ + σ * ϵ))
average_moment(crv::CRVContainer, μ, σ) = mean(observation_moments(crv, μ, σ))
"Calculate statistics, making `N` draws, updating every `L`th time."
function stats(crv, μ, σ, N, L)
    _stat(i) = (i % L == 0 && (crv = update!(crv)); average_moment(crv, μ, σ))
    [_stat(i) for i in 1:N]
end
The way I wrote `stats` is representative of how I use HMC/NUTS: simulated moments on the same trajectory are calculated with the same \(\varepsilon\)s, which are updated for each trajectory. Of course, the parameters would change along the trajectory; here they don’t, but this does not affect the benchmarks.
The semantics of `update!` allows both in-place modification and a functional style.
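To make this contract concrete, here are two toy containers (hypothetical, just to illustrate the convention; the real implementations follow below). Because `stats` always rebinds the result via `crv = update!(crv)`, both styles work unchanged.
# in-place style: overwrite the stored noise and return the same container
struct ToyMutableCRV <: CRVContainer
    ϵ::Vector{Float64}
end
update!(t::ToyMutableCRV) = (randn!(t.ϵ); t)
# functional style: leave the argument intact and return a fresh container
struct ToyImmutableCRV <: CRVContainer
    ϵ::Vector{Float64}
end
update!(t::ToyImmutableCRV) = ToyImmutableCRV(randn(length(t.ϵ)))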
Using a preallocated matrix
This is the standard way I would write this. The \(\varepsilon\)s are stored in columns of a matrix, which is preferable for mapping them as slices; they are then mapped using `observation_moment`. `update!` overwrites the contents.
"""
Common random variables are stored in columns of a matrix.
"""
struct PreallocatedMatrix{T} <: CRVContainer
    ϵ::Matrix{T}
end
PreallocatedMatrix(M::Int) = PreallocatedMatrix(randn(EDIM, M))
update!(p::PreallocatedMatrix) = (randn!(p.ϵ); p)
observation_moments(p::PreallocatedMatrix, μ, σ) =
    vec(mapslices(ϵ -> observation_moment(ϵ, μ, σ), p.ϵ, 1))
Using static vectors
We share the following between various static vector implementations:
"Common random variables as a vector of vectors, in the `ϵs`."
abstract type CRVVectors <: CRVContainer end
observation_moments(p::CRVVectors, μ, σ) =
    map(ϵ -> observation_moment(ϵ, μ, σ), p.ϵs)
I find the use of `map` more intuitive here than `mapslices` above.
Static vectors, container preallocated
In the first version using static vectors, we keep `SVector`s in a `Vector`, and update in place.
struct PreallocatedStaticCRV{K, T} <: CRVVectors
    ϵs::Vector{SVector{K, T}}
end
PreallocatedStaticCRV(M::Int) =
    PreallocatedStaticCRV([@SVector(randn(EDIM)) for _ in 1:M])
function update!(p::PreallocatedStaticCRV)
    @unpack ϵs = p
    @inbounds for i in indices(ϵs, 1)
        ϵs[i] = @SVector(randn(EDIM))
    end
    p
end
Mutable static vectors, overwritten
We modify this to use mutable vectors — this should not make a difference.
struct MutableStaticCRV{K, T} <: CRVVectors
    ϵs::Vector{MVector{K, T}}
end
MutableStaticCRV(M::Int) =
    MutableStaticCRV([@MVector(randn(EDIM)) for _ in 1:M])
function update!(p::MutableStaticCRV)
    @unpack ϵs = p
    @inbounds for i in indices(ϵs, 1)
        randn!(ϵs[i])
    end
    p
end
Static vectors, allocated each time
Finally, the third solution allocates a fresh container of new static vectors on each update:
struct GeneratedStaticCRV{K, T} <: CRVVectors
    ϵs::Vector{SVector{K, T}}
end
GeneratedStaticCRV(M::Int) =
    GeneratedStaticCRV([@SVector(randn(EDIM)) for _ in 1:M])
update!(p::GeneratedStaticCRV{K, T}) where {K, T} =
    GeneratedStaticCRV([@SVector(randn(T, K)) for _ in indices(p.ϵs, 1)])
Results
Running
@btime mean(stats(PreallocatedMatrix(100), 1.0, 0.1, 100, 10))
@btime mean(stats(PreallocatedStaticCRV(100), 1.0, 0.1, 100, 10))
@btime mean(stats(MutableStaticCRV(100), 1.0, 0.1, 100, 10))
@btime mean(stats(GeneratedStaticCRV(100), 1.0, 0.1, 100, 10))
we obtain
| solution | time | allocations |
|----------|------|-------------|
| `PreallocatedMatrix` | 230 ms | 22 MiB |
| `PreallocatedStaticCRV` | 1.5 ms | 102 KiB |
| `MutableStaticCRV` | 1.5 ms | 104 KiB |
| `GeneratedStaticCRV` | 1.5 ms | 666 KiB |
As a preview of future improvements, I tried `PreallocatedMatrix` on current master (which will become Julia v0.7), obtaining 3.5 ms (2.46 MiB), which is really promising.
The conclusion is that StaticArrays.jl simplifies and speeds up my code at the same time. I especially like the last version (`GeneratedStaticCRV`), because it obviates the need to think about types in advance. While the example here is simple, in practice I would use automatic differentiation, which makes it more challenging to determine buffer types in advance. I expect I will transition to a more “buffer-free” style in the future, and design the interface for DynamicHMC.jl accordingly.
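To illustrate why this matters, here is a minimal sketch assuming ForwardDiff.jl (not part of the benchmarks above): since `GeneratedStaticCRV` never commits to a buffer element type, dual numbers flow through `average_moment` without any extra setup.
using ForwardDiff
crv = GeneratedStaticCRV(100)
# derivative of the simulated moment with respect to μ, holding the ϵs fixed;
# no buffer with a Dual element type has to be preallocated anywhere
ForwardDiff.derivative(μ -> average_moment(crv, μ, 0.1), 1.0)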
PS: From now on, my blog posts with Julia code will have a banner about version information.