Re-posted from: https://bkamins.github.io/julialang/2020/10/16/gctime.html
Introduction
It is a well-known performance recommendation in the Julia language that
avoiding allocations matters and that using immutable objects is good practice. In
this post, in the context of building agent-based models, I want to show two
toy code examples that highlight the key aspects of this issue.
All the examples are tested under Julia 1.5.2. The Setfield.jl package version
used is 0.7.
Avoiding allocations
Consider the following code:
using Statistics
struct AgentI
    loc::Int
end
mutable struct AgentM
    loc::Int
end
fI(n) = mean(x -> x.loc, [AgentI(i) for i in 1:n])
fM(n) = mean(x -> x.loc, [AgentM(i) for i in 1:n])
The only difference between the functions fI and fM is that they work on
immutable and mutable objects, respectively. I use mean for aggregation to make
sure that the compiler does not optimize out too much.
Let us benchmark this code:
julia> fI(1); fM(1); GC.gc(); # force compilation and collect garbage
julia> @time fI(10^8)
0.385592 seconds (2 allocations: 762.940 MiB, 0.73% gc time)
5.00000005e7
julia> @time GC.gc()
0.091111 seconds, 100.00% gc time
julia> @time fM(10^8)
3.498990 seconds (100.00 M allocations: 2.235 GiB, 60.79% gc time)
5.00000005e7
julia> @time GC.gc()
0.295949 seconds, 100.00% gc time
We see that working with mutable objects was about nine times slower (3.50
seconds versus 0.39 seconds) and that it also led to a higher garbage
collection cost after fM terminated (0.30 seconds versus 0.09 seconds).
In particular, note that over 60% of the run time of fM was spent on garbage
collection.
The reason is the following. Allocating objects has three costs:
- cost of actual allocation
- cost of having to store and use object references instead of objects directly in the container
- cost of cleaning-up (i.e. running the garbage collector that frees unused memory)
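A minimal sketch of how one could check the first two costs on one's own machine. The names DemoI, DemoM, build_immutable and build_mutable are hypothetical stand-ins mirroring AgentI and AgentM above. The exact byte counts depend on the Julia version, so only the comparison between the two is meaningful.

```julia
# Hypothetical stand-ins for AgentI / AgentM from the post.
struct DemoI
    loc::Int
end
mutable struct DemoM
    loc::Int
end

# isbitstype reports whether values of a type are plain data, which an
# array can store inline instead of storing a reference to each object.
inline_immutable = isbitstype(DemoI)
inline_mutable = isbitstype(DemoM)

build_immutable(n) = [DemoI(i) for i in 1:n]
build_mutable(n) = [DemoM(i) for i in 1:n]

# Warm up so that compilation is not counted in the measurements.
build_immutable(1); build_mutable(1)

# Bytes allocated while building each vector.
bytes_immutable = @allocated build_immutable(10_000)
bytes_mutable = @allocated build_mutable(10_000)

println("stored inline: ", inline_immutable, " vs ", inline_mutable)
println("bytes allocated: ", bytes_immutable, " vs ", bytes_mutable)
```

The immutable version should report a single array-sized allocation, while the mutable version additionally pays for one heap object per element.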
The cost of data movement
A huge advantage of mutable objects is, well, that they are mutable. This makes
it simple to update only their selected fields. One might wonder if the cost
of having to create the immutable object anew each time in such cases is important.
Let us investigate. First we set up our experiment:
using Statistics
using Setfield
struct Agent2I
    loc::Int
    junk::NTuple{100, Int}
end
mutable struct Agent2M
    loc::Int
    junk::NTuple{100, Int}
end
const REF_TUP = ntuple(i -> 0, 100)
function gI(n)
    x = [Agent2I(i, REF_TUP) for i in 1:n]
    for i in 1:n
        xi = x[i]
        x[i] = @set xi.loc += 1
    end
    return mean(x -> x.loc, x)
end
function gM(n)
    x = [Agent2M(i, REF_TUP) for i in 1:n]
    for i in 1:n
        x[i].loc += 1
    end
    return mean(x -> x.loc, x)
end
Note that we used REF_TUP to give the Agent2I and Agent2M objects a larger
memory footprint (101 Int values per agent). The size of agent state I used is
usually more than enough in practice. As you can see in the code, the
Setfield.jl package makes it easy to update only a selected field of an
immutable object using the @set macro.
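Conceptually, updating one field of an immutable object means constructing a new object that carries over all the other fields. The sketch below spells this out by hand, without Setfield.jl. DemoAgent and with_loc are hypothetical names, and the code only illustrates the idea; it does not claim to show how @set is implemented internally.

```julia
# Hypothetical small agent type; the post uses a 100-element tuple instead.
struct DemoAgent
    loc::Int
    junk::NTuple{4, Int}
end

# Return a copy of `a` with a new `loc`; every other field is carried over.
with_loc(a::DemoAgent, loc) = DemoAgent(loc, a.junk)

agents = [DemoAgent(i, (0, 0, 0, 0)) for i in 1:3]
old = agents[2]
agents[2] = with_loc(old, old.loc + 1)  # replace the whole array element

println(old.loc)        # the value we read earlier is unchanged
println(agents[2].loc)  # the array now holds the updated copy
```

The line x[i] = @set xi.loc += 1 in gI plays the same role as the with_loc call here, without having to list the remaining fields by hand.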
It is time to start benchmarking:
julia> gI(1); gM(1); GC.gc();
julia> @time gI(10^7)
2.714677 seconds (2 allocations: 7.525 GiB, 0.10% gc time)
5.0000015e6
julia> @time GC.gc()
0.277477 seconds, 100.00% gc time
julia> @time gM(10^7)
7.287978 seconds (10.00 M allocations: 7.674 GiB, 59.53% gc time)
5.0000015e6
julia> @time GC.gc()
0.965113 seconds, 100.00% gc time
So, as you can see, the time difference has narrowed (from roughly nine times to
under three times), but the immutable code is still faster.
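The allocation figure reported for gI is consistent with the array storing the agents inline. Here is a back-of-the-envelope check, assuming a 64-bit machine; SizedAgent and inline_array_gib are hypothetical names introduced only for this sketch.

```julia
# Hypothetical type with the same layout as Agent2I from the post.
struct SizedAgent
    loc::Int
    junk::NTuple{100, Int}
end

# Size in GiB of `n` values of type `T` stored back to back.
inline_array_gib(T, n) = sizeof(T) * n / 2^30

bytes_per_agent = sizeof(SizedAgent)            # 101 * sizeof(Int)
total_gib = inline_array_gib(SizedAgent, 10^7)

println(bytes_per_agent, " bytes per agent")
println(round(total_gib, digits = 3), " GiB for 10^7 agents")
```

On a 64-bit machine this gives 808 bytes per agent and about 7.525 GiB in total, which matches the amount reported by @time for gI. In other words, the immutable version pays for moving the data, but not for any per-agent allocation.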
Conclusions
In general, working with mutable objects is easier than working with immutable ones.
However, if you are working on performance-critical code, avoiding
mutable objects is one of the first recommendations to consider when trying to
optimize it. Also, Setfield.jl (and the upcoming Accessors.jl) make working with
immutable objects relatively smooth.