By: Steven Whitaker
Re-posted from: https://blog.glcs.io/staticarrays
The Julia programming language is known for being a high-level language that can still compete with C in terms of performance. As such, Julia already has performant data structures built-in, such as arrays. But what if arrays could be even faster? That’s where the StaticArrays.jl package comes in.
StaticArrays.jl provides drop-in replacements for Array, the standard Julia array type. These StaticArrays work just like Arrays, but they provide one additional piece of information in the type: the size of the array. Consequently, you can’t insert or remove elements of a StaticArray; they are statically sized arrays (hence the name). However, this restriction allows more information to be given to Julia’s compiler, which in turn results in more efficient machine code (for example, via loop unrolling and SIMD operations). The resulting speed-up can often be 10x or more!
In this post, we will learn how to use StaticArrays.jl and compare the performance of StaticArrays to that of regular Arrays for several different operations.
Note that the code examples in this post assume StaticArrays.jl has been installed and loaded:

    # Press ] to enter the package prompt.
    pkg> add StaticArrays
    # Press Backspace to return to the Julia prompt.
    julia> using StaticArrays

(Check out our post on the Julia REPL for more details about the package prompt and navigating the REPL.)
How to Use StaticArrays.jl
When working with StaticArrays.jl, typically one will use the SVector type or the SMatrix type. (There is also the SArray type for N-dimensional arrays, but we will focus on 1D and 2D arrays in this post.) SVectors and SMatrixes have both static size and static data, meaning the data contained in such objects cannot be modified. For statically sized arrays whose contents can be modified, StaticArrays.jl provides MVector and MMatrix (and MArray). We will stick with SVectors and SMatrixes in this post unless we specifically need mutability.
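To make the difference between the S and M types concrete, here is a minimal sketch. It assumes only that StaticArrays.jl is loaded; the variable names are illustrative and not taken from the package. The exact error raised when assigning into an SVector may vary between package versions, so the sketch only checks that some error is thrown.

```julia
using StaticArrays

v = SVector(1.0, 2.0, 3.0)   # static size, immutable contents
m = MVector(1.0, 2.0, 3.0)   # static size, mutable contents

# In-place assignment works for an MVector.
m[1] = 10.0

# In-place assignment into an SVector is expected to throw.
threw = try
    v[1] = 10.0
    false
catch
    true
end

# To "change" an SVector, build a new one instead of mutating it.
w = SVector(10.0, v[2], v[3])

# An MVector can be converted to an SVector of the same size.
s = SVector{3}(m)
```

After running this, v still holds its original values, while w and s both hold the updated ones.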
Constructors
There are three ways to construct StaticArrays.

- Convenience constructor SA:

      julia> SA[1, 2, 3]
      3-element SVector{3, Int64} with indices SOneTo(3):
       1
       2
       3

      julia> SA[1 2; 3 4]
      2×2 SMatrix{2, 2, Int64, 4} with indices SOneTo(2)×SOneTo(2):
       1  2
       3  4

- Normal constructor functions:

      julia> SVector(1, 2)
      2-element SVector{2, Int64} with indices SOneTo(2):
       1
       2

      julia> SMatrix{2,3}(1, 2, 3, 4, 5, 6)
      2×3 SMatrix{2, 3, Int64, 6} with indices SOneTo(2)×SOneTo(3):
       1  3  5
       2  4  6

- Macros:

      julia> @SVector [1, 2, 3]
      3-element SVector{3, Int64} with indices SOneTo(3):
       1
       2
       3

      julia> @SMatrix [1 2; 3 4]
      2×2 SMatrix{2, 2, Int64, 4} with indices SOneTo(2)×SOneTo(2):
       1  2
       3  4

  Note that using macros also enables a convenient way to create StaticArrays from common array-creation functions (eliminating the need to create an Array first just to convert it immediately to a StaticArray):

      @SVector [10 * i for i = 1:10]
      @SVector zeros(5)
      @SVector rand(7)
      @SMatrix [(i, j) for i = 1:2, j = 1:3]
      @SMatrix zeros(2, 2)
      @SMatrix randn(6, 6)
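As a quick check of the three construction styles, the following sketch compares their results. It assumes StaticArrays.jl is loaded; the variable names are illustrative. It also shows what "size in the type" means in practice: static vectors of different lengths have different types, whereas regular Vectors do not.

```julia
using StaticArrays

a = SA[1, 2, 3]           # convenience constructor
b = SVector(1, 2, 3)      # normal constructor function
c = @SVector [1, 2, 3]    # macro

# All three give the same value and the same concrete type.
same_value = a == b == c
same_type = typeof(a) == typeof(b) == typeof(c) == SVector{3, Int}

# The length is part of the type of a static vector...
static_types_differ = typeof(@SVector zeros(5)) != typeof(@SVector zeros(4))

# ...but not part of the type of a regular Vector.
array_types_match = typeof(zeros(5)) == typeof(zeros(4))
```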
Conversion to/from Array
It may occasionally be necessary to convert to or from Arrays. To convert from an Array to a StaticArray, use the appropriate constructor function. However, because Arrays do not have size information in the type, we ourselves must provide the size to the constructor:

    SVector{3}([1, 2, 3])
    SMatrix{4,4}(zeros(4, 4))

To convert back to an Array, use the collect function:

    julia> collect(SVector(1, 2))
    2-element Vector{Int64}:
     1
     2
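A round trip through both conversions can be sketched as follows. This assumes StaticArrays.jl is loaded and uses made-up data; it is only meant to show that the values survive the conversion in each direction.

```julia
using StaticArrays

a = [1.0, 2.0, 3.0]
s = SVector{3}(a)     # Array -> StaticArray: the size 3 must be supplied
b = collect(s)        # StaticArray -> Array

A = [1.0 2.0; 3.0 4.0]
S = SMatrix{2,2}(A)   # same idea for a matrix
B = collect(S)

vector_roundtrip_ok = (b == a) && (b isa Vector{Float64})
matrix_roundtrip_ok = (B == A)
```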
Comparing StaticArrays to Arrays
Once a StaticArray is created, it can be operated on in the same way as an Array. To illustrate, we will run a simple benchmark, both to compare the run-time speeds of the two types of arrays and to show that the same code can work with either type of array.
Here’s the benchmark code, inspired by StaticArrays.jl’s benchmark:

    using BenchmarkTools, StaticArrays, LinearAlgebra, Printf

    add!(C, A, B) = C .= A .+ B

    function run_benchmarks(N)
        A = rand(N, N); A = A' * A
        B = rand(N, N)
        C = Matrix{eltype(A)}(undef, N, N)
        D = rand(N)
        SA = SMatrix{N,N}(A)
        SB = SMatrix{N,N}(B)
        MA = MMatrix{N,N}(A)
        MB = MMatrix{N,N}(B)
        MC = MMatrix{N,N}(C)
        SD = SVector{N}(D)

        speedup = [
            @belapsed($A + $B) / @belapsed($SA + $SB),
            @belapsed(add!($C, $A, $B)) / @belapsed(add!($MC, $MA, $MB)),
            @belapsed($A * $B) / @belapsed($SA * $SB),
            @belapsed(mul!($C, $A, $B)) / @belapsed(mul!($MC, $MA, $MB)),
            @belapsed(norm($D)) / @belapsed(norm($SD)),
            @belapsed(det($A)) / @belapsed(det($SA)),
            @belapsed(inv($A)) / @belapsed(inv($SA)),
            @belapsed($A \ $D) / @belapsed($SA \ $SD),
            @belapsed(eigen($A)) / @belapsed(eigen($SA)),
            @belapsed(map(abs, $A)) / @belapsed(map(abs, $SA)),
            @belapsed(sum($D)) / @belapsed(sum($SD)),
            @belapsed(sort($D)) / @belapsed(sort($SD)),
        ]

        return speedup
    end

    function main()
        benchmarks = [
            "Addition",
            "Addition (in-place)",
            "Multiplication",
            "Multiplication (in-place)",
            "L2 Norm",
            "Determinant",
            "Inverse",
            "Linear Solve (A \\ b)",
            "Symmetric Eigendecomposition",
            "`map`",
            "Sum of Elements",
            "Sorting",
        ]

        N = [3, 5, 10, 30]
        speedups = map(run_benchmarks, N)

        fmt_header = Printf.Format("%-$(maximum(length.(benchmarks)))s" * " | %7s"^length(N))
        header = Printf.format(fmt_header, "Benchmark", string.("N = ", N)...)
        println(header)
        println("="^length(header))

        fmt = Printf.Format("%-$(maximum(length.(benchmarks)))s" * " | %7.1f"^length(N))
        for i = 1:length(benchmarks)
            println(Printf.format(fmt, benchmarks[i], getindex.(speedups, i)...))
        end
    end

    main()

Notice that all the functions called when creating the array speedup in run_benchmarks are the same whether using Arrays or StaticArrays, illustrating that StaticArrays are drop-in replacements for standard Arrays.
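The drop-in property can also be seen on a much smaller scale. The sketch below defines a hypothetical helper, unit, that never mentions an array type, and calls it with both kinds of array. It assumes StaticArrays.jl and LinearAlgebra are loaded; the helper name and the data are made up for illustration.

```julia
using StaticArrays, LinearAlgebra

# A generic function: nothing here refers to Array or StaticArray.
unit(x) = x / norm(x)

x  = [3.0, 4.0]            # regular Vector
sx = SVector(3.0, 4.0)     # static vector with the same contents

u  = unit(x)               # result is a Vector{Float64}
su = unit(sx)              # result is an SVector{2, Float64}
```

The same function body serves both inputs, and the static input yields a static result, so downstream code keeps the size information.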
Running the above code prints the following results on my laptop (the numbers indicate the speedup of StaticArrays over normal Arrays; e.g., a value of 17.7 means using StaticArrays was 17.7 times faster than using Arrays):

    Benchmark                    |   N = 3 |   N = 5 |  N = 10 |  N = 30
    ====================================================================
    Addition                     |    17.7 |    14.5 |     7.9 |     2.0
    Addition (in-place)          |     1.6 |     1.3 |     1.4 |     0.7
    Multiplication               |     8.2 |     7.0 |     4.2 |     2.6
    Multiplication (in-place)    |     1.9 |     5.9 |     3.0 |     1.0
    L2 Norm                      |     4.2 |     4.0 |     5.4 |     9.7
    Determinant                  |    66.6 |     2.5 |     1.3 |     0.9
    Inverse                      |    54.8 |     5.9 |     1.8 |     0.9
    Linear Solve (A \ b)         |    65.5 |     3.7 |     1.8 |     0.9
    Symmetric Eigendecomposition |     3.7 |     1.0 |     1.0 |     1.0
    `map`                        |    10.6 |     8.2 |     4.9 |     2.1
    Sum of Elements              |     1.5 |     1.1 |     1.7 |     2.1
    Sorting                      |     7.1 |     2.9 |     1.5 |     1.1

There are two main conclusions from this table. First, using StaticArrays instead of Arrays can result in some nice speed-ups! Second, the gains from using StaticArrays tend to diminish as the sizes of the arrays increase. So, you can’t expect StaticArrays.jl to always magically make your code faster, but if your arrays are small enough (the recommendation being fewer than about 100 elements) then you can expect to see some good speed-ups.
Of course, the above code timed just individual operations; how much faster a particular application would be is a different matter.
For example, consider a physical simulation where many 3D vectors are manipulated over several time steps. Since 3D vectors are static in size (i.e., are 1D arrays with exactly three elements), such a situation is a prime example of where StaticArrays.jl is useful. To illustrate, here is an example (taken from the field of magnetic resonance imaging) of a physical simulation using Arrays vs. using StaticArrays:

    using BenchmarkTools, StaticArrays, LinearAlgebra

    function sim_arrays(N)
        M = Matrix{Float64}(undef, 3, N)
        M[1,:] .= 0.0
        M[2,:] .= 0.0
        M[3,:] .= 1.0
        M2 = similar(M)

        (sin, cos) = sincosd(30)
        R = [1 0 0; 0 cos sin; 0 -sin cos]

        E1 = exp(-0.01)
        E2 = exp(-0.1)

        (sin, cos) = sincosd(1)
        F = [E2 * cos E2 * sin 0; -E2 * sin E2 * cos 0; 0 0 E1]

        FR = F * R
        C = [0, 0, 1 - E1]

        # Run for 100 time steps (each loop iteration does 2 time steps).
        for t = 1:50
            mul!(M2, FR, M)
            M2 .+= C
            mul!(M, FR, M2)
            M .+= C
        end

        total = sum(M; dims = 2)
        return complex(total[1], total[2])
    end

    function sim_staticarrays(N)
        M = fill(SVector(0.0, 0.0, 1.0), N)

        (sin, cos) = sincosd(30)
        R = @SMatrix [1 0 0; 0 cos sin; 0 -sin cos]

        E1 = exp(-0.01)
        E2 = exp(-0.1)

        (sin, cos) = sincosd(1)
        F = @SMatrix [E2 * cos E2 * sin 0; -E2 * sin E2 * cos 0; 0 0 E1]

        FR = F * R
        C = @SVector [0, 0, 1 - E1]

        # Run for 100 time steps (each loop iteration does 1 time step).
        for t = 1:100
            # Apply simulation dynamics to each 3D vector.
            for i = 1:length(M)
                M[i] = FR * M[i] + C
            end
        end

        total = sum(M)
        return complex(total[1], total[2])
    end

    function main(N)
        r1 = @btime sim_arrays($N)
        r2 = @btime sim_staticarrays($N)
        @assert r1 ≈ r2 # Make sure the results are the same.
    end

The speed-ups on my laptop for different values of N were as follows:
- N = 10: 14.6x faster
- N = 100: 7.1x faster
- N = 1000: 5.2x faster
(Here, N is the number of 3D vectors in the simulation, not the size of the StaticArrays.)
Note also that I wrote sim_arrays to be as performant as possible by doing in-place operations (like mul!), which has the unfortunate side effect of making the code a bit harder to read. Therefore, sim_staticarrays is both faster and easier to read!
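The two simulations store the same data in different layouts: sim_arrays keeps all vectors as columns of a 3×N matrix, while sim_staticarrays keeps a Vector of SVectors. The sketch below checks that one time step gives the same result in both layouts. The matrix and offset values are made-up placeholders, not the ones from the simulation above, and the variable names are illustrative.

```julia
using StaticArrays, LinearAlgebra

# Hypothetical dynamics for illustration only.
FR = [0.9 0.1 0.0; -0.1 0.9 0.0; 0.0 0.0 0.99]
C = [0.0, 0.0, 0.01]
N = 4

# Layout 1: one 3×N matrix whose columns are the 3D vectors.
M = repeat([0.0, 0.0, 1.0], 1, N)
M_next = FR * M .+ C

# Layout 2: a Vector holding N static 3D vectors.
sFR = SMatrix{3,3}(FR)
sC = SVector{3}(C)
V = fill(SVector(0.0, 0.0, 1.0), N)
V_next = [sFR * v + sC for v in V]

# Each static vector should match the corresponding matrix column.
layouts_agree = all(collect(V_next[i]) ≈ M_next[:, i] for i in 1:N)
```

In the second layout, the update for each vector reads like the underlying math, which is what makes sim_staticarrays easier to follow.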
As another example of how StaticArrays.jl can speed up a more involved application, see the DifferentialEquations.jl docs.
Summary
In this post, we discussed StaticArrays.jl. We saw that StaticArrays are drop-in replacements for regular Julia Arrays. We also saw that using StaticArrays can result in some nice speed-ups over using Arrays, at least when the sizes of the arrays are not too big.
Are array operations a bottleneck in your code? Try out StaticArrays.jl and then comment below how it helps!
Additional Links
- StaticArrays.jl Docs
  - Documentation for StaticArrays.jl.
Cover image background from https://openverse.org/image/875bf026-11ef-47a8-a63c-ee1f1877c156?q=circuit%20board%20array.