By: Posts | vladium
Re-posted from: https://vladium.com/post/20191016/checking-bounds-checking-in-julia/
Ever since my knapsack benchmark I’ve been curious why Julia was still \(\approx 50\%\) slower than either Java or C++ on what essentially were a couple of nested for-loops over a pair of arrays. I’ve done some exploring and now believe that my first guess was correct: much of the remaining Julia overhead is due to its bounds checking.
Eliding bounds checking unconditionally with @inbounds
My first test was to isolate all opt_value()
lines with array access into a begin...end
block and annotate it with @inbounds:
function opt_value(W ::Int64, items ::Array{Item}) ::Int64
...
@inbounds(
begin
V[items[1].weight:end] .= items[1].value
...
for j in 2 : n
V, V_prev = V_prev, V
itemⱼ = items[j]
for w in 1 : W
V_without_itemⱼ = V_prev[w]
...
end
end
end
)
...
Here is the data from the original benchmark with added new timing measurements of the above version, labeled julia.ib
:
Nice! With bounds checks elided Julia performance is now in line with Java and C++: its underperformance has decreased to a maximum of \(\approx 15\%\). And Julia is now even the champion for small problem sizes.
Peeking at differences in generated code via @code_native
Another neat facility of Julia allows me to examine JIT-compiled code very easily and right from my REPL session:
julia> d = Knapsack.make_random_data(5_000, 12345)
(5000, Main.Knapsack.Item[Main.Knapsack.Item(609, 6198), ...
julia> @code_native debuginfo=:none Knapsack.opt_value(d[1],d[2])
.text
pushq %rbp
movq %rsp, %rbp
...
L1165:
movl $2, %edx
jmp L825
nopw (%rax,%rax)
I don’t show the full asm listing because that’s not particularly useful unless you’re a low-level programmer. Comparing it with and without @inbounds
does confirm, however, that the former version is shorter: about 200 instructions instead of 320.
It is more instructive to look at the compiled version(s) of a simpler function that simply extracts a single slot from an input array:
function getit(a ::Array{Int64})
return a[123]
end
julia> @code_native getit(rand(Int64, 1000))
.text
; ┌ @ work.jl:185 within `getit'
; │┌ @ work.jl:185 within `getindex'
cmpq $122, 8(%rdi)
jbe L18
movq (%rdi), %rax
movq 976(%rax), %rax
; │└
retq
L18:
pushq %rbp
movq %rsp, %rbp
; │ @ work.jl:185 within `getit'
; │┌ @ array.jl:728 within `getindex'
movq %rsp, %rax
leaq -16(%rax), %rsi
movq %rsi, %rsp
movq $123, -16(%rax)
movabsq $jl_bounds_error_ints, %rax
movl $1, %edx
callq *%rax
nopl (%rax)
; └└
Note the instructions to check the fixed (and 0-based) index against the array input length (cmpq $122, 8(%rdi)) followed by a conditional jump to native C routine jl_bounds_error_ints() that creates and throws a BoundsError
exception object. The following version of getit()
, with the only array indexing expression annotated with @inbounds
, is much shorter:
function getit(a ::Array{Int64})
return @inbounds a[123]
end
julia> @code_native getit(rand(Int64, 1000))
.text
; ┌ @ work.jl:185 within `getit'
; │┌ @ work.jl:185 within `getindex'
movq (%rdi), %rax
movq 976(%rax), %rax
; │└
retq
nopl (%rax,%rax)
; └
Disabling bounds checking from command line
It seems that I can get disable bounds checking summarily using –check-bounds=no option:
>julia --check-bounds=no Knapsack.jl
The documentation is somewhat sparse, but it appears that when this option is used it will completely override all @inbounds
annotations in the source code, i.e. either all checks are done in a mandatory fashion (“yes”) or none are (“no”). This could be used for “debug” vs “release” calculation runs. However, since bounds checking affects code as it’s being generated, some more documentation is needed to understand how this option works with “pre-compiled” packages and Julia “system” code. I also would like to understand the functionality and use cases for the @propagate_inbounds macro.