CPU pipelines: when more is less

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/branch_prediction/

I have been working on micro-optimizations for some simulation code, and was reminded of a counter-intuitive artifact of modern CPU architecture, which is worth a short post.

Consider (just for the sake of example) a very simple (if not particularly meaningful) function,

\[
f(x) = \begin{cases}
(x+2)^2 & \text{if } x \ge 0,\\
1-x & \text{otherwise}
\end{cases}
\]

with implementations

f1(x) = ifelse(x ≥ 0, abs2(x+2), 1-x)
f2(x) = x ≥ 0 ? abs2(x+2) : 1-x

f1 calculates both possibilities before choosing between them with ifelse, while f2 only evaluates the branch that is taken. So, intuitively, f2 should be faster.

But it isn’t…

julia> x = randn(1_000_000);

julia> using BenchmarkTools

julia> @btime f1.($x);
  664.228 μs (2 allocations: 7.63 MiB)

julia> @btime f2.($x);
  6.519 ms (2 allocations: 7.63 MiB)

…it is about 10x slower.

This can be understood as an artifact of the instruction pipeline: your x86 CPU likes to perform similar operations in a staggered, overlapping manner, and it does not like branches (jumps), because a mispredicted branch stalls the flow.
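A quick experiment (mine, not in the original benchmarks) supports the branch-prediction reading: on sorted input the branch in f2 becomes almost perfectly predictable, so most of its penalty should disappear on typical hardware.

xs = sort(x)        # same values, but the branch outcome now changes only once
@btime f2.($xs);    # expect a time much closer to f1's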

Comparing the native code reveals that while f1 is jump-free, the if in f2 results in a jump (jae):

julia> @code_native f1(1.0)
        .text
Filename: REPL[2]
        pushq   %rbp
        movq    %rsp, %rbp
        movabsq $139862498743472, %rax  # imm = 0x7F34468E14B0
        movsd   (%rax), %xmm2           # xmm2 = mem[0],zero
Source line: 1
        addsd   %xmm0, %xmm2
        mulsd   %xmm2, %xmm2
        movabsq $139862498743480, %rax  # imm = 0x7F34468E14B8
        movsd   (%rax), %xmm3           # xmm3 = mem[0],zero
        subsd   %xmm0, %xmm3
        xorps   %xmm1, %xmm1
        cmpnlesd        %xmm0, %xmm1
        andpd   %xmm1, %xmm3
        andnpd  %xmm2, %xmm1
        orpd    %xmm3, %xmm1
        movapd  %xmm1, %xmm0
        popq    %rbp
        retq
        nopw    %cs:(%rax,%rax)

julia> @code_native f2(1.0)
        .text
Filename: REPL[3]
        pushq   %rbp
        movq    %rsp, %rbp
Source line: 1
        xorps   %xmm1, %xmm1
        ucomisd %xmm1, %xmm0
        jae     L37
        movabsq $139862498680736, %rax  # imm = 0x7F34468D1FA0
        movsd   (%rax), %xmm1           # xmm1 = mem[0],zero
        subsd   %xmm0, %xmm1
        movapd  %xmm1, %xmm0
        popq    %rbp
        retq
L37:
        movabsq $139862498680728, %rax  # imm = 0x7F34468D1F98
        addsd   (%rax), %xmm0
        mulsd   %xmm0, %xmm0
        popq    %rbp
        retq
        nopl    (%rax)

In my application the speed gain was more modest, but still sizeable. Benchmarking a non-branching version of your code is sometimes worth it, especially if the change is simple and both branches of the conditional can be evaluated error-free. If, for example, we had to calculate

g(x) = x ≥ 0 ? √(x+2) : 1-x

then we could not use ifelse without restricting the domain, since √(x+2) would fail whenever x < -2.
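One hedged workaround, sketched by me rather than taken from the post: clamp the argument so that the branch not taken cannot error, and then benchmark whether the branch-free form still wins.

g_clamped(x) = ifelse(x ≥ 0, √(max(x, zero(x)) + 2), 1 - x)  # the √ argument is always ≥ 2, so both branches are safe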

Julia Base contains many optimizations like this: for a particularly nice example see functions that use Base.null_safe_op.

Blog redesign 2.0

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/blog-redesign-201709/

I have redesigned my blog (again), mainly tweaking the CSS and
hopefully achieving better support on small screens (which are still
not ideal for math, but now you should get a red warning float at the
bottom).

I also re-did the feed code so that it would render better on
juliabloggers.com. Now the whole
content of each post should be scraped seamlessly, and appear
correctly. However, the only way to test the whole toolchain is to do
it live, and I apologize if something is still not perfect and you get
bogus updates (BTW, this post should not show up in the Julia feed).

Highlights of the changes:

  1. responsive design,

  2. nicer fonts,

  3. line breaks in MathJax when necessary and supported,

  4. better highlighting (Julia now looks especially nice),

  5. embedded code blocks with a download link,

  6. better image placement,

  7. Emacs screenshots now use a branch of emacs-htmlize, which hopefully gets merged soon.

As always, the source code for the whole site is available.

Automatic differentiation of discontinuous integrals

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/discontinuous_integral_ad/

This is a much simplified, self-contained writeup of a problem I encountered.

You want to approximate the integral

\[
I(\theta) = \int 1(g(x) > 0) f(x,\theta) dx
\]

where \(g\) is a continuous function, and \(f(x,\theta)\) is a parametric distribution over \(x\). Everything is continuous, and thus \(I\) is, too.

You face the following constraints:

  1. \(g\) is a black box. We will pretend that you can’t invert it (except for checking our results, of course).

  2. You can calculate the probability density function \(f\) and even draw \(x\)’s for a particular \(\theta\), but that’s pretty much it. You don’t even get a cdf! (Again, except for checking our results.)

Using Monte Carlo methods, you can do the following:

  1. draw \(x_i \sim F(\cdot, \theta)\), \(i=1,\dots,N\),

  2. approximate

\[
I(\theta) \approx \frac{1}{N}\sum_i 1(g(x_i) > 0)
\]

You could code this in Julia as

d = distr(θ)   # suppose this returns some distribution that supports Base.rand
x = rand(d, N)
mean(g.(x) .> 0)

So far, neat and simple. Now, the fly in the ointment: you need the derivative \(I'(\theta)\) for optimization or Hamiltonian Monte Carlo. The problem is that you cannot ForwardDiff your way through the code above: AD’ing a discontinuous step function will just give you \(0\), and rand does not work with ForwardDiff.Duals anyway (which is very sensible).
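To see the first problem concretely, here is a minimal sketch (not from the original post): forward-mode AD through an indicator reports a zero derivative, because the step function is flat almost everywhere.

using ForwardDiff
indicator(θ) = (θ > 0) * one(θ)         # 1(θ > 0) in AD-friendly arithmetic
ForwardDiff.derivative(indicator, 0.5)  # 0.0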

However, there is a solution: rewrite the approximation as

\[
I(\theta; \theta_0) \approx \frac{1}{N}\sum_i 1(g(x_i) > 0) \frac{f(x_i,\theta)}{f(x_i,\theta_0)}
\]

where \(\theta_0\) is the parameter used to simulate the \(x_i\). Differentiate the above at \(\theta = \theta_0\). This approximates

\[
I'(\theta) = \int 1(g(x) > 0) \frac{\partial f(x,\theta)/\partial\theta}{f(x,\theta)} f(x,\theta) \, dx
= \int 1(g(x) > 0) \frac{\partial f(x,\theta)}{\partial\theta} \, dx
\]

which is exactly what you want.
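Note that the likelihood-ratio form can also be coded directly; here is a sketch (mine) using the post's g and distr, defined in the complete code at the end, with x drawn at θ0 as in step 1:

ratio_approx(θ, θ0, x) = mean((g.(x) .> 0) .* pdf.(distr(θ), x) ./ pdf.(distr(θ0), x))
ForwardDiff.derivative(θ -> ratio_approx(θ, θ0, x), θ0)  # ≈ I′(θ0)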

Assume that the actual calculation is very complicated, so we would rather avoid implementing it for the integral and the derivative separately. It turns out that this is very simple to do with ForwardDiff.Dual values: the code is literally a one-liner and a fallback method:

using ForwardDiff: Dual, value, partials

elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))

which you can use in a function like

integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))
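This composes directly with ForwardDiff; a hypothetical usage, where θ0 stands for the simulation parameter (the complete code below wraps the same pattern in a value_and_derivative helper):

x = rand(distr(θ0), 1000)
ForwardDiff.derivative(θ -> integral_approx(g, distr(θ), x), θ0)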

I demonstrate this with \(g(x) = x\) and \(x \sim \text{Normal}(\theta, 1)\), for which of course we know that the analytical values for \(I\) and \(I'\) are the right tail probability and the pdf at 0, respectively.

Graphs below show that the approximation is reasonable — we could make it much better with low-discrepancy sequences, but that is an orthogonal issue.

[Figure: analytical vs. approximated \(I(\theta)\) (integral)]

[Figure: analytical vs. approximated \(I'(\theta)\) (derivative)]
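On the low-discrepancy remark above: a hedged sketch of what that substitution could look like, assuming the Sobol.jl package and its next! API (not used in the original post) — uniform Sobol points are pushed through the quantile function in place of rand:

using Sobol
s = SobolSeq(1)
u = [next!(s)[1] for _ in 1:1000]  # low-discrepancy points in (0,1)
x = quantile.(distr(θ0), u)        # plays the role of rand(distr(θ0), 1000)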

It is amazing how much you can accomplish with two lines of code in Julia! The problem that motivated this blog post is multivariate, with irregular regions \(\{ x: g(x) > 0 \}\), but I used elasticity the same way as above.

Self-contained code for everything is available below.

download as code.jl

using ForwardDiff
import ForwardDiff: Dual, value, partials, Tag
using Distributions
using Plots
using LaTeXStrings
gr()

######################################################################
# elasticity calculation
######################################################################

"""
    elasticity(x)

Calculate `x/x`, stripping `x` of partial derivative information. Useful for
calculating elasticities of the form `∂f/f` using ForwardDiff.
"""
elasticity(x::Real) = one(x)

elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))

######################################################################
# example application
######################################################################

integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))

"Helper function that returns the value and derivative at the same time."
function value_and_derivative(f::F, x::R) where {F,R<:Real}
    T = typeof(Tag(F, R))
    y = f(Dual{T}(x, one(x)))
    value(y), partials(y, 1)
end

distr(θ) = Normal(θ, 1)

g(x) = x

function ID_analytical(θ)
    d = distr(θ)
    ccdf(d, 0), pdf(d, 0)
end

function ID_approx(θ)
    x = rand(distr(θ), 1000)
    value_and_derivative(θ->integral_approx(g, distr(θ), x), θ)
end

θs = linspace(-2, 2, 51)
ID0s = ID_analytical.(θs)
ID1s = ID_approx.(θs)

scatter(θs, first.(ID0s), xlab = "\\theta", ylab = "integral",
        label = "analytical", legend = :topleft)
scatter!(θs, first.(ID1s), label = "approximation")
savefig("integral.svg")

scatter(θs, last.(ID0s), xlab = "\\theta", ylab = "derivative",
        label = "analytical", legend = :topleft)
scatter!(θs, last.(ID1s), label = "approximation")
savefig("derivative.svg")