Automatic differentiation of discontinuous integrals

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/discontinuous_integral_ad/

This is a much simplified, self-contained writeup of a problem I encountered.

You want to approximate the integral

$$I(\theta) = \int 1(g(x) > 0) \, f(x, \theta) \, dx$$

where g is a continuous function, and f(x, θ) is the density of a parametric distribution over x. Everything is continuous, and thus I is, too.

You face the following constraints:

  1. g is a black box. We will pretend that you can’t invert it (except for checking our results, of course).

  2. You can calculate the probability density function f and even draw x’s for a particular θ, but that’s pretty much it. You don’t even get a cdf! (Again, except for checking our results.)

Using Monte Carlo methods, you can do the following:

  1. draw $x_i \sim F(\cdot, \theta)$, $i = 1, \dots, N$,

  2. approximate

$$I(\theta) \approx \frac{1}{N} \sum_i 1(g(x_i) > 0)$$

You could code this in Julia as

d = distr(θ)   # suppose this returns some distribution that supports Base.rand
x = rand(d, N)
mean(g.(x) .> 0)

So far, neat and simple. Now, the fly in the ointment: you need the derivative I′(θ) for optimization or Hamiltonian Monte Carlo. The problem is that you cannot ForwardDiff your way through the code above: AD’ing a discontinuous step function will just give you 0, and rand does not work with ForwardDiff.Duals anyway (which is very sensible).

However, there is a solution: rewrite the approximation as

$$I(\theta; \theta_0) \approx \frac{1}{N} \sum_i 1(g(x_i) > 0) \, \frac{f(x_i, \theta)}{f(x_i, \theta_0)}$$

where θ₀ is the parameter used to simulate the xᵢ. Differentiate the above at θ = θ₀. This approximates

$$I'(\theta) = \int 1(g(x) > 0) \, \frac{\partial f(x, \theta)/\partial \theta}{f(x, \theta)} \, f(x, \theta) \, dx = \int 1(g(x) > 0) \, \frac{\partial f(x, \theta)}{\partial \theta} \, dx$$

which is exactly what you want.
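
The estimator above can also be coded directly and differentiated, since the draws and the indicator stay fixed and only the likelihood ratio depends on θ. Below is a minimal sketch of that idea (the helper name `I_reweighted` and the commented-out θ₀ = 0.3 are mine; I assume `distr(θ)` returns a distribution from Distributions.jl that supports `pdf` and `rand`):

using Distributions, ForwardDiff
using Statistics: mean      # `mean` lives in Statistics on current Julia

# Reweighted estimator I(θ; θ₀): the indicator uses the fixed draws x, and only
# the weights f(xᵢ, θ) / f(xᵢ, θ₀) depend on θ, so Duals flow through cleanly.
function I_reweighted(g, distr, x, θ, θ₀)
    dθ, d₀ = distr(θ), distr(θ₀)
    mean((g(xi) > 0) * pdf(dθ, xi) / pdf(d₀, xi) for xi in x)
end

# θ₀ = 0.3
# x = rand(distr(θ₀), 1000)    # draws at θ₀, held fixed while differentiating
# ForwardDiff.derivative(θ -> I_reweighted(g, distr, x, θ, θ₀), θ₀)

At θ = θ₀ the weights are all 1, so the value is just the plain Monte Carlo estimate; the rest of the post is a shortcut that avoids writing the weights out explicitly.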

Assume that the actual calculation is very complicated, so we would rather avoid implementing it separately for the integral and the derivative. It turns out that this is very simple to do with ForwardDiff.Dual values: the code is literally a one-liner and a fallback method:

elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))

The `Dual` returned by `elasticity` has value 1 and partials ∂f/f, so multiplying it by the indicator and averaging gives a `Dual` whose value is the plain Monte Carlo estimate and whose partials approximate I′(θ₀). You can use it in a function like

integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))
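
For example, holding the draws fixed at the θ₀ used to simulate them (a usage sketch; `g` and `distr` are the example definitions from the full listing below, and θ₀ = 0.3 is an arbitrary value):

θ₀ = 0.3
x = rand(distr(θ₀), 1000)          # draws at θ₀, held fixed while differentiating
integral_approx(g, distr(θ₀), x)                                    # ≈ I(θ₀)
ForwardDiff.derivative(θ -> integral_approx(g, distr(θ), x), θ₀)    # ≈ I′(θ₀)

The full listing below wraps this pattern into a `value_and_derivative` helper, so the integral and its derivative come from a single call.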

I demonstrate this with g(x) = x and x ∼ Normal(θ, 1), for which of course we know that the analytical values for I and I′ are the right tail probability and the pdf at 0, respectively.
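
Concretely, writing Φ and φ for the standard normal cdf and pdf,

$$I(\theta) = P(x > 0) = \Phi(\theta), \qquad I'(\theta) = \varphi(\theta) = f(0, \theta),$$

which is what `ccdf(d, 0)` and `pdf(d, 0)` compute in the code below.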

Graphs below show that the approximation is reasonable — we could make it much better with low-discrepancy sequences, but that is an orthogonal issue.

[Figure: integral, analytical vs. approximation as a function of θ]

[Figure: derivative, analytical vs. approximation as a function of θ]

It is amazing how much you can accomplish with two lines of code in Julia! The problem that motivated this blog post is multivariate, with an irregular region {x : g(x) > 0}, but I used `elasticity` just as above.

Self-contained code for everything is available below (download as `code.jl`).

using ForwardDiff
import ForwardDiff: Dual, value, partials, Tag
using Distributions
using Plots
using LaTeXStrings
gr()

######################################################################
# elasticity calculation
######################################################################

"""
    elasticity(x)

Calculate `x/x`, stripping the `x` in the denominator of partial derivative information. Useful for
calculating elasticities of the form `∂f/f` using ForwardDiff.
"""
elasticity(x::Real) = one(x)

elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))

######################################################################
# example application
######################################################################

integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))

"Helper function that returns the value and derivative at the same time."
function value_and_derivative(f::F, x::R) where {F,R<:Real}
    T = typeof(Tag(F, R))
    y = f(Dual{T}(x, one(x)))
    value(y), partials(y, 1)
end

distr(θ) = Normal(θ, 1)

g(x) = x

"Analytical integral I(θ) and derivative I′(θ), for checking the approximation."
function ID_analytical(θ)
    d = distr(θ)
    ccdf(d, 0), pdf(d, 0)
end

"Monte Carlo approximation of I(θ) and its derivative, using 1000 draws at θ."
function ID_approx(θ)
    x = rand(distr(θ), 1000)
    value_and_derivative(θ->integral_approx(g, distr(θ), x), θ)
end

θs = linspace(-2, 2, 51)
ID0s = ID_analytical.(θs)
ID1s = ID_approx.(θs)

scatter(θs, first.(ID0s), xlab = "\\theta", ylab = "integral",
        label = "analytical", legend = :topleft)
scatter!(θs, first.(ID1s), label = "approximation")
savefig("integral.svg")

scatter(θs, last.(ID0s), xlab = "\\theta", ylab = "derivative",
        label = "analytical", legend = :topleft)
scatter!(θs, last.(ID1s), label = "approximation")
savefig("derivative.svg")