By: Tamás K. Papp
Re-posted from: https://tamaspapp.eu/post/discontinuous_integral_ad/
This is a much simplified writeup of a problem I encountered, in a self-contained blog post.
You want to approximate the integral
I(θ)=∫1(g(x)>0)f(x,θ)dxI(θ)=∫1(g(x)>0)f(x,θ)dx
where gg is a continuous function, and f(x,θ)f(x,θ) is a parametric distribution over xx. Everthing is continous, and thus II is, too.
You face the following constraints:
-
gg is a black box. We will pretend that you can’t invert it (except for checking our results, of course).
-
You can calculate the probability density function ff and even draw xx‘s for a particular θθ, but that’s pretty much it. You don’t even get a cdf! (Again, except for checking our results.)
Using Monte Carlo methods, you can do the following:
-
draw xi∼F(⋅,θ)xi∼F(⋅,θ), i=1,…,Ni=1,…,N,
-
approximate
I(θ)≈1N∑i1(g(xi)>0)I(θ)≈1N∑i1(g(xi)>0)
You could code this in Julia as
d = distr(θ) # suppose this returns some distribution that supports Base.rand
x = rand(d, N)
mean(g.(x) > 0)
So far, neat and simple. Now, the fly in the ointment: you need the derivative I′(θ)I′(θ) for optimization or Hamiltonian Monte Carlo. The problem is that you cannot ForwardDiff
your way through the code above: AD’ing a discontinuous step function will just give you 00, and rand
does not work with ForwardDiff.Dual
s anyway (which is very sensible).
However, there is a solution: rewrite the approximation as
I(θ;θ0)≈1N∑i1(g(xi)>0)f(xi,θ)f(xi,θ0)I(θ;θ0)≈1N∑i1(g(xi)>0)f(xi,θ)f(xi,θ0)
where θ0θ0 is the parameter used to simulate the xixi. Differentiate the above at θ=θ0θ=θ0. This approximates
I′(θ)=∫1(g(x)>0)∂f(x,θ)/∂θf(x,θ)f(x,θ)dx=∫1(g(x)>0)∂f(x,θ)/∂θdxI′(θ)=∫1(g(x)>0)∂f(x,θ)/∂θf(x,θ)f(x,θ)dx=∫1(g(x)>0)∂f(x,θ)/∂θdx
which is exactly what you want.
Assume that the actual calculation is very complicated, so we would rather avoid implementing it for the integral and the derivative separately. It turns out that this is very simple to do with ForwardDiff.Dual
values: the code is literally a one-liner and a fallback method:
elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))
which you can use in a function like
integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))
I demonstrate this with g(x)=xg(x)=x and x∼Normal(θ,1)x∼Normal(θ,1), for which of course we know that the analytical values for II and I′I′ are the right tail probability and the pdf at 0, respectively.
Graphs below show that the approximation is reasonable — we could make it much better with low-discrepancy sequences, but that is an orthogonal issue.
It is amazing how much you can accomplish with two lines of code in Julia! The problem that motivated this blog post is multivariate with irregular regions over which {x:g(x)>0}{x:g(x)>0}, but I used elasticity
as above.
Self-contained code for everything is available below.
download as code.jl
using ForwardDiff
import ForwardDiff: Dual, value, partials, Tag
using Distributions
using Plots
using LaTeXStrings
gr()
######################################################################
# elasticity calculation
######################################################################
"""
elasticity(x)
Calculate `x/x`, stripping `x` of partial derivative information. Useful for
calculating elasticities of the form `∂f/f` using ForwardDiff.
"""
elasticity(x::Real) = one(x)
elasticity(x::Dual{T,V,N}) where {T,V,N} = Dual{T}(one(V), partials(x) / value(x))
######################################################################
# example application
######################################################################
integral_approx(g, d, x) = mean((g.(x) .> 0) .* elasticity.(pdf.(d, x)))
"Helper function that returns the value and derivative at the same time."
function value_and_derivative(f::F, x::R) where {F,R<:Real}
T = typeof(Tag(F, R))
y = f(Dual{T}(x, one(x)))
value(y), partials(y, 1)
end
distr(θ) = Normal(θ, 1)
g(x) = x
function ID_analytical(θ)
d = distr(θ)
ccdf(d, 0), pdf(d, 0)
end
function ID_approx(θ)
x = rand(distr(θ), 1000)
value_and_derivative(θ->integral_approx(g, distr(θ), x), θ)
end
θs = linspace(-2, 2, 51)
ID0s = ID_analytical.(θs)
ID1s = ID_approx.(θs)
scatter(θs, first.(ID0s), xlab = "\\theta", ylab = "integral",
label = "analytical", legend = :topleft)
scatter!(θs, first.(ID1s), label = "approximation")
savefig("integral.svg")
scatter(θs, last.(ID0s), xlab = "\\theta", ylab = "derivative",
label = "analytical", legend = :topleft)
scatter!(θs, last.(ID1s), label = "approximation")
savefig("derivative.svg")