Differentiating the discrete: Automatic Differentiation meets Integer Optimization

By: Julia on μβ

Re-posted from: https://matbesancon.github.io/post/2020-01-23-discrete-diff/


In continuous convex optimization, duality is often the theoretical foundation for
computing the sensitivity of the optimal value of a problem to
one of its parameters. In the non-linear domain, it is fairly standard to assume
one can compute at any point of the domain the function $f(x)$ and gradient
$\nabla f(x)$.

What about discrete optimization?
The first thought would be that differentiating
the solution of a discrete problem does not make sense,
since infinitesimal variations in the domain of the variables are not meaningful.

However, three cases come to mind for which asking for gradients makes perfect sense:

  1. In mixed-integer linear problems, some variables take continuous values.
    All linear expressions are differentiable, and every constraint coefficient,
    right-hand-side and objective coefficient can have an attached partial derivative.

  2. Even in pure-integer problems, the objective value will be a continuous
    function of the coefficients, possibly locally smooth, for which one can get
    the partial derivative associated with each weight.

  3. We might be interested in computing the derivative of some expression
    of the variables with respect to some parameters, without this expression
    being the objective.

For these points, some duality-based techniques and reformulations can be used,
though they can become very expensive as the input size grows.
One common approach is to first
solve the problem, then fix the integer variables and re-solve the
continuous part of the problem to compute the dual values associated with
each constraint and the reduced-cost coefficients.
This amounts to solving an NP-hard problem, followed by solving a linear
optimization problem from scratch; still, it somehow works.
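A sketch of that fix-and-resolve approach, in current JuMP syntax with GLPK.jl as an assumed solver (the model, data and names here are purely illustrative, not from any particular application):

```julia
using JuMP
import GLPK

# Illustrative mixed-integer model
m = Model(GLPK.Optimizer)
@variable(m, x >= 0)              # continuous variable
@variable(m, 0 <= y <= 3, Int)    # integer variable
@constraint(m, con, x + 2y <= 7)
@objective(m, Max, 3x + 5y)
optimize!(m)                      # step 1: solve the NP-hard problem

# Step 2: fix the integer variables at their optimal values and relax
# integrality, leaving a pure linear problem with well-defined duals
fix(y, value(y); force = true)
unset_integer(y)
optimize!(m)

shadow_price(con)   # sensitivity of the optimal value to the right-hand side
reduced_cost(x)     # reduced-cost coefficient of the continuous variable
```

The dual values obtained this way are only local sensitivities: they are valid as long as the integer assignment stays optimal.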

More than just solving the model and computing results, one major use case
is embedding the result of an optimization problem into another, more complete
program. The tricks described above cannot be integrated with an automated way
of computing derivatives.

Automatic Differentiation

Automatic differentiation is far from new, but it has gained attention
over the last decade with its use in machine learning, which increased the usability
of the available libraries. It consists in extracting augmented information from a function.

If a function has a type signature f: a -> b, the goal is, without modifying
the function, to compute its derivative, which is also a function: one that, at every
point of the domain, yields a linear map from domain to co-domain, df: a -> (a -o b),
where a -o b denotes a linear map, regardless of the underlying representation (matrix, function, …).
See the talk and paper1 for a type-based formalism of AD, if you are comfortable with programming-language formalism.

Automatic differentiation on a pure-Julia solver

ConstraintSolver.jl is a recent
project by Wikunia. As the name indicates, it is a
constraint programming
solver, a more Computer-Science-flavoured approach to integer optimization.
As a Julia solver, it can leverage both multiple dispatch and the type system
to benefit from some features for free. One example of such a
feature is automatic differentiation: if your function is generic enough
(not relying on a specific implementation of number types, such as Float64),
gradients with respect to some parameters can be computed by calling the function
just once (forward-mode automatic differentiation).
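As a small illustration of what "generic enough" means, here are two toy functions, unrelated to the solver: one generic over any numeric type, one hard-coded to Float64:

```julia
import ForwardDiff

# Generic over the element type: also accepts ForwardDiff's dual numbers
f_generic(x) = sum(abs2, x) + 2 * x[1]

# Over-typed: restricting the input to Float64 blocks forward-mode AD
f_rigid(x::Vector{Float64}) = sum(abs2, x) + 2 * x[1]

ForwardDiff.gradient(f_generic, [1.0, 2.0])   # works: [4.0, 4.0]
# ForwardDiff.gradient(f_rigid, [1.0, 2.0])   # MethodError: expects Vector{Float64}
```

ForwardDiff evaluates the function on its own dual-number type, so any concrete type annotation on the arguments breaks the mechanism.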

Example problem: weighted independent set

Let us consider a classical problem in combinatorial optimization: given an undirected graph
$G = (V, E)$, find a subset of the vertices such that no two vertices in the
subset are connected by an edge and the total weight of the chosen vertices
is maximized.

Optimization model of the weighted independent set

Formulated as an optimization problem, it looks as follows:

$$\begin{align}
(\mathcal{P}): \max_{x} & \sum_{i \in V} w_i x_i \\\\
\text{s.t.} \\\\
& x_i + x_j \leq 1 \,\, \forall (i,j) \in E \\\\
& x \in \mathbb{B}^{|V|}
\end{align}
$$

Translated to English, this means maximizing the weighted sum of the picked
vertices, which are decisions living in the $|V|$-dimensional binary space,
such that at most one endpoint of each edge is chosen.
The differentiable function here is the objective value of this optimization
problem, and the parameters we differentiate with respect to are the weights
attached to each vertex, $w_i$. We will denote it $f(w) = \max_x (\mathcal{P}_w)$.

If a vertex $i$ is not chosen in a solution, there are two cases:

  • the vertex has the same weight as at least one other, say $j$, such that
    swapping $i$ and $j$ in the selected subset does not change the optimal value
    of $\mathcal{P}$.
    In that case, there is a kink in the function, a discontinuity of the derivative,
    which may not be computed correctly by automatic differentiation.
    This is related to the phenomenon of degeneracy in the simplex algorithm, where
    multiple variables could equivalently be chosen to enter the basis.
  • there is no other vertex with the same weight such that swapping the two
    maintains the same objective value. In that case, the derivative is $0$:
    small enough variations of the weight change neither the solution nor the objective.

If a vertex $i$ is chosen in a solution, then $x_i = 1$, and the corresponding
partial derivative with respect to its weight is $\frac{\partial f(w)}{\partial w_i} = 1$.

A Julia implementation

We will import a few packages: MathOptInterface.jl (MOI), the foundation for
constrained optimization in Julia, the solver itself, the Test standard library,
and ForwardDiff.jl for automatic differentiation.

using Test
import ConstraintSolver
const CS = ConstraintSolver

import MathOptInterface
const MOI = MathOptInterface

import ForwardDiff

Let us first write an implementation of the max-weight independent set problem.
We will use a 4-vertex graph, which looks as follows:

Weighted graph

The optimal answer here is to pick vertices 1 and 4 (in orange).

@testset "Max independent set MOI" begin
    matrix = [
        0 1 1 0
        1 0 1 0
        1 1 0 1
        0 0 1 0
    ]
    model = CS.Optimizer()
    x = [MOI.add_constrained_variable(model, MOI.ZeroOne()) for _ in 1:4]
    for i in 1:4, j in 1:4
        if matrix[i,j] == 1 && i < j
            (z, _) = MOI.add_constrained_variable(model, MOI.GreaterThan(0.0))
            MOI.add_constraint(model, z, MOI.Integer())
            MOI.add_constraint(model, z, MOI.LessThan(1.0))
            f = MOI.ScalarAffineFunction(
                [
                    MOI.ScalarAffineTerm(1.0, x[i][1]),
                    MOI.ScalarAffineTerm(1.0, x[j][1]),
                    MOI.ScalarAffineTerm(1.0, z),
                ], 0.0
            )
            MOI.add_constraint(model, f, MOI.EqualTo(1.0))
        end
    end
    weights = [0.2, 0.1, 0.2, 0.1]
    terms = [MOI.ScalarAffineTerm(weights[i], x[i][1]) for i in eachindex(x)]
    objective = MOI.ScalarAffineFunction(terms, 0.0)
    MOI.set(model, MOI.ObjectiveFunction{typeof(objective)}(), objective)
    MOI.set(model, MOI.ObjectiveSense(), MOI.MAX_SENSE)
    MOI.optimize!(model)
    # add some tests
end

Why the additional code with (z, _) = MOI.add_constrained_variable(model, MOI.GreaterThan(0.0))?
ConstraintSolver.jl does not yet support constraints of the type a x + b y <= c,
but linear equality constraints are fine, so we can derive an equivalent formulation by adding a
slack variable z.
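Writing the reformulation out makes the equivalence explicit: since the code constrains $z \geq 0$, the equality forces the original inequality to hold, and conversely,

$$x_i + x_j \leq 1 \iff \exists z \geq 0 : \,\, x_i + x_j + z = 1,$$

with $z = 1 - x_i - x_j$ playing the role of the slack. The integrality and upper bound on $z$ in the code are then automatic, since $x_i + x_j \in \{0, 1\}$.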

For this problem, the tests could be on both the solution and objective value, as follows:

@test MOI.get(model, MOI.VariablePrimal(), x[4][1]) == 1
@test MOI.get(model, MOI.VariablePrimal(), x[1][1]) == 1
@test MOI.get(model, MOI.ObjectiveValue()) ≈ 0.3

An equivalent JuMP version would look like this:

using JuMP
using LinearAlgebra: dot

matrix = [
    0 1 1 0
    1 0 1 0
    1 1 0 1
    0 0 1 0
]
m = Model(with_optimizer(CS.Optimizer))
x = @variable(m, x[1:4], Bin)
for i in 1:4, j in i+1:4
    if matrix[i,j] == 1
        zcomp = @variable(m)
        JuMP.set_binary(zcomp)
        @constraint(m, x[i] + x[j] + zcomp == 1)
    end
end
w = [0.2, 0.1, 0.2, 0.1]
@objective(m, Max, dot(w, x))
optimize!(m)

Why are we not using JuMP, which is much more concise and closer to the
mathematical formulation?

JuMP uses Float64 for all value types, which means we do not get the benefit of
generic types, while MathOptInterface types are parameterized by the numeric type used.
To be fair, maintaining type genericity on a project as large as JuMP
is hard without compromising performance. JuMP is not built out of functions but
around a model object holding the mutable state of the problem being constructed,
and building an algebraic modelling language without this incremental construction of the
model has not proved successful so far. One day, we may get a powerful declarative
DSL for mathematical optimization, but it has not come yet.

Back to our problem, we now have a way to compute the optimal value and solution.
Let us implement our function $f(w)$:


function weighted_stable_set(w)
    matrix = [
        0 1 1 0
        1 0 1 0
        1 1 0 1
        0 0 1 0
    ]
    model = CS.Optimizer(solution_type = Real)
    x = [MOI.add_constrained_variable(model, MOI.ZeroOne()) for _ in 1:4]
    for i in 1:4, j in 1:4
        if matrix[i,j] == 1 && i < j
            (z, _) = MOI.add_constrained_variable(model, MOI.GreaterThan(0.0))
            MOI.add_constraint(model, z, MOI.Integer())
            MOI.add_constraint(model, z, MOI.LessThan(1.0))
            f = MOI.ScalarAffineFunction(
                [
                    MOI.ScalarAffineTerm(1.0, x[i][1]),
                    MOI.ScalarAffineTerm(1.0, x[j][1]),
                    MOI.ScalarAffineTerm(1.0, z),
                ], 0.0
            )
            MOI.add_constraint(model, f, MOI.EqualTo(1.0))
        end
    end
    terms = [MOI.ScalarAffineTerm(w[i], x[i][1]) for i in eachindex(x)]
    objective = MOI.ScalarAffineFunction(terms, zero(eltype(w)))
    MOI.set(model, MOI.ObjectiveFunction{typeof(objective)}(), objective)
    MOI.set(model, MOI.ObjectiveSense(), MOI.MAX_SENSE)
    MOI.optimize!(model)
    return MOI.get(model, MOI.ObjectiveValue())
end

We can now compute the gradient in one function call with ForwardDiff:

@testset "Differentiating stable set" begin
    weights = [0.2, 0.1, 0.2, 0.1]
    ∇w = ForwardDiff.gradient(weighted_stable_set, weights)
    @test ∇w[1] ≈ 1
    @test ∇w[4] ≈ 1
    @test ∇w[2] ≈ ∇w[3] ≈ 0
end

To understand how this derivative computation can work with just a few
function calls (proportional to the size of the input), one must dig
a bit deeper into dual numbers.
I will shamelessly refer to my slides
at the Lambda Lille meetup for an example implementation in Haskell.
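The same idea can be sketched in a few lines of Julia. This is a toy implementation for scalar functions, illustrating the principle rather than how ForwardDiff.jl is implemented:

```julia
import Base: +, *

# Dual number a + b·ε with ε² = 0: the derivative rides along with the value
struct Dual{T<:Real}
    val::T   # function value
    der::T   # derivative part
end

+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)  # product rule
+(a::Dual, b::Real) = Dual(a.val + b, a.der)
+(a::Real, b::Dual) = b + a
*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)
*(a::Dual, b::Real) = b * a

# Seed the derivative part with 1 to differentiate f at x in a single call
derivative(f, x::Real) = f(Dual(x, one(x))).der

derivative(x -> x * x + 3x, 2.0)   # 2x + 3 at x = 2, i.e. 7.0
```

A single evaluation of f on a Dual input propagates both the value and the derivative, which is exactly why the gradient above came out of one function call per input direction.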

Why not reverse-mode?

I mentioned that the cost of computing the value and derivatives is proportional
to the size of the input, which can grow rapidly for real-world problems.
This is specific to so-called forward-mode automatic differentiation.
We will not go over the inner details of forward versus reverse mode.
As a rule of thumb, forward mode has less overhead and is better when the
dimension of the output far exceeds the dimension of the input, while
reverse mode is better when the dimension of the input exceeds that
of the output.
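The input-proportional cost of forward mode can be seen directly: a full gradient of $f: \mathbb{R}^n \rightarrow \mathbb{R}$ takes one directional derivative per input dimension. A toy sketch, with a made-up function:

```julia
import ForwardDiff

f(x) = sum(abs2, x)   # f: R^n -> R

# One forward pass computes the derivative along a single direction v
dirderiv(f, x, v) = ForwardDiff.derivative(t -> f(x + t * v), 0.0)

x = [1.0, 2.0, 3.0]
basis(i, n) = [j == i ? 1.0 : 0.0 for j in 1:n]

# n forward passes, one per canonical basis direction; equals 2x here
g = [dirderiv(f, x, basis(i, 3)) for i in 1:3]
```

Reverse mode would instead compute the whole gradient in a single backward pass, at the cost of recording the computation first.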

Giving reverse with Zygote a shot

Getting back to our question, the answer is rather down-to-earth:
the reverse mode I tried simply did not work there.
Reverse mode requires tracing the normal function call and building a
“tape”, which means it needs a representation of the function
(as a graph or otherwise).
I gave Zygote.jl
a try, which can be done by replacing ForwardDiff.gradient(f, x) with
Zygote.gradient(f, x) in the snippet above.
Building a representation of the function means Zygote must have a
representation of all operations performed. For the moment,
this is still restricted to a subset of the Julia language
(which is far more complex than commonly encountered mathematical functions
built as a single expression). This subset still excludes throwing and
handling exceptions, which are quite present in both ConstraintSolver.jl
and MathOptInterface.

I have not tried the other reverse tools for the sake of conciseness (and time),
so feel free to check out Nabla.jl,
ReverseDiff.jl
and Tracker.jl.

How could this be improved?

A first solution could be to move the Julia idiom from throw/try/catch
to handling errors as values, using something like the Result/Either type
in Scala / Haskell / Rust and corresponding libraries.

Another alternative, currently being pursued, is to keep pushing Zygote to support
more features of Julia, going in the direction of supporting differentiation
of any program, as dynamic as it gets.

One last option, for the particular problem of exception handling, would be
the ability to opt out of input validation: some @validate expr,
with expr potentially throwing or handling an error, and a @nocheck
or @nothrows macro in front of the function call, assuming the function
remains on the happy path, without guaranteeing validity or error messages
otherwise. This would work exactly like the @boundscheck / @inbounds pair for
index validation.
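For reference, here is how that existing pair behaves (a small self-contained sketch; the function names are made up):

```julia
# @boundscheck marks a check the caller may elide; @inbounds elides it
function first_element(v::Vector{Float64})
    @boundscheck length(v) >= 1 || throw(BoundsError(v, 1))
    return @inbounds v[1]
end

function sum_first(vs)
    s = 0.0
    for v in vs
        # when first_element is inlined here, its @boundscheck block is removed
        s += @inbounds first_element(v)
    end
    return s
end

sum_first([[1.0], [2.0, 3.0]])
```

The hypothetical @validate / @nothrows pair would play the same role for arbitrary input validation instead of array bounds.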

Conclusion, speculation, prospect

This post is already too long so we’ll stop there.
The biggest highlights here are that:

  • In discrete problems, we also have some continuous parts.
  • Julia’s type system allows AD to work almost out of the box in most cases.
  • With JuMP and MOI, solving optimization problems is just another algorithmic building block in your Julia program, spitting out results, and derivatives if you make them.
  • I believe that’s why plugging in solvers developed in C/C++ is fine, but not always what we want. I would be ready to take a performance hit on the computation time of my algorithms to have some hackable, type-generic MILP solver in pure Julia.2

Special mentions

Thanks a lot to Wikunia, first for developing ConstraintSolver.jl,
without which none of this would have been possible, and for the open discussion on the multiple
issues I posted. Don’t hesitate to check out his blog,
where the whole journey from 0 to a constraint solver is documented.


  1. The simple essence of automatic differentiation, Conal Elliott, Proceedings of the ACM on Programming Languages (ICFP), 2018

  2. I believe a pure-Julia solver could be made as fast as a C/C++ solver, but developing solvers is an enormous amount of work: micro-optimizations and tests on industrial cases. The new HiGHS solver however shows that one can get pretty good results by developing a linear solver from scratch with all modern techniques already baked in.

Working with binary libraries for optimization in Julia

By: Julia on μβ

Re-posted from: https://matbesancon.github.io/post/2019-11-04-binary-julia/

Unlike other ecosystems in the scientific programming world, scientists
and engineers working with Julia usually prefer a whole stack in Julia, for many
reasons: the compiler does much better when it can infer what
is going on in a piece of code; when an error is thrown, the stack trace looks
much nicer when only pure Julia code is involved; and functions and types can be
defined as generically as desired, without hard-coded container or number types, for instance.

Sometimes, however, inter-operability with native code is needed to use some
external native libraries. By that I mean natively built libraries
(*.so files on Linux systems, *.dylib on OSX, *.dll on Windows).
In this post, we will explore some tools to work with native libraries in Julia.
In the last couple of weeks, I tinkered a bit with the HiGHS
solver developed at the University of Edinburgh, which I will use as an example
throughout this post. It is still a work in progress, but it shows great promise as the
next-generation linear optimization solver in the COIN-OR suite.

What does a native lib look like?

Looking at the repository,
it is a pretty standard CMake-based C++ project producing both an executable and
a library which can be called through a C interface.
The two initial components are:

  1. The source code producing the library; this can be written in any language producing native code (C, C++, Rust).
  2. The header file defining the C API to call the library from other programs.

This interface is defined in a single header file, src/interfaces/highs_c_api.h.
Header files may define a bunch of types (structs, unions, enums), but most
importantly they define function prototypes looking like:

int preprocess_variables(int* values, double offset, float coefficient);

When using the function from Julia, the call to the native library looks like
the following:

ccall(
  (:preprocess_variables, my_library_name),
  Cint, # return type
  (Ptr{Cint}, Cdouble, Cfloat), # tuple of argument types
  pointer(my_array), 3.5, 4.5f0 # the arguments themselves
)

Let us dive in.

Solution 1: build and link

For this approach, the first step is to build the HiGHS library so that the
shared object is available locally. Following the documentation, the instructions are:

cd HiGHS # where HiGHS is installed
mkdir build
cd build
cmake .. # generate makefiles
make # build everything here in the build directory

As is often the case with native packages, some dependencies are implicitly assumed.
Here is a Dockerfile building the project on an Alpine image; you should be
able to reproduce this with Docker installed.

FROM alpine:3.7

RUN apk add git cmake g++ gcc clang make
WORKDIR /opt
RUN git clone https://github.com/ERGO-Code/HiGHS.git
RUN mkdir -p HiGHS/build
WORKDIR /opt/HiGHS/build
RUN cmake .. && make
RUN make test
RUN make install # optional

Now back to the Julia side. Assuming the library is available at a given
path, one can write the Julia functions corresponding to the interface; it is
preferable not to expose error-prone C calls to the user. In the example of
the preprocess_variables function defined above, a Julia wrapper would look
like:

function preprocess_variables(my_array::Vector{Cint}, offset::Cdouble, coefficient::Cfloat)
    return ccall(
        (:preprocess_variables, my_library_name),
        Cint, (Ptr{Cint}, Cdouble, Cfloat),
        my_array, offset, coefficient,
    )
end

Once these wrapper functions are defined, users can convert their values to the
corresponding expected argument types and call them. The last thing needed is my_library_name,
which must be the path to the shared library. Hard-coding or assuming paths
should be avoided, since it makes the software harder to install on some systems.
One thing that can be done is asking the user to pass the library path as an
environment variable:

ENV["HIGHS_DIR"] # should contain the path to the HIGHS directory
joinpath(ENV["HIGHS_DIR"], "build", "lib", "libhighs.so")
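
A slightly more defensive variant (a sketch, assuming the user may not have set the variable) avoids a cryptic KeyError:

```julia
# Warn and fall back to the current directory if HIGHS_DIR is unset
haskey(ENV, "HIGHS_DIR") || @warn "HIGHS_DIR not set, falling back to the current directory"
highs_dir = get(ENV, "HIGHS_DIR", pwd())
libpath = joinpath(highs_dir, "build", "lib", "libhighs.so")
println(libpath)
```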

Doing this every time is, however, not convenient. Since library paths do not
change between calls, one can check for this path once, at installation of the
package. For this purpose, a deps/build.jl file can be added to any package;
it runs when the package is installed or when the Pkg.build
command is called. A build.jl for our purpose could look like:

const highs_location = ENV["HIGHS_DIR"]
const libhighs = joinpath(highs_location, "build", "lib", "libhighs.so")
const depsfile = joinpath(@__DIR__, "deps.jl")
open(depsfile, "w") do f
    # repr adds the quotes, so that deps.jl contains a valid string literal
    println(f, "const libhighs = ", repr(libhighs))
end

The snippet above looks for the libhighs.so library, using the environment
variable as location of the base directory of HiGHS. Placed in build.jl,
the script will create a deps.jl file in the deps folder of the Julia
package, and write const libhighs = "/my/path/to/highs/lib/libhighs.so".
This is more or less what happens with the
SCIP.jl wrapper v0.9.
Once the build step succeeds, one can add to the main module in /src:

module HiGHS

const deps_file = joinpath(dirname(@__FILE__), "..", "deps", "deps.jl")
if isfile(deps_file)
    include(deps_file)
else
    error("HiGHS not properly installed. Please run import Pkg; Pkg.build(\"HiGHS\")")
end

# other things

end # module

The global constant libhighs can then be used for ccall.
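
For instance (a sketch, not runnable without the built library; Highs_create and Highs_destroy are functions from highs_c_api.h, with signatures assumed here):

```julia
highs = ccall((:Highs_create, libhighs), Ptr{Cvoid}, ())
# ... use the solver instance through other API calls ...
ccall((:Highs_destroy, libhighs), Cvoid, (Ptr{Cvoid},), highs)
```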
We now have a functional package wrapping a native library downloaded and
built separately. Summing up what we have, the Julia wrapper package looks as
follows:

$ tree
.
├── Project.toml
├── README.md
├── deps
│   ├── build.jl
│   ├── build.log
│   ├── deps.jl
├── src
│   └── HiGHS.jl
└── test
    └── runtests.jl

deps/build.log and deps/deps.jl are not committed in the repository but
generated when installing and/or building the Julia package.
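
To keep these generated files out of version control, a deps/.gitignore could contain:

```
build.log
deps.jl
```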

Lifting maintainers’ burden: generating wrapper functions with Clang.jl

One time-consuming task in the previous steps is going from the C header file
describing the API to Julia functions wrapping the ccall. The task is mostly
repetitive and can be automated using Clang.jl.
This package generates the appropriate functions from a header file;
a reduced example looks like:

import Clang

# HIGHS_DIR = "path/to/highs/dir"
const header_file = joinpath(HIGHS_DIR, "include", "interfaces", "highs_c_api.h")
const LIB_HEADERS = [header_file]

const ctx = Clang.DefaultContext()

Clang.parse_headers!(ctx, LIB_HEADERS,
    includes=[Clang.CLANG_INCLUDE],
)

ctx.libname = "libhighs"
ctx.options["is_function_strictly_typed"] = true
ctx.options["is_struct_mutable"] = false

const api_file = joinpath(@__DIR__, "../src/wrapper", "$(ctx.libname)_api.jl")

open(api_file, "w") do f
    # write each generated function
    # ...
end

This snippet can be placed in a /gen folder of the Julia wrapper package and
writes all the functions wrapping C calls to src/wrapper.
It is less error-prone than writing the Julia interface manually and can
save a great deal of time when tracking updates of the native library.
Again, the SCIP.jl wrapper uses this method and can serve as an example.
Since the wrapper generation has different requirements than the package itself,
we can give it its own Project.toml.
Our package structure now looks like this:

$ tree
.
├── Project.toml
├── README.md
├── deps
│   ├── build.jl
│   ├── build.log
│   ├── deps.jl
├── gen
│   ├── Project.toml
│   └── gen.jl
├── src
│   ├── HiGHS.jl
│   └── wrapper
│       ├── libhighs_api.jl
│       └── libhighs_common.jl
└── test
    └── runtests.jl

Lifting the user’s burden: BinaryBuilder & BinaryProvider

For non-open-source software, what we did up to this point is about the best you can get:
let users download and install the library, pass the path once at build time, and
partly generate the Julia wrapper for ccall through Clang.jl.
For open-source libraries however, could we go a step further and do everything
for the user when they install the Julia package?

That’s where BinaryBuilder
and BinaryProvider come in.
Like the Dockerfile above, BinaryBuilder relies on container technology and some
arcane tricks to cross-compile the binary artifacts (executables and libraries)
for each target platform.
It does so by letting you install the library as you would on your own machine,
using cmake, make, make install, etc. The result of running BinaryBuilder is a
single Julia script, build_tarballs.jl, describing the commands run to produce
the artifacts.
This script is placed in a repository with Continuous Integration support, which creates
releases for all specified architectures, operating systems, and compilers.
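
To give an idea of its shape, here is a much-simplified sketch of a build_tarballs.jl; the name, version, and `<commit-sha>` placeholder are illustrative, not taken from the actual HiGHS build script:

```julia
using BinaryBuilder

name = "HiGHSBuilder"
version = v"0.1.0"
# <commit-sha> stands for the pinned HiGHS commit
sources = [GitSource("https://github.com/ERGO-Code/HiGHS.git", "<commit-sha>")]

# the same cmake/make steps as in the Dockerfile, run in the
# cross-compilation sandbox, installing into $prefix
script = raw"""
cd $WORKSPACE/srcdir/HiGHS
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$prefix
make -j${nproc}
make install
"""

platforms = supported_platforms()
products = [LibraryProduct("libhighs", :libhighs)]
dependencies = Dependency[]

build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies)
```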
You can see examples for the Clp solver here
and for HiGHS there.

Back to the Julia package, we can now modify the deps/build.jl script to use
BinaryProvider, fetching the binaries corresponding to the current system.
Without knowing anything about what is going on under the hood or how the library
is built, users can simply run Pkg.add("ThePackage"), which builds it
automatically and fails explicitly when a given OS or architecture is not
supported. Take a look at the modified
build file using BinaryProvider.

They don’t need to guarantee that they have the same compiler, make and cmake
version to have a repeatable & smooth installation of the package.

Wrapping up

The process from 0 to a fully ready Julia package built on top of a binary
library is still not straightforward. Special appreciation goes to the
BinaryBuilder developers and contributors who helped me figure out some tricky
bits. But the key take-away is that once the pipeline is built, updating
the binary version or the Julia wrapper follows the same workflow one is used to with
standard Julia packages. Keep building pure Julia software for all its benefits,
but when binaries are needed, the tools presented here make working with them as
pleasant as possible.