A take on Benders decomposition in JuMP

By: Julia on μβ

Re-posted from: https://matbesancon.github.io/post/2019-05-08-simple-benders/

Last Friday, the Combinatorial Optimization group in Paris held a great seminar
celebrating the 85th birthday of Jack Edmonds, one of the founding researchers
of combinatorial optimization, known among other things for the Blossom matching algorithm.

Laurence Wolsey and Ivana Ljubic both gave talks on applications and
developments of Benders decomposition. It made me want to refresh my
knowledge of the subject and play a bit with a simple implementation.

High-level idea

Problem decompositions are used on large-scale optimization problems with a
particular structure. The decomposition turns a compact but hard-to-solve
formulation into an easier one of much greater size. In the case of Benders,
greater size means a number of constraints growing exponentially
with the size of the input problem. Adding all constraints upfront would be too
costly, and in general only a small fraction of them will be active in a final
solution. The associated algorithm therefore generates them incrementally,
re-solving the problem with each new constraint until no violated constraint
can be found anymore.

We can establish a more general pattern of on-the-fly addition of
information to an optimization problem, which entails two components:

  1. An incrementally-built problem, called Restricted Master Problem (RMP) in decomposition.
  2. An oracle or sub-problem, taking the problem state and building the new required structure (here a new constraint).
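
In Julia-flavoured pseudo-code, the pattern looks roughly like this (a schematic
sketch: rmp, oracle and the helper functions are hypothetical stand-ins for the
two components above):

# schematic cut-generation loop with hypothetical helpers
while true
    solution = solve_rmp!(rmp)       # re-solve the restricted master problem
    cut = separate(oracle, solution) # ask the oracle for a violated constraint
    cut === nothing && break         # no violated constraint left: done
    add_cut!(rmp, cut)               # grow the RMP and iterate
end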

Sounds familiar? Benders can be seen as the “dual twin” of the Dantzig-Wolfe
decomposition I had played with in a previous post.

Digging into the structure

Now that we have a general idea of the problem at hand, let’s see the specifics.
Consider a problem such as:
$$ \min_{x,y} f(y) + c^T x $$
s.t. $$ G(y) \in \mathcal{S}$$
$$ A x + D y \geq b $$
$$ x \in \mathbb{R}^{n_1}_{+}, y \in \mathcal{Y} $$

We will not consider the constraints specific to $y$ (the first row) nor the
$y$-component of the objective. The key assumption of Benders is that if the $y$
are fixed, the problem on the $x$ variables is fast to solve.
Lots of heuristics use this idea of “fix-and-optimize” to avoid incorporating
the “hard” variables in the problem; Benders leverages several properties to
bring the idea to exact methods (exact in the sense of proven optimality).

Taking the problem above, we can simplify the structure by abstracting away
(i.e. projecting out) the $x$ part:
$$ \min_{y} f(y) + \phi(y) $$
s.t. $$ G(y) \in \mathcal{S}$$
$$ y \in \mathcal{Y} $$

Where:
$$ \phi(y) = \min_{x} \{c^T x \;:\; Ax \geq b - Dy,\; x \geq 0 \} $$

$\phi(y)$ is a non-smooth function, with $\mathrm{dom}\,\phi$ the feasible domain
of the problem. If you are familiar with bilevel optimization, this could
remind you of the optimal value function used to describe lower-level problems.
We will call $SP$ the sub-problem defined in the function $\phi$.

The essence of Benders is to start from an outer approximation (overly optimistic),
replacing $\phi$ with a variable $\eta$ which may take values lower than the
actual minimum, and then add cuts which progressively constrain the problem.
The initial outer-approximation is:

$$ \min_{y,\eta} f(y) + \eta $$
s.t. $$ G(y) \in \mathcal{S}$$
$$ y \in \mathcal{Y} $$

Of course since $\eta$ is unconstrained, the problem will start unbounded.
What are valid cuts for this? Let us define the dual of the sub-problem $SP$,
which we will name $DSP$:
$$ \max_{\alpha} (b - Dy)^T \alpha $$
s.t. $$ A^T \alpha \leq c $$
$$ \alpha \geq 0 $$
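
To see where $DSP$ comes from, dualize the constraints $Ax \geq b - Dy$ of $SP$
with multipliers $\alpha \geq 0$:
$$ \min_{x \geq 0} \, c^T x - \alpha^T (Ax - b + Dy) = (b - Dy)^T \alpha + \min_{x \geq 0} (c - A^T \alpha)^T x $$
The inner minimum is $0$ if $A^T \alpha \leq c$ and $-\infty$ otherwise, which
yields exactly the feasible domain of $DSP$.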

We want $\eta \geq \phi(y) = \min SP$ and, by strong duality of linear programs,
$\min SP = \max DSP$, so it suffices to enforce $\eta \geq \max DSP$. When equality
holds at the final $y$, $\eta$ is exactly $\phi(y)$ and yields the optimal solution.

One thing to note about the feasible domain of $DSP$: it does not depend on
the value of $y$. This means that $\eta$ satisfying the constraint for all
dual-feasible $\alpha$ is equivalent to satisfying it for all extreme points and
extreme rays of the dual polyhedron. Each of these can yield a new cut to add
to the relaxed problem. For the sake of conciseness, I will not go into details
on the case where the sub-problem is infeasible for a given $y$ solution.
Briefly, this is equivalent to the dual being unbounded; it thus defines an
extreme ray, and the corresponding $y$ must be cut out.
For more details, you can check these lecture notes.
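
Concretely, each extreme point $\alpha_p$ of the dual polyhedron yields an
optimality cut and each extreme ray $\alpha_r$ a feasibility cut:
$$ \eta \geq \alpha_p^T (b - Dy) $$
$$ 0 \geq \alpha_r^T (b - Dy) $$
These are exactly the two cut types generated in the implementation below.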

A JuMP implementation

We will define a simple implementation using JuMP,
a generic optimization modeling library on top of Julia, usable with various
solvers. Since the master and sub-problem resolutions are completely independent,
they can be solved in separate software components, even with different solvers.
To highlight this, we will use SCIP
to solve the master problem and COIN-OR’s Clp
to solve the sub-problem.

We can start by importing the required packages:

using JuMP
import SCIP
import Clp
using LinearAlgebra: dot

Defining and solving dual sub-problems

Let us store static sub-problem data in a structure:

struct SubProblemData
    b::Vector{Float64}  # right-hand side of the linking constraints
    D::Matrix{Float64}  # matrix multiplying y in Ax ≥ b - Dy
    A::Matrix{Float64}  # constraint matrix on the x variables
    c::Vector{Float64}  # cost vector of the x variables
end

And the dual sub-problem is entirely contained in another structure:

struct DualSubProblem
    data::SubProblemData
    α::Vector{VariableRef}
    m::Model
end

function DualSubProblem(d::SubProblemData, m::Model)
    α = @variable(m, α[i = 1:size(d.A, 1)] >= 0)
    @constraint(m, d.A' * α .<= d.c)
    return DualSubProblem(d, α, m)
end

The DualSubProblem is constructed from the static data and a JuMP model.
We mentioned that the feasible space of the sub-problem is independent of the
value of $y$, thus we can add the constraint right away. Only to optimize it
do we require the $\hat{y}$ value, which is used to set the objective.
We can then either return a feasibility cut or optimality cut depending on
the solution status of the dual sub-problem:

function JuMP.optimize!(sp::DualSubProblem, yh)
    obj = sp.data.b .- sp.data.D * yh
    @objective(sp.m, Max, dot(obj, sp.α))
    optimize!(sp.m)
    st = termination_status(sp.m)
    if st == MOI.OPTIMAL
        α = JuMP.value.(sp.α)
        return (:OptimalityCut, α)
    elseif st == MOI.DUAL_INFEASIBLE
        # unbounded dual: we assume the solver exposes an extreme ray
        # (unboundedness certificate) through the primal variable values
        α = JuMP.value.(sp.α)
        return (:FeasibilityCut, α)
    else
        error("DualSubProblem error: status $status")
    end
end
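
As a quick sanity check, the dual sub-problem can be built and solved on its
own; a hypothetical usage sketch, with data a SubProblemData instance such as
the one returned by test_data() further below:

# hypothetical sketch: solve the dual sub-problem at a fixed ŷ
sp_model = Model(with_optimizer(Clp.Optimizer, LogLevel = 0))
dsp = DualSubProblem(data, sp_model)
(cut_type, α) = optimize!(dsp, [1.0])  # ŷ = [1.0]: returns the cut type and dual vector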

Iterating on the master problem

The main part of the resolution is a loop over a few steps:

  1. Initialize a master problem with variables $(y,\eta)$.
  2. Optimize it and pass the $\hat{y}$ value to the sub-problem.
  3. Get back a dual value $\alpha$ from the dual sub-problem.
  4. Is the constraint generated by this $\alpha$ value already respected?
  • If yes, the solution is optimal.
  • If no, add the corresponding cut to the master problem and return to 2.

function benders_optimize!(m::Model, y::Vector{VariableRef}, sd::SubProblemData,
                           sp_optimizer, f::Union{Function,Type};
                           eta_bound::Real = -1000.0)
    subproblem = Model(with_optimizer(sp_optimizer))
    dsp = DualSubProblem(sd, subproblem)
    @variable(m, η >= eta_bound)
    @objective(m, Min, f(y) + η)
    optimize!(m)
    st = MOI.get(m, MOI.TerminationStatus())
    # restricted master has a solution or is unbounded
    nopt_cons, nfeas_cons = (0, 0)
    @info "Initial status $st"
    cuts = Tuple{Symbol, Vector{Float64}}[]
    while (st == MOI.DUAL_INFEASIBLE) || (st == MOI.OPTIMAL)
        optimize!(m)
        st = MOI.get(m, MOI.TerminationStatus())
        ŷ = JuMP.value.(y)
        η0 = JuMP.value(η)
        (res, α) = optimize!(dsp, ŷ)
        if res == :OptimalityCut
            @info "Optimality cut found"
            if η0 ≥ dot(α, (dsp.data.b - dsp.data.D * ŷ))
                break
            else
                nopt_cons += 1
                @constraint(m, η ≥ dot(α, (dsp.data.b - dsp.data.D * y)))
            end
        else
            @info "Feasibility cut found"
            nfeas_cons += 1
            @constraint(m, 0 ≥ dot(α, (dsp.data.b - dsp.data.D * y)))
        end
        push!(cuts, (res, α))
    end
    return (m, y, cuts, nopt_cons, nfeas_cons)
end

Note that we pass the function an already-built model with the variables $y$ defined.
This allows for a flexible prior definition of constraints of the type:
$$y \in \mathcal{Y}$$
$$G(y) \in \mathcal{S}$$

Also, we return the $\alpha$ values found by the sub-problems and the number of
cuts of each type. Finally, one “hack” I’m using is to give an arbitrary lower
bound on the $\eta$ value, making it (almost) certain that the initial problem
is bounded and thus that an initial solution $y$ is defined.

We will re-use the small example from the lecture notes above:

function test_data()
    c = [2., 3.]
    A = [1 2;2 -1]
    D = zeros(2, 1) .+ [1, 3]
    b = [3, 4]
    return SimpleBenders.SubProblemData(b, D, A, c)
end

data = test_data()
# objective function on y
f(v) = 2v[1]
# initialize the problem
m = Model(with_optimizer(SCIP.Optimizer))
@variable(m, y[j=1:1] >= 0)
# solve and voilà
(m, y, cuts, nopt_cons, nfeas_cons) = SimpleBenders.benders_optimize!(m, y, data, () -> Clp.Optimizer(LogLevel = 0), f)
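
Once the call returns, the solution can be inspected with the usual JuMP
accessors; a minimal sketch, assuming the solve above terminated successfully:

# inspect the result (assumes the master problem reached optimality)
@show JuMP.value.(y)           # optimal y found by the master problem
@show JuMP.objective_value(m)  # f(y) + η at the optimum
@show (nopt_cons, nfeas_cons)  # number of optimality and feasibility cuts added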

The full code is available on
GitHub: run it, modify it,
and don’t hesitate to submit pull requests and issues, I’m sure there are some 🙂

Benders decomposition is a central pillar for various problems in optimization;
research is still very active on extending it to non-linear convex or non-convex
sub-problems, where duality cannot be used as directly. If you liked this post
or have questions, don’t hesitate to react or ping me on Twitter.


Variables are not values: types and expressions in mathematical optimization

By: Julia on μβ

Re-posted from: https://matbesancon.github.io/post/2019-04-14-optimization-function-evaluation/

This week, I came across Richard Oberdieck’s post,
“Why ‘evaluate’ is the feature I am missing the most from commercial MIP solvers”.
It would indeed be practical to have, for the reasons listed by the author, but
several barriers prevent it from existing as expressed in the snippets presented.

Initial problem statement

The author first tests the optimization of a non-linear function through scipy
as such:

import numpy as np

func = lambda x: np.cos(14.5 * x - 0.3) + (x + 0.2) * x
func(5) # 25.001603108415402

So far so good: we are defining a scalar function and passing it a scalar value
at which to evaluate it, and it returns the value, which is what it is
supposed to do.

Now the real gripe comes when moving on to developing against a black box
solver (often commercial, closed-source), commonly used for linear,
mixed-integer problems:

import xpress as xp

# Define the model and variables
Model = xp.problem()
x = xp.var(lb=0, ub=10)
Model.addVariable(x)

# Define the objective and solve
test_objective = 5*x
Model.setObjective(test_objective)
Model.solve()
# test_objective(5) does not work

One first problem to notice here is that test_objective
is at best an expression, not a function, meaning it does
not depend on an input argument but on decision variables declared globally.
That is one point why it cannot be called.

Now, the rest of this article will be some thoughts on how optimization problems
could be structured and represented in a programming language.

One hack that could be used is being able to set the values of x, but this
needs to be done at the global level:

x = xp.var(lb=0, ub=10)
Model.addVariable(x)

# Define the objective
test_objective = 5*x

x.set(5)
# evaluates test_objective with the set value of x
xp.evaluate(test_objective)

Having to use the global scope, with an action on one
object (the variable x) modifying another
(the test_objective expression) is called a side-effect and quickly makes
things confusing as your program grows in complexity. You have to contain the
state in some way and keep track. Keeping track of value changes is
more or less fine, but the hardest part is keeping track
of value definitions. Consider the following example:

x = xp.var(lb=0, ub=10)
Model.addVariable(x)
y = xp.var(lb=0, ub=10)
Model.addVariable(y)

# Define the objective and solve
test_objective = 5*x + 2*y
xp.evaluate(test_objective) # no variable set, what should this return?

x.set(5)
xp.evaluate(test_objective) # y is not set, what should this return?

A terminology problem

We are touching a more fundamental problem here: variables are not values
and cannot be considered as such. Merging the term “variable” for the variables
of your Python/Julia/other program with the decision variables of an
optimization problem creates great confusion.
Just like variables, the term function is confusing here:
most optimization techniques exploit the problem structure,
think linear, disciplined convex, semi-definite; anything beyond non-linear
differentiable or black-box optimization will use the specific structure
in a specialized algorithm.
If standard functions from your programming language are used, no structure
can be leveraged by the solver, which only sees a function pointer it can pass
values to. So working with mathematical optimization forces you to re-think
what you call “variables” and what you call “functions”.

There is something we can do for the function part, which is defining
arithmetic rules over variables and expressions, which is for instance what
the JuMP modelling framework does:

using JuMP
m = Model()
@variable(m, x1 >= 0)
@variable(m, x2 >= 0)

# random affine function
f(a, b) = π + 3a + 2b

f(x1, x2) # returns a JuMP.GenericAffExpr{Float64,VariableRef}

@variable(m, y)
f(x1, x2) + y  # also builds a JuMP.GenericAffExpr{Float64,VariableRef}

This works especially well with affine functions because composing affine
expressions builds other affine expressions but gets more complex any time
other types of constraints are added. For a great resource on types and
functions for mathematical optimization, watch Prof. Madeleine Udell’s
talk at JuliaCon 2017 (the Julia
syntax is from a pre-1.0 version, so it may look funny).
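
As a side note, nothing prevents us from writing a small “evaluate” for affine
expressions ourselves; a minimal sketch, assuming the linear_terms iterator and
the constant field of JuMP.GenericAffExpr:

# evaluate an affine expression at a point, given a variable => value map
function evaluate_affine(ex, vals::Dict{VariableRef,Float64})
    acc = ex.constant
    for (coef, var) in linear_terms(ex)
        acc += coef * vals[var]
    end
    return acc
end

evaluate_affine(f(x1, x2), Dict(x1 => 1.0, x2 => 2.0))  # π + 3 * 1 + 2 * 2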

Encoding possibilities as sum-types

Getting back to evaluation, to make this work, you need to know what
values variables hold. What if the model hasn’t been optimized yet?
You could take:
1. A numerical approach and return NaN (floating point value for Not-A-Number)
2. An imperative approach and throw an error when we evaluate an expression without values set or the model optimized
3. A typed functional approach and describe the possibility of presence/absence of a value through types

The first approach was JuMP 0.18 and prior, the second is JuMP 0.19 and onward,
the third is the one of interest to us, if we want to describe what is happening
through types.

If you show these three options to a developer used to statically-typed
functional programming, they would tell you that the first option coming to mind
is an option, a type which can be either some value or nothing.
In the case of an optimization model, it would be some numerical value
if we have a value to return (that is, we optimized the model and found a
solution).
The problem is, there are many reasons for which you may have or not a value.
What you could do in that case is get more advanced information from your model.
This is the approach JuMP is taking with a bunch of model attributes you
can query at any time, see the documentation
for things you can query at any time.

The catch is that querying information on the status of the problem (solved,
unsolved, impossible to solve…) and getting values attached to variables can
be unrelated.

m = Model()
@variable(m, x >= 0)
@variable(m, y >= 0)

# getting the status: not optimized yet
termination_status(m)
# OPTIMIZE_NOT_CALLED::TerminationStatusCode = 0

primal_status(m) # NO_SOLUTION::ResultStatusCode = 0

JuMP.value(x) # ERROR: NoOptimizer()
# woops, we forgot that we hadn't optimized yet

This is indeed because x does not exist by itself: there is
a “magic bridge” between the variable x and the model m.
The computer science term for this “magic bridge” is a
side-effect,
the same kind as mentioned earlier when we set the value of a variable at the
global scope. Again, side-effects are fine at a small scale but often become the
parts making a program confusing. Every time I’m reviewing code by researchers
starting out, the first thing I encourage them to do is to create self-contained
bits of code within functions and remove mutable global state.

A typed solution for describing mathematical problems

We stated that the variables and model are bound together. In that case, let
us not split them but describe them as one thing and since this one thing
accepts different possible states, we will use
tagged unions, which you can
think of as C enumerations with associated values. Other synonyms for this
construct are sum types (as in OCaml and Haskell).

We can think of the solution process of an optimization problem at a high level
as a function:

solve(Model(Variables, Constraints, Objective)) -> OptimizationResult

Where OptimizationResult is a sum type:

OptimizationResult = Infeasible(info) | Unbounded(info) | Optimal(info) | NearOptimal(info) ...

In this case, everything can stay immutable: expressions, including the objective
and constraints, are only used to build the input model; they can be
evaluated at any point and just describe some expressions of variables.
The values of the variables resulting from the optimization are only
available in the cases where it makes sense. If the results are stored in the
solution info structure, we can query values in those cases only,
here the Optimal and NearOptimal cases, with a syntax like:

match OptimizationResult {
    Optimal(info) -> value(info, x) # or info.value(x)
    Infeasible(info) -> ...
    Unbounded(info)  -> ...
}

Internally, info would keep an association from variables to corresponding
values. No more confusion on what binding of your computer program represents
what symbolic variable of your problem.
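
In Julia, which lacks native sum types, one hypothetical encoding is an abstract
type with one concrete subtype per case, and dispatch playing the role of the
match construct above; a sketch:

# a hypothetical encoding of OptimizationResult as a Julia “sum type”
abstract type OptimizationResult end

struct Optimal <: OptimizationResult
    values::Dict{Symbol,Float64}  # variable name => value found
end
struct Infeasible <: OptimizationResult end
struct Unbounded <: OptimizationResult end

# querying a value only makes sense in the Optimal case
solution_value(r::Optimal, x::Symbol) = r.values[x]
solution_value(r::OptimizationResult, x::Symbol) = error("no value for status $(typeof(r))")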

So why would we keep using these bindings associated with variables, if they
have never been independent from the problem in the first place? The obvious
reason that comes to mind is practical syntax: we can write expressions in
a quasi-mathematical way (here in JuMP):

@expression(m, 3x + 2x^2 <= 4y)

While if variables were attached to the model, the required syntax would be
in the flavour of:

@expression(m, 3m[:x] + 2m[:x]^2 <= 4m[:y])

Which quickly becomes hard to read. Can we do better?

Stealing a solution elsewhere

I stumbled upon an interesting solution to such a problem while reading the
documentation for various probabilistic programming languages built on top
of Julia. Here is one example from Turing.jl:

@model gdemo(x, y) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
end

# sample from the model using an algorithm
chn = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))

It’s just one step away from imagining the same for optimization:

@optim_model linmodel(a, b) = begin
  x[1:10] >= 0
  5 <= y <= 10
  z ∈ 𝔹
  cons1: y - 5 <= 5z
  cons2: x + y >= 3
  Min x
end

result = optimize(linmodel)

Naming the constraints would be necessary to retrieve associated dual values.
Retrieving values associated with variables could be done in an associative
structure (think a dictionary/hash map). This structure removes any confusion as
to what belongs where in an optimization model. The variables x, y, z are
indeed defined within a given model and explicitly belong to it.

Why are interfaces not built this way? Warning, speculative opinions below:

One reason is the ubiquity of C & C++ in optimization.
The vast majority of commonly used solvers is built
in either of these, supporting limited programming constructs and based on
passing pointers around to change the values pointed to. Because the solvers are
built like this, interfaces follow the same constructions. Once a dominant
share of the interfaces are identical, building something widely different puts
you at a disadvantage, with a steeper learning curve.

Another more nuanced reason is that declarative software is hard to get right.
One often has to build everything upfront, here in the @optim_model block.
Getting meaningful errors is much harder, and debugging optimization models
is already a tricky business.

Lastly, lots of algorithms are based on incremental modifications of models
(think column and row generation), or combinations with other bricks. This
requires some “hackability” of the model. If one looks at Algebraic Modelling
Languages, everything seems to fall apart once you try to implement
decompositions. Usually it involves a completely different syntax for the
decomposition scheme (the imperative part) and for the model declaration
(the declarative part).

So overall, even though side-effects are a central part of the barrier to
expressing mathematical optimization in a mathematical, type-based,
declarative way, they are needed because of the legacy of solvers and of some
algorithms which become hairy to express without them.

Further resources

As pointed above, Prof. Madeleine Udell’s talk
gives some great perspectives on leveraging types for expressive optimization
modelling. For the brave and avid readers, this
PhD thesis tackles
the semantics of a formal language for optimization problems.
If you have further resources on the subject, please reach out.

Thanks Richard for the initial post and the following discussion which led to
this post. For shorter and nicely written posts on optimization, go read his
blog.

Note: I try never to use the terms “mathematical programming” and
“mathematical program”, which are synonyms for
“mathematical optimization” and “mathematical optimization problem” respectively.
We can see why in this post: this kind of context, where the term “program”
could refer to a computer program or a mathematical problem, becomes very
confusing. We are in 2019 and the term “program” is now universally understood
as a computer program. Moreover, “mathematical programming” merely refers to
a problem specification; it is very confusing that
“linear/semi-definite/convex programming” merely means putting together
a bunch of equations and says nothing about how to tackle them.