Author Archives: Julia on μβ

Variables are not values: types and expressions in mathematical optimization

By: Julia on μβ

Re-posted from: https://matbesancon.xyz/post/2019-04-14-optimization-function-evaluation/

This week, I came across Richard Oberdieck’s post,
“Why ‘evaluate’ is the feature I am missing the most from commercial MIP solvers”.
It would indeed be practical to have for the reasons listed by the author, but
some barriers stand to have it as it is expressed in the snippets presented.

Initial problem statement

The author first tests the optimization of a non-linear function through scipy
as such:

func = lambda x: np.cos(14.5 * x - 0.3) + (x + 0.2) * x
func(5) # 25.001603108415402

So far so good, we are defining a scalar function, passing it a scalar value
at which it evaluates and returns the value, which is what it is
supposed to do.

Now the real gripe comes when moving on to developing against a black box
solver (often commercial, closed-source), commonly used for linear,
mixed-integer problems:

import xpress as xp

# Define the model and variables
Model = xp.problem()
x = xp.var(lb=0, ub=10)
Model.addVariable(x)

# Define the objective and solve
test_objective = 5*x
Model.setObjective(test_objective)
Model.solve()
# test_objective(5) does not work

One first problem to notice here is that test_objective
is at best an expression, not a function, meaning it does
not depend on an input argument but on decision variables declared globally.
That is one point why it cannot be called.

Now, the rest of this article will be some thoughts on how optimization problems
could be structured and represented in a programming language.

One hack that could be used is being able to set the values of x, but this
needs to be done at the global level:

x = xp.var(lb=0, ub=10)
Model.addVariable(x)

# Define the objective
test_objective = 5*x

x.set(5)
# evaluates test_objective with the set value of x
xp.evaluale(test_objective)

Having to use the global scope, with an action on one
object (the variable x) modifying another
(the test_objective expression) is called a side-effect and quickly makes
things confusing as your program grows in complexity. You have to contain the
state in some way and keep track. Keeping track of value changes is
more or less fine, but the hardest part is keeping track
of value definitions. Consider the following example:

x = xp.var(lb=0, ub=10)
Model.addVariable(x)
y = xp.var(lb=0, ub=10)
Model.addVariable(y)

# Define the objective and solve
test_objective = 5*x + 2*y
xp.evaluale(test_objective) # no variable set, what should this return?

x.set(5)
xp.evaluale(test_objective) # y is not set, what should this return?

A terminology problem

We are touching a more fundamental problem here, variables are not values
and cannot be considered as such. Merging the term “variable” for variables
of your Python/Julia/other program with the decision variables from an
optimization problem creates a great confusion.
Just like variables, the term function is confusing here:
most optimization techniques exploit the problem structure,
think linear, disciplined convex, semi-definite; anything beyond non-linear
differentiable or black-box optimization will use the specific structure
in a specialized algorithm.
If standard functions from your programming language are used, no structure
can be leveraged by the solver, which only sees a function pointer it can pass
values to. So working with mathematical optimization forces you to re-think
what you call “variables” and what you call “functions”.

There is something we can do for the function part, which is defining
arithmetic rules over variables and expressions, which is for instance what
the JuMP modelling framework does:

using JuMP
m = Model()
@variable(m, x1 >= 0)
@variable(m, x2 >= 0)

# random affine function
f(a, b) = π + 3a + 2b

f(x1, x2) # returns a JuMP.GenericAffExpr{Float64,VariableRef}

@variable(m, y)
f(x1, x2) + y  # also builds a JuMP.GenericAffExpr{Float64,VariableRef}

This works especially well with affine functions because composing affine
expressions builds other affine expressions but gets more complex any time
other types of constraints are added. For some great resource on types and
functions for mathematical optimization, watch Prof. Madeleine Udell’s
talk at JuliaCon17 (the Julia
syntax is from a pre-1.0 version, it may look funny).

Encoding possibilities as sum-types

Getting back to evaluation, to make this work, you need to know what
values variables hold. What if the model hasn’t been optimized yet?
You could take:

  1. A numerical approach and return NaN (floating point value for Not-A-Number)
  2. An imperative approach and throw an error when we evaluate an expression without values set or the model optimized
  3. A typed functional approach and describe the possibility of presence/absence of a value through types

The first approach was JuMP 0.18 and prior, the second is JuMP 0.19 and onward,
the third is the one of interest to us, if we want to describe what is happening
through types.

If you show these three options to a developer used to statically-typed
functional programming, they would tell you that the first option coming to mind
is an option, a type which can be either some value or nothing.
In the case of an optimization model, it would be some numerical value
if we have a value to return (that is, we optimized the model and found a
solution).
The problem is, there are many reasons for which you may have or not a value.
What you could do in that case is get more advanced information from your model.
This is the approach JuMP is taking with a bunch of model attributes you
can query at any time, see the documentation
for things you can query at any time.

The problem is that querying information on the status of the problem (solved,
unsolved, impossible to solve…) and getting values attached to variables can
be unrelated.

m = Model()
@variable(m, x >= 0)
@variable(m, y >= 0)

# getting status: nothing because not optimized
termination_status(m)
# OPTIMIZE_NOT_CALLED::TerminationStatusCode = 0

primal_status(m) # NO_SOLUTION::ResultStatusCode = 0

JuMP.value(x) # ERROR: NoOptimizer()
# woops, we forgot that we hadn't optimized yet

This is indeed because x does not exist by itself, there is
a “magic bridge” between the variable x and the model m.
The computer science term for this “magic bridge” is a
side-effect,
the same kind as mentioned earlier when we set the value of a variable at the
global scope. Again, they are fine at a small scale but are often the parts
making a program confusing. Every time I’m reviewing some code by researchers
starting out, the first thing I encourage them to do is to create self-contained
bits of code within functions and remove mutable global state.

A typed solution for describing mathematical problems

We stated that the variables and model are bound together. In that case, let
us not split them but describe them as one thing and since this one thing
accepts different possible states, we will use
tagged unions, which you can
think of as C enumerations with associated values. Other synonyms for this
construct are sum types (as in OCaml and Haskell).

We can think of the solution process of an optimization problem at a high level
as a function:

solve(Model(Variables, Constraints, Objective)) -> OptimizationResult

Where OptimizationResult is a sum type:

OptimizationResult = Infeasible(info) | Unbounded(info) | Optimal(info) | NearOptimal(info) ...

In this case, everything can stay immutable, expressions including objective
and constraints are only used to build the model in input, they can be
evaluated at any points and just describe some expressions of variables.
The value of the variables resulting from the optimization are on
available in cases where it makes sense. If the results are stored in the
solution info structure, we can query values where it makes sense only,
here in the Optimal and NearOptimal cases, with a syntax like:

match OptimizationResult {
    Optimal(info) -> value(info, x) # or info.value(x)
    Infeasible(info) -> ...
    Unbounded(info)  -> ...
}

Internally, info would keep an association from variables to corresponding
values. No more confusion on what binding of your computer program represents
what symbolic variable of your problem.

So why would we keep using these bindings associated with variables, if they
have never been independent from the problem in the first place? The obvious
reason that comes to mind is practical syntax, we can write expressions in
a quasi-mathematical way (here in JuMP):

@expression(m, 3x + 2x^2 <= 4y)

While if variables were attached to the model, the required syntax would be
in the flavour of:

@expression(m, 3m[:x] + 2m[:x]^2 <= 4m[:y])

Which quickly becomes hard to read. Can we do better?

Stealing a solution elsewhere

I stumbled upon an interesting solution to such problem while reading the
documentation for various probabilistic programming languages built on top
of Julia. Here is one example from Turing.jl

@model gdemo(x, y) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
end

# sample from the model using an algorithm
chn = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))

It’s just one step away from imagining the same for optimization:

@optim_model linmodel(a, b) = begin
  x[1:10] >= 0
  5 <= y <= 10
  z  𝔹
  cons1: y - 5 <= 5z
  cons2: x + y >= 3
  Min x
end

result = optimize(linmodel)

Naming the constraints would be necessary to retrieve associated dual values.
Retrieving values associated with variables could be done in an associative
structure (think a dictionary/hash map). This structure removes any confusion as
to what belongs where in an optimization model. The variables x, y, z are
indeed defined within a given model and explicitly belong to it.

Why are interfaces not built this way? Warning, speculative opinions below:

One reason is the ubiquity of C & C++ in optimization.
The vast majority of commonly used solvers is built
in either of these, supporting limited programming constructs and based on
passing pointers around to change the values pointed to. Because the solvers are
built like this, interfaces follow the same constructions. Once a dominant
number of interfaces are identical, building something widely different is a
disadvantage with a steeper learning curve.

Another more nuanced reason is that declarative software is hard to get right.
One often has to build everything upfront, here in the @optim_model block.
Getting meaningful errors is much harder, and debugging optimization models
is already a tricky business.

Lastly, lots of algorithms are based on incremental modifications of models
(think column and row generation), or combinations with other bricks. This
requires some “hackability” of the model. If one looks at Algebraic Modelling
Languages, everything seems to fall apart once you try to implement
decompositions. Usually it involves a completely different syntax for the
decomposition scheme (the imperative part) and for the model declaration
(the declarative part).

So overall, even though side-effects are a central part of the barrier to
the expression of mathematical optimization in a mathematical, type-based
declarative way, they are needed because of the legacy of solvers and some
algorithms which become hairy to express without it.

Further resources

As pointed above, Prof. Madeleine Udell’s talk
gives some great perspectives on leveraging types for expressive optimization
modelling. For the brave and avid readers, this
PhD thesis tackles
the semantics of a formal language for optimization problems.
If you have further resources on the subject, please reach out.

Thanks Richard for the initial post and the following discussion which led to
this post. For shorter and nicely written posts on optimization, go read his
blog.

Note: I try never to use the terms “mathematical programming” and
“mathematical program” which are respectively synonyms for
“mathematical optimization” and “mathematical optimization problem” respectively.
We can see why in this post: this kind of context where the term “program”
could refer to a computer program or a mathematical problem becomes very
confusing. We are in 2019 and the term “program” is now universally understood
as a computer program. Moreover, “mathematical programming” merely refers to
a problem specification, it is very confusing to say that
“linear/semi-definite/convex programming” is merely meant as putting together
a bunch of equations, not at all about how to tackle these.


Picking different names with integer optimization

By: Julia on μβ

Re-posted from: https://matbesancon.github.io/post/2019-04-07-name_distances/


I must admit I am not always the most talented at social events.
One point I am especially bad at is remembering names, and it gets
even harder when lots of people have similar or similar-sounding names.
What if we could select a list of people with names as different from each
other as possible?

First some definitions, different here is meant with respect to the
Hamming distance of any two names.
This is far from ideal since Ekaterina would be quite far from Katerina, but
it will do the trick for now.

Graph-based mental model

This sounds like a problem representable as a complete graph.
The names are the vertices, and the weight associated with each edge $(i,j)$
is the distance between the names of the nodes. We want to take a subset
of $k$ nodes, such that the sum of edge weights for the induced sub-graph
is maximum. This is therefore a particular case of maximum (edge) weight clique
problem over a complete graph, which has been investigated in [1, 2] among others.

A mathematical optimization approach

This model can be expressed in a pretty compact way:

$$ \max_{x,y} \sum_{(i,j)\in E} c_{ij} \cdot y_{ij} $$
subject to: $$ 2y_{ij} \leq x_i + x_j \,\, \forall (i,j) \in E$$
$$ \sum_{i} x_i \leq k $$
$$x_i, y_{ij} \in \mathbb{B} $$

The graph is complete and undirected, so the set of edges is:
$ E = $ {$ (i,j) | i \in $ {$ 1..|V| $}$, j \in ${$ 1..i-1 $}}

It’s an integer problem with a quadratic number of variables and constraints.
Some other formulations have been proposed, and there may be a specific structure
to exploit given that we have a complete graph.
For the moment though, this generic formulation will do.

A Julia implementation

What we want is a function taking a collection of names and returning which
are selected. The first thing to do is build this distance matrix.
We will be using the
StringDistances.jl
package not to have to re-implement the Hamming distance.

import StringDistances

hamming(s1, s2) = StringDistances.evaluate(StringDistances.Hamming(), s1, s2)

function build_dist(vstr, dist = hamming)
    return [dist(vstr[i], vstr[j]) for i in eachindex(vstr), j in eachindex(vstr)]
end

We keep the option to change the distance function with something else later.
The optimization model can now be built, using the distance function and $k$,
the maximum number of nodes to take.

using JuMP
import SCIP

function max_clique(dist, k)
    m = Model(with_optimizer(SCIP.Optimizer))
    n = size(dist)[1]
    @variable(m, x[1:n], Bin)
    @variable(m, y[i=1:n,j=1:i-1], Bin)
    @constraint(m, sum(x) <= k)
    @constraint(m, [i=1:n,j=1:i-1], 2y[i,j] <= x[i] + x[j])
    @objective(m, Max, sum(y[i,j] * dist[i,j] for i=1:n,j=1:i-1))
    return (m, x, y)
end

I’m using SCIP as an integer solver to avoid proprietary software,
feel free to switch it for your favourite one.
Note that we don’t optimize the model yet but simply build it.
It is a useful pattern when working with JuMP, allowing users
to inspect the build model or add constraints to it before starting the resolution.
The last steps are straightforward:

dist = build_dist(vstr)
(m, x, y) = max_clique(dist, k)
optimize!(m) # solve the problem

# get the subset of interest
diverse_names = [vstr[i] for i in eachindex(vstr) if JuMP.value(x[i]) ≈ 1.]

And voilà.

Trying out the model

I will use 50 real names taken from
the list of random names website, which you
can find here.
The problem becomes large enough to be interesting, but reasonable enough for
a decent laptop. If you want to invite 4 of these people and get the most
different names, Christian, Elizbeth, Beulah and Wilhelmina are the ones you
are looking for.

Bonus and random ideas

It is computationally too demanding for now, but it would be interesting
to see how the total sum of distances evolves as you add more people.

Also, we are using the sum of distances as an objective to maximize.
One interesting alternative would be to maximize the smallest distance between
any two nodes in the subset. This changes the model, since we need to encode
the smallest distance using constraints. We will use an indicator constraint
to represent this:

$$\max_{x,y} d $$
subject to:
$$ y_{ij} \Rightarrow d \leq c_{ij} \,\, \forall (i,j) \in E$$
$$ 2y_{ij} \leq x_i + x_j \forall (i,j) \in E $$
$$ \sum_{(i,j) \in E} y_{ij} = k\cdot (k-1) $$

Depending on the solver support, the indicator constraint can be modelled directly,
with big M or SOS1 constraints. This remains harder than the initial model.

Special thanks to Yuan for bringing out the discussion which led to this
post, and to BYP for the feedback.


Sources

[1] Alidaee, Bahram, et al. “Solving the maximum edge weight clique problem via unconstrained quadratic programming.” European Journal of Operational Research 181.2 (2007): 592-597.

[2] Park, Kyungchul, Kyungsik Lee, and Sungsoo Park. “An extended formulation approach to the edge-weighted maximal clique problem.” European Journal of Operational Research 95.3 (1996): 671-682.

Picking different names with integer optimization

By: Julia on μβ

Re-posted from: https://matbesancon.xyz/post/2019-04-07-name_distances/


I must admit I am not always the most talented at social events.
One point I am especially bad at is remembering names, and it gets
even harder when lots of people have similar or similar-sounding names.
What if we could select a list of people with names as different from each
other as possible?

First some definitions, different here is meant with respect to the
Hamming distance of any two names.
This is far from ideal since Ekaterina would be quite far from Katerina, but
it will do the trick for now.

Graph-based mental model

This sounds like a problem representable as a complete graph.
The names are the vertices, and the weight associated with each edge $(i,j)$
is the distance between the names of the nodes. We want to take a subset
of $k$ nodes, such that the sum of edge weights for the induced sub-graph
is maximum. This is therefore a particular case of maximum (edge) weight clique
problem over a complete graph, which has been investigated in [1, 2] among others.

A mathematical optimization approach

This model can be expressed in a pretty compact way:

$$ \max_{x,y} \sum_{(i,j)\in E} c_{ij} \cdot y_{ij} $$
subject to: $$ 2y_{ij} \leq x_i + x_j ,, \forall (i,j) \in E$$
$$ \sum_{i} x_i \leq k $$
$$x_i, y_{ij} \in \mathbb{B} $$

The graph is complete and undirected, so the set of edges is:
$ E = $ {$ (i,j) | i \in $ {$ 1..|V| $}$, j \in ${$ 1..i-1 $}}

It’s an integer problem with a quadratic number of variables and constraints.
Some other formulations have been proposed, and there may be a specific structure
to exploit given that we have a complete graph.
For the moment though, this generic formulation will do.

A Julia implementation

What we want is a function taking a collection of names and returning which
are selected. The first thing to do is build this distance matrix.
We will be using the
StringDistances.jl
package not to have to re-implement the Hamming distance.

import StringDistances

hamming(s1, s2) = StringDistances.evaluate(StringDistances.Hamming(), s1, s2)

function build_dist(vstr, dist = hamming)
    return [dist(vstr[i], vstr[j]) for i in eachindex(vstr), j in eachindex(vstr)]
end

We keep the option to change the distance function with something else later.
The optimization model can now be built, using the distance function and $k$,
the maximum number of nodes to take.

using JuMP
import SCIP

function max_clique(dist, k)
    m = Model(with_optimizer(SCIP.Optimizer))
    n = size(dist)[1]
    @variable(m, x[1:n], Bin)
    @variable(m, y[i=1:n,j=1:i-1], Bin)
    @constraint(m, sum(x) <= k)
    @constraint(m, [i=1:n,j=1:i-1], 2y[i,j] <= x[i] + x[j])
    @objective(m, Max, sum(y[i,j] * dist[i,j] for i=1:n,j=1:i-1))
    return (m, x, y)
end

I’m using SCIP as an integer solver to avoid proprietary software,
feel free to switch it for your favourite one.
Note that we don’t optimize the model yet but simply build it.
It is a useful pattern when working with JuMP, allowing users
to inspect the build model or add constraints to it before starting the resolution.
The last steps are straightforward:

dist = build_dist(vstr)
(m, x, y) = max_clique(dist, k)
optimize!(m) # solve the problem

# get the subset of interest
diverse_names = [vstr[i] for i in eachindex(vstr) if JuMP.value(x[i])  1.]

And voilà.

Trying out the model

I will use 50 real names taken from
the list of random names website, which you
can find here.
The problem becomes large enough to be interesting, but reasonable enough for
a decent laptop. If you want to invite 4 of these people and get the most
different names, Christian, Elizbeth, Beulah and Wilhelmina are the ones you
are looking for.

Bonus and random ideas

It is computationally too demanding for now, but it would be interesting
to see how the total sum of distances evolves as you add more people.

Also, we are using the sum of distances as an objective to maximize.
One interesting alternative would be to maximize the smallest distance between
any two nodes in the subset. This changes the model, since we need to encode
the smallest distance using constraints. We will use an indicator constraint
to represent this:

$$\max_{x,y} d $$
subject to:
$$ y_{ij} \Rightarrow d \leq c_{ij} ,, \forall (i,j) \in E$$
$$ 2y_{ij} \leq x_i + x_j \forall (i,j) \in E $$
$$ \sum_{(i,j) \in E} y_{ij} = k\cdot (k-1) $$

Depending on the solver support, the indicator constraint can be modelled directly,
with big M or SOS1 constraints. This remains harder than the initial model.

Special thanks to Yuan for bringing out the discussion which led to this
post, and to BYP for the feedback.


Sources

[1] Alidaee, Bahram, et al. “Solving the maximum edge weight clique problem via unconstrained quadratic programming.” European Journal of Operational Research 181.2 (2007): 592-597.

[2] Park, Kyungchul, Kyungsik Lee, and Sungsoo Park. “An extended formulation approach to the edge-weighted maximal clique problem.” European Journal of Operational Research 95.3 (1996): 671-682.