Experiments on communicating vessels, constrained optimization and manifolds

By: Julia on μβ

Re-posted from: https://matbesancon.github.io/post/2020-05-09-volumes/

Table of Contents

HAHAHUGOSHORTCODE-TOC0-HBHB

Fluid mechanics was one of my favourite topics in my Process Engineering program
(some people will quit reading at this point and never talk to me again) so without surprise,
I could not resist diving into this new blog post
on SIAM News caught my attention.
This is the second time a post from Mark Levi caught my attention, the last was
was on heat exchangers,
on which I wrote a blog post, toying with parallel and counter-current heat exchangers.

This new post from Mark Levi illustrates a key concept in constrained optimization: Lagrange multipliers
and a nice interpretation in a problem of communicating vessels.

Communicating vessels and optimization formulation

Imagine $N$ vessels filled with water, all connected through a pipe at the bottom:
If you are familiar with fluid mechanics, feel free to skip this section.

Communicating vessels1

The problem statement is, given initial levels of water $x_k$ in each $k-th$ vessel:

  • how does the state evolve?
  • what equilibrium, if any, is eventually reached?

Otherwise, consider the weight of water creates pressure within it.
The lower a point in the water, the higher the pressure, since there is more water above which exercises its weight.
A difference in pressure between two points will create a motion of the water, until the pressure equalizes.
Put differently, some fluid moves from the full part of the vessel (with more pressure) to empty parts (with less pressure)
until the pressure equalizes.

Communicating vessels2

Since the pressure at a point depends on the height of the fluid above this point,
two points have equal pressure when the height of water above them is equal.
This is a phenomenon we often experience, with a watering can for instance.

Vessel equilibrium as an optimization problem

A system reaches an equilibrium at the minimum of its potential energy.
Feel free to skip this part if you read the blog post by Mark Levi, we basically go over the problem formulation once again.
An equilibrium state (where the state does not evolve anymore) can be found by
solving the optimization problem minimizing the potential energy, subject to the
respect of the laws of physics. These laws state two things:

  • No fluid loss: the mass of liquid is preserved, and since we are working with an incompressible liquid, the total volume too is constant.
  • No negative volume: the different vessels exchange water, their volume increasing or decreasing with time, but at no point can a vessel reach a negative volume.

Each vessel $k$ will be described by a profile, an area as function of the height $f_k(x)$.
We assume these functions $f_k$ are all continuous.
The state at any point in time is the height in each vessel $x_k$.
The total volume of water in the vessel is given by:
$$V_k(x_k) = \int_0^{x_k} f_k(x) dx.$$

The conservation of volume can be expressed as:

$$V_{0} = \sum_{k=1}^N V_k(x_k) = \sum_{k=1}^N \int_0^{x_k} f_k(x) dx$$

where $V_{0}$ is the initial total volume water.
The nonnegativity of water volume in each vessel can be expressed as:
$$\int_0^{x_k} f_k(x) dx \geq 0\,\,\, \forall k$$

The area at any height $f_k(x)$ is positive or null, so this constraint
can be simplified as:
$$x_k \geq 0 \,\,\, \forall k$$

The potential function, the objective minimized by the problem, is the last thing we miss.
It consists of the total potential function of the water in the vessels, caused by gravity only.
Each infinitesimal slice of water $dx$ exercises its weight, which is proportional to its volume
$f_k(x) dx$ times height $x$. By integrating over a whole vessel $k$, this gives a potential of:
$$ \int_0^{x_k} x f_k(x) M dx$$
with M a constant of appropriate dimension. Since we are minimizing the sum of these functions,
we will get rid of the constant (again, sorry for shocking physicists), yielding an objective:

$$ F(x) = \sum_{k=1}^N \int_0^{x_k} x f_k(x)dx.$$

To sum it all, the optimization problem finding an equilibrium is:

$$
\begin{align}
\min_{x} & \sum_{k=1}^N \int_0^{x_k} x f_k(x)dx \\\\
& \text{subject to:} \\
& G(x) = \sum_{k=1}^N \int_0^{x_k} f_k(x) dx – V_0 = 0\\\
& x_k \geq 0 \,\,\, \forall k
\end{align}
$$

If you read the blog post, you saw the best way to solve this problem is by
relaxing the positive constraint, write the first-order Karush-Kuhn-Tucker (KKT) conditions:

$$
\begin{align}
& \nabla F(x) = \lambda \nabla G(x) & \Leftrightarrow\\
& x_k f_k(x) = \lambda f_k(x_k) & \Leftrightarrow \\
& x_k = \lambda \,\,\,\forall k
\end{align}
$$

So the multiplier $\lambda$ ends up being the height of water across all vessels, the equations come back to the intuitive result.
Let us implement $F$, $G$ and their gradients in Julia to reproduce this result numerically.
We will use four vessels of various shapes:

import QuadGK

const funcs = (
    x -> oneunit(x),
    x -> 2x,
    x -> 2 * sqrt(x),
    x -> 2 * x^2
)

const N = length(funcs)

g(x) = sum(1:N) do k
    QuadGK.quadgk(funcs[k], 0, x[k])[1]
end

f(x) = sum(1:N) do k
    x[k] * QuadGK.quadgk(funcs[k], 0, x[k])[1]
end

QuadGK.quadgk from the Gauss–Kronrod package
computes a numerical integral of a function on an interval.
We are in an interesting case where the gradient of the functions are much easier to
compute than the functions themselves, since they remove the integrals:

∇f(x) = [x[k] * funcs[k](x[k]) for k in 1:N]

∇g(x) = [funcs[k](x[k]) for k in 1:N]

Let us pick a random starting point, such that all four vessels have the same height:

x0_height = rand()
x0_uniform = [x0_height for _ in 1:N]

∇f(x0_uniform) - x0_height * ∇g(x0_uniform)

and we obtain a vector of zeros as planned.

The rest of this post will be about trying to find the same optimum
without using the KKT trick, implementing an iterative algorithm solving the
problem in a generic form.
This will require several steps:

  1. From a given iterate, find a direction to follow;
  2. Ensure each iterate respects the constraints defined above (no thugs in physicstown);
  3. Converging to the feasible solution (which we know from Mark Levi’s post, but no cheating);
  4. Defining stopping criteria.

An interesting point on the structure of the problem,
this is not a generic equality-constrained non-linear problem,
the domain defined by $G(x) = 0$ is an Euclidean manifold, a smooth subspace
of $\mathbb{R}^N$. Other than throwing fancy words, having this structure
lets us use specific optimization methods which have been developed for manifolds.
A whole ecosystem has been developed in Julia
to model and solve optimization problems over manifolds.
We will not be using it and will build our method from scratch, inefficient but
preferred for unknown reasons, like your sourdough starter in lockdown.

Computing a direction

From a given solution, we need to be able to find a direction in which we can progress.
Fair warning, this is the most “optimization-heavy” section.

Let us start from a random point. To be sure you can reproduce all results,

Random.seed!(42)

# 4 uniform random points between [0,2]
x0 = 2 * rand(4) 
V0 = g(x0)
# 1.9273890036845946

In unconstrained optimization, the gradient provides us with information on the steepest
ascent direction, by following the opposite direction, the function will decrease, at least locally.

xnew = x_i - γ * ∇f(xinit)

See this good blog post
by Ju Yang several really good illustrations to grasp an intuition.
If we naively follow the descent direction minimizing $F$, we likely leave the
manifold, the region where $G(x) = 0$.

Think of the curve as the feasible region where we are supposed remain.
$x_i$ is our current iterate and the direction points to the steepest descent of $F(x)$,
i.e. $\nabla F(x)$.

Naive descent

Moving in this direction will drive our iterates away from the feasible region,
which is not desired. Instead, we will want to project this direction to follow
the equality constraints, like the red direction:

Projected descent

Of course, by following a fixed direction, the iterate ends up not on the curve, but not “too far”.
More importantly, we will have ensured that the point has not been moved for nothing, which would
be the case if we simply get away from the manifold.
We are looking for a search direction $d$:

  • which improves the objective function as much as possible: $\langle ∇F(x_i), d\rangle$ as low as possible
  • of same amplitude (norm) as $∇F(x_i)$
  • tangent to the manifold.

For the last requirement, we need a direction in the tangent space to the manifold, so $\langle \nabla G(x_i), d\rangle = 0$,
we end up with the following problem:

$$
\begin{align}
\min_{d} & \langle\nabla F(x_i), d \rangle \\
& \text{subject to:} \\
& \langle \nabla G(x_i), d\rangle = 0 \\
& \|d\| \leq \|\nabla F(x_i)\|
\end{align}
$$

If the objective value of this problem is strictly negative, we find a compatible direction
which improves the objective, otherwise there is none, and we have a local optimum.
The tangent space here is easy to express because we just consider one constraint,
it gets much more complex with multiple or non-smooth equalities.
The problem above is has a norm constraint, we can formulate it as a second-order
cone problem (SOCP) with an additional variable:

$$
\begin{align}
\min_{d, t} & \langle\nabla F(x_i), d \rangle \\
& \text{subject to:} \\
& \langle \nabla G(x_i), d\rangle = 0 \\
& t = |\nabla F(x_i)| \\
& \|d\| \leq t
\end{align}
$$

The second-order cone constraint is $\|d\| \leq t$.
SOCPs can be solved efficiently to optimality, so this is good news.
We will use JuMP to express the problem
and ECOS as underlying solver.

using JuMP
import ECOS

function compute_direction(grad_f, grad_g)
    N = length(grad_f)
    m = Model(ECOS.Optimizer)
    MOI.set(m, MOI.Silent(), true)
    @variable(m, d[1:N])
    @constraint(m, grad_g ⋅ d == 0)
    @variable(m, t == norm(grad_f))
    @constraint(m, [t;d] in SecondOrderCone())
    @objective(m, Min, d ⋅ grad_f)
    optimize!(m)
    termination_status(m) == MOI.OPTIMAL || error("Something wrong?")
    return JuMP.value.(d)
end

If you don’t know about JuMP, just assume this solves the problem above and returns the “red” direction
from the diagram above.
On the point x0 defined above, the naive descent direction yields:

∇f(x0)
# 4-element Array{Float64,1}:
#  1.0663660320877226
#  1.649139647696062
#  0.013306072041938516
#  0.08274729914625051

and the projected gradient:

d = compute_direction(∇f(x0), ∇g(x0))
# 4-element Array{Float64,1}:
#  -0.7980557152422237
#  0.0054117074806136894
#  1.66214850442488
#  0.6812946467870831

Note that all elements in $∇f(x0)$ are positive, which makes sense from the intuition of the physics,
the water in each vessel has a weight, thus exercising a pressure downwards.

There is still one thing we forgot once the direction is found.
Remember the positivity constraint $x_k \geq 0$?
It ensures the solution found makes sense, and that fluid mechanics
specialist won’t laugh at the solutions computed.
If one of the coordinates of the found point is negative,
what we can do is maintain the direction, but reduce the step.
Notice that one of our containers has an area of $2\sqrt{x}$,
reaching $x=0$ could lead to odd behaviour,
we will maintain the constraint x_k \geq minval with
minval a small positive number.

function corrected_step(x, d, γ = 0.05; minval = 0.005)
    res = x + γ * d
    for k in eachindex(res)
        if res[k] < minval
            γ = (minval - x[k]) / d[k]
            res = x + γ * d
        end
    end
    return res
end

Note: in a general setting, a more appropriate method like the active set method
would have handled inequality constraints in a cleaner way.
In our case, if a height is close to 0, it will not stay there but
“bounce back”, so keeping track of active sets is unnecessary.
So we now have an iterate, the res variable returned from corrected_step,
which will always respect the positivity constraints and be improving the
objective in general.

Projecting on the manifold

We know in which direction $d$ the next iterate must be searched and have found an adequate step size $\gamma$,
but a straight line can never perfectly stick to a curved surface.
So once the direction is found and a new iterate $x_i + \gamma d$ computed,
we need to project this iterate on the manifold, i.e. find the solution to:

$$
\begin{align}
\min_{x} & dist(x, x_i + \gamma d) \\
& \text{subject to:} \\
& \nabla G(x) = 0
\end{align}
$$

Sadly, this is where we need evaluations of $G(x)$, which is notably more expensive
than its gradient. Evaluating $\nabla G(x_i + \gamma d)$ gives us either 0
(the volume conservation holds), a positive or negative quantity (for a volume creation or destruction).
We can shift all the vessel heights by a same scalar $\alpha$ until G(x_i + \gamma d + \alpha) = 0$.

function h(x)
    function(α)
        g(x .+ α) - V0
    end
end
h(corrected_step(x0, d, 1.5))(-0.5)
# 1.0821379579735244
h(corrected_step(x0, d, 1.5))(-1.0)
# -1.0689727809622327

The problem then becomes a root-finding problem on $h(x)(\alpha)$.
Typical methods for solving a root-finding problems
are Newton-type methods, bisections. We will use the
Roots.jl package, this post is already too
long to implement one from scratch.

# computes the good alpha, starting from 0
Roots.find_zero(h(corrected_step(x0, d, 1.5)), 0.0)
# -0.7168922312579131

g(corrected_step(x0, d, 1.5) .+ -0.7168922312579131) - V0 
# -4.440892098500626e-16
# good enough

Putting it all together

We now have all the ingredients to make this algorithm work:

Compute a gradient, correct it for negative points, project it on the manifold with the SOCP,
re-project the resulting point with root-finding on alpha.
We will stop the algorithm either:

  • if a number of iterations is reached (which is considered a failure since we did not converge)
  • the norm of the projected gradient is almost zero and we would not move in a new iterate
  • the distance between two successive iterates it low enough

    function find_equilibrium(funcs, x0; mingradnorm=10e-5, maxiter = 1000, γ = 0.05, mindiff=10e-4)
    xs = [x0]
    niter = 0
    while niter <= maxiter
        x = xs[end] # last iterate
        # compute projected direction
        d = compute_direction(∇f(x), ∇g(x))
        # keep new point in positive orthant
        xpos = corrected_step(x, d, γ)
        # project point on Manifold
        α = Roots.find_zero(h(xpos), 0.0)
        xnew = xpos .+ α
        push!(xs, xnew)
        niter += 1
        if norm(d) < mingradnorm
            @info "Min gradient condition reached"
            return xs
        end
        if norm(x - xnew) < mindiff
            @info "Min difference condition reached"
            return xs
        end
    end
    @info "Max iterations reached without convergence"
    return xs
    end
    

Giving it a try with a first rough idea of parameters:

xs = find_equilibrium(funcs, x0, γ = 0.005, maxiter = 5000)
# Info: Max iterations reached without convergence

Bad sign already, let’s check the iterates:

xs_pivot = map(1:4) do k
    getindex.(xs, k)
end

plot(xs_pivot)
xlims!(1, 200)

Plot 1

Let us zoom is:

plot(xs_pivot)
xlims!(100, 120)
ylims!(0.5, 0.8)

Plot 1

It converges nicely at the beginning, and then oscillates
around the equilibrium without reading a satisfying convergence criterion.

We can damp the step $\gamma$ by reducing it at each iteration,
multiplying it by a damping rate in (0,1).

function find_equilibrium_damp(funcs, x0; mingradnorm=10e-5, maxiter = 1000, γ = 0.05, mindiff=10e-5, damp_factor=0.95)
    xs = [x0]
    niter = 0
    while niter <= maxiter
        x = xs[end] # last iterate
        # compute projected direction
        d = compute_direction(∇f(x), ∇g(x))
        # keep new point in positive orthant
        xpos = corrected_step(x, d, γ)
        # project point on Manifold
        α = Roots.find_zero(h(xpos), 0.0)
        xnew = xpos .+ α
        push!(xs, xnew)
        niter += 1
        if norm(d) < mingradnorm
            @info "Min gradient condition reached"
            return xs
        end
        if norm(x - xnew) < mindiff
            @info "Min difference condition reached"
            return xs
        end
        γ *= damp_factor
    end
    @info "Max iterations reached without convergence"
    return xs
end

Let us reuse the same parameters as before:

xs = find_equilibrium_damp(funcs, x0, γ = 0.5, maxiter = 5000)
# Info: Min difference condition reached

Damp 1

The result is even more oscillatory, but the result converges this time,
with successive iterates being close enough.

By tuning both the damping factor and step size a bit more,
we reach a satisfying, smooth enough convergence:

xs = find_equilibrium_damp(funcs, x0, γ = 0.0122, maxiter = 5000, damp_factor = 0.9785)
# Info: Min difference condition reached

Damp 1

Conclusion and perspective

I wanted to add a section on the corresponding dynamical system, namely a
differential algebraic equation (DAE) system, but this is clearly long enough,
and I couldn’t get anything to work.
TL;DR: we modelled the equilibrium and developed a technique to find it, relying
on local optimization tools. The problem structure allowed us to express the gradient
and estimate projection steps using cheap enough methods, namely SOCP and root finding
on a univariate function.

Interesting thing to do on top of this:

  • Leverage the toolbox already present in JuliaManifolds;
  • Replace the first-order method used here with higher-order, which should come cheaply from the decomposability of both $F$ and $G$.

The latter point should also lighten the requirements for hand-tuned step size and damping factor.

Sources

Some ideas for this post came from a talk by Antoine Levitt
at the Julia Paris meetup, where he presented some applications of optimization on manifolds for
quantum physics (if I recall?).