Tag Archives: Julia

Optimization with JuMP in Julia

By: Jaafar Ballout

Re-posted from: https://www.supplychaindataanalytics.com/optimization-with-jump-in-julia/

As a result of the emergence and progress of open-source languages, SCM and OR analysts have access to a growing number of constantly improving tools and solvers. Julia, for example, is a flexible, dynamic, open-source programming language that is appropriate for scientific and numerical computing. Julia's source code is available on GitHub, and the language is supported by a growing number of frameworks and applications for optimization and scheduling.

I introduced Julia in one of my previous posts already, covering a linear programming example. In this article I will give a more comprehensive introduction to Julia, comparing it with other programming languages, and specifically focusing on optimization.

JuMP: A package for mathematical programming in Julia

JuMP (Julia for Mathematical Programming) is a package available in Julia. This modelling framework allows users to formulate an optimization problem (linear, mixed-integer, quadratic, conic quadratic, semidefinite, and nonlinear) with easy-to-read code. The problem can then be solved by optimizers (solvers) written in low-level languages. The solver selected for this demonstration is GLPK, a state-of-the-art open-source solver for linear programming (LP) and mixed-integer programming (MIP) problems. It incorporates algorithms such as the simplex method and the interior-point method.

Comparing Julia vs. Matlab for optimization problems

At first, a quick comparison between Julia and Matlab is necessary to appreciate the power of open-source programming languages.

Criteria | Julia 🙂 | Matlab 🙁
Cost | Free | Hundreds of dollars per year
License | Open source | One-year user license
Source | Non-profit foundation and community | MathWorks company
Scope | Mainly numeric | Numeric only
Performance | Excellent | Faster than Python but slower than Julia
Editor/IDE | Jupyter is recommended | IDE already included
Packaging | Pkg manager included | Toolboxes included
Usage | Generic | Industry and research in academia
Fame | Young but increasingly well known | Old and well-known
Support | Community | MathWorks
Documentation | Growing | Okay

JuMP and GLPK packages for optimization with Julia

JuMP:

  • Modeling language and collection of supporting packages for mathematical optimization in Julia
  • Supported solvers: GLPK, CLP, CBC, CPLEX, ECOS, GUROBI, HiGHS, MOSEK, XPRESS, OSQP, PATH, ProxSDP, SCIP, SCS, SDPA. Any of these can be plugged into the same JuMP model, as sketched below.
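To illustrate that solver independence, here is a minimal sketch (assuming the corresponding solver packages, e.g. GLPK and HiGHS, are installed) of how the backend is chosen when a model is created; swapping the optimizer is the only change needed:

using JuMP, GLPK

m = Model(GLPK.Optimizer)        # build the model against the GLPK backend
# using HiGHS                    # with HiGHS.jl installed, the same model
# m = Model(HiGHS.Optimizer)     # could instead be handed to HiGHS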

The first step is to add the required packages: JuMP and GLPK. This is done using Pkg, the built-in package manager in Julia. I can add the required packages using the following commands in the Julia REPL. If Julia is installed, typing julia at the command line is enough to open the REPL (Julia's interactive read-eval-print loop).

using Pkg
Pkg.add("JuMP")
Pkg.add("GLPK")

Now I import the necessary packages in the IDE, which in our case is a Jupyter notebook.

# Import packages: JuMP and GLPK
using JuMP # JuMP: algebraic modeling package written in Julia (high-level)
using GLPK # GLPK: open-source solver implemented in a low-level language

A test problem

Let us consider the following optimization model:

maximize 5x + 4y
subject to:
x + y ≤ 5
10x + 6y ≤ 45
x ≥ 0, y ≥ 0

The feasible region (as x and y are continuous decision variables) is shown below:

According to the gradient of the objective (the red arrow) and direction of the optimization (maximization), the green point is the optimal solution, in which x=3.75, y=1.25, and the optimal value of the objective function is 23.75. I will solve this simple continuous linear programming model with the JuMP package in Julia and comment on its syntax.

Solving the problem

The following commented code aims to solve the proposed linear programming model with JuMP (the name of the package) in Julia:

# Define the model
m = Model(); # m stands for model

# Set the optimizer
set_optimizer(m,GLPK.Optimizer); # calling the GLPK solver

# Define the variables
@variable(m, x>=0);
@variable(m, y>=0);

# Define the constraints 
@constraint(m, x+y<=5);
@constraint(m, 10x+6y<=45);

# Define the objective function
@objective(m, Max, 5x+4y);

# Run the solver
optimize!(m);

# Output
println(objective_value(m)) # optimal value z
println("x = ", value.(x), "\n","y = ",value.(y)) # optimal solution x & y

In future work I will explore the power of Julia in solving supply chain management problems, specifically mixed integer programming (MIP).

The post Optimization with JuMP in Julia appeared first on Supply Chain Data Analytics.

Engineering Trade-Offs in Automatic Differentiation: from TensorFlow and PyTorch to Jax and Julia

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/engineering-trade-offs-in-automatic-differentiation-from-tensorflow-and-pytorch-to-jax-and-julia/

To understand the differences between automatic differentiation libraries, let's talk about the engineering trade-offs that were made. I would personally say that none of these libraries are "better" than another; they simply all make engineering trade-offs based on the domains and use cases they were aiming to satisfy. The easiest way to describe these trade-offs is to follow the evolution and see how each new library tweaked the trade-offs of the previous ones.

Early TensorFlow used a graph-building system, i.e. it required users to essentially define variables in a specific graph language separate from the host language. You had to define "TensorFlow variables" and "TensorFlow ops", and the AD would then be performed on this static graph. Control flow constructs were limited to those that could be represented statically. For example, an `ifelse` function call is very different from an `if`/`else` conditional in code, because `ifelse` is semantically the same as always evaluating both branches and then choosing the result, so there is only a single code path (I say semantically because further compiler optimizations may, and usually do, reduce that). This static sublanguage is then represented in an intermediate representation (IR) known as XLA, which performs a lot of linear algebra simplification, and AD is done using simple graph-representation algorithms given that there is no true control flow at this level of representation. While this gives a lot of efficiency (XLA is great for simplification because it can easily see the whole world), it of course has some major downsides in terms of flexibility and convenience.
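The same distinction exists in Julia's base library, which makes for a compact illustration (the helper functions here are made up for the example): `ifelse` is an ordinary function whose arguments are both evaluated before the selection, while an `if`/`else` (or ternary) only ever executes one branch.

cheap(x) = x + 1
expensive(x) = (sleep(1); x - 1)   # stands in for a costly branch

# Function form: both branches are evaluated, then one result is selected.
y1 = ifelse(2.0 > 0, cheap(2.0), expensive(2.0))   # pays the cost of both branches

# Control-flow form: only the taken branch is ever executed.
y2 = 2.0 > 0 ? cheap(2.0) : expensive(2.0)         # never runs expensive()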

Thus you can almost think of this as a source code transformation, because all of the autodiff is done on what is essentially an IR for a language that is not the same as the host language, but for the most part it required the user to do the translation to the new language for the AD system, which is… rather inconvenient.

PyTorch came along to solve the flexibility and convenience issues by instead using a tape-based method. It generates the code to autodiff every time you run the forward pass, by simply storing the operations that it sees in a given forward pass and then differentiating that set of operations in reverse. This "building of the tape" is done by operator overloading as part of the Tensor type that PyTorch requires you to use. How it works is easy to see on a small function containing an `if` statement followed by a value-dependent `while` loop (a sketch in Julia syntax is given below). A call like `f(2.0)` would take the first branch of the if statement and then run the while loop 5 times, so the AD pass takes that recorded set of operations and runs backpropagation through 5 passes of the while loop and back through the first branch. Notice that by using this form, the AD does not "see" any dynamic control flow: that was all in Python, but not in the tape. Thus the AD does not have to handle dynamic control flow, and this makes it very easy to handle a lot of odd cases of the language. The downside to this approach, though, is that the AD is "per value", i.e. you cannot do a lot of optimization on the backwards passes because you will not necessarily ever see the same backwards pass again, and this allows for a lot less optimization.
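Here is a hypothetical function of the kind described (written in Julia syntax for consistency with the rest of this page; it is not code from the original post). The control flow depends on the runtime value, but a tape recorded during `f(2.0)` only ever contains the operations from the branch and the five loop iterations that actually executed.

function f(x)
    if x > 0
        y = 2x          # f(2.0) takes this first branch
    else
        y = -x
    end
    while y < 100       # value-dependent loop: for x = 2.0 it runs 5 times
        y = 2y          # (4 -> 8 -> 16 -> 32 -> 64 -> 128)
    end
    return y
end

f(2.0)                  # the tape for this call records just six multiplications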

Does this harm PyTorch's efficiency beyond repair? Well, no and yes. No, it does not harm efficiency in the sense that most machine learning algorithms are so heavily reliant on expensive kernels, such as matrix multiplication (`A*x`), `conv`, etc., that the amount of work per operation is extremely high in most ML applications and hides the overhead of this approach. This allows the PyTorch team to spend most of its time optimizing the 2,000+ operators that it provides, and so most people in ML see PyTorch as fast because it comes with fast kernels (fast conv calls, fast GPU linear algebra) despite the AD overhead. That said, you can very easily run into cases where AD and Python interpreter overhead are not washed out, namely where your arrays are small or where a lot of scalar operations are happening. For example, in the Julia vs PyTorch Neural ODE benchmarks on cases matching scientific model discovery workflows, you see a 100x performance improvement in Julia (with major differences even without AD in the ODE and SDE solvers), which can mostly be attributed to language and AD overhead due to the small kernels used in these cases. For this reason the PyTorch team has been working on things like `torch.jit` as a separate sublanguage that can compile and optimize differently from the rest of the code, specifically to handle these cases, though there's a lot of discussion of the long-term viability of that approach. But anyways, PyTorch has done really well because it made good choices for its domain of use.

So then TensorFlow Eager (2.0) comes around and adds dynamic control flow support in a manner similar to PyTorch as a sad attempt to get everyone back, but of course it then doesn't play nicely with all of the XLA tooling (because it cannot see the whole graph of all possible operations for all input values to optimize it well), so it didn't hit the TensorFlow speeds everyone was expecting, and it was kind of the worst of all worlds.

Subsequent tools then all sought ways to either expand the domains of these ideas or mix some of the advantages of the two sides. Jax is one of those. Jax uses non-standard interpretation to build a copy of the full code in its own IR to then perform AD on, finally lowering it to TensorFlow's XLA for optimizations. Jax's non-standard interpretation is kind of like operator overloading in that it has special objects walk through the code in order to build out the exprs (this is called the "tracing" step). But wait, how is it able to trace the full code if there's dynamic control flow — won't it have the same issue as PyTorch, that it only sees parts of the full code's potential paths? Indeed that is true, and that's why it doesn't want you to use full dynamic control flow and instead use Jax primitives like lax.while_loop, which are function calls that can be caught during tracing so that the code does not have true dynamic behavior at trace time. Also, for this to work, what your function does must be completely determined by its inputs, i.e. the functions must be "pure". For this reason Jax requires programming in a functional style with pure functions rather than the object-oriented standard of Python, a notable trade-off of the abstract interpretation approach. But what you essentially get is a more natural graph builder for TensorFlow, because at the end of the day it ends up in TensorFlow's XLA IR, and so you get the same efficiency there but in a form that can look and feel a lot more natural. The downside of course is that you still don't have true dynamism, which is why those linked primitives exist and why they are not well optimized, as described in Jax – The Sharp Bits. However, "most" ML algorithms don't use very much dynamism (example: recurrent neural networks know how many layers they have, they don't have a while loop iterating to tolerance), and so "most" algorithms tend to do well in this sublanguage. In that sense, it can optimize a lot of codes rather naturally.

What about keeping dynamism in the AD?

This of course then begs the question: is it possible to keep the full dynamism of the host language in the AD system? It is possible, but it is hard. This is what a lot of the Julia AD tools have focused on with source code transformations (along with Swift for TensorFlow). However, since source code is "for humans", it can be a rather difficult level to work on algorithmically. Thus these tools instead work on a lowered IR, where the lowered representations remove a lot of the "cruft" of syntax to give a much smaller support surface. This was the core of Zygote.jl's approach: it saw that by acting on the SSA IR it could directly support control flow like while loops without unrolling them into sets of operations (like PyTorch or TensorFlow Eager) or only supporting a sublanguage of control flow (like Jax). This is essentially done by converting while loops and other dynamic constructs into static (source code) representations that contain new lines of code for things like stacks that keep information about the forward pass (such as which branch is taken), and these stacks are then accessed and used in the generated backwards pass. Thus the generated code is not dynamic (a while loop in the forward pass gives a for loop in the reverse pass), but the behavior of the generated backwards pass is dynamic (because it uses the stack to tell it how many times to walk the for loop). This allows the AD to have a single generated code for all branches (unlike the tape-building forms), and thus it can optimize more like TensorFlow, but in a world where dynamic control flow is not eliminated.
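A minimal hand-written sketch of that idea (this is not Zygote's actual generated code, just an illustration of the stack/counter mechanism): the forward pass records how many times the while loop ran, and the generated reverse pass is a static for loop driven by that recorded count.

# Primal: y is repeatedly doubled until it reaches 100, so the trip count
# depends on the input value x.
function forward(x)
    y = x
    n = 0                 # plays the role of the "stack": records the trip count
    while y < 100
        y = 2y
        n += 1
    end
    return y, n
end

# Reverse pass: a static for loop, with the dynamism supplied by the recorded n.
function pullback(ȳ, n)
    x̄ = ȳ
    for _ in 1:n
        x̄ = 2x̄            # adjoint of y = 2y
    end
    return x̄
end

y, n = forward(3.0)       # y = 192.0 after n = 6 doublings
x̄ = pullback(1.0, n)      # 64.0 = dy/dx, i.e. 2^6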

Well, that sounds like the best of both worlds, so why isn't everyone using it? There are two factors involved in that. First, accepting that your AD will have to deal with the full dynamic nature of an entire programming language means accepting a much more difficult job. The whole purpose of the AD approaches in TensorFlow/PyTorch/Jax is for these constructs to be eliminated before the AD, so they have a much smaller surface of language support required. This added complexity pretty much guarantees you cannot use Python, because it's such a crazy language in terms of what it allows with dynamism (fun fact: the Jax folks at Google Brain did have a Python source code transform AD at one point, but it was scrapped essentially because of these difficulties), and so people working on these solutions flocked to languages with clear syntax that is easy for compilers to optimize, i.e. Julia and Swift. Python has most of the ML crowd, so that creates a barrier to entry.

But even then, the problem is still very hard. In Julia it was found that Zygote acts on too high of an IR, i.e. before compiler optimizations, which requires you to do AD on unoptimized code only to delete most of the work later, and so it would be better for it to go even lower. This is the reason why the Diffractor.jl project started. But there's a reason to act lower still, since some optimizations only occur at the LLVM level, which is why Julia developers started directly building an AD system that acts on LLVM's IR itself, known as Enzyme (note that while this project included members of the Julia Lab like Valentin, because it acts at the LLVM level it is applicable to any LLVM-compiled language, such as C/C++ (Clang) or Rust). There is then a trade-off that occurs with source code transform methods as you go lower and lower in the IRs, which I describe in a separate post. tl;dr there: Enzyme acts after compiler optimizations, so much of the higher-level information might be deleted (at least, without completion of dialects like MLIR, which aren't quite ready). Enzyme only sees the barest of low-level code, so it may not have the high-level linear algebra definitions to do all of the linear algebra simplifications, like how XLA will fuse many matrix-vector multiplications into a matrix-matrix multiplication, since some of the function calls may have been inlined and deleted. Optimizing this remaining loopy code to reach BLAS speeds is thus as hard as generating looping code that reaches BLAS speeds, and history shows this is hard but not impossible. Additionally, function calls to a nonlinear solver may have already been deleted, so optimized adjoints which outperform the direct differentiation of code, like in the case of Deep Equilibrium Models (DEQs), may end up less optimized. But that lowest level allows for very efficient scalar code differentiation and mutation support. On the other hand, Diffractor uses Julia's typed IR, so it can apply higher-level rules easily and consistently, and in theory it can do transformations similar to XLA (i.e. keeping BLAS calls intact and fusing them). But writing such analyses on a fully dynamic compute IR is difficult enough that it has not been done. Tooling around escape analysis and shape propagation is being built to try to enable such optimizations, but the fact remains that it's a lot more work to do it on a language IR instead of a sublanguage graph like XLA. In theory you could have compiler passes prove that a function is semi-static in the sense of XLA and get the same optimizations as Jax or TensorFlow, but that doesn't happen today and it's not easy to do. The future of Julia AD systems will likely mix the Enzyme and Diffractor approaches to tackle this issue, but the clear trade-off being made here is generality at the cost of implementation complexity.

The second factor, and probably the more damning one, is that most ML codes don't actually use that much dynamism. Recurrent neural networks, transformers, convolutions, etc. all have simpler forms of dynamism that are in some sense quite static. That's an important trade-off most people don't always consider: why solve problems your users don't have? The number of layers you have does not depend on the values coming out of the layers. Support for dynamism in ML workflows is thus mostly about convenience, not necessity. When algorithms do have dynamism, in most cases you can get away with wrapping it as an operation in the language, i.e. defining a function and defining the adjoint derivative for that function (a sketch of this pattern is given below). This, for example, is how Jax supports ODEs, even though adaptive ODE solvers require knowing the calculated values in order to determine the number of steps. You cannot differentiate an ODE code with Jax, but if you use an ODE solver with a defined adjoint you are okay. While this does mean that some algorithms are not possible with Jax (at least without forgoing a lot of optimizations), and algorithms where differentiating the solver is fundamentally different from the adjoint definition can limit which performance/stability trade-offs can be made (see supplemental section 8 for details about the stability of "discrete adjoints" in the case of ODEs), these factors seem to be rather rare in standard ML use cases, which is why most people haven't bothered to learn a new programming language to get around these issues.
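As an illustration of "wrap it as an operation and define its adjoint", here is a minimal Julia sketch using ChainRulesCore (assuming that package is installed; the solver and the rule are illustrative, not taken from the post). The iterative solver has value-dependent control flow, but reverse-mode AD never looks inside it because a hand-derived adjoint is attached.

using ChainRulesCore

# An iterative "solver" with value-dependent control flow:
# Newton's method for y^2 = p, i.e. y = sqrt(p) for p > 0.
function newton_sqrt(p)
    y = p
    while abs(y^2 - p) > 1e-12   # number of iterations depends on p
        y = (y + p / y) / 2
    end
    return y
end

# Attach the adjoint derived from the implicit relation y^2 = p,
# which gives dy/dp = 1 / (2y). A ChainRules-aware AD system uses this
# rule and never differentiates through the while loop itself.
function ChainRulesCore.rrule(::typeof(newton_sqrt), p)
    y = newton_sqrt(p)
    newton_sqrt_pullback(ȳ) = (NoTangent(), ȳ / (2y))
    return y, newton_sqrt_pullback
end

With this rule in place, a ChainRules-aware reverse-mode tool would return 1/(2*sqrt(p)) as the gradient without ever tracing the loop, which is exactly the trick that makes solvers usable inside AD systems that cannot handle the loop directly.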

That leaves us where we are today. Are more ML algorithms of the future going to require handling more dynamic structures? Is optimizing scalar and mutating code going to be important for people using AD systems? The reason why I know this story so well is because the answer for my domain, scientific machine learning (SciML), is yes. Climate models use mutation because reallocating huge buffers would greatly affect performance. Adaptive solvers on stiff equations are a fact of life, so the simple adjoints used in PyTorch and Jax are unstable and simply give Inf as the gradients in these cases. Time will tell whether this physics-informed, expert-guided, science-guided, scientific machine learning domain becomes standard, but hopefully this describes how all of the choices made here were not "better" or "worse"; instead, it's all about domain-specific engineering trade-offs.

The post Engineering Trade-Offs in Automatic Differentiation: from TensorFlow and PyTorch to Jax and Julia appeared first on Stochastic Lifestyle.

Linear programming in Julia with GLPK and JuMP

By: Jaafar Ballout

Re-posted from: https://www.supplychaindataanalytics.com/linear-programming-in-julia-with-glpk-and-jump/

In previous posts we have covered various linear programming examples in both Python and R. We have, for example, introduced lpSolve in R, PuLP in Python, Gekko in Python, MIP in Python, and ortools in Python, as well as many other packages and modules for linear programming. In this post I will demonstrate how one can implement linear programming in Julia.

Julia is a flexible, dynamic language that is appropriate for scientific and numerical computing. Moreover, Julia is an open-source project, and its source code is available on GitHub. Using Julia provides access to many packages already developed for this language. For the linear programming example presented in this post I will use the Julia packages GLPK and JuMP.

Financial engineering example utilizing JuMP and GLPK in Julia


The demonstrated example is about a bank loan policy. This simple example will help us understand the power of linear programming in solving many of the problems we face in supply chain management and finance. We will use Julia to solve the optimization problem. The example is adapted from Hamdy Taha's book, available on Amazon.

Type of loan | Interest rate | Bad-debt ratio
Personal | 0.14 | 0.10
Car | 0.13 | 0.07
Home | 0.12 | 0.03
Farm | 0.125 | 0.05
Commercial | 0.10 | 0.02

A bank is in the process of devising a loan policy that involves a maximum of $12 million. The table, shown above, provides the data about the available loans. It is important to note that bad debts are unrecoverable and produce no interest revenue. Competition with other financial institutions dictates the allocation of at least 40% of the funds to farm and commercial loans. To assist the housing industry in the region, home loans must equal at least 50% of the personal, car, and home loans. The bank limits the overall ratio of bad debts on all loans to at most 4%.

Developing a mathematical problem statement

The hardest part of any optimization problem is building the mathematical model. The steps to overcome this hardship are as follows:

  1. Define the decision variables
  2. Write down the objective function
  3. Formulate all the constraints

Decision variables

According to the problem statement, we need to determine the amount of each loan in millions of dollars. The table shows five categories, or types, of loans. Thus, we need five decision variables, one corresponding to each category.

  • x1 = personal loans
  • x2 = car loans
  • x3 = home loans
  • x4 = farm loans
  • x5 = commercial loans

Objective function

The objective is to maximize profit (net return). The net return on each loan type is the interest revenue earned on the recovered portion of the loan minus the loss due to the bad-debt ratio.

Then, the objective function is as follows:

maximize z = 0.026x1 + 0.0509x2 + 0.0864x3 + 0.06875x4 + 0.078x5

where each coefficient equals the interest rate applied to the recovered fraction of the loan minus the bad-debt ratio, i.e. r(1 - b) - b for each loan type.
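These coefficients can be checked with a couple of lines of Julia (the variable names here are just for illustration):

rates    = [0.14, 0.13, 0.12, 0.125, 0.10]   # interest rates from the table
bad_debt = [0.10, 0.07, 0.03, 0.05, 0.02]    # bad-debt ratios from the table

# Net return per dollar lent: interest on the recovered fraction minus the bad-debt loss.
coefficients = @. rates * (1 - bad_debt) - bad_debt
# ≈ [0.026, 0.0509, 0.0864, 0.06875, 0.078], matching the objective coefficients above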

Constraints

Total loans should be less than or equal to 12 million dollars:

x1 + x2 + x3 + x4 + x5 ≤ 12

Farm and commercial loans should equal at least 40% of all loans:

x4 + x5 ≥ 0.4(x1 + x2 + x3 + x4 + x5), which rearranges to 0.4x1 + 0.4x2 + 0.4x3 - 0.6x4 - 0.6x5 ≤ 0

Home loans should equal at least 50% of the personal, car, and home loans:

x3 ≥ 0.5(x1 + x2 + x3), which rearranges to 0.5x1 + 0.5x2 - 0.5x3 ≤ 0

Bad debts should not exceed 4% of all loans:

0.1x1 + 0.07x2 + 0.03x3 + 0.05x4 + 0.02x5 ≤ 0.04(x1 + x2 + x3 + x4 + x5), which rearranges to 0.06x1 + 0.03x2 - 0.01x3 + 0.01x4 - 0.02x5 ≤ 0

Non-negativity:

x1, x2, x3, x4, x5 ≥ 0

Application of linear programming in Julia

The first step is to add the packages required to solve the linear program. This is done using Pkg, the built-in package manager in Julia. We can add the needed packages using the following commands in the Julia REPL. If Julia is installed, typing julia at the command line (in the computer's command prompt) is enough to open the REPL.

Julia REPL
using Pkg
Pkg.add("JuMP")
Pkg.add("GLPK")

The work presented below is done in a Jupyter notebook with a Julia kernel. JuMP is a modeling language for mathematical optimization in Julia, while GLPK is the solver that we will use. After importing the packages in Jupyter, we define the optimization problem by creating a variable BM, which stands for Bank Model. The next step is setting the optimizer by declaring the solver (GLPK) for the model (BM). We then specify the decision variables, constraints, and objective function.

In [1]:

## Importing the necessary packages ## 
using JuMP
using GLPK

In [2]:

## Defining the model ##
BM = Model() # BM stands for Bank Model

## Setting the optimizer ##
set_optimizer(BM,GLPK.Optimizer)

## Define the variables ##
@variable(BM, x1>=0)
@variable(BM, x2>=0)
@variable(BM, x3>=0)
@variable(BM, x4>=0)
@variable(BM, x5>=0)

## Define the constraints ##
@constraint(BM, x1+x2+x3+x4+x5<=12)
@constraint(BM, 0.4x1+0.4x2+0.4x3-0.6x4-0.6x5<=0)
@constraint(BM, 0.5x1+0.5x2-0.5x3<=0)
@constraint(BM, 0.06x1+0.03x2-0.01x3+0.01x4-0.02x5<=0)

## Define the objective function ##
@objective(BM, Max, 0.026x1+0.0509x2+0.0864x3+0.06875x4+0.078x5)

## Run the optimization ##
optimize!(BM)

Declaring the objective function and constraints is easy in Julia because algebraic expressions can be entered almost exactly as written (e.g. there is no need for an explicit multiplication operator between a constant and a variable, as in 0.4x1). After running the optimization model, we can view the output of the model as follows:

In [3]:

BM

Out[3]:

A JuMP Model
Maximization problem with:
Variables: 5
Objective function type: AffExpr
`AffExpr`-in-`MathOptInterface.LessThan{Float64}`: 4 constraints
`VariableRef`-in-`MathOptInterface.GreaterThan{Float64}`: 5 constraints
Model mode: AUTOMATIC
CachingOptimizer state: ATTACHED_OPTIMIZER
Solver name: GLPK
Names registered in the model: x1, x2, x3, x4, x5

In [4]:

objective_value(BM)

Out[4]:

0.99648

In [5]:

objective_sense(BM)

Out[5]:

MAX_SENSE::OptimizationSense = 1

In [6]:

println("x1 = ", value.(x1), "\n","x2 = ",value.(x2), "\n", "x3 = ",value.(x3), "\n", "x4 = ",value.(x4), "\n", "x5 = ",value.(x5))
x1 = 0.0
x2 = 0.0
x3 = 7.199999999999999
x4 = 0.0
x5 = 4.8

Analyzing the output of the model, we can see that the bank should allocate 7.2 million dollars to home loans and 4.8 million dollars to commercial loans. This allocation maximizes the bank's net return (i.e. profit) at 0.99648 million dollars.

Concluding remarks

In this post, we explored the power of linear programming in the banking sector and used Julia to solve the LP optimization problem. It is nice to see that open-source languages can solve these kinds of problems easily, without extensive hours of coding. In the future, we will look into non-linear problems in supply chain management.

The post Linear programming in Julia with GLPK and JuMP appeared first on Supply Chain Data Analytics.