Re-posted from: https://white.ucc.asn.au/2018/05/28/Optimizing-your-diet-with-JuMP.html
I’ve been wanting to do a JuMP blog post for a while.
JuMP is a Julia mathematical programing library.
It is to an extent a DSL for describing constrained optimisation problems.
A while ago, a friend came to me who was looking to get “buff”,
what he wanted to do, was maximise his protein intake, while maintaining a generally healthy diet.
He wanted to know what foods he should be eating.
To devise a diet.
If one thinks about this,
this is actually a Linear Programming problem – constrained linear optimisation.
The variables are how much of each food to eat,
and the contraints are around making sure you have enough (but not too much) of all the essential vitamins and minerals.
Note: this is a bit of fun, in absolutely no way do I recommend using the diets the code I am about to show off generates.
I am in no way qualified to be giving dietry or medical advice, etc.
But this is a great way to play around with optimisation.
So lets get started:
Humans need nutrients, food contains nutrients do humans eat food.
We could go down the path of consuming nutrient soup directly, but it is considered unfun.
People are also dubious of it ability to provide a complete diet due to sciences incomplete knowledge of exactly what trace minerals we might need.
The same criticism kinda occurs here too.
Input:
We are going to load up JuMP.
We are using GLPK as the solver.
We actually have many options for the solver
Last time I checked (which was a while back), GLPK was the only free solver that supported callbacks with Mixed Integer Programming.
We are not requiring that so our choices are much wider.
I find the commercial Gurobi solver to be orders of magnitude faster the GLPK (for MIP problems); and they are nice enough to give away liscense to accademics, but I do not have it installed on this computer, so GLPK will do.
So when defining a diet the goal is the choose to eat certain amounts of different foods,
such that when totalled all your nutrient needs are taken care off.
So the first thing we are going to need to know is what are the nutrient contents of various foods we can buy.
A table breaking that down has been produced by Food Standards Australia.
Someone might like to try and get this working with
USDA Food Composition Databases,
which is likely a bit more comprehensive.
We define a data dependency of this program on that database using DataDeps.jl:
Input:
Input:
Output:
Food ID | Survey ID | Food Name | Survey flag | Energy, with dietary fibre (kJ) | Energy, without dietary fibre (kJ) | Moisture (g) | Protein (g) | Total fat (g) | Available carbohydrates, with sugar alcohols (g) | Available carbohydrates, without sugar alcohol (g) | Starch (g) | Total sugars (g) | Added sugars (g) | Free sugars (g) | Dietary fibre (g) | Alcohol (g) | Ash (g) | Preformed vitamin A (retinol) (µg) | Beta-carotene (µg) | Provitamin A (b-carotene equivalents) (µg) | Vitamin A retinol equivalents (µg) | Thiamin (B1) (mg) | Riboflavin (B2) (mg) | Niacin (B3) (mg) | Niacin derived equivalents (mg) | Folate, natural (µg) | Folic acid (µg) | Total Folates (µg) | Dietary folate equivalents (µg) | Vitamin B6 (mg) | Vitamin B12 (µg) | Vitamin C (mg) | Alpha-tocopherol (mg) | Vitamin E (mg) | Calcium (Ca) (mg) | Iodine (I) (µg) | Iron (Fe) (mg) | Magnesium (Mg) (mg) | Phosphorus (P) (mg) | Potassium (K) (mg) | Selenium (Se) (µg) | Sodium (Na) (mg) | Zinc (Zn) (mg) | Caffeine (mg) | Cholesterol (mg) | Tryptophan (mg) | Total saturated fat (g) | Total monounsaturated fat (g) | Total polyunsaturated fat (g) | Linoleic acid (g) | Alpha-linolenic acid (g) | C20:5w3 Eicosapentaenoic (mg) | C22:5w3 Docosapentaenoic (mg) | C22:6w3 Docosahexaenoic (mg) | Total long chain omega 3 fatty acids (mg) | Total trans fatty acids (mg) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 10F40019 | 3.1103e7 | Beef, extract, bonox | missing | 401.0 | 401.0 | 56.6 | 16.6 | 0.2 | 6.5 | 6.5 | 6.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 19.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.36 | 0.27 | 5.4 | 7.71 | 6.0 | 0.0 | 6.0 | 6.0 | 0.23 | 8.0 | 0.0 | 0.6 | 0.6 | 110.0 | 9.1 | 2.0 | 60.0 | 360.0 | 690.0 | 4.0 | 6660.0 | 1.5 | 0.0 | 0.0 | 136.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 13A12001 | 3.1302e7 | Basil, dried | missing | 1079.0 | 774.0 | 10.0 | 18.2 | 5.5 | 15.5 | 15.5 | 15.5 | 0.0 | 0.0 | 0.0 | 38.2 | 0.0 | 15.5 | 0.0 | 27135.0 | 27334.0 | 4556.0 | 0.327 | 1.9 | 7.79 | 11.62 | 436.0 | 0.0 | 436.0 | 436.0 | 1.34 | 0.0 | 337.0 | 6.9 | 6.91 | 2091.0 | 57.0 | 16.36 | 273.0 | 291.0 | 3818.0 | 2.7 | 118.0 | 17.27 | 0.0 | 0.0 | 225.0 | 2.36 | 1.4 | 0.61 | 0.26 | 0.34 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 10E10113 | 3.1302e7 | Cardamom, seeds, ground | missing | 1333.0 | 1109.0 | 8.3 | 10.8 | 6.7 | 40.5 | 40.5 | 31.0 | 9.5 | 0.0 | 0.0 | 28.0 | 0.0 | 5.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.198 | 0.182 | 1.1 | 3.74 | 0.0 | 0.0 | 0.0 | 0.0 | 0.23 | 0.0 | 21.0 | 0.5 | 0.5 | 383.0 | 0.5 | 13.97 | 229.0 | 178.0 | 1119.0 | 0.5 | 18.0 | 7.47 | 0.0 | 0.0 | 155.0 | 2.2 | 2.81 | 1.39 | 1.0 | 0.39 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 10E10098 | 3.1302e7 | Chilli (chili) powder | missing | 1441.0 | 1167.0 | 7.8 | 12.3 | 16.8 | 20.5 | 20.5 | 10.4 | 10.1 | 0.0 | 0.0 | 34.2 | 0.0 | 11.8 | 0.0 | 15000.0 | 17790.0 | 2965.0 | 0.35 | 0.79 | 7.9 | 8.98 | 9.0 | 0.0 | 9.0 | 9.0 | 2.09 | 0.0 | 0.0 | 38.1 | 38.14 | 278.0 | 0.5 | 14.3 | 170.0 | 303.0 | 1920.0 | 20.4 | 1010.0 | 2.7 | 0.0 | 0.0 | 64.0 | 2.41 | 3.15 | 7.85 | 7.32 | 0.52 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
If you talk a look at that table is should quickly become apparent that everything is for 100g of that foodstuffs.
So basical dataframes usage lets us look up a column:
Input:
Output:
5740-element Array{Float64,1}:
16.6
18.2
10.8
12.3
14.1
4.2
6.0
13.0
18.4
12.7
23.0
8.5
11.1
⋮
3.3
3.2
2.2
2.6
0.9
1.3
1.3
1.2
1.2
1.2
1.1
1.0
If I wanted to know how much Protein was in 100g of Beef extract (the first food in the list).
Input:
Output:
16.6
If I wanted to know how much Protein was in 100g of Beef extract (the first food in the list) and 150g of Cardamom seeds (the 3rd item).
That would be:
Input:
Output:
32.800000000000004
If I had a big list (a vector) of how much of each item I was having (in units of 100g),
I could write it using the dot product.
Input:
Output:
32.800000000000004
Our “diet” is just that, it is a long list specifying how much of each food is to be eaten each day.
(It is hopefully a sparse list).
So we can use that to express our constraints.
I’m going to define a function to construct such as diet as a JuMP variable.
The syntax
@variable(m, x[1:10], lowerbound=0)
says: within the model m
to declare a new variable x
that is a vector indexed between 1 and 10,
and it comes with the built in constraint that is its greater than 0
.
A JuMP model is basically the representation of the system of contraints and the optimisation function etc;
so when we declare a new constraint or variable we always pass in the model to specify which system we are working with.
(You might be building multiple systems at once.)
Input:
Output:
new_diet (generic function with 2 methods)
The syntax:
@constraints(m, begin
5 <= x
3 <= y <= 10
end)
Says to that in model m
with declare that the variable x
must be 5 or greater, and that variable y
must be between 3 and 10 (inclusive).
Our constraints are going to be a bit more complicated than that.
In particular I am going to use the dot product we expressed before to calcuate the total intake of a particular nutrient.
To do that I will define a closure over the variable diet
,
which I will call intake
. One might need to spend a little time looking at this.
I spent some time digging up some constraints on what nutrients people need.
You can see the references below in the comments.
We are going to use them to define our basic constraints before considering what we are working with.
We’ld define a function to add those constraints to our model.
Input:
Output:
basic_nutrient_requirements (generic function with 2 methods)
Note this list is not comprehensive, (or necessarily entirely correct).
I started with a fairly small list of constraints then added to it based on the weirdnesses of the diets that were being generated.
For example, some of my initial diets generated included something like “Drink 4L coffee per day”,
so I added a constrain against consuming more than 400 mg of caffine.
Feel encourage to mess around, and change or add constraints (like we will below).
The snippit below is useful to search for the column name if you don’t know how it is specified – the table is a bit large to do that by hand.
I’d recommend against ever trying to display the full model: JuMP will display it mathematically,
and in JuPyTer even will render it as LaTeX.
But it is utterly huge:
the system so far has 25 constraints, each referring to 5740 variables
Input:
Output:
2-element Array{String,1}:
"Vitamin B6 (mg)"
"Vitamin B12 (µg)"
The last thing we will need before we can start getting solutions is a way to view the result.
getvalue(diet)
will return the solve value (once solved).
But since out diet is defines is a vector with 5740 entries, just showing it doesn’t get us much.
So I define an function to print it out neatly
Input:
Output:
show_diet (generic function with 1 method)
Diet 1: Minimize Energy, with only basic nutritional constraints (LP)
So now we are ready to go.
We set up out system of constraints,
add an objective to minimze the total energy (I guess for weight loss),
and solve it.
It is a linear programming problem.
It is actually underconstrained, so almost certainly has multiple optimal solutions.
But that is fine we just want one.
Input:
Output:
So looking at that result it is fairly ok.
A buch of basically supplements, plus some oils, and vegetables.
That makes up your vitamins and minerals I’ld say.
The packet of “Lolly, non-mint flavours, intense sweetened”, makes up your carbs.
2 Problems:
Firstly, it is suggesting dringking 32L of water in a day.
Why is that? This is because water has basically no nutritional impact to our constraints,
so it can take almost any value.
To solve this, a simple trick is to add a small factor to out objective,
saying “All other things being equal, a diet with less total mass, is better”.
This can be done by adding a term + 0.01sum(diet)
.
The 0.001 is small enough that it shouldn’t impact the true objective much.
Secondly, it recommends drinking over 2 kg of tea, and 2 kg of watered milk.
That is pretty boring,
so lets add a constraint that one shouldn’t ever consume >500gram of the same item.
Diet 2: Minimize Energy, improved (LP)
Input:
Output:
This is a bit better than before.
I particularly appreciate the handful of Cumquats and the half an oyster.
Still rather high on Milk, since it has basically worked around the 500g limit, by using 3 different types of watered-down Milk.
Also that seems like a whole lot of coffee, even if some is decaff.
How much caffine is that?
Input:
Output:
399.99999999999943
So it is right up against out boundary of <400g caffine per day.
This isn’t surprising, since the optimal solution to a linear programming problem is somewhere on the edge of the feasible region.
I’m just not happy with the amount of liquid in this diet still though.
Diet 4: Minimize Energy, improved, limit liquid portions (LP)
I think, I’ld just like my diet to be less than 25% liquid by mass.
After all I do drink water on my own.
Input:
Output:
mmmmmmmmmm, yum. Powdered suppliments.
One thing that is going on here is that it is stocking up on powder’s to bring the total moister portion down.
For example Thiamin is B1 suppliments,
Apparently you can’t normally overdose on B1,
and so we have no containts on B1 intake, except the 500gram max.
Diet 5: Minimize Energy, improved, ban liquids (LP)
What I really want to do is just remove items from the list that are basically liquids.
I could filter the list to remove anything with more than 96% liquid,
but actually that knocks out things like lettuce, but doesn’t knock out all the varients milk.
More sensible is to just blacklist by name, and cross-ref with moisture.
Input:
Output:
forbid_liquids (generic function with 2 methods)
Input:
Output:
I hope you enjoy your shashi slice of lamb and goat.
Diet 6: Maximize protein (LP)
So my friend trying to get buff, he wanted to maximize protein.
Input:
Output:
Again, if we just want to maximize Protein, then the solution comes out rather silly.
It basically says eat as much protein containing food as you can until you hit one of the nutrient upper bounds.
Almost 25Kg, of Lard, over 42Kg of powerered sweetener.
It is kinda amazing that this is apparently hitting all the nutritonal requirements.
Obvious mistake is we forgot to bound the amount of energy.
Diet 7: Maximize Protein with Energy Constraints (LP)
We could do the Basal Metabolic Rate calculations,
to determine how much energy a person needs,
but I’m willing to just ball-park it.
For an person not doing much exercise Nutrition Australia says roughly says between 7,000Kj and 11,000Kj.
Using their calculator for my friend who is working out tons,
says up to 15,000Kj.
So lets say between 8,000Kj and 15,000Kj.
Input:
Output:
Diet 7: Minimize Weight (LP)
What is the least weight of food we can be carrying while hiking?
We’ll up the minimum energy to take into account the high expenditure,
and remove the cap on it (since weight alone can constrol our upper bound)
Input:
Output:
This is actually fairly inline with the kind of stuff intense hikers eat.
Nuts (well nut, singular), crisps, protein powder, cereal.
The trick in here seems to be the use of Breakfast cereal, to hit most nutient requirements,
the Seaweed/nori to cover a whole bunch more, in particular iodine.
and then abusing the actual contents of additives and subsitutes like acesulfame-potassium and stevia, to hit the others.
Conclusion
One can go a long way with just linear programming.
To go a bit further (for example if you had the data require nonfactional amounts of fruit etc), you’ld need mixed integer programming.
JuMP makes both easy; and with this number of variables even the NP-Hard mixed iteger programming can be solved in fairly reasonable time using the free solvers.
I’ll conclude with another disclaimer: please don’t try using diets you’ve made up based on constraints you’ve found online, and solved using linear programming.
I do not want to know what actually happens when someone eats 42Kg of powdered sweetener.
Still I hope this has been illustrative of the power of constrained optimization.
In general if one can often transform problem into linear programming (P), or mixed integer programming (NP-hard),
then that is a generally a short path to solving it.
And unless you’ve screwed up, such a formulation will likely correspond to the algorithm of minimum known time complexity for your problem.