This is the second post on this years Santa Kaggle Challenge.
Haven’t read the first? -> Here you go!.
I wanted to blog more about it without sharing too many details as people might get mad, understandably. Unfortunately I fell back into competing mode and not into blogging mode. Sorry about that.
Feeling especially bad for my patrons as I also didn’t continue with my constraint solver posts. The gamification of Kaggle is too strong for me… Therefore I decided to not charge my patrons this month. My Christmas gift 🙂
In this post I want to share what have changed since last time by summarizing some ideas that came up in the forums and a bit about my own approach but I’ll not share code for that. Hopefully I will find the time to share all my code in a more structured way after the challenge. It would also be nice to know if someone wants to chat after the challenge and tell me about their ideas. That would be wonderful!
If you find any of this information useful and want to read more please comment and if you’re generally interested in optimization please check out other posts I wrote:
- How to build a constraint solver which is my latest long term challenge/series.
- The Simplex method: How to build an LP solver
- The hungarian method
- and many more
I don’t wanna beg for your money but if you consider to buy a second yacht you might give me some of your dollars 😉
You can donate via Patreon or PayPal.
Okay let’s start
I said last time that I want to take about linear programming for a bit.
A small recap on the problem: We want to match 5,000 families to 100 days and we want to minimize two types of costs. The preference costs which depend on how happy we can make the families (before giving them money) and the accounting costs which are an indication of how smooth the change between days is considering the number of people per day.
In mathematical terms:
Preference costs:
\[
\sum_{f \in F} \sum_{d \in D} \text{prefcost}_{f,d} x_{f,d}\]
Here prefcost depends on the preferences of each family and is a matrix of size \(5,000 \times 100\) and \(x_{f,d}\) describes the binary decision variable whether family \(f\) visits the workshop of Santa on day \(d\).
The accounting costs:
\[
\sum_{d=100}^{1} \frac{\left(N_{d}-125\right)}{400} N_{d}^{\left(\frac{1}{2}+\frac{\left|N_{d}-N_{d+1}\right|}{50}\right)}\]
Here \(N_d\) is the number of people assigned to day \(d\).
The preference cost is a linear function and is relatively easy. In that formulation it need still a lot of decision variables but okay.
Before we start with the linear program we might wanna have a short look at the accounting cost. Can it get huge? Like do we need to care about it? Or when is it small? Is there a way to linearize it?
Easy to visualize with Julia:
using Plots
acc(N,N1) = ((N-125)/400)*N^(0.5+abs(N-N1)/50)
acc_mat = zeros(176,176)
for i=125:300, j=125:300
acc_mat[i-124,j-124] =...