By: Staging Team
Re-posted from: https://sciml.ai/news/2024/08/10/sciml_small_grants_successes/index.html
First Successes of the SciML Small Grants Program: Faster OrdinaryDiffEq Startup and New Benchmarks
Re-posted from: https://hasithv.github.io/posts/projects/24-08-17-basketballrandomwalk/
About two years ago, I attended a seminar given by Dr. Sid Redner of the Santa Fe Institute titled, “Is Basketball Scoring a Random Walk?” I was certainly skeptical that such an exciting game shared similarities with coin flipping, but, nevertheless, Dr. Redner went on to convince me–and surely many other audience members–that basketball does indeed exhibit behavior akin to a random walk.
At the very end of his lecture, Dr. Redner said something along the lines of, “the obvious betting applications are left as an exercise to the audience.” So, as enthusiastic audience members, let’s try to tackle this exercise.
Note: all code and data for this project can be found in the GitHub repository [2].
I highly recommend reading the paper [1] that Dr. Redner et al. published for a full understanding. However, here are the main points that we will need:
Two things I wanted to improve were to expand the dataset and to use Bayesian updates to better estimate the $\lambda$ and $I_A$ for a game.
For the dataset, Dr. Redner only used games from 2006-2009, but I managed to obtain all playoff games after 2000. Using this, I looked at the distribution of the average number of plays per 30 seconds, which gives us a prior for $\lambda$ that we can live-update to better fit our model to a given game (and we already have the prior for $X$):
$$\lambda \sim \mathcal{N}(1.005,0.1)$$
$$X \sim \mathcal{N}(1, \sqrt{0.0083})$$
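The code snippets later in the post refer to these priors as defaultRate and defaultStrengths; those definitions are not shown in the excerpts, but a minimal setup consistent with the distributions above (names assumed from the later snippets) would be:

using Distributions

const defaultRate      = Normal(1.005, 0.1)           # prior on λ, plays per 30-second interval
const defaultStrengths = Normal(1.0, sqrt(0.0083))    # prior on each team's strength X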
Using simple Bayesian updates, we should be able to properly estimate how likely a certain $\lambda$ or $I_A$ is given the game data: the times between plays $\{t_1,\ldots,t_n\}$ and who scored on each play $\{r_1,\ldots,r_n\}$:
$$\begin{align}
f(\lambda | \{t_1,\ldots,t_n\}) &\propto f(\{t_1,\ldots,t_n\} | \lambda)\, f(\lambda) \\
&\propto \left(\prod_{i=1}^{n} f(t_i | \lambda) \right) f(\lambda) \\
&\propto \left(\prod_{i=1}^{n} \lambda e^{-\lambda t_i} \right) \mathcal{N}(1.005, 0.1) \\
&\propto \left(\lambda^n e^{-\lambda \sum_{i=1}^{n} t_i} \right) \mathcal{N}(1.005, 0.1) \\[2ex]
f(X_A, X_B | \{r_1,\ldots,r_n\}) &\propto f(\{r_1,\ldots,r_n\} | X_A, X_B)\, f(X_A, X_B) \\
&\propto \left(\prod_{i=1}^{n} f(r_i | X_A, X_B) \right) f(X_A, X_B) \\
&\propto \left(\prod_{i=1}^{n} \left| p\left(\frac{X_A}{X_A + X_B}, r_{i-1}, \Delta_{i-1} \right) - \frac{1-r_i}{2} \right| \right) \cdot \mathcal{N}(1, \sqrt{0.0083})(X_A) \cdot \mathcal{N}(1, \sqrt{0.0083})(X_B)
\end{align}
$$
As you can see, the update for the $X$ values is a bit more complicated, but it is still fairly easy to compute. The code to do this is shown below:
function update_rate!(params, time_deltas)
    time_deltas = time_deltas / 30                   # convert to units of 30-second intervals
    # Unnormalized posterior: λ^n e^{-λ Σtᵢ} times the prior on λ
    params.rate = (x) -> x^length(time_deltas) * exp(-x * sum(time_deltas)) * pdf(defaultRate, x) / params.rate_Z
    normalize_rate!(params)
end

function update_strengths!(params, scoring_data, lookback=15)
    lookback = min(lookback, length(scoring_data))
    scoring_data = scoring_data[end-lookback+1:end]  # only keep the most recent plays
    # Likelihood of the observed scoring sequence given strengths (x, y)
    score_probs = (x, y) -> prod(map((z) -> score_prob(z, x, y), scoring_data))
    # Unnormalized posterior: likelihood times the independent priors on each strength
    params.strengths = (x, y) -> score_probs(x, y) * pdf(defaultStrengths, x) * pdf(defaultStrengths, y) / params.strengths_Z
    normalize_strengths!(params)
end
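The helpers normalize_rate!, normalize_strengths!, and the constants rate_Z/strengths_Z are not shown above. As noted below, the normalizing constants are easy to obtain numerically; a rough sketch of how normalize_rate! might do this (one possible implementation, not necessarily the one in the repository) is:

# Hypothetical sketch: numerically integrate the current (unnormalized) rate pdf on a
# grid and fold the mass into the stored constant so that params.rate integrates to 1.
# Assumes params is a mutable struct and the params.rate closure reads params.rate_Z at call time.
function normalize_rate!(params; grid = range(0.0, 3.0, length = 3001))
    dx = step(grid)
    Z = sum(params.rate.(grid)) * dx   # Riemann-sum estimate of the remaining mass
    params.rate_Z *= Z                 # absorb it into the normalizing constant
end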
The real roadblock, however, is actually sampling the $\lambda$ and $X$ values from the pdfs.
Since we have access to the pdfs (even their normalizing constants are quite easy to compute using numerical methods), we can employ importance sampling as a brute-force method. I am sure there are fancier MCMC algorithms that could be used, but the unfriendly distribution of the $X$ values made it hard for me to use external libraries like Turing.jl.
Anyhow, for interested readers, the reason we can use importance sampling to compute the expected value of a function $g$ with respect to a pdf $f$ using samples from another pdf $h$ is the following identity:
$$\begin{align}
\underset{X \sim f}{\mathbb{E}}[g(X)] &= \int g(x) f(x) dx \\
&= \int g(x) \frac{f(x)}{h(x)} h(x) dx \\
&= \underset{X \sim h}{\mathbb{E}}\left[\frac{f(X)}{h(X)} g(X)\right]
\end{align}$$
This also tells us that $h$ must be non-zero wherever $f$ is non-zero. When working with empirical calculations, the term $\frac{f(x)}{h(x)}$ is referred to as the weight of the sample, for obvious reasons.
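As a quick, self-contained illustration of the identity (separate from the basketball model), here is importance sampling used to estimate $\mathbb{E}_{X \sim f}[X^2]$ for a standard normal target $f$ using samples from a wider proposal $h$:

using Distributions

f = Normal(0.0, 1.0)               # target density
h = Normal(0.0, 2.0)               # proposal density (non-zero wherever f is)
g(x) = x^2                         # function whose expectation we want

x = rand(h, 100_000)               # sample from the proposal
w = pdf.(f, x) ./ pdf.(h, x)       # importance weights f(x)/h(x)
estimate = sum(w .* g.(x)) / length(x)   # ≈ E_f[X^2] = 1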
So, for our empirical estimates, a good choice for $h$ is the prior distributions. The following code shows the implementation of the sampling functions:
function sample_params(game, n)
    # Draw rate samples from the prior (our proposal h) and weight by posterior/prior
    r  = rand(defaultRate, n)
    wr = game.params.rate.(r) ./ pdf.(defaultRate, r)
    # Draw an n×2 matrix of strength samples (one column per team) and weight similarly
    s  = rand(defaultStrengths, n, 2)
    ws = game.params.strengths.(s[:, 1], s[:, 2]) ./ (pdf.(defaultStrengths, s[:, 1]) .* pdf.(defaultStrengths, s[:, 2]))
    w = wr .* ws
    return r, s, w
end
function sample_games(game, n=1000, k=1000)
    results = zeros(n)
    for i in 1:n
        r, s, w = sample_params(game, k)      # k importance samples of (λ, X_A, X_B)
        sample_results = zeros(k)
        Threads.@threads for j in 1:k
            X = s[j, 1]
            Y = s[j, 2]
            # Simulate one game under these parameters and apply the importance weight
            sample_results[j] = simulate_game(game, r[j], X, Y) * w[j]
        end
        results[i] = sum(sample_results) / k  # weighted Monte Carlo estimate for this batch
    end
    return sum(results) / n                   # average over the n batches
end
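Once the simulate_game function below is also defined, estimating the win probability for a game in progress is a single call (a usage sketch, assuming a game object carrying the plays and params fields used throughout):

p_win = sample_games(game)   # estimated probability that team A wins from here
odds  = 1 / p_win            # the "odds" reported by the application described below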
The final step is to simulate the games. This is quite easy to do if we are able to pass in all the parameters we need.
function simulate_game(game, lambda, Xa, Xb)
    if length(game.plays) == 0
        t = 0                              # game clock in seconds
        s = 0                              # net score (team A minus team B)
        r = 0                              # who scored last (+1 for A, -1 for B, 0 for nobody)
    else
        t = game.plays[end][1]
        s = net_score(game)
        r = sign(game.plays[end][2])
    end
    while t < 2880                         # 48 minutes of game time
        t += rand(Exponential(1/lambda)) * 30       # waiting time until the next play
        if rand() < score_prob((1, r, s), Xa, Xb)
            s += random_play()             # team A scores
            r = 1
        else
            s -= random_play()             # team B scores
            r = -1
        end
    end
    return (sign(s) + 1) / 2               # 1 if team A wins, 0 if it loses, 0.5 for a tie
end
The above code snippets allowed me to peer into quite a few example games, and gave me the following conclusions about how basketball random walks work:
Due to the performant nature of the code (thanks, Julia!), it made sense to spin up a website with a simpler version of the model (no Bayesian updates, since they hardly made a difference and it would be a lot of work for the user to input each play). This way, someone betting on a game can make a mathematically-backed decision on how to spend their money!
The application computes the odds for a team to win. In other words, it outputs the reciprocal of the team's win probability, so the closer the output is to 1, the more likely that team is to win, and vice versa.
If a bookie offers a payout multiplier that is higher than the calculated odds, it might be a good idea to buy it, because we are predicting that the team is more likely to win than the bookie thinks (thus, the bookie overpriced the payout).
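To make that comparison concrete with made-up numbers: if the model gives a team a 60% chance of winning, the fair odds are about 1.67, so a bookie multiplier of 1.8 is a positive expected-value bet:

p_model   = 0.60                # hypothetical model win probability
fair_odds = 1 / p_model         # ≈ 1.67, the calculated odds
payout    = 1.8                 # hypothetical bookie payout multiplier
ev = p_model * payout - 1       # expected profit per unit staked ≈ +0.08, so worth buying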
I cannot host the application myself, but you can find the code for it–along with the instructions to run it–in the GitHub repository [2].
By: Andrew Claster
Re-posted from: https://info.juliahub.com/blog/state-of-julia-2024
Kristoffer Carlsson (JuliaHub Software Engineer), Lilith Hafner and Dr. Elliot Saba (JuliaHub Director of EDA Engineering) presented the State of Julia 2024 during JuliaCon on July 9, 2024 in Eindhoven, Netherlands.