Is Basketball a Random Walk?

By: Julia on HasItHalted

Re-posted from: https://hasithv.github.io/posts/projects/24-08-17-basketballrandomwalk/

About two years ago, I attended a seminar given by Dr. Sid Redner of the Santa Fe Institute titled, “Is Basketball Scoring a Random Walk?” I was certainly skeptical that such an exciting game shared similarities with coin flipping, but, nevertheless, Dr. Redner went on to convince me–and surely many other audience members–that basketball does indeed exhibit behavior akin to a random walk.

At the very end of his lecture, Dr. Redner said something along the lines of, “the obvious betting applications are left as an exercise to the audience.” So, as enthusiastic audience members, let’s try to tackle this exercise.

Note: all code and data for this project can be found in the GitHub repository [2].

Understanding the Model

I highly recommend reading the paper [1] that Dr. Redner et al. published for a full understanding. However, here are the main points that we will need:

Assumptions

  1. Random Walk Definition: The net score, $\{\Delta_n\}_{n \in \mathbb{N}}$ (the difference between the scores of teams A and B) can be modeled as an anti-persistent random walk. This means that if the score moves up during a play, then the next play is more likely to move down.
    $$\Delta_n = \sum_{i=1}^{n} \delta_i$$
    $$\begin{cases}
    \delta_i > 0 & \text{with probability } p_i \\
    \delta_i < 0 & \text{with probability } 1 - p_i
    \end{cases}$$
    $$p_n = (\textcolor{orange}{\text{other terms}}) - 0.152 \left(\frac{|\delta_{n-1}|}{\delta_{n-1}}\right), \quad \forall \; i,n \in \mathbb{N}$$
    where $\delta_i$ is the number of points scored during the $i$th play, and $\Delta_0 = \delta_0 = 0$. This is explained by the fact that the scoring team loses possession of the ball, so it is harder for them to score again.
  2. Coasting and Grinding: The probability of a team scoring increases linearly with how far they are behind in points:
    $$p_n = (\textcolor{orange}{\text{other terms}}) - 0.152 \, r_{n-1} - 0.0022 \, \Delta_{n-1}, \quad \forall \; n \in \mathbb{N}$$
    Here, $r_{n-1} = \frac{|\delta_{n-1}|}{\delta_{n-1}}$ is the sign of the previous play. This is explained in the paper as “the winning team coasts, and the losing team grinds.”
  3. Team Strengths: The strength of a team also has an effect on the probability of scoring:
    $$p(I_A, r_{n-1}, \Delta_{n-1}) = I_A - 0.152 \, r_{n-1} - 0.0022 \, \Delta_{n-1}, \quad \forall \; n \in \mathbb{N}$$
    where the strength of team A is defined in terms of the parameters $X_A$ and $X_B$ as $I_A(X_A, X_B) = \frac{X_A}{X_A + X_B}$. Additionally, $X_A$ and $X_B$ are distributed according to $\mathcal{N}(\mu = 1, \sigma^2 = 0.0083)$.
  4. Time Between Plays: The time between consecutive plays is exponentially distributed:
    $$\tau_n \sim \text{Exp}(\lambda)$$
    where $\lambda$ is measured in plays per 30 seconds (the unit used when fitting $\lambda$ below).
  5. Scoring Probabilities: For each play, the probability of scoring $\delta$ points is
    $$\begin{cases}
    \delta = 1, & 8.7\% \\
    \delta = 2, & 73.86\% \\
    \delta = 3, & 17.82\% \\
    \delta = 4, & 0.14\% \\
    \delta = 5, & 0.023\% \\
    \delta = 6, & 0.0012\%
    \end{cases}$$
    ⚠️ The only confusion I have with the paper is that the above “probabilities” do not sum to 1, so I am not sure how to interpret them. I went ahead and removed $\delta=6$ and lowered the probability of $\delta=5$ so that the probabilities sum to 1. This should be okay since 5- and 6-point plays are so rare that they should not affect the model much (a sketch of how these play values are sampled follows this list).
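The simulation later in this post draws each play's point value through a helper called random_play(), whose definition is not shown. Here is a minimal sketch of one way to implement it with StatsBase.jl, assuming the renormalization described above (the repository's exact weights may differ):

using StatsBase

# Point values and raw percentages from the table above, with δ = 6 dropped
const PLAY_VALUES = [1, 2, 3, 4, 5]
const RAW_WEIGHTS = [8.7, 73.86, 17.82, 0.14, 0.023]

# Normalize the weights so they sum to 1, then sample one play's point value
const PLAY_WEIGHTS = Weights(RAW_WEIGHTS ./ sum(RAW_WEIGHTS))

random_play() = sample(PLAY_VALUES, PLAY_WEIGHTS)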

Building the Simulation

Gathering Simulation Data

Two things I wanted to improve were to expand the dataset and to use Bayesian updates to better estimate $\lambda$ and $I_A$ for a given game.

For the dataset, Dr. Redner only used games from 2006-2009, but I managed to obtain all playoff games after 2000. Using this, I looked at the distribution of the average number of plays per 30s:

Distribution for $\lambda$ values. The orange normal curve has mean 1.005 and std 0.1. I am not sure why there is a large deficit at the 1 play per 30s mark; it seems to be half as high as it should be.

which gives us a prior for $\lambda$ that we can live-update to better fit our model to a given game (and we already have the prior for $X$):

$$\lambda \sim \mathcal{N}(1.005,0.1)$$
$$X \sim \mathcal{N}(1, \sqrt{0.0083})$$
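
These priors appear in the code below as two distributions named defaultRate and defaultStrengths. Their definitions are not shown in this post, but assuming the second parameter of each prior above is a standard deviation, a minimal sketch with Distributions.jl would look like this:

using Distributions

# Prior over λ, the average number of plays per 30 s
const defaultRate = Normal(1.005, 0.1)

# Prior over a single team-strength parameter X (Normal takes the standard deviation)
const defaultStrengths = Normal(1.0, sqrt(0.0083))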

Bayesian Updating

Using simple Bayesian updates, we should be able to properly estimate how likely a particular $\lambda$ or $I_A$ is given the game data: the inter-play times $\{t_1,\ldots,t_n\}$ and who scored on each play $\{r_1,\ldots,r_n\}$:

$$\begin{align}
f(\lambda | \{t_1,\ldots,t_n\}) &\propto f(\{t_1,\ldots,t_n\} | \lambda) f(\lambda) \\
&\propto \left(\prod_{i=1}^{n} f(t_i | \lambda) \right) f(\lambda) \\
&\propto \left(\prod_{i=1}^{n} \lambda e^{-\lambda t_i} \right) \mathcal{N}(1.005,0.1)(\lambda) \\
&\propto \left(\lambda^n e^{-\lambda \sum_{i=1}^{n} t_i} \right) \mathcal{N}(1.005,0.1)(\lambda) \\ \\
f(X_A, X_B | \{r_1,\ldots,r_n\}) &\propto f(\{r_1,\ldots,r_n\} | X_A,X_B) f(X_A,X_B) \\
&\propto \left(\prod_{i=1}^{n} f(r_i | X_A,X_B) \right) f(X_A,X_B) \\
&\propto \prod_{i=1}^{n} \left| p\left(\frac{X_A}{X_A + X_B}, r_{i-1}, \Delta_{i-1} \right) - \frac{1-r_i}{2} \right| \cdot \mathcal{N}(1, \sqrt{0.0083})(X_A) \cdot \mathcal{N}(1, \sqrt{0.0083})(X_B)
\end{align}$$

As you can see, the update for the $X$ values is a bit more complicated, but it is still fairly easy to compute. The term $\left| p - \frac{1-r_i}{2} \right|$ simply evaluates to $p$ when team A scored ($r_i = 1$) and to $1 - p$ when team B scored ($r_i = -1$). The code to do this is shown below:

function update_rate!(params, time_deltas)
    # Convert inter-play times to units of 30 s so they match the units of λ
    time_deltas = time_deltas/30
    # Unnormalized posterior: Exponential likelihood × Normal prior over λ
    params.rate = (x) -> x^length(time_deltas) * exp(-x * sum(time_deltas)) * pdf(defaultRate, x) / params.rate_Z
    normalize_rate!(params)
end

function update_strengths!(params, scoring_data, lookback=15)
    # Only use the most recent `lookback` plays
    lookback = min(lookback, length(scoring_data))
    scoring_data = scoring_data[end-lookback+1:end]

    # Unnormalized posterior: per-play scoring likelihoods × Normal priors over X_A, X_B
    score_probs = (x,y) -> prod(map((z) -> score_prob(z, x, y), scoring_data))
    params.strengths = (x,y) -> score_probs(x,y) * pdf(defaultStrengths, x) * pdf(defaultStrengths, y) / params.strengths_Z
    normalize_strengths!(params)
end
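
The helpers normalize_rate!, normalize_strengths!, and score_prob are defined elsewhere in the repository [2]. Judging from the model equations and the call sites (here and in simulate_game below), score_prob presumably evaluates the likelihood of a single play; a rough sketch, not the repository's exact code:

# play is a tuple (r_i, r_{i-1}, Δ_{i-1}): who scored, who scored previously,
# and the previous net score. Returns p if team A scored (r_i = +1), else 1 - p.
function score_prob(play, Xa, Xb)
    r, r_prev, Δ_prev = play
    Ia = Xa / (Xa + Xb)                       # team A's strength I_A
    p = Ia - 0.152 * r_prev - 0.0022 * Δ_prev
    return abs(p - (1 - r) / 2)
end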

The real roadblock, however, is actually sampling the $\lambda$ and $X$ values from the pdfs.

Sampling Game Parameters

Since we have access to the pdfs (even their normalizing constants are quite easy to compute using numeric methods), we can employ importance sampling as a brute force method. I am sure that there are fancier MCMC algorithms that could be used, but the unfriendly distribution of the $X$ values made it hard for me to use external libraries like Turing.jl.

Anyhow, for interested readers, the reason we can use importance sampling to compute the expected value of a function $g$ with respect to a pdf $f$ using samples from another pdf $h$ is the following identity:

$$\begin{align}
\underset{X \sim f}{\mathbb{E}}[g(X)] &= \int g(x) f(x) dx \\
&= \int g(x) \frac{f(x)}{h(x)} h(x) dx \\
&= \underset{X \sim h}{\mathbb{E}}\left[\frac{f(X)}{h(X)} g(X)\right]
\end{align}$$

This also tells us that $h$ must be non-zero wherever $f$ is non-zero. When working with empirical estimates, the ratio $\frac{f(x)}{h(x)}$ is referred to as the weight of the sample, for obvious reasons.
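
As a quick sanity check of this identity (a toy example unrelated to the project code), we can estimate $\mathbb{E}[X^2]$ under $f = \mathcal{N}(0,1)$ using samples drawn from a wider proposal $h = \mathcal{N}(0,2)$:

using Distributions, Statistics

f = Normal(0, 1)                 # target density
h = Normal(0, 2)                 # proposal density (non-zero wherever f is)

x = rand(h, 100_000)
w = pdf.(f, x) ./ pdf.(h, x)     # importance weights f(x)/h(x)
println(mean(w .* x .^ 2))       # ≈ 1.0, the true value of E[X²] under f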

So, for our empirical estimates, a good choice for $h$ is the prior distribution of each parameter. The following code shows the implementation of the sampling functions:

function sample_params(game, n)
    # Draw λ proposals from the prior and weight them by posterior(λ) / prior(λ)
    r = rand(defaultRate, n)
    wr = game.params.rate.(r) ./ pdf.(defaultRate, r)

    # Draw strength proposals (an n×2 matrix) and weight them the same way
    s = rand(defaultStrengths, n, 2)
    ws = game.params.strengths.(s[:,1], s[:,2]) ./ (pdf.(defaultStrengths, s[:,1]) .* pdf.(defaultStrengths, s[:,2]))
    w = wr .* ws

    return r, s, w
end

function sample_games(game, n=1000, k=1000)
    results = zeros(n)
    for i in 1:n
        r, s, w = sample_params(game, k)
        sample_results = zeros(k)
        Threads.@threads for j in 1:k
            X = s[j, 1]
            Y = s[j, 2]
            # Weighted outcome of one simulated completion of the game
            sample_results[j] = simulate_game(game, r[j], X, Y) * w[j]
        end
        results[i] = sum(sample_results) / k
    end
    # Average the batch estimates to get the win probability for team A
    return sum(results)/n
end

Simulating Games

The final step is to simulate the games. This is quite easy to do if we are able to pass in all the parameters we need.

function simulate_game(game, lambda, Xa, Xb)
    # Start from the game's current state (or from scratch if no plays yet)
    if length(game.plays) == 0
        t = 0   # elapsed time in seconds
        s = 0   # net score Δ
        r = 0   # sign of the last play
    else
        t = game.plays[end][1]
        s = net_score(game)
        r = sign(game.plays[end][2])
    end

    # Play out the remainder of regulation time (48 minutes = 2880 s)
    while t < 2880
        t += rand(Exponential(1/lambda)) * 30   # λ is measured in plays per 30 s
        if rand() < score_prob((1, r, s), Xa, Xb)
            s += random_play()   # team A scores
            r = 1
        else
            s -= random_play()   # team B scores
            r = -1
        end
    end

    # Map the final net score to a win indicator for team A (a tie counts as 1/2)
    return (sign(s)+1)/2
end

Results

Observations of the Model

The above code snippets allowed me to peer into quite a few example games, and they led me to the following conclusions about how basketball random walks behave:

  1. Strengths Don’t Dominate: Dr. Redner mentioned that it would be quite difficult to correctly predict the strengths of the teams given game data, and seeing as the Bayesian updates hardly change the prior, I’ll have to agree.
  2. The Games are Relatively Uniform: Even though the distribution for $\lambda$ did visually show significant updates throughout a game, the resulting win probabilities hardly shifted, meaning that we will not be able to differentiate between most games.
  3. The Arcsine Law: The biggest factor determining who wins is which team is currently leading. This is in agreement with the arcsine law, which states that the fraction of time a random walk spends on one side of the origin is most likely to be near 0 or 1; in other words, the leading team tends to stay in the lead (a toy illustration follows this list).
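
To see the arcsine law in action, here is a toy simulation (independent of the basketball model): for a simple $\pm 1$ random walk, the fraction of time spent on the positive side tends to be near 0 or 1 rather than near 1/2.

using Statistics

# Fraction of steps a simple ±1 random walk spends strictly above zero
function frac_positive(nsteps)
    s, pos = 0, 0
    for _ in 1:nsteps
        s += rand(Bool) ? 1 : -1
        pos += s > 0
    end
    return pos / nsteps
end

fracs = [frac_positive(1_000) for _ in 1:10_000]
# Roughly 40% of walks spend over 90% (or under 10%) of their time positive
println(mean(f -> f > 0.9 || f < 0.1, fracs))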

Application

Due to the performant nature of the code (thanks, Julia!), it made sense to spin up a website with a simpler version of the model (no Bayesian updates, since they hardly made a difference and it would be a lot of work for the user to input each play). This way, someone betting on a game can make a mathematically-backed decision on how to spend their money!

The website takes in the scores of the teams and the time elapsed to calculate the odds that a team will win (the lower the better).

The application computes the odds for a team to win. In other words, it outputs the reciprocal of that team's win probability, so the closer the number is to 1, the more likely that team is to win, and the larger it is, the less likely.

If a bookie offers a payout multiplier that is higher than the calculated odds, it might be a good idea to take the bet, because we are predicting that the team is more likely to win than the bookie thinks (thus, the bookie overpriced the payout).
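
As a concrete (and entirely hypothetical) example of that decision rule, assuming decimal odds where the payout multiplier includes the returned stake:

model_p     = 0.4               # model's win probability for the team
fair_odds   = 1 / model_p       # 2.5, the "odds" the application would report
bookie_odds = 2.8               # payout multiplier offered by a bookie

# Expected value of a 1-unit bet: win probability × payout, minus the stake
ev = model_p * bookie_odds - 1  # 0.12 > 0, so the bet looks favorable under the model
println("fair odds = ", fair_odds, ", expected value per unit = ", ev)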

I cannot host the application myself, but you can find the code for it–along with the instructions to run it–in the GitHub repository [2].

References

  1. Gabel, A., Redner, S., “Random Walk Picture of Basketball Scoring,” arXiv:1109.2825v1 (2011).
  2. Vattikuti, V., “NBA Odds,” github.com/hasithv/nba-odds (2024).

Maximizing Julia Development with VSCode Extension

By: Justyn Nissly

Re-posted from: https://blog.glcs.io/julia-vs-code-extension

In this article, we will review some of the features that the Julia extension offers! If you don’t have Julia, VS Code, or the Julia extension already installed, look at this article to help you get set up!

Running a Julia File

Now that we have the extension installed and configured, we can start using it. One of the first things we will examine is the most basic feature: running code! To run code, we do the following:

  1. Click the drop down in the top right of the code window
  2. Click “Julia: Execute Code in REPL”
  3. Enjoy the results!

When you hit Shift+Enter, the line your cursor is currently on will run, and the cursor will advance to the next line (Ctrl+Enter runs the line without moving the cursor). This is one way to step through the code line by line.

You are also able to use Ctrl+F5 (Option+F5 for Macs) to run the entire file. You can learn more about running Julia code from the official documentation.

Code Navigation

The information in this section applies to any language and is not exclusive to the Julia extension. The features below are helpful to know and can increase your productivity as you code.

Within VS Code, if you use Ctrl+P, a prompt will open at the top of your editor. In that prompt, you can start typing the name of a file you want to jump to in your project. Once you click the option (or press enter), VS Code will jump directly to that file. You can also use Ctrl+G to jump to a specific line number in the file if you know which line you are looking for. Being able to jump back and forth between files without having to search the file tree will greatly enhance your workflow. Imagine all the time you will save by not searching through file trees!

Using Ctrl+Shift+O (be sure to use the letter O, not the number 0) will allow you to navigate through individual symbols within your program. After using the shortcut above, type : and all your symbols will be grouped by type. You are then able to navigate between them all.

Editing Code

Code navigation is a great skill to develop, especially when given some wonderful legacy code to unravel…we’ve all been there. Being proficient in navigating your code does you no good unless you are also proficient at editing the code.

One way to get going quickly is to use the “rename symbol” shortcut. You can either right click on a symbol and press “rename” or hit F2. When you rename the symbol, it will be renamed everywhere else in the file that it exists. Pretty neat, huh?

Change_Symbol

The Plot Viewer

Up until this point in the article, we have laid the groundwork for working with your code in VS Code. Next, we will look into some of the Julia-specific features that the Julia extension offers, starting with the plot viewer.

The plot viewer is a really handy tool that lets you…well…view plots. We can look at an example to see how it works.

First, we will install the Plots package if it hasn’t been installed already.

julia> using Pkg
julia> Pkg.add("Plots")
# OR
# We can type the "]" key and use the package interface
(@v1.10) pkg>
(@v1.10) pkg> add Plots

After we do that, we can create a plot to visualize.

using Plots

function make_plot()
    # Create a plot
    p = plot()
    θ = range(0, 2π, length=100)
    x = cos.(θ) * 5
    y = sin.(θ) * 5
    plot!(p, x, y, linecolor=:black, linewidth=2, legend=:topright)
    x_left = 1 .+ 0.5 * cos.(θ)
    y_left = 2 .+ 0.5 * sin.(θ)
    plot!(p, x_left, y_left, linecolor=:black, linewidth=2, fillalpha=0.2)
    x_right = 3 .+ 0.5 * cos.(θ)
    y_right = 2 .+ 0.5 * sin.(θ)
    plot!(p, x_right, y_right, linecolor=:black, linewidth=2, fillalpha=0.2)
    θ_arc = range(0, π, length=10)
    x_arc = 2 .+ cos.(θ_arc) * 2
    y_arc = -1 .+ sin.(θ_arc) * 1
    plot!(p, x_arc, y_arc, linecolor=:black, linewidth=2)
    # Adjust plot limits and display the final plot
    xlims!(-6, 6)
    ylims!(-6, 6)
    display(p)
end

# Execute the function to plot
make_plot()

Next we run this by using the keyboard shortcut we learned earlier (Ctrl+Enter), and we can see the result below!

Make_Plot

Pretty cool! Now we can see our charts generated in real time right inside our editor!

The Table Viewer

I don’t know about you, but I never liked having to dump the contents of an array, matrix, or other data structure to the console and try to parse through the data. Lucky for us, we don’t have to do that. The Julia extension allows us to view any Tables.jl-compatible table in the special table viewer. There are two ways to do this.

The first way is by clicking the “View in VS Code” button next to your table in the “Workspace” section.

Table_View_Button

The second way to do this is by running the vscodedisplay(name_of_table) function directly in the REPL.
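
For example, assuming DataFrames.jl is installed (any Tables.jl-compatible type works), running something like the following from the extension's REPL should open the table viewer:

using DataFrames

df = DataFrame(team = ["A", "B"], score = [101, 99])
vscodedisplay(df)   # only available in a REPL started by the Julia extension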

It’s pretty cool if you ask me. Having the ability to view the data is a nice feature, but to make it even better, you can sort the data in the UI by clicking the column headers. You can also copy data by using Ctrl+C just like you would in Excel!

A word of caution: all the table data you see is cached, so any changes you make to the underlying data will not be reflected in the table viewer. To fix that, just re-display the table and you will see your changes.

Debugging

The last topic we will cover is the debugging tools.

There are many things you can do in the VS Code debugger that are not language-specific. We aren’t going to cover all of those features here, so if you want a more in-depth look, check out the official documentation.

Since the only truly bug-free code is no code, we will start by writing some code that we can test and try to catch bugs in.

function do_math(a, b)
    c = a + b
    d = a * b
    c + d
end

function print_things()
    println("First print")
    println("Second print")
    var = do_math(5, 8)
    println("Third print and math: ", var)
    var2 = do_math("Bug", 3)
    println("Fourth print and bug: ", var2)
end

print_things()

We have code to run, so we can run the debugger and see what we get. First, we must switch to the “Run and Debug” tab. We do this by either clicking on the tab (the one with the bug and play button) or by hitting Ctrl+Shift+D.

Once we are there, we will be greeted with a screen like this:

Debug_Screen

From here we can observe the compiled Julia code, our breakpoints, and several other things as the program runs. We will want to run our code through the debugger, and to do that, we can either click the big “Run and Debug” button or hit F5.

Debug_Error_Message

We step through the code a bit, and see some of what the debugger will show us.

Debugger_Annotated

1: The variables passed into the function

2: The variables local to the function

3: The indicator of which line we are currently on

4: A breakpoint indicator

We can set our breakpoints by double clicking next to the line numbers. After setting our breakpoints, we will be able to step through the code. As we step through the code line by line, the variables that get created are populated in the top section and their values are shown. Another neat feature the debugger gives that is not highlighted above but should be noted is the “CALL STACK” section. As the name suggests, this will show us the entire call stack as you step through the code. All of these are most likely things we have seen before in other debuggers, but they are useful nonetheless.

Keyboard Shortcuts

To wrap up, let’s look at a list of keyboard shortcuts for effective Julia extension usage. Note that a command like Alt+J Alt+C means press Alt+J followed by Alt+C.

| Command | Shortcut |
| --- | --- |
| Execute Code in REPL and Move | Shift+Enter |
| Execute Code in REPL | Ctrl+Enter |
| Execute Code Cell in REPL | Alt+Enter |
| Execute Code Cell in REPL and Move | Alt+Shift+Enter |
| Interrupt Execution | Ctrl+C |
| Clear Current Inline Result | Escape |
| Clear Inline Results In Editor | Alt+J Alt+C |
| Select Current Module | Alt+J Alt+M |
| New Julia File | Alt+J Alt+N |
| Start REPL | Alt+J Alt+O |
| Stop REPL | Alt+J Alt+K |
| Restart REPL | Alt+J Alt+R |
| Change Current Environment | Alt+J Alt+E |
| Show Documentation | Alt+J Alt+D |
| Show Plot | Alt+J Alt+P |
| REPLVariables.focus | Alt+J Alt+W |
| Interrupt Execution | Ctrl+Shift+C |
| Browse Back Documentation | Left |
| Browse Forward Documentation | Right |
| Show Previous Plot | Left |
| Show Next Plot | Right |
| Show First Plot | Home |
| Show Last Plot | End |
| Delete Plot | Delete |
| Delete All Plots | Shift+Delete |

Summary

We reviewed some of the basics of the Julia VS Code extension. We looked at running a Julia file, the basics of code navigation and editing, the usefulness of the plot and table viewers, and some basic debugging features. This was only an overview, and there is much more that the extension has to offer! If you would like to do a deeper dive into the Julia VS Code extension, visit the official documentation or their GitHub.

If you would like to learn more about Julia so you can fully take advantage of the extension’s features, check out our Julia Basics series!
