Category Archives: Julia

Coin-tossing game: a numerical approach

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/06/14/probability3.html

Introduction

Today I decided to follow up on my last post solving a coin-tossing game.
This time, instead of simulation I want to use numerical approach
(and so probably a bit harder).

The post was written under Julia 1.10.1 and Graphs.jl 1.11.0.

The problem

Let me describe the setting of a game first (it is an extension of this post).

Assume Alice and Bob toss a fair coin. In each toss head (h) or tail (t) can show up with equal probability.

Alice and Bob choose some sequence of h and t they are waiting for. We assume that the chosen sequences have the same length and are different.
For example Alice could choose htht and Bob tthh.

The winner of the game is the person who saw their sequence first.

The question we ask if for a fixed sequence length n we can get cycles, that is, for example, that sequence s1 beats s2, s2 beats s3, and s3 beats s1.

To answer this question we will represent the game as a Markov process.

Step 1: non-terminating Markov chain

First we create a transition matrix of a Markov chain tracking current n element sequence in the game we consider.
Here is the code:

function markov(size::Integer)
    idx2states = vec(join.(Iterators.product([['h', 't'] for _ in 1:size]...)))
    states2idx = Dict(idx2states .=> eachindex(idx2states))
    P = zeros(2^size, 2^size)
    for state in idx2states
        for next in ("h", "t")
            nextstate = chop(state, head=1, tail=0) * next
            P[states2idx[state], states2idx[nextstate]] = 0.5
        end
    end
    return P, idx2states, states2idx
end

What we do in it is as follows:

  1. idx2states vector keeps track of all h and t sequences that have length n (i.e. it is a mapping from state number to state signature).
  2. states2idx is an inverse mapping – from state signature to state number.
  3. P is transition matrix of our chain. Note that from the sequence ab... (where all elements are h or t) we go to sequence b...h or b...t with equal probability.

Step 2: terminating Markov chain

We now need to create a function that is aware of Alice’s and Bob’s chosen sequences and make them terminating. We want to compute the probabilities of ending up in Alice’s and Bob’s state.
Here is the code:

function game(P, states2idx, alice, bob)
    P_game = copy(P)
    alice_idx, bob_idx = states2idx[alice], states2idx[bob]
    P_game[alice_idx, :] .= 0.0
    P_game[alice_idx, alice_idx] = 1.0
    P_game[bob_idx, :] .= 0.0
    P_game[bob_idx, bob_idx] = 1.0
    n = length(states2idx)
    terminal = fill(1 / n, 1, n) * P_game^(2^30)
    return terminal[states2idx[alice]], terminal[states2idx[bob]]
end

Note that we first update the P_game matrix to make alice_idx and bob_idx states terminating. Then, since I was lazy, we assume we make 2^30 steps of the process (fortunately in Julia it is fast).
Observe that initially all states are equally probably, so terminal matrix keeps information about long term probabilities of staying in all possible states.
We extract the probabilities of Alice’s and Bob’s states and return them.

Step 3: looking for cycles

We are now ready for a final move. We can consider all possible preferred sequences of Alice and Bob and create a graph that keeps track of which sequences beat other sequences:

using Graphs

function analyze_game(size::Integer, details::Bool=true)
    P, idx2states, states2idx = markov(size)
    g = SimpleDiGraph(length(states2idx))
    details && println("\nWinners:")
    for alice in idx2states, bob in idx2states
        alice > bob || continue
        alice_win, bob_win = game(P, states2idx, alice, bob)
        if alice_win > 0.51
            winner = "alice"
            add_edge!(g, states2idx[alice], states2idx[bob])
        elseif bob_win > 0.51
            winner = "bob"
            add_edge!(g, states2idx[bob], states2idx[alice])
        else
            winner = "tie (or close :))"
        end
        details && println(alice, " vs ", bob, ": ", winner)
    end
    cycles = simplecycles(g)
    if !isempty(cycles)
        min_len = minimum(length, cycles)
        filter!(x -> length(x) == min_len, cycles)
    end
    println("\nCycles:")
    for cycle in cycles
        println(idx2states[cycle])
    end
end

Note that I used 0.51 threshold for detection of dominance of one state over the other. We could do it better, but in practice for small n it is enough and working this way is simpler numerically.
What this threshold means is that we want to be “sure” that one player beats the other.
In our code we do two things:

  • optionally print the information which state beats which state;
  • print information about cycles found in beating patterns (we keep only cycles of the shortest length).

Let us check the code. Start with sequences of length 2:

julia> analyze_game(2)

Winners:
th vs hh: alice
th vs ht: tie (or close :))
ht vs hh: tie (or close :))
tt vs hh: tie (or close :))
tt vs th: tie (or close :))
tt vs ht: bob

Cycles:

We see that only th beats hh and ht beats tt (this is a symmetric case). We did not find any cycles.

Let us check 3:

julia> analyze_game(3)

Winners:
thh vs hhh: alice
thh vs hth: tie (or close :))
thh vs hht: alice
thh vs htt: tie (or close :))
hth vs hhh: alice
hth vs hht: bob
tth vs hhh: alice
tth vs thh: alice
tth vs hth: alice
tth vs hht: tie (or close :))
tth vs tht: alice
tth vs htt: bob
hht vs hhh: tie (or close :))
tht vs hhh: alice
tht vs thh: tie (or close :))
tht vs hth: tie (or close :))
tht vs hht: bob
tht vs htt: tie (or close :))
htt vs hhh: alice
htt vs hth: tie (or close :))
htt vs hht: bob
ttt vs hhh: tie (or close :))
ttt vs thh: bob
ttt vs hth: bob
ttt vs tth: tie (or close :))
ttt vs hht: bob
ttt vs tht: bob
ttt vs htt: bob

Cycles:
["thh", "hht", "htt", "tth"]

We now have the cycle. The shortest cycle has length 4 and it is unique. Let us see what happens for patterns of length 4 (I suppress printing the details as there are too many of them):

julia> analyze_game(4, false)

Cycles:
["thhh", "hhth", "hthh"]
["thhh", "hhtt", "ttth"]
["hhth", "hthh", "thht"]
["hhth", "thtt", "tthh"]
["hthh", "hhtt", "thth"]
["hthh", "hhtt", "ttht"]
["hthh", "hhtt", "ttth"]
["thht", "hhtt", "ttth"]
["htht", "thtt", "tthh"]
["thtt", "tthh", "hhht"]
["thtt", "htth", "ttht"]
["thtt", "httt", "ttht"]
["tthh", "hhht", "htth"]
["tthh", "hhht", "httt"]

In this case we have many cycles that are even shorter as they have length three.

Conclusions

The conclusion is that the game is slightly surprising. We can have cycles of dominance between sequences. I hope you liked this example. Happy summer!

JuliaHub Air: Secure, High-Performance Computing for Government Innovation

By: Jasmine Chokshi

Re-posted from: https://info.juliahub.com/blog/high-performance-computing-for-government-innovation

Government agencies rely heavily on High-Performance Computing (HPC) for a wide range of critical tasks, including scientific research, simulations, data analysis, and decision support. However, deploying and managing HPC solutions in secure, air-gapped environments presents unique challenges related to data protection, compliance, and restricted network access. 

A multi-language overview on how to test your research project code

By: julia | Victor Boussange

Re-posted from: https://vboussange.github.io/post/testing-your-research-code/

Code testing is essential to identify and fix potential issues, to maintain sanity over the course of the development of the project and quickly identify bugs, and to ensure the reliability and sanity of your experiment overtime.

This post is part of a series of posts on best practices for managing research project code. Much of this material was developed in collaboration with Mauro Werder as part of the Course On Reproducible Research, Data Pipelines, and Scientific Computing (CORDS). If you have experiences to share or spot any errors, please reach out!

Content

Unit testing

Unit testing involves testing a unit of code, typically a single function, to ensure its correctness. Here are some key aspects to consider:

  • Test for correctness with typical inputs.
  • Test edge cases.
  • Test for errors with bad inputs.

Some developers start writing unit tests before writing the actual function, a practice known as Test-Driven Development (TDD). Define upstream on a piece of paper the behavior of the function, write corresponding tests, and when all tests pass, you are done. This philosophy ensures that you have a well-tested implementation, and avoids unnecessary feature development, forcing you to focus only on what is needed. While TDD is a powerful idea, it can be challenging to follow strictly.

A good idea is to write an additional test when you find a bug in your code.

Lightweight formal tests with assert

The simplest form of unit testing involves some sort of assert statement.

Python

def fib(x):
    if x <= 2:
        return 1
    else:
        return fib(x - 1) + fib(x - 2)

assert fib(0) == 0
assert fib(1) == 1
assert fib(2) == 1

Julia

@assert 1 == 0

When one test is broken, you’ll get an error for the corresponding test, which you’ll need to fix to check the following tests.

In Julia or Python, you could directly place the assert statement after your functions. This way, tests are run each time you execute the script. Here is nother pythonic approach, which can be used to decouple the test

def fib(x):
    if x <= 2:
        return 1
    else:
        return fib(x - 1) + fib(x - 2)

if __name__ == '__main__':
    assert fib(0) == 0
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(6) == 8
    assert fib(40) == 102334155
    print("Tests passed")

Consider using np.isclose, np.testing.assert_allclose (Python) or approx (Julia) for floating point comparisons.

Testing with a test suite

Once you have many tests, it makes sense to group them into a test suite and run them with a test runner. This approach will run all tests, even though some are broken, and retrieve and informative statements on those tests that passed, and those that did not. As you’ll see, it also allows to automatically run the test at each commit, with continuous integration.

Python

Two main frameworks for unit tests in Python are pytest and unittest, with pytest being more popular.

Example using pytest:

from src.fib import fib
import pytest

def test_typical():
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(6) == 8
    assert fib(40) == 102334155

def test_edge_case():
    assert fib(0) == 0

def test_raises():
    with pytest.raises(NotImplementedError):
        fib(-1)

    with pytest.raises(NotImplementedError):
        fib(1.5)

Run the tests with:

pytest test_fib.py

Julia

Built in module Test, relying on the macro @test. Consider grouping your tests with

julia> @testset "trigonometric identities" begin
          θ = 2/3*π
          @test sin(-θ)  -sin(θ)
          @test cos(-θ)  cos(θ)
          @test sin(2θ)  2*sin(θ)*cos(θ)
          @test cos(2θ)  cos(θ)^2 - sin(θ)^2
      end;

This will nicely output

Test Summary:            | Pass  Total  Time
trigonometric identities |    4      4  0.2s

which comes handy for grouping tests applied to a single function or concept. Test functions may require additional packages to your minimum working environment specified at your package root folder. An additional virtual environment may be specified for tests! To develop my tests interactively, I like using TestEnv. Unfortunately, using Pkg.activate in tests would not work there, you. You need TestEnv to have access to your package functions;

In your package environment,

using TestEnv
TestEnv.activate()

will activate the test environment.

To reactivate the normal environment,

Pkg.activate(".")

Here is a nice thread to read more on that.

R

testhat

Testing non-pure functions and classes

For nondeterministic functions, provide the random seed or variables needed by the function as arguments to make them deterministic.
For stateful functions, test postconditions to ensure the internal state changes as expected.
For functions with I/O side effects, create mock files to verify proper input reading and expected output.

Python

def file_to_upper(in_file, out_file):
    fout = open(out_file, 'w')
    with open(in_file, 'r') as f:
        for line in f:
            fout.write(line.upper())
    fout.close()

import tempfile
import os

def test_file_to_upper():
    in_file = tempfile.NamedTemporaryFile(delete=False, mode='w')
    out_file = tempfile.NamedTemporaryFile(delete=False)
    out_file.close()
    in_file.write("test123\nthetest")
    in_file.close()
    file_to_upper(in_file.name, out_file.name)
    with open(out_file.name, 'r') as f:
        data = f.read()
        assert data == "TEST123\nTHETEST"
    os.unlink(in_file.name)
    os.unlink(out_file.name)

Continuous integration

Automated testing on local machines is useful, but you can do better with continuous integration (CI). In fact, CI is essential for projects involving multiple developers and various target platforms. CI consists in running tests whenever changes are committed.
CI can also be used to automatically build documentation, check for code coverage, and more. GitHub Actions is a popular CI tool available within GitHub.
CI is based on .yaml files, which specify the environment to run the script. You can build matrices to test across different environments (e.g. Linux, Windows and MacOS, with different versino of python or Julia). Jobs will be created that run our tests for each permutation of these.

An example CI.yaml file for Julia

name: Run tests

on: push: branches: - master - main pull_request:

permissions: actions: write contents: read

jobs: test: runs-on: ${{ matrix.os }} strategy: matrix: julia-version: [‘1.6’, ‘1’, ’nightly’] julia-arch: [x64, x86] os: [ubuntu-latest, windows-latest, macOS-latest] exclude: - os: macOS-latest julia-arch: x86

steps:
  - uses: actions/checkout@v4
  - uses: julia-actions/setup-julia@v1
    with:
      version: ${{ matrix.julia-version }}
      arch: ${{ matrix.julia-arch }}
  - uses: julia-actions/cache@v1
  - uses: julia-actions/julia-buildpkg@v1
  - uses: julia-actions/julia-runtest@v1

An example CI.yaml file for Python

This action installs the conda environment called glacier-mass-balance, specified in the environment.yml file.
It then runs pytest, supposing that you have a test/ folder where your functions are located. First try whether pytest works locally. Do not forget to have pytest in your dependencies.


name: Run tests
on: push

jobs:
  miniconda:
    name: Miniconda ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
        matrix:
            os: ["ubuntu-latest"]
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          environment-file: environment.yml
          activate-environment: glacier-mass-balance
          auto-activate-base: false
      - name: Run pytest
        shell: bash -l {0}
        run: | 
          pytest


Cool tip

You can include a cool badge to show visually whether your tests are passing or failing, like so

Tests

You can get the code for this badge by going on your github repo, then Actions. Click on the test action, then on top right click on the ... and `Create status badge```.

Cool right?

Other types of tests

  • Docstring tests: Unit tests embedded in docstrings.
  • Integration tests: Test whether multiple functions work correctly together.
  • Regression tests: Ensure your code produces the same outputs as previous versions.

Resources

Take-home messages

  • Systematically implementing testing allows you to ensure the sanity of your code
  • The overhead cost of testing is usually well balanced by the reduced time spent downstream in identifying bugs