Coin-tossing game: a numerical approach

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/06/14/probability3.html

Introduction

Today I decided to follow up on my last post solving a coin-tossing game.
This time, instead of simulation I want to use numerical approach
(and so probably a bit harder).

The post was written under Julia 1.10.1 and Graphs.jl 1.11.0.

The problem

Let me describe the setting of a game first (it is an extension of this post).

Assume Alice and Bob toss a fair coin. In each toss head (h) or tail (t) can show up with equal probability.

Alice and Bob choose some sequence of h and t they are waiting for. We assume that the chosen sequences have the same length and are different.
For example Alice could choose htht and Bob tthh.

The winner of the game is the person who saw their sequence first.

The question we ask if for a fixed sequence length n we can get cycles, that is, for example, that sequence s1 beats s2, s2 beats s3, and s3 beats s1.

To answer this question we will represent the game as a Markov process.

Step 1: non-terminating Markov chain

First we create a transition matrix of a Markov chain tracking current n element sequence in the game we consider.
Here is the code:

function markov(size::Integer)
    idx2states = vec(join.(Iterators.product([['h', 't'] for _ in 1:size]...)))
    states2idx = Dict(idx2states .=> eachindex(idx2states))
    P = zeros(2^size, 2^size)
    for state in idx2states
        for next in ("h", "t")
            nextstate = chop(state, head=1, tail=0) * next
            P[states2idx[state], states2idx[nextstate]] = 0.5
        end
    end
    return P, idx2states, states2idx
end

What we do in it is as follows:

  1. idx2states vector keeps track of all h and t sequences that have length n (i.e. it is a mapping from state number to state signature).
  2. states2idx is an inverse mapping – from state signature to state number.
  3. P is transition matrix of our chain. Note that from the sequence ab... (where all elements are h or t) we go to sequence b...h or b...t with equal probability.

Step 2: terminating Markov chain

We now need to create a function that is aware of Alice’s and Bob’s chosen sequences and make them terminating. We want to compute the probabilities of ending up in Alice’s and Bob’s state.
Here is the code:

function game(P, states2idx, alice, bob)
    P_game = copy(P)
    alice_idx, bob_idx = states2idx[alice], states2idx[bob]
    P_game[alice_idx, :] .= 0.0
    P_game[alice_idx, alice_idx] = 1.0
    P_game[bob_idx, :] .= 0.0
    P_game[bob_idx, bob_idx] = 1.0
    n = length(states2idx)
    terminal = fill(1 / n, 1, n) * P_game^(2^30)
    return terminal[states2idx[alice]], terminal[states2idx[bob]]
end

Note that we first update the P_game matrix to make alice_idx and bob_idx states terminating. Then, since I was lazy, we assume we make 2^30 steps of the process (fortunately in Julia it is fast).
Observe that initially all states are equally probably, so terminal matrix keeps information about long term probabilities of staying in all possible states.
We extract the probabilities of Alice’s and Bob’s states and return them.

Step 3: looking for cycles

We are now ready for a final move. We can consider all possible preferred sequences of Alice and Bob and create a graph that keeps track of which sequences beat other sequences:

using Graphs

function analyze_game(size::Integer, details::Bool=true)
    P, idx2states, states2idx = markov(size)
    g = SimpleDiGraph(length(states2idx))
    details && println("\nWinners:")
    for alice in idx2states, bob in idx2states
        alice > bob || continue
        alice_win, bob_win = game(P, states2idx, alice, bob)
        if alice_win > 0.51
            winner = "alice"
            add_edge!(g, states2idx[alice], states2idx[bob])
        elseif bob_win > 0.51
            winner = "bob"
            add_edge!(g, states2idx[bob], states2idx[alice])
        else
            winner = "tie (or close :))"
        end
        details && println(alice, " vs ", bob, ": ", winner)
    end
    cycles = simplecycles(g)
    if !isempty(cycles)
        min_len = minimum(length, cycles)
        filter!(x -> length(x) == min_len, cycles)
    end
    println("\nCycles:")
    for cycle in cycles
        println(idx2states[cycle])
    end
end

Note that I used 0.51 threshold for detection of dominance of one state over the other. We could do it better, but in practice for small n it is enough and working this way is simpler numerically.
What this threshold means is that we want to be “sure” that one player beats the other.
In our code we do two things:

  • optionally print the information which state beats which state;
  • print information about cycles found in beating patterns (we keep only cycles of the shortest length).

Let us check the code. Start with sequences of length 2:

julia> analyze_game(2)

Winners:
th vs hh: alice
th vs ht: tie (or close :))
ht vs hh: tie (or close :))
tt vs hh: tie (or close :))
tt vs th: tie (or close :))
tt vs ht: bob

Cycles:

We see that only th beats hh and ht beats tt (this is a symmetric case). We did not find any cycles.

Let us check 3:

julia> analyze_game(3)

Winners:
thh vs hhh: alice
thh vs hth: tie (or close :))
thh vs hht: alice
thh vs htt: tie (or close :))
hth vs hhh: alice
hth vs hht: bob
tth vs hhh: alice
tth vs thh: alice
tth vs hth: alice
tth vs hht: tie (or close :))
tth vs tht: alice
tth vs htt: bob
hht vs hhh: tie (or close :))
tht vs hhh: alice
tht vs thh: tie (or close :))
tht vs hth: tie (or close :))
tht vs hht: bob
tht vs htt: tie (or close :))
htt vs hhh: alice
htt vs hth: tie (or close :))
htt vs hht: bob
ttt vs hhh: tie (or close :))
ttt vs thh: bob
ttt vs hth: bob
ttt vs tth: tie (or close :))
ttt vs hht: bob
ttt vs tht: bob
ttt vs htt: bob

Cycles:
["thh", "hht", "htt", "tth"]

We now have the cycle. The shortest cycle has length 4 and it is unique. Let us see what happens for patterns of length 4 (I suppress printing the details as there are too many of them):

julia> analyze_game(4, false)

Cycles:
["thhh", "hhth", "hthh"]
["thhh", "hhtt", "ttth"]
["hhth", "hthh", "thht"]
["hhth", "thtt", "tthh"]
["hthh", "hhtt", "thth"]
["hthh", "hhtt", "ttht"]
["hthh", "hhtt", "ttth"]
["thht", "hhtt", "ttth"]
["htht", "thtt", "tthh"]
["thtt", "tthh", "hhht"]
["thtt", "htth", "ttht"]
["thtt", "httt", "ttht"]
["tthh", "hhht", "htth"]
["tthh", "hhht", "httt"]

In this case we have many cycles that are even shorter as they have length three.

Conclusions

The conclusion is that the game is slightly surprising. We can have cycles of dominance between sequences. I hope you liked this example. Happy summer!

JuliaHub Air: Secure, High-Performance Computing for Government Innovation

By: Jasmine Chokshi

Re-posted from: https://info.juliahub.com/blog/high-performance-computing-for-government-innovation

Government agencies rely heavily on High-Performance Computing (HPC) for a wide range of critical tasks, including scientific research, simulations, data analysis, and decision support. However, deploying and managing HPC solutions in secure, air-gapped environments presents unique challenges related to data protection, compliance, and restricted network access. 

A multi-language overview on how to document your research project code

By: julia | Victor Boussange

Re-posted from: https://vboussange.github.io/post/documenting-your-research-code/

Documentation serves multiple purposes and may be useful for various audiences, including your future self, collaborators, users and contributors – should you aim at packaging some of your code into a general-purpose library.

This post is part of a series of posts on best practices for managing research project code. Much of this material was developed in collaboration with Mauro Werder as part of the Course On Reproducible Research, Data Pipelines, and Scientific Computing (CORDS). If you have experiences to share or spot any errors, please reach out!

Content

Style guides

The best documentation starts by writing self-explanatory code with good conventions.

Correctly naming your variables enhances code clarity.

There are only two hard things in Computer Science: cache invalidation and naming things.
Martin Fowler

Instead of using generic names like l for a list:

for l in L:
    pass

Use descriptive names like

for line in lines:
    pass

Using style guides for your chosen language ensures consistency and readability in your code. Here are some resources:

Do not hesitate to refactor your code regularly and remove dead code to prevent confusion for yourself and others.

Comments

In-line comments should be used sparingly. Aim to write self-explanatory code instead. Use comments to provide context not apparent from the code itself, such as references to papers, Stack Overflow topics, or TODOs.

Use single-line comments for brief explanations and multi-line comments for more detailed information.

julia

#=
This is a multi-line
comment
=#

python

"""
This is a multi-line
comment
"""

Tip: use vscode rewrap comment/text to nicely format multiline comments.

On top of nicely formatting your code and appending comments where necessary, a literal documentation greatly facilitates the maintenance, understandability and reproducibility of your code.

Literal documentation

Literal documentation helps users understand your tool and get started with it.

README

A README file is essential for any research repository. It is displayed on under the code structure when accessing a GitHub repo.
It should contain:

  • (Badges showing tests, and a nice logo)
  • A one-sentence description of your project
  • A longer description
  • An overview of the repository structure and files
  • Getting started or Examples section
  • An Installation section with dependencies
  • Citation/Reference section
  • (A link to the documentation)
  • (A How to contribute section)
  • An Acknowledgement section
  • License section

Some examples:

API documentation / doc strings

API documentation describes the usage of functions, classes (types) and modules (packages). Parsers usually support markdown styles, which also enhances raw readability for humans. In short, markdown styles consists in using

Doc strings in python live inside the function

def best_function_ever(a_param, another_parameter):
    """
    this is the docstring
    """
    # do some stuff

But above the function or type definition in Julia

"""
this is the docstring
"""
function best_function_ever(a_param, another_parameter)
# do some stuff
end

"Tell whether there are too foo items in the array."
foo(xs::Array) = ...

Best practice for docstrings include

  • (in Julia: insert the signature of your function )
  • Short description
  • Arguments (Args, Input,…)
  • Returns
  • Examples

Several flavours may be used, even for a single language.

python
3 Different documentation style flavours

Google style is easier to read for humans

def add(a, b):
    """
    Add two integers.

    This function takes two integer arguments and returns their sum.

    # Parameters:
    a: The first integer to be added.
    b: The second integer to be added.

    # Return:
    int: The sum of the two integers.

    # Raise:
    TypeError: If either of the arguments is not an integer.

    Examples:
    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    >>> add('a', 1)
    Traceback (most recent call last):
        ...
    TypeError: Both arguments must be integers.
    """
    if not isinstance(a, int) or not isinstance(b, int):
        raise TypeError("Both arguments must be integers")
    return a + b

julia

"""
    add(a, b)

Adds two integers.

This function takes two integer arguments and returns their sum.

# Arguments
- `a`: The first integer to be added.
- `b`: The second integer to be added.

# Returns
- The sum of the two integers.

# Examples

```julia-repl
julia> add(2, 3)
5

julia> add(-1, 1)
0
```
"""
function add(a, b)
    return a + b
end

You may use tools like Documenter.jl or Sphinx to automatically render your API documentation on a website. Github actions can automatize the process of building the documentation for you, similarly to how it can automate testing.

Docstrings may be accompanied by typing.

Type annotations

Typing refers to the specification of variable types and function return types within a programming language. It helps define what kind of data a function or variable can handle, ensuring type safety and reducing runtime errors. It

  • clearly indicates the expected input and output types, making the code easier to understand.
  • helps catch type-related errors early in the development process.
  • encourages consistent usage of types throughout the codebase.

python

def add(a: int, b: int) -> int:
    return a + b

In Python, using typing does not enforce type checking at runtime! You may use decorators to enforce it.

julia

function add(a::Int, b::Int)
    return a + b
end

In Julia, types are enforced at runtime! Type annotations help the Julia compiler optimize performance by making type inferences easier.

Consider raising errors

  • We do not like reading manuals. But we are foreced to read error messages. Use assertions and error messages to handle unexpected inputs and guide users.

python

  • assert: When an assert doesn’t pass, it raises an AssertionError. You can optionally add an error message at the end.
  • NotImplementedError, ValueError, NameError: Commonly used, generic errors you can raise. I probably overuse NotImplementedError compared to other types.
def convolve_vectors(vec1, vec2):
    if not isinstance(vec1, list) or not isinstance(vec2, list):
        raise ValueError("Both inputs must be lists.")
    # convolve the vectors

Tutorials

Create tutorial Jupyter notebooks or vignettes in R to demonstrate the usage of your code. Those can be placed in a folder examples or tutorials. Format them as e.g.

  • vignettes in R,
  • or using Jupyter notebooks, which are the perfect format for tutorials

Accessing documentation

julia

?cos
?@time
?r""

python

help(myfun)

But e.g. VSCode can be also quite helpful, and this works also with your own code!

Doc testing

Doc testing, or doctest, allows you to test your code by running examples embedded in the documentation (docstrings). It compares the output of the examples with the expected results given in the docstrings, ensuring the code works as documented.

Why doc testing?

  • Ensures that the code examples in your documentation are accurate and up-to-date.
  • Simple to write and understand, making it accessible for both writing and reading tests.
  • Promotes writing comprehensive docstrings which enhance code readability and maintainability.

Python

def add(a, b):
    """
    Adds two numbers.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b

To run the test:

python -m doctest your_module.py

or from within a script

if __name__ == "__main__":
    import doctest
    doctest.testmod()

julia
Available through Documenter.jl

"""
Adds two numbers.

```jldoctest
julia> add(2, 3)
5

julia> add(-1, 1)
0
```
"""

function add(a, b)
    return a + b
end

Useful packages to help you write and lint your documentation

  • Better Comments

  • Automatic doc string generation

  • Python test explorer

More resources

Take home messages

  • Good documentation helps maintain the long-term memory of a project.
  • Refactor code to reduce complexity instead of documenting tricky code.
  • Writing unit tests is often more productive than extensive documentation.
  • Types of documentation include literal, API, and tutorial/example documentation.
  • Literal documentation explains the big picture and setup.
  • API documentation lives in docstrings and explains function usage.
  • Examples connect the details to common tasks.
  • Consider using tools like ChatGPT to assist with documenting your functions.