A simple coin-tossing game

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/05/24/probability.html

Introduction

I have been writing my blog for over 4 years now (without missing a single week).
My first post was on May 10, 2020, you can find it here.

There is a small change in how I distribute my content. Starting from last week I made the repository of my blog public,
so if you find any mistake please do not hesitate to open a Pull Request here.

To celebrate this I decided to go back to my favorite topic – mathematical puzzles.
Today I use a classic coin-tossing game example

The post was written under Julia 1.10.1 and StatsBase.jl 0.34.4, and FreqTables.jl 0.4.6, BenchmarkTools.jl 1.5.0.

The problem

Assume Alice and Bob toss a fair coin. Alice wins if after tossing a head (H) tail (T) is tossed, that is we see a HT sequence.
Bob wins if two consecutive heads are tossed, that is we see a HH sequence.

The questions are:

  • Who is more likely to win this game?
  • If only Alice played, how long, on the average, would she wait till HT was tossed?
  • If only Bob played, how long, on the average, would she wait till HH was tossed?

Let us try to answer these questions using Julia.

Both players play

This code simulates the situation when Alice and Bob play together:

function both()
    a = rand(('H', 'T'))
    while true
        b = rand(('H', 'T'))
        if a == 'T'
            a = b
        else
            return b == 'H' ? "Bob" : "Alice"
        end
    end
end

The both function returns "Bob" if Bob wins, and "Alice" otherwise.
From the code it should be already clear that both players have the same probability of winning.
The only way to terminate the simulation is return b == 'H' ? "Bob" : "Alice" and this condition is symmetric
with respect to Alice and Bob. Let us confirm this by running a simulation:

julia> using FreqTables, Random

julia> Random.seed!(1234);

julia> freqtable([both() for _ in 1:100_000_000])
2-element Named Vector{Int64}
Dim1  │
──────┼─────────
A     │ 50000012
B     │ 49999988

Indeed, the number of times Alice and Bob win seem to be the same.

Alice’s waiting time

Now let us check how long, on the average, Alice has to wait to see the HT sequence. Here is Alice’s simulator:

function alice()
    a = rand(('H', 'T'))
    i = 1
    while true
        b = rand(('H', 'T'))
        i += 1
        a == 'H' && b == 'T' && return i
        a = b
    end
end

Let us check it:

julia> using StatsBase

julia> describe([alice() for _ in 1:100_000_000])
Summary Stats:
Length:         100000000
Missing Count:  0
Mean:           3.999890
Std. Deviation: 2.000032
Minimum:        2.000000
1st Quartile:   2.000000
Median:         3.000000
3rd Quartile:   5.000000
Maximum:        31.000000
Type:           Int64

So it seems that, in expectation, Alice finishes her game in 4 tosses.
Can we expect the same for Ben (as we remember – if they play together they have the same chances of finishing first)?
Let us see.

Alice’s waiting time

Now let us check how long, on the average, Bob has to wait to see the HH sequence. Here is Bob’s simulator:

function bob()
    a = rand(('H', 'T'))
    i = 1
    while true
        b = rand(('H', 'T'))
        i += 1
        a == 'H' && b == 'H' && return i
        a = b
    end
end

Let us check it:

julia> describe([bob() for _ in 1:100_000_000])
Summary Stats:
Length:         100000000
Missing Count:  0
Mean:           5.999915
Std. Deviation: 4.690177
Minimum:        2.000000
1st Quartile:   2.000000
Median:         5.000000
3rd Quartile:   8.000000
Maximum:        87.000000
Type:           Int64

To our surprise, Bob needs 6 coin tosses, on the average, to see HH.

What is the reason of this difference? Assume we have just tossed H. Start with Bob. If we hit H we finish. If we hit T we then need to wait till we see H again to be able to consider finishing.
However, if we are Alice if we hit T we finish, but if we hit H we do not have to wait for anything – we are already in a state that gives us a chance to finish the game in the next step.

Conclusions

The difference between joint games and separate games is a bit surprising and I hope you found it interesting if you have not seen this puzzle before.
Today I have approached this problem using simulation. However, it is easy to write down a Markov chain representation of all three scenarios and solve them analytically.
I encourage you to try doing this exercise.

PS.

In the code I use the rand(('H', 'T')) form to generate randomness. It is much faster than e.g. writing rand(["H", "T"]) (which would be a first instinct), for two reasons:

  • using Char instead of String is a more lightweight option;
  • using Tuple instead of Vector avoids allocations.

Let us see a comparison of timing (I cut out the histograms from the output):

julia> using BenchmarkTools

julia> @benchmark rand(('H', 'T'))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.900 ns … 233.700 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.500 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.316 ns ±   2.772 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rand(["H", "T"])
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  15.816 ns …  2.086 μs  ┊ GC (min … max): 0.00% … 96.30%
 Time  (median):     19.019 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.041 ns ± 60.335 ns  ┊ GC (mean ± σ):  9.44% ±  3.63%

 Memory estimate: 64 bytes, allocs estimate: 1.

In this case we could also just use rand(Bool) (as the coin is fair and has only two states):

julia> @benchmark rand(Bool)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.700 ns … 131.000 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.200 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.218 ns ±   2.112 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 Memory estimate: 0 bytes, allocs estimate: 0.

but as you can see rand(('H', 'T')) has a similar speed and leads to a much more readable code.

oneAPI.jl 1.5: Ponte Vecchio support and oneMKL improvements

By: Tim Besard

Re-posted from: https://juliagpu.org/post/2024-05-24-oneapi_1.5/index.html

oneAPI.jl v1.5 is a significant release that brings many new features, from extended hardware support to greatly improved wrappers of the oneMLK math library.

Intel Ponte Vecchio

In oneAPI.jl v1.5 we introduce support for the Intel Ponte Vecchio (PVC) architecture, which empowers the Xe HPC GPUs as found in the Aurora supercomputer:

julia> oneAPI.versioninfo()
Binary dependencies:
- NEO: 24.13.29138+0
- libigc: 1.0.16510+0
- gmmlib: 22.3.18+0
- SPIRV_LLVM_Translator_unified: 0.4.0+0
- SPIRV_Tools: 2023.2.0+0Toolchain:
- Julia: 1.10.3
- LLVM: 15.0.71 driver:
- 00000000-0000-0000-17d2-6b1e010371d2 (v1.3.29138, API v1.3.0)16 devices:
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550
- Intel(R) Data Center GPU Max 1550

Apart from a handful of MKL-related issues, oneAPI.jl is fully functional on PVC, and passes all tests.

oneMLK wrappers

Thanks to the work of @amontoison, oneAPI.jl now provides greatly improved wrappers of the oneMKL library. This includes support for:

  • LAPACK: geqrf(_batched), orgqr(_batched), ormqr, potrf(_batched), potrs(_batched), getrf(_batched), getri(_batched), gebrd, gesvd, syevd, heevd, sygvd, hegvd

  • Sparse arrays: sparse_gemm, sparse_gemv, sparse_symv, sparse_trmv, sparse_trsv, sparse_optimize_gemv, sparse_optimize_trsv

Where possible, these functions are integrated with standard library interfaces, e.g., making it possible to simply call eigen, or to multiply two oneSparseMatrixCSRs.

Minor changes

There have of course been many other changes and improvements in oneAPI.jl v1.5. For a full list, please refer to the release notes, but some highlights include:

  • a new launch configuration heuristic that should generally improve performance;

  • broadcast now preserves the buffer type (host, device, or shared);

  • support for very large arrays that exceed the default device memory limit;

  • several toolchain bumps, with v1.5 using oneAPI 2024.1.0 with driver 24.13.29138.7;

  • minimal support for native Windows (next to WSL, which is fully supported).

DataFrames.jl: avoiding compilation

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/05/17/compilation.html

Introduction

Today I want to go back to a topic of performance considerations of using anonymous functions in combination with DataFrames.jl.
I have written about it in the past, but it is an issue that new users often ask about.
In the post I will explain you the problem, its causes, and how to avoid it.

The post was written under Julia 1.10.1 and DataFrames.jl 1.6.1.

Performance issues caused by anonymous functions

Consider the following sequence of DataFrames.jl operations (I am suppressing the output of the operations with ; as it is irrelevant):

julia> using DataFrames

julia> df = DataFrame(x=1:3);

julia> @time select(df, :x => (x -> 2 * x) => :x2);
  0.077134 seconds (73.33 k allocations: 5.010 MiB, 99.28% compilation time)

julia> @time select(df, :x => (x -> 2 * x) => :x2);
  0.013731 seconds (6.30 k allocations: 450.148 KiB, 98.15% compilation time)

julia> @time subset(df, :x => ByRow(x -> x > 1.5));
  0.094046 seconds (91.86 k allocations: 6.219 MiB, 99.29% compilation time)

julia> @time subset(df, :x => ByRow(x -> x > 1.5));
  0.086597 seconds (43.05 k allocations: 2.881 MiB, 42.62% gc time, 99.44% compilation time)

In both select and subset examples I used an anonymous function. In the first case it was x -> 2 * x, and in the second x -> x > 1.5.

What you can notice is that most of the time (even in consecutive calls) is spent in compilation. What is the reason for this?

Let me explain this by example. When we pass the x -> 2 * x function to select, the select function needs to be compiled. Since the select function
is quite complex its compilation time is long.

Why does this happen. The reason is that each time we write x -> 2 * x Julia defines a new anonymous function. Julia compiler does not recognize that it is in fact the same function. Have a look here:

julia> x -> 2 * x
#9 (generic function with 1 method)

julia> x -> 2 * x
#11 (generic function with 1 method)

We can see that we get a different function (one denoted #9 and the other #11) although the definition of the function is identical.

How to solve the compilation issue?

Fortunately, there is a simple way to resolve this problem. Instead of using an anonymous function, just use a named function:

julia> times2(x) = 2 * x
times2 (generic function with 1 method)

julia> @time select(df, :x => times2 => :x2);
  0.013728 seconds (5.63 k allocations: 401.305 KiB, 98.54% compilation time)

julia> @time select(df, :x => times2 => :x2);
  0.000142 seconds (71 allocations: 3.516 KiB)

julia> gt15(x) = x > 1.5
gt15 (generic function with 1 method)

julia> @time subset(df, :x => ByRow(gt15));
  0.041173 seconds (42.64 k allocations: 2.849 MiB, 99.01% compilation time)

julia> @time subset(df, :x => ByRow(gt15));
  0.000165 seconds (120 allocations: 5.648 KiB)

Now you see that consecutive calls are fast and do not cause compilation.

Actually, instead of defining the gt15 function we could just have written >(1.5):

julia> >(1.5)
(::Base.Fix2{typeof(>), Float64}) (generic function with 1 method)

Which defines a functor that works as a named function (so it requires only one compilation):

julia> @time subset(df, :x => ByRow(>(1.5)));
  0.075423 seconds (41.80 k allocations: 2.804 MiB, 99.32% compilation time)

julia> @time subset(df, :x => ByRow(>(1.5)));
  0.000189 seconds (124 allocations: 5.898 KiB)

If you want to learn how functors work in Julia, have a look here.

Conclusions

Today I have presented some simple examples, but I hope that they are useful for new users of DataFrames.jl in helping them to improve the performance of their code.