A multi-language overview on how to test your research project code

By: julia | Victor Boussange

Re-posted from: https://vboussange.github.io/post/testing-your-research-code/

Code testing is essential to identify and fix potential issues, to maintain sanity over the course of the project's development by catching bugs quickly, and to ensure the reliability of your experiments over time.

This post is part of a series of posts on best practices for managing research project code. Much of this material was developed in collaboration with Mauro Werder as part of the Course On Reproducible Research, Data Pipelines, and Scientific Computing (CORDS). If you have experiences to share or spot any errors, please reach out!

Unit testing

Unit testing involves testing a unit of code, typically a single function, to ensure its correctness. Here are some key aspects to consider:

  • Test for correctness with typical inputs.
  • Test edge cases.
  • Test for errors with bad inputs.

Some developers start writing unit tests before writing the actual function, a practice known as Test-Driven Development (TDD). Define the expected behavior of the function up front (on a piece of paper), write the corresponding tests, and once all tests pass, you are done. This philosophy ensures that you end up with a well-tested implementation and avoids unnecessary feature development, forcing you to focus only on what is needed. While TDD is a powerful idea, it can be challenging to follow strictly.
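As a small illustration of the workflow, here is a minimal Julia sketch built around a hypothetical word_count function: in TDD the tests (which encode the specification) would be written first and fail until the function exists, and the implementation is then refined until they all pass.

using Test

# In TDD the tests below would be written first; the implementation is then
# refined until all of them pass. It is placed first here only so the snippet
# runs as-is.
word_count(s::AbstractString) = count(!isempty, split(s, r"\s+"))

@testset "word_count specification" begin
    @test word_count("one two three") == 3    # typical input
    @test word_count("") == 0                 # edge case: empty string
    @test word_count("  spaced   out ") == 2  # edge case: extra whitespace
end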

A good idea is to write an additional test when you find a bug in your code.

Lightweight formal tests with assert

The simplest form of unit testing involves some sort of assert statement.

Python

def fib(x):
    if x == 0:
        return 0
    elif x <= 2:
        return 1
    else:
        return fib(x - 1) + fib(x - 2)

assert fib(0) == 0
assert fib(1) == 1
assert fib(2) == 1

Julia

@assert 1 == 0  # this assertion fails, throwing an AssertionError

When an assertion is broken, you get an error for that assertion and execution stops, so you need to fix it before the following assertions can be checked.

In Julia or Python, you can place the assert statements directly after your functions; this way, the tests run each time you execute the script. Here is another Pythonic approach, which decouples the tests from normal imports of the module:

def fib(x):
    if x == 0:
        return 0
    elif x <= 2:
        return 1
    else:
        return fib(x - 1) + fib(x - 2)

if __name__ == '__main__':
    assert fib(0) == 0
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(6) == 8
    assert fib(40) == 102334155
    print("Tests passed")

Consider using np.isclose or np.testing.assert_allclose (Python), or isapprox / ≈ (Julia), for floating point comparisons.
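For instance, a quick Julia illustration (the Python functions behave analogously):

using Test

0.1 + 0.2 == 0.3             # false: floating point arithmetic is not exact

@test 0.1 + 0.2 ≈ 0.3        # passes: ≈ is isapprox, with sensible default tolerances
@test isapprox(1.0, 1.0 + 1e-12; atol=1e-8)  # tolerances can also be set explicitly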

Testing with a test suite

Once you have many tests, it makes sense to group them into a test suite and run them with a test runner. This approach runs all tests even if some fail, and reports which tests passed and which did not. As you'll see, it also lets you run the tests automatically at each commit, with continuous integration.

Python

Two main frameworks for unit tests in Python are pytest and unittest, with pytest being more popular.

Example using pytest:

from src.fib import fib
import pytest

def test_typical():
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(6) == 8
    assert fib(40) == 102334155

def test_edge_case():
    assert fib(0) == 0

def test_raises():
    with pytest.raises(NotImplementedError):
        fib(-1)

    with pytest.raises(NotImplementedError):
        fib(1.5)

Run the tests with:

pytest test_fib.py

Julia

Julia has the built-in module Test, which provides the @test macro. Consider grouping your tests with @testset:

julia> @testset "trigonometric identities" begin
          θ = 2/3*π
          @test sin(-θ)  -sin(θ)
          @test cos(-θ)  cos(θ)
          @test sin(2θ)  2*sin(θ)*cos(θ)
          @test cos(2θ)  cos(θ)^2 - sin(θ)^2
      end;

This will nicely output

Test Summary:            | Pass  Total  Time
trigonometric identities |    4      4  0.2s

which comes in handy for grouping tests applied to a single function or concept. Your test functions may require packages beyond the working environment specified at your package root folder; an additional environment can be specified just for the tests. To develop my tests interactively, I like using TestEnv: simply calling Pkg.activate on the test environment does not work here, because you also need access to your package's functions, which TestEnv provides.

In your package environment,

using TestEnv
TestEnv.activate()

will activate the test environment.

To reactivate the normal environment,

Pkg.activate(".")

Here is a nice thread to read more on that.
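For reference, here is a minimal sketch of the standard test layout, assuming a hypothetical package MyPackage: tests live in test/runtests.jl, which is the entry point executed by Pkg.test(), and test-only dependencies can be declared in test/Project.toml.

MyPackage/
├── Project.toml        # package environment
├── src/
│   └── MyPackage.jl
└── test/
    ├── Project.toml    # optional: test-only dependencies (e.g. Test)
    └── runtests.jl     # entry point executed by Pkg.test()

with a test/runtests.jl along the lines of

using MyPackage
using Test

@testset "MyPackage.jl" begin
    @test MyPackage.fib(6) == 8  # hypothetical function, as in this post
end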

R

The main unit testing framework in R is testthat.

Testing non-pure functions and classes

  • For nondeterministic functions, provide the random seed or the random variables needed by the function as arguments to make them deterministic (see the Julia sketch at the end of this section).
  • For stateful functions, test postconditions to ensure the internal state changes as expected.
  • For functions with I/O side effects, create mock files to verify proper input reading and expected output.

Python

def file_to_upper(in_file, out_file):
    fout = open(out_file, 'w')
    with open(in_file, 'r') as f:
        for line in f:
            fout.write(line.upper())
    fout.close()

import tempfile
import os

def test_file_to_upper():
    in_file = tempfile.NamedTemporaryFile(delete=False, mode='w')
    out_file = tempfile.NamedTemporaryFile(delete=False)
    out_file.close()
    in_file.write("test123\nthetest")
    in_file.close()
    file_to_upper(in_file.name, out_file.name)
    with open(out_file.name, 'r') as f:
        data = f.read()
        assert data == "TEST123\nTHETEST"
    os.unlink(in_file.name)
    os.unlink(out_file.name)
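Julia

For the first point above (nondeterministic functions), here is a minimal Julia sketch with a hypothetical noisy_measurement function: passing the random number generator as an argument makes the function deterministic for a fixed seed, and therefore testable.

using Test
using Random

# Hypothetical function: the RNG is an explicit argument instead of global state.
noisy_measurement(rng::AbstractRNG, x) = x + 0.1 * randn(rng)

@testset "noisy_measurement is reproducible" begin
    a = noisy_measurement(MersenneTwister(42), 1.0)
    b = noisy_measurement(MersenneTwister(42), 1.0)
    @test a == b                      # same seed, same result
    @test isapprox(a, 1.0; atol=1.0)  # and the value stays in a plausible range
end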

Continuous integration

Automated testing on local machines is useful, but you can do better with continuous integration (CI). In fact, CI is essential for projects involving multiple developers and various target platforms. CI consists of automatically running the tests whenever changes are committed.
CI can also be used to automatically build documentation, check code coverage, and more. GitHub Actions is a popular CI tool available within GitHub.
CI is configured through .yaml files, which specify the environment in which to run the tests. You can build matrices to test across different environments (e.g. Linux, Windows and macOS, with different versions of Python or Julia); a job is created that runs your tests for each permutation of the matrix.

An example CI.yaml file for Julia

name: Run tests

on:
  push:
    branches:
      - master
      - main
  pull_request:

permissions:
  actions: write
  contents: read

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        julia-version: ['1.6', '1', 'nightly']
        julia-arch: [x64, x86]
        os: [ubuntu-latest, windows-latest, macOS-latest]
        exclude:
          - os: macOS-latest
            julia-arch: x86

    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v1
        with:
          version: ${{ matrix.julia-version }}
          arch: ${{ matrix.julia-arch }}
      - uses: julia-actions/cache@v1
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1

An example CI.yaml file for Python

This action installs the conda environment called glacier-mass-balance, specified in the environment.yml file.
It then runs pytest, assuming that you have a test/ folder where your test functions are located. First check that pytest works locally, and do not forget to include pytest in your dependencies.


name: Run tests
on: push

jobs:
  miniconda:
    name: Miniconda ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
        matrix:
            os: ["ubuntu-latest"]
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          environment-file: environment.yml
          activate-environment: glacier-mass-balance
          auto-activate-base: false
      - name: Run pytest
        shell: bash -l {0}
        run: | 
          pytest


Cool tip

You can include a cool badge to show visually whether your tests are passing or failing, like so:

[Tests badge]

You can get the code for this badge by going to your GitHub repo, then Actions. Click on the test action, then on the top right click on ... and Create status badge.

Cool right?

Other types of tests

  • Docstring tests: Unit tests embedded in docstrings.
  • Integration tests: Test whether multiple functions work correctly together.
  • Regression tests: Ensure your code produces the same outputs as previous versions.
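For the first point, here is a minimal Julia sketch of a docstring test: the jldoctest block embeds a REPL session in the docstring, and Documenter.jl can execute it (with its doctest function or during makedocs) and compare the printed output. Python offers the built-in doctest module for the same purpose.

"""
    fib(x)

Return the `x`-th Fibonacci number.

# Examples

```jldoctest
julia> fib(6)
8
```
"""
function fib(x)
    x == 0 && return 0
    x <= 2 && return 1
    return fib(x - 1) + fib(x - 2)
end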

Take-home messages

  • Systematic testing helps you ensure the sanity of your code.
  • The overhead cost of testing is usually well balanced by the time saved downstream in identifying bugs.

Announcing Genie Builder on JuliaHub: Web Apps that Bring your Julia Projects to Life

By: Jasmine Chokshi

Re-posted from: https://info.juliahub.com/blog/announcing-genie-builder-on-juliahub

We are excited to announce the launch of Genie Builder on JuliaHub, a powerful low/no-code tool designed to help you bring your Julia projects to life with beautiful, interactive web applications. Whether you want to build dashboards or complex AI and simulation apps around your Julia code, Genie Builder empowers scientists and developers to create prototypes, internal tools, and even production-grade systems 10x faster and more affordably—all without needing to hire a web development team.

Genie Builder is brought to JuliaHub in partnership with Genie, the leading open-source web framework for Julia. It builds upon Genie’s extensibility and feature-rich foundation, making it an incredibly flexible and robust tool for your web development needs. 

If you have explored some of our Genie resources, then you have seen how Genie Builder accelerates web development with its no-code UI editor, helping Julia users with no web development experience rapidly build complex web apps and dashboards around their code. Genie Builder builds on Julia's strengths (high-level, high-performance, dynamic, JIT-compiled) so all of your packages and projects can easily be visualized and shared with the world.

To start, you can learn more about building and deploying apps with Genie Builder and JuliaHub in this webinar.

If you aren’t familiar with how Genie Builder works, here’s the video rundown.

A not so simple coin-tossing game

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/06/07/probability2.html

Introduction

Two weeks ago I wrote a post about a simple coin tossing game.
Today let me follow up on it with a bit more difficult question and a slightly changed implementation strategy.

The post was written under Julia 1.10.1, DataFrames.jl 1.6.1, and StatsBase.jl 0.34.3.

The problem

Let me describe the setting of a game first (it is similar to what I described in this post).

Assume Alice and Bob toss a fair coin n times. In each toss head (h) or tail (t) can show up with equal probability.

Alice counts the number of times a ht sequence showed.
Bob counts the number of times a hh sequence showed.

The winner of the game is the person who saw a bigger number of occurrences of their favorite sequence.
So, for example, take n=3. If we get hhh then Bob wins (he sees 2 occurrences of hh, while Alice sees 0 occurrences of ht). If we get hht there is a tie (both patterns occurred once). If we get tht Alice wins.

The questions are:

  • Who, on average, sees more occurrences of their favorite pattern?
  • Who is more likely to win this game?

Let us try to answer these questions using Julia as usual.

Simulating one game

We start by writing a simulator of a single game:

using Random

function play(n::Integer)
    seq = randstring("ht", n)
    return (hh=count("hh", seq, overlap=true),
            ht=count("ht", seq, overlap=true))
end

The function is not optimized for speed (as we could even avoid storing the whole sequence),
but I think it nicely shows how powerful Julia's library functions are. The randstring function
allows us to generate random strings, in this case a random sequence of h and t characters.
Next the count function allows us to count the number of occurrences of desired patterns.
Note that we use the overlap=true keyword argument to count all occurrences of the pattern
(by default only disjoint occurrences are counted).
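For instance, on the string hhh (which, as we will see, is at the heart of the puzzle), the keyword changes the result:

count("hh", "hhh")                # 1: only disjoint occurrences are counted
count("hh", "hhh", overlap=true)  # 2: overlapping occurrences are counted too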

Let us check the output of a single run of the game:

julia> play(10)
(hh = 3, ht = 3)

In my case (I did not seed the random number generator) we see that for n=10 we got a sequence
with 3 occurrences of hh and 3 occurrences of ht, so it is a tie.

Testing the properties of the game

Here is a simulator that, for a given n, runs the game reps times and aggregates the results:

using DataFrames
using Statistics
using StatsBase

function sim_play(n::Integer, reps::Integer)
    df = DataFrame([play(n) for _ in 1:reps])
    df.winner = cmp.(df.hh, df.ht)
    agg = combine(df,
                  ["hh", "ht"] .=> [mean std skewness],
                  "winner" .=>
                  [x -> mean(==(i), x) for i in -1:1] .=>
                  ["ht_win", "tie", "hh_win"])
    return insertcols!(agg, 1, "n" => n)
end

What we do in the code is as follows. First we run the game reps times and transform a result into a DataFrame.
Next we add a column denoting the winner of the game. In the "winner" column 1 means that hh won, 0 means a tie, and -1 means that ht won.
Finally we compute the following aggregates (using transformation minilanguage; if you do not have much experience with it you can have a look at this post):

  • mean, standard deviation, and skewness of hh and ht counts;
  • probability that ht wins, that there is a tie and that hh wins.

Here is the result of running the code for reps=1_000_000 and n varying from 2 to 16:

julia> Random.seed!(1234);

julia> reduce(vcat, [sim_play(n, 1_000_000) for n in 2:16])
15×10 DataFrame
 Row │ n      hh_mean   ht_mean   hh_std    ht_std    hh_skewness  ht_skewness   ht_win    tie       hh_win
     │ Int64  Float64   Float64   Float64   Float64   Float64      Float64       Float64   Float64   Float64
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │     2  0.25068   0.249825  0.433405  0.432912     1.15052    1.15578      0.249825  0.499495  0.25068
   2 │     3  0.499893  0.499595  0.706871  0.5          1.06068    0.00162      0.374385  0.375765  0.24985
   3 │     4  0.751224  0.748855  0.902063  0.559496     1.0232     0.00312512   0.373833  0.37559   0.250577
   4 │     5  1.00168   1.00012   1.06192   0.612535     0.940274  -6.5033e-5    0.406445  0.28037   0.313185
   5 │     6  1.25098   1.2493    1.19926   0.661162     0.869559  -0.0012833    0.437276  0.233841  0.328883
   6 │     7  1.49972   1.50011   1.32213   0.707523     0.812272  -0.00190003   0.437774  0.234531  0.327695
   7 │     8  1.75064   1.74802   1.43616   0.750169     0.76024    0.00319491   0.440714  0.211252  0.348034
   8 │     9  1.99906   2.00108   1.53902   0.789413     0.715722   0.000107041  0.451749  0.189353  0.358898
   9 │    10  2.24857   2.25009   1.63787   0.829086     0.676735  -0.00207707   0.45343   0.184585  0.361985
  10 │    11  2.50092   2.50007   1.73343   0.867326     0.646397   0.000650687  0.454418  0.175059  0.370523
  11 │    12  2.74753   2.75065   1.81994   0.901478     0.621238  -0.00118389   0.458332  0.164575  0.377093
  12 │    13  2.99635   3.00128   1.90199   0.935108     0.597227   0.00212776   0.460248  0.159239  0.380513
  13 │    14  3.2469    3.25101   1.9814    0.96887      0.575535  -0.000255108  0.460817  0.154523  0.38466
  14 │    15  3.50074   3.49934   2.05981   0.998945     0.55527    0.000827465  0.461547  0.147699  0.390754
  15 │    16  3.75258   3.7513    2.13521   1.03027      0.538056   0.000772964  0.463627  0.142931  0.393442

What do we learn from these results?

On average, hh and ht occur the same number of times.
We see this from the "hh_mean" and "ht_mean" columns.
This is expected: in any given pair of consecutive tosses, hh and ht have the same
probability of occurrence (0.25), so the result follows from the linearity of the expected value.
We can see that as we increase n the values in these columns increase roughly by 0.25.
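To make this concrete: a sequence of n tosses contains n-1 consecutive pairs, and each pair equals hh (or ht) with probability 1/4, so by linearity of the expected value E[#hh] = E[#ht] = (n-1)/4. For n=10 this gives 2.25, which matches the "hh_mean" and "ht_mean" columns above.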

However, the probability of ht winning is higher than the probability of hh winning
(except for n=2, when they are equal). We can see this from the "ht_win" and "hh_win" columns.
This is surprising, as the patterns occur, on average, the same number of times.

To understand the phenomenon we can look at the "hh_std", "ht_std",
"hh_skewness", and "ht_skewness" columns.
We can clearly see that the hh pattern count has a higher standard deviation and that for n>2 it is positively skewed
(while ht has zero skewness).
This means that hh counts are more spread out (they can be high, but also low).
Additionally, since the means of both patterns are the same, the few quite high hh counts must be balanced by more low values, relative to ht. This, in turn, means that when hh wins over ht it tends to win by a larger margin, but such wins happen less often than ht winning over hh.

The core reason for this behavior was discussed in my previous post. The hh values can cluster (as e.g. in the hhh pattern), while ht patterns cannot overlap.

Conclusions

I hope you found this puzzle interesting. If you are interested how the properties we described can be proven analytically I recommend you check out this paper.