Best Practices for Testing Your Julia Packages

By: Great Lakes Consulting

Re-posted from: https://blog.glcs.io/package-testing

This post was written by Steven Whitaker.

The Julia programming language is a high-level language that is known, at least in part, for its excellent package manager and outstanding composability. (See another blog post that illustrates this composability.)

Julia makes it super easy for anybody to create their own package. Julia’s package manager enables easy development and testing of packages. The ease of package development encourages developers to split reusable chunks of code into individual packages, further enhancing Julia’s composability.

In our previous post, we discussed how to create and register your own package. However, to encourage people to actually use your package, it helps to have an assurance that the package works. This is why testing is important. (Plus, you also want to know your package works, right?)

In this post, we will learn about some of the tools Julia provides for testing packages. We will also learn how to use GitHub Actions to run package tests against commits and/or pull requests to check whether code changes break package functionality.

This post assumes you are comfortable navigating the Julia REPL. If you need a refresher, check out our post on the Julia REPL.

Example Package

We will use a custom package called Averages.jl to illustrate how to implement testing in Julia.

The Project.toml looks like:

name = "Averages"
uuid = "1fc6e63b-fe0f-463a-8652-42f2a29b8cc6"
version = "0.1.0"

[deps]
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test"]

Note that this Project.toml has two more sections besides [deps]:

  • [extras] is used to indicate additional packages that are not direct dependencies of the package. In this example, Test is not used in Averages.jl itself; Test is used only when running tests.
  • [targets] is used to specify what packages are used where. In this example, test = ["Test"] indicates that the Test package should be used when testing Averages.jl.

The actual package code in src/Averages.jl looks like:

module Averages

using Statistics

export compute_average

compute_average(x) = (check_real(x); mean(x))

function compute_average(a, b...)
    check_real(a)
    N = length(a)
    for (i, x) in enumerate(b)
        check_real(x)
        check_length(i + 1, x, N)
    end
    T = float(promote_type(eltype(a), eltype.(b)...))
    average = Vector{T}(undef, N)
    average .= a
    for x in b
        average .+= x
    end
    average ./= length(b) + 1
    return a isa Real ? average[1] : average
end

function check_real(x)
    T = eltype(x)
    T <: Real || throw(ArgumentError("only real numbers are supported; unsupported type $T"))
end

function check_length(i, x, expected)
    N = length(x)
    N == expected || throw(DimensionMismatch("the length of input $i does not match the length of the first input: $N != $expected"))
end

end

Adding Tests

Tests for a package live in test/runtests.jl. (The file name is important!) Inside this file there are two main testing utilities that are used: @testset and @test. Additionally, @test_throws can also be useful for testing. The Test standard library package provides all of these macros.

  • @testset is used to organize tests into cohesive blocks.
  • @test is used to actually test package functionality.
  • @test_throws is used to ensure the package throws the errors it should.

Here is how test/runtests.jl might look for Averages.jl:

using Averages
using Test

@testset "Averages.jl" begin

    a = [1, 2, 3]
    b = [4.0, 5.0, 6.0]
    c = (BigInt(7), 8f0, Int32(9))
    d = 10
    e = 11.0
    bad = ["hi", "hello", "hey"]

    @testset "`compute_average(x)`" begin
        @test compute_average(a) == 2
        @test compute_average(a) isa Float64
        @test compute_average(c) == 8
        @test compute_average(c) isa BigFloat
        @test compute_average(d) == 10
    end

    @testset "`compute_average(a, b...)`" begin
        @test compute_average(a, a) == a
        @test compute_average(a, b) == [2.5, 3.5, 4.5]
        @test compute_average(a, b, c) == b
        @test compute_average(a, b, c) isa Vector{Float64}
        @test compute_average(b, b, b) == b
        @test compute_average(d, e) == 10.5
    end

    @testset "Error Handling" begin
        @test_throws ArgumentError compute_average(im)
        @test_throws ArgumentError compute_average(a, bad)
        @test_throws ArgumentError compute_average(bad, c)
        @test_throws DimensionMismatch compute_average(a, b[1:2])
        @test_throws DimensionMismatch compute_average(a[1:2], b)
    end

end

Now let’s look more closely at the macros used:

  • @testset can be given a label to help organize the reporting Julia does at the end of testing. Besides that, @testset wraps around a set of tests (including other @testsets).
  • @test is given an expression that evaluates to a boolean. If the boolean is true, the test passes; otherwise it fails.
  • @test_throws takes two inputs: an error type and then an expression. The test passes if the expression throws an error of the given type.
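
As a self-contained illustration of the three macros, here is a minimal sketch that can be pasted into the REPL. The function safe_sqrt is a hypothetical helper made up for this example; it is not part of Averages.jl.

```julia
using Test

# Hypothetical helper, defined only for this illustration.
# It returns the square root of a non-negative real number and throws otherwise.
safe_sqrt(x::Real) = x >= 0 ? sqrt(x) : throw(DomainError(x, "input must be non-negative"))

@testset "safe_sqrt" begin
    # `@test` passes when the expression evaluates to `true`.
    @test safe_sqrt(4) == 2
    # Floating-point results are usually compared approximately.
    @test safe_sqrt(2) ≈ 1.41421356 atol = 1e-6
    # `@test_throws` passes when the expression throws the given error type.
    @test_throws DomainError safe_sqrt(-1)
end
```

All three tests should pass, and the summary is reported under the label "safe_sqrt".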

Testing Against Other Packages

In some cases, you might want to ensure your package is compatible with a type defined in another package. For our example, let’s test against StaticArrays.jl. Our package does not depend on StaticArrays.jl, so we need to add it as a test-only dependency by editing the [extras] and [targets] sections in the Project.toml:

[extras]
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["StaticArrays", "Test"]

(Note that I grabbed the UUID for StaticArrays.jl from its Project.toml on GitHub.)

Then we can add some tests to make sure compute_average is generic enough to work with StaticArrays:

using Averages
using Test
using StaticArrays

@testset "Averages.jl" begin

    # (The variable definitions and test sets shown earlier go here.)

    @testset "StaticArrays.jl" begin
        s = SA[12, 13, 14]
        @test compute_average(s) == 13
        @test compute_average(s, s) == [12, 13, 14]
        @test compute_average(a, b, s) == [17/3, 20/3, 23/3]
        @test compute_average(s, a, c) == [20/3, 23/3, 26/3]
    end

end

Running Tests Locally

Now Averages.jl is ready for testing. To run package tests on your own computer, start Julia, activate the package environment, and then run test from the package prompt:

(@v1.X) pkg> activate /path/to/Averages

(Averages) pkg> test

The first thing test does is set up a temporary package environment for testing that includes the packages defined in the test target in the Project.toml. Then it runs the tests and displays the result:

     Testing Running tests...
Test Summary: | Pass  Total  Time
Averages.jl   |   20     20  0.7s
     Testing Averages tests passed

If a test fails, the result looks like this:

     Testing Running tests...
`compute_average(a, b...)`: Test Failed at /path/to/Averages/test/runtests.jl:27
  Expression: compute_average(a, b) == [2.0, 3.5, 4.5]
   Evaluated: [2.5, 3.5, 4.5] == [2.0, 3.5, 4.5]
Stacktrace:
 [1] macro expansion
   @ /path/to/julia-1.X.Y/share/julia/stdlib/v1.X/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ /path/to/Averages/test/runtests.jl:27 [inlined]
 [3] macro expansion
   @ /path/to/julia-1.X.Y/share/julia/stdlib/v1.X/Test/src/Test.jl:1577 [inlined]
 [4] macro expansion
   @ /path/to/Averages/test/runtests.jl:26 [inlined]
 [5] macro expansion
   @ /path/to/julia-1.X.Y/share/julia/stdlib/v1.X/Test/src/Test.jl:1577 [inlined]
 [6] top-level scope
   @ /path/to/Averages/test/runtests.jl:7
Test Summary:                | Pass  Fail  Total  Time
Averages.jl                  |   19     1     20  0.9s
  `compute_average(x)`       |    5            5  0.1s
  `compute_average(a, b...)` |    5     1      6  0.6s
  Error Handling             |    5            5  0.0s
  StaticArrays.jl            |    4            4  0.2s
ERROR: LoadError: Some tests did not pass: 19 passed, 1 failed, 0 errored, 0 broken.
in expression starting at /path/to/Averages/test/runtests.jl:5
ERROR: Package Averages errored during testing

Some things to note:

  • When all tests in a test set pass, the test summary does not report the individual results of nested test sets. When a test fails, results of nested test sets are reported individually to report more precisely where the failure occurred.
  • When a test fails, the file and line number of the failing test are reported, along with the expression that failed. This information is displayed for all failures that occur.
  • The test summary reports how many tests passed and how many failed in each test set, in addition to how long each test set took.
  • Tests in a test set continue to run after a test fails. To have a test set stop on failure, use the failfast option:
    @testset failfast = true "Averages.jl" begin
    (This option is available only in Julia 1.9 and later.)

Now, when developing Averages.jl, we can run the tests locally to ensure we don’t break any functionality!

Running Tests with GitHub Actions

Besides running tests locally, one can use GitHub Actions to run tests on one of GitHub’s servers. One advantage is that it enables automated testing on various machines/operating systems and across various Julia versions. Automating tests in this way is an essential part of continuous integration (CI) (so much so that the phrase “running CI” is equivalent to “running tests via GitHub Actions”, even though CI technically involves more than just testing).

To enable testing via GitHub Actions, we just need to add an appropriate .yml file in the .github/workflows directory of our package. As mentioned in our previous post, PkgTemplates.jl can automatically generate the necessary .yml file. This is the default CI workflow generated by PkgTemplates.jl:

name: CI
on:
  push:
    branches:
      - main
    tags: ['*']
  pull_request:
  workflow_dispatch:
concurrency:
  # Skip intermediate builds: always.
  # Cancel intermediate builds: only if it is a pull request build.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}
jobs:
  test:
    name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }}
    runs-on: ${{ matrix.os }}
    timeout-minutes: 60
    permissions: # needed to allow julia-actions/cache to proactively delete old caches that it has created
      actions: write
      contents: read
    strategy:
      fail-fast: false
      matrix:
        version:
          - '1.10'
          - '1.6'
          - 'pre'
        os:
          - ubuntu-latest
        arch:
          - x64
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.version }}
          arch: ${{ matrix.arch }}
      - uses: julia-actions/cache@v2
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1

For most users, the most relevant fields to customize are version and os (under jobs: test: strategy: matrix). Under os, specify the operating systems to run tests on (e.g., ubuntu-latest, windows-latest, macOS-latest). Under version, specify the versions of Julia to use when testing:

  • '1.X' means run on Julia 1.X.Y, where Y is the largest patch of Julia 1.X that has been released. For example, '1.9' means run on Julia 1.9.4.
  • '1' means run on the latest stable version of Julia.
  • 'pre' means run on the latest pre-release version of Julia.
  • 'lts' means run on Julia’s long-term support (LTS) version.

Usually, it makes sense just to test '1' and 'pre' to ensure compatibility with the current and upcoming Julia versions.

One can also fine-tune the version and os fields, as well as other fields, when generating a package with PkgTemplates.jl. For example, to generate the .yml file to run tests only on Windows with Julia 1.8 and the latest pre-release version of Julia:

using PkgTemplates
gha = GitHubActions(; linux = false, windows = true, extra_versions = ["1.8", "pre"])
t = Template(; dir = ".", plugins = [gha])
t("MyPackage")

Note that the .yml file generated will also include testing on Julia 1.6. The Template constructor has a keyword argument julia that sets the minimum version of Julia you want your package to support, and this version is included in testing. As of this writing, by default the minimum version is Julia 1.6.

See the PkgTemplates.jl docs about Template and GitHubActions for more details on customizing the .yml file. See also the GitHub Actions docs, and in particular the workflow syntax docs, for more details on what makes up the .yml file. (Be warned, these docs are quite lengthy and probably aren’t practically useful for most people to get a CI workflow up and running. For a more approachable overview of the .yml file, consider looking at this tutorial for building and testing Python.)

Once we push .github/workflows/CI.yml to GitHub, whenever branch main is pushed to, or a pull request (PR) is opened or pushed to, our package’s tests will run. This is the essence of CI: continuously making sure changes we make to our code integrate well with the code base (i.e., don’t break anything). By running tests against PRs, we can be sure changes made don’t break existing functionality.

One neat thing about GitHub Actions is that GitHub provides a status badge/icon that you can display in your package’s README. This badge lets people know

  1. that your package is regularly tested, and
  2. whether the current state of your package passes those tests.

In other words, this badge is a good way to boost confidence that your package is suitable for use. You can add this badge to your package’s README by adding something like the following markdown:

[![CI](https://github.com/username/Averages.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/username/Averages.jl/actions/workflows/CI.yml)

And it will display as follows:

GitHub CI badge

Summary

In this post, we learned how to add tests to our own Julia package. We also learned how to enable CI with GitHub Actions to run our tests against code changes to ensure our package remains in working order.

How difficult was it for you to set up CI for the first time? Do you have any tips for beginners? Let us know in the comments below!

How to Create a Julia Package from Scratch

By: Great Lakes Consulting

Re-posted from: https://blog.glcs.io/package-creation

This post was written by Steven Whitaker.

The Julia programming language is a high-level language that is known, at least in part, for its excellent package manager and outstanding composability. (See another blog post that illustrates this composability.)

Julia makes it super easy for anybody to create their own package. Julia’s package manager enables easy development and testing of packages. The ease of package development encourages developers to split reusable chunks of code into individual packages, further enhancing Julia’s composability.

In this post, we will learn what comprises a Julia package. We will also discuss tools that automate the creation of packages. Finally, we will talk about the basics of package development and walk through how to publish (register) a package for others to use.

This post assumes you are comfortable navigating the Julia REPL. If you need a refresher, check out our post on the Julia REPL.

Components of a Package

Packages are easy enough to use: just install them with add PkgName in the package prompt and then run using PkgName in the julia prompt. But what actually goes into a package?

Packages must follow a specific directory structure and include certain information to be recognized as a package by Julia.

Suppose we are creating a package called PracticePackage.jl. First, we create a directory called PracticePackage. This directory is the package root. Within the root directory we need a file called Project.toml and another directory called src.

The Project.toml requires the following information:

name = "PracticePackage"
uuid = "11111111-2222-3333-aaaa-bbbbbbbbbbbb"
authors = ["Your Name <youremail@email.com>"]
version = "0.1.0"

  • uuid stands for universally unique identifier, and can be generated in Julia with using UUIDs; uuid4(). The purpose of a UUID is to allow different packages of the same name to coexist.
  • version should be set to whatever version is appropriate for your package, typically "0.1.0" or "1.0.0" for an initial release. The versioning of Julia packages follows SemVer.
  • The Project.toml will also include information about package dependencies, but more on that later.
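
To make the UUID step concrete, here is a small sketch that generates a UUID with the UUIDs standard library and prints the corresponding Project.toml contents. The package name and version are the placeholder values used in this post, and the authors field is left out for brevity.

```julia
using UUIDs

# Generate a random (version 4) UUID for the `uuid` field.
id = uuid4()

# Assemble the Project.toml contents as a string.
# "PracticePackage" and "0.1.0" are placeholder values.
project_toml = """
name = "PracticePackage"
uuid = "$id"
version = "0.1.0"
"""

print(project_toml)
```

Each run prints a different UUID, which is the point: the value only has to be unique, not meaningful.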

The src directory requires one Julia file named PracticePackage.jl that defines a module named PracticePackage:

module PracticePackage

# Package code goes here.

end

So, the directory structure of the package looks like the following:

PracticePackage
    Project.toml
    src
        PracticePackage.jl

And that’s all there is to a package! (Well, at least minimally.)

Some Technicalities

Feel free to skip this section, but if you are curious about some technicalities for what comprises a valid package, read on.

  • The Project.toml only needs the name and uuid fields for Julia to recognize the package. Without the version field, Julia treats the version as v0.0.0.
    • However, the version and authors fields are needed to register the package.
  • The name of the package root directory doesn’t matter, meaning it doesn’t have to match the package name. However, the name field in Project.toml does have to match the name of the module defined in src/PracticePackage.jl, and the file name of src/PracticePackage.jl also has to match.
    • For example, we could change the name of the package by setting name = "Oops" in Project.toml, renaming src/PracticePackage.jl to src/Oops.jl, and defining module Oops in that file. We would not have to rename the package root directory from PracticePackage to Oops (though that would be a good idea to avoid confusion).

Automatically Generating Packages

The basic structure of a package is pretty simple, so there ought to be a way to automate it, right? (I mean, who wants to manually generate a UUID?) Good news: package creation can be automated!

Package generate Command

Julia comes with a generate package command built-in. First, change directories to where the package root directory should live, then run generate in the Julia package prompt:

pkg> generate PracticePackage

This command creates the package root directory PracticePackage and the Project.toml and src/PracticePackage.jl files. Some notes:

  • The Project.toml is pre-filled with the correct fields and values, including an automatically generated UUID. When I ran generate on my computer, it also pre-filled the authors field with my name and email from my ~/.gitconfig file.
  • src/PracticePackage.jl is pre-filled with a definition for the module PracticePackage. It also defines a function greet in the module, but typically you will replace that with your own code.

PkgTemplates.jl

The generate command works fine, but it’s barebones. For example, if you are planning on hosting your package on GitHub, you might want to include a GitHub Action for continuous integration (CI), so it would be nice to automate the creation of the appropriate .yml file. This is where PkgTemplates.jl comes in.

PkgTemplates.jl is a normal Julia package, so install it as usual and run using PkgTemplates. Then we can create our PracticePackage.jl:

t = Template(; dir = ".")
t("PracticePackage")

Running this code creates the package with the following directory structure:

PracticePackage
    .git
    .github
        dependabot.yml
        workflows
            CI.yml
            CompatHelper.yml
            TagBot.yml
    .gitignore
    LICENSE
    Manifest.toml
    Project.toml
    README.md
    src
        PracticePackage.jl
    test
        runtests.jl

As you can see, PkgTemplates.jl automatically generates a lot of files that aid in following package development best practices, like adding CI and tests.

Note that many options can be supplied to Template to customize what files are generated. See the PkgTemplates.jl docs for all the options.

Checklist of settings

Basic Package Development

Once your package is set up, the next step is to actually add code. Add the functions, types, constants, etc. that your package needs directly in the PracticePackage module in src/PracticePackage.jl, or add additional files in the src directory and include them in the module. (See a previous blog post for more information about modules, though note that using modules directly works slightly differently than using packages.)

To add dependencies for your package to use, you will need to activate your project’s package environment and then add packages. For example, if you want your package to use the DataFrames.jl package, start Julia and navigate to your package root directory. Then, activate the package environment and add the package:

(@v1.X) pkg> activate .

(PracticePackage) pkg> add DataFrames

After this, you will be able to include using DataFrames in your package code to enable the functionality provided by DataFrames.jl.

Adding packages after activating the package environment edits the package’s Project.toml file. It adds a [deps] section that lists the added packages and their UUIDs. In the example above, adding DataFrames.jl adds the following lines to the Project.toml file:

[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"

(And (PracticePackage) pkg> rm DataFrames would remove the DataFrames = ... line, so it is best not to edit the [deps] section manually.)

Finally, to try out your package, activate your package environment (as above) and then load your package as usual:

julia> using PracticePackage # No need to `add PracticePackage` first.

Note that by default Julia will have to be restarted to reload any changes you make to your package code. If you want to avoid restarting Julia whenever you make changes, check out Revise.jl.

Publishing/Registering a Package

Once your package is in working order, it is natural to want to publish the package for others to use.

A package can be published by registering it in a package registry, which basically is a map that tells the Julia package manager where to find a package so it can be downloaded.

Treasure map

The General registry is the largest registry as well as the default registry used by Julia; most, if not all, of the most popular open-source packages (DataFrames.jl, Plots.jl, StaticArrays.jl, ModelingToolkit.jl, etc.) exist in General. Once a package is registered in General, it can be installed with pkg> add PracticePackage.

(Note that if registering a package is not desired for some reason, a package can be added via URL, e.g., pkg> add https://github.com/username/PracticePackage.jl, assuming the package is in a public git repository. However, the package manager has limited ability to manage packages added in this way; in particular, managing package versions must be done manually.)

The most common way to register a package in General is to use Registrator.jl as a GitHub App. See the README for detailed instructions, but the process basically boils down to:

  1. Write/test package code.
  2. Update the version field in the Project.toml (e.g., to "0.1.0" or "1.0.0" for the first registered version).
  3. Add a comment with @JuliaRegistrator register to the latest commit that should be included in the registered version of the package.

Note that there are additional steps for preparing a package for publishing that we did not discuss in this post (such as specifying compatible versions of Julia and package dependencies). Refer to the General registry’s documentation and links therein for details.

Summary

In this post, we discussed creating Julia packages. We learned what comprises a package, how to automate package creation, and how to register a package in Julia’s General registry.

What package development tips do you have? Let us know in the comments below!

Additional Links

Cover image background provided by www.proflowers.com at https://www.flickr.com/photos/127365614@N08/16011252136.

Treasure map image source: https://openclipart.org/detail/299283/x-marks-the-spot


Julia’s Parallel Processing

By: Great Lakes Consulting

Re-posted from: https://blog.glcs.io/parallel-processing

Julia is a relatively new, free, and open-source programming language. It has a syntax similar to that of other popular programming languages such as MATLAB and Python, but it boasts being able to achieve C-like speeds.

While serial Julia code can be fast, sometimes even more speed is desired. In many cases, writing parallel code can further reduce run time. Parallel code takes advantage of the multiple CPU cores included in modern computers, allowing multiple computations to run at the same time, or in parallel.

Julia provides two methods for writing parallel CPU code: multi-threading and distributed computing. This post will cover the basics of how to use these two methods of parallel processing.

This post assumes you already have Julia installed. If you haven’t yet, check out our earlier post on how to install Julia.

Multi-Threading

First, let’s learn about multi-threading.

To enable multi-threading, you must start Julia in one of two ways:

  1. Set the environment variable JULIA_NUM_THREADS to the number of threads Julia should use, and then start Julia. For example, JULIA_NUM_THREADS=4.
  2. Run Julia with the --threads (or -t) command line argument. For example, julia --threads 4 or julia -t 4.

After starting Julia (either with or without specifying the number of threads), the Threads module will be loaded. We can check the number of threads Julia has available:

julia> Threads.nthreads()
4

The simplest way to start writing parallel code is just to use the Threads.@threads macro. Inserting this macro before a for loop will cause the iterations of the loop to be split across the available threads, which will then operate in parallel. For example:

Threads.@threads for i = 1:10
    func(i)
end

Without Threads.@threads, first func(1) will run, then func(2), and so on. With the macro, and assuming we started Julia with four threads, first func(1), func(4), func(7), and func(9) will run in parallel. Then, when a thread’s iteration finishes, it will start another iteration (assuming the loop is not done yet), regardless of whether the other threads have finished their iterations yet. Therefore, this loop will theoretically finish 10 iterations in the time it takes a single thread to do 3.

Note that Threads.@threads is blocking, meaning code after the threaded for loop will not run until the loop has finished.
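
Here is a runnable sketch of a threaded loop that makes the blocking behavior visible. The function slow_square is a hypothetical stand-in for func, invented for this example. Each iteration writes to its own slot of results, so the iterations do not interfere with each other.

```julia
# Hypothetical stand-in for `func`: pretend to do some work, then return i^2.
slow_square(i) = (sleep(0.01); i^2)

results = zeros(Int, 10)

Threads.@threads for i in 1:10
    # Every iteration writes to a different index, so no two threads
    # ever touch the same memory location.
    results[i] = slow_square(i)
end

# `Threads.@threads` blocks, so every entry is filled in by the time we get here.
println(results)
```

The code works with any number of threads, including one; more threads just reduce the wall-clock time.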

Image of threaded for loop

Julia also provides another macro for multi-threading: Threads.@spawn. This macro is more flexible than Threads.@threads because it can be used to run any code on a thread, not just for loops. But let’s illustrate how to use Threads.@spawn by implementing the behavior of Threads.@threads:

# Function for splitting up `x` as evenly as possible
# across `np` partitions.
function partition(x, np)
    (len, rem) = divrem(length(x), np)
    Base.Generator(1:np) do p
        i1 = firstindex(x) + (p - 1) * len
        i2 = i1 + len - 1
        if p <= rem
            i1 += p - 1
            i2 += p
        else
            i1 += rem
            i2 += rem
        end
        chunk = x[i1:i2]
    end
end

N = 10
chunks = partition(1:N, Threads.nthreads())
tasks = map(chunks) do chunk
    Threads.@spawn for i in chunk
        func(i)
    end
end
wait.(tasks)

Let’s walk through this code,assuming Threads.nthreads() == 4:

  • First, we split the 10 iterations evenly across the 4 threads using partition. So, chunks ends up being [1:3, 4:6, 7:8, 9:10]. (We could have hard-coded the partitioning, but now you have a nice partition function that can work with more complicated partitionings!)
  • Then, for each chunk, we create a Task via Threads.@spawn that will call func on each element of the chunk. This Task will be scheduled to run on an available thread. tasks contains a reference to each of these spawned Tasks.
  • Finally, we wait for the Tasks to finish with the wait function.

To reemphasize, note that Threads.@spawn creates a Task; it does not wait for the task to run. As such, it is non-blocking, and program execution continues as soon as the Task is returned. The code wrapped in the task will also run, but in parallel, on a separate thread. This behavior is illustrated below:

julia> Threads.@spawn (sleep(2); println("Spawned task finished"))
Task (runnable) @0x00007fdd4b10dc30

julia> 1 + 1 # This code executes without waiting for the above task to finish
2

julia> Spawned task finished # Prints 2 seconds after spawning the above task
julia>

Spawned tasks can also return data. While wait just waits for a task to finish, fetch waits for a task and then obtains the result:

julia> task = Threads.@spawn (sleep(2); 1 + 1)
Task (runnable) @0x00007fdd4a5e28b0

julia> fetch(task)
2
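
The same idea extends to several tasks at once. The sketch below (the inputs are arbitrary values chosen for illustration) spawns one task per input and then collects all the results with a broadcasted fetch:

```julia
# One task per input; each task returns the sum 1 + 2 + ... + n.
inputs = [10, 100, 1000]
tasks = map(inputs) do n
    Threads.@spawn sum(1:n)
end

# `fetch` blocks until the corresponding task finishes, then returns its value.
totals = fetch.(tasks)
println(totals)  # [55, 5050, 500500]
```

Because each task only reads its own n and returns a value, no shared state is modified and the order in which the tasks finish does not matter.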

Thread Safety

When using multi-threading, memory is shared across threads. If a thread writes to a memory location that is written to or read from by another thread, that will lead to a race condition with unpredictable results. To illustrate:

julia> s = 0;

julia> Threads.@threads for i = 1:1000000
           global s += i
       end

julia> s
19566554653 # Should be 500000500000

Race condition

There are two methods we can use to avoid the race condition. The first involves using a lock:

julia> s = 0; l = ReentrantLock();

julia> Threads.@threads for i = 1:1000000
           lock(l) do
               global s += i
           end
       end

julia> s
500000500000

In this case, the addition can only occur on a given thread once that thread holds the lock. If a thread does not hold the lock, it must wait for whatever thread controls it to release the lock before it can run the code within the lock block.

Using a lock in this example is suboptimal, however, as it eliminates all parallelism because only one thread can hold the lock at any given moment. (In other examples, however, using a lock works great, particularly when only a small portion of the code depends on the lock.)

The other way to eliminate the race condition is to use task-local buffers:

julia> s = 0; chunks = partition(1:1000000, Threads.nthreads());

julia> tasks = map(chunks) do chunk
           Threads.@spawn begin
               x = 0
               for i in chunk
                   x += i
               end
               x
           end
       end;

julia> thread_sums = fetch.(tasks);

julia> for i in thread_sums
           s += i
       end

julia> s
500000500000

In this example, each spawned task has its own x that stores the sum of the values just in the task’s chunk of data. In particular, none of the tasks modify s. Then, once each task has computed its sum, the intermediate values are summed and stored in s in a single-threaded manner.

Using task-local buffers works better for this example than using a lock because most of the parallelism is preserved.

(Note that it used to be advised to manage task-local buffers using the threadid function. However, doing so does not guarantee each task uses its own buffer. Therefore, the method demonstrated in the above example is now advised.)
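
The task-local pattern can be wrapped in a reusable function. The following is only a sketch under a few assumptions: chunked_sum is a hypothetical name invented here, it uses Base's Iterators.partition (fixed-size chunks) rather than the partition helper defined earlier in this post, and it requires a non-empty input.

```julia
# Sum `f(x)` over all `x` in `xs`, using one spawned task per chunk.
# Hypothetical helper for illustration; not from any package.
function chunked_sum(f, xs; ntasks = Threads.nthreads())
    isempty(xs) && throw(ArgumentError("xs must be non-empty"))
    # Chunk length chosen so that at most `ntasks` chunks are created.
    chunk_len = cld(length(xs), ntasks)
    tasks = map(Iterators.partition(xs, chunk_len)) do chunk
        # Each task accumulates into its own local result; nothing is shared.
        Threads.@spawn sum(f, chunk)
    end
    # Combine the per-task results on the calling task.
    return sum(fetch, tasks)
end

println(chunked_sum(identity, 1:1_000_000))  # 500000500000
```

As in the REPL example above, no task writes to shared state, so no lock is needed.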

Packages for Quickly Utilizing Multi-Threading

In addition to writing your own multi-threaded code, there exist packages that utilize multi-threading. Two such examples are ThreadsX.jl and ThreadTools.jl.

ThreadsX.jl provides multi-threaded implementations of several common functions such as sum and sort, while ThreadTools.jl provides tmap, a multi-threaded version of map.

These packages can be great for quickly boosting performance without having to figure out multi-threading on your own.

Distributed Computing

Besides multi-threading,Julia also provides for distributed computing,or splitting work across multiple Julia processes.

There are two ways to start multiple Julia processes:

  1. Load the Distributed standard library packagewith using Distributedand then use addprocs.For example, addprocs(2)to add two additional Julia processes(for a total of three).
  2. Run Julia with the -p command line argument.For example, julia -p 2to start Julia with three total Julia processes.(Note that running Julia with -pwill implicitly load Distributed.)

Added processes are known as worker processes,while the original process is the main process.Each process has an id:the main process has id 1,and worker processes have id 2, 3, etc.

By default, code runs on the main process. To run code on a worker, we need to explicitly give code to that worker. We can do so with remotecall_fetch, which takes as inputs a function to run, the process id to run the function on, and the input arguments and keyword arguments the function needs. Here are some examples:

# Create a zero-argument anonymous function to run on worker 2.
julia> remotecall_fetch(2) do
           println("Done")
       end
      From worker 2:    Done

# Create a two-argument anonymous function to run on worker 2.
julia> remotecall_fetch((a, b) -> a + b, 2, 1, 2)
3

# Run `sum([1 3; 2 4]; dims = 1)` on worker 3.
julia> remotecall_fetch(sum, 3, [1 3; 2 4]; dims = 1)
1×2 Matrix{Int64}:
 3  7

If you don’t need to wait for the result immediately, use remotecall instead of remotecall_fetch. This will create a Future that you can later wait on or fetch (similarly to a Task spawned with Threads.@spawn).
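A minimal sketch of this non-blocking pattern is shown below. It assumes one worker is added just for the example; the variable names are illustrative only.

```julia
using Distributed

addprocs(1)
w = first(workers())  # id of some worker process

# `remotecall` returns immediately with a `Future`.
fut = remotecall(sum, w, 1:100)

# The main process is free to do other work in the meantime.
local_result = sum(101:200)

# `fetch` blocks until the remote result is available.
remote_result = fetch(fut)
total = local_result + remote_result
println(total)
```

Here the remote sum and the local sum can overlap in time, and fetch is only called once the remote value is actually needed.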


Separate Memory Spaces

One significant difference between multi-threading and distributed processing is that memory is shared in multi-threading, while each distributed process has its own separate memory space. This has several important implications:

  • To use a package on a given worker, it must be loaded on that worker, not just on the main process. To illustrate:

    julia> using LinearAlgebra

    julia> I
    UniformScaling{Bool}
    true*I

    julia> remotecall_fetch(() -> I, 2)
    ERROR: On worker 2:
    UndefVarError: `I` not defined

    To avoid the error, we could use @everywhere using LinearAlgebra to load LinearAlgebra on all processes.

  • Similarly to the previous point, functions defined on one process are not available on other processes. Prepend a function definition with @everywhere to allow using the function on all processes:

    julia> @everywhere function myadd(a, b)
               a + b
           end;

    julia> myadd(1, 2)
    3

    # This would error without `@everywhere` above.
    julia> remotecall_fetch(myadd, 2, 3, 4)
    7

  • Global variables are not shared, even if defined everywhere with @everywhere:

    julia> @everywhere x = [0];

    julia> remotecall_fetch(2) do
               x[1] = 2
           end;

    # `x` was modified on worker 2.
    julia> remotecall_fetch(() -> x, 2)
    1-element Vector{Int64}:
     2

    # `x` was not modified on worker 3.
    julia> remotecall_fetch(() -> x, 3)
    1-element Vector{Int64}:
     0

    If needed, an array of data can be shared across processes by using a SharedArray, provided by the SharedArrays standard library package:

    julia> @everywhere using SharedArrays

    # We don't need `@everywhere` when defining a `SharedArray`.
    julia> x = SharedArray{Int,1}(1)
    1-element SharedVector{Int64}:
     0

    julia> remotecall_fetch(2) do
               x[1] = 2
           end;

    julia> remotecall_fetch(() -> x, 2)
    1-element SharedVector{Int64}:
     2

    julia> remotecall_fetch(() -> x, 3)
    1-element SharedVector{Int64}:
     2
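Because workers do not share memory with the main process, one common approach is to have each worker return its result instead of modifying a global. The sketch below mirrors the chunked sum from the multi-threading section under a few assumptions: chunk_sum is a hypothetical helper defined with @everywhere, two workers are added just for the example, and the chunks are built with Iterators.partition using an explicit chunk length.

```julia
using Distributed

addprocs(2)

# Hypothetical helper; `@everywhere` defines it on the main process
# and on every worker.
@everywhere chunk_sum(chunk) = sum(chunk)

data = 1:1_000_000
chunk_len = cld(length(data), nworkers())
chunks = collect(Iterators.partition(data, chunk_len))

# Start one remote call per chunk, cycling through the available workers.
ws = workers()
futures = map(enumerate(chunks)) do (i, chunk)
    w = ws[mod1(i, length(ws))]
    remotecall(chunk_sum, w, chunk)
end

# Each worker sends back only its partial sum; combine them on the main process.
total = sum(fetch.(futures))
println(total)
```

Only the chunk (here a small range object) and the partial sum cross the process boundary, so no shared state is required.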

Now, a note about command line arguments. When adding worker processes with -p, those processes are spawned with the same command line arguments as the main Julia process. With addprocs, however, each of those added processes is started with no command line arguments. Below is an example of where this behavior might cause some confusion:

$ JULIA_NUM_THREADS=4 julia --banner=no -t 1

julia> Threads.nthreads()
1

julia> using Distributed

julia> addprocs(1);

julia> remotecall_fetch(Threads.nthreads, 2)
4

In this situation, we have the environment variable JULIA_NUM_THREADS set (for example, because normally we run Julia with four threads). But in this particular case we want to run Julia with just one thread, so we set -t 1. Then we add a process, but it turns out that process has four threads, not one! This is because the environment variable was set, but no command line arguments were given to the added process. To use just one thread for the added process, we would need to use the exeflags keyword argument to addprocs:

addprocs(1; exeflags = ["-t 1"])
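To check that the flag took effect, the thread count can be queried on the new worker. This is a small sketch that assumes the long-form flag --threads=1 (equivalent to -t 1) is passed as a single string; pid and worker_threads are illustrative names.

```julia
using Distributed

# Start one worker with an explicit thread count.
# `--threads=1` is the long form of `-t 1`.
pid = only(addprocs(1; exeflags = "--threads=1"))

# Ask the new worker how many threads it has.
worker_threads = remotecall_fetch(Threads.nthreads, pid)
println("worker ", pid, " has ", worker_threads, " thread(s)")
```

Since the command line flag takes precedence over JULIA_NUM_THREADS, the worker should report one thread even when the environment variable is set.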

As a final note, if needed, processes can be removed with rmprocs, which removes the processes associated with the provided worker ids.
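For instance, workers can be removed using the ids returned by addprocs. The following is a minimal sketch; the variable names are illustrative.

```julia
using Distributed

added = addprocs(2)   # ids of the newly added workers
before = nprocs()

# Remove exactly the workers added above. By default, `rmprocs`
# waits until those processes have exited.
rmprocs(added)

after = nprocs()
println("processes before: ", before, ", after: ", after)
```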

Summary

In this post, we have provided an introduction to parallel processing in Julia. We discussed the basics of both multi-threading and distributed computing, how to use them in Julia, and some things to watch out for.

As a parting piece of advice, when choosing whether to use multi-threading or distributed processing, choose multi-threading unless you have a specific need for multiple processes with distinct memory spaces. Multi-threading has lower overhead and generally is easier to use.

How do you use parallel processing in your code? Let us know in the comments below!

Additional Links