
Efficient Julia Optimization through an MRI Case Study

By: Steven Whitaker

Re-posted from: https://blog.glcs.io/optimization-mri

Welcome to our first Julia for Devs blog post! This will be a continuously running series of posts where our team will discuss how we have used Julia to solve real-life problems. So, let’s get started!

In this Julia for Devs post, we will discuss using Julia to optimize scan parameters in magnetic resonance imaging (MRI).

If you are just interested in seeing example code showing how to use Julia to optimize your own cost function, feel free to skip ahead.

While we will discuss MRI specifically, the concepts (and even some of the code) we will see will be applicable in many situations where optimization is useful, particularly when there is a model for an observable signal. The model could be, e.g., a mathematical equation or a trained neural network, and the observable signal (aka the output of the model) could be, e.g., the image intensity of a medical imaging scan or the energy needed to heat a building.

Problem Setup

In this post, the signal model we will use is a mathematical equation that gives the image intensity of a balanced steady-state free precession (bSSFP) MRI scan. This model has two sets of inputs: those that are user-defined and those that are not. In MRI, user-defined inputs are scan parameters, such as how long to run the scan for or how much power to use to generate MRI signal. The other input parameters are called tissue parameters because they are properties that are intrinsic to the imaged tissue (e.g., muscle, brain, etc.).

Tissue parameters often are assumed to take on a pre-defined value for each specific tissue type. However, because they are tissue-specific and can vary with health and age, sometimes tissue parameters are considered to be unknown values that need to be estimated from data.

The problem we will discuss in this post is how to estimate tissue parameters from a set of bSSFP scans. Then we will discuss how to optimize the scan parameters of the set of bSSFP scans to improve the tissue parameter estimates.

To estimate tissue parameters, we need the following:

  1. a signal model, and
  2. an estimation algorithm.

Signal Model

Here’s the signal model:

# Reference: https://www.mriquestions.com/true-fispfiesta.html
function bssfp(TR, flip, T1, T2)
    E1 = exp(-TR / T1)
    E2 = exp(-TR / T2)
    (sin, cos) = sincosd(flip)
    num = sin * (1 - E1)
    den = 1 - cos * (E1 - E2) - E1 * E2
    signal = num / den
    return signal
end

This function returns the bSSFP signal as a function of two scan parameters (TR, a value in units of seconds, and flip, a value in units of degrees) and two tissue parameters (T1 and T2, each a value in units of seconds).
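As a quick illustration, here is one way to call it (the parameter values below are chosen arbitrarily for this sketch, not taken from the experiments later in the post):

# Minimal usage sketch: one bSSFP scan with TR = 5 ms and a 30-degree flip angle,
# simulated for a tissue with T1 = 1 s and T2 = 0.08 s.
signal = bssfp(0.005, 30, 1.0, 0.08)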

Estimation Algorithm

There are various estimation algorithms out there, but, for simplicity in this post, we will stick with a grid search. We will compute the bSSFP signal over many pairs of T1 and T2 values. We will compare these computed signals to the actual observed signal, and the pair of T1 and T2 values that results in the closest match will be chosen as the estimates of the tissue parameters. Here’s the algorithm in code:

# `signal` is the observed signal.
# `TR` and `flip` are the scan parameters that were used,
# and we want to estimate `T1` and `T2`.
# `signal`, `TR` and `flip` should be vectors of the same length,
# representing running multiple scans
# and recording the observed signal for each pair of `TR` and `flip` values.
function gridsearch(signal, TR, flip)
    # Specify the grid of values to search over.
    # `T1_grid` and `T2_grid` could optionally be inputs to this function.
    T1_grid = exp10.(range(log10(0.5), log10(3), 40))
    T2_grid = exp10.(range(log10(0.005), log10(0.7), 40))
    T1_est = T1_grid[1]
    T2_est = T2_grid[1]
    best = Inf
    # Pre-allocate memory for some computations to speed up the following loop.
    residual = similar(signal)
    # Iterate over the Cartesian product of `T1_grid` and `T2_grid`.
    for (T1, T2) in Iterators.product(T1_grid, T2_grid)
        # Physical constraint: T1 is greater than T2, and both are positive.
        T1 > T2 > 0 || continue
        # For the given T1 and T2 pair, compute the (noiseless) bSSFP signal
        # for each bSSFP scan and subtract them from the given signals.
        # Tip: In Julia, one can apply a scalar function elementwise on vector
        #      inputs using the "dot" notation (see the `.` after `bssfp` below).
        residual .= signal .- bssfp.(TR, flip, T1, T2)
        # Compute the norm squared of the above difference.
        err = residual' * residual
        # If the candidate T1 and T2 pair fit the given signals better,
        # keep them as the current estimate of T1 and T2.
        if err < best
            best = err
            T1_est = T1
            T2_est = T2
        end
    end
    isinf(best) && error("no valid grid points; consider changing `T1_grid` and/or `T2_grid`")
    return (T1_est, T2_est)
end

Cost Function

Now, to optimize scan parameters we need a function to optimize, also called a cost function. In this case, we want to minimize the error between what our estimator tells us and the true tissue parameter values. But because we need to optimize scan parameters before running any scans, we will simulate bSSFP MRI scans using the given sets of scan parameters and average the estimator error over several sets of tissue parameters. Here’s the code for the cost function:

# Load the Statistics standard library package to get access to the `mean` function.
using Statistics

# `TR` and `flip` are vectors of the same length
# specifying the pairs of scan parameters to use.
function estimator_error(TR, flip)
    # Specify the grid of values to average over and the noise level to simulate.
    # For a given pair of `T1_true` and `T2_true` values,
    # bSSFP signals will be simulated, and then the estimator
    # will attempt to estimate what `T1_true` and `T2_true` values were used
    # based on the simulated bSSFP signals and the given scan parameters.
    T1_true = [0.8, 1.0, 1.5]
    T2_true = [0.05, 0.08, 0.1]
    noise_level = 0.01
    T1_avg = mean(T1_true)
    T2_avg = mean(T2_true)
    # Pre-allocate memory for some computations to speed up the following loop.
    signal = float.(TR)
    # The following computes the mean estimator error over all pairs
    # of true T1 and T2 values.
    return mean(Iterators.product(T1_true, T2_true)) do (T1, T2)
        # Ignore pairs for which `T2 > T1`.
        T2 > T1 && return 0
        # Simulate noisy signal using the true T1 and T2 values.
        signal .= bssfp.(TR, flip, T1, T2) .+ noise_level .* randn.()
        # Estimate T1 and T2 from the noisy signal.
        (T1_est, T2_est) = gridsearch(signal, TR, flip)
        # Compare estimates to truth.
        T1_err = (T1_est - T1)^2
        T2_err = (T2_est - T2)^2
        # Combine the T1 and T2 errors into one single error metric
        # by averaging the respective errors
        # after normalizing by the mean true value of each parameter.
        err = (T1_err / T1_avg + T2_err / T2_avg) / 2
    end
end

Optimization

Now we are finally ready to optimize! We will use Optimization.jl. Many optimizers are available, but because we have a non-convex optimization problem, we will use the adaptive particle swarm global optimization algorithm. Here’s a function that does the optimization:

# Load needed packages.
# Optimization.jl provides the optimization infrastructure,
# while OptimizationOptimJL.jl wraps the Optim.jl package
# that provides the optimization algorithm we will use.
using Optimization, OptimizationOptimJL

# Load the Random standard library package to enable setting the random seed.
using Random

# `TR_init` and `flip_init` are vectors of the same length
# that provide a starting point for the optimization algorithm.
# The length of these vectors also determines the number of bSSFP scans to simulate.
function scan_optimization(TR_init, flip_init)
    # Ensure randomly generated noise is the same for each evaluation.
    Random.seed!(0)
    N_scans = length(TR_init)
    length(flip_init) == N_scans || error("`TR_init` and `flip_init` have different lengths")
    # The following converts the cost function we created (`estimator_error`)
    # into a form that Optimization.jl can use.
    # Specifically, Optimization.jl needs a cost function that takes two inputs
    # (the first of which contains the parameters to optimize)
    # and returns a real number.
    # The input `x` is a concatenation of `TR` and `flip`,
    # i.e., `TR == x[1:N_scans]` and `flip == x[N_scans+1:end]`.
    # The input `p` is unused, but is needed by Optimization.jl.
    cost_fun = (x, p) -> estimator_error(x[1:N_scans], x[N_scans+1:end])
    # Specify constraints.
    # The lower and upper bounds are chosen to ensure reasonable bSSFP scans.
    (min_TR, max_TR) = (0.001, 0.02)
    (min_flip, max_flip) = (1, 90)
    constraints = (;
        lb = [fill(min_TR, N_scans); fill(min_flip, N_scans)],
        ub = [fill(max_TR, N_scans); fill(max_flip, N_scans)],
    )
    # Set up and solve the problem.
    f = OptimizationFunction(cost_fun)
    prob = OptimizationProblem(f, [TR_init; flip_init]; constraints...)
    sol = solve(prob, ParticleSwarm(lower = prob.lb, upper = prob.ub, n_particles = 3))
    # Extract the optimized TRs and flip angles, remembering that `sol.u == [TR; flip]`.
    TR_opt = sol.u[1:N_scans]
    flip_opt = sol.u[N_scans+1:end]
    return (TR_opt, flip_opt)
end

Results

Let’s see how scan parameter optimization can improve tissue parameter estimates. We will use the following function to take scan parameters, simulate bSSFP scans, and then estimate the tissue parameters from those scans. We will compare the results of this function with manually chosen scan parameters to those with optimized scan parameters.

# Load the LinearAlgebra standard library package for access to `norm`.
using LinearAlgebra

# Load Plots.jl for displaying the results.
using Plots

# Helper function for plotting.
function plot_true_vs_est(T_true, T_est, title, rmse, clim)
    return heatmap([T_true T_est];
        title,
        clim,
        xlabel = "RMSE = $(round(rmse; digits = 4)) s",
        xticks = [],
        yticks = [],
        showaxis = false,
        aspect_ratio = 1,
    )
end

function run(TR, flip)
    # Create a synthetic object to scan.
    # The background of the object will be indicated with values of `0.0`
    # for the tissue parameters.
    nx = ny = 128
    object = map(Iterators.product(range(-1, 1, nx), range(-1, 1, ny))) do (x, y)
        r = hypot(x, y)
        if r < 0.5
            return (0.8, 0.08)
        elseif r < 0.8
            return (1.3, 0.09)
        else
            return (0.0, 0.0)
        end
    end
    # Simulate the bSSFP scans.
    T1_true = first.(object)
    T2_true = last.(object)
    noise_level = 0.001
    signal = map(T1_true, T2_true) do T1, T2
        # Ignore the background of the object.
        (T1, T2) == (0.0, 0.0) && return 0.0
        bssfp.(TR, flip, T1, T2) .+ noise_level .* randn.()
    end
    # Estimate the tissue parameters.
    T1_est = zeros(nx, ny)
    T2_est = zeros(nx, ny)
    for i in eachindex(signal, T1_est, T2_est)
        # Don't try to estimate tissue parameters for the background.
        signal[i] == 0.0 && continue
        (T1_est[i], T2_est[i]) = gridsearch(signal[i], TR, flip)
    end
    # Compute the root mean squared error.
    m = T1_true .!= 0.0
    T1_rmse = sqrt(norm(T1_true[m] - T1_est[m]) / count(m))
    T2_rmse = sqrt(norm(T2_true[m] - T2_est[m]) / count(m))
    # Plot the results.
    p_T1 = plot_true_vs_est(T1_true, T1_est, "True vs Estimated T1", T1_rmse, (0, 2.5))
    p_T2 = plot_true_vs_est(T2_true, T2_est, "True vs Estimated T2", T2_rmse, (0, 0.25))
    return (p_T1, p_T2)
end

First, let’s see the results with no optimization:

TR_init = [0.005, 0.005, 0.005]
flip_init = [30, 60, 80]

(p_T1_init, p_T2_init) = run(TR_init, flip_init)
display(p_T1_init)

T1 estimates using initial scan parameters

display(p_T2_init)

T2 estimates using initial scan parameters

Now let’s optimize the scan parameters and then see how the tissue parameter estimates look:

(TR_opt, flip_opt) = scan_optimization(TR_init, flip_init)
(p_T1_opt, p_T2_opt) = run(TR_opt, flip_opt)
display(p_T1_opt)

T1 estimates using optimized scan parameters

display(p_T2_opt)

T2 estimates using optimized scan parameters

We can see that the optimized scans result in much better tissue parameter estimates!

Summary

In this post, we saw how Julia can be used to optimize MRI scan parameters to improve tissue parameter estimates. Even though we discussed MRI specifically, the concepts presented here easily extend to other applications where signal models are known and optimization is required.

Note that most of the code for this post was taken from a Dash app I helped create for JuliaHub. Feel free to check it out if you want to see this code in action!


Julia’s Parallel Processing

By: Steven Whitaker

Re-posted from: https://glcs.hashnode.dev/parallel-processing

Julia is a relatively new, free, and open-source programming language. It has a syntax similar to that of other popular programming languages such as MATLAB and Python, but it boasts being able to achieve C-like speeds.

While serial Julia code can be fast, sometimes even more speed is desired. In many cases, writing parallel code can further reduce run time. Parallel code takes advantage of the multiple CPU cores included in modern computers, allowing multiple computations to run at the same time, or in parallel.

Julia provides two methods for writing parallel CPU code: multi-threading and distributed computing. This post will cover the basics of how to use these two methods of parallel processing.

This post assumes you already have Julia installed. If you haven’t yet, check out our earlier post on how to install Julia.

Multi-Threading

First, let’s learn about multi-threading.

To enable multi-threading, you must start Julia in one of two ways:

  1. Set the environment variable JULIA_NUM_THREADS to the number of threads Julia should use, and then start Julia. For example, JULIA_NUM_THREADS=4.

  2. Run Julia with the --threads (or -t) command line argument. For example, julia --threads 4 or julia -t 4.

After starting Julia (either with or without specifying the number of threads), the Threads module will be loaded. We can check the number of threads Julia has available:

julia> Threads.nthreads()
4

The simplest way to start writing parallel code is just to use the Threads.@threads macro. Inserting this macro before a for loop will cause the iterations of the loop to be split across the available threads, which will then operate in parallel. For example:

Threads.@threads for i = 1:10
    func(i)
end

Without Threads.@threads, first func(1) will run, then func(2), and so on. With the macro, and assuming we started Julia with four threads, first func(1), func(4), func(7), and func(9) will run in parallel. Then, when a thread’s iteration finishes, it will start another iteration (assuming the loop is not done yet), regardless of whether the other threads have finished their iterations yet. Therefore, this loop will theoretically finish 10 iterations in the time it takes a single thread to do 3.

Note that Threads.@threads is blocking, meaning code after the threaded for loop will not run until the loop has finished.
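Here is a minimal sketch of that blocking behavior (assuming Julia was started with multiple threads; sleep stands in for real work):

Threads.@threads for i = 1:10
    sleep(1) # stand-in for real work
end
# This line runs only after all 10 iterations above have finished.
println("loop done")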

Image of threaded for loop

Julia also provides another macro for multi-threading: Threads.@spawn. This macro is more flexible than Threads.@threads because it can be used to run any code on a thread, not just for loops. But let’s illustrate how to use Threads.@spawn by implementing the behavior of Threads.@threads:

# Function for splitting up `x` as evenly as possible
# across `np` partitions.
function partition(x, np)
    (len, rem) = divrem(length(x), np)
    Base.Generator(1:np) do p
        i1 = firstindex(x) + (p - 1) * len
        i2 = i1 + len - 1
        if p <= rem
            i1 += p - 1
            i2 += p
        else
            i1 += rem
            i2 += rem
        end
        chunk = x[i1:i2]
    end
end

N = 10
chunks = partition(1:10, Threads.nthreads())

tasks = map(chunks) do chunk
    Threads.@spawn for i in chunk
        func(i)
    end
end

wait.(tasks)

Let’s walk through this code, assuming Threads.nthreads() == 4:

  • First, we split the 10 iterations evenly across the 4 threads using partition. So, chunks ends up being [1:3, 4:6, 7:8, 9:10]. (We could have hard-coded the partitioning, but now you have a nice partition function that can work with more complicated partitionings!)

  • Then, for each chunk, we create a Task via Threads.@spawn that will call func on each element of the chunk. This Task will be scheduled to run on an available thread. tasks contains a reference to each of these spawned Tasks.

  • Finally, we wait for the Tasks to finish with the wait function.

To reemphasize, note that Threads.@spawn creates a Task; it does not wait for the task to run. As such, it is non-blocking, and program execution continues as soon as the Task is returned. The code wrapped in the task will also run, but in parallel, on a separate thread. This behavior is illustrated below:

julia> Threads.@spawn (sleep(2); println("Spawned task finished"))
Task (runnable) @0x00007fdd4b10dc30

julia> 1 + 1 # This code executes without waiting for the above task to finish
2

julia> Spawned task finished # Prints 2 seconds after spawning the above task

julia>

Spawned tasks can also return data. While wait just waits for a task to finish, fetch waits for a task and then obtains the result:

julia> task = Threads.@spawn (sleep(2); 1 + 1)
Task (runnable) @0x00007fdd4a5e28b0

julia> fetch(task)
2

Thread Safety

When using multi-threading, memory is shared across threads. If one thread writes to a memory location that another thread also reads from or writes to, the result is a race condition with unpredictable behavior. To illustrate:

julia> s = 0;

julia> Threads.@threads for i = 1:1000000
           global s += i
       end

julia> s
19566554653 # Should be 500000500000

Race condition

There are two methods we can use to avoid the race condition. The first involves using a lock:

julia> s = 0; l = ReentrantLock();

julia> Threads.@threads for i = 1:1000000
           lock(l) do
               global s += i
           end
       end

julia> s
500000500000

In this case, the addition can only occur on a given thread once that thread holds the lock. If a thread does not hold the lock, it must wait for whatever thread holds it to release the lock before it can run the code within the lock block.

Using a lock in this example is suboptimal, however, as it eliminates all parallelism because only one thread can hold the lock at any given moment. (In other examples, however, using a lock works great, particularly when only a small portion of the code depends on the lock.)
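As a rough sketch of such a case (expensive_computation below is a hypothetical stand-in for real work), the expensive part still runs in parallel and only the brief update is serialized:

results = Float64[]
l = ReentrantLock()
Threads.@threads for i = 1:100
    y = expensive_computation(i) # hypothetical function; runs in parallel
    # Only this short section is serialized by the lock.
    lock(l) do
        push!(results, y)
    end
end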

The other way to eliminate the race condition is to use task-local buffers:

julia> s = 0; chunks = partition(1:1000000, Threads.nthreads());

julia> tasks = map(chunks) do chunk
           Threads.@spawn begin
               x = 0
               for i in chunk
                   x += i
               end
               x
           end
       end;

julia> thread_sums = fetch.(tasks);

julia> for i in thread_sums
           s += i
       end

julia> s
500000500000

In this example, each spawned task has its own x that stores the sum of the values just in the task’s chunk of data. In particular, none of the tasks modify s. Then, once each task has computed its sum, the intermediate values are summed and stored in s in a single-threaded manner.

Using task-local buffers works better for this example than using a lock because most of the parallelism is preserved.

(Note that it used to be advised to manage task-local buffers using the threadid function. However, doing so does not guarantee each task uses its own buffer. Therefore, the method demonstrated in the above example is now advised.)

Packages for Quickly Utilizing Multi-Threading

In addition to writing your own multi-threaded code, there exist packages that utilize multi-threading. Two such examples are ThreadsX.jl and ThreadTools.jl.

ThreadsX.jl provides multi-threaded implementations of several common functions such as sum and sort, while ThreadTools.jl provides tmap, a multi-threaded version of map.

These packages can be great for quickly boosting performance without having to figure out multi-threading on your own.
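For example, usage might look like the following (a sketch only; check each package's documentation for the exact APIs and options):

using ThreadsX, ThreadTools

x = rand(1_000_000)

ThreadsX.sum(x)   # multi-threaded sum
ThreadsX.sort(x)  # multi-threaded sort (returns a new sorted vector)
tmap(sqrt, x)     # multi-threaded version of `map(sqrt, x)` from ThreadTools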

Distributed Computing

Besides multi-threading, Julia also provides for distributed computing, or splitting work across multiple Julia processes.

There are two ways to start multiple Julia processes:

  1. Load the Distributed standard library package with using Distributed and then use addprocs. For example, addprocs(2) to add two additional Julia processes (for a total of three).

  2. Run Julia with the -p command line argument. For example, julia -p 2 to start Julia with three total Julia processes. (Note that running Julia with -p will implicitly load Distributed.)

Added processes are known as worker processes, while the original process is the main process. Each process has an id: the main process has id 1, and worker processes have id 2, 3, etc.
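For example, after adding two workers the process ids can be inspected like this (a minimal sketch):

julia> using Distributed; addprocs(2);

julia> procs()    # ids of all processes, including the main process
3-element Vector{Int64}:
 1
 2
 3

julia> workers()  # ids of just the worker processes
2-element Vector{Int64}:
 2
 3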

By default, code runs on the main process. To run code on a worker, we need to explicitly give code to that worker. We can do so with remotecall_fetch, which takes as inputs a function to run, the process id to run the function on, and the input arguments and keyword arguments the function needs. Here are some examples:

# Create a zero-argument anonymous function to run on worker 2.
julia> remotecall_fetch(2) do
           println("Done")
       end
      From worker 2:    Done

# Create a two-argument anonymous function to run on worker 2.
julia> remotecall_fetch((a, b) -> a + b, 2, 1, 2)
3

# Run `sum([1 3; 2 4]; dims = 1)` on worker 3.
julia> remotecall_fetch(sum, 3, [1 3; 2 4]; dims = 1)
1×2 Matrix{Int64}:
 3  7

If you don’t need to wait for the result immediately, use remotecall instead of remotecall_fetch. This will create a Future that you can later wait on or fetch (similarly to a Task spawned with Threads.@spawn).
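Here is a minimal sketch (assuming a worker with id 2 exists, e.g., after addprocs(1)):

future = remotecall(sum, 2, 1:100) # returns a `Future` immediately, without waiting
# ... do other work on the main process ...
result = fetch(future)             # waits for the remote call and retrieves the result (5050)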

Super computer

Separate Memory Spaces

One significant difference between multi-threading and distributed processing is that memory is shared in multi-threading, while each distributed process has its own separate memory space. This has several important implications:

  • To use a package on a given worker, it must be loaded on that worker, not just on the main process. To illustrate:

julia> using LinearAlgebra

julia> I
UniformScaling{Bool}
true*I

julia> remotecall_fetch(() -> I, 2)
ERROR: On worker 2:
UndefVarError: `I` not defined

    To avoid the error, we could use @everywhere using LinearAlgebra to load LinearAlgebra on all processes.

  • Similarly to the previous point, functions defined on one process are not available on other processes. Prepend a function definition with @everywhere to allow using the function on all processes:

julia> @everywhere function myadd(a, b)
           a + b
       end;

julia> myadd(1, 2)
3

# This would error without `@everywhere` above.
julia> remotecall_fetch(myadd, 2, 3, 4)
7
  • Global variables are not shared, even if defined everywhere with @everywhere:

julia> @everywhere x = [0];

julia> remotecall_fetch(2) do
           x[1] = 2
       end;

# `x` was modified on worker 2.
julia> remotecall_fetch(() -> x, 2)
1-element Vector{Int64}:
 2

# `x` was not modified on worker 3.
julia> remotecall_fetch(() -> x, 3)
1-element Vector{Int64}:
 0

    If needed, an array of data can be shared across processes by using a SharedArray, provided by the SharedArrays standard library package:

julia> @everywhere using SharedArrays

# We don't need `@everywhere` when defining a `SharedArray`.
julia> x = SharedArray{Int,1}(1)
1-element SharedVector{Int64}:
 0

julia> remotecall_fetch(2) do
           x[1] = 2
       end;

julia> remotecall_fetch(() -> x, 2)
1-element SharedVector{Int64}:
 2

julia> remotecall_fetch(() -> x, 3)
1-element SharedVector{Int64}:
 2

Now, a note about command line arguments. When adding worker processes with -p, those processes are spawned with the same command line arguments as the main Julia process. With addprocs, however, each added process is started with no command line arguments. Below is an example of where this behavior might cause some confusion:

$ JULIA_NUM_THREADS=4 julia --banner=no -t 1

julia> Threads.nthreads()
1

julia> using Distributed

julia> addprocs(1);

julia> remotecall_fetch(Threads.nthreads, 2)
4

In this situation, we have the environment variable JULIA_NUM_THREADS set to 4 (for example, because normally we run Julia with four threads). But in this particular case we want to run Julia with just one thread, so we pass -t 1. Then we add a process, but it turns out that process has four threads, not one! This is because the environment variable was set, but no command line arguments were given to the added process. To use just one thread for the added process, we would need to use the exeflags keyword argument to addprocs:

addprocs(1; exeflags = ["-t 1"])

As a final note, if needed, processes can be removed with rmprocs, which removes the processes associated with the provided worker ids.
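For example (a minimal sketch):

rmprocs(2)          # remove the worker with id 2
rmprocs(workers())  # remove all worker processes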

Summary

In this post, we have provided an introduction to parallel processing in Julia. We discussed the basics of both multi-threading and distributed computing, how to use them in Julia, and some things to watch out for.

As a parting piece of advice, when choosing whether to use multi-threading or distributed processing, choose multi-threading unless you have a specific need for multiple processes with distinct memory spaces. Multi-threading has lower overhead and generally is easier to use.

How do you use parallel processing in your code? Let us know in the comments below!


Exploring Julia 1.10 – Key Features and Updates

By: Steven Whitaker

Re-posted from: https://glcs.hashnode.dev/julia-1-10

A new version of the Julia programming language was just released! Version 1.10 is now the latest stable version of Julia.

This release is a minor release, meaning it includes language enhancements and bug fixes but should also be fully compatible with code written in previous Julia versions (from version 1.0 and onward).

In this post, we will check out some of the features and improvements introduced in this newest Julia version. Read the full post, or click on the links below to jump to the features that interest you.

If you are new to Julia (or just need a refresher), feel free to check out our Julia tutorial series, beginning with how to install Julia and VS Code.

Improved Latency, or Getting Started Faster

Julia 1.10 has improved latency, which means you can get started faster.

Two sources of latency historically have been slow in Julia: package loading and just-in-time code compilation. A classic example where this latency was readily noticeable was when trying to create a plot; consequently, this latency often is called the time to first plot (TTFP), or how long one has to wait before seeing a plot.

Note that the TTFP issue exists in the first place because Julia was designed with a trade-off in mind: by taking the time to compile a function the first time it is called, subsequent calls to the function can run at speeds comparable to C. This, however, leads to increased latency on the first call.

Recent Julia versions have been tackling this issue, and Julia 1.10 further improves latency.

Below is a screenshot of a slide shared during the State of Julia talk at JuliaCon 2023. It shows how the time it takes to load Plots.jl and then call plot decreases when moving from Julia 1.8 to Julia 1.9 and then to Julia 1.10 (in this case, Julia 1.10 wasn’t released yet, so the alpha version was used).

Improved latency

I saw similar results on my computer comparing Julia 1.9.4 to Julia 1.10.0-rc1 (the first release candidate of Julia 1.10):

# Julia 1.9.4
julia> @time using Plots
  1.278046 seconds (3.39 M allocations: 194.392 MiB, 10.10% gc time, 6.28% compilation time: 89% of which was recompilation)

julia> @time display(plot(1:10))
  0.365514 seconds (246.08 k allocations: 16.338 MiB, 58.76% compilation time: 10% of which was recompilation)

# Julia 1.10.0-rc1
julia> @time using Plots
  0.713279 seconds (1.42 M allocations: 97.684 MiB, 3.30% gc time, 15.26% compilation time: 86% of which was recompilation)

julia> @time display(plot(1:10))
  0.257097 seconds (247.72 k allocations: 17.621 MiB, 6.29% gc time, 81.56% compilation time: 9% of which was recompilation)

It’s amazing how much latency has been improved!

Better Error Messages

Julia 1.10 now uses JuliaSyntax.jl as the default parser, replacing the old Lisp-based parser.

Having a new parser doesn’t change how the language runs, but the new parser does improve error messages, enabling easier debugging and creating a lower barrier to entry for new Julia users.

As an example, consider the following buggy code:

julia> count = 0;

julia> for i = 1:10
           count++
       end

Can you spot the error?

Julia 1.9 gives the following error message:

ERROR: syntax: unexpected "end"

Julia 1.10 gives the following:

ERROR: ParseError:
# Error @ REPL[2]:3:1
    count++
end
  invalid identifier

There are at least three improvements to the error message:

  1. The file location of the offending token is prominently displayed. (REPL[2]:3:1 means the second REPL entry, the third line, and the first character. This would be replaced with a file path and line and character numbers if the code were run in a file.)
  2. The specific offending token is pointed out with some context.
  3. It is now clear that an identifier (i.e., a variable name) was expected after count++. (Note that ++ is a user-definable infix operator in Julia; so just as a + end is an error, so too is count ++ end. A small example of defining ++ follows below.)
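As a quick sketch of that last point, ++ really can be given a meaning (string concatenation below is an arbitrary choice for illustration):

julia> (++)(a, b) = string(a, b);

julia> "Julia " ++ "1.10"
"Julia 1.10"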

Improved error messages are certainly a welcome addition!

Multithreaded Garbage Collection

Part of Julia’s garbage collection is now parallelized in Julia 1.10, resulting in faster garbage collection.

Below is a screenshot of a slide shared during the State of Julia talk at JuliaCon 2023. It shows the percentage of time a piece of code spent doing garbage collection in different Julia versions (here the master branch is a pre-release version of Julia 1.10). The takeaway is that using threads decreased garbage collection time!

Faster garbage collection

The parallelization is implemented using threads, and the number of threads available for garbage collection can be specified when starting Julia with the command line argument --gcthreads. For example, to use four threads for garbage collection:

julia --gcthreads=4

By default, --gcthreads is half the total number of threads Julia is started with.

Experiment with different numbers of garbage collection threads to see what works best for your code.

Timing Package Precompilation

Timing how long individual packages take to precompile is now easily achieved with Pkg.precompile(timing = true).

In Julia 1.9, Pkg.precompile reported just the overall time precompilation took:

julia> using Pkg; Pkg.precompile()
Precompiling project...
  20 dependencies successfully precompiled in 91 seconds. 216 already precompiled.

Pkg.precompile() (without the timing option) behaves the same in Julia 1.10. But now there is the option to report the precompilation time for individual packages:

julia> using Pkg; Pkg.precompile(timing = true)
Precompiling project...
  19850.9 ms   DataFrames
   2858.4 ms   Flux
  26206.5 ms   Plots
  3 dependencies successfully precompiled in 49 seconds. 235 already precompiled.

Now it is easy to see what packages precompile faster than others!

Broadcasting Defined for CartesianIndex

Julia 1.10 now defines broadcasting for CartesianIndex objects.

A CartesianIndex is a way to represent an index into a multidimensional array and can be useful for working with loops over arrays of arbitrary dimensionality.

Suppose we define the following:

julia> indices = [CartesianIndex(2, 3), CartesianIndex(4, 5)];

julia> I = CartesianIndex(1, 1);

In Julia 1.9, attempting to broadcast over a CartesianIndex (for example, indices .+ I) resulted in the following error:

ERROR: iteration is deliberately unsupported for CartesianIndex.

With broadcasting defined, where previously we would have to wrap the CartesianIndex in a Tuple (e.g., indices .+ (I,)), now the following works:

julia> indices .+ I
2-element Vector{CartesianIndex{2}}:
 CartesianIndex(3, 4)
 CartesianIndex(5, 6)

Summary

In this post, we learned about some of the new features and improvements introduced in Julia 1.10. Curious readers can check out the release notes for the full list of changes.

What are you most excited about in Julia 1.10? Let us know in the comments below!
