Julia 1.9.0 lives up to its promise

By: Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/05/12/julia190.html

Introduction

Julia 1.9.0 has been released this week.
This release was long awaited as it brings many significant improvements.
You can find a summary of the most important changes in the Julia 1.9 Highlights post.

Of all the additions, the most user-visible change is probably caching of native code.
In simple words, it means that when you use a package, the first time a function from this
package is run it should execute faster than in previous Julia releases.

The time to first execution was indeed a big pain point for many Julia users, so I am
really excited by this functionality. However, to see the benefits of native code caching,
the packages you use must be designed in a way that takes advantage of it.

So the practical question is whether, given the current state of the Julia package ecosystem,
we indeed see these benefits. I decided to answer it by running a
realistic data processing workflow on both Julia 1.8.5 and 1.9.0 and comparing the results.

For the test I used code from the demo I prepared for ODSC-EUROPE-2021.
The reason is that it covers all standard operations: reading and writing data to disk,
aggregation, grouping, joining, sorting, and reshaping.

The tests require the following packages: CSV.jl v0.10.10,
Chain.jl v0.5.0, DataFrames.jl v1.5.0, HTTP.jl v1.9.1.

The test

When presenting the results I will show each code snippet with its timings under Julia 1.8.5 and 1.9.0
below it (without showing the output to save space). All tests were started in a fresh
Julia session.

First let us look at package load time:

@time begin
    using Chain
    using CSV
    using DataFrames
    using HTTP
    using Statistics
end

Timings:

Julia 1.8.5:
4.041740 seconds (8.09 M allocations: 536.512 MiB, 4.93% gc time, 33.25% compilation time: 87% of which was recompilation)

Julia 1.9.0:
1.721003 seconds (1.73 M allocations: 109.727 MiB, 4.92% gc time, 5.46% compilation time: 81% of which was recompilation)

As you can see, the package load time is visibly improved. However, a significant amount of time is still
spent on recompilation, which means that it should be possible to improve things in the future with a better design of
package internals.

Next check the time to read and write a CSV file:

input = "https://raw.githubusercontent.com/Rdatatable\
         /data.table/master/vignettes/flights14.csv";
flights_bin = HTTP.get(input).body;
@time flights = CSV.read(flights_bin, DataFrame);
@time CSV.write("flights14.csv", flights);

Timings:

Julia 1.8.5:
  11.247200 seconds (3.19 M allocations: 202.963 MiB, 0.58% gc time, 98.33% compilation time)
  2.625601 seconds (15.55 M allocations: 469.223 MiB, 4.82% gc time, 53.75% compilation time)

Julia 1.9.0:
  1.023744 seconds (703.11 k allocations: 71.524 MiB, 2.36% gc time, 70.20% compilation time)
  1.147991 seconds (14.12 M allocations: 421.457 MiB, 6.22% gc time, 55.25% compilation time)

Here we see a really big gain. It is especially visible in the CSV reading time.

The next test is selecting a subset of rows from a data frame. I do it in four different ways:

@time flights[(flights.origin .== "EWR") .&& (flights.dest .== "PHL"), :];
@time filter(row -> row.origin == "EWR" && row.dest == "PHL", flights);
@time subset(flights, :origin => x -> x .== "EWR", :dest => x -> x .== "PHL");
@time subset(flights, :origin => ByRow(==("EWR")), :dest => ByRow(==("PHL")));

Timings:

Julia 1.8.5:
  0.842566 seconds (581.75 k allocations: 27.749 MiB, 99.43% compilation time)
  0.523396 seconds (2.07 M allocations: 69.233 MiB, 4.39% gc time, 85.76% compilation time)
  1.745602 seconds (1.78 M allocations: 90.815 MiB, 1.30% gc time, 99.59% compilation time: 2% of which was recompilation)
  0.565263 seconds (1.19 M allocations: 62.720 MiB, 3.93% gc time, 98.65% compilation time)

Julia 1.9.0:
  0.234892 seconds (255.22 k allocations: 17.176 MiB, 3.79% gc time, 98.27% compilation time)
  0.326977 seconds (1.47 M allocations: 46.091 MiB, 2.78% gc time, 83.76% compilation time)
  0.685728 seconds (560.88 k allocations: 37.067 MiB, 1.94% gc time, 99.13% compilation time)
  0.499620 seconds (534.75 k allocations: 37.376 MiB, 98.60% compilation time)

Again, in all cases we see a drop in the time to first execution. You might ask why we still see a lot of compilation.
The major reason is that in these examples we define new functions or data structures that cause compilation.
For example, x -> x .== "EWR" is an anonymous function that we have just created, so it was impossible to precompile it.
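
To see why such anonymous functions defeat precompilation, note that every anonymous function literal gets its
own type in Julia, so even two textually identical definitions lead to separate compiled specializations. A small
illustrative sketch (not part of the benchmark above):

f = x -> x .== "EWR"
g = x -> x .== "EWR"
typeof(f) == typeof(g)  # false: each anonymous function literal defines its own type
# therefore subset(flights, :origin => f) and subset(flights, :origin => g)
# each trigger compilation of their own specialization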

Let us now perform a groupby and a group selection by key values:

@time flights_idx = groupby(flights, [:origin, :dest]);
@time flights_idx[("EWR", "PHL")];

Timings:

Julia 1.8.5:
  1.436151 seconds (2.02 M allocations: 104.135 MiB, 1.86% gc time, 99.47% compilation time)
  0.569703 seconds (458.67 k allocations: 26.118 MiB, 4.46% gc time, 99.60% compilation time)

Julia 1.9.0:
  1.216488 seconds (1.16 M allocations: 80.253 MiB, 3.13% gc time, 98.56% compilation time)
  0.281608 seconds (214.53 k allocations: 16.605 MiB, 4.83% gc time, 99.62% compilation time)

We still see the benefits. Yet, you might ask why we see so much compilation even under Julia 1.9.0.
The reason is that, for example, a groupby on two columns is not very common, so DataFrames.jl decided to
leave it out of precompilation. For this reason, when you run groupby(flights, [:origin, :dest]),
native code for such a scenario is not cached. This is indeed a hard design decision for package
maintainers. You could add more and more precompilation statements to improve the coverage of
cached native code, but this comes at a cost, as it would increase both package installation time and
package load time.
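
For reference, below is a minimal sketch of how a package author could extend the coverage of cached native code
using PrecompileTools.jl. It is my assumption of what such a directive might look like, not the actual code used
by DataFrames.jl:

module MyTablesHelper  # hypothetical package

using DataFrames
using PrecompileTools

@setup_workload begin
    # small sample data used only while the package is being precompiled
    df = DataFrame(origin=["EWR", "JFK"], dest=["PHL", "LAX"])
    @compile_workload begin
        # exercising a two-column groupby here would cache native code for this scenario
        groupby(df, [:origin, :dest])
    end
end

end  # module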

Our next test is an aggregation operation. Again, I chose it to be non-trivial and to involve the
creation of an anonymous function:

@time combine(flights_idx) do sdf
    max_air_time = maximum(sdf.air_time)
    return count(sdf.air_time .== max_air_time)
end;

Timings:

Julia 1.8.5:
  1.748435 seconds (1.40 M allocations: 76.052 MiB, 1.69% gc time, 99.53% compilation time)

Julia 1.9.0:
  0.953018 seconds (921.98 k allocations: 58.030 MiB, 1.55% gc time, 99.54% compilation time)

Again we see a significant timing improvement.

It is time for a multi-step operation involving grouping, aggregation, and sorting:

@time @chain flights begin
    groupby(:month)
    combine(nrow, :dep_delay => mean)
    sort(:dep_delay_mean)
end;

Timings:

Julia 1.8.5:
  1.648610 seconds (778.81 k allocations: 43.905 MiB, 99.57% compilation time: 10% of which was recompilation)

Julia 1.9.0:
  0.560781 seconds (687.27 k allocations: 40.013 MiB, 2.39% gc time, 98.64% compilation time: 39% of which was recompilation)

Another big win. However, we see that we triggered recompilation, which means that we might try to improve the internal design
of the package ecosystem here.

We are now ready for the all-time favorite operation of all data scientists, that is, a join:

@time months = DataFrame(month=1:10,
                         month_name=["Jan", "Feb", "Mar", "Apr", "May",
                                     "Jun", "Jul", "Aug", "Sep", "Oct"]);
@time leftjoin(flights, months, on=:month);

Timings:

Julia 1.8.5:
  0.033462 seconds (351 allocations: 16.797 KiB, 99.64% compilation time)
  4.870511 seconds (2.89 M allocations: 176.266 MiB, 1.21% gc time, 99.72% compilation time)

Julia 1.9.0:
  0.000090 seconds (26 allocations: 2.016 KiB)
  0.847289 seconds (343.69 k allocations: 48.378 MiB, 2.30% gc time, 94.35% compilation time)

We see another big win here. You might ask why we still see a lot of compilation in leftjoin
even though no function is passed to it. The reason is that different data frames can have
different column types, and again we cannot precompile code against all possible column types
that a user might use; only the most common ones are covered by cached native code.
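
To illustrate this point, the sketch below performs the same join against a table whose key column has a less
common element type. Which column types are actually covered by the cached native code is my assumption, but in
general such a change requires compiling a new specialization:

months_small = DataFrame(month=Int8.(1:10), month_name=months.month_name)
@time leftjoin(flights, months_small, on=:month);  # different key column type, so expect extra compilation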

The last benchmark is reshaping a data frame, which is another commonly performed operation.
In the example I use a non-trivial reshape that involves creating a pivot table:

@time unstack(flights, :month, :carrier, :carrier, combine=length);

Timings:

Julia 1.8.5:
  2.059941 seconds (2.96 M allocations: 147.041 MiB, 98.75% compilation time)

Julia 1.9.0:
  1.081823 seconds (1.56 M allocations: 95.952 MiB, 1.92% gc time, 98.65% compilation time)

The last test also shows a noticeable improvement, so we are indeed happy in all cases.

It is natural to ask what the total running time of our whole analysis is. Here is the
code (leaving out package loading and data downloading). The test is run in a fresh
Julia session.

function test()
    flights = CSV.read(flights_bin, DataFrame)
    CSV.write("flights14.csv", flights)
    flights[(flights.origin .== "EWR") .&& (flights.dest .== "PHL"), :]
    filter(row -> row.origin == "EWR" && row.dest == "PHL", flights)
    subset(flights, :origin => x -> x .== "EWR", :dest => x -> x .== "PHL")
    subset(flights, :origin => ByRow(==("EWR")), :dest => ByRow(==("PHL")))
    flights_idx = groupby(flights, [:origin, :dest])
    flights_idx[("EWR", "PHL")]
    combine(flights_idx) do sdf
        max_air_time = maximum(sdf.air_time)
        return count(sdf.air_time .== max_air_time)
    end
    @chain flights begin
        groupby(:month)
        combine(nrow, :dep_delay => mean)
        sort(:dep_delay_mean)
    end
    months = DataFrame(month=1:10,
                       month_name=["Jan", "Feb", "Mar", "Apr", "May",
                                   "Jun", "Jul", "Aug", "Sep", "Oct"])
    leftjoin(flights, months, on=:month)
    unstack(flights, :month, :carrier, :carrier, combine=length)
end;
@time test();
@time test();

This time I run the code twice to show how much time is saved when we do not need
to compile things:

Julia 1.8.5:
 19.994426 seconds (27.73 M allocations: 1.083 GiB, 1.79% gc time, 93.94% compilation time: 1% of which was recompilation)
  1.016076 seconds (13.86 M allocations: 403.160 MiB, 5.66% gc time)

Julia 1.9.0:
  7.453665 seconds (20.80 M allocations: 856.780 MiB, 3.52% gc time, 86.36% compilation time: 3% of which was recompilation)
  0.958529 seconds (13.86 M allocations: 403.147 MiB, 6.47% gc time)

As you can see, the first run is almost three times faster on Julia 1.9.0 than on Julia 1.8.5.
However, the second run is comparable on both (as expected) and is significantly faster, as it does not involve compilation.

Conclusions

There are three conclusions from our tests:

  • Caching of native code indeed significantly improves the time of the first run of functions defined in packages. Julia 1.9.0 lives up to its promise.
  • Still, to see these benefits, package maintainers need to appropriately prepare their packages. From the tests we see that there is still room for improvement in this area.
  • Even with the best preparation of the packages, you will still see run-time compilation.

The major reason why you will still see compilation is user-defined functions and data structures (which are not known during package precompilation, so native code handling them cannot be cached).
For some packages this is probably a minor limitation. However, for packages such as DataFrames.jl this is a challenge, as most analyses involve custom, non-standard data transformations
and potentially non-standard column types. This means that in DataFrames.jl we really need to think hard about what to put into precompilation directives to ensure the best
user experience.

I hope you will enjoy using Julia 1.9.0!