
Julia 1.9.0 lives up to its promise

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/05/12/julia190.html

Introduction

Julia 1.9.0 has been released this week.
This release was much anticipated, as it brings many significant improvements.
You can find a summary of the most important changes in the Julia 1.9 Highlights post.

Of all the additions, the most user-visible change is probably caching of native code.
In simple words, this means that the first time you run a function from a package
it should execute faster than under previous Julia releases.

Time to first execution was indeed a big pain point for many Julia users, so I am
really excited about this functionality. However, to see the benefits of native code
caching, the packages you use must be designed with it in mind.

So the practical question is whether, given the current state of the Julia package
ecosystem, we indeed see these benefits. I decided to answer it by running a
realistic data processing workflow on both Julia 1.8.5 and 1.9.0 and comparing the results.

For the test I used code from the demo I prepared for ODSC-EUROPE-2021.
The reason is that it covers all standard operations: reading and writing data to disk,
aggregation, grouping, joining, sorting, and reshaping.

The tests require the following packages: CSV.jl v0.10.10,
Chain.jl v0.5.0, DataFrames.jl v1.5.0, HTTP.jl v1.9.1.

The test

When presenting the results I show each code snippet with its timings under Julia 1.8.5 and 1.9.0
below it (omitting the output to save space). All tests were started in a fresh
Julia session.

First let us look at package load time:

@time begin
    using Chain
    using CSV
    using DataFrames
    using HTTP
    using Statistics
end

Timings:

Julia 1.8.5:
4.041740 seconds (8.09 M allocations: 536.512 MiB, 4.93% gc time, 33.25% compilation time: 87% of which was recompilation)

Julia 1.9.0:
1.721003 seconds (1.73 M allocations: 109.727 MiB, 4.92% gc time, 5.46% compilation time: 81% of which was recompilation)

As you can see, package load time is visibly improved. However, a significant amount of time is still
spent on recompilation, which means that it should be possible to improve things further in the future
with a better design of package internals.
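
If you want to check which packages contribute most to load time and recompilation in your own environment, the `@time_imports` macro from the InteractiveUtils standard library (available since Julia 1.8) gives a per-package breakdown (a quick diagnostic sketch; the exact output depends on your package versions):

```julia
using InteractiveUtils  # provides @time_imports (Julia 1.8+)

# Prints how long each package (and its dependencies) took to load,
# including the share spent on compilation and recompilation:
@time_imports using Statistics
```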

Next, check the time needed to read and write a CSV file:

input = "https://raw.githubusercontent.com/Rdatatable\
         /data.table/master/vignettes/flights14.csv";
flights_bin = HTTP.get(input).body;
@time flights = CSV.read(flights_bin, DataFrame);
@time CSV.write("flights14.csv", flights);

Timings:

Julia 1.8.5:
  11.247200 seconds (3.19 M allocations: 202.963 MiB, 0.58% gc time, 98.33% compilation time)
  2.625601 seconds (15.55 M allocations: 469.223 MiB, 4.82% gc time, 53.75% compilation time)

Julia 1.9.0:
  1.023744 seconds (703.11 k allocations: 71.524 MiB, 2.36% gc time, 70.20% compilation time)
  1.147991 seconds (14.12 M allocations: 421.457 MiB, 6.22% gc time, 55.25% compilation time)

Here we see a really big gain. It is especially visible in CSV reading time.

The next test is dropping some rows from a data frame. I do it in four different ways:

@time flights[(flights.origin .== "EWR") .&& (flights.dest .== "PHL"), :];
@time filter(row -> row.origin == "EWR" && row.dest == "PHL", flights);
@time subset(flights, :origin => x -> x .== "EWR", :dest => x -> x .== "PHL");
@time subset(flights, :origin => ByRow(==("EWR")), :dest => ByRow(==("PHL")));

Timings:

Julia 1.8.5:
  0.842566 seconds (581.75 k allocations: 27.749 MiB, 99.43% compilation time)
  0.523396 seconds (2.07 M allocations: 69.233 MiB, 4.39% gc time, 85.76% compilation time)
  1.745602 seconds (1.78 M allocations: 90.815 MiB, 1.30% gc time, 99.59% compilation time: 2% of which was recompilation)
  0.565263 seconds (1.19 M allocations: 62.720 MiB, 3.93% gc time, 98.65% compilation time)

Julia 1.9.0:
  0.234892 seconds (255.22 k allocations: 17.176 MiB, 3.79% gc time, 98.27% compilation time)
  0.326977 seconds (1.47 M allocations: 46.091 MiB, 2.78% gc time, 83.76% compilation time)
  0.685728 seconds (560.88 k allocations: 37.067 MiB, 1.94% gc time, 99.13% compilation time)
  0.499620 seconds (534.75 k allocations: 37.376 MiB, 98.60% compilation time)

Again, in all cases we see a drop in time to first execution. You might ask why we still see so much compilation.
The major reason is that these examples define new functions or data structures, which trigger compilation.
For example, x -> x .== "EWR" is an anonymous function that we have just created, so it was impossible to precompile it.
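
To see why anonymous functions cannot be covered by a precompile cache, note that every `->` expression creates a brand-new closure type, so even two textually identical anonymous functions look different to the compiler (a small illustration):

```julia
# Each anonymous function definition gets its own unique type:
f1 = x -> x .== "EWR"
f2 = x -> x .== "EWR"

typeof(f1) == typeof(f2)  # false: code compiled for f1 is not reused for f2
typeof(f1) <: Function    # true: both are still callable functions
```

This is why a package cannot cache native code for anonymous functions its users have not written yet.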

Let us now perform a groupby and group selection by key-values:

@time flights_idx = groupby(flights, [:origin, :dest]);
@time flights_idx[("EWR", "PHL")];

Timings:

Julia 1.8.5:
  1.436151 seconds (2.02 M allocations: 104.135 MiB, 1.86% gc time, 99.47% compilation time)
  0.569703 seconds (458.67 k allocations: 26.118 MiB, 4.46% gc time, 99.60% compilation time)

Julia 1.9.0:
  1.216488 seconds (1.16 M allocations: 80.253 MiB, 3.13% gc time, 98.56% compilation time)
  0.281608 seconds (214.53 k allocations: 16.605 MiB, 4.83% gc time, 99.62% compilation time)

We still see the benefits. Yet, you might ask why there is so much compilation even under Julia 1.9.0.
The reason is that, for example, grouping by two columns is not very common, so DataFrames.jl leaves
it out of precompilation. Therefore, when you run groupby(flights, [:origin, :dest]),
native code for this scenario is not cached. This is indeed a hard design decision for package
maintainers. You could add more and more precompilation statements to improve the coverage of
cached native code, but this also has a cost: it increases both package installation time and
package load time.
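
To make this trade-off concrete, here is a toy sketch of an explicit precompile directive; inside a package such statements run at precompilation time, and on Julia 1.9+ the resulting native code is cached to disk (my_sum is just an illustrative function; real packages often automate this with workload-based tools such as PrecompileTools.jl):

```julia
# An illustrative function a package might want compiled ahead of time:
my_sum(v::Vector{Int}) = sum(v)

# Ask the compiler to compile my_sum for the Vector{Int} signature now;
# every such directive adds to precompilation and cache-loading cost:
precompile(my_sum, (Vector{Int},))  # returns true on success
```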

Our next test is an aggregation operation. Again I chose it to be non-trivial and associated with
creation of an anonymous function:

@time combine(flights_idx) do sdf
    max_air_time = maximum(sdf.air_time)
    return count(sdf.air_time .== max_air_time)
end;

Timings:

Julia 1.8.5:
  1.748435 seconds (1.40 M allocations: 76.052 MiB, 1.69% gc time, 99.53% compilation time)

Julia 1.9.0:
  0.953018 seconds (921.98 k allocations: 58.030 MiB, 1.55% gc time, 99.54% compilation time)

Again we see significant timing improvement.

It is time for a multi-step operation involving grouping, aggregation, and sorting:

@time @chain flights begin
    groupby(:month)
    combine(nrow, :dep_delay => mean)
    sort(:dep_delay_mean)
end;

Timings:

Julia 1.8.5:
  1.648610 seconds (778.81 k allocations: 43.905 MiB, 99.57% compilation time: 10% of which was recompilation)

Julia 1.9.0:
  0.560781 seconds (687.27 k allocations: 40.013 MiB, 2.39% gc time, 98.64% compilation time: 39% of which was recompilation)

Another big win. However, we see that we triggered recompilation, which suggests there is still room
to improve the internal design of the packages involved.

We are now ready for the all-time favorite operation of data scientists, that is, a join:

@time months = DataFrame(month=1:10,
                         month_name=["Jan", "Feb", "Mar", "Apr", "May",
                                     "Jun", "Jul", "Aug", "Sep", "Oct"]);
@time leftjoin(flights, months, on=:month);

Timings:

Julia 1.8.5:
  0.033462 seconds (351 allocations: 16.797 KiB, 99.64% compilation time)
  4.870511 seconds (2.89 M allocations: 176.266 MiB, 1.21% gc time, 99.72% compilation time)

Julia 1.9.0:
  0.000090 seconds (26 allocations: 2.016 KiB)
  0.847289 seconds (343.69 k allocations: 48.378 MiB, 2.30% gc time, 94.35% compilation time)

We see another big win here. You might ask why we still see a lot of compilation in leftjoin
although no function is passed to it. The reason is that data frames can have columns of many
different types, and we cannot precompile code for all possible column types
that users might use; only the most common ones are covered by cached native code.

The last benchmark is reshaping a data frame, another commonly performed operation.
In the example I use a non-trivial reshape that creates a pivot table:

@time unstack(flights, :month, :carrier, :carrier, combine=length);

Timings:

Julia 1.8.5:
  2.059941 seconds (2.96 M allocations: 147.041 MiB, 98.75% compilation time)

Julia 1.9.0:
  1.081823 seconds (1.56 M allocations: 95.952 MiB, 1.92% gc time, 98.65% compilation time)

The last test also shows a noticeable improvement, so we indeed see gains in all cases.

It is natural to ask what the total running time of the whole analysis is. Here is the
code (leaving out package loading and data downloading). The test is run in a fresh
Julia session.

function test()
    flights = CSV.read(flights_bin, DataFrame)
    CSV.write("flights14.csv", flights)
    flights[(flights.origin .== "EWR") .&& (flights.dest .== "PHL"), :]
    filter(row -> row.origin == "EWR" && row.dest == "PHL", flights)
    subset(flights, :origin => x -> x .== "EWR", :dest => x -> x .== "PHL")
    subset(flights, :origin => ByRow(==("EWR")), :dest => ByRow(==("PHL")))
    flights_idx = groupby(flights, [:origin, :dest])
    flights_idx[("EWR", "PHL")]
    combine(flights_idx) do sdf
        max_air_time = maximum(sdf.air_time)
        return count(sdf.air_time .== max_air_time)
    end
    @chain flights begin
        groupby(:month)
        combine(nrow, :dep_delay => mean)
        sort(:dep_delay_mean)
    end
    months = DataFrame(month=1:10,
                       month_name=["Jan", "Feb", "Mar", "Apr", "May",
                                   "Jun", "Jul", "Aug", "Sep", "Oct"])
    leftjoin(flights, months, on=:month)
    unstack(flights, :month, :carrier, :carrier, combine=length)
end;
@time test();
@time test();

This time I run the code twice to show how much time is saved when we do not need
to compile things:

Julia 1.8.5:
 19.994426 seconds (27.73 M allocations: 1.083 GiB, 1.79% gc time, 93.94% compilation time: 1% of which was recompilation)
  1.016076 seconds (13.86 M allocations: 403.160 MiB, 5.66% gc time)

Julia 1.9.0:
  7.453665 seconds (20.80 M allocations: 856.780 MiB, 3.52% gc time, 86.36% compilation time: 3% of which was recompilation)
  0.958529 seconds (13.86 M allocations: 403.147 MiB, 6.47% gc time)

As you can see, the first run is almost three times faster on Julia 1.9.0 than on Julia 1.8.5.
The second run is comparable on both (as expected) and is significantly faster, as it does not involve compilation.

Conclusions

There are three conclusions from our tests:

  • Caching of native code significantly improves the time to first execution of functions defined in packages. Julia 1.9.0 indeed lives up to its promise.
  • Still, to see these benefits, package maintainers need to prepare their packages appropriately. The tests show there is still room for improvement in this area.
  • Even with the best-prepared packages you will still see run-time compilation.

The major reason why you will still see compilation is user-defined functions and data structures (they are not known during package precompilation, so native code handling them cannot be cached).
For some packages this is probably a minor limitation. However, for packages such as DataFrames.jl it is a challenge, as most analyses involve custom, non-standard data transformations
and potentially non-standard column types. This means that in DataFrames.jl we really need to think hard about what to put into precompilation directives to ensure the best
user experience.

I hope you will enjoy using Julia 1.9.0!

Hungarian meeting with Euler for the third anniversary

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/05/05/hungarian.html

Introduction

I have been writing this blog for three years now, so I was
thinking about what to post to celebrate the occasion.

Recently I learned about the ProjectEuler.jl package and
I like it very much. It gives access to the problems from
the Project Euler website directly in the Julia REPL.
Additionally, while reading the package documentation I came
across a problem that I had not seen before, so I thought
I would solve it in this post.

This post was written under Julia 1.9.0-rc2, HiGHS v1.5.1,
Hungarian v0.7.0, JuMP v1.10.0, ProjectEuler v0.1.1.

The puzzle

Let us use ProjectEuler.jl to get the description of the
problem we want to solve:

julia> import ProjectEuler

julia> ProjectEuler.question(345)

│             Source: The following problem is taken from Project Euler
│      Problem Title: Problem 345: Matrix Sum
│       Published On: Saturday, 3rd September 2011, 04:00 pm
│          Solved By: 5813
│  Difficulty Rating: 15%

Problem
≡≡≡≡≡≡≡≡≡≡
We define the Matrix Sum of a matrix as the maximum possible sum of matrix
elements such that none of the selected elements share the same row or column.

For example, the Matrix Sum of the matrix below equals
3315 ( = 863 + 383 + 343 + 959 + 767):

                                                   7  53 183 439 863
                                                 497 383 563  79 973
                                                 287  63 343 169 583
                                                 627 343 773 959 943
                                                 767 473 103 699 303

Find the Matrix Sum of:

                         7  53 183 439 863 497 383 563  79 973 287  63 343 169 583
                       627 343 773 959 943 767 473 103 699 303 957 703 583 639 913
                       447 283 463  29  23 487 463 993 119 883 327 493 423 159 743
                       217 623   3 399 853 407 103 983  89 463 290 516 212 462 350
                       960 376 682 962 300 780 486 502 912 800 250 346 172 812 350
                       870 456 192 162 593 473 915  45 989 873 823 965 425 329 803
                       973 965 905 919 133 673 665 235 509 613 673 815 165 992 326
                       322 148 972 962 286 255 941 541 265 323 925 281 601  95 973
                       445 721  11 525 473  65 511 164 138 672  18 428 154 448 848
                       414 456 310 312 798 104 566 520 302 248 694 976 430 392 198
                       184 829 373 181 631 101 969 613 840 740 778 458 284 760 390
                       821 461 843 513  17 901 711 993 293 157 274  94 192 156 574
                        34 124   4 878 450 476 712 914 838 669 875 299 823 329 699
                       815 559 813 459 522 788 168 586 966 232 308 833 251 631 107
                       813 883 451 509 615  77 281 613 459 205 380 274 302  35 805
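
Before attacking the full 15×15 matrix, we can sanity-check the 5×5 example by brute force: with only 5! = 120 row-to-column assignments we can enumerate them all (a self-contained sketch; this approach is hopeless for 15 rows, since 15! ≈ 1.3·10^12):

```julia
# The 5x5 example matrix from the problem statement:
A = [  7  53 183 439 863
     497 383 563  79 973
     287  63 343 169 583
     627 343 773 959 943
     767 473 103 699 303]

# Enumerate every assignment of rows to distinct columns recursively
# and keep the best total:
function matrix_sum(A)
    n = size(A, 1)
    best = typemin(Int)
    function go(row, used, total)
        if row > n
            best = max(best, total)
            return
        end
        for col in 1:n
            used[col] && continue
            used[col] = true
            go(row + 1, used, total + A[row, col])
            used[col] = false
        end
    end
    go(1, falses(n), 0)
    return best
end

matrix_sum(A)  # 3315, the value given in the problem statement
```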

Manual solution

To start let us define the matrix that gives specification of the problem
and bind it to the M variable:

M = [  7  53 183 439 863 497 383 563  79 973 287  63 343 169 583
     627 343 773 959 943 767 473 103 699 303 957 703 583 639 913
     447 283 463  29  23 487 463 993 119 883 327 493 423 159 743
     217 623   3 399 853 407 103 983  89 463 290 516 212 462 350
     960 376 682 962 300 780 486 502 912 800 250 346 172 812 350
     870 456 192 162 593 473 915  45 989 873 823 965 425 329 803
     973 965 905 919 133 673 665 235 509 613 673 815 165 992 326
     322 148 972 962 286 255 941 541 265 323 925 281 601  95 973
     445 721  11 525 473  65 511 164 138 672  18 428 154 448 848
     414 456 310 312 798 104 566 520 302 248 694 976 430 392 198
     184 829 373 181 631 101 969 613 840 740 778 458 284 760 390
     821 461 843 513  17 901 711 993 293 157 274  94 192 156 574
      34 124   4 878 450 476 712 914 838 669 875 299 823 329 699
     815 559 813 459 522 788 168 586 966 232 308 833 251 631 107
     813 883 451 509 615  77 281 613 459 205 380 274 302  35 805]

Note that this is easy to do in the Julia REPL: copy-paste the text
from the problem specification and just wrap it with M = [ and ].

To solve the problem manually let us make the following observations:

  • Since every column has to be picked exactly once, subtracting
    the same value from every element of a column does not change which
    elements form the optimal solution (the same holds for rows).
  • If in every row the maximal element lies in a different column, then we can
    just pick these maximal elements in each row, and these entries
    form the solution to our problem.

Using these two facts we will try to solve the problem. First let us
check whether in our initial matrix M each row attains its maximum
in a unique column:

julia> findall(==(0), M .- maximum(M, dims=2))
15-element Vector{CartesianIndex{2}}:
 CartesianIndex(15, 2)
 CartesianIndex(2, 4)
 CartesianIndex(5, 4)
 CartesianIndex(11, 7)
 CartesianIndex(3, 8)
 CartesianIndex(4, 8)
 CartesianIndex(12, 8)
 CartesianIndex(13, 8)
 CartesianIndex(6, 9)
 CartesianIndex(14, 9)
 CartesianIndex(1, 10)
 CartesianIndex(10, 12)
 CartesianIndex(7, 14)
 CartesianIndex(8, 15)
 CartesianIndex(9, 15)

Unfortunately, this is not the case. We see that e.g. rows 2 and 5 both have
their maximum in column 4. Therefore we cannot solve our problem trivially.

However, let us try subtracting some values from the columns of our
matrix M, hoping that we will get the desired uniqueness.

The values we subtract from each column are:

julia> sub = [55 0 23 56 40 0 101 171 175 62 53 151 0 0 26]
1×15 Matrix{Int64}:
 55  0  23  56  40  0  101  171  175  62  53  151  0  0  26

Let us check them:

julia> M2 = M .- sub
15×15 Matrix{Int64}:
 -48   53  160  383  823  497  282   392  -96  911  234  -88  343  169  557
 572  343  750  903  903  767  372   -68  524  241  904  552  583  639  887
 392  283  440  -27  -17  487  362   822  -56  821  274  342  423  159  717
 162  623  -20  343  813  407    2   812  -86  401  237  365  212  462  324
 905  376  659  906  260  780  385   331  737  738  197  195  172  812  324
 815  456  169  106  553  473  814  -126  814  811  770  814  425  329  777
 918  965  882  863   93  673  564    64  334  551  620  664  165  992  300
 267  148  949  906  246  255  840   370   90  261  872  130  601   95  947
 390  721  -12  469  433   65  410    -7  -37  610  -35  277  154  448  822
 359  456  287  256  758  104  465   349  127  186  641  825  430  392  172
 129  829  350  125  591  101  868   442  665  678  725  307  284  760  364
 766  461  820  457  -23  901  610   822  118   95  221  -57  192  156  548
 -21  124  -19  822  410  476  611   743  663  607  822  148  823  329  673
 760  559  790  403  482  788   67   415  791  170  255  682  251  631   81
 758  883  428  453  575   77  180   442  284  143  327  123  302   35  779

julia> sol = findall(==(0), M2 .- maximum(M2, dims=2))
15-element Vector{CartesianIndex{2}}:
 CartesianIndex(6, 1)
 CartesianIndex(15, 2)
 CartesianIndex(8, 3)
 CartesianIndex(5, 4)
 CartesianIndex(4, 5)
 CartesianIndex(12, 6)
 CartesianIndex(11, 7)
 CartesianIndex(3, 8)
 CartesianIndex(14, 9)
 CartesianIndex(1, 10)
 CartesianIndex(2, 11)
 CartesianIndex(10, 12)
 CartesianIndex(13, 13)
 CartesianIndex(7, 14)
 CartesianIndex(9, 15)

Now we see exactly one maximum value in each row, and each of these values
is in a different column. Thus the solution to the problem can be calculated as follows
(I do not show the result, to encourage you to try solving the problem yourself):

sum(M[sol])

Now you might ask how one could find the sub vector.
You could do it by trial and error, or in a more systematic way.
Interestingly, the problem we solve today, known as the assignment problem,
is an important question in operations research, and a specialized algorithm was developed to solve it.

Hungarian solution

The algorithm that solves this class of problems is called the
Hungarian algorithm. It is implemented in Julia in the Hungarian.jl
package. I encourage you to study it; let me just mention that it uses
a refined version of the two observations we made when developing the manual
solution.

The package is easy to use. You just need to remember that by default it minimizes
the sum, so we need to pass the -M matrix. Here is how you can get
the solution (I show the indices but do not display the value of the solution):

julia> using Hungarian

julia> hungarian(-M)[1]
15-element Vector{Int64}:
 10
 11
  8
  5
  4
  1
 14
  3
 15
 12
  7
  6
 13
  9
  2

You might ask how to check whether our manual solution matches the solution
obtained with the package. You can do it as follows:

julia> getindex.(sol, 1) == hungarian(-M')[1]
true

Everything matches, as expected.

Note that for the check I used the hungarian function with the transpose
of the M matrix, as our Cartesian indices are sorted by column number
(the reason is that Julia uses column-major storage order), while by default
hungarian returns column indices sorted by row number.

Solver solution

What if we did not have the Hungarian.jl package?
In this case the problem can be solved using mixed integer programming.
I have decided to use the JuMP.jl and HiGHS.jl packages to get the answer
(as usual – the value of the solution is not shown):

using JuMP
import HiGHS
model = Model(HiGHS.Optimizer)
@variable(model, x[1:15, 1:15], Bin)
for i in 1:15
    @constraint(model, sum(x[i, j] for j in 1:15) == 1)
    @constraint(model, sum(x[j, i] for j in 1:15) == 1)
end
@objective(model, Max, sum(x[i, j] * M[i, j] for i in 1:15, j in 1:15))
optimize!(model)
value.(x)

I really enjoy using the JuMP.jl API for solving optimization problems.

As above let us check if the solution matches the solution we obtained manually:

julia> findall(>=(0.5), value.(x)) == sol
true

Note that I use 0.5 to separate 0 from 1 in the solution, as this is a
safe threshold even if there were some numerical inaccuracies in the
returned values.

Conclusions

I really enjoy solving Project Euler puzzles using Julia.
The syntax and the package ecosystem I have at hand make it
quite convenient. The resulting code is usually short and easy to read.

If you liked the problem let me give you a challenge. Notice that
the sum of values we subtracted in the manual solution was:

julia> sum(sub)
913

The challenge for you is to find non-negative entries of sub
that uniquely solve our problem (via the manual approach) while
minimizing the sum of the entries of sub. I hope you will enjoy solving
this extra puzzle!

Ref ref.

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/04/28/ref.html

Introduction

Today I want to discuss the Ref type defined in Base Julia.
The reason is that it is often used in practice, but it is not immediately
obvious what the design behind Ref is.

I will focus on an entry-level introduction to the topic and leave out
more advanced issues that are typically not needed when working with
Julia.

This post is written under Julia 1.9.0-rc2.

Where can you see Ref?

As a normal Julia user, there are two cases where you might encounter Ref:
broadcasting and allowing mutation.

Broadcasting

The first case is broadcasting, when you want to store some object in a
0-dimensional container that protects its contents from being broadcast over.
Here is a typical example:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> y = [2, 3, 4]
3-element Vector{Int64}:
 2
 3
 4

julia> Ref(x) .* y
3-element Vector{Vector{Int64}}:
 [2, 4, 6]
 [3, 6, 9]
 [4, 8, 12]

Note that I wrap x in Ref to ensure that the whole x vector is multiplied
by each element of y. If I omitted Ref, I would get the elementwise product of
x and y:

julia> x .* y
3-element Vector{Int64}:
  2
  6
 12

The reason Ref is used in such cases is that it has minimal impact on
the type of the result of the broadcast operation. Consider this example:

julia> z = (2, 3, 4)
(2, 3, 4)

julia> [x] .* z
3-element Vector{Vector{Int64}}:
 [2, 4, 6]
 [3, 6, 9]
 [4, 8, 12]

julia> Ref(x) .* z
([2, 4, 6], [3, 6, 9], [4, 8, 12])

Here I protected x when multiplying it by the elements of the tuple z.
I could also protect x by wrapping it in a vector but, as you can see,
the result of that operation is a vector of vectors, while
wrapping x in Ref produces a tuple of vectors.
In other words, using Ref lets the broadcasting machinery use the type of
the other container to determine the output type, which is typically desirable.

Allowing mutation

The other use of Ref is when we have an immutable type that we want to be able
to mutate 😄. This might sound strange, but sometimes it is indeed useful.

Let me give you a simple example:

julia> struct X
           value::Int
           callcount::Base.RefValue{Int}

           X(x) = new(Int(x), Ref(0))
       end

julia> f(x::X) = (x.callcount[] += 1; x)
f (generic function with 1 method)

julia> x = X(10)
X(10, Base.RefValue{Int64}(0))

julia> f(x)
X(10, Base.RefValue{Int64}(1))

julia> f(x)
X(10, Base.RefValue{Int64}(2))

julia> f(x)
X(10, Base.RefValue{Int64}(3))

Here I defined the X type that stores a value, which I want to be immutable,
and an extra field callcount that counts how many times the function f was
called on the object. Since Int is immutable, I needed to wrap it in Ref
to make this field mutable.

As a side note, this is not the only way to get this kind of effect. For example,
I could define a mutable struct with a const field value. Still, in some
cases Ref is useful precisely because it is mutable. Note that I accessed and updated
the value stored in Ref using empty indexing x.callcount[] (i.e. brackets
with no value inside them).
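
Here is a minimal sketch of that alternative (const fields in mutable structs require Julia 1.8 or newer; Y and g are hypothetical names mirroring X and f above):

```julia
# A mutable struct where `value` is frozen with `const` while
# `callcount` remains an ordinary, reassignable field:
mutable struct Y
    const value::Int
    callcount::Int
end

g(y::Y) = (y.callcount += 1; y)

y = Y(10, 0)
g(y); g(y)
y.callcount    # now 2
# y.value = 5  # would throw: const fields cannot be reassigned
```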

So what is hard about Ref?

In the last example I said I was talking about Ref, but in the definition of
the X type I used callcount::Base.RefValue{Int} instead. This is the tricky
part. Ref is a parametric abstract type, which means that no object can have
Ref as its concrete type: Ref is a non-leaf node in Julia's type tree. Let us check its
subtypes:

julia> subtypes(Ref)
6-element Vector{Any}:
 Base.CFunction
 Base.RefArray
 Base.RefValue
 Core.Compiler.RefValue
 Core.LLVMPtr
 Ptr

As you can see, there are six subtypes of Ref. And here is why I said
this post is entry-level: I will only talk
about RefValue and RefArray, leaving out the other options as they are
rarely needed (unless you are doing low-level stuff in Julia, but then you
probably do not need to read this post 😄).

The tricky thing is that when we write Ref(1) we do not get an object
of type Ref, but a RefValue (which is a subtype of Ref):

julia> v1 = Ref(1)
Base.RefValue{Int64}(1)

julia> v1[]
1

Similarly we can have a reference to an element of an array. In this case
we pass an array as a first argument to Ref and an index as a second one:

julia> v2 = Ref([2, 3, 4], 2)
Base.RefArray{Int64, Vector{Int64}, Nothing}([2, 3, 4], 2, nothing)

julia> v2[]
3

You can think of Ref as a convenient way to handle both cases (wrapping a value
and wrapping an element of an array) in a single syntax.
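
One practical consequence of this design is that a single method can dispatch on the abstract Ref type and accept either flavor of reference (a short sketch; deref is a hypothetical helper name):

```julia
# Both RefValue and RefArray support dereferencing with r[],
# so one method signature covers both:
deref(r::Ref) = r[]

deref(Ref(1))             # 1 (a RefValue)
deref(Ref([2, 3, 4], 2))  # 3 (a RefArray)
```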

There is one difference between RefValue and RefArray, though. RefValue
indeed guarantees mutability of the container, as we saw in the example
with the X struct above. Mutating a RefArray, on the other hand, tries to mutate
the underlying array. Therefore the following code fails:

julia> v3 = Ref(2:4, 2)
Base.RefArray{Int64, UnitRange{Int64}, Nothing}(2:4, 2, nothing)

julia> v3[] = 10
ERROR: CanonicalIndexError: setindex! not defined for UnitRange{Int64}

While this code works:

julia> a = [2, 3, 4]
3-element Vector{Int64}:
 2
 3
 4

julia> v4 = Ref(a, 2)
Base.RefArray{Int64, Vector{Int64}, Nothing}([2, 3, 4], 2, nothing)

julia> v4[]
3

julia> v4[] = 100
100

julia> v4
Base.RefArray{Int64, Vector{Int64}, Nothing}([2, 100, 4], 2, nothing)

julia> a
3-element Vector{Int64}:
   2
 100
   4

and we can see that the a array was changed.

Conclusions

The major things to remember about Ref are:

  • Its main uses are in broadcasting and when we need a lightweight mutable container.
  • Ref is abstract; when you write Ref(x) you do not get a Ref instance. Instead
    you get a RefValue (which is mutable).
  • There are other subtypes of Ref than just RefValue. You will rarely need them.
    Of the other options the one you might want to use most often is RefArray, which
    creates a reference to a single element of the underlying array.

I hope you found this post a useful ref. for Ref.