Author Archives: Blog by Bogumił Kamiński

The main thing in Julia 1.11

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/07/05/main.html

Introduction

This is my last blog post with the previews of an upcoming Julia 1.11 release.
The functionality I want to cover today is the option to define an entry point for a Julia script.

The code was tested under Julia 1.11 RC1.

A traditional Julia script

Traditionally, when writing a Julia script you assumed that it would be run with a julia some_script.jl command.
In this case Julia sequentially executes the contents of the some_script.jl file and terminates.

When I was writing Julia code that was meant to be executed in this way my typical approach was to always encapsulate all executed code in functions.
In this way we can avoid many problems that are introduced by writing code that is executed in global scope, including some of the common issues:

  • scope of variables (no need to think about the global keyword);
  • performance (code inside functions is compiled, thus fast);
  • accidental use of the same name for different objects in global-scope spaghetti code (I think everyone has been bitten by this issue);
  • pollution of RAM (large objects that have bindings in global scope are kept alive and it is easy to forget to unbind them to allow garbage collection).

Therefore a typical structure of my code was:

...
some definitions of data structures and code inside functions
...

function main(ARGS)
    ...
    the operations I want to have executed by the script
    ...
end

main(ARGS)

This is a style that is natural for programmers used to languages such as C, where the main function is an entry point.

Script under Julia 1.11

Julia 1.11 adds an option to mark the main function as an entry point. It makes sure that main(ARGS) gets called after execution of the script.

It is quite easy to mark the main function as an entry point. It is enough to replace main(ARGS) with (@main)(ARGS) in the function definition in my example above and to drop the explicit main(ARGS) call at the end.
Thus, starting from Julia 1.11 I can write my scripts as:

...
some definitions of data structures and code inside functions
...

function (@main)(ARGS)
    ...
    the operations I want to have executed by the script
    ...
end

This seemingly small change is in my opinion significant as it standardizes the way Julia scripts are written.
And such standardization is a good feature improving code readability and maintainability.
Additionally, this feature helps in unification of interactive and compiled workflows of using Julia.

Let me show a minimal working example of writing a script using the @main macro:

$ julia -e "using InteractiveUtils; (@main)(args) = versioninfo()"
Julia Version 1.11.0-rc1
Commit 3a35aec36d (2024-06-25 10:23 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × 12th Gen Intel(R) Core(TM) i7-1250U
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
$

In this example we invoke the versioninfo function inside the main(args) function defined using the @main macro.
Note that we did not have to explicitly call the main function in the code. It was invoked automatically because it has
been created using the @main macro.
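
To make this more concrete, here is a slightly larger sketch of a script that uses its command-line arguments. The file name sum_args.jl and the sum_args helper are hypothetical names introduced only for this illustration, and the code assumes Julia 1.11 or later:

```julia
# sum_args.jl (hypothetical file name); assumes Julia 1.11 or later

# Hypothetical helper: parse every argument as an Int and add them up.
# With no arguments the sum is 0.
sum_args(args) = sum(parse(Int, a) for a in args; init = 0)

# Marked as the entry point: Julia calls it with ARGS after the script is loaded.
function (@main)(args)
    println("Sum of arguments: ", sum_args(args))
end
```

Running julia sum_args.jl 1 2 3 should then print Sum of arguments: 6, without any explicit call to main in the file.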

Conclusions

Now I hope you know what the @main macro does and how to use it in Julia 1.11. Enjoy scripting with Julia!

IdSet in Julia 1.11

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/06/28/idset.html

Introduction

We are now in the RC1 phase of Julia 1.11.
One small but important addition it brings is making IdSet a public type.
Today I want to discuss when this type is useful.

The code was tested under Julia 1.11 RC1.

Equality in Julia

There are three basic ways to test equality in Julia:

  1. the == operator;
  2. the isequal function;
  3. the === operator.

I have ordered these comparison operators by their level of strictness.

The == operator is the loosest. It can return true, false, or missing. The missing value is returned if any of the compared values is missing (or, for containers, if they contain a missing value and no definite mismatch is found). For floating-point numbers it assumes that 0.0 is equal to -0.0 and that NaN is not equal to NaN. Let us see the last case in action, as it can be surprising if you have never seen it before:

julia> NaN == NaN
false
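
The other two properties of == mentioned above can be checked in the same way. The snippet below is a small sketch using only Base functions; the variable names are mine:

```julia
# == propagates missing instead of returning true or false
r1 = 1 == missing                   # missing
r2 = [1, missing] == [1, missing]   # missing (the vectors contain missing)
r3 = [1, missing] == [2, missing]   # false (a definite mismatch was found)

# == treats positive and negative zero as equal
r4 = 0.0 == -0.0                    # true
```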

Next is isequal, which is stricter. It is guaranteed to return true or false. It treats all floating-point NaN values as equal to each other, treats -0.0 as unequal to 0.0, and missing as equal to missing. It compares objects by their value, not by their identity. So, for example, two different vectors having the same contents are considered equal:

julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> v2 = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> isequal(v1, v2)
true
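
For comparison, here is a short sketch of how isequal handles the special values that == treats differently (again, the variable names are only for illustration):

```julia
e1 = isequal(missing, missing)            # true
e2 = isequal([1, missing], [1, missing])  # true, never missing
e3 = isequal(NaN, NaN)                    # true
e4 = isequal(0.0, -0.0)                   # false
e5 = isequal(1, 1.0)                      # true: compared by value, not by type
```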

Finally we have ===, which is the strictest. It returns true or false, and true is returned if and only if the compared values are indistinguishable. They must have the same type. If their types are identical, mutable objects are compared by address in memory and immutable objects (such as numbers) are compared by contents at the bit level. Therefore the v1 and v2 vectors we created above are not equal when compared with ===:

julia> v1 === v2
false

You might ask about the NaN values that we talked about before. Here the situation is more complicated: under === they can be equal or not equal. Since NaN values are immutable, === compares them at the bit level. So we have:

julia> Float16(NaN) == Float32(NaN)
false

julia> isequal(Float16(NaN), Float32(NaN))
true

julia> Float16(NaN) === Float32(NaN)
false

julia> Float16(NaN) == Float16(NaN)
false

julia> isequal(Float16(NaN), Float16(NaN))
true

julia> Float16(NaN) === Float16(NaN)
true
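
In the examples above === returned false only when the types differed. A sketch of the other case, two NaN values of the same type that differ at the bit level, could look like this. The variable names are mine, and the second NaN is built by hand with reinterpret:

```julia
nan1 = NaN                                        # the standard Float64 NaN
nan2 = reinterpret(Float64, 0x7ff8000000000001)   # a NaN with a different payload bit

isnan(nan2)           # true: it is still a NaN
isequal(nan1, nan2)   # true: isequal treats all NaN values as equal
nan1 === nan2         # false: same type, but different bits
nan1 === NaN          # true: same type and same bits
```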

Thus, you have to be careful. Each of the three comparison methods I discussed has its uses and it is well worth learning them.

Sets in Julia

Standard sets in Julia, created using the Set constructor, use isequal to test for equality. Therefore we have:

julia> Set([v1, v2])
Set{Vector{Int64}} with 1 element:
  [1, 2, 3]

We see that v1 and v2 got de-duplicated because they are equal with respect to isequal since they have the same contents. This is often what the user wants.

However, sometimes we want to track actual objects (irrespective of their contents). This is especially important when working with mutable structures. In this case IdSet is useful:

julia> IdSet{Vector{Int}}([v1, v2])
IdSet{Vector{Int64}} with 2 elements:
  [1, 2, 3]
  [1, 2, 3]

Note that we needed to specify the type of the values stored in IdSet. As an exception, IdSet() is allowed (it does not require you to specify the type of the stored objects), and in this case an empty IdSet{Any} is created.
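
The practical difference shows up when you test membership. Below is a minimal sketch. I write using Base: IdSet so that the snippet does not rely on IdSet being exported, and the variable names are only for illustration:

```julia
using Base: IdSet  # not needed if IdSet is already available in your session

v1 = [1, 2, 3]
v2 = [1, 2, 3]   # same contents as v1, but a different object

s = Set([v1])
ids = IdSet{Vector{Int}}([v1])

v2 in s      # true: Set compares with isequal
v2 in ids    # false: IdSet compares with ===
v1 in ids    # true: this is the very object that was stored
```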

Conclusions

Now you might ask when in practice IdSet is most useful. I needed it in my coding practice most often when I worked with nested mutable containers that could potentially contain circular references. In such cases using IdSet allows you to easily keep track of the mutable objects already seen and avoid an infinite loop or stack overflow if you, for example, use recursion to work with such a deeply nested data structure.
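
Here is a minimal sketch of this pattern. The count_vectors function is a hypothetical helper written only for this illustration; it counts the distinct vectors reachable from its argument and uses an IdSet to remember which ones were already visited:

```julia
using Base: IdSet  # not needed if IdSet is already available in your session

# Hypothetical helper: count distinct Vector objects reachable from x.
function count_vectors(x, seen = IdSet{Any}())
    x isa Vector || return 0    # only vectors are counted and traversed
    x in seen && return 0       # already visited: stop here to avoid an infinite loop
    push!(seen, x)
    return 1 + sum(count_vectors(el, seen) for el in x; init = 0)
end

a = Any[1, 2]
b = Any[a, 3]
push!(a, b)        # now a contains b and b contains a: a circular reference

count_vectors(a)   # 2
```

A standard Set would not be a good fit here, since it compares vectors with isequal, which looks at their contents rather than at object identity.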

Testing push! on Julia 1.11

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/06/21/push.html

Introduction

Currently Julia 1.11 is in its beta testing phase.
One of the changes it introduces is a redesign of the internal representation of arrays.
This redesign, from the user perspective, promises to speed up certain operations.
One of the common ones that I use often is push!. Therefore today I decided to benchmark it.

The tests were performed under Julia 1.11.0-beta2 and Julia 1.10.1. The benchmarks use BenchmarkTools.jl 1.5.0.

The test

Here is the function we are going to use for our tests:

using BenchmarkTools

function test(n)
    x = Int[]
    for i in 1:n
        push!(x, i)
    end
    return x
end

This is the most basic test of the performance of the push! operation.
I want to check the performance for various numbers of push! operations.

Let us run the tests first under Julia 1.11.0-beta2:

julia> @benchmark test(100)
BenchmarkTools.Trial: 10000 samples with 849 evaluations.
 Range (min … max):  129.800 ns …   1.682 μs  ┊ GC (min … max):  0.00% … 85.46%
 Time  (median):     194.582 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   232.033 ns ± 125.847 ns  ┊ GC (mean ± σ):  15.55% ± 19.08%

 Memory estimate: 1.94 KiB, allocs estimate: 4.

julia> @benchmark test(10_000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  14.400 μs …   6.167 ms  ┊ GC (min … max):  0.00% … 97.67%
 Time  (median):     30.300 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   51.585 μs ± 148.402 μs  ┊ GC (mean ± σ):  21.45% ± 10.84%

 Memory estimate: 326.41 KiB, allocs estimate: 14.

julia> @benchmark test(1_000_000)
BenchmarkTools.Trial: 808 samples with 1 evaluation.
 Range (min … max):  2.993 ms … 90.221 ms  ┊ GC (min … max):  0.00% … 95.38%
 Time  (median):     4.674 ms              ┊ GC (median):    19.53%
 Time  (mean ± σ):   6.176 ms ±  8.774 ms  ┊ GC (mean ± σ):  35.69% ± 20.33%

 Memory estimate: 17.41 MiB, allocs estimate: 24.

julia> @benchmark test(100_000_000)
BenchmarkTools.Trial: 6 samples with 1 evaluation.
 Range (min … max):  808.177 ms …   1.020 s  ┊ GC (min … max):  9.38% … 26.13%
 Time  (median):     959.266 ms              ┊ GC (median):    25.01%
 Time  (mean ± σ):   944.380 ms ± 81.448 ms  ┊ GC (mean ± σ):  22.25% ±  6.41%

 Memory estimate: 2.95 GiB, allocs estimate: 42.

Now the same tests under Julia 1.10.1:

julia> @benchmark test(100)
BenchmarkTools.Trial: 10000 samples with 199 evaluations.
 Range (min … max):  359.296 ns …  10.699 μs  ┊ GC (min … max): 0.00% … 82.66%
 Time  (median):     923.116 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   959.401 ns ± 347.833 ns  ┊ GC (mean ± σ):  2.21% ±  6.25%

 Memory estimate: 1.92 KiB, allocs estimate: 4.

julia> @benchmark test(10_000)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   47.700 μs …   4.938 ms  ┊ GC (min … max): 0.00% … 94.66%
 Time  (median):     103.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   133.997 μs ± 158.472 μs  ┊ GC (mean ± σ):  7.43% ±  6.81%

 Memory estimate: 326.55 KiB, allocs estimate: 9.

julia> @benchmark test(1_000_000)
BenchmarkTools.Trial: 504 samples with 1 evaluation.
 Range (min … max):  6.729 ms … 88.534 ms  ┊ GC (min … max): 0.00% … 91.83%
 Time  (median):     8.773 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.924 ms ±  5.161 ms  ┊ GC (mean ± σ):  7.30% ±  9.24%

 Memory estimate: 9.78 MiB, allocs estimate: 14.

julia> @benchmark test(100_000_000)
BenchmarkTools.Trial: 4 samples with 1 evaluation.
 Range (min … max):  1.184 s …   1.394 s  ┊ GC (min … max): 8.36% … 6.56%
 Time  (median):     1.275 s              ┊ GC (median):    7.46%
 Time  (mean ± σ):   1.282 s ± 86.217 ms  ┊ GC (mean ± σ):  6.89% ± 5.29%

 Memory estimate: 1019.60 MiB, allocs estimate: 23.

Conclusions

From the tests we can see that:

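  • The new implementation in Julia 1.11 is faster for every tested value of n; the median time drops from 923.116 ns to 194.582 ns for n = 100 and from 1.275 s to 959.266 ms for n = 100_000_000. This is very nice.
  • The new implementation in Julia 1.11 does more allocations for n ≥ 10_000 (e.g. 42 vs 23 for n = 100_000_000), has a markedly higher memory estimate for the two largest values of n (2.95 GiB vs 1019.60 MiB for n = 100_000_000), and, in consequence, spends more time in garbage collection. This means that when available RAM is scarce the code performance could be affected.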
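
To put the timing difference in numbers, here is a small sketch that computes the speedup from the median times reported above. The values are copied from the benchmark output and converted to nanoseconds; the variable names are mine:

```julia
# Median times reported above, converted to nanoseconds
n_values   = [100, 10_000, 1_000_000, 100_000_000]
median_110 = [923.116, 103.200e3, 8.773e6, 1.275e9]     # Julia 1.10.1
median_111 = [194.582, 30.300e3, 4.674e6, 959.266e6]    # Julia 1.11.0-beta2

# How many times faster Julia 1.11.0-beta2 is for each n
speedup = median_110 ./ median_111

for (n, s) in zip(n_values, speedup)
    println("n = ", n, ": ", round(s, digits = 2), "x faster")
end
```

On my measurements the speedup is largest for small vectors (roughly 4.7x for n = 100) and shrinks to roughly 1.3x for n = 100_000_000.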