IdSet in Julia 1.11

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/06/28/idset.html

Introduction

We are now in RC1 phase of Julia 1.11.
One small but important addition it is making IdSet a public type.
Today I want to discuss when this type is useful.

The code was tested under Julia 1.11 RC1.

Equality in Julia

There are three basic ways to test equality in Julia:

  1. the == operator;
  2. the isequal function;
  3. the === operator.

I have ordered these comparison operators by their level of strictness.

The == operator is most loose. It can return true, false or missing. The missing value is returned if any of the compared values are missing (or recursively contain missing value). For floating point numbers it assumes that 0.0 is equal to -0.0 and that NaN is not equal to NaN. Let us see the last case in action as it can be surprising if you have never seen this before:

julia> NaN == NaN
false

Next is isequal that is more strict. It guarantees to return true or false. It treats all floating-point NaN values as equal to each other, treats -0.0 as unequal to 0.0, and missing as equal to missing. It compares objects by their value, not by their identity. So, for example, two different vectors having the same contents are considered equal:

julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> v2 = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> isequal(v1, v2)
true

Finally we have ===, which is most strict. It returns true or false. However, true is returned if and only if the compared values are indistinguishable. They must have the same type. If their types are identical, mutable objects are compared by address in memory and immutable objects (such as numbers) are compared by contents at the bit level. Therefore the v1 and v2 vectors we created above are not equal when compared with ===:

julia> v1 === v2
false

You might ask about NaN. We saw that we talked about before. Here the situation is complicated. They can be equal or be not equal. Since NaN values are immutable === compares them on bit level. So we have:

julia> Float16(NaN) == Float32(NaN)
false

julia> isequal(Float16(NaN), Float32(NaN))
true

julia> Float16(NaN) === Float32(NaN)
false

julia> Float16(NaN) == Float16(NaN)
false

julia> isequal(Float16(NaN), Float16(NaN))
true

julia> Float16(NaN) === Float16(NaN)
true

Thus, you have to be careful. Each of the three comparison methods I discussed have their uses and it is well worth learning them.

Sets in Julia

Standard sets in Julia, created using the Set constructor use isequal to test for equality. Therefore we have:

julia> Set([v1, v2])
Set{Vector{Int64}} with 1 element:
  [1, 2, 3]

We see that v1 and v2 got de-duplicated because they are equal with respect to isequal since they have the same contents. This is often what the user wants.

However, sometimes we want to track actual objects (irrespective of their contents). This is especially important when working with mutable structures. In this case IdSet is useful:

julia> IdSet{Vector{Int}}([v1, v2])
IdSet{Vector{Int64}} with 2 elements:
  [1, 2, 3]
  [1, 2, 3]

Note that we needed to specify the type of the values stored in IdSet. As an exception the IdSet() is allowed (not requiring you to pass the stored object type specification) and in this case an empty IdSet{Any} is created.

Conclusions

Now you might ask when in practice IdSet is most useful. I needed it in my coding practice most often when I worked with nested mutable containers that potentially could contain circular references. In such case using IdSet allows you to easily keep track of the list of mutable objects already seen and avoid an infinite loop or stack overflow if you e.g. use recursion to work with such a deeply nested data structure.