By: Blog by Bogumił Kamiński
Re-posted from: https://bkamins.github.io/julialang/2024/06/28/idset.html
Introduction
We are now in RC1 phase of Julia 1.11.
One small but important addition it is making IdSet
a public type.
Today I want to discuss when this type is useful.
The code was tested under Julia 1.11 RC1.
Equality in Julia
There are three basic ways to test equality in Julia:
- the
==
operator;
- the
isequal
function;
- the
===
operator.
I have ordered these comparison operators by their level of strictness.
The ==
operator is most loose. It can return true
, false
or missing
. The missing
value is returned if any of the compared values are missing (or recursively contain missing
value). For floating point numbers it assumes that 0.0
is equal to -0.0
and that NaN
is not equal to NaN
. Let us see the last case in action as it can be surprising if you have never seen this before:
Next is isequal
that is more strict. It guarantees to return true
or false
. It treats all floating-point NaN
values as equal to each other, treats -0.0
as unequal to 0.0
, and missing
as equal to missing
. It compares objects by their value, not by their identity. So, for example, two different vectors having the same contents are considered equal:
julia> v1 = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> v2 = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> isequal(v1, v2)
true
Finally we have ===
, which is most strict. It returns true
or false
. However, true
is returned if and only if the compared values are indistinguishable. They must have the same type. If their types are identical, mutable objects are compared by address in memory and immutable objects (such as numbers) are compared by contents at the bit level. Therefore the v1
and v2
vectors we created above are not equal when compared with ===
:
You might ask about NaN
. We saw that we talked about before. Here the situation is complicated. They can be equal or be not equal. Since NaN
values are immutable ===
compares them on bit level. So we have:
julia> Float16(NaN) == Float32(NaN)
false
julia> isequal(Float16(NaN), Float32(NaN))
true
julia> Float16(NaN) === Float32(NaN)
false
julia> Float16(NaN) == Float16(NaN)
false
julia> isequal(Float16(NaN), Float16(NaN))
true
julia> Float16(NaN) === Float16(NaN)
true
Thus, you have to be careful. Each of the three comparison methods I discussed have their uses and it is well worth learning them.
Sets in Julia
Standard sets in Julia, created using the Set
constructor use isequal
to test for equality. Therefore we have:
julia> Set([v1, v2])
Set{Vector{Int64}} with 1 element:
[1, 2, 3]
We see that v1
and v2
got de-duplicated because they are equal with respect to isequal
since they have the same contents. This is often what the user wants.
However, sometimes we want to track actual objects (irrespective of their contents). This is especially important when working with mutable structures. In this case IdSet
is useful:
julia> IdSet{Vector{Int}}([v1, v2])
IdSet{Vector{Int64}} with 2 elements:
[1, 2, 3]
[1, 2, 3]
Note that we needed to specify the type of the values stored in IdSet
. As an exception the IdSet()
is allowed (not requiring you to pass the stored object type specification) and in this case an empty IdSet{Any}
is created.
Conclusions
Now you might ask when in practice IdSet
is most useful. I needed it in my coding practice most often when I worked with nested mutable containers that potentially could contain circular references. In such case using IdSet
allows you to easily keep track of the list of mutable objects already seen and avoid an infinite loop or stack overflow if you e.g. use recursion to work with such a deeply nested data structure.