Re-posted from: https://bkamins.github.io/julialang/2022/08/19/union.html
Introduction
Today my post is about subtypes of concrete types. It is mostly academic, but I
hope it will be useful for readers wanting to get a better understanding of
corner cases Julia’s type system.
The post was written under Julia 1.8.0 (with special thanks for all people
who contributed to this release!).
What is a concrete type in Julia?
In Julia a type is concrete it can have a direct instance, that is, some type
T
is concrete if there exists at least one value v
such that
typeof(v) === T
.
For every type you can check whether it is concrete using the
isconcretetype
function.
Today I want to discuss the following sentence from the section on
Types from the Julia Manual in relation to concrete types:
One particularly distinctive feature of Julia’s type system is that concrete
types may not subtype each other: all concrete types are final and may only
have abstract types as their supertypes.
From this sentence some readers conclude that concrete types cannot have
subtypes. However, it is not the case. Concrete types in Julia can have subtypes
as long as these subtypes are not concrete.
You might ask does this ever happen in practice? The answer is that it happens
and here are the examples when it does.
The first one is Union{}
type. This type is not concrete and has no values.
However, it is a subtype of all types, including concrete types, for example:
julia> Union{} <: Int
true
julia> Union{} <: Vector{Missing}
true
The other case is Type{T}
type, where T
is a DataType
(i.e. if T
has
type DataType
). All common concrete types are subtypes of DataType
, e.g.
integers or vectors. So types like Int
or Vector{Missing}
have DataType
type:
julia> typeof(Int)
DataType
julia> typeof(Vector{Missing})
DataType
which means that DataType
must be concrete, and indeed it is:
julia> isconcretetype(DataType)
true
Although DataType
is concrete, it has Type{Int}
and Type{Vector{Missing}}
as its subtypes (and these types must not, and are not, concrete as we
discussed above):
julia> Type{Int} <: DataType
true
julia> Type{Vector{Missing}} <: DataType
true
julia> isconcretetype(Type{Int})
false
julia> isconcretetype(Type{Vector{Int}})
false
Why these subtyping considerations matter?
The most important lesson learned here is that in your code you should
not assume that concrete type cannot have subtypes, as it can (although these
subtypes cannot be concrete themselves). This observation is mostly relevant
for package developers, who need to write generic code.
However, there are some practical situations when one can be affected by these
subtyping rules. The most common is when one is working with missing values.
Assume that I generate some random matrix containing either 1
or missing
:
julia> using Random
julia> Random.seed!(1234);
julia> mat = rand([1, missing], 10, 3)
10×3 Matrix{Union{Missing, Int64}}:
1 missing 1
missing missing 1
1 1 missing
missing 1 1
1 1 missing
1 missing 1
missing missing 1
missing missing missing
1 missing missing
missing missing missing
Now, I want to compute the sums of its rows, while skipping missing values.
Here is how you can do it:
julia> [sum(skipmissing(row)) for row in eachrow(mat)]
10-element Vector{Int64}:
2
1
2
2
2
2
1
0
1
0
However, a very similar codes that follow do not work:
julia> [sum(skipmissing(identity.(row))) for row in eachrow(mat)]
ERROR: ArgumentError: reducing with add_sum over an empty collection of element type Union{} is not allowed.
julia> [sum(skipmissing([x for x in row])) for row in eachrow(mat)]
ERROR: ArgumentError: reducing with add_sum over an empty collection of element type Union{} is not allowed.
What is the reason for the difference? In the skipmissing(row)
case row
is a view and retains information about element type of the whole array, which
is Union{Missing, Int64}
, so it is able to properly compute sum even in the
case when all values in a row are missing.
On the other hand both identity.(row)
and [x for x in row]
materialize the
row and perform type narrowing. This type narrowing means that in rows that
only contain missing
values the information about Int64
is lost and we get
an error. Let us see it step by step:
julia> row = last(eachrow(mat))
3-element view(::Matrix{Union{Missing, Int64}}, 10, :) with eltype Union{Missing, Int64}:
missing
missing
missing
julia> x = identity.(row)
3-element Vector{Missing}:
missing
missing
missing
julia> eltype(skipmissing(x))
Union{}
As you can see, since skipmissing
strips the Missing
part from the source
vector element type, we are left with Union{}
.
Unfortunately, such errors happen from time to time when one works with
data having missing values. For such cases in Julia many (but not all) common
reduction functions support the init
keyword, so you can do:
julia> [sum(skipmissing(identity.(row)), init=0) for row in eachrow(mat)]
10-element Vector{Int64}:
2
1
2
2
2
2
1
0
1
0
and all is good even if type inference produces Union{}
.
Conclusions
The post today was less practical than usual. However, I hope you will find it
useful when Julia tries to take you into a deep dark type system forest where
2+2=5, and the path leading out is only wide enough for one (Mikhail Tal).