Subtypes of concrete types in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/08/19/union.html

Introduction

Today my post is about subtypes of concrete types. It is mostly academic, but I
hope it will be useful for readers wanting to get a better understanding of
corner cases of Julia’s type system.

The post was written under Julia 1.8.0 (with special thanks to all the people
who contributed to this release!).

What is a concrete type in Julia?

In Julia a type is concrete if it can have a direct instance, that is, some type
T is concrete if there exists at least one value v such that
typeof(v) === T.
For every type you can check whether it is concrete using the
isconcretetype function.
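As a quick illustration of this definition, the following sketch checks a few familiar types. The variable name v is chosen only for this example.

```julia
# A value whose type we inspect; `v` is just an illustrative name.
v = 1

# Int is concrete: it is exactly the type of the value `v`.
typeof(v) === Int            # true
isconcretetype(Int)          # true

# Integer is abstract: no value has Integer as its exact type.
isconcretetype(Integer)      # false

# A parametric type is concrete only once its parameters are fixed.
isconcretetype(Vector)       # false
isconcretetype(Vector{Int})  # true
```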

Today I want to discuss the following sentence from the section on
Types from the Julia Manual in relation to concrete types:

One particularly distinctive feature of Julia’s type system is that concrete
types may not subtype each other: all concrete types are final and may only
have abstract types as their supertypes.

From this sentence some readers conclude that concrete types cannot have
subtypes. However, this is not the case. Concrete types in Julia can have subtypes
as long as these subtypes are not concrete.

You might ask whether this ever happens in practice. The answer is that it does,
and here are the examples.

The first one is the Union{} type. This type is not concrete and has no values.
However, it is a subtype of all types, including concrete types, for example:

julia> Union{} <: Int
true

julia> Union{} <: Vector{Missing}
true
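The claims about Union{} can also be checked programmatically. Below is a small sketch; the tuple named types is an arbitrary selection made for this example.

```julia
# Union{} is not concrete: no value has it as its type.
isconcretetype(Union{})         # false

# Union{} is a subtype of every type, concrete or abstract.
types = (Int, Float64, String, Vector{Missing}, Any)
all(T -> Union{} <: T, types)   # true

# The relation does not hold in the other direction.
Int <: Union{}                  # false
Union{} <: Union{}              # true
```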

The other case is the Type{T} type, where T is a DataType (i.e. if T has
type DataType). All common concrete types, e.g. integers or vectors, are
instances of DataType. So types like Int or Vector{Missing} have DataType
as their type:

julia> typeof(Int)
DataType

julia> typeof(Vector{Missing})
DataType

which means that DataType must be concrete, and indeed it is:

julia> isconcretetype(DataType)
true

Although DataType is concrete, it has Type{Int} and Type{Vector{Missing}}
as its subtypes (and these types must not be, and are not, concrete, as we
discussed above):

julia> Type{Int} <: DataType
true

julia> Type{Vector{Missing}} <: DataType
true

julia> isconcretetype(Type{Int})
false

julia> isconcretetype(Type{Vector{Missing}})
false
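It may seem surprising that Type{Int} is not concrete even though Int belongs to it. The short sketch below illustrates the difference between being an instance of a type (isa) and having that type as the exact result of typeof:

```julia
# Int is an instance of Type{Int} ...
Int isa Type{Int}          # true

# ... but typeof reports DataType for Int, not Type{Int}.
typeof(Int) === DataType   # true
typeof(Int) === Type{Int}  # false

# Type{Int} contains only Int, while DataType contains other types too.
Float64 isa Type{Int}      # false
Float64 isa DataType       # true
```

Since no value v satisfies typeof(v) === Type{Int}, the type is not concrete according to the definition given earlier.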

Why do these subtyping considerations matter?

The most important lesson learned here is that in your code you should
not assume that a concrete type cannot have subtypes, as it can (although these
subtypes cannot be concrete themselves). This observation is mostly relevant
for package developers, who need to write generic code.
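To make this lesson more tangible, here is a minimal sketch built around a hypothetical helper, is_int_like, which is not part of any package. It shows that a check of the form T <: Int does not guarantee that T is Int:

```julia
# Hypothetical helper for illustration: accepts any type passing T <: Int.
is_int_like(T::Type) = T <: Int

is_int_like(Int)       # true
is_int_like(Union{})   # true, although Union{} is not Int and has no values

# If the code needs exactly Int, an identity check is stricter.
Union{} === Int        # false
```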

However, there are some practical situations when one can be affected by these
subtyping rules. The most common is when one is working with missing values.

Assume that I generate some random matrix containing either 1 or missing:

julia> using Random

julia> Random.seed!(1234);

julia> mat = rand([1, missing], 10, 3)
10×3 Matrix{Union{Missing, Int64}}:
 1          missing  1
  missing   missing  1
 1         1          missing
  missing  1         1
 1         1          missing
 1          missing  1
  missing   missing  1
  missing   missing   missing
 1          missing   missing
  missing   missing   missing

Now, I want to compute the sums of its rows, while skipping missing values.
Here is how you can do it:

julia> [sum(skipmissing(row)) for row in eachrow(mat)]
10-element Vector{Int64}:
 2
 1
 2
 2
 2
 2
 1
 0
 1
 0

However, the following very similar code snippets do not work:

julia> [sum(skipmissing(identity.(row))) for row in eachrow(mat)]
ERROR: ArgumentError: reducing with add_sum over an empty collection of element type Union{} is not allowed.

julia> [sum(skipmissing([x for x in row])) for row in eachrow(mat)]
ERROR: ArgumentError: reducing with add_sum over an empty collection of element type Union{} is not allowed.

What is the reason for the difference? In the skipmissing(row) case, row
is a view and retains information about the element type of the whole array,
which is Union{Missing, Int64}, so sum is able to return the correct result even
in the case when all values in a row are missing.

On the other hand, both identity.(row) and [x for x in row] materialize the
row and perform type narrowing. This type narrowing means that in rows that
contain only missing values the information about Int64 is lost and we get
an error. Let us see it step by step:

julia> row = last(eachrow(mat))
3-element view(::Matrix{Union{Missing, Int64}}, 10, :) with eltype Union{Missing, Int64}:
 missing
 missing
 missing

julia> x = identity.(row)
3-element Vector{Missing}:
 missing
 missing
 missing

julia> eltype(skipmissing(x))
Union{}

As you can see, since skipmissing strips the Missing part from the source
vector element type, we are left with Union{}.
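The same stripping can be observed with the nonmissingtype function from Base, which removes Missing from a type. A brief sketch, using small hand-written vectors rather than the matrix above:

```julia
# Removing Missing from a union leaves the remaining element type.
nonmissingtype(Union{Missing, Int})   # Int

# Removing Missing from Missing itself leaves nothing, that is, Union{}.
nonmissingtype(Missing)               # Union{}

# skipmissing reports the same narrowed element types.
eltype(skipmissing(Union{Missing, Int}[1, missing]))  # Int
eltype(skipmissing([missing, missing]))               # Union{}
```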

Unfortunately, such errors happen from time to time when one works with
data having missing values. For such cases, many (but not all) common
reduction functions in Julia support the init keyword argument, so you can do:

julia> [sum(skipmissing(identity.(row)), init=0) for row in eachrow(mat)]
10-element Vector{Int64}:
 2
 1
 2
 2
 2
 2
 1
 0
 1
 0

and all is good even if type inference produces Union{}.
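If the row has to be materialized, another possible approach is to keep the original element type so that no narrowing takes place. The following is only a sketch: the small matrix m and the alias T are stand-ins introduced for this example, not the random matrix used earlier. Both collect and a typed comprehension preserve Union{Missing, Int}:

```julia
# Alias for the element type we want to preserve (illustrative name).
T = Union{Missing, Int}

# Small stand-in matrix; the second row contains only missing values.
m = T[1 missing; missing missing]

# collect copies each row view but keeps its element type.
sums_collect = [sum(skipmissing(collect(row))) for row in eachrow(m)]

# A typed comprehension fixes the element type explicitly.
sums_typed = [sum(skipmissing(T[x for x in row])) for row in eachrow(m)]

# Both results equal [1, 0]; the all-missing row sums to 0 without init.
```

Passing init remains the more defensive choice when the element type of the input is not under your control.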

Conclusions

The post today was less practical than usual. However, I hope you will find it
useful when Julia tries to take you into a deep dark type system forest where
2+2=5, and the path leading out is only wide enough for one (Mikhail Tal).