Re-posted from: https://bkamins.github.io/julialang/2021/11/12/so.html
Introduction
I have just hit 1000 answers for the [julia] tag on Stack Overflow so I felt
like writing about it. In order to have a complete one thousand and one stories
collection today I thought of writing about a feature of Julia that will show
you how challenging the design decisions that have to be made when designing
functions are. We will work with one of the most fundamental functions, the
sum
.
Before I go to the technical details let me go back to my recent post that I wrote about a model that Alan Edelman prepared for one of
classes during his studies. Recently I had an opportunity to discuss with him
about the exact context of creation of the model. Here you can read the summary
which I was lacking when I was writing the original post:
In 1980, Alan Edelman was a 17 year old freshman at Yale University where he
met his social science distribution requirement by taking Psychology 101
with Dr. Kenji Hakuta. There he learned about the famous
Kitty Genovese murder and the concept of diffusion of responsibility.
Having to write a paper for this freshman class, and being a “math person” he
figured why not take the idea of diffusion literally and write a paper about
that. He does not recall if he still has a copy of that paper, but it may be
in a box in the attic. He thinks he got an A on that paper.
The post was written under Julia 1.6.3.
Are you sure you understand how the sum
function works?
I will test your knowledge by example. The first task is to perform row-wise
summation of the following matrix:
julia> mat = fill(Int8(100), 5, 10)
5×10 Matrix{Int8}:
100 100 100 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100 100 100
julia> sum(eachcol(mat))
5-element Vector{Int8}:
-24
-24
-24
-24
-24
julia> sum.(eachrow(mat))
5-element Vector{Int64}:
1000
1000
1000
1000
1000
If you are surprised by the result let me explain the situation. In the first
case sum
operates on whole vectors. In the second case sum
operates on
scalars. Why would this make the difference? The reason is that sum
does not
use +
for aggregation, but it employs the Base.add_sum
as the reduction
operator which is defined as follows:
add_sum(x, y) = x + y
add_sum(x::SmallSigned, y::SmallSigned) = Int(x) + Int(y)
add_sum(x::SmallUnsigned, y::SmallUnsigned) = UInt(x) + UInt(y)
add_sum(x::Real, y::Real)::Real = x + y
As you can see the add_sum
will promote the result to Int
or UInt
only if
it is passed scalar integers. Therefore if we pass it vectors of integers no
promotion happens.
Now for sure you know what will be the sum of the following vector:
julia> v = Integer[0x64; fill(Int8(100), 9)]
10-element Vector{Integer}:
0x64
100
100
100
100
100
100
100
100
100
Let us check:
julia> sum(v)
0xe8
Could you have guessed it? If yes, you probably assume that sum
is using
foldl
and you have noticed that Base.add_sum
does not perform promotion
to Int
or UInt
when you mix signed and unsigned integers.
Unfortunately, if this was your guess, you are wrong in general. Consider this
scenario:
julia> sum(Integer[0x64; fill(Int8(100), 9999)])
937536
The result we get is quite surprising. We could have expected:
julia> foldl(+, Integer[0x64; fill(Int8(100), 9999)])
0x40
as we know that Base.add_sum
will always fall back to +
in this case. This
would be consistent with the previous result.
What is the reason of the difference? Actually sum
does not use foldl
but
reduce
, and reduce
does not have a guaranteed order of summation. This
means that in the latter case we must have made some summation of two Int8
values using
(100)Base.add_sum
which promoted the result to Int
.
Maybe above you thought of calling foldl
with Base.add_sum
like ths?:
julia> foldl(Base.add_sum, Integer[0x64; fill(Int8(100), 9999)])
0x00000000000f4240
0x00000000000f4240
is just 1000000
(which is a correct result if we were
widening types always when doing the summation), but why do we get such a weird
value? The reason is that foldl
differs from sum
in the way it initializes
the summation. It uses the first element of the collection(0x64
in our case)
and promotes it to the type that woud be produced if this element were added
using Base.add_sum
to itself and we know UInt8
to UInt8
invokes promotion
to UInt
.
What else could go wrong? Try this:
julia> using Random
julia> Random.seed!(1234)
MersenneTwister(1234)
julia> x = rand(1000)
1000-element Vector{Float64}:
0.5908446386657102
0.7667970365022592
0.5662374165061859
0.4600853424625171
0.7940257103317943
0.8541465903790502
0.20058603493384108
0.2986142783434118
⋮
0.5762976355934157
0.08831200391130656
0.8994769043886504
0.8232831225471882
0.37869007913520947
0.7812366659068535
0.4651012221417914
julia> sum(x)
496.84883432553806
julia> sum(x, init=0.0)
496.84883432553846
As you can see the results are not identical (they differ at the second least
significant digit). What have messed up this time? By specifying init
keyword
argument, although 0.0
is a neutral in summation, we forced sum
to
use a different summation order again and for floats order of summation is
known to affect the result.
Wait – have I said that 0.0
is a neutral in summation? I have lied. See:
julia> isequal(sum([-0.0, -0.0]), sum([-0.0, -0.0], init=0.0))
false
because:
julia> sum([-0.0, -0.0])
-0.0
julia> sum([-0.0, -0.0], init=0.0)
0.0
At this point I start asking myself why in my math classes I was taught to use
real numbers and not IEEE 754 standard which seems to be at play much
more often in practice (at least if you are using computers). I will have to
pose this question Alan Edelman who is a professor of both mathematics and
computer science the next time I have the privilege of talking with him.
Conclusions
In this blog and on Stack Overflow I try to show users that Julia is a nice
language to work with (of which I am really convinced).
However, having written these 1000 stories that end good one wants to show at
least one story when the dark side shows up (of course the problems I have
discussed are not Julia specific, but their particular manifestation is a
consequence of design decisions that Julia developers made).
Can we do anything about the problems I have shown? There is one remedy and one
warning to keep in mind.
The remedy is the following: when working with collections in Julia always take
care to make sure they have homogeneous type of elements and this type is
chosen appropriately to the operation you want to perform. Except for very rare
situations there is little sense in mixing numbers of different types in one
collection so just do not do it.
The warning is the IEEE floating point arithmetic standard consequence: if you
work with floats better treat the result of operations on them as only
approximate.