Re-posted from: https://bkamins.github.io/julialang/2022/08/26/onebased.html
Introduction
Julia is a powerful language giving a lot of flexibility to developers.
This flexibility was one of the factors that convinced me to use Julia
as my go-to language in my work.
(source: https://www.facebook.com/nixcraft/photos/a.431194973560553/3484925554854131)
However, with great power comes great responsibility. New Julia users are
often overwhelmed by how much there is to learn: type system, multiple dispatch,
memory management, multi-threading, advanced indexing, …
This post is aimed at new users of Julia. Its goal is to present my personal
perspective how I think people should try to learn and understand Julia. In
short, I recommend you do not try to follow all advice from Julia experts on
idiomatic Julia code as it could quickly get overly complicated. Instead write
code that you can easily understand that follows few fundamental principles
(like avoiding global variables or writing type stable code). The advanced tips
related to writing fully generic code are mostly relevant if you want to start
writing packages in Julia, which is not what new users start with.
I will try to explain you this issue by discussing how I approach 1-based
indexing in Julia.
The post was written under Julia 1.8.0 and OffsetArrays.jl 1.12.7.
What is 1-based indexing?
If you read the Julia manual you learn that:
In Julia, indexing of arrays, strings, etc. is 1-based not 0-based.
This means that if you have some collection, like for example a Vector
or a
String
its first element has index of 1
. Here is an example:
julia> v = [5, 7, 13]
3-element Vector{Int64}:
5
7
13
julia> v[1]
5
julia> firstindex(v)
1
julia> s = "abc"
"abc"
julia> s[1]
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> firstindex(s)
1
The firstindex
function in Julia informs us what is a firs index in a
collection. In both cases it returned 1
.
If you are a new Julia user then I advise you that you assume that all
standard collections in Julia use 1
as a starting index. So, for example,
the following code for calculation of the sum of some vector is in my
opinion acceptable (assuming we want to accept only non-empty collections):
julia> function mysum(v::AbstractVector)
n = length(v)
n == 0 && throw(ArgumentError("the collection is empty"))
s = v[1]
for i in 2:n
s += v[i]
end
return s
end
mysum (generic function with 1 method)
julia> mysum(v)
25
Does Julia use 1-based indexing
However, when you show such a code to an experienced Julia developer you will
very quickly get a feedback that it is incorrect as vectors in Julia do not have
to use 1-based indexing. Instead you might get a recommendation of the code that
looks, for example, like this:
julia> function mysum2(v::AbstractVector)
n = length(v)
n == 0 && throw(ArgumentError("the collection is empty"))
s = v[begin]
for i in firstindex(v)+1:lastindex(v)
s += v[i]
end
return s
end
mysum2 (generic function with 1 method)
julia> mysum2(v)
25
This code will work with any AbstractVector
and correctly handle non 1-based
cases. Let us check it by creating a non 1-based vector:
julia> using OffsetArrays
julia> ov = OffsetArray([5, 7, 13], -2)
3-element OffsetArray(::Vector{Int64}, -1:1) with eltype Int64 with indices -1:1:
5
7
13
julia> firstindex(ov)
-1
julia> lastindex(ov)
1
julia> ov[-1]
5
Now check both functions we have defined above:
julia> mysum(ov)
ERROR: BoundsError: attempt to access 3-element
OffsetArray(::Vector{Int64}, -1:1) with eltype Int64
with indices -1:1 at index [2]
julia> mysum2(ov)
25
Indeed, as expected mysum
does not work correctly, but mysum2
is OK.
So why do I say that the mysum
function is acceptable. There are the following
reasons:
- In practice you are very unlikely to encounter OffsetArrays.jl, and if you
will, you will know it; all standard collections in Julia are 1-based. mysum
function is much easier to write and reason about thanmysum2
. To
write themysum
function you need to have only a minimal knowledge of Julia.
To writemysum2
you need to know much more.- Even if you pass a vector that is not 1-based to
mysum
then in the worst
case what you will get is an error, so in this case you will not get a wrong
result silently. (of course in general there could be a case that you silently
obtained an incorrect result, but this is not the case when you want to iterate
all elements of a vector, which is the most frequent use-case). - Descriptions of many algorithms in books assume 1-based indexing, so it is
easy to port them to code if we assume a fixed first index. Of course it is
usually relatively easy to change the algorithm to support other indexing
base, but it requires mental effort.
Now, if you want extra safety you can add a call to the
Base.require_one_based_indexing
function on top of your function like this:
julia> function mysum3(v::AbstractVector)
Base.require_one_based_indexing(v)
n = length(v)
n == 0 && throw(ArgumentError("the collection is empty"))
s = v[1]
for i in 2:n
s += v[i]
end
return s
end
mysum3 (generic function with 2 methods)
julia> mysum3(ov)
ERROR: ArgumentError: offset arrays are not supported
but got an array with index other than 1
Now you are sure that you only work with vectors supporting 1-based indexing
and using other vectors throw an informative error.
Practical examples of handling of 1-based indexing
Let me give you two practical examples what is done in some popular packages.
First let us look at the shuffle!
function implementation in the Random
module:
function shuffle!(r::AbstractRNG, a::AbstractArray)
require_one_based_indexing(a)
n = length(a)
n <= 1 && return a # nextpow below won't work with n == 0
@assert n <= Int64(2)^52
mask = nextpow(2, n) - 1
for i = n:-1:2
(mask >> 1) == i && (mask >>= 1)
j = 1 + rand(r, ltm52(i, mask))
a[i], a[j] = a[j], a[i]
end
return a
end
Note two things. First, although in general it would be possible to re-write
this code to accept non 1-based indices Julia developers decided not to do so (I
assume this is to make the code easier to read). However, there is more to it.
The second point is that I have never heard anyone complain that shuffle!
implementation should be changed. In general, of course it could, but the
reality is that during the last 10 years of Julia usage it seems that this
functionality gap was not hurting anyone enough to warrant fixing this design
flaw.
The second example is from DataFrames.jl. Some time ago we got a request to
correctly handle non 1-based indexed vectors as columns of a DataFrame
.
We could have done one of two things:
- allow non 1-based vectors and adjust all code to support this change;
- disallow non 1-based vectors and throw an error if the user tries to put
such a vector in aDataFrame
.
Again, we could have went the more flexible path of allowing non 1-based vectors,
but I preferred not to do so. The reasons are:
- We would need to increase the complexity of the API of DataFrames.jl and
additionally our users would constantly need to think and check if some
data frame uses 1-based or some other indexing; - Internal code would be more complex. Any change in the DataFrames.jl package
would need to be designed in a way that correctly handles non 1-based
indexing; - All tests in DataFrames.jl would need to be adjusted to add coverage of non
1-based indexed case for all functions.
I thought it was too much effort for too little benefit. It is much easier, both
for users and for developers to assume and enforce that we only accept as
columns vectors that use 1-based indexing.
Conclusions
When learning Julia I recommend you try to keep it simple. Learn the
functionality of Base Julia and how fundamental types defined in it work.
Do not try to write overly general code or try to increase the speed of code too
early (e.g. by using the @inbounds
macro – usually using it does not give much
benefit and at the same time introduces the risk of bugs in your code).
What I recommend is writing simple code but making sure it is correct for the
cases you expect to encounter. In particular make sure you write tests of your
code (it is really easy in Julia given the functionality of the Test
module).
After you get comfortable with using Julia you will learn that most of the
things you assume how it works can be changed. Then you will be ready to
go to the next level and write a fully generic code if you decide you need to.
In summary, assuming that Julia uses 1-based indexing is a safe enough bet that
is not likely to hurt you in practice (even if it is not 100% correct in
general). Therefore, when you learn Julia I recommend you make this assumption
and just enjoy the language. You will have time to learn advanced design
patterns later, after you get comfortable with the language.