Author Archives: Blog by Bogumił Kamiński

How does Tables.jl handle schema-less tables?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/09/01/tables.html

Introduction

Tables.jl is a fundamental package in the JuliaData ecosystem.

One of the key concepts used in Tables.jl is a schema of a table.
Schema is information about the column names and types of a table.
Having access to schema is useful as being able to query these properties
is constantly needed in practice.

In this post I want to discuss in what cases a table might not have
a schema information associated with it and how Tables.jl handles this case.

The post was written using Julia 1.9.2, Tables.jl 1.10.1, and DataFrames.jl 1.6.1.

What is a schema of a table?

Let us create some example tables and investigate their schema:

julia> df = DataFrame(a=[1, 2], b=[3.0, 4.0], c=[1, "a"])
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> Tables.schema(df)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> ntv = Tables.columntable(df)
(a = [1, 2], b = [3.0, 4.0], c = Any[1, "a"])

julia> Tables.schema(ntv)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> vnt = Tables.rowtable(df)
2-element Vector{NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}}:
 NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}((1, 3.0, 1))
 NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}((2, 4.0, "a"))

julia> Tables.schema(vnt)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

I have created a data frame df with three columns and then converted it into
a named tuple of vectors ntv and a vector of named tuples vnt.
All these three objects are Tables.jl tables.

In all cases we have checked that the Tables.schema properly identifies the schema
of the se tables. They have three columns :a, :b, and :c, and these columns
have types :Int64, Float64, and Any.

So in what case a table might not have a schema? To understand this consider the
following transformation:

julia> vnt2 = [(; collect(pairs(r))...) for r in vnt]
2-element Vector{NamedTuple{(:a, :b, :c)}}:
 (a = 1, b = 3.0, c = 1)
 (a = 2, b = 4.0, c = "a")

julia> Tables.schema(vnt2)

julia> isnothing(Tables.schema(vnt2))
true

Let us understand what happens in this code. With
(; collect(pairs(r))...) operation I make the type of the fields
of the named tuples stored in the vnt vector concrete.
To understand this better note:

julia> typeof(vnt[1])
NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}

julia> typeof((; collect(pairs(vnt[1]))...))
NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Int64}}

Note that the type of the :c field is Any and Int64 respectively.

This transformation means that the vnt2 vector itself does not, in consequence,
have a concrete element type:

julia> eltype(vnt2)
NamedTuple{(:a, :b, :c)}

This type only specifies column names, but not their types. In such a situation
the Tables.schema function returns nothing signaling to the user that the schema
of a table is unknown.

You might think that this is an artificial example but consider the following common case:

julia> vnt3 = [(a=1, b=2), (a=missing, b=3)]
2-element Vector{NamedTuple{(:a, :b)}}:
 (a = 1, b = 2)
 (a = missing, b = 3)

julia> Tables.schema(vnt3)

julia> isnothing(Tables.schema(vnt3))
true

As you can see when we have a vector of named tuples, a mix in of missing values
makes it schema-less.

How does Tables.jl handle schema-less tables? Part 1

If Tables.jl detects a schema-less table it tries to dynamically detect its schema
when functions defined in Tables.jl are called. The three key functions here are:

  • Tables.columns: returns an object that can be queried column-wise;
  • Tables.rows: returns an object that can be queried row-wise;
  • Tables.dictrowtable: returns a row-wise vector that performs “column unioning”.

You might wonder what this “column unioning” part means. We will get to it in a second.

First let us investigate how these functions work on our vnt2 object that is schema-less:

julia> Tables.columns(vnt2)
Tables.CopiedColumns{NamedTuple{(:a, :b, :c), Tuple{Vector{Int64}, Vector{Float64}, Vector{Any}}}} with 2 rows, 3 columns, and schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> Tables.schema(Tables.columns(vnt2))
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> Tables.rows(vnt2)
2-element Vector{NamedTuple{(:a, :b, :c)}}:
 (a = 1, b = 3.0, c = 1)
 (a = 2, b = 4.0, c = "a")

julia> Tables.schema(Tables.rows(vnt2))

julia> isnothing(Tables.schema(Tables.rows(vnt2)))
true

julia> Tables.dictrowtable(vnt2)
Tables.DictRowTable([:a, :b, :c], Dict{Symbol, Type}(:a => Int64, :b => Float64, :c => Union{Int64, String}), Dict{Symbol, Any}[Dict(:a => 1, :b => 3.0, :c => 1), Dict(:a => 2, :b => 4.0, :c => "a")])

julia> Tables.schema(Tables.dictrowtable(vnt2))
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Union{Int64, String}

As you can see Tables.columns performed eltype detection and determined the type of column :c as Any.
The Tables.rows produces a schema-less object as for vnt2 object calling Tables.rows on it just returns the input.
Finally Tables.dictrowtable performs a narrower eltype for column :c which is Union{Int64, String}.
All these results are technically correct, but it is important to note that they start to matter when we create
a data frame from the result:

julia> DataFrame(Tables.columns(vnt2))
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> DataFrame(vnt2)
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> DataFrame(Tables.dictrowtable(vnt2))
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Union…
─────┼────────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

In this case for Tables.columns and Tables.rows we have Any as element type for :c.
While for Tables.dictrowtable we get the Union. This difference might matter to you
if you wanted to later change data stored in :c column.

However, the key difference between Tables.dictrowtable and other methods is visible
in case when we have heterogeneous column list data.

How does Tables.jl handle schema-less tables? Part 2

What is the data that has a heterogeneous column list?
It is relatively common in practice. Let me create two examples:

julia> h1 = [(; a=1), (; a=2, b=3)]
2-element Vector{NamedTuple}:
 (a = 1,)
 (a = 2, b = 3)

julia> h2 = [(; a=1), (; b=3)]
2-element Vector{NamedTuple{names, Tuple{Int64}} where names}:
 (a = 1,)
 (b = 3,)

Both h1 and h2 have a different set of columns. How does the
Tables.jl handle them? Let us check by creating a data frame from them:

julia> DataFrame(Tables.columns(h1))
2×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> DataFrame(Tables.rows(h1))
2×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> DataFrame(Tables.dictrowtable(h1))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼────────────────
   1 │     1  missing
   2 │     2        3

For the h1 input we see that using Tables.columns and Tables.rows drops the :b
column while Tables.dictrowtable keeps it and fills missing entries with missing.
This behavior is called “column unioning”.

Now let us check the h2 case:

julia> DataFrame(Tables.columns(h2))
ERROR: type NamedTuple has no field a

julia> DataFrame(Tables.rows(h2))
ERROR: type NamedTuple has no field a

julia> DataFrame(Tables.dictrowtable(h2))
2×2 DataFrame
 Row │ a        b
     │ Int64?   Int64?
─────┼──────────────────
   1 │       1  missing
   2 │ missing        3

Now we have an even stranger behavior. The first two calls error, while the
Tables.dictrowtable works by “column unioning”. Why do the first two cases error?
The reason is that DataFrame constructor internally always calls Tables.columns
on a passed table. And Tables.columns when doing column detection assumes that
the list of columns for row-wise table is given by its first row. So even
without calling DataFrame we get an error when we do:

julia> Tables.columns(h2)
ERROR: type NamedTuple has no field a

In summary we have two cases of behaviors for row-wise tables:

  • assume the columns are given in the first row of a table;
    this is what Tables.columns does, and, in consequence DataFrame
    constructor;
  • perform “column unioning”; this is what Tables.dictrowtable does.

Conclusions

The take-away from the examples we have given today are as follows:

  • by default DataFrame(row_wise_table) inherits the Tables.columns behavior and assumes
    that the set of columns of the passed row-wise table is given by its first row;
  • if you want to override the default behavior, and want to perform “column unioning” instead
    call DataFrame(Tables.dictrowtable(row_wise_table)); this option is useful if you have
    data that has heterogeneous column lists in different rows.

The examples given today were a bit low-level, but I hope they improved your understanding
how the Tables.jl functions handle different tabular data sources.

How does Tables.jl handle schema-less tables?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/09/01/tables.html

Introduction

Tables.jl is a fundamental package in the JuliaData ecosystem.

One of the key concepts used in Tables.jl is a schema of a table.
Schema is information about the column names and types of a table.
Having access to schema is useful as being able to query these properties
is constantly needed in practice.

In this post I want to discuss in what cases a table might not have
a schema information associated with it and how Tables.jl handles this case.

The post was written using Julia 1.9.2, Tables.jl 1.10.1, and DataFrames.jl 1.6.1.

What is a schema of a table?

Let us create some example tables and investigate their schema:

julia> df = DataFrame(a=[1, 2], b=[3.0, 4.0], c=[1, "a"])
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> Tables.schema(df)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> ntv = Tables.columntable(df)
(a = [1, 2], b = [3.0, 4.0], c = Any[1, "a"])

julia> Tables.schema(ntv)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> vnt = Tables.rowtable(df)
2-element Vector{NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}}:
 NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}((1, 3.0, 1))
 NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}((2, 4.0, "a"))

julia> Tables.schema(vnt)
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

I have created a data frame df with three columns and then converted it into
a named tuple of vectors ntv and a vector of named tuples vnt.
All these three objects are Tables.jl tables.

In all cases we have checked that the Tables.schema properly identifies the schema
of the se tables. They have three columns :a, :b, and :c, and these columns
have types :Int64, Float64, and Any.

So in what case a table might not have a schema? To understand this consider the
following transformation:

julia> vnt2 = [(; collect(pairs(r))...) for r in vnt]
2-element Vector{NamedTuple{(:a, :b, :c)}}:
 (a = 1, b = 3.0, c = 1)
 (a = 2, b = 4.0, c = "a")

julia> Tables.schema(vnt2)

julia> isnothing(Tables.schema(vnt2))
true

Let us understand what happens in this code. With
(; collect(pairs(r))...) operation I make the type of the fields
of the named tuples stored in the vnt vector concrete.
To understand this better note:

julia> typeof(vnt[1])
NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Any}}

julia> typeof((; collect(pairs(vnt[1]))...))
NamedTuple{(:a, :b, :c), Tuple{Int64, Float64, Int64}}

Note that the type of the :c field is Any and Int64 respectively.

This transformation means that the vnt2 vector itself does not, in consequence,
have a concrete element type:

julia> eltype(vnt2)
NamedTuple{(:a, :b, :c)}

This type only specifies column names, but not their types. In such a situation
the Tables.schema function returns nothing signaling to the user that the schema
of a table is unknown.

You might think that this is an artificial example but consider the following common case:

julia> vnt3 = [(a=1, b=2), (a=missing, b=3)]
2-element Vector{NamedTuple{(:a, :b)}}:
 (a = 1, b = 2)
 (a = missing, b = 3)

julia> Tables.schema(vnt3)

julia> isnothing(Tables.schema(vnt3))
true

As you can see when we have a vector of named tuples, a mix in of missing values
makes it schema-less.

How does Tables.jl handle schema-less tables? Part 1

If Tables.jl detects a schema-less table it tries to dynamically detect its schema
when functions defined in Tables.jl are called. The three key functions here are:

  • Tables.columns: returns an object that can be queried column-wise;
  • Tables.rows: returns an object that can be queried row-wise;
  • Tables.dictrowtable: returns a row-wise vector that performs “column unioning”.

You might wonder what this “column unioning” part means. We will get to it in a second.

First let us investigate how these functions work on our vnt2 object that is schema-less:

julia> Tables.columns(vnt2)
Tables.CopiedColumns{NamedTuple{(:a, :b, :c), Tuple{Vector{Int64}, Vector{Float64}, Vector{Any}}}} with 2 rows, 3 columns, and schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> Tables.schema(Tables.columns(vnt2))
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Any

julia> Tables.rows(vnt2)
2-element Vector{NamedTuple{(:a, :b, :c)}}:
 (a = 1, b = 3.0, c = 1)
 (a = 2, b = 4.0, c = "a")

julia> Tables.schema(Tables.rows(vnt2))

julia> isnothing(Tables.schema(Tables.rows(vnt2)))
true

julia> Tables.dictrowtable(vnt2)
Tables.DictRowTable([:a, :b, :c], Dict{Symbol, Type}(:a => Int64, :b => Float64, :c => Union{Int64, String}), Dict{Symbol, Any}[Dict(:a => 1, :b => 3.0, :c => 1), Dict(:a => 2, :b => 4.0, :c => "a")])

julia> Tables.schema(Tables.dictrowtable(vnt2))
Tables.Schema:
 :a  Int64
 :b  Float64
 :c  Union{Int64, String}

As you can see Tables.columns performed eltype detection and determined the type of column :c as Any.
The Tables.rows produces a schema-less object as for vnt2 object calling Tables.rows on it just returns the input.
Finally Tables.dictrowtable performs a narrower eltype for column :c which is Union{Int64, String}.
All these results are technically correct, but it is important to note that they start to matter when we create
a data frame from the result:

julia> DataFrame(Tables.columns(vnt2))
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> DataFrame(vnt2)
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Any
─────┼─────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

julia> DataFrame(Tables.dictrowtable(vnt2))
2×3 DataFrame
 Row │ a      b        c
     │ Int64  Float64  Union…
─────┼────────────────────────
   1 │     1      3.0  1
   2 │     2      4.0  a

In this case for Tables.columns and Tables.rows we have Any as element type for :c.
While for Tables.dictrowtable we get the Union. This difference might matter to you
if you wanted to later change data stored in :c column.

However, the key difference between Tables.dictrowtable and other methods is visible
in case when we have heterogeneous column list data.

How does Tables.jl handle schema-less tables? Part 2

What is the data that has a heterogeneous column list?
It is relatively common in practice. Let me create two examples:

julia> h1 = [(; a=1), (; a=2, b=3)]
2-element Vector{NamedTuple}:
 (a = 1,)
 (a = 2, b = 3)

julia> h2 = [(; a=1), (; b=3)]
2-element Vector{NamedTuple{names, Tuple{Int64}} where names}:
 (a = 1,)
 (b = 3,)

Both h1 and h2 have a different set of columns. How does the
Tables.jl handle them? Let us check by creating a data frame from them:

julia> DataFrame(Tables.columns(h1))
2×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> DataFrame(Tables.rows(h1))
2×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2

julia> DataFrame(Tables.dictrowtable(h1))
2×2 DataFrame
 Row │ a      b
     │ Int64  Int64?
─────┼────────────────
   1 │     1  missing
   2 │     2        3

For the h1 input we see that using Tables.columns and Tables.rows drops the :b
column while Tables.dictrowtable keeps it and fills missing entries with missing.
This behavior is called “column unioning”.

Now let us check the h2 case:

julia> DataFrame(Tables.columns(h2))
ERROR: type NamedTuple has no field a

julia> DataFrame(Tables.rows(h2))
ERROR: type NamedTuple has no field a

julia> DataFrame(Tables.dictrowtable(h2))
2×2 DataFrame
 Row │ a        b
     │ Int64?   Int64?
─────┼──────────────────
   1 │       1  missing
   2 │ missing        3

Now we have an even stranger behavior. The first two calls error, while the
Tables.dictrowtable works by “column unioning”. Why do the first two cases error?
The reason is that DataFrame constructor internally always calls Tables.columns
on a passed table. And Tables.columns when doing column detection assumes that
the list of columns for row-wise table is given by its first row. So even
without calling DataFrame we get an error when we do:

julia> Tables.columns(h2)
ERROR: type NamedTuple has no field a

In summary we have two cases of behaviors for row-wise tables:

  • assume the columns are given in the first row of a table;
    this is what Tables.columns does, and, in consequence DataFrame
    constructor;
  • perform “column unioning”; this is what Tables.dictrowtable does.

Conclusions

The take-away from the examples we have given today are as follows:

  • by default DataFrame(row_wise_table) inherits the Tables.columns behavior and assumes
    that the set of columns of the passed row-wise table is given by its first row;
  • if you want to override the default behavior, and want to perform “column unioning” instead
    call DataFrame(Tables.dictrowtable(row_wise_table)); this option is useful if you have
    data that has heterogeneous column lists in different rows.

The examples given today were a bit low-level, but I hope they improved your understanding
how the Tables.jl functions handle different tabular data sources.

Infiltrator.jl: a no-nonsense debugging

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/08/25/infiltrate.html

Introduction

During JuliaCon 2023 I have had several
discussions about setup of my working environment when
I develop Julia code. One of the common questions was
tools that I use to debug my code.

With Julia an important aspect of the debugging tool
one uses is to ensure that it does not have a significant
impact on the performance of code execution. Keeping this
condition in mind, I find Infiltrator.jl
quite convenient. In this post I want to show an example
how this package can be used.

The post was written using Julia 1.9.2 and Infiltrator.jl 1.6.4.
Note that you should use the examples that I present in the terminal
using the standard Julia REPL.

An example project

Let us write a simple function that merges two sorted vectors into
a new sorted vector:

function mergesorted(x::T, y::T) where {T<:Vector}
    @assert issorted(x) && issorted(y)
    z = T(undef, length(x) + length(y))
    ix, iy, iz = 1, 1, 1
    for iz in eachindex(z)
        if x[ix] < y[iy]
            z[iz] = x[ix]
            ix += 1
        else
            z[iz] = y[iy]
            iy += 1
        end
    end
    return z
end

Now test it on some example data:

julia> mergesorted([1, 3], [2, 4])
ERROR: BoundsError: attempt to access 2-element Vector{Int64} at index [3]

We see that we get a problem when we try to get an element from a
vector with a too large index. Let us try to infiltrate this issue
but providing an appropriate condition when we want to investigate
the state of our function:

using Infiltrator
function mergesorted(x::T, y::T) where {T<:Vector}
    @assert issorted(x) && issorted(y)
    z = T(undef, length(x) + length(y))
    ix, iy, iz = 1, 1, 1
    for iz in eachindex(z)
        @infiltrate ix > length(x) || iy > length(y)
        if x[ix] < y[iy]
            z[iz] = x[ix]
            ix += 1
        else
            z[iz] = y[iy]
            iy += 1
        end
    end
    return z
end

The magic of @infiltrate ix > length(x) || iy > length(y)
is that the execution of the mergesorted function will
be interrupted at the moment when the passed condition is met.
Our condition is that either ix or iy index gets too large.

Let us run our test with an updated definition of the function:

julia> mergesorted([1, 3], [2, 4])
Infiltrating mergesorted(x::Vector{Int64}, y::Vector{Int64})
  at REPL[11]:6

infil> @locals
- x::Vector{Int64} = [1, 3]
- T::DataType = Vector{Int64}
- iz::Int64 = 4
- y::Vector{Int64} = [2, 4]
- z::Vector{Int64} = [1, 2, 3, 2041652468496]
- iy::Int64 = 2
- ix::Int64 = 3

infil> @continue

ERROR: BoundsError: attempt to access 2-element Vector{Int64} at index [3]

We see that we have a problem that we get beyond the end of the
x vector while the last element of the y vector is still not
processed. It looks that we need to check if we are beyond
the end of the x vector, and if it is the case jump right
to the else part of the code:

function mergesorted(x::T, y::T) where {T<:Vector}
    @assert issorted(x) && issorted(y)
    z = T(undef, length(x) + length(y))
    ix, iy, iz = 1, 1, 1
    for iz in eachindex(z)
        @infiltrate ix > length(x) || iy > length(y)
        if ix <= length(x) && x[ix] < y[iy]
            z[iz] = x[ix]
            ix += 1
        else
            z[iz] = y[iy]
            iy += 1
        end
    end
    return z
end

Let us run the code:

julia> mergesorted([1, 3], [2, 4])
Infiltrating mergesorted(x::Vector{Int64}, y::Vector{Int64})
  at REPL[13]:6

infil> @locals
- x::Vector{Int64} = [1, 3]
- T::DataType = Vector{Int64}
- iz::Int64 = 4
- y::Vector{Int64} = [2, 4]
- z::Vector{Int64} = [1, 2, 3, 8589934594]
- iy::Int64 = 2
- ix::Int64 = 3

infil> @continue

4-element Vector{Int64}:
 1
 2
 3
 4

This time things seem to work as expected. Let us thus turn-off
infiltration and run some randomized tests:

function mergesorted(x::T, y::T) where {T<:Vector}
    @assert issorted(x) && issorted(y)
    z = T(undef, length(x) + length(y))
    ix, iy, iz = 1, 1, 1
    for iz in eachindex(z)
        if ix <= length(x) && x[ix] < y[iy]
            z[iz] = x[ix]
            ix += 1
        else
            z[iz] = y[iy]
            iy += 1
        end
    end
    return z
end

And run the tests:

julia> using Random

julia> using Test

julia> Random.seed!(1234);

julia> for i in 1:10
           x = sort!(rand(rand(1:10)))
               y = sort!(rand(rand(1:10)))
           @assert mergesorted(x, y) == sort!([x; y])
       end
ERROR: BoundsError: attempt to access 3-element Vector{Float64} at index [4]

As a side note, the rand(rand(1:10)) is a convenient pattern for generating
random vectors of random length.

Going back to our main topic we see that we still get a problem.
How to diagnose it? This time, as an example,
let me show how to turn infiltration on when an error happens:

function mergesorted(x::T, y::T) where {T<:Vector}
    @assert issorted(x) && issorted(y)
    z = T(undef, length(x) + length(y))
    ix, iy, iz = 1, 1, 1
    for iz in eachindex(z)
        try
            if ix <= length(x) && x[ix] < y[iy]
                z[iz] = x[ix]
                ix += 1
            else
                z[iz] = y[iy]
                iy += 1
            end
        catch e
            @infiltrate
            rethrow(e)
        end
    end
    return z
end

Now run the same test code:

julia> Random.seed!(1234);

julia> for i in 1:10
           x = sort!(rand(rand(1:10)))
               y = sort!(rand(rand(1:10)))
           @assert mergesorted(x, y) == sort!([x; y])
       end
Infiltrating mergesorted(x::Vector{Float64}, y::Vector{Float64})
  at REPL[31]:15

infil> @locals
- x::Vector{Float64} = [0.6932923170086805, 0.7600131804670265]
- T::DataType = Vector{Float64}
- iz::Int64 = 4
- y::Vector{Float64} = [0.11679226454435099, 0.20295936651757684, 0.43552097477865936]
- e::BoundsError = BoundsError([0.11679226454435099, 0.20295936651757684, 0.43552097477865936], (4,))
- z::Vector{Float64} = [0.11679226454435099, 0.20295936651757684, 0.43552097477865936, 5.0e-324, 5.0e-324]
- iy::Int64 = 4
- ix::Int64 = 1

infil> @continue

ERROR: BoundsError: attempt to access 3-element Vector{Float64} at index [4]

Ah – we can now see where the problem is. We have not covered the case when iy is greater than
length of y. It seems we are close. Let us do a final attempt:

function mergesorted(x::T, y::T) where {T<:Vector}
    @assert issorted(x) && issorted(y)
    z = T(undef, length(x) + length(y))
    ix, iy, iz = 1, 1, 1
    for iz in eachindex(z)
        if ix <= length(x) && (iy > length(y) || x[ix] < y[iy])
            z[iz] = x[ix]
            ix += 1
        else
            z[iz] = y[iy]
            iy += 1
        end
    end
    return z
end

We are ready for an even more comprehensive test:

julia> Random.seed!(1234);

julia> for i in 1:10_000
           x = sort!(rand(rand(0:10)))
               y = sort!(rand(rand(0:10)))
           @assert mergesorted(x, y) == sort!([x; y])
       end

This time we got no errors, so we can be relatively confident that our code works well.

Conclusions

I find Infiltrator.jl useful because it is a lightweight solution.
I did not discuss all of its features. However, even these minimal examples
I have shown are quite nice in practice:

  1. You can use @infiltrate with a condition.
  2. You can put @infiltrate in a try-catch-end block
    to be able to infiltrate into the state of your computations
    at the moment an exception is thrown (a thing that I quite often need in practice).

Happy debugging!