Author Archives: Blog by Bogumił Kamiński

Boost your time to market with Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/11/24/pe104.html

Introduction

Today my post was inspired by an interesting write-up by Christopher Rackauckas:
ChatGPT performs better on Julia than Python (and R) for Large Language Model (LLM) Code Generation. Why?.
The general takeaway of this text is that writing Julia code with LLM support is efficient.

A thing that I think is relevant here is that when you write code you typically want to get your job done as fast as possible.
Getting the job done typically involves three steps (possibly repeated):

  • Thinking about the algorithm.
  • Implementing it.
  • Running the code.

Each of these steps takes time, and you usually want to get a solution in the minimum possible total time.

My experience is that Julia is really fast to code with once you learn it (and, as Chris explained, using LLMs
shortens this time even further).

Since Julia is compiled it also offers fast code execution times.

So we are left with algorithm design time, and this is the point that led me to write this post.
The reason is that, in my experience, I often decide to use non-optimal algorithms, because
with Julia I know my code will be reasonably fast anyway. This is especially the case if I can
see a simple solution that uses basic functionalities built into Julia and does not require too much thought.

To test-drive this assertion, today I chose a Project Euler problem that I had not solved before and decided
to write down my thought process while solving it. My choice was problem 104, because it was the first relatively easy
problem (so I assumed I would not need to spend too much time on it) that I had not solved yet.

The code was written under Julia 1.9.2.

The problem

The Project Euler problem 104 is stated as follows (abbreviated):

Find the index of the smallest Fibonacci number for which
its first nine digits contain all the digits 1 to 9 (not necessarily in order)
and the same condition holds for its last nine digits.

Thinking about the algorithm

The observations I made are the following:

  • I assumed that the problem is not trivial – I need to do some optimizations to solve it in a reasonable time.
  • The last nine digits are easier; I can just track values modulo 10^9 and easily fit the data into Int type.
  • The number formed by the last nine digits must be divisible by 9, as the digits 1 to 9 sum to 45, which is divisible by 9.
  • The first nine digits are harder. I could get them using Binet's formula, but this would require an analysis of the round-off error of floating-point operations.
    This would take thinking time, so maybe it is enough to just use BigInt values and check them only if the Int-based test passes.
  • The good thing is that Julia ships with the digits function, so I can easily get a vector of digits of a given number.
  • I need to combine Int and BigInt calculations to cut down the processing time. Most of the time it should be enough to analyze just Int values.

As you can see, I decided not to invest too much into the thinking time,
hoping that the implementation of the above algorithm will be easy and its run time will be acceptable.
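As a warm-up, the pandigital test itself can be sketched with Base functions only (pandigital19 is a hypothetical helper name I use for illustration, not part of the final solution):

```julia
# Hypothetical helper: do the given digits contain each of 1..9 exactly once?
# digits(n) returns digits least-significant first, but the order does not
# matter here because we sort before comparing.
pandigital19(ds::Vector{Int}) = sort!(ds) == 1:9

pandigital19(digits(918273645))  # true: all of 1..9 present
pandigital19(digits(123456780))  # false: contains 0 instead of 9
```

Comparing a sorted vector against the range 1:9 works because == on a vector and a range compares them elementwise.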

Implementing it

The following code contains the implementation of the algorithm following the observations I have described above:

function pe104()
    big1, big2 = big(1), big(1)  # exact Fibonacci values (BigInt)
    small1, small2 = 1, 1        # Fibonacci values modulo 10^9 (Int)
    k = 2
    while true
        k += 1
        big1, big2 = big1 + big2, big1
        small1, small2 = (small1 + small2) % 10^9, small1

        # tests ordered from cheapest to most expensive
        small1 % 9 == 0 &&
        sort!(digits(small1)) == 1:9 &&
        sort!(last(digits(big1), 9)) == 1:9 &&
        return k
    end
end

Note that we perform computations both on Int values (small1 and small2) and on BigInt values (big1 and big2).
In the code we run three tests to decide if a given k is good. The tests are ordered by increasing computational cost.
Note that since the Int values are computed modulo 10^9, I do not need to trim the vector of digits returned by digits(small1).
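To see why last works for the leading digits, recall that digits returns the digits least-significant first, so the last nine entries of the vector correspond to the first nine digits of the number. A small illustration with an arbitrary 13-digit value:

```julia
x = big(9876543210123)
# digits are stored least-significant first, so `last` picks the leading digits:
last(digits(x), 9)  # the first nine digits 9, 8, ..., 1, stored in reverse
```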

Running the code

Let us check how much time it takes to find the solution:

julia> @time pe104();
 16.634432 seconds (696.82 k allocations: 4.441 GiB, 3.01% gc time)

Note that I put ; at the end of the line to suppress printing of the solution (in the hope of encouraging you to run the code yourself).
The point is that the run time is well under a minute. The code is clearly not optimal and could be significantly faster.
However, I am sure that speeding it up would take me more time than I would save in run time. Thus, I am happy with what I got.

Conclusions

In general Julia allows you to write efficient code. There are many applications where this is crucial and where it is worth spending a lot of time
on optimizing run time. This is especially true if you write your code once and run it millions of times.

However, in many cases, like the one presented today, Julia is fast enough that even code that is not fully optimized is good enough.
Such simpler code often takes less time to design, less time to implement, and less time to verify.

Is vector pooling considered harmful?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/11/17/pooling.html

Introduction

PooledArrays.jl is a package providing a custom array type that can be used to compress your data if it has few unique values.
In this post I want to explain the design of the PooledArray object and discuss how it affects performance.

The post was written under Julia 1.9.2, PooledArrays.jl 1.4.3, and DataFrames.jl 1.6.1.

How pooling works

The definition of the PooledArray type is the following:

mutable struct PooledArray{T, R<:Integer, N, RA} <: AbstractArray{T, N}
    refs::RA
    pool::Vector{T}
    invpool::Dict{T,R}
    # refcount[] is 1 if only one PooledArray holds a reference to pool and invpool
    refcount::Threads.Atomic{Int}
end

It represents an N-dimensional array with element type T.
Internally, each unique value in a PooledArray gets an integer
representation (reference) of type R. The refs field is an AbstractArray{R, N}
that keeps track of which reference value is stored for each entry of the array.

To be able to work with these reference numbers efficiently, PooledArray stores two fields:

  • pool, which maps reference numbers to the actual values of type T stored in the array;
  • invpool, which provides the inverse mapping from a value of type T to its reference number.
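The idea behind these two fields can be illustrated without the package itself (a toy reconstruction of the scheme, not the real PooledArray API):

```julia
data = ["a", "b", "a", "b", "a", "b"]
pool = unique(data)                                       # reference -> value
invpool = Dict(v => UInt32(i) for (i, v) in pairs(pool))  # value -> reference
refs = [invpool[v] for v in data]                         # compressed representation
pool[refs] == data  # decoding the refs recovers the original data: true
```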

Since many operations on pooled arrays do not change pool and invpool, PooledArray has an extra optimization
that automatically ensures that two arrays that can be proven to have the same pool and invpool share them.
The refcount field keeps track of how many such arrays exist. It is used when we modify the
pool and invpool of some array and want to make sure that we do not modify another PooledArray by accident.

Let us check these properties in practice.

Test driving pooled arrays

Let us create a simple PooledArray first:

julia> using PooledArrays

julia> pa1 = PooledArray(["a", "b", "a", "b", "a", "b"])
6-element PooledVector{String, UInt32, Vector{UInt32}}:
 "a"
 "b"
 "a"
 "b"
 "a"
 "b"

julia> pa1.refs
6-element Vector{UInt32}:
 0x00000001
 0x00000002
 0x00000001
 0x00000002
 0x00000001
 0x00000002

julia> pa1.pool
2-element Vector{String}:
 "a"
 "b"

julia> pa1.invpool
Dict{String, UInt32} with 2 entries:
  "b" => 0x00000002
  "a" => 0x00000001

julia> pa1.refcount
Base.Threads.Atomic{Int64}(1)

We can see that, by default, each entry is recorded as a 4-byte UInt32.
Additionally, pa1.refcount tells us that only this pooled array
uses the pool and invpool objects it references.

Let us first check what happens when we operate on this array:

julia> pa2 = pa1[[1, 2, 3]]
3-element PooledVector{String, UInt32, Vector{UInt32}}:
 "a"
 "b"
 "a"

julia> pa2.refcount
Base.Threads.Atomic{Int64}(2)

julia> pa1.refcount
Base.Threads.Atomic{Int64}(2)

julia> pa2.pool === pa1.pool
true

julia> pa2.invpool === pa1.invpool
true

As you can see, since pa2 is created by subsetting pa1, we know they can share their pool and invpool.
The refcount field tells us that two objects reuse them.

Let us now modify the pool of pa2:

julia> pa2[1] = "c"
"c"

julia> pa2.refcount
Base.Threads.Atomic{Int64}(1)

julia> pa1.refcount
Base.Threads.Atomic{Int64}(1)

julia> pa2.pool
3-element Vector{String}:
 "a"
 "b"
 "c"

julia> pa1.pool
2-element Vector{String}:
 "a"
 "b"

As you can see, the pools got automatically decoupled and refcount was adjusted accordingly.

In summary, the benefit of pool-sharing is that we can subset PooledArrays very quickly, without having to re-create pool and invpool.
This makes working with PooledArray fast as long as we do not change the set of values stored in it.

The second important design aspect of PooledArray is the R type. As I have said, by default it is UInt32. However,
for small pools this is inefficient. Therefore you can write:

julia> pa3 = PooledArray(pa1, compress=true)
6-element PooledVector{String, UInt8, Vector{UInt8}}:
 "a"
 "b"
 "a"
 "b"
 "a"
 "b"

julia> Base.summarysize(pa1)
570

julia> Base.summarysize(pa3)
504

As you can see, you can use the compress=true keyword argument to automatically pick the minimal size of the R type that is able to hold the pool at hand.
In our case it is UInt8, which would save a lot of memory for large arrays.
Why do we use UInt32 by default then?
The reason is that this type is typically efficient enough memory-wise, while ensuring a pool that is large enough in most scenarios.
For example, the limitation of a UInt8 pool is that it can store at most 255 values:

julia> pa4 = PooledArray(1:255, compress=true);

julia> pa4[1]=256
ERROR: You're using a PooledArray with ref type UInt8, which can only hold 255 values,
and you just tried to add the 256th reference.  Please change the ref type
to a larger int type, or use the default ref type (UInt32).

So you have a tradeoff here. If you are sure you will not change your pool then compress=true is a safe option.
If you know you might need to change the pool you need to pick the R type more carefully.
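The memory consequence of the choice of R is easy to estimate: the refs array alone stores one value of type R per element (a back-of-the-envelope sketch; it ignores the pool and invpool overhead):

```julia
# bytes used by the refs array alone for n elements with reference type R
refs_bytes(n, R) = n * sizeof(R)

refs_bytes(10^6, UInt32)  # 4_000_000 bytes
refs_bytes(10^6, UInt8)   # 1_000_000 bytes
```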

What are the benefits of pooled arrays?

There are two types of benefits of PooledArray. The first is memory footprint, the second is performance.
Let me explain them by example.

First create two large vectors storing strings:

julia> v1 = ["x$(isodd(i))" for i in 1:10^6];

julia> v2 = PooledArray(v1, compress=true);

julia> Base.summarysize(v1)
21500040

julia> Base.summarysize(v2)
1000507

As you can see there is a significant compression gain in our example by using a PooledArray.

The second benefit is performance, especially in combination with DataFrames.jl:

julia> using DataFrames

julia> df = DataFrame(; v1, v2)
1000000×2 DataFrame
     Row │ v1      v2
         │ String  String
─────────┼────────────────
       1 │ xtrue   xtrue
       2 │ xfalse  xfalse
       3 │ xtrue   xtrue
    ⋮    │   ⋮       ⋮
  999998 │ xfalse  xfalse
  999999 │ xtrue   xtrue
 1000000 │ xfalse  xfalse
       999994 rows omitted

Now let us perform two example operations (I am measuring the second timing to avoid counting compilation time).

The first is aggregation:

julia> combine(groupby(df, :v1), nrow);

julia> @time combine(groupby(df, :v1), nrow);
  0.025122 seconds (201 allocations: 31.271 MiB)

julia> combine(groupby(df, :v2), nrow);

julia> @time combine(groupby(df, :v2), nrow);
  0.002766 seconds (227 allocations: 7.643 MiB)

As you can see doing aggregation when grouping by PooledArray is much faster.

The second example is innerjoin:

julia> df_ref = df[1:2, :]
2×2 DataFrame
 Row │ v1      v2
     │ String  String
─────┼────────────────
   1 │ xtrue   xtrue
   2 │ xfalse  xfalse

julia> df_ref.val = 1:2
1:2

julia> innerjoin(df, df_ref, on=:v1, makeunique=true);

julia> @time innerjoin(df, df_ref, on=:v1, makeunique=true);
  0.057885 seconds (248 allocations: 36.741 MiB, 23.00% gc time)

julia> innerjoin(df, df_ref, on=:v2, makeunique=true);

julia> @time innerjoin(df, df_ref, on=:v2, makeunique=true);
  0.024692 seconds (265 allocations: 43.416 MiB)

And again we see that joining on :v2 is faster than on :v1.

When might using pooled arrays not be a good idea?

There are three cases when you might not see benefits from using PooledArray.

The first is when your data has many unique values. Then you have to pay the price
of storing the refs, pool, and invpool objects, and all of them will be large.

The second is when the values you store have a small memory footprint, e.g. Bool,
and you did not use compress=true. In such a case refs will take more memory than the original data would.
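You can verify the Bool case with Base alone: a Vector{Bool} stores one byte per element, so default UInt32 references would roughly quadruple the per-element cost (a sketch; exact summarysize values include a small object header):

```julia
v = fill(true, 10^6)
Base.summarysize(v)  # about 10^6 bytes: one byte per Bool
# default UInt32 refs would need about 4 * 10^6 bytes for the refs alone,
# i.e. roughly 4x the memory of the raw Bool data
```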

The third case is when you create multiple copies of a PooledArray and modify its pool. In such a case
the cost of copying the pool and invpool fields might be non-negligible. Let me show you a practical
example of the third situation:

julia> df2 = DataFrame(id=1:10^6, v=PooledArray(repeat(["x$i" for i in 1:1000], 1000)))
1000000×2 DataFrame
     Row │ id       v
         │ Int64    String
─────────┼─────────────────
       1 │       1  x1
       2 │       2  x2
       3 │       3  x3
    ⋮    │    ⋮       ⋮
  999998 │  999998  x998
  999999 │  999999  x999
 1000000 │ 1000000  x1000
        999994 rows omitted

julia> df3 = DataFrame(id=1:10^6, v=repeat(["x$i" for i in 1:1000], 1000));

Note that df2 and df3 are identical except that in df2 the :v column is pooled and in df3 it is not.

Now let us test outerjoin on this data:

julia> outerjoin(df2, df2, on=:id, makeunique=true);

julia> @time outerjoin(df2, df2, on=:id, makeunique=true);
  0.065559 seconds (326 allocations: 30.951 MiB)

julia> outerjoin(df3, df3, on=:id, makeunique=true);

julia> @time outerjoin(df3, df3, on=:id, makeunique=true);
  0.036927 seconds (274 allocations: 38.400 MiB)

Note that working with non-pooled data is faster here. For innerjoin this is not the case:

julia> innerjoin(df2, df2, on=:id, makeunique=true);

julia> @time innerjoin(df2, df2, on=:id, makeunique=true);
  0.018188 seconds (210 allocations: 30.528 MiB)

julia> innerjoin(df3, df3, on=:id, makeunique=true);

julia> @time innerjoin(df3, df3, on=:id, makeunique=true);
  0.029364 seconds (206 allocations: 38.157 MiB)

What is going on here? Let us look at the output:

julia> outerjoin(df2, df2, on=:id, makeunique=true)
1000000×3 DataFrame
     Row │ id       v        v_1
         │ Int64    String?  String?
─────────┼───────────────────────────
       1 │       1  x1       x1
       2 │       2  x2       x2
       3 │       3  x3       x3
    ⋮    │    ⋮        ⋮        ⋮
  999998 │  999998  x998     x998
  999999 │  999999  x999     x999
 1000000 │ 1000000  x1000    x1000
                  999994 rows omitted

julia> innerjoin(df2, df2, on=:id, makeunique=true)
1000000×3 DataFrame
     Row │ id       v       v_1
         │ Int64    String  String
─────────┼─────────────────────────
       1 │       1  x1      x1
       2 │       2  x2      x2
       3 │       3  x3      x3
    ⋮    │    ⋮       ⋮       ⋮
  999998 │  999998  x998    x998
  999999 │  999999  x999    x999
 1000000 │ 1000000  x1000   x1000
                999994 rows omitted

Note that outerjoin changes the element type of the v and v_1 columns (to allow for missing values), so pool and invpool need to be re-created twice,
which takes time and memory.
In innerjoin the element type is not changed, so pool and invpool in the output are reused from the source data frame.

Conclusions

In summary:

  • PooledArray is useful if you have data that has many duplicates.
  • The benefits of using PooledArray are lower memory footprint and the fact that some operations on it can be faster (e.g. groupby in DataFrames.jl).
  • The biggest benefit of PooledArray is when you do not change its pool of values. In such a case the pool and invpool objects are created only once
    and are reused in arrays derived from the source array.
  • Remember to carefully choose the type of the reference used in PooledArray. By default it is UInt32, but you can pick a smaller type to get even better compression
    at the expense of a smaller number of unique values that your pooled array can store.

I hope you find the tips shared in this post useful in your data analysis processes.

Basic data structures drills in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/11/10/pe215.html

Introduction

Recently I have written several posts with tips & tricks on using Base Julia functionalities.
Today I thought of presenting a solution to a bigger problem that features some of the basic data structures available in Julia.

For this I have chosen to solve the Project Euler problem 215.

The post was written under Julia 1.9.2, Graphs.jl 1.9.0, and Chain.jl 0.5.0.

The problem

The Project Euler problem 215 is stated as follows:

Consider the problem of building a wall out of 2×1 and 3×1 bricks (horizontal x vertical dimensions)
such that, for extra strength, the gaps between horizontally-adjacent bricks never line up in consecutive layers,
i.e. never form a “running crack”.
There are eight ways of forming a crack-free 9×3 wall, written W(9, 3) = 8. Calculate W(32, 10).

(I encourage you to visit the source web page for a visualization.)

The solution approach

The basic approach to the solution of the problem can be done in the following steps:

  1. Compute the list of ways a single layer of the wall can be built.
  2. Compute which layers can be adjacent in the wall.
  3. Compute the number of possibilities of building the full wall.

Let us go through these steps on the 9×3 case for which we know the solution.

Compute the list of ways a single layer can be built

For the first step we define the following function:

function list_valid_layers(n::Integer)
    # part holds the crack positions built so far; len is the current layer length
    function build(part, len)
        if n - 1 <= len + 2 <= n  # a final brick of width 2 or 3 finishes the layer
            push!(valid, BitSet(part))
        end
        for i in 2:3  # try to add a brick of width 2 or 3
            if len + i < n
                push!(part, len + i)
                build(part, len + i)
                pop!(part)
            end
        end
    end

    valid = BitSet[]
    build(Int[], 0)
    return valid
end

It takes the layer length n as an input and returns a vector of possible compositions of a layer.
Note that in order to identify the composition of a layer it is enough to keep track of where
the “cracks” in the wall are. For performance we keep this information in BitSet.

The list of compositions is created recursively in the build inner function. We first check if the
layer would be finished by adding a brick of width 2 or 3. If yes, we store a BitSet pattern of cracks
in the valid vector. If not, we try to add another brick of width 2 or 3 to the layer and call the
build function recursively. Note the use of the push! and pop! functions in this part of the code. It allows
us to reuse a single part vector to keep track of the state of the computation.
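The push!/pop! backtracking pattern used above is worth knowing on its own. Here is a minimal standalone sketch (subsets is a hypothetical example, unrelated to the wall problem) that enumerates all subsets of 1:n while reusing a single buffer:

```julia
function subsets(n::Integer)
    out = Vector{Vector{Int}}()
    buf = Int[]  # single reusable buffer, mutated via push!/pop!
    function rec(start)
        push!(out, copy(buf))  # record the current partial selection
        for i in start:n
            push!(buf, i)  # choose i
            rec(i + 1)
            pop!(buf)      # un-choose i (backtrack)
        end
    end
    rec(1)
    return out
end

length(subsets(3))  # 8: all subsets of {1, 2, 3}
```

Only the recorded results are copied; the working buffer is allocated once, which is exactly the trick list_valid_layers uses with part.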

Let us check the list of valid layers of width 9:

julia> v9 = list_valid_layers(9)
5-element Vector{BitSet}:
 BitSet([2, 4, 6])
 BitSet([2, 4, 7])
 BitSet([2, 5, 7])
 BitSet([3, 5, 7])
 BitSet([3, 6])

We see that we have 5 such ways. In one of them there are three bricks of width 3.
In each of the remaining four there is a single brick of width 3 and three bricks of width 2 (which can be arranged in 4 ways).

Compute which layers can be adjacent in the wall

Two layers can be adjacent in our wall if the intersection of their cracks is empty. Let us keep track of this information
in a graph:

julia> using Graphs

julia> function build_graph(valid::Vector)
           g = Graph(length(valid))
           for i in 1:length(valid), j in i+1:length(valid)
               isempty(valid[i] ∩ valid[j]) && add_edge!(g, i, j)
           end
           return g
       end
build_graph (generic function with 1 method)

julia> g9 = build_graph(v9)
{5, 3} undirected simple Int64 graph

julia> collect(edges(g9))
3-element Vector{Graphs.SimpleGraphs.SimpleEdge{Int64}}:
 Edge 1 => 4
 Edge 2 => 5
 Edge 3 => 5

We see that there are only three allowed pairs of adjacent layers:

  • [2, 4, 6] and [3, 5, 7];
  • [2, 4, 7] and [3, 6];
  • [2, 5, 7] and [3, 6].

Note that when building the graph we used an undirected simple graph. The reason is that the adjacency relation is symmetric
(if layer i can follow layer j, then the opposite is also true).

Finally, note that I iterate i and j assuming 1-based indexing of valid. This assumption is met, as I
require the argument to be a Vector.
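The adjacency test itself does not need Graphs.jl; the disjointness of crack sets can be checked directly (a sketch using layers from the width-9 list above):

```julia
# two layers can be adjacent iff their crack sets are disjoint
compatible(a::BitSet, b::BitSet) = isempty(intersect(a, b))

compatible(BitSet([2, 4, 6]), BitSet([3, 5, 7]))  # true: no shared crack
compatible(BitSet([2, 5, 7]), BitSet([3, 5, 7]))  # false: both crack at 5 and 7
```

The graph is still worth building, though, because it lets us reuse the neighbor lists in the counting step instead of re-testing pairs.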

Compute the number of possibilities of building the full wall

What is left is computing the final number of allowed layer compositions.
Here is a function that performs this task:

function count_layers(graph::SimpleGraph, depth::Integer)
    layer_count = fill(1, nv(graph), depth)
    for k in 2:depth, i in 1:nv(graph)
        layer_count[i, k] = sum(layer_count[j, k-1] for j in neighbors(graph, i))
    end
    return layer_count
end

Note that we keep the result in the layer_count matrix. Its layer_count[i, k] entry
tells us in how many ways we can construct a wall of k layers that ends with layer i
(where i is the position of the layer in the vector returned by list_valid_layers).

To get the desired result we need to sum all the possibilities over the final layer:

julia> count_layers(g9, 3)
5×3 Matrix{Int64}:
 1  1  1
 1  1  2
 1  1  2
 1  1  1
 1  2  2

julia> sum(count_layers(g9, 3)[:, end])
8

We can see that the result is 8, which is the same as in the problem statement.
So, hopefully we are ready to solve the problem of computing W(32, 10).
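If you want to double-check the recurrence without Graphs.jl, the same dynamic program can be run on plain adjacency lists (a sketch with the width-9 adjacency from above hard-coded):

```julia
# adjacency lists for the five width-9 layers (edges 1-4, 2-5, 3-5)
adj = [[4], [5], [5], [1], [2, 3]]

function count_walls(adj::Vector{Vector{Int}}, depth::Integer)
    counts = fill(1, length(adj))  # one-layer walls: one way per layer
    for _ in 2:depth
        # a wall ending in layer i extends any wall ending in a compatible layer j
        counts = [sum(counts[j] for j in adj[i]) for i in eachindex(adj)]
    end
    return sum(counts)
end

count_walls(adj, 3)  # 8, matching W(9, 3)
```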

Computation of W(32, 10)

To compute W(32, 10) we need to invoke the functions we have defined in this post in sequence.
This kind of task is a natural application of the @chain macro from Chain.jl. So let us test it:

using Chain

@chain 32 begin
    list_valid_layers
    build_graph
    count_layers(10)
    sum(_[:, end])
end

As usual on this blog, I am not presenting the answer, to encourage you to try solving the puzzle yourself.
However, let me remark that the obtained number is relatively large, so you should carefully check whether you hit an Int64 overflow issue.
If you are not sure what the problem could be, check out my recent post, where I explain it.

Conclusions

As a conclusion let us check how long it takes to solve the problem on my laptop:

julia> @elapsed @chain 32 begin
           list_valid_layers
           build_graph
           count_layers(10)
           sum(_[:, end])
       end
0.4039108

The timing of around 0.4 seconds is not bad, bearing in mind that we used a brute-force way to solve the problem.
However, to get this timing, keep in mind the following data structures that we used
(not all of them were performance bottlenecks in this problem, but I think they are worth recapping):

  • push! and pop! updates of a vector in a recursion (limiting the number of allocations we needed to make);
  • use of BitSet to keep track of the “cracks” in a layer (taking advantage of the fact that our data was small);
  • use of SimpleGraph to efficiently navigate the allowed adjacencies between layers;
  • use of a matrix to incrementally count the number of allowed options.

In all of these steps we took advantage of the fact that for loops in Julia are fast. Note that I took care
to put the code inside functions, not only to have reusable components, but also to make sure that
Julia executes the code efficiently.

Enjoy!