Learning to zip stuff in Julia

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2023/10/27/zip.html

Introduction

Today I want to discuss the basics of the design of the zip function in Julia.
This is a relatively introductory topic but, in my experience, incorrect use of this function can lead to hard-to-catch bugs.

The post was written under Julia 1.9.2.

What does zip do?

Let us start with the description of the zip function from its documentation:

Run multiple iterators at the same time, until any of them is exhausted.
The value type of the zip iterator is a tuple of values of its subiterators.

If it is not fully clear what this means, here are two examples:

julia> [v for v in zip(1:3, 11:13)]
3-element Vector{Tuple{Int64, Int64}}:
 (1, 11)
 (2, 12)
 (3, 13)

julia> [v for v in zip("abcdef", Iterators.cycle([1, 2]))]
6-element Vector{Tuple{Char, Int64}}:
 ('a', 1)
 ('b', 2)
 ('c', 1)
 ('d', 2)
 ('e', 1)
 ('f', 2)

The examples show the following nice features of the zip function:

  • It can work with any iterator, not just, e.g., vectors.
    In the second example we used a string and a special cyclic iterator from the Iterators module.
    Importantly, we can work with iterators that are not indexable.
  • The zip function works until the shortest of the iterators passed to it is exhausted.
    This is often useful when some of the iterators are infinite.
    Again, in our second example, Iterators.cycle([1, 2]) is infinite, which allowed us to clearly label the characters at odd and even positions in the string.
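Both features can be combined in a single call. The following is a minimal sketch with made-up inputs (the names labels, squares, and parity are hypothetical, chosen only for illustration): it zips a vector, a generator (iterable, but not indexable), and an infinite cycle, and stops when the shortest of them is exhausted.

```julia
labels = ["x", "y", "z"]                      # a vector with 3 elements
squares = (i^2 for i in 1:5)                  # a generator: iterable, but not indexable
parity = Iterators.cycle((:odd, :even))       # an infinite iterator

# zip accepts any number of iterators and yields one tuple per step
rows = [(l, s, p) for (l, s, p) in zip(labels, squares, parity)]

# rows has 3 elements, because labels is the shortest iterator:
# ("x", 1, :odd), ("y", 4, :even), ("z", 9, :odd)
```

Note that the last two values produced by squares (16 and 25) are silently ignored.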

The last nice feature of zip is that it is lazy. To see what this means, consider:

julia> zip(1:3, 11:13)
zip(1:3, 11:13)

julia> typeof(zip(1:3, 11:13))
Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}}}

We can see that the zip function by itself does not do any work. Instead, it returns an iterator that can be queried.
That is why in the first examples I used comprehensions to show the values produced by the iterator returned by zip.
Why is it useful? Being lazy avoids excessive memory allocation, which is often important, especially if you work with large data.
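As a small sketch of what laziness means in practice (the variable names are only illustrative): the zip object can be created even when one of its inputs is infinite, and values are computed only when we ask for them, for example with first, Iterators.take, or collect.

```julia
z = zip(1:3, 11:13)

# collect materializes the lazy iterator into a vector of tuples
materialized = collect(z)

# Building a zip over a huge range and an infinite counter is cheap:
# nothing is computed until we iterate.
big = zip(1:10^9, Iterators.countfrom(1))

# Only the requested elements are ever produced
head = collect(Iterators.take(big, 2))
```

Here materialized holds the three tuples shown earlier, and head holds only (1, 1) and (2, 2), even though big describes a billion pairs.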

Why is zip risky?

So what are the potential problems with zip in practice? The issue is that we often use it in scenarios
where we want to be sure that all iterators passed to zip have the same length. Unfortunately, zip does not check this.

Here is a simple example of a problematic definition of a function computing a dot product:

julia> dot_bad(x, y) = sum(a * b for (a, b) in zip(x, y))
dot_bad (generic function with 1 method)

julia> dot_bad(1:3, 4:6)
32

julia> dot_bad(1:3, 4:7)
32

julia> dot_bad(1:4, 4:6)
32

Typically with such a function we would want it to error if it gets input iterators of unequal length.
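To see why all three calls return 32, it helps to look at what zip actually produces for inputs of unequal length. The sketch below only uses zip, length, and collect from Base; the variable names are illustrative.

```julia
# The zip iterator is as long as its shortest input
len_a = length(zip(1:3, 4:7))   # 3
len_b = length(zip(1:4, 4:6))   # 3

# The extra element (7 in the first case, 4 in the second) is never visited
pairs_a = collect(zip(1:3, 4:7))
pairs_b = collect(zip(1:4, 4:6))

# Both are [(1, 4), (2, 5), (3, 6)], so the sum is 1*4 + 2*5 + 3*6 = 32
total = sum(a * b for (a, b) in pairs_a)
```

In both cases the surplus element is dropped without any warning, which is exactly what makes the bug hard to notice.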

In practice, when the passed iterators are required to have equal length, they usually support the length function.
Therefore, a fix similar to the following is often enough:

julia> function dot_safer(x, y)
           @assert length(x) == length(y)
           return sum(a * b for (a, b) in zip(x, y))
       end
dot_safer (generic function with 1 method)

julia> dot_safer(1:3, 4:6)
32

julia> dot_safer(1:3, 4:7)
ERROR: AssertionError: length(x) == length(y)

julia> dot_safer(1:4, 4:6)
ERROR: AssertionError: length(x) == length(y)
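One caveat worth keeping in mind: the Julia documentation for @assert warns that assertions might be disabled at some optimization levels, so they are meant as a debugging aid rather than as input validation. A possible variant, shown here only as a sketch (the name dot_checked and the error message are my own choices, not part of any library), performs the check with ordinary code and throws a DimensionMismatch:

```julia
function dot_checked(x, y)
    # An explicit check is ordinary code, so it always runs
    if length(x) != length(y)
        throw(DimensionMismatch(
            "x has length $(length(x)), but y has length $(length(y))"))
    end
    return sum(a * b for (a, b) in zip(x, y))
end

dot_checked(1:3, 4:6)    # returns 32
# dot_checked(1:3, 4:7)  # throws DimensionMismatch
```

Like dot_safer, this variant assumes that both arguments support length, so it would not work with infinite iterators such as Iterators.cycle([1, 2]).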

Conclusions

In summary: whenever you use zip, ask yourself whether you are assuming that the passed
iterators have the same length. If you are, make sure to check it explicitly in your code.