How to safely use the vec and reshape functions in Julia?

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2022/08/12/vec.html

Introduction

Julia users often want to squeeze maximum performance out of their programs.
In the search for efficiency, they soon discover the vec and reshape
functions, which change the shape of an input array without
copying its data. In this post I want to discuss how these functions work
and share the rules I use when deciding whether to use them.

The post was written under Julia 1.7.2.

The contract

When you first learn a function, you should look up its contract in its
docstring. Let us check vec and reshape (I abbreviated the docstrings to
focus on the key parts):

help?> vec
  vec(a::AbstractArray) -> AbstractVector

  Reshape the array a as a one-dimensional column vector.
  Return a if it is already an AbstractVector.
  The resulting array shares the same underlying data as a,
  so it will only be mutable if a is mutable,
  in which case modifying one will also modify the other.

help?> reshape
search: reshape promote_shape

  reshape(A, dims...) -> AbstractArray

  Return an array with the same data as A,
  but with different dimension sizes or number of dimensions.
  The two arrays share the same underlying data, so that the result is mutable
  if and only if A is mutable,
  and setting elements of one alters the values of the other.

In short, both functions allow you to change the shape of an array without
copying its data. vec always returns a vector, while reshape is more
flexible and allows you to produce an array of any dimension.
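
The "shares the same underlying data" part of both docstrings can be checked
directly. The snippet below is a minimal sketch with made-up variable names
(A, v, B are not from the docstrings); it writes through the reshaped
objects and reads the change back from the source:

```julia
A = [1 2; 3 4]

v = vec(A)            # 4-element Vector backed by the same memory as A
v[1] = 10             # writes into A[1, 1]

B = reshape(A, 4, 1)  # 4×1 Matrix, again backed by the same memory
B[2, 1] = 30          # writes into A[2, 1] (second element in column order)

A                     # now [10 2; 30 4]
```

No element is copied at any point, which is why these functions are cheap,
and also why a write through one name is visible through all the others.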

Let me show you some use cases of these functions. First, assume I want to
produce a Cartesian product of two collections:

julia> collect(Iterators.product('a':'b', 1:3))
2×3 Matrix{Tuple{Char, Int64}}:
 ('a', 1)  ('a', 2)  ('a', 3)
 ('b', 1)  ('b', 2)  ('b', 3)

By default, the collect function produces a matrix. If for some reason
I needed a vector instead, I could write:

julia> vec(collect(Iterators.product('a':'b', 1:3)))
6-element Vector{Tuple{Char, Int64}}:
 ('a', 1)
 ('b', 1)
 ('a', 2)
 ('b', 2)
 ('a', 3)
 ('b', 3)

The important benefit of this operation is that vec is non-copying, so adding
this step is cheap. Let me give you another example, this time using
broadcasting:

julia> string.(['a' 'b'], 1:3)
3×2 Matrix{String}:
 "a1"  "b1"
 "a2"  "b2"
 "a3"  "b3"

julia> vec(string.(['a' 'b'], 1:3))
6-element Vector{String}:
 "a1"
 "a2"
 "a3"
 "b1"
 "b2"
 "b3"

Now let us have a look at reshape:

julia> reshape(1:6, 2, 3)
2×3 reshape(::UnitRange{Int64}, 2, 3) with eltype Int64:
 1  3  5
 2  4  6

Why would reshape be useful? Consider, for example, a simple function that
turns pairs of consecutive elements of a vector into tuples. One way
(certainly not the only one) to implement this would be:

julia> totuples(x::AbstractVector) = Tuple.(eachcol(reshape(x, 2, :)))
totuples (generic function with 1 method)

julia> totuples(1:6)
3-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)
 (5, 6)
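
The call above passes a range, which works because reshape wraps the range
lazily instead of materializing it. The docstring's "mutable if and only if A
is mutable" clause applies here, as the following sketch illustrates (the
variable names are assumptions made for this example):

```julia
r = reshape(1:6, 2, :)            # lazy wrapper around the range, nothing is copied
m = reshape(collect(1:6), 2, :)   # reshaping a Vector gives an ordinary Matrix

m[1, 1] = 100                     # fine: the source Vector is mutable

# Writing into the reshaped range fails, because a range cannot be modified.
range_write_failed = try
    r[1, 1] = 100
    false
catch
    true
end
```

The exact exception type differs between Julia versions, so the sketch only
records that the write failed.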

The dangers

While the vec and reshape functions can be useful, using them carries some
risks. Let me discuss some common pitfalls.

The first is that reshaping a collection may leave a permanent mark on the
source recording that it was used in reshape (even though reshape has no !
suffix). This can lead to hard-to-catch bugs. Consider the following
code:

julia> x = [1, 2, 3, 4]
4-element Vector{Int64}:
 1
 2
 3
 4

julia> totuples(x)
2-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)

julia> push!(x, 5)
ERROR: cannot resize array with shared data

As you can see, the call to reshape happened inside the totuples
function, and the reshaped matrix created there with reshape(x, 2, :)
is already out of scope. Nevertheless, the fact that we used reshape on x
permanently disallows resizing it.
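
One way to avoid leaving this mark is to not reshape the argument at all. The
function below is a hypothetical alternative to totuples (the name
totuples_noreshape and the even-length check are my additions, not part of
the original code); it builds the tuples by indexing, so the source vector is
never shared with another array:

```julia
# Hypothetical helper: same result as totuples for even-length vectors,
# but it only reads from x and never calls reshape on it.
function totuples_noreshape(x::AbstractVector)
    iseven(length(x)) || throw(ArgumentError("length of x must be even"))
    return [(x[i], x[i + 1]) for i in firstindex(x):2:lastindex(x) - 1]
end

x = [1, 2, 3, 4]
t = totuples_noreshape(x)   # [(1, 2), (3, 4)]
push!(x, 5)                 # x was never reshaped, so it can still grow
```

The price is that the function no longer reuses reshape and eachcol, so it is
a little more code to read, but the caller's vector is left exactly as it was.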

The second risk is that vec and reshape may or may not create a new
object, as they might simply return the source object. Consider the following
code, which extends the original totuples function to accept any
AbstractArray. In the code I write y = vec(x), but the same behavior
would be present with y = reshape(x, :).

julia> function totuples2(x::AbstractArray)
           y = vec(x)
           isodd(length(x)) && push!(y, last(y))
           return totuples(y)
       end
totuples2 (generic function with 1 method)

julia> x = [1;;]
1×1 Matrix{Int64}:
 1

julia> totuples2(x)
1-element Vector{Tuple{Int64, Int64}}:
 (1, 1)

julia> x
1×1 Matrix{Int64}:
 1

julia> x = [1]
1-element Vector{Int64}:
 1

julia> totuples2(x)
1-element Vector{Tuple{Int64, Int64}}:
 (1, 1)

julia> x
2-element Vector{Int64}:
 1
 1

As you can see, our totuples2 function left [1;;] unchanged, since it is
a matrix, but [1] was updated. The reason is that in the second case vec(x)
just returned its argument, so push! acted on x itself.
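
Whether a new object was created can be tested with ===, which compares
object identity. A minimal sketch, with variable names chosen only for this
illustration:

```julia
v = [1, 2, 3]
m = [1 2; 3 4]

same_vec     = vec(v) === v          # true: a Vector is returned unchanged
same_reshape = reshape(v, :) === v   # true for the same reason
new_object   = vec(m) !== m          # true: a Matrix needs a new Vector wrapper
```

A check like this is handy in tests of functions that must not mutate their
arguments.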

Finally, as another application of the same rule, note that the vec
function (and similarly reshape when reshaping to a vector) may or may not
produce a vector that can be resized:

julia> totuples2([1 2; 3 4])
2-element Vector{Tuple{Int64, Int64}}:
 (1, 3)
 (2, 4)

julia> totuples2(reshape(1:4, 2, 2))
2-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)

julia> totuples2([1;;])
1-element Vector{Tuple{Int64, Int64}}:
 (1, 1)

julia> totuples2(reshape(1:1, 1, 1))
ERROR: MethodError: no method matching resize!(::UnitRange{Int64}, ::Int64)

As you can see, this is tricky: a function like totuples2 might throw an
error in some cases but work in others. In the totuples2 case the reason is
that we call push! only if the length of the collection is odd, and in the
failing call vec returned the underlying range, which cannot be resized.

The point of all these examples is that code using vec or reshape can lead
to hard-to-diagnose errors. The reason is that you might notice the problems
they cause much later in the code than the place where you called them.
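
For comparison, here is a sketch of a variant of totuples2 that gives up the
non-copying behavior in exchange for predictability. The name totuples_safe
is hypothetical, and using collect here is my assumption about a reasonable
fix rather than something the examples above prescribe. collect always
returns a fresh Vector, so the argument is never mutated and push! always
has a resizable target:

```julia
totuples(x::AbstractVector) = Tuple.(eachcol(reshape(x, 2, :)))

# Hypothetical safer variant of totuples2: work on a private copy.
function totuples_safe(x::AbstractArray)
    y = collect(vec(x))                     # new Vector, independent of x
    isodd(length(y)) && push!(y, last(y))   # pad odd-length input
    return totuples(y)                      # reshape only touches the local y
end

x = [1]
totuples_safe(x)                   # [(1, 1)], and x is still [1]
totuples_safe(reshape(1:1, 1, 1))  # [(1, 1)] instead of a MethodError
```

Note that copy would not be enough here, since copying a range gives back a
range. The extra allocation is the cost of this approach, so it only makes
sense when the input is small or safety matters more than speed.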

Conclusions

The vec and reshape functions are nice utilities and I use them quite often.
However, to use them safely I always follow two rules:

  • when writing a function, never use reshape on an array that is an argument
    of the function; the reason is that in some cases reshape will silently
    leave the “no resize” mark on the source vector; if I use reshape, I make
    sure that its source is always a short-lived object from a local scope;
  • do not resize the vector produced by vec/reshape, as such resizing may
    or may not affect the source, and keeping a mental record of whether this
    is the case is hard (and most likely readers of such code will not be able
    to tell easily); as a softer rule, I generally avoid any mutation of the
    output of vec/reshape, as it is not guaranteed to be mutable.

In short: the best uses for vec and reshape are situations where their source
is a short-lived object and you do not want to mutate their output in any way.