New Ways to Compile Julia

By: Dr. Jeff Bezanson

Re-posted from: https://info.juliahub.com/blog/new-ways-to-compile-julia-blog

At JuliaCon Local in Eindhoven in December 2023, Jeff Bezanson, co-creator of Julia and co-founder of JuliaHub, presented a talk on the state of Julia binaries and the goal to generate smaller ones. A common request by developers in the Julia developer survey is the ability to produce smaller, self-contained binaries. In the Eindhoven talk, Jeff shared a graph showing the progression of binary sizes generated by PackageCompiler.jl. There has been significant progress since then, and the latest update shows a huge drop in size, thanks to the trimming mechanism.

In-Place Modifications

By: Jonathan Carroll

Re-posted from: https://jcarroll.com.au/2024/09/25/in-place-modifications/

In this post I explore some differences between R, Python, Julia, and APL in
terms of mutability, and try to make something that probably shouldn’t exist.

I watched this code_report video which describes
a LeetCode problem:

You are given an integer array nums, an integer k, and an integer multiplier.

You need to perform k operations on nums. In each operation:

  • Find the minimum value x in nums. If there are multiple occurrences of the minimum value, select the one that appears first.
  • Replace the selected minimum value x with x * multiplier.

Return an integer array denoting the final state of nums after performing all k operations.

Conor’s Python solution in the video was

def getFinalState(nums, k, m): 
  for _ in range(k): 
    i = nums.index(min(nums)) 
    nums[i] *= m
  return nums

x = [2, 1, 3, 5, 6]
k = 5
mult = 2

getFinalState(x, k, mult)
## [8, 4, 6, 5, 6]

and, as always, I wanted to see how I’d do that in R. I came up with this

getFinalState = function(nums, k, mult) {
  for (i in 1:k) {
    # index of the (first) minimum value
    m <- which.min(nums)[1]
    nums[m] <- mult * nums[m]
  }
  nums
}

x <- c(2, 1, 3, 5, 6)
k <- 5
mult <- 2

getFinalState(x, k, mult)
## [1] 8 4 6 5 6

It’s worth noting that I can’t use a map in this function because iterations
are dependent; the minimum value at any iteration depends on the previous
values.
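Going back to the Python solution for a moment, there is a detail that matters for the rest of this post: that function changes the caller's list as well as returning it, whereas the R function leaves x untouched. Here is a minimal sketch of that behaviour, alongside a copying variant. The snake_case names and `get_final_state_copy` are made up for this illustration and are not from the video.

```python
def get_final_state(nums, k, m):
    # Same logic as the Python solution above: mutates `nums` in place
    for _ in range(k):
        i = nums.index(min(nums))
        nums[i] *= m
    return nums


def get_final_state_copy(nums, k, m):
    # Hypothetical variant: work on a shallow copy so the caller's list is untouched
    return get_final_state(list(nums), k, m)


x = [2, 1, 3, 5, 6]

result = get_final_state_copy(x, 5, 2)
print(result)  # [8, 4, 6, 5, 6]
print(x)       # [2, 1, 3, 5, 6]  (unchanged)

returned = get_final_state(x, 5, 2)
print(returned is x)  # True: the very same list object comes back
print(x)              # [8, 4, 6, 5, 6]  (the caller's list was modified)
```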

I also had a chance to discuss this solution with some APL’ers at a meetup and
a J solution was presented, but I don’t think I wrote it down.

My solution is nearly word-for-word the same as the python solution with a
couple of notable exceptions arising from the difference between the two
languages:

First, R has which.min() as a built-in, so there’s no need to compute the
minimum and then look up its index (which takes two references to nums).
Second, R has no compound assignment like x *= 2 which modifies in-place – the
closest thing I can think of is the %<>% operator in {magrittr} (not
re-exported in {dplyr} because this behaviour is considered bad practice in R,
despite not really being “in-place”)

library(magrittr)

m <- data.frame(x = 1:6, y = letters[1:6])
m
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 6 6 f
m %<>% head(2)
m
##   x y
## 1 1 a
## 2 2 b

although I can certainly see the case for it – this operator avoids repeating
the variable being used and assigned, because the alternative using the
traditional pipe is

m <- data.frame(x = 1:6, y = letters[1:6])
m
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 6 6 f
m <- m %>% head(2)
m
##   x y
## 1 1 a
## 2 2 b

One could argue that writing out even a longer variable name twice still makes
it clear that shadowing is taking place; the value is being overwritten with
a new value, but it does feel a little frustrating to have to type it out twice

important_variable <- important_variable * 2

Back to my R solution, the indexing at a specific set of values got me thinking
that it would be clean if we could pass a function to [ so that we could
write

nums[which.min] <- value

(maybe not so much for this example where m is used twice, but it piqued my
interest)

Let’s say I want to set all the even values of a vector to some other value.
That’s easy enough to do

x[x %% 2 == 0] <- 0

but I don’t love that it requires two references to x, which may (should?) be
a much longer name

important_variable[important_variable %% 2 == 0] <- 0

I want something like x[f] <- y to set the values of x where f(x) is
TRUE to y. This seemed like it might be possible, maybe with a function
method to [<-, but [<- dispatches on the class of x, not what’s inside
[, so no dice. In theory (which will never happen) the built-in [<- could
have some branch logic for dealing with a function passed as the indices to be
modified, but I’m not about to go rebuilding R from source myself just to play
with that idea.

Nonetheless, if I define some functions that do accomplish this

is_even <- function(z) z %% 2 == 0

set_if <- function(x, f, value) {
  x[f(x)] <- value
  x
}

then I can try this out on a vector

a <- 1:10
a
##  [1]  1  2  3  4  5  6  7  8  9 10
set_if(a, is_even, 0)
##  [1] 1 0 3 0 5 0 7 0 9 0
a # unchanged
##  [1]  1  2  3  4  5  6  7  8  9 10

It works, but I’m back to having to write a <- do_stuff(a) because a isn’t
actually modified by this function.

Ideally, my function would operate the same as this does

a <- 1:10
a[is_even(a)] <- 0
a
##  [1] 1 0 3 0 5 0 7 0 9 0

which looks like it modifies a in-place; R is not entirely pure, and does
occasionally allow what looks like direct mutation, though under the hood it’s
copy-on-modify – in the example below a new object is actually created

# not using a range e.g. 1:n because that's internally 
# a "compact" representation
a <- c(2, 3, 4)
.Internal(inspect(a))
## @63a4d9b05be8 14 REALSXP g0c3 [REF(2)] (len=3, tl=0) 2,3,4
a[2] <- 9
.Internal(inspect(a))
## @63a4d9b0fbf8 14 REALSXP g0c3 [REF(1)] (len=3, tl=0) 2,9,4

Note that the memory address has changed.

If I were working with a language which does support (enable?) modify-in-place,
such as Python, then that might look like

def is_even(x):
    return x % 2 == 0

def set_if(x, f, value):
    for i in range(len(x)):
        if f(x[i]):
            x[i] = value

a = list(range(10))
a
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
set_if(a, is_even, 0)
a
## [0, 1, 0, 3, 0, 5, 0, 7, 0, 9]
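Python can also get close to the x[f] <- y syntax I was after earlier, because `x[key] = value` calls `x.__setitem__(key, value)` on the container, so the container's class gets to look at what was passed as the index. The following is only a sketch under that assumption: `FList` is a hypothetical list subclass invented for this example, not something from a library.

```python
class FList(list):
    """Hypothetical list subclass: a callable index selects the elements
    for which the callable returns a truthy value."""

    def __setitem__(self, key, value):
        if callable(key):
            # Replace every element that satisfies the predicate
            for i, elem in enumerate(self):
                if key(elem):
                    super().__setitem__(i, value)
        else:
            # Integers and slices behave as they do for a plain list
            super().__setitem__(key, value)


def is_even(z):
    return z % 2 == 0


nums = FList(range(1, 11))
nums[is_even] = 0   # only one reference to nums
print(nums)         # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]

nums[0] = 99        # ordinary indexing still works
print(nums)         # [99, 0, 3, 0, 5, 0, 7, 0, 9, 0]
```

The predicate is applied element by element here, unlike the vectorised f(x) in the R version.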

Now, that’s not always a great thing. In such a language with mutable structures
(e.g. lists) we can do maddening things like this

x = [3, 4, 5]
y = x
y is x
## True
y[1] = 9
x # still 'bound' to y
## [3, 9, 5]

Here, is means “are these two things identical in the sense of referring to
the same object in memory”, noting that small integers are typically shared
that way in CPython, but separately created tuples with equal contents
generally aren’t

abc = (11, 99)
xyz = (11, 99)
abc is xyz
## False
abc == xyz
## True
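To make the aliasing example above less maddening, the usual way out is an explicit copy. A small sketch, with variable names chosen just for this illustration: `alias` is a second name for the same list, `clone` is a new list with the same elements, and `id()` confirms that item assignment does not create a new object.

```python
x = [3, 4, 5]
alias = x          # second name bound to the same list object
clone = x.copy()   # new list with the same elements (a shallow copy)

alias[1] = 9
print(x)      # [3, 9, 5]  (changed through the alias)
print(clone)  # [3, 4, 5]  (unaffected)
print(alias is x, clone is x)  # True False

# Item assignment keeps the object's identity, so the list is modified in place
before = id(x)
x[0] = 100
same_object = id(x) == before
print(same_object)  # True
```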

The big question is can I hack together some solution that does work in-place
in R? Yeah, with some ill-advised calls

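set_if <- function(x, f, value) {
  # can't use <<- because the value passed in as the x argument isn't 
  # necessarily named 'x' in the parent scope
  .x <- x
  .x[f(.x)] <- value
  # parent.frame() is the caller's environment; parent.env(environment())
  # is where set_if() was defined, which is only the same thing when
  # calling from that environment
  e <- parent.frame()
  assign(deparse(substitute(x)), .x, envir = e)
  invisible(.x)
}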


a <- 1:10
a
##  [1]  1  2  3  4  5  6  7  8  9 10
set_if(a, is_even, 0)
a
##  [1] 1 0 3 0 5 0 7 0 9 0

As I note in the comment there, I can’t use the super-assignment arrow <<-
inside this function because I don’t know the name of the variable I’m updating;
it needs to be deparsed from the incoming argument.

This means that it works regardless of the name of the variable being modified

b <- 10:20
b
##  [1] 10 11 12 13 14 15 16 17 18 19 20
set_if(b, is_even, 0)
b
##  [1]  0 11  0 13  0 15  0 17  0 19  0

I tried to think of some other languages which might support this sort of in-place
set_if(x, f, value) modification and (Dyalog) APL was worth a thought.

    ⍝ create a vector from 1 to 10
    x←⍳10
    x
1 2 3 4 5 6 7 8 9 10

    ⍝ the function {0=2|⍵} calculates a boolean vector with 
    ⍝ 1 where the value is even
    {0=2|⍵} x
0 1 0 1 0 1 0 1 0 1

    ⍝ the `@` operator takes a value (or function) on the left and 
    ⍝ a function (or boolean values) on the right and applies it to the 
    ⍝ other argument on the right
    0@{0=2|⍵} x 
1 0 3 0 5 0 7 0 9 0

    ⍝ alternatively a point-free function defined as the negation (`~`) of a 
    ⍝ binding (`∘`) of the value 2 to modulo (`|`); the negation is needed
    ⍝ otherwise this returns the result of the modulo, not where it is 0
    0@(~2∘|)⍳10
1 0 3 0 5 0 7 0 9 0

    ⍝ x is, however, unchanged as APL is typically immutable
    x
1 2 3 4 5 6 7 8 9 10

So @ on its own doesn’t modify x in place; the result would need to be
assigned back to x. It is nice, though, that 0@(~2∘|)x only refers to x once.
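For comparison, the same non-mutating behaviour can be sketched in Python with a list comprehension, which builds a new list and leaves the input alone. `set_where` is a hypothetical helper name used only for this illustration.

```python
def set_where(values, predicate, replacement):
    """Hypothetical helper: return a new list with `replacement` wherever
    `predicate(value)` is true; `values` itself is not modified."""
    return [replacement if predicate(v) else v for v in values]


x = list(range(1, 11))
y = set_where(x, lambda v: v % 2 == 0, 0)

print(y)  # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]
print(x)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  (unchanged)
```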

Julia makes a nice distinction between functions which mutate arguments and
those which don’t; (by convention) the former are named ending with an
exclamation mark, e.g.

vec = collect(1:5)
## 5-element Vector{Int64}:
##  1
##  2
##  3
##  4
##  5
# non-mutating
reverse(vec)
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
vec
## 5-element Vector{Int64}:
##  1
##  2
##  3
##  4
##  5
# mutating
reverse!(vec)
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
vec
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
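Python has no naming convention like the exclamation mark, but the built-in list type draws a similar line in a different way: the mutating method returns None, while the non-mutating function hands back something new. A short sketch using only built-ins:

```python
vec = [1, 2, 3, 4, 5]

# non-mutating: reversed() gives an iterator over a reversed view
rev = list(reversed(vec))
print(rev)  # [5, 4, 3, 2, 1]
print(vec)  # [1, 2, 3, 4, 5]

# mutating: list.reverse() changes vec itself and returns None
out = vec.reverse()
print(out)  # None
print(vec)  # [5, 4, 3, 2, 1]
```

The same split exists between sorted(vec) and vec.sort().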

In Julia, the iseven() function is already built-in, but vectorisation is
explicit via a broadcast operator . and the setting of even values to 0
looks like

x = collect(1:10);
x[iseven.(x)] .= 0;
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

which looks very much like the R version with some dots where scalar functions
are vectorised. If I don’t use the last . to perform vectorised assignment,
the error tells me that the failure involved the setindex! function which does
sound like what I want, but this doesn’t work

setindex!(x, 0, iseven.(x))

because it’s trying to assign the value 0 multiple times and I only provided one
of them. Instead,

x = collect(1:10);
setindex!(x, zeros(Int8, 5), iseven.(x));
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

does work, but I had to manually count how many 0 entries this requires, so the
[ approach seems cleaner. Either way, I’ve had to explicitly calculate
iseven.(x) and pass that result somewhere.
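Python lists have a loosely similar restriction, sketched below. Plain lists don't support boolean-mask indexing, so this is an analogy and not an equivalent: it uses an extended slice, which only works here because the even values of 1 to 10 happen to sit at every second position. Assigning to an extended slice needs exactly as many values as there are slots, so a single 0 is rejected.

```python
x = list(range(1, 11))

# A single value for five slots is rejected
raised = False
try:
    x[1::2] = [0]
except ValueError:
    raised = True
print(raised)  # True

# Supplying one 0 per slot works; len() does the counting for us
x[1::2] = [0] * len(x[1::2])
print(x)  # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]
```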

Since Julia allows users to extend methods, I could do that modification myself!

import Base.setindex! 
  
function setindex!(A::Vector{Int64}, v::Int64, f::Function) 
  A[f.(A)] .= v
end
## setindex! (generic function with 240 methods)
x = collect(1:10);
setindex!(x, 0, iseven);
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

which I could just as easily call set_if!

set_if! = setindex!;
x = collect(1:10);
set_if!(x, 0, iseven);
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

Nice! I do wonder if I can “hack” (ahem, extend) Julia’s [ to get my prized
x[f] = 0 solution but I doubt it’s worth it when the above does the right
thing.

I don’t imagine I’ll package up my set_if() anywhere, and I should probably
even avoid using it myself, but it’s been an interesting journey thinking about
this stuff. Maybe there’s a better way to do it? Maybe there’s a language which
better supports something like that? If you know, or you have comments or
suggestions, I can be found on
Mastodon or use the comment section below.

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.3 (2024-02-29)
##  os       Pop!_OS 22.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2024-09-25
##  pandoc   3.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.19    2024-02-01 [1] CRAN (R 4.3.3)
##  bookdown      0.36    2023-10-16 [1] CRAN (R 4.3.2)
##  bslib         0.8.0   2024-07-29 [1] CRAN (R 4.3.3)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
##  callr         3.7.3   2022-11-02 [3] CRAN (R 4.2.2)
##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.3)
##  crayon        1.5.2   2022-09-29 [3] CRAN (R 4.2.1)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.2)
##  digest        0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
##  ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
##  evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.3.3)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
##  fs            1.6.4   2024-04-25 [1] CRAN (R 4.3.3)
##  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.3)
##  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
##  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.2)
##  httpuv        1.6.12  2023-10-23 [1] CRAN (R 4.3.2)
##  icecream      0.2.1   2023-09-27 [1] CRAN (R 4.3.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.3)
##  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.3)
##  JuliaCall     0.17.5  2022-09-08 [1] CRAN (R 4.3.3)
##  knitr         1.48    2024-07-07 [1] CRAN (R 4.3.3)
##  later         1.3.1   2023-05-02 [1] CRAN (R 4.3.2)
##  lattice       0.22-5  2023-10-24 [4] CRAN (R 4.3.1)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
##  magrittr    * 2.0.3   2022-03-30 [1] CRAN (R 4.3.3)
##  Matrix        1.6-5   2024-01-11 [4] CRAN (R 4.3.3)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.3)
##  mime          0.12    2021-09-28 [1] CRAN (R 4.3.3)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.2)
##  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.3.2)
##  pkgload       1.3.3   2023-09-22 [1] CRAN (R 4.3.2)
##  png           0.1-8   2022-11-29 [1] CRAN (R 4.3.2)
##  prettyunits   1.2.0   2023-09-24 [3] CRAN (R 4.3.1)
##  processx      3.8.3   2023-12-10 [3] CRAN (R 4.3.2)
##  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.2)
##  promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.2)
##  ps            1.7.6   2024-01-18 [3] CRAN (R 4.3.2)
##  purrr         1.0.2   2023-08-10 [3] CRAN (R 4.3.1)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.3)
##  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.2)
##  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.2)
##  reticulate    1.34.0  2023-10-12 [1] CRAN (R 4.3.2)
##  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.3.3)
##  rmarkdown     2.28    2024-08-17 [1] CRAN (R 4.3.3)
##  rstudioapi    0.15.0  2023-07-07 [3] CRAN (R 4.3.1)
##  sass          0.4.9   2024-03-15 [1] CRAN (R 4.3.3)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
##  shiny         1.7.5.1 2023-10-14 [1] CRAN (R 4.3.2)
##  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.3.3)
##  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.3.3)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.2)
##  usethis       3.0.0   2024-07-29 [1] CRAN (R 4.3.3)
##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.3)
##  xfun          0.47    2024-08-17 [1] CRAN (R 4.3.3)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.2)
##  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
## 
##  [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.3
##  [2] /usr/local/lib/R/site-library
##  [3] /usr/lib/R/site-library
##  [4] /usr/lib/R/library
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /home/jono/.virtualenvs/r-reticulate/bin/python
##  libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
##  pythonhome:     /home/jono/.virtualenvs/r-reticulate:/home/jono/.virtualenvs/r-reticulate
##  version:        3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
##  numpy:           [NOT FOUND]
##  
##  NOTE: Python version was forced by VIRTUAL_ENV
## 
## ──────────────────────────────────────────────────────────────────────────────