By: Jonathan Carroll
Re-posted from: https://jcarroll.com.au/2023/06/10/reflecting-on-macros/
I’ve been following the drama of the RustConf Keynote Fiasco (RKNF, per @fasterthanlime)
from a great distance – I’m not involved in that community beyond starting to learn
the language. But the controversial topic itself, compile-time reflection, seemed like something interesting I could learn about.
A good start is usually a Wikipedia page, and I found one called “Reflective programming” under the “Metaprogramming”
category, which offers this definition:
“reflection is the ability of a process to examine, introspect, and modify its own structure and behavior”
That sounds somewhat familiar from what I’ve read about metaprogramming. One of the
great features of R is the ability to inspect and rewrite functions. For example,
the body of the sd() function (which calculates the standard deviation of its input) looks like
sd
## function (x, na.rm = FALSE)
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
## na.rm = na.rm))
## <bytecode: 0x55a797b52960>
## <environment: namespace:stats>
Trying to extract a “component” of that function results in the ever-classic error
sd[1]
## Error in sd[1]: object of type 'closure' is not subsettable
However, using body() we can get to the components of the function
body(sd)
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
## na.rm = na.rm))
body(sd)[1]
## sqrt()
and we can even mess with it (meaninglessly, in this case)
vals <- c(1, 3, 5, 7)
sd(vals)
## [1] 2.581989
my_sd <- sd
body(my_sd)[1] <- call("log")
my_sd # note that the function now (wrongly) uses log() instead of sqrt()
## function (x, na.rm = FALSE)
## log(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
## na.rm = na.rm))
## <environment: namespace:stats>
my_sd(vals)
## [1] 1.89712
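To see why indexing the body works at all, it helps to note that a function body is a call object that behaves much like a list. The following is a minimal sketch using a made-up function `f` (not from the examples above) and base R only.

```r
# A toy function (hypothetical, for illustration only)
f <- function(x) sqrt(x + 1)

b <- body(f)   # the body is a language object
class(b)       # "call"
as.list(b)     # the function being called, followed by its arguments

b[[1]]         # the symbol `sqrt`
b[[2]]         # the call `x + 1`

# Swap the outer function by replacing the first element with another symbol
g <- f
body(g)[[1]] <- as.name("log")
g(0)           # evaluates log(0 + 1)
```

Single brackets, as in body(sd)[1], keep the result as a call, while double brackets pull out the individual symbol or sub-call.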
The Wikipedia page lists the following example of reflection in R
# Without reflection, assuming foo() returns an S3-type object that has method "hello"
obj <- foo()
hello(obj)
# With reflection
class_name <- "foo"
generic_having_foo_method <- "hello"
obj <- do.call(class_name, list())
do.call(generic_having_foo_method, alist(obj))
Using a more concrete data object and class, e.g. tibble::tibble and summary, might be clearer
library(tibble) # do.call doesn't like pkg::fun as a string
# Without reflection
obj <- tibble(a = 1:2, b = 3:4)
summary(obj)
##        a              b
##  Min.   :1.00   Min.   :3.00
##  1st Qu.:1.25   1st Qu.:3.25
##  Median :1.50   Median :3.50
##  Mean   :1.50   Mean   :3.50
##  3rd Qu.:1.75   3rd Qu.:3.75
##  Max.   :2.00   Max.   :4.00
# With reflection
class_name <- "tibble"
generic_having_foo_method <- "summary"
obj <- do.call(class_name, list(a = 1:2, b = 3:4))
obj
## # A tibble: 2 × 2
##       a     b
##   <int> <int>
## 1     1     3
## 2     2     4
do.call(generic_having_foo_method, alist(obj))
##        a              b
##  Min.   :1.00   Min.   :3.00
##  1st Qu.:1.25   1st Qu.:3.25
##  Median :1.50   Median :3.50
##  Mean   :1.50   Mean   :3.50
##  3rd Qu.:1.75   3rd Qu.:3.75
##  Max.   :2.00   Max.   :4.00
So, maybe it’s more to do with being able to use a string containing the “name” of
a function and go and find that function, or just the ability to generate functions
on-demand based on non-function objects (?). Please, let me know if there’s a more
enlightening explanation.
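The "find a function from a string" part can be tried with base R alone. This is only a sketch of that lookup idea, not a definition of reflection; it uses stats::sd rather than tibble so that no extra packages are assumed.

```r
# Look up a function from a string holding its name
fun_name <- "summary"
f <- match.fun(fun_name)
identical(f, summary)   # the same function object

# do.call() also accepts a function object, which sidesteps the
# "pkg::fun as a string" limitation: fetch the export first, then call it
sd_fun <- getExportedValue("stats", "sd")
do.call(sd_fun, list(c(1, 3, 5, 7)))

# Ask which S3 method a generic would use for a given class
m <- getS3method("summary", "data.frame")
identical(m, summary.data.frame)
```

Passing the function object instead of its name means the lookup and the call are two separate steps, which is arguably the "examine" half of the Wikipedia definition.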
I still don’t think I understand that at all (more time required) but I did note in
some additional research that “reflection” and “macros” are two very similar concepts. Now
macros are something I’ve heard of at least, so I was off to do some more research.
Unfortunately, web searches for the terms “reflection” and “macro” turn up a lot of
macro-lens photography results.
I’ve heard of macros in Julia, where they’re used to “rewrite” an expression. This is a nice rundown
of the process, as are the official docs. Macros are
used in many places. One up-and-coming place is the new Tidier.jl, which implements the tidyverse (at least the most common dplyr and purrr parts) using macros (denoted with a @ prefix)
using Tidier
using RDatasets
movies = dataset("ggplot2", "movies");
@chain movies begin
    @mutate(Budget = Budget / 1_000_000)
    @filter(Budget >= mean(skipmissing(Budget)))
    @select(Title, Budget)
    @slice(1:5)
end
Rust uses macros for printing (amongst other things); println!() is a macro,
apparently at least in part because it needs to be able to take an arbitrary
number of args, so one can write
>> println!("a = {}, b = {}, c = {}", 1, 2, 3)
a = 1, b = 2, c = 3
Rust has a shorthand macro for creating a new vector, vec!()
>> let v = vec![2, 3, 4];
and also has the “debug macro” dbg!(), which is super handy – it prints out the expression you wrap, plus the value, so
you can inspect the current state with e.g.
>> dbg!(&v);
[src/lib.rs:109] &v = [
    2,
    3,
    4,
]
This last one would be great to have in R… as a side note, we could construct a
simple version with {rlang}
dbg <- function(x) {
  ex <- rlang::f_text(rlang::enquos(x)[[1]])
  ret <- rlang::eval_bare(x)
  message(glue::glue("DEBUG: {ex} = {ret}"))
  ret
}
a <- 1
b <- 3
x <- dbg(a + b)
## DEBUG: a + b = 4
y <- dbg(2*x + 3)
## DEBUG: 2 * x + 3 = 11
z <- 10 + dbg(y*2)
## DEBUG: y * 2 = 22
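The same idea can be sketched without {rlang} or {glue}. The helper name `dbg_base` is hypothetical, and this version assumes the value can be shown sensibly with format(); it relies only on base R's substitute() and deparse().

```r
# Base-R sketch of a dbg()-style helper (hypothetical name `dbg_base`)
dbg_base <- function(x) {
  # substitute() recovers the expression the caller typed, unevaluated
  ex <- paste(deparse(substitute(x)), collapse = " ")
  # Using `x` forces the promise, i.e. evaluates the expression once
  message("DEBUG: ", ex, " = ", paste(format(x), collapse = " "))
  x
}

a <- 1
b <- 3
x <- dbg_base(a + b)        # reports the expression `a + b` and its value
z <- 10 + dbg_base(x * 2)   # the value is passed through, so it can sit mid-expression
```

Returning the value unchanged is what lets the helper wrap part of a larger expression, as the Rust macro does.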
In all of these examples of macros, the code that is run is different to the code you write
because the macro makes some changes before executing.
In R there isn’t a “proper” way to do this, but we do have ways to manipulate code
and we do have ways to retrieve “unparsed” input, e.g. substitute(). A quick look
for “macros in R” turned up a function in a package that is more than 20 years old (I was
only starting University when this came out and knew approximately 0 programming) and
comes with a journal article; gtools::defmacro() by Thomas Lumley
has a construction for writing something that behaves like a macro.
That article is from 2001 when R 1.3.1 was being released. The example code made me do a double-take
library(gtools)
####
# macro for replacing a specified missing value indicator with NA
# within a dataframe
###
setNA <- defmacro(df, var, values,
  expr = {
    df$var[df$var %in% values] <- NA
  }
)
# create example data using 999 as a missing value indicator
d <- data.frame(
  Grp = c("Trt", "Ctl", "Ctl", "Trt", "Ctl", "Ctl", "Trt", "Ctl", "Trt", "Ctl"),
  V1 = c(1, 2, 3, 4, 5, 6, 999, 8, 9, 10),
  V2 = c(1, 1, 1, 1, 1, 2, 999, 2, 999, 999),
  stringsAsFactors = TRUE
)
d
##    Grp  V1  V2
## 1  Trt   1   1
## 2  Ctl   2   1
## 3  Ctl   3   1
## 4  Trt   4   1
## 5  Ctl   5   1
## 6  Ctl   6   2
## 7  Trt 999 999
## 8  Ctl   8   2
## 9  Trt   9 999
## 10 Ctl  10 999
# Try it out
setNA(d, V1, 999)
setNA(d, V2, 999)
d
##    Grp V1 V2
## 1  Trt  1  1
## 2  Ctl  2  1
## 3  Ctl  3  1
## 4  Trt  4  1
## 5  Ctl  5  1
## 6  Ctl  6  2
## 7  Trt NA NA
## 8  Ctl  8  2
## 9  Trt  9 NA
## 10 Ctl 10 NA
Wait – I thought… there’s no assignment in those last lines, but the data is
being modified!?! Sure enough, the internals of defmacro make it clear that this
is the case, but it seemed like magic. Essentially, this identifies what needs to
happen, what it needs to happen to (via substitute()), and makes it happen in the parent.frame(). Neat! So, what else can we do with this?
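That recipe (capture the expression, build the code, evaluate it in the caller) can be tried by hand. The sketch below is a hypothetical helper named `increment`, written with base R's substitute(), bquote(), and eval(); it is not the gtools implementation, just the same idea in miniature.

```r
# A hand-rolled "macro": hypothetical helper, not the gtools implementation
increment <- function(x, by = 1) {
  target <- substitute(x)   # what to act on, as written by the caller
  # Build the call `<target> <- <target> + <by>` ...
  assignment <- bquote(.(target) <- .(target) + .(by))
  # ... and evaluate it where the caller lives, not inside this function
  eval(assignment, parent.frame())
  invisible(NULL)
}

counter <- 1
increment(counter)
increment(counter, by = 10)
counter   # modified without any visible assignment at the call site

scores <- c(a = 1, b = 2)
increment(scores["b"], by = 5)   # the captured expression need not be a bare name
scores
```

Because the assignment is evaluated in parent.frame(), the change lands in whatever environment called the helper, which is what makes the setNA() calls above appear to modify d in place.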
I thought about it for a while and realised what could be a [te|ho]rrific one…
Just a couple of weeks ago, Danielle Navarro made a wish:
“not for the first time I find myself wishing that push() and pop() were S3 generics in #rstats”
Now, if you’re not familiar with those, pop(x) removes the first element of a structure x
(e.g. a vector) and returns that first value, leaving the original object x
containing only the remaining elements, whereas push(x, y) inserts the value y
as the first element of x, moving the remaining elements down the line. These show up more in object-oriented languages, but they
don’t exist in R.
If we define a vector a containing some values
a <- c(3, 1, 4, 1, 5, 9)
and we wish to extract the first value, we can certainly do so with
a[1]
## [1] 3
but, due to the nature of R, the vector a is unchanged
a
## [1] 3 1 4 1 5 9
Instead, we could remove the first value of a with
a[-1]
## [1] 1 4 1 5 9
but again, a remains unchanged – in order to modify a we must redefine it as e.g.
a <- a[-1]
a
## [1] 1 4 1 5 9
If we wanted to build a pop() function, we could use substitute() to figure out
what the passed input object was, perform the extraction of the first element, and so on…
But as we’ve just seen, there’s a better way to define that – a macro!
r_pop <- gtools::defmacro(x, expr = {
  ret <- x[1]
  x <- x[-1]
  ret
})
Now, if we use that on a vector
a <- c(3, 1, 4, 1, 5, 9)
r_pop(a)
## [1] 3
a
## [1] 1 4 1 5 9
It works!!!
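For comparison, the substitute() route mentioned earlier might look roughly like this. The names `pop_subst` and `push_subst` are hypothetical, and the sketch assumes the argument is a bare variable name holding a vector.

```r
# Hypothetical helpers; plain functions using substitute(), no gtools needed
pop_subst <- function(x) {
  target <- substitute(x)
  stopifnot(is.symbol(target))   # only plain variable names are handled
  # Overwrite the caller's variable with everything except the first element
  assign(as.character(target), x[-1], envir = parent.frame())
  x[1]                           # `x` still holds the original value here
}

push_subst <- function(x, value) {
  target <- substitute(x)
  stopifnot(is.symbol(target))
  # Overwrite the caller's variable with `value` prepended
  assign(as.character(target), c(value, x), envir = parent.frame())
  invisible(NULL)
}

v <- c(3, 1, 4, 1, 5, 9)
first <- pop_subst(v)   # returns the first element and shortens v
push_subst(v, 3)        # puts a value back at the front of v
v
```

The macro version is shorter because defmacro handles the capture and the write-back itself, whereas here both steps are spelled out.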
Danielle wanted a Generic, though, so we can easily make pop() a Generic and add methods for
some classes (which can be further extended).
To that end, I present a brand new package; {weasel}
This defines pop() and push() as Generics with methods defined for vectors, lists, and data.frames
a <- list(x = c(2, 3), y = c("foo", "bar"), z = c(3.1, 4.2, 6.9))
a
## $x
## [1] 2 3
##
## $y
## [1] "foo" "bar"
##
## $z
## [1] 3.1 4.2 6.9
x <- pop(a)
a
## $y
## [1] "foo" "bar"
##
## $z
## [1] 3.1 4.2 6.9
x
## [1] 2 3
a <- data.frame(x = c(2, 3, 4), y = c("foo", "bar", "baz"), z = c(3.1, 4.2, 6.9))
a
##   x   y   z
## 1 2 foo 3.1
## 2 3 bar 4.2
## 3 4 baz 6.9
x <- pop(a)
a
##   x   y   z
## 2 3 bar 4.2
## 3 4 baz 6.9
x
##   x   y   z
## 1 2 foo 3.1
a <- c(1, 4, 1, 5, 9)
a
## [1] 1 4 1 5 9
push(a, 3)
a
## [1] 3 1 4 1 5 9
a <- data.frame(y = c("foo", "bar", "baz"), z = c(3.1, 4.2, 6.9))
a
##     y   z
## 1 foo 3.1
## 2 bar 4.2
## 3 baz 6.9
push(a, data.frame(y = 99, z = 77))
a
##     y    z
## 1  99 77.0
## 2 foo  3.1
## 3 bar  4.2
## 4 baz  6.9
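The dispatch idea can also be sketched in base R. The generic `pop2()` below is hypothetical and is not how {weasel} is implemented; it assumes the argument is a bare variable name, and it relies on the fact that inside an S3 method parent.frame() refers to the caller of the generic.

```r
# Hypothetical generic `pop2()`; a sketch, not how {weasel} is implemented
pop2 <- function(x) UseMethod("pop2")

# Vectors and lists: return the first element, keep the rest
pop2.default <- function(x) {
  target <- substitute(x)
  stopifnot(is.symbol(target))   # only plain variable names are handled
  assign(as.character(target), x[-1], envir = parent.frame())
  x[[1]]
}

# Data frames: return the first row, keep the remaining rows
pop2.data.frame <- function(x) {
  target <- substitute(x)
  stopifnot(is.symbol(target))
  assign(as.character(target), x[-1, , drop = FALSE], envir = parent.frame())
  x[1, , drop = FALSE]
}

l <- list(x = c(2, 3), y = c("foo", "bar"))
first <- pop2(l)   # the first list element; l keeps only y

dat <- data.frame(x = c(2, 3, 4), y = c("foo", "bar", "baz"))
row1 <- pop2(dat)  # the first row; dat keeps rows 2 and 3
```

Adding support for another class is then a matter of writing one more method.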
I wrote this (simple) package as a bit of an exercise – I really don’t think you
should actually use it for anything. The “looks like it modifies in-place but actually
doesn’t” is really non-idiomatic for R. Nonetheless, I was really interested to see
that defmacro can be used as a function definition that the dispatch machinery will respect. The only catch I’ve found so far is that I can’t use ellipses (...) in the function signature.
I noticed that Dirk Schumacher built a similar defmacro package more recently, but that appears
to be more aimed at building macros to be expanded on package load (funnily enough, “compile-time macros” – we’ve come full circle). This seems like a great opportunity for “inlining”
some functions. I’ll definitely be digging deeper into that one.
Let me know if you have a better explanation of any of the concepts I’ve (badly) described here;
I’m absolutely just learning and following Julia Evans’ advice about blogging.
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.1.2 (2021-11-01)
## os Pop!_OS 22.04 LTS
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_AU.UTF-8
## ctype en_AU.UTF-8
## tz Australia/Adelaide
## date 2023-06-10
## pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## blogdown 1.17 2023-05-16 [1] CRAN (R 4.1.2)
## bookdown 0.29 2022-09-12 [1] CRAN (R 4.1.2)
## bslib 0.4.1 2022-11-02 [3] CRAN (R 4.2.2)
## cachem 1.0.6 2021-08-19 [3] CRAN (R 4.2.0)
## callr 3.7.3 2022-11-02 [3] CRAN (R 4.2.2)
## cli 3.4.1 2022-09-23 [3] CRAN (R 4.2.1)
## crayon 1.5.2 2022-09-29 [3] CRAN (R 4.2.1)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.1.2)
## digest 0.6.30 2022-10-18 [3] CRAN (R 4.2.1)
## ellipsis 0.3.2 2021-04-29 [3] CRAN (R 4.1.1)
## evaluate 0.18 2022-11-07 [3] CRAN (R 4.2.2)
## fansi 1.0.3 2022-03-24 [3] CRAN (R 4.2.0)
## fastmap 1.1.0 2021-01-25 [3] CRAN (R 4.2.0)
## fs 1.5.2 2021-12-08 [3] CRAN (R 4.1.2)
## glue 1.6.2 2022-02-24 [3] CRAN (R 4.2.0)
## gtools * 3.9.4 2022-11-27 [1] CRAN (R 4.1.2)
## htmltools 0.5.3 2022-07-18 [3] CRAN (R 4.2.1)
## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.2)
## httpuv 1.6.6 2022-09-08 [1] CRAN (R 4.1.2)
## jquerylib 0.1.4 2021-04-26 [3] CRAN (R 4.1.2)
## jsonlite 1.8.3 2022-10-21 [3] CRAN (R 4.2.1)
## knitr 1.40 2022-08-24 [3] CRAN (R 4.2.1)
## later 1.3.0 2021-08-18 [1] CRAN (R 4.1.2)
## lifecycle 1.0.3 2022-10-07 [3] CRAN (R 4.2.1)
## magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.2.0)
## memoise 2.0.1 2021-11-26 [3] CRAN (R 4.2.0)
## mime 0.12 2021-09-28 [3] CRAN (R 4.2.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2)
## pillar 1.8.1 2022-08-19 [3] CRAN (R 4.2.1)
## pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.1.2)
## pkgconfig 2.0.3 2019-09-22 [3] CRAN (R 4.0.1)
## pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.1.2)
## prettyunits 1.1.1 2020-01-24 [3] CRAN (R 4.0.1)
## processx 3.8.0 2022-10-26 [3] CRAN (R 4.2.1)
## profvis 0.3.7 2020-11-02 [1] CRAN (R 4.1.2)
## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2)
## ps 1.7.2 2022-10-26 [3] CRAN (R 4.2.2)
## purrr 1.0.1 2023-01-10 [1] CRAN (R 4.1.2)
## R6 2.5.1 2021-08-19 [3] CRAN (R 4.2.0)
## Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.1.2)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.2)
## rlang 1.0.6 2022-09-24 [1] CRAN (R 4.1.2)
## rmarkdown 2.18 2022-11-09 [3] CRAN (R 4.2.2)
## rstudioapi 0.14 2022-08-22 [3] CRAN (R 4.2.1)
## sass 0.4.2 2022-07-16 [3] CRAN (R 4.2.1)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
## shiny 1.7.2 2022-07-19 [1] CRAN (R 4.1.2)
## stringi 1.7.8 2022-07-11 [3] CRAN (R 4.2.1)
## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.1.2)
## tibble * 3.1.8 2022-07-22 [3] CRAN (R 4.2.2)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.1.2)
## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.1.2)
## utf8 1.2.2 2021-07-24 [3] CRAN (R 4.2.0)
## vctrs 0.5.2 2023-01-23 [1] CRAN (R 4.1.2)
## weasel * 0.1.0 2023-06-09 [1] local
## xfun 0.34 2022-10-18 [3] CRAN (R 4.2.1)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.2)
## yaml 2.3.6 2022-10-18 [3] CRAN (R 4.2.1)
##
## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────