Julia introspects

When people are introspective, they’re thinking about how their minds work,
about how and why they think what they do.
The Julia language has some impressive facilities for letting you see how the compiler’s mind works.
Using convenient built-in functions that are available both at the REPL[1] and in normal code,
Julia allows you to see the layers of internal representation that your code goes through,
from the parsed AST to native assembly code.

This allows you to answer some otherwise difficult questions very easily.
(It also lets you peer into the inner workings of the compiler, which is just plain fun.)

A Simple Question: I wonder if it matters which of these I use?

One of the questions I have pondered is whether two syntaxes for assigning variables
make a performance difference.
Each of these two approaches assigns the same two values to the same two variables.

The most straightforward approach:

function linear_foo()
  x = 4
  y = 5
end

Sometimes this looks nicer:

function tuple_foo()
  x,y = 4,5
end

The question that we’re specifically wondering about is whether it matters which syntax we use,
from a speed standpoint.
(The concern is that the second syntax might implicitly create a tuple and waste time messing around with it.)
In most languages, if you really cared, you’d make a microbenchmark to get an approximate answer.
Instead, I’m going to take a look at the optimized version of the AST.
(It’s much easier and more exact than benchmarking. 🙂)

In Julia, you can just look at the optimized version of the AST for any generic function[2].
All it takes is a single call to code_typed.

The value of the last expression in a Julia function is implicitly returned,
so the only change the compiler makes to our first approach is to add the return statement:

julia> code_typed(linear_foo,())
1-element Any Array:
:($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4 # line 3:
        y = 5
        return 5
    end)))

The only difference in the optimized version of the second one is that it returns a tuple.
This is solely due to the expression evaluating to a tuple.

julia> code_typed(tuple_foo,())
1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4
        y = 5
        return top(tuple)(4,5)::(Int64,Int64)
    end)))

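If the function ends with a different expression, that expression is returned instead and no tuple is constructed.
For example: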

function tuple_foo2()
  x,y = 4,5
  y
end

becomes

julia> code_typed(tuple_foo2,())
1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4
        y = 5 # line 3:
        return y::Int64
    end)))

As you may have guessed, this is by no means the final internal representation or the final optimization pass.
These functions can be optimized down to nearly nothing, as we can see if we take a look at the assembly code:

julia> code_native(linear_foo,()) # returns the value of y
    .text
Filename: none
Source line: 3
    push    RBP
    mov RBP, RSP
    mov EAX, 5
Source line: 3
    pop RBP
    ret

julia> code_native(tuple_foo,()) # returns a tuple of (x,y)
    .text
Filename: none
Source line: 2
    push    RBP
    mov RBP, RSP
    mov EAX, 83767488
Source line: 2
    pop RBP
    ret

julia> code_native(tuple_foo2,()) # returns the value of y
    .text
Filename: none
Source line: 3
    push    RBP
    mov RBP, RSP
    mov EAX, 5
Source line: 3
    pop RBP
    ret
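
The reflection output looks quite different on newer releases, but the same question can still be answered without a benchmark. Here is a minimal sketch, assuming a recent Julia 1.x; it uses Base.return_types, which the walkthrough above does not use, to ask the compiler what each of the three functions returns:

```julia
# Assumption: Julia 1.x. The printed format of code_typed/code_native
# differs from the older release used for the transcripts above.
function linear_foo()
    x = 4
    y = 5
end

function tuple_foo()
    x, y = 4, 5
end

function tuple_foo2()
    x, y = 4, 5
    y
end

# Base.return_types lists the inferred return type of each matching method.
for f in (linear_foo, tuple_foo, tuple_foo2)
    println(f, " => ", Base.return_types(f, ()))
end
```

Only tuple_foo should report a tuple type, because the tuple is only the function's value when the destructuring assignment is its last expression. Calling code_typed(tuple_foo2, ()) shows the optimized body itself.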

Layers

This post will cover five layers of internal representations of Julia code.
Except for the first one, each layer is accessible via a normal generic function
that takes a generic function and a tuple of argument types (to specify which method you want to examine).

If you are uncertain about the signature of the method you’re calling,
the macro @which will be useful to you.[3]
The internal representation of your code in the compiler is called an Abstract Syntax Tree (AST);
the AST format is specific to Julia.
Julia uses LLVM to create machine-specific assembly code;
LLVM has its own intermediate representation (IR).
The Julia compiler generates LLVM IR to tell LLVM what the generated assembly code should do.
The native assembly code is specific to your computer’s architecture.
You can see the documentation for these functions in the official manual.

  1. The AST after parsing
  2. The AST after lowering
  3. The AST after type inference and optimization
  4. The LLVM IR
  5. The assembly code

Layer 1: The AST

When the parser takes your code in (as a String), it will produce an AST (Abstract Syntax Tree).
The AST is the compiler’s representation of your code.
It is like you turning my written sentences into your mental representation of their structure/meaning.
If you’re familiar with writing macros in Julia, this will be old news to you.
This representation is not saved, so if we want to see it, we’ll need to quote the expression.

julia> :(2 + 2)
:(+(2,2))

Above, we can see that the infix + operator just becomes a function call.
This is identical to what you get if you call + as a normal function:

julia> :(+(2,2))
:(+(2,2))

Slightly more interesting:

julia> :(1 + 2 + 3 + 4 + 5)
:(+(1,2,3,4,5))

julia> :(1 + 2 - 3 - 4 + 5)
:(+(-(-(+(1,2),3),4),5))

julia> :(1-2-3-4-5)
:(-(-(-(-(1,2),3),4),5))

This lets you see that + becomes one function call even with a lot of args,
while - does not.

This is the same quoting used in macros; you can also use a quote block, such as:

quote
  2 + 2
end
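
A quoted expression is an ordinary value, so you can also take it apart programmatically instead of reading the printed form. A small sketch (the variable names sum_expr and diff_expr are just for illustration):

```julia
sum_expr = :(1 + 2 + 3 + 4 + 5)
diff_expr = :(1 - 2 - 3)

println(sum_expr.head)   # the kind of expression (a function call)
println(sum_expr.args)   # the function name followed by all five arguments
println(diff_expr.args)  # the first argument is itself a nested call

# dump prints the full tree structure, one node per line.
dump(diff_expr)
```

The flat argument list for + and the nested calls for - are the same structure the printed output above shows.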

Layer 2: The Lowered AST

code_lowered(generic_function, (types_arg_list,))

While quoting will work on any expression, the rest of these layers involve calling
a function that takes a generic function and a tuple of argument types.
For example, code_lowered(linear_foo,()) returns the lowered AST of our function from the start of this post.

code_lowered will return the lowered AST for any method of a generic function.
Lowering in general is the process of moving from surface syntax (highest) to machine code (lowest).
Here, lowering involves transforming the AST in ways that make it simpler.
This includes unnesting some expressions and desugaring some syntax into the function calls indicated.

The lowered AST is stored for every generic function.
This will work on methods you write and on ones in packages and on ones from the base libraries.
code_lowered is a normal generic function: it will work from the REPL and from any Julia code you write.

Examples

You can call it on one of the simple functions we defined earlier:

julia> code_lowered(linear_foo,())
1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,:Any,18},{:y,:Any,18}},{}}, quote  # none, line 2:
        x = 4 # line 3:
        y = 5
        return 5
    end)))

Or you could call it on a built-in function from Base:

julia> code_lowered(+,(Int,Int))
1-element Any Array:
 :($(Expr(:lambda, {:x,:y}, {{},{{:x,:Any,0},{:y,:Any,0}},{}}, quote  # int.jl, line 36:
        return box(Int64,add_int(unbox(Int64,x),unbox(Int64,y)))
    end)))

The + function also has a single-argument method:

julia> +(5)
5

If you want to make a unary tuple, use a trailing comma:

julia> cl = code_lowered(+,(Int,)) #trailing comma to make (Int,) a tuple
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # operators.jl, line 39:
        return x
    end)))

As you can see below, the value you get back is a one-dimensional Any array of Exprs.
Expr is the type used to represent an expression in the AST; you also use them when writing macros.

julia> typeof(cl)
Array{Any,1}

julia> typeof(cl[1]) # Julia Arrays are indexed from 1
Expr

code_lowered returns an Array because it sometimes returns multiple (or 0) values.
It will return an entry for each matching method:

julia> code_lowered(+,(Any,))
3-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # bool.jl, line 35:
        return int(x)
    end)))     
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # operators.jl, line 39:
        return x
    end)))     
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # abstractarray.jl, line 264:
        return x
    end)))

An example of getting no results:

julia> code_lowered(+,(String,String))
0-element Any Array

There is no + for Strings because Julia uses * as the string concatenation operator.

julia> code_lowered(*,(String,String))
1-element Any Array:
 :($(Expr(:lambda, {:(s::top(apply_type)(Vararg,String))}, {{},{{:s,:Any,0}},{}}, quote  # string.jl, line 72:
        return top(apply)(string,s)
    end)))

It’s easier to see what lowering does if you take a look at examples involving control flow.
For example, if you define this function:

function myloop(x::Int)
  result = 0  
  for i=1:x
    result += x
  end
  result
end

You can see a loop in the lowered code:

julia> code_lowered(myloop,(Int,))
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{:result,:#s6,:#s5,:i},{{:x,:Any,0},{:result,:Any,2},{:#s6,:Any,2},{:#s5,:Any,18},{:i,:Any,18}},{}}, quote  # none, line 2:
        result = 0 # line 3:
        #s6 = 1
        #s5 = x
        1: 
        unless top(<=)(#s6,#s5) goto 2
        i = #s6 # line 4:
        result = +(result,x)
        3: 
        #s6 = top(convert)(top(typeof)(#s6),top(+)(1,#s6))
        goto 1
        2: 
        0:  # line 6:
        return result
    end)))

If you want to see what happens to an if-statement, you could use this example:

function lessthan5(x::Int)
  if x < 5
    return true
  else
    return false
  end
end

You can see that, like the loop, this is also lowered into an unless and a goto.

julia> code_lowered(lessthan5,(Int,))
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote  # none, line 2:
        unless <(x,5) goto 0 # line 3:
        return true
        goto 1
        0:  # none, line 5:
        return false
        1: 
    end)))
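
On newer releases the lowered form is no longer an Expr with :lambda at the head, but the same goto-style structure is still visible. This sketch assumes Julia 1.6 or later, where code_lowered returns Core.CodeInfo objects and conditional jumps show up as Core.GotoIfNot nodes:

```julia
# Assumption: Julia 1.6+ (Core.CodeInfo with Core.GotoIfNot statements).
function lessthan5(x::Int)
    if x < 5
        return true
    else
        return false
    end
end

lowered = code_lowered(lessthan5, (Int,))  # one entry per matching method
info = lowered[1]

# Print each lowered statement in order.
for stmt in info.code
    println(stmt)
end

# Check whether the if-statement became a conditional jump.
has_branch = any(stmt -> stmt isa Core.GotoIfNot, info.code)
println("conditional jump present: ", has_branch)
```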

Layer 3: The Type-inferred, optimized AST

code_typed(generic_function, (types_arg_list,))

code_typed returns the type-inferred and optimized version of the Julia AST.
It is the last layer that is internal to Julia.

How to Call code_typed

You need to be extra careful to use trailing commas for single-argument functions with code_typed:

julia> code_typed(+,(Int))
0-element Any Array

julia> code_lowered(+,(Int))
ERROR: no method code_lowered(Function,DataType)

julia> code_typed(+,(Int,))
1-element Any Array:
 :($(Expr(:lambda, {:x}, {{},{{:x,Int64,0}},{}}, quote  # operators.jl, line 39:
        return x::Int64
    end)))

Structure of the Return Value

You should be getting an Array of Exprs back.
Each Expr can be a bit complicated and hard to understand.
It has three fields: head, args, and typ.
(You can find this out by calling the names function on it.)

  • head is a Symbol that tells you what kind of expression this is.
    For Exprs returned by code_typed, this will always be :lambda.

  • typ is a DataType.
    Currently, it will always be Any for Exprs returned by code_typed.

  • args is a 1-dimensional Any Array (Array{Any,1}). It’s the interesting part:
    it contains information about the body of the function and the variables used there.

There are three parts to args:

  1. Symbols of the names of function arguments. This has type Array{Any,1}.
  2. An Array{Any,1} of length 3. It contains details about all variables used in the function (local, captured, and arguments).
  3. An Expr representing the body of the generic function.

The middle part (2) above has more structure to examine:

  1. An Array{Any,1} of Symbols. This contains a Symbol for the name of each local variable.
  2. An Array{Any,1} of length-3 Array{Any,1}s describing each local variable and argument. I’ll get to the format of the length-3 Arrays in a moment.
  3. An Array{Any,1} of length-3 Array{Any,1}s describing each captured variable. These entries have the same format as above.

The triple that describes each used variable is an Array{Any,1}.
It consists of a Symbol of the variable name, a DataType for the inferred type of the variable, and an Int64 of bit flags describing how the variable is used.

The lowest 5 bits of the Int64 are used as bit flags.[4]
From most to least significant, these bits represent whether the variable:
[is assigned once][is const][is assigned by inner function][is assigned][is captured].
So, if no bits are set, then the value will be 0.
If a variable is captured, but not const or assigned to, then it will have a value of 1.
If a local variable is assigned to, then it would have a value of 2.
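
To make the flag arithmetic concrete, here is a small helper that decodes such an integer. It is only a sketch: decode_flags is a hypothetical name, not something Julia provides, and the bit values are the ones listed in footnote 4.

```julia
# Hypothetical helper: return the names of all flags whose bit is set.
function decode_flags(bits::Integer)
    flags = [
        (1,  "is captured"),
        (2,  "is assigned"),
        (4,  "is assigned by inner function"),
        (8,  "is const"),
        (16, "is assigned once"),
    ]
    return [name for (mask, name) in flags if (bits & mask) != 0]
end

println(decode_flags(0))   # no flags set
println(decode_flags(1))   # captured only
println(decode_flags(18))  # 18 = 16 + 2
```

A value of 18 decodes to “is assigned” (2) plus “is assigned once” (16).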

An Example: 0-args, just assigning to local vars

The function:

function foo()
  x = 4
  y = 5
end

The result of code_typed(foo,()):

1-element Any Array:
 :($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote  # none, line 2:
        x = 4 # line 3:
        y = 5
        return 5
    end)))
  • the .head field is :lambda, as expected.
  • the .typ field is Any, also as expected.
  • the .args field is:
3-element Any Array:
{}                                                                   
{{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}                           
quote  # none, line 2:
   x = 4 # line 3:
   y = 5
   return 5
end

Let’s talk more about args:

  • .args[1] is empty because we took no arguments.
  • .args[2][1] contains the names of our two local variables, x and y.
  • .args[2][2] contains a description of each of those local variables.
    The Int64 values indicate that x and y have been inferred to be of that type, despite no type annotations in the code.
    The value 18 indicates that the set bit flags are “is assigned once” (16) and “is assigned” (2).
  • .args[2][3] is empty because we did not capture any variables.
  • .args[3] is the Expr representing the body of the function.
    You may notice that it is nearly identical to the original version of the code.

Layer 4: LLVM IR

code_llvm(generic_function, (types_arg_list,))

Calling code_llvm prints the LLVM IR for the function.
This is going to be more unfamiliar-looking than the previous layers,
since it looks like a complicated kind of assembly code, rather than being Julia-specific.
It also differs in that it prints out the code, rather than returning a manipulable value to you.

Usage Examples

julia> code_llvm(linear_foo,())

define i64 @julia_linear_foo() {
top:
  ret i64 5, !dbg !4355
}

julia> code_llvm(+,(Int,))

define i64 @"julia_+823"(i64) {
top:
  ret i64 %0, !dbg !4361
}

julia> code_llvm(+,(Int,Int))

define i64 @"julia_+824"(i64, i64) {
top:
  %2 = add i64 %1, %0, !dbg !4367
  ret i64 %2, !dbg !4367
}

Note that now trying to get multiple results is going to end in an error:

julia> code_llvm(+,(Any,))
ERROR: no method found for the specified argument types
 in _dump_function at reflection.jl:110
 in code_llvm at reflection.jl:115

Happily, accidentally passing a non-tuple type also results in an error:

julia> code_llvm(+,(Int))
ERROR: no method code_llvm(Function,DataType)

Layer 5: Assembly Code

code_native(generic_function, (types_arg_list,))

Calling code_native prints the native assembly code for the specified method.

Usage Example

julia> code_native(+,(Int,Int))
    .text
Filename: int.jl
Source line: 36
    push    RBP
    mov RBP, RSP
Source line: 36
    add RDI, RSI
    mov RAX, RDI
    pop RBP
    ret

Calling it with a non-tuple or for signatures that don’t exist results in an error:

julia> code_native(+,(Int))
ERROR: no method code_native(Function,DataType)

julia> code_native(+,(Any,))
ERROR: no method found for the specified argument types
 in _dump_function at reflection.jl:110
 in code_native at reflection.jl:116

Footnotes:


  1. The Julia REPL and normal Julia code (in files) are equally powerful and have all the same capabilities.
    These are not interpreter directives, like :t in GHCI. ↩

  2. If your function has a name, it’s a generic function. ↩

  3. The @which macro lets you see which method of a function would be called
    with a particular set of arguments, without actually calling it.
    For example:

    julia> @which 2+2
    +(x::Int64,y::Int64) at int.jl:36
    

    This means that you should look in the Julia source tree, in the base folder, for a file called int.jl, and you’ll find the method for +(Int64,Int64) defined there.

    It’s less helpful for functions defined in the REPL:

    julia> @which lessthan5(4)
    lessthan5(x::Int64) at none:2
    

    ↩

  4. To see for yourself what the bit flags field is, you should take a look around
    lines 2357-2381 of julia/src/julia-syntax.scm:

    ;; record whether var is captured
    (define (vinfo:set-capt! v c) (set-car! (cddr v)
                                            (if c
                                                (logior (caddr v) 1)
                                                (logand (caddr v) -2))))
    ;; whether var is assigned
    (define (vinfo:set-asgn! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 2)
                                                (logand (caddr v) -3))))
    ;; whether var is assigned by an inner function
    (define (vinfo:set-iasg! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 4)
                                                (logand (caddr v) -5))))
    ;; whether var is const
    (define (vinfo:set-const! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 8)
                                                (logand (caddr v) -9))))
    ;; whether var is assigned once
    (define (vinfo:set-sa! v a) (set-car! (cddr v)
                                            (if a
                                                (logior (caddr v) 16)
                                                (logand (caddr v) -17))))
    

    ↩

Running Shell Commands from Julia

Here are some examples of starting and interacting with other programs from Julia.
The official documentation is pretty good, but I want something with more (basic) examples and fewer words.
I do recommend reading that to see some of the fancier tricks you can pull
(and for up-to-date documentation, when this blog post gets stale).

Running Other Programs

The easiest, most basic way to run a shell command is run.

julia> help(run)
Base.run(command)

   Run a command object, constructed with backticks. Throws an error
   if anything goes wrong, including the process exiting with a non-
   zero status.

You can’t communicate with the process at all,
and run will block until the command finishes running.

julia> run(`echo hello!`)
hello!

Notice that Command literals are written with backticks, not single quotes (').

You can run two commands in parallel using &:

julia> run(`echo hello` & `echo world`)
hello
world

If you try to cat a file that doesn’t exist, you’ll get an error.

julia> run(`cat test.txt`)
cat: test.txt: No such file or directory
ERROR: failed process: Process(`cat test.txt`, ProcessExited(1)) [1]
 in error at error.jl:22
 in pipeline_error at process.jl:430
 in run at process.jl:413

The first line is cat printing an error message.
The rest is Julia throwing an error.
Note that the command is contained in backticks, not normal single quotes.

Redirecting

Base.|>

   Redirect standard input or output of a process.

   Example: run(`ls` |> "out.log")
   Example: run("file.txt" |> `cat`)

Use |> to redirect the STDOUT of a command to a file
or to write a file to a command’s STDIN.

julia> run(`echo hello, world` |> "test.txt")

julia> run(`cat test.txt`)
hello, world

julia> run("test.txt" |> `cat`)
hello, world

You can also use |> to redirect a process’s output to another process.

julia> run(`echo hello` |> `cat`)
hello

julia> run(`echo $("hello\nhi\nhello, world")` |> `cat` |> `grep -n o`)
1:hello
3:hello, world
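
The |> redirection syntax shown here belongs to the Julia release these transcripts were recorded on. On Julia 1.x, chaining is done with the pipeline function instead. A minimal sketch, assuming Julia 1.x and a Unix-like system where echo, cat, and grep are available:

```julia
# Assumption: Julia 1.x on a Unix-like system.
text = "hello\nhi\nhello, world"

# pipeline connects each command's standard output to the next one's input.
chain = pipeline(`echo $text`, `cat`, `grep -n o`)

# read(..., String) runs the chain and captures its standard output.
output = read(chain, String)
print(output)
```

pipeline also accepts stdin, stdout, and stderr keyword arguments for redirecting to or from files.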

String Interpolation

You can use $ to interpolate into Command literals, in the same way you can into string literals.

julia> filename = "test\nnewline.txt"
"test\nnewline.txt"

You can see what the interpolation expands to by just printing the Command literal.
For example, this doesn’t run or do anything, it just shows you what the command looks like.

julia> `cat $filename`
`cat 'test
newline.txt'`

If you run the command without having that file around, you’ll get an error because cat doesn’t exit cleanly.

julia> run(`cat $filename`)
cat: test
newline.txt: No such file or directory
ERROR: failed process: Process(`cat 'test
newline.txt'`, ProcessExited(1)) [1]
 in error at error.jl:22
 in pipeline_error at process.jl:430
 in run at process.jl:413

Creating the file resolves the error:

julia> run(`echo the name is $filename` |> filename)

julia> run(`cat $filename`)
the name is test
newline.txt

The rest of this section’s examples are mostly from the official documentation.

julia> names = ["foo","bar","baz"]
3-element ASCIIString Array:
 "foo"
 "bar"
 "baz"

Command interpolation is more specialized than just plain string interpolation.
Lists will become escaped and space separated, which is more reasonable in shell commands than the normal square-brackets-and-commas notation.

julia> `grep xylophone $names`
`grep xylophone foo bar baz`

If you interpolate a list as part of a word, the resulting space separated list will be versions of the complete word:

julia> `grep xylophone $names.txt`
`grep xylophone foo.txt bar.txt baz.txt`

julia> `grep xylophone hi$names.bye`
`grep xylophone hifoo.bye hibar.bye hibaz.bye`

julia> `grep xylophone hi$(names)bye`
`grep xylophone hifoobye hibarbye hibazbye`

You can do even cooler things if you have two arrays.
If you interpolate two arrays into the same word,
then you’ll get all combinations of elements in the two lists.

julia> exts = ["aux","log"]
2-element ASCIIString Array:
 "aux"
 "log"

julia> `rm -f $names.$exts`
`rm -f foo.aux foo.log bar.aux bar.log baz.aux baz.log`
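
If you want to check programmatically how an interpolation was split into arguments, you can inspect the command object rather than its printed form. This sketch assumes Julia 1.x and reads the exec field of Cmd, which is an implementation detail and not a documented API:

```julia
names = ["foo", "bar", "baz"]
exts = ["aux", "log"]
filename = "test\nnewline.txt"

single = `cat $filename`      # the whole string stays one argument
combos = `rm -f $names.$exts` # every combination of name and extension

# Assumption: `exec` is the internal field holding the argument vector.
println(single.exec)
println(combos.exec)
```

Neither command is run here; building a command literal only constructs the value.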

Getting Output

The easiest way to call a command and get its output as a String is readall.

julia> readall(`echo hello?`)
"hello?\n"

This works just as well with chains of commands and redirections
because the chain will become a single Cmd value
when the redirection operators are done.

julia> readall("test.txt" |> `cat`)
"hello, world\n"

If you just want the contents of a file as a string, you don’t need to use cat or any other command. Just use readall on the file:

julia> readall("test.txt")
"hello, world\n"

However, sometimes you’d like to be able to read the output in over time, rather than all at once.
For that, you want readsfrom:

julia> help(readsfrom)
Base.readsfrom(command)

   Starts running a command asynchronously, and returns a tuple
   (stream,process). The first value is a stream reading from the
   process' standard output.

julia> (st,pr) = readsfrom(`cat poem`)
(Pipe(active, 0 bytes waiting),Process(`cat poem`, ProcessRunning))

This use of readsfrom isn’t that interesting, since cat exits right away.

julia> pr
Process(`cat poem`, ProcessExited(0))

You can use any reading stream functions you’d like: readall, readline, etc.

julia> readline(st)
" There is a place where the sidewalk ends\n"

The pipe is closed because the process has exited.

julia> st
Pipe(closed, 718 bytes waiting)

This way, you can process the data one line (or whatever increment you like) at a time.

julia> readline(st)
"And before the street begins,\n"

julia> readline(st)
"And there the grass grows soft and white,\n"

julia> st
Pipe(closed, 646 bytes waiting)

When you run out of data in the pipe, readline will start returning empty strings: "".
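
readsfrom is specific to the Julia release used in these transcripts. On Julia 1.x, a comparable way to read a command's output incrementally is to open the command as a stream. A minimal sketch, assuming Julia 1.x and a Unix-like echo:

```julia
# Assumption: Julia 1.x; `echo` is available on the system.
text = "first line\nsecond line"

# open(cmd) do io ... end starts the command, hands you a readable stream,
# and checks the exit status when the block finishes.
lines = open(`echo $text`) do io
    collect(eachline(io))  # one entry per line of output
end

println(lines)
```

Note that on Julia 1.x, eachline and readline drop the trailing newline by default, unlike the readline output shown above.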

Want to ls, but get the result as an array of file and directory names?
You want to use readdir():

julia> help(readdir)
# methods for generic function readdir
readdir(path::String) at file.jl:169
readdir() at file.jl:198

julia> readdir()
6-element String Array:
 "new\nline"     
 "poem"           
 "snowman☃snowman"
 "sp ace"         
 "t\tab"         

Sending Input

Sometimes, you might want to communicate with another process in a more complex way than just commandline args.

Write to STDIN and then wait for it to finish

The writesto function will give you a Pipe hooked up to the STDIN of the process and a Process. Do not read from the Pipe.

julia> (si,pr) = writesto(`cat`)
(Pipe(open, 0 bytes waiting),Process(`cat`, ProcessRunning))

If you don’t give cat any filenames (or if you pass in a -),
then cat will read from STDIN until you send CTRL-D.

julia> write(si,"hello")
5

We can keep writing to cat for as long as we want.
As you can see, the process is still running:

julia> pr
Process(`cat`, ProcessRunning)

When you’re done writing to the process, close the Pipe.

julia> close(si)

This will make cat exit, as it’s done doing its work:

julia> pr
Process(`cat`, ProcessExited(0))

If you want to wait for the process to be done doing work,
you can call wait on the Process.
This will block until the process exits.

julia> wait(pr)
0

Read AND Write on the same process

Using the readandwrite function, you can get two Pipes and a Process.
The Pipes are the STDOUT and STDIN of the process.
The Process is a Julia value representing the asynchronous process started to run the command.

You can read from the STDOUT of the process, but don’t write to it.
You can write to the STDIN of the process, but don’t read from it.

julia> (so,si,pr) = readandwrite(`cat`)
(Pipe(active, 0 bytes waiting),Pipe(open, 0 bytes waiting),Process(`cat`, ProcessRunning))

julia> write(si,"hello\ngood\nbye\n")
15

julia> so
Pipe(active, 15 bytes waiting)

julia> si
Pipe(open, 0 bytes waiting)

julia> close(si)

julia> so
Pipe(closed, 15 bytes waiting)

julia> pr
Process(`cat`, ProcessExited(0))

julia> readall(so)
"hello\ngood\nbye\n"

julia> so
Pipe(closed, 0 bytes waiting)
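
writesto and readandwrite are also specific to the release used here. On Julia 1.x, the same write-then-read conversation can be sketched by opening the command in "r+" mode. This assumes Julia 1.x and a Unix-like cat:

```julia
# Assumption: Julia 1.x. open(cmd, "r+") returns a process object whose
# standard input and output are both connected to pipes.
proc = open(`cat`, "r+")

write(proc, "hello\ngood\nbye\n")  # goes to cat's STDIN
close(proc.in)                     # signals end of input, so cat can exit

echoed = read(proc, String)        # everything cat wrote to STDOUT
wait(proc)                         # block until the process has exited

print(echoed)
```

For large amounts of data, writing everything before reading anything can block once the pipe buffer fills, so this pattern is best kept to small exchanges.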

REPL specialness

In the REPL, you can use a ; to run shell commands easily.

julia> ;ls
new?line  NGrep.jl  poem  snowman☃snowman  sp ace  t?ab  test.txt

julia> ;mkdir folder

julia> ;ls
folder  new?line  NGrep.jl  poem  snowman☃snowman  sp ace  t?ab  test.txt

julia> ;cd folder/
/home/leah/src/lambdajam/folder

The name of the directory (folder/) tab completed! 🙂

julia> ;ls

julia> ;touch my_new_file

julia> ;ls
my_new_file

API Summary

Setting up the process

julia> help(readsfrom)
Base.readsfrom(command)

   Starts running a command asynchronously, and returns a tuple
   (stream,process). The first value is a stream reading from the
   process' standard output.

julia> help(writesto)
Base.writesto(command)

   Starts running a command asynchronously, and returns a tuple
   (stream,process). The first value is a stream writing to the
   process' standard input.

julia> help(readandwrite)
Base.readandwrite(command)

   Starts running a command asynchronously, and returns a tuple
   (stdout,stdin,process) of the output stream and input stream of the
   process, and the process object itself.

Interacting with streams

julia> help(write)
Base.write(stream, x)

   Write the canonical binary representation of a value to the given
   stream.

julia> help(readall)
Base.readall(stream)

   Read the entire contents of an I/O stream as a string.

julia> help(readline)
Base.readline(stream)

   Read a single line of text, including a trailing newline character
   (if one is reached before the end of the input).

julia> help(close)
Base.close(stream)

   Close an I/O stream. Performs a "flush" first.

Interacting with processes

julia> help(wait)
Base.wait(x)

   Block the current task until some event occurs, depending on the
   type of the argument:

   * "RemoteRef": Wait for a value to become available for the
     specified remote reference.

   * "Condition": Wait for "notify" on a condition.

   * "Process": Wait for the process to exit, and get its exit code.

   * "Task": Wait for a "Task" to finish, returning its result
     value.

Well, that was embarrassing.

I have long been confused by the strange behavior of integers as arguments to functions.
If I pass a variable into a function, I expect the function to be able to modify it.
This expectation applies to variables local to the calling context and to global variables;
it also applies to Strings and Floats and Integers and Chars.
When I’m trying to do something
(like modifying Int arguments inside a function and seeing the change outside it)
and it isn’t working, I fiddle with it until it does.

However, today, I decided that I should actually ask why it didn’t work the way I expect.
Today, I had the embarrassing experience of correcting my fundamentally flawed model of how variables work.
I’m going to skip over explaining the details of my previous model, since I don’t want to encourage its remaining grip on my mind.
Instead, I’m going to work through a very physical metaphor for my new understanding.

The Metaphor: A Table of Boxes and Forms and Colorful Stickers

There are two kinds of values: mutable and immutable.
Immutable values cannot be edited; they are like paper forms filled out in indelible ink.
Numbers (Ints,Floats,BigInts,BigFloats,etc), Chars, Strings, and user-defined immutable types are all immutable.
Mutable values can be edited; they are like boxes with boxes and paper forms inside.
I like to picture them as those organizer-boxes, with dividers in them.
Dictionaries, Arrays, and all other user-defined types are mutable.

There is a table (as in, a piece of furniture) of these boxes and forms; this is all the memory your program is using.
(This metaphor, as you may have noticed, is ignoring the stack/heap distinction and other implementation concerns.
I’m focused on getting correct expectations for what the value of my variables might be after I use them as arguments to a function.)
As your program executes, it fills out new forms, puts new boxes on the table, and throws out boxes and forms that it’s not using.
It also moves things into or out of the boxes.

There is one other component to this setup: stickers.
Any time that code is executing, it is executing within a context.
This context is made of bindings of names (variable names) to the values on the table.
We’ll model these bindings as stickers.
The name of the variable is printed on the sticker,
and the sticker is stuck to the value that the variable is currently bound to.
Each context has its own color of sticker, and ignores stickers that aren’t of its color.

Example 1: x = x + 1

Let’s say we have the variable name x bound to an Int value, 5.
This value is immutable, so it will be a paper form with 5 written on it in ink.
Because we’re calling this form by the name x, there is a sticker on the form with the name x printed on it.
Now that I’ve explained the context, let’s run the line of code x = x + 1.
First, we’ll take a look at the value on the form labeled x; it’s 5.
Then, we’ll write the result of 5 + 1 on a new form.
Finally, we’ll move the x sticker from the form with 5 written on it to the form with 6 written on it.

Example 2: a[2] = 6

Forget about x. a is a 1-dimensional Array.
Picture a as a long, thin box with three dividers in it.
Each space between the dividers is an element of a, so a is of length 4.
There is a sticker with a printed on it on the outside of the box.
Each of the spaces in the box has a paper form with a number written on it;
what numbers are written on them is not relevant to this example.

Now, let’s simulate running a[2] = 6.
(We’re simulating Julia code, so we’re indexing from 1.)
First, we’ll get a paper form and write 6 on it in pen.
Then, we’ll replace the paper in the second box from the left of a with the new form.
That’s all we need to do; notice that no stickers have been moved.
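
Examples 1 and 2 can be checked directly at the REPL. This is a small sketch; the extra names old and b are hypothetical additions (a second sticker on the same form or box) so that there is something left to observe afterwards:

```julia
# Example 1: rebinding a name to a new immutable value.
x = 5
old = x        # a second sticker on the form that says 5
x = x + 1      # the x sticker moves to a new form that says 6
println((x, old))

# Example 2: mutating the contents of a box.
a = [1, 2, 3, 4]
b = a          # a second sticker on the same box
a[2] = 6       # swap the form in the second compartment
println(b)     # b sees the change because it labels the same box
```

In the first case the old form is untouched; in the second, the box itself changed, so every name stuck to it sees the new contents.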

Example 3: foo(x)

Say we have our integer variable x again.
It’s a form with 6 written on it, with an x sticker stuck to it.
The sticker is blue.

Now, we have a function:

 function foo(z::Int)
   z = z + 1
 end

I’m going to ignore irrelevant portions of calling this function,
including that it will return the value of z + 1.

When we run the line foo(x), we will move into the context of foo.
Our previous context is the calling context (the one calling foo).
Our current context is inside foo, so now there is a green z sticker on our form.
Now, inside foo, we’ll write 7 on a new form, and move our green z sticker to it.
Then we return from the function call, and remove all the green stickers.

What is the value of x now?

Well, the blue x sticker is still stuck to the form with 6 written on it.
This means that foo did not modify x.
In fact, passing an Int variable as an argument to a function will never modify that variable.
Inside the context of foo, we can’t see the blue x sticker at all — and we definitely can’t move it.[1]
The form is immutable, so you can’t erase what is on it and you can’t write anything new on it.
There’s nothing that foo could possibly do to change the value of x in the calling context.
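
If you want to check this yourself, something like the following should do it. The name result is my own addition, just to hold on to what foo returns.

```julia
function foo(z::Int)
    z = z + 1        # moves the green z sticker to a new form
end

x = 6
result = foo(x)      # hypothetical name for the returned value
println(result)      # prints 7
println(x)           # prints 6
```

The 7 only exists on the new form that foo handed back; the blue x sticker is still on the 6.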

Example 4: foo(a[2])

Picture the box for a again. It’s got 4 pieces of paper in it; the second one says 6.
There are no stickers on any of the four forms; there is a blue a sticker on the outside of the box.

Now, we’ll execute foo(a[2]), where foo is the same function from the previous example.
As we move from the calling context into the foo context,
we’ll stick a green z sticker on the form in the second compartment of the box with the blue a sticker.
Now, we’ll get out a new form, write 7 on it, and move the green z sticker to this new form.
Finally, we’ll remove all the green stickers as we return to the calling context.

The value of a is unchanged.
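
As a quick sketch in code: only the 6 in the second compartment comes from the example, so the other three numbers here are placeholders, and result is a name I made up for the return value.

```julia
function foo(z::Int)
    z = z + 1
end

a = [1, 6, 8, 19]             # only the 6 matters; the rest are placeholders
result = foo(a[2])            # foo gets the form from compartment 2
println(result)               # prints 7
println(a == [1, 6, 8, 19])   # prints true
```

foo wrote its 7 on a fresh form and never touched the box.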

Example 5: bar(a)

So far, we haven’t managed to mutate any arguments to a function.
This is about to change.

Picture the box for a. Let’s say that the four forms in it have,
respectively, the following numbers written on them: 1,6,8,19.
None of the forms have any stickers on them; the only sticker is the blue a on the box.

We’ll need a new function for this:

 :::.jl
 function bar(xs::Array)
   xs[2] = 42
 end

Now, let’s simulate: bar(a).
We’ll move from the blue calling context to the green callee context;
we’ll put a green xs sticker on the box.
Then we’ll take the piece of paper in the second compartment (with 6 written on it)
and replace it with a new form with 42 written on it.
Now, we’ll return to the calling context, removing the green xs sticker.
Notice that we never moved a sticker from one value to another.

What is the value of a now?

It’s [1,42,8,19]. The assignment in bar did affect the value of a.
It did not affect the binding of a (the blue sticker),
but it affected which values were inside the mutable box that is the value of a.
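
Here is a sketch that runs bar and, for contrast, a second function that only moves its own sticker. The function rebind is hypothetical; I'm including it to show that assigning to the argument name inside a function is a sticker move, not a change to the box.

```julia
function bar(xs::Array)
    xs[2] = 42               # replaces a form inside the box
end

# Hypothetical contrast: moves the green xs sticker to a brand-new box.
function rebind(xs::Array)
    xs = [0, 0, 0, 0]
end

a = [1, 6, 8, 19]
bar(a)
println(a == [1, 42, 8, 19])   # prints true
rebind(a)
println(a == [1, 42, 8, 19])   # prints true
```

bar changed what is inside the caller's box; rebind changed nothing the caller can see, because the new box only ever had a green sticker on it.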

Conclusion

I already had a sort of messy awareness of this whole thing.
I mean, I’ve brushed up against “by value” vs “by reference”;
I’m fine with writing C and using pointers;
I was good on the difference between shadowing a variable and modifying a mutable box in OCaml.
But it wasn’t until today that this all clicked into place in my mental model
of how passing variables as arguments to functions works in “normal” languages, like Julia and Java and such.
Suddenly, the division between types that are passed “by reference” and ones that are inexplicably passed “by value” makes complete sense.

I am happy that I noted that I was confused and expressed my confusion despite feeling embarrassed.
After trying several functions in the Julia REPL and determining that the behavior did indeed break my mental model (I felt so confused),
I asked one of my mentors for the summer the relevant question.
Once I convinced him that I was in fact not joking and was honestly surprised and confused,
he was very patient about correcting my misunderstanding.
Being very confused about a fundamental aspect of programming,
misunderstanding such a seemingly basic thing,
was embarrassing to admit to myself;
it was much more embarrassing to reveal that mistake in front of someone as awesome as him.
I still feel embarrassed, but the world also makes a lot more sense, which is worth it.

Footnotes

  1. Except in certain circumstances in C++.