When people are introspective, they’re thinking about how their minds work,
about how and why they think what they do.
The Julia language has some impressive facilities for letting you see how the compilers’ mind works.
Using convenient built-in functions that are available both at the REPL and in normal code,
Julia allows you to see the layers of internal representation that your code goes through,
from the parsed AST to native assembly code.
This allows you to answer some otherwise difficult questions very easily.
(and allows you to peer into the inner workings of the compiler, which is just plain fun.)
A Simple Question: I wonder if it matters which of these I use?
One of the questions I have pondered is whether two syntaxes for assigning variables
make a performance difference.
Each of these two approaches assigns the same two values to the same two variables.
The most straight-forward approach:
function linear_foo()
x = 4
y = 5
end
Sometimes this looks nicer:
function tuple_foo()
x,y = 4,5
end
The question that we’re specifcally wondering about is whether it matters which syntax we use,
from a speed stand point.
(The concern is that the second syntax might implicitly create a tuple and waste time messing around with it.)
In most languages, if you really cared, you’d make a microbenchmark to get an approximate answer.
Instead, I’m going to go take a look at the optimized version of the AST.
(It’s much easier and more exact than benchmarking. 🙂
In Julia, you can just look at the optimized version of the AST for any generic function.
All it takes is a single call to code_typed
.
The value of the last expression in a Julia function is implictly returned,
so the only change to our first approach is to add the return statement:
julia> code_typed(linear_foo,())
1-element Any Array:
:($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote # none, line 2:
x = 4 # line 3:
y = 5
return 5
end)))
The only difference in the optimized version of the second one is that it returns a tuple.
This is solely due to the expression evaluating to a tuple.
julia> code_typed(tuple_foo,())
1-element Any Array:
:($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote # none, line 2:
x = 4
y = 5
return top(tuple)(4,5)::(Int64,Int64)
end)))
For example:
function tuple_foo2()
x,y = 4,5
y
end
becomes
julia> code_typed(tuple_foo2,())
1-element Any Array:
:($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote # none, line 2:
x = 4
y = 5 # line 3:
return y::Int64
end)))
As you may have guessed, this is by no means the final internal represetation or the final optimization pass.
These functions can be optimized down to nearly nothing, as we can see if we take a look at the assembly code:
julia> code_native(linear_foo,()) # returns the value of y
.text
Filename: none
Source line: 3
push RBP
mov RBP, RSP
mov EAX, 5
Source line: 3
pop RBP
ret
julia> code_native(tuple_foo,()) # returns a tuple of (x,y)
.text
Filename: none
Source line: 2
push RBP
mov RBP, RSP
mov EAX, 83767488
Source line: 2
pop RBP
ret
julia> code_native(tuple_foo2,()) # returns the value of y
.text
Filename: none
Source line: 3
push RBP
mov RBP, RSP
mov EAX, 5
Source line: 3
pop RBP
ret
Layers
This post will cover five layers of internal representations of Julia code.
Except for the first one, each layer is accessible via a normal generic function
that takes a generic function and a tuple of argument types (to specify which method you want to examine).
If you are uncertain about the signature of the method you’re calling,
the macro @which
will be useful to you.
The internal representation of your code in the compiler is called an Abstract Syntax Tree (AST);
the AST format is specific to Julia.
Julia uses LLVM to create machine-specific assembly code;
LLVM has its own intermediate representation (IR).
The Julia compiler generates LLVM IR to tell LLVM what the generated assembly code should do.
The native assembly code is specific to your computer’s architecture.
You can see the documentation for these functions in the official manual
here.
- The AST after parsing
- The AST after lowering
- The AST after type inference and optimization
- The LLVM IR
- The assembly code
Layer 1: The AST
When the parser takes your code in (as a String
), it will produce an AST (Abstract Syntax Tree).
The AST is the compiler’s representation of your code.
It is like you turning my written sentances into your mental representation of their structure/meaning.
If you’re familiar with writing macros
in Julia, this will be old news to you.
This representation is not saved, so if we want to see it, we’ll need to quote the expression.
julia> :(2 + 2)
:(+(2,2))
Above, we can see that the infix +
operator just becomes a function call.
This is identical to if you use +
as a normal function:
julia> :(+(2,2))
:(+(2,2))
Slightly more interesting:
julia> :(1 + 2 + 3 + 4 + 5)
:(+(1,2,3,4,5))
julia> :(1 + 2 - 3 - 4 + 5)
:(+(-(-(+(1,2),3),4),5))
julia> :(1-2-3-4-5)
:(-(-(-(-(1,2),3),4),5))
This lets you see that +
becomes one function call even with a lot of args,
while -
does not.
This is the same quoting used in macros; you can also use a quote block, such as:
Layer 2: The Lowered AST
code_lowered(generic_function, (types_arg_list,))
While quoting will work on any expression, the rest of these layers involve calling
a function that takes a generic function and a tuple of argument types.
For example, code_lowered(linear_foo,())
returns the lowered AST of our function from the start of this post.
code_lowered
will return the lowered AST for any method of a generic function.
Lowering in general is the process of moving from surface syntax (highest) to machine code (lowest).
Here, lowering involves transforming the AST in ways that make it simpler.
This includes unnesting some expressions and desugaring some syntax into the function calls indicated.
The lowered AST is stored for every generic function.
This will work on methods you write and on ones in packages and on ones from the base libraries.
code_lowered
is a normal generic function: it will work from the REPL and from any Julia code you write.
Examples
You can call it on one of the simple functions we defined earlier:
julia> code_lowered(linear_foo,())
1-element Any Array:
:($(Expr(:lambda, {}, {{:x,:y},{{:x,:Any,18},{:y,:Any,18}},{}}, quote # none, line 2:
x = 4 # line 3:
y = 5
return 5
end)))
Or you could call it on a built-in function from Base:
julia> code_lowered(+,(Int,Int))
1-element Any Array:
:($(Expr(:lambda, {:x,:y}, {{},{{:x,:Any,0},{:y,:Any,0}},{}}, quote # int.jl, line 36:
return box(Int64,add_int(unbox(Int64,x),unbox(Int64,y)))
end)))
The +
function also has a single-argument method:
If you want to make a unary tuple, use a trailing comma:
julia> cl = code_lowered(+,(Int,)) #trailing comma to make (Int,) a tuple
1-element Any Array:
:($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote # operators.jl, line 39:
return x
end)))
As you can see below, the value you get back is a one-dimensional Any
array of Expr
s.
Expr
is the type used to represent an expression in the AST; you also use them when writing macros.
julia> typeof(cl)
Array{Any,1}
julia> typeof(cl[1]) # Julia Arrays are indexed from 1
Expr
code_lowered
returns an Array
because it sometimes returns multiple (or 0) values.
It will return an entry for each matching method:
julia> code_lowered(+,(Any,))
3-element Any Array:
:($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote # bool.jl, line 35:
return int(x)
end)))
:($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote # operators.jl, line 39:
return x
end)))
:($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote # abstractarray.jl, line 264:
return x
end)))
An example of getting no results:
julia> code_lowered(+,(String,String))
0-element Any Array
There is no +
for Strings because Julia uses *
as the string concatenation operator.
julia> code_lowered(*,(String,String))
1-element Any Array:
:($(Expr(:lambda, {:(s::top(apply_type)(Vararg,String))}, {{},{{:s,:Any,0}},{}}, quote # string.jl, line 72:
return top(apply)(string,s)
end)))
It’s easier to see what lowering does if you take a look at examples involving control flow.
For example, if you define this function:
function myloop(x::Int)
result = 0
for i=1:x
result += x
end
result
end
You can see a loop in the lowered code:
julia> code_lowered(myloop,(Int,))
1-element Any Array:
:($(Expr(:lambda, {:x}, {{:result,:#s6,:#s5,:i},{{:x,:Any,0},{:result,:Any,2},{:#s6,:Any,2},{:#s5,:Any,18},{:i,:Any,18}},{}}, quote # none, line 2:
result = 0 # line 3:
#s6 = 1
#s5 = x
1:
unless top(<=)(#s6,#s5) goto 2
i = #s6 # line 4:
result = +(result,x)
3:
#s6 = top(convert)(top(typeof)(#s6),top(+)(1,#s6))
goto 1
2:
0: # line 6:
return result
end)))
If you want to see what happens to an if-statment, you could use this example:
function lessthan5(x::Int)
if x < 5
return true
else
return false
end
end
You can see that, like the loop, this is also lowered into an unless
and a goto
.
julia> code_lowered(lessthan5,(Int,))
1-element Any Array:
:($(Expr(:lambda, {:x}, {{},{{:x,:Any,0}},{}}, quote # none, line 2:
unless <(x,5) goto 0 # line 3:
return true
goto 1
0: # none, line 5:
return false
1:
end)))
Layer 3: The Type-inferred, optimized AST
code_typed(generic_function, (types_arg_list,))
code_typed
returns the type-inferred and optimized version of the Julia AST.
It is the last layer that is internal to Julia.
How to Call code_typed
You need to be extra careful to use trailing commas for single-argument functions with code_typed
:
julia> code_typed(+,(Int))
0-element Any Array
julia> code_lowered(+,(Int))
ERROR: no method code_lowered(Function,DataType)
julia> code_typed(+,(Int,))
1-element Any Array:
:($(Expr(:lambda, {:x}, {{},{{:x,Int64,0}},{}}, quote # operators.jl, line 39:
return x::Int64
end)))
Structure of the Return Value
You should be getting an Array of Expr
s back.
This value can be a bit complicated and hard to understand.
It has three fields: head
, args
, and typ
.
(You can find this out by calling the function names
on it.)
-
head
is a Symbol
that tells you what kind of expression this is.
For Expr
s returned by code_typed
, this will always be :lambda
.
-
typ
is a DataType
.
Currently, it will always be Any
for Expr
s returned by code_typed
.
-
args
is a 1-dimensional Any
Array (Array{Any,1}
). It’s the interesting part:
it contains information about the body of the function and the variables used there.
There are three parts to args
:
Symbol
s of the names of function arguments. This has type Array{Any,1}
.
- An
Array{Any,1}
of length 3. It contains details about all variables used in the function (local, captured, and arguments).
- An
Expr
representing the body of the generic function.
The middle part (2) above has more structure to examine:
- An
Array{Any,1}
of Symbol
s. This contains a Symbol
for the name of each local variable.
- An
Array{Any,1}
of length-3 Array{Any,1}
s describing each local variable and argument. I’ll get to the format of the length-3 Array
s in a moment.
- An
Array{Any,1}
of length-3 Array{Any,1}
s describing each captured variable. These entries have the same format as above.
The triple that describes each used variable is an Array{Any,1}
.
It consists of a Symbol
of the variable name, a DataType
for the inferred type of the variable, and an Int64
of bit flags describing how the variable is used.
The lowest 5 bits of the Int64 are used as bit flags.
From most to least significant, these bits represent whether the variable:
[is assigned once][is const][is assigned by inner function][is assigned][is captured]
.
So, if no bits are set, then the value will be 0.
If a variable is captured, but not const or assigned to, then it will have a value of 1.
If a local variable is assigned to, then it would have a value of 2.
An Example: 0-args, just assigning to local vars
The function:
function foo()
x = 4
y = 5
end
The result of code_typed(foo,())
:
1-element Any Array:
:($(Expr(:lambda, {}, {{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}, quote # none, line 2:
x = 4 # line 3:
y = 5
return 5
end)))
- the
.head
field is :lambda
, as expected.
- the
.typ
field is Any
, also as expected.
- the
.args
field is:
3-element Any Array:
{}
{{:x,:y},{{:x,Int64,18},{:y,Int64,18}},{}}
quote # none, line 2:
x = 4 # line 3:
y = 5
return 5
end
Let’s talk more about args
:
.args[1]
is empty because we took no arguments.
.args[2][1]
contains the names of our two local variables, x
and y
.
.args[2][2]
contains a description of each of those local variables.
The Int64
values indicate that x
and y
have been inferred to be of that type, despite no type annotations in the code.
The value 18
indicates that the set bit flags are “is assigned once” (16) and “is assigned by inner function” (4).
.args[2][3]
is empty because we did not capture any variables.
.args[3]
is the Expr
representing the body of the function.
You may notice that it is nearly identical to the original version of the code.
Layer 4: LLVM IR
code_llvm(generic_function, (types_arg_list,))
Calling code_llvm
prints the LLVM IR for the function.
This is going to be more unfamiliar-looking that the previous layers,
since it looks like a complicated kind of assembly code, rather than being Julia-specific.
It also differs in that it prints out the code, not returning a maniputable value to you.
Usage Examples
julia> code_llvm(linear_foo,())
define i64 @julia_linear_foo() {
top:
ret i64 5, !dbg !4355
}
julia> code_llvm(+,(Int,))
define i64 @"julia_+823"(i64) {
top:
ret i64 %0, !dbg !4361
}
julia> code_llvm(+,(Int,Int))
define i64 @"julia_+824"(i64, i64) {
top:
%2 = add i64 %1, %0, !dbg !4367
ret i64 %2, !dbg !4367
}
Note that now trying to get multiple results is going to end in an error:
julia> code_llvm(+,(Any,))
ERROR: no method found for the specified argument types
in _dump_function at reflection.jl:110
in code_llvm at reflection.jl:115
Happily, accidently non-tuple types also result in an error:
julia> code_llvm(+,(Int))
ERROR: no method code_llvm(Function,DataType)
Layer 5: Assembly Code
code_native(generic_function, (types_arg_list,))
Calling code_native
prints the native assembly code for the specified method.
Usage Example
julia> code_native(+,(Int,Int))
.text
Filename: int.jl
Source line: 36
push RBP
mov RBP, RSP
Source line: 36
add RDI, RSI
mov RAX, RDI
pop RBP
ret
Calling it with a non-tuple or for signatures that don’t exist results in an error:
julia> code_native(+,(Int))
ERROR: no method code_native(Function,DataType)
julia> code_native(+,(Any,))
ERROR: no method found for the specified argument types
in _dump_function at reflection.jl:110
in code_native at reflection.jl:116
Footnotes: