
7 Julia Gotchas and How to Handle Them

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/7-julia-gotchas-handle/

Let me start by saying Julia is a great language. I love the language; it is what I find to be the most powerful and intuitive language that I have ever used. It’s undoubtedly my favorite language. That said, there are some “gotchas”, tricky little things you need to know about. Every language has them, and one of the first things you have to do in order to master a language is to find out what they are and how to avoid them. The point of this blog post is to help accelerate this process for you by exposing some of the most common “gotchas” and offering alternative programming practices.

Julia is a good language for understanding what’s going on because there’s no magic. The Julia developers like to have clearly defined rules for how things act. This means that all behavior can be explained. However, this might mean that you need to think about what’s going on to understand why something is happening. That’s why I’m not just going to lay out some common issues, but I am also going to explain why they occur. You will see that there are some very similar patterns, and once you catch onto the patterns, you will not fall for any of these anymore. Because of this, there’s a slightly higher learning curve for Julia over the simpler languages like MATLAB/R/Python. However, once you get the hang of this, you will fully be able to use the conciseness of Julia while obtaining the performance of C/Fortran. Let’s dig in.

Gotcha #1: The REPL (terminal) is the Global Scope

For anyone who is familiar with the Julia community, you know that I have to start here. This is by far the most common problem reported by new users of Julia. Someone will go “I heard Julia is fast!”, open up the REPL, quickly code up some algorithm they know well, and execute that script. After it’s executed they look at the time and go “wait a second, why is this as slow as Python?”

Because this issue is so important and pervasive, let’s take some extra time delving into why it happens so that we understand how to avoid it.

Small Interlude into Why Julia is Fast

To understand what just happened, you have to understand that Julia is not just about code compilation, but also about type-specialization (i.e. compiling code which is specific to the given types). Let me repeat: Julia is not fast because the code is compiled using a JIT compiler; rather, it is fast because type-specific code is compiled and run.

If you want the full story, check out some of the notes I’ve written for an upcoming workshop. I am going to summarize the necessary parts which are required to understand why this is such a big deal.

Type-specificity is given by Julia’s core design principle: multiple dispatch. When you write the code:

function f(a,b)
  return 2a+b
end

you may have written only one “function”, but you have written a very large number of “methods”. In Julia parlance, a function is an abstraction and what is actually called is a method. If you call f(2.0,3.0), then Julia will run compiled code which takes in two floating point numbers and returns the value 2a+b. If you call f(2,3), then Julia will run different compiled code which takes in two integers and returns the value 2a+b. The function f is an abstraction or a short-hand for the multitude of different methods which have the same form, and this design of using the symbol “f” to call all of these different methods is called multiple dispatch. And this goes all the way down: the + operator is actually a function which will call methods depending on the types it sees.
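
As a small aside, you can poke at this machinery yourself with Julia’s standard introspection tools (a quick sketch; none of the calls below are specific to this post):

function f(a,b)
  return 2a+b
end

methods(f)        # lists every method defined for the function f
@which f(2,3)     # shows which method this particular call dispatches to
@which f(2.0,3.0) # a different method, specialized on Float64
methods(+)        # even + is just a function with a (very) long list of methods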

Julia actually gets its speed because this compiled code knows its types, and so the compiled code that f(2.0,3.0) calls is exactly the compiled code that you would get by defining the same C/Fortran function which takes in floating point numbers. You can check this with the @code_native macro to see the compiled assembly:

@code_native f(2.0,3.0)
 
# This prints out the following:
 
pushq	%rbp
movq	%rsp, %rbp
Source line: 2
vaddsd	%xmm0, %xmm0, %xmm0
vaddsd	%xmm1, %xmm0, %xmm0
popq	%rbp
retq
nop

This is the same compiled assembly you would expect from the C/Fortran function, and it is different than the assembly code for integers:

@code_native f(2,3)
 
pushq	%rbp
movq	%rsp, %rbp
Source line: 2
leaq	(%rdx,%rcx,2), %rax
popq	%rbp
retq
nopw	(%rax,%rax)

The Main Point: The REPL/Global Scope Does Not Allow Type Specificity

This brings us to the main point: the REPL / global scope is slow because it does not allow type specificity. First of all, notice that the REPL is the global scope because Julia allows nested scoping for functions. For example, if we define

function outer()
  a = 5
  function inner()
    return 2a
  end
  b = inner()
  return 3a+b
end

you will see that this code works. This is because Julia allows you to grab the “a” from the outer function into the inner function. If you apply this idea recursively, then you understand that the highest scope is the scope which is directly the REPL (which is the global scope of the module Main). But now let’s think about how a function will compile in this situation. Let’s do the same case as before, but using the globals:

a = 2.0; b = 3.0
function linearcombo()
  return 2a+b
end
ans = linearcombo()
a = 2; b = 3
ans2= linearcombo()

Question: What types should the compiler assume “a” and “b” are? Notice that in this example we changed the types and still called the same function. In order for this compiled C function to not segfault, it needs to be able to deal with whatever types we throw at it: floats, ints, arrays, weird user-defined types, etc. In Julia parlance, this means that the variables have to be “boxed”, and the types are checked with every use. What do you think that compiled code looks like?

pushq	%rbp
movq	%rsp, %rbp
pushq	%r15
pushq	%r14
pushq	%r12
pushq	%rsi
pushq	%rdi
pushq	%rbx
subq	$96, %rsp
movl	$2147565792, %edi       # imm = 0x800140E0
movabsq	$jl_get_ptls_states, %rax
callq	*%rax
movq	%rax, %rsi
leaq	-72(%rbp), %r14
movq	$0, -88(%rbp)
vxorps	%xmm0, %xmm0, %xmm0
vmovups	%xmm0, -72(%rbp)
movq	$0, -56(%rbp)
movq	$10, -104(%rbp)
movq	(%rsi), %rax
movq	%rax, -96(%rbp)
leaq	-104(%rbp), %rax
movq	%rax, (%rsi)
Source line: 3
movq	pcre2_default_compile_context_8(%rdi), %rax
movq	%rax, -56(%rbp)
movl	$2154391480, %eax       # imm = 0x806967B8
vmovq	%rax, %xmm0
vpslldq	$8, %xmm0, %xmm0        # xmm0 = zero,zero,zero,zero,zero,zero,zero,zero,xmm0[0,1,2,3,4,5,6,7]
vmovdqu	%xmm0, -80(%rbp)
movq	%rdi, -64(%rbp)
movabsq	$jl_apply_generic, %r15
movl	$3, %edx
movq	%r14, %rcx
callq	*%r15
movq	%rax, %rbx
movq	%rbx, -88(%rbp)
movabsq	$586874896, %r12        # imm = 0x22FB0010
movq	(%r12), %rax
testq	%rax, %rax
jne	L198
leaq	98096(%rdi), %rcx
movabsq	$jl_get_binding_or_error, %rax
movl	$122868360, %edx        # imm = 0x752D288
callq	*%rax
movq	%rax, (%r12)
L198:
movq	8(%rax), %rax
testq	%rax, %rax
je	L263
movq	%rax, -80(%rbp)
addq	$5498232, %rdi          # imm = 0x53E578
movq	%rdi, -72(%rbp)
movq	%rbx, -64(%rbp)
movq	%rax, -56(%rbp)
movl	$3, %edx
movq	%r14, %rcx
callq	*%r15
movq	-96(%rbp), %rcx
movq	%rcx, (%rsi)
addq	$96, %rsp
popq	%rbx
popq	%rdi
popq	%rsi
popq	%r12
popq	%r14
popq	%r15
popq	%rbp
retq
L263:
movabsq	$jl_undefined_var_error, %rax
movl	$122868360, %ecx        # imm = 0x752D288
callq	*%rax
ud2
nopw	(%rax,%rax)

For dynamic languages without type-specialization, this bloated code with all of the extra instructions is as good as you can get, which is why Julia slows down to their speed.

To understand why this is a big deal, notice that every single piece of code that you write in Julia is compiled. So let’s say you write a loop in your script:

a = 1
for i = 1:100
  a += a + f(a)
end

The compiler has to compile that loop, but since it cannot guarantee the types do not change, it conservatively gives that nasty long code, leading to slow execution.

How to Avoid the Issue

There are a few ways to avoid this issue. The simplest way is to always wrap your scripts in functions. For example, with the previous code we can do:

function geta(a)
  # can also just define a=1 here
  for i = 1:100
    a += a + f(a)
  end
  return a
end
a = geta(1)

This will give you the same output, but since the compiler is able to specialize on the type of a, it will give the performant compiled code that you want. Another thing you can do is define your variables as constants.

const b = 5

By doing this, you are telling the compiler that the variable will not change, and thus it will be able to specialize all of the code which uses it on the type that it currently is. There’s a small quirk in that Julia actually allows you to change the value of a constant, but not its type, so you can use “const” as a way to tell the compiler that you won’t be changing the type and thereby speed up your code. However, note that some surprises come up since you guaranteed to the compiler that the value won’t change. For example:

const a = 5
f() = a
println(f()) # Prints 5
a = 6
println(f()) # Prints 5

this does not work as expected because the compiler, realizing that it knows the answer to “f()=a” (since a is a constant), simply replaced the function call with the answer, giving different behavior than if a was not a constant.

This is all just one big way of saying: Don’t write your scripts directly in the REPL, always wrap them in a function.

Let’s hit one related point as well.

Gotcha #2: Type-Instabilities

So I just made a huge point about how specializing code for the given types is crucial. Let me ask a quick question, what happens when your types can change?

If you guessed “well, you can’t really specialize the compiled code in that case either”, then you are correct. This kind of problem is known as a type-instability. These can show up in many different ways, but one common example is that you initialize a value in a way that is easy, but not necessarily the type that it should be. For example, let’s look at:

function g()
  x=1
  for i = 1:10
    x = x/2
  end
  return x
end

Notice that “x/2” is a floating point number in Julia. Therefore if we start with “x=1”, x will change type from an integer to a floating point number, and thus the function has to compile the inner loop as though x can be either type. If we instead had the function:

function h()
  x=1.0
  for i = 1:10
    x = x/2
  end
  return x
end

then the whole function can optimally compile knowing x will stay a floating point number (this ability for the compiler to judge types is known as type inference). We can check the compiled code to see the difference:

pushq	%rbp
movq	%rsp, %rbp
pushq	%r15
pushq	%r14
pushq	%r13
pushq	%r12
pushq	%rsi
pushq	%rdi
pushq	%rbx
subq	$136, %rsp
movl	$2147565728, %ebx       # imm = 0x800140A0
movabsq	$jl_get_ptls_states, %rax
callq	*%rax
movq	%rax, -152(%rbp)
vxorps	%xmm0, %xmm0, %xmm0
vmovups	%xmm0, -80(%rbp)
movq	$0, -64(%rbp)
vxorps	%ymm0, %ymm0, %ymm0
vmovups	%ymm0, -128(%rbp)
movq	$0, -96(%rbp)
movq	$18, -144(%rbp)
movq	(%rax), %rcx
movq	%rcx, -136(%rbp)
leaq	-144(%rbp), %rcx
movq	%rcx, (%rax)
movq	$0, -88(%rbp)
Source line: 4
movq	%rbx, -104(%rbp)
movl	$10, %edi
leaq	477872(%rbx), %r13
leaq	10039728(%rbx), %r15
leaq	8958904(%rbx), %r14
leaq	64(%rbx), %r12
leaq	10126032(%rbx), %rax
movq	%rax, -160(%rbp)
nopw	(%rax,%rax)
L176:
movq	%rbx, -128(%rbp)
movq	-8(%rbx), %rax
andq	$-16, %rax
movq	%r15, %rcx
cmpq	%r13, %rax
je	L272
movq	%rbx, -96(%rbp)
movq	-160(%rbp), %rcx
cmpq	$2147419568, %rax       # imm = 0x7FFF05B0
je	L272
movq	%rbx, -72(%rbp)
movq	%r14, -80(%rbp)
movq	%r12, -64(%rbp)
movl	$3, %edx
leaq	-80(%rbp), %rcx
movabsq	$jl_apply_generic, %rax
vzeroupper
callq	*%rax
movq	%rax, -88(%rbp)
jmp	L317
nopw	%cs:(%rax,%rax)
L272:
movq	%rcx, -120(%rbp)
movq	%rbx, -72(%rbp)
movq	%r14, -80(%rbp)
movq	%r12, -64(%rbp)
movl	$3, %r8d
leaq	-80(%rbp), %rdx
movabsq	$jl_invoke, %rax
vzeroupper
callq	*%rax
movq	%rax, -112(%rbp)
L317:
movq	(%rax), %rsi
movl	$1488, %edx             # imm = 0x5D0
movl	$16, %r8d
movq	-152(%rbp), %rcx
movabsq	$jl_gc_pool_alloc, %rax
callq	*%rax
movq	%rax, %rbx
movq	%r13, -8(%rbx)
movq	%rsi, (%rbx)
movq	%rbx, -104(%rbp)
Source line: 3
addq	$-1, %rdi
jne	L176
Source line: 6
movq	-136(%rbp), %rax
movq	-152(%rbp), %rcx
movq	%rax, (%rcx)
movq	%rbx, %rax
addq	$136, %rsp
popq	%rbx
popq	%rdi
popq	%rsi
popq	%r12
popq	%r13
popq	%r14
popq	%r15
popq	%rbp
retq
nop

Versus:

pushq	%rbp
movq	%rsp, %rbp
movabsq	$567811336, %rax        # imm = 0x21D81D08
Source line: 6
vmovsd	(%rax), %xmm0           # xmm0 = mem[0],zero
popq	%rbp
retq
nopw	%cs:(%rax,%rax)

Notice how many fewer computational steps are required to compute the same value!
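
If you want a rough empirical check of this difference (a sketch; exact timings are machine- and version-dependent), you can time both versions after they have compiled:

g(); h()   # call each once so both versions are compiled

# Wrap the timing in functions so we don't time in the global scope (Gotcha #1!)
function time_g()
  @elapsed for k = 1:10^6; g(); end
end
function time_h()
  @elapsed for k = 1:10^6; h(); end
end

time_g(); time_h()                   # compile the timing functions themselves
println(time_g(), " vs ", time_h())  # h() should come out far faster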

How to Find and Deal with Type-Instabilities

At this point you might ask, “well, why not just use C so you don’t have to try and find these instabilities?” The answer is:

  1. They are easy to find
  2. They can be useful
  3. You can handle necessary instabilities with function barriers

How to Find Type-Instabilities

Julia gives you the macro @code_warntype to show you where type instabilities are. For example, if we use this on the “g” function we created:

@code_warntype g()
 
Variables:
  #self#::#g
  x::ANY
  #temp#@_3::Int64
  i::Int64
  #temp#@_5::Core.MethodInstance
  #temp#@_6::Float64
 
Body:
  begin
      x::ANY = 1 # line 3:
      SSAValue(2) = (Base.select_value)((Base.sle_int)(1,10)::Bool,10,(Base.box)(Int64,(Base.sub_int)(1,1)))::Int64
      #temp#@_3::Int64 = 1
      5:
      unless (Base.box)(Base.Bool,(Base.not_int)((#temp#@_3::Int64 === (Base.box)(Int64,(Base.add_int)(SSAValue(2),1)))::Bool)) goto 30
      SSAValue(3) = #temp#@_3::Int64
      SSAValue(4) = (Base.box)(Int64,(Base.add_int)(#temp#@_3::Int64,1))
      i::Int64 = SSAValue(3)
      #temp#@_3::Int64 = SSAValue(4) # line 4:
      unless (Core.isa)(x::UNION{FLOAT64,INT64},Float64)::ANY goto 15
      #temp#@_5::Core.MethodInstance = MethodInstance for /(::Float64, ::Int64)
      goto 24
      15:
      unless (Core.isa)(x::UNION{FLOAT64,INT64},Int64)::ANY goto 19
      #temp#@_5::Core.MethodInstance = MethodInstance for /(::Int64, ::Int64)
      goto 24
      19:
      goto 21
      21:
      #temp#@_6::Float64 = (x::UNION{FLOAT64,INT64} / 2)::Float64
      goto 26
      24:
      #temp#@_6::Float64 = $(Expr(:invoke, :(#temp#@_5), :(Main./), :(x::Union{Float64,Int64}), 2))
      26:
      x::ANY = #temp#@_6::Float64
      28:
      goto 5
      30:  # line 6:
      return x::UNION{FLOAT64,INT64}
  end::UNION{FLOAT64,INT64}

Notice that it tells us at the top that the type of x is “ANY”. It will capitalize any type which is not inferred as a “strict type”, i.e. it is an abstract type which needs to be boxed/checked at each step. We see that at the end we return x as a “UNION{FLOAT64,INT64}”, which is another non-strict type. This tells us that the type of x changed, causing the difficulty. If we instead look at the @code_warntype for h, we get all strict types:

@code_warntype h()
 
Variables:
  #self#::#h
  x::Float64
  #temp#::Int64
  i::Int64
 
Body:
  begin
      x::Float64 = 1.0 # line 3:
      SSAValue(2) = (Base.select_value)((Base.sle_int)(1,10)::Bool,10,(Base.box)(Int64,(Base.sub_int)(1,1)))::Int64
      #temp#::Int64 = 1
      5:
      unless (Base.box)(Base.Bool,(Base.not_int)((#temp#::Int64 === (Base.box)(Int64,(Base.add_int)(SSAValue(2),1)))::Bool)) goto 15
      SSAValue(3) = #temp#::Int64
      SSAValue(4) = (Base.box)(Int64,(Base.add_int)(#temp#::Int64,1))
      i::Int64 = SSAValue(3)
      #temp#::Int64 = SSAValue(4) # line 4:
      x::Float64 = (Base.box)(Base.Float64,(Base.div_float)(x::Float64,(Base.box)(Float64,(Base.sitofp)(Float64,2))))
      13:
      goto 5
      15:  # line 6:
      return x::Float64
  end::Float64

Indicating that this function is type stable and will compile to essentially optimal C code. Thus type-instabilities are not hard to find. What’s harder is to find the right design.

Why Allow Type-Instabilities?

This is an age-old question which has led to dynamically-typed languages dominating the scripting-language playing field. The idea is that, in many cases, you want to make a tradeoff between performance and robustness. For example, you may want to read a table from a webpage which has numbers all mixed together with integers and floating point numbers. In Julia, you can write your function such that if they were all integers, it will compile well, and if they were all floating point numbers, it will also compile well. And if they’re mixed? It will still work. That’s the flexibility/convenience we know and love from a language like Python/R. But Julia will explicitly tell you (via @code_warntype) when you are making this performance tradeoff.

How to Handle Type-Instabilities

There are a few ways to handle type-instabilities. First of all, if you like something like C/Fortran where your types are declared and can’t change (thus ensuring type-stability), you can do that in Julia. You can declare your types in a function with the following syntax:

local a::Int64 = 5

This makes “a” a 64-bit integer, and if future code tries to change it, an error will be thrown (or a proper conversion will be done; but since the conversion will not automatically round, it will most likely throw an error). Sprinkle these around your code and you will get type stability the C/Fortran way.

A less heavy handed way to handle this is with type-assertions. This is where you put the same syntax on the other side of the equals sign. For example:

a = (b/c)::Float64

This says “calculate b/c, and make sure that the output is a Float64. If it’s not, try to do an auto-conversion. If it can’t easily convert, throw an error”. Putting these around will help you make sure you know the types which are involved.
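
For instance, here is a small sketch combining both styles inside one function (the function name and values are just for illustration):

function scaled_ratio(b, c)
  local a::Float64     # declaration: a must stay a Float64 in this function
  a = (b/c)::Float64   # assertion: the result of b/c must be a Float64
  return 2a
end

scaled_ratio(1, 4)     # fine: 1/4 is a Float64
scaled_ratio(1, 2im)   # throws, since 1/2im is a Complex number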

However, there are cases where type instabilities are necessary. For example, let’s say you want to have a robust code, but the user gives you something crazy like:

arr = Vector{Union{Int64,Float64}}(4)
arr[1]=4
arr[2]=2.0
arr[3]=3.2
arr[4]=1

which is a length-4 array holding both integers and floating point numbers. The actual element type for the array is “Union{Int64,Float64}”, which we saw before is a non-strict type which can lead to issues. The compiler only knows that each value can be either an integer or a floating point number, but not which element is which type. This means that naively performing arithmetic on this array, like:

function foo{T,N}(array::Array{T,N})
  for i in eachindex(array)
    val = array[i]
    # do algorithm X on val
  end
end

will be slow since the operations will be boxed.

However, we can use multiple-dispatch to run the codes in a type-specialized manner. This is known as using function barriers. For example:

function inner_foo{T<:Number}(val::T)
  # Do algorithm X on val
end
 
function foo2{T,N}(array::Array{T,N})
  for i in eachindex(array)
    inner_foo(array[i])
  end
end

Notice that because of multiple-dispatch, calling inner_foo either calls a method specifically compiled for floating point numbers, or a method specifically compiled for integers. In this manner, you can put a long calculation inside of inner_foo and still have it perform well due to the strict typing that the function barrier gives you.
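
For instance, a hedged usage sketch (arr_mixed below is just the illustrative mixed array from above, and inner_foo’s body is still the placeholder):

# The same outer loop handles a homogeneous array and the mixed-type array;
# inner_foo is compiled separately for Float64 and for Int64 elements.
arr_mixed = Vector{Union{Int64,Float64}}(4)
arr_mixed[1] = 4; arr_mixed[2] = 2.0; arr_mixed[3] = 3.2; arr_mixed[4] = 1
foo2(rand(4))     # every call to inner_foo hits the Float64 method
foo2(arr_mixed)   # each element dispatches to the Int64 or Float64 method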

Thus I hope you see that Julia offers a good mixture between the performance of strict typing and the convenience of dynamic typing. A good Julia programmer gets to have both at their disposal in order to maximize performance and/or productivity when necessary.

Gotcha #3: Eval Runs at the Global Scope

One last typing issue: eval. Remember this: eval runs at the global scope.

One of the greatest strengths of Julia is its metaprogramming capabilities. This allows you to effortlessly write code which generates code, effectively reducing the amount of code you have to write and maintain. A macro is a function which runs at compile time and (usually) spits out code. For example:

macro defa()
  :(a=5)
end

will replace any instance of “@defa” with the code “a=5” (“:(a=5)” is the quoted expression for “a=5”. Julia code is all expressions, and thus metaprogramming is about building Julia expressions). You can use this to build any complex Julia program you wish, and put it in a function as a type of really clever shorthand.
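
You can inspect what a macro expands to without running it, which is handy when debugging metaprogramming (a small sketch; the exact call signature and the printed form of the expansion, including hygiene renaming, differ across Julia versions):

macro defa()
  :(a=5)
end

# Show the code the macro generates without evaluating it. In newer Julia
# versions this is macroexpand(Main, :(@defa)) or the @macroexpand macro.
macroexpand(:(@defa))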

However, sometimes you may need to directly evaluate the generated code. Julia gives you the “eval” function or the “@eval” macro for doing so. In general, you should try to avoid eval, but there are some codes where it’s necessary, like my new library for transferring data between different processes for parallel programming. However, note that if you do use it:

@eval a = 5

then this will evaluate at the global scope (the REPL). Thus all of the associated problems will occur. However, the fix is the same as the fixes for globals / type instabilities. For example:

function testeval()
  @eval a = 5
  return 2a+5
end

will not give a good compiled code since “a” was essentially declared at the REPL. But we can use the tools from before to fix this. For example, we can bring the global in and assert a type to it:

function testeval()
  @eval a = 5
  b = a::Int64
  return 2b+5
end

Here “b” is a local variable, and the compiler can infer that its type won’t change, and thus we have type-stability and are living in good performance land. So dealing with eval isn’t difficult, you just have to remember that it operates at the global scope.

That’s the last of the gotchas related to type-instability. You can see that there’s a very common thread for why they occur and how to handle them.

Gotcha #4: How Expressions Break Up

This is one that got me for a while at first. In Julia, there are many cases where expressions will continue if they are not finished. For this reason line-continuation operators are not necessary: Julia will just read until the expression is finished.

Easy rule, right? Just make sure you remember how expressions finish. For example:

a = 2 + 3 + 4 + 5 + 6 + 7
   +8 + 9 + 10+ 11+ 12+ 13

looks like it will evaluate to 90, but instead it gives 27. Why? Because “a = 2 + 3 + 4 + 5 + 6 + 7” is a complete expression, it will make “a=27” and then skip over the nonsense “+8 + 9 + 10+ 11+ 12+ 13”. To continue the line, we instead needed to make sure the expression wasn’t complete:

a = 2 + 3 + 4 + 5 + 6 + 7 +
    8 + 9 + 10+ 11+ 12+ 13

This will make a=90 as we wanted. This might trip you up the first time, but then you’ll get used to it.

The more difficult issue deals with array definitions. For example:

x = rand(2,2)
a = [cos(2*pi.*x[:,1]).*cos(2*pi.*x[:,2])./(4*pi) -sin(2.*x[:,1]).*sin(2.*x[:,2])./(4)]
b = [cos(2*pi.*x[:,1]).*cos(2*pi.*x[:,2])./(4*pi) - sin(2.*x[:,1]).*sin(2.*x[:,2])./(4)]

At a glance you might think a and b are the same, but they are not! The first will give you a (2,2) matrix, while the second is a (1-dimensional) vector of size 2. To see what the issue is, here’s a simpler version:

a = [1 -2]
b = [1 - 2]

In the first case there are two numbers: “1” and “-2”. In the second there is an expression: “1-2” (which is evaluated to give the array [-1]). This is because of the special syntax for array definitions. It’s usually really lovely to write:

a = [1 2 3 -4
     2 -3 1 4]

and get the 2×4 matrix that you’d expect. This is the tradeoff that comes with the whitespace-based array syntax. However, the issue is also easy to avoid: instead of concatenating using a space (i.e. in a whitespace-sensitive manner), use the “hcat” function:

a = hcat(cos(2*pi.*x[:,1]).*cos(2*pi.*x[:,2])./(4*pi),-sin(2.*x[:,1]).*sin(2.*x[:,2])./(4))

Problem solved!

Gotcha #5: Views, Copy, and Deepcopy

One way in which Julia gets good performance is by working with “views”. An “Array” is actually a “view” to a contiguous block of memory which is used to store the values. The “value” of the array is its pointer to the memory location (and its type information). This gives interesting (and useful) behavior. For example, if we run the following code:

a = [3;4;5]
b = a
b[1] = 1

then at the end we will have that “a” is the array “[1;4;5]”, i.e. changing “b” changes “a”. The reason is that “b=a” sets the value of “b” to the value of “a”. Since the value of an array is its pointer to the memory location, what “b” actually gets is not a new array, rather it gets the pointer to the same memory location (which is why changing “b” changes “a”).

This is very useful because it also allows you to keep the same array in many different forms. For example, we can have both a matrix and the vector form of the matrix using:

a = rand(2,2) # Makes a random 2x2 matrix
b = vec(a) # Makes a view to the 2x2 matrix which is a 1-dimensional array

Now “b” is a vector, but changing “b” still changes “a”, where “b” is indexed by reading down the columns. Notice that this whole time, no arrays have been copied, and therefore these operations have been extremely cheap (meaning, there’s no reason to avoid them in performance-sensitive code).

Now some details. Notice that the syntax for slicing an array will create a copy when on the right-hand side. For example:

c = a[1:2,1]

will create a new array, and point “c” to that new array (thus changing “c” won’t change “a”). This can be necessary behavior; however, note that copying arrays is an expensive operation that should be avoided whenever possible. Thus we would instead create more complicated views using:

d = @view a[1:2,1]
e = view(a,1:2,1)

Both “d” and “e” are the same thing, and changing either “d” or “e” will change “a” because both will not copy the array, just make a new variable which is a Vector that only points to the first column of “a”. (Another function which creates views is “reshape” which lets you reshape an array.)
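
For example, here is a small sketch of the reshape behavior (no data is copied, so writing through the reshaped array writes into the original):

a = rand(2,2)
b = reshape(a,4)  # a 4-element vector sharing the same memory as a
b[1] = 0.0
a[1,1] == 0.0     # true: changing b changed a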

If this syntax is on the left-hand side, then it’s a view. For example:

a[1:2,1] = [1;2]

will change “a” because, on the left-hand side, “a[1:2,1]” is the same as “view(a,1:2,1)” which points to the same memory as “a”.

What if we need to make copies? Then we can use the copy function:

b = copy(a)

Now since “b” is a copy of “a” and not a view, changing “b” will not change “a”. If we had already defined “b”, there’s a handy in-place copy “copy!(b,a)” which will essentially loop through and write the values of “a” to the locations of “b” (but this requires that “b” is already defined and is the right size).
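
A quick sketch of the difference (using the v0.5-era copy! shown throughout this post; it was renamed copyto! in later versions):

a = [3;4;5]
b = copy(a)    # b gets its own memory
b[1] = 1
a              # still [3;4;5]: changing b did not change a
copy!(b,a)     # in-place: writes a's values into the existing array b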

But now let’s make a slightly more complicated array. For example, let’s make a “Vector{Vector}”:

a = Vector{Vector{Float64}}(2)
a[1] = [1;2;3]
a[2] = [4;5;6]

Each element of “a” is a vector. What happens when we copy a?

b = copy(a)
b[1][1] = 10

Notice that this will change a[1][1] to 10 as well! Why did this happen? What happened is we used “copy” to copy the values of “a”. But the values of “a” were arrays, so we copied the pointers to memory locations over to “b”, so “b” actually points to the same arrays. To fix this, we instead use “deepcopy”:

b = deepcopy(a)

This recursively calls copy in such a manner that we avoid this issue. Again, the rules of Julia are very simple and there’s no magic, but sometimes you need to pay closer attention.
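
Putting it together (same v0.5-era constructor syntax as above), the deepcopy version behaves the way you would expect:

a = Vector{Vector{Float64}}(2)
a[1] = [1;2;3]
a[2] = [4;5;6]
b = deepcopy(a)
b[1][1] = 10
a[1][1]   # still 1.0: the inner arrays were copied as well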

Gotcha #6: Temporary Allocations, Vectorization, and In-Place Functions

In MATLAB/Python/R, you’re told to use vectorization. In Julia you might have heard that “devectorized code is better”. I wrote about this part before so I will refer back to my previous post which explains why vectorized codes give “temporary allocations” (i.e. they make middle-man arrays which aren’t needed, and as noted before, array allocations are expensive and slow down your code!).

For this reason, you will want to fuse your vectorized operations and write them in-place in order to avoid allocations. What do I mean by in-place? An in-place function is one that updates a value instead of returning a value. If you’re going to continually operate on an array, this will allow you to keep using the same array, instead of creating new arrays each iteration. For example, if you wrote:

function f()
  x = [1;5;6]
  for i = 1:10
    x = x + inner(x)
  end
  return x
end
function inner(x)
  return 2x
end

then each time inner is called, it will create a new array to return “2x” in. Clearly we don’t need to keep making new arrays. So instead we could have a cache array “y” which will hold the output like so:

function f()
  x = [1;5;6]
  y = Vector{Int64}(3)
  for i = 1:10
    inner!(y,x)
    for j in 1:3
      x[j] = x[j] + y[j]
    end
    copy!(y,x)
  end
  return x
end
function inner!(y,x)
  for i=1:3
    y[i] = 2*x[i]
  end
  nothing
end

Let’s dig into what’s happening here. “inner!(y,x)” doesn’t return anything, but it changes “y”. Since “y” is an array, the value of “y” is the pointer to the actual array, and since in the function those values were changed, “inner!(y,x)” will have “silently” changed the values of “y”. Functions which do this are called in-place. They are usually denoted with a “!”, and usually change the first argument (this is just by convention). So there is no array allocation when “inner!(y,x)” is called.

In the same way, “copy!(y,x)” is an in-place function which writes the values of “x” to “y”, updating it. As you can see, this means that every operation only changes the values of existing arrays. Only two arrays are ever created: the initial array for “x” and the initial array for “y”. The first function created new arrays every single time “x + inner(x)” was called (one for “inner(x)” and one for the sum), so over 20 arrays were created in the first function. Since array allocations are expensive, the second function will run faster than the first function.
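
Here is a rough sketch of how to see this with the @allocated macro (the function names are just for illustration, and the exact byte counts vary by machine and Julia version):

x = rand(1000)
y = similar(x)

alloc_update(x) = x + 2x          # builds a temporary (2x) and a result array
function inplace_update!(y,x)     # writes into y; allocates no arrays
  for i in eachindex(x)
    y[i] = x[i] + 2*x[i]
  end
  nothing
end

alloc_update(x); inplace_update!(y,x)  # compile both first
@allocated alloc_update(x)             # many kilobytes of temporary arrays
@allocated inplace_update!(y,x)        # 0 (or very close to 0) bytes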

It’s nice that we can get fast, but the syntax got a little bloated when we had to write out the loops. That’s where loop fusion comes in. In Julia v0.5, you can now use the “.” symbol to vectorize any function (also known as broadcasting because it is actually calling the “broadcast” function). While it’s cool that “f.(x)” is the same thing as applying “f” to each value of “x”, what’s cooler is that the loops fuse. If you just applied “f” to “x” and made a new array, then “x = x + f.(x)” would still allocate a temporary. However, what we can instead do is designate everything as element-wise operations:

x .= x .+ f.(x)

The “.=” will do element-wise assignment, so this will essentially turn into the code

for i = 1:length(x)
  x[i] = x[i] + f(x[i])
end

which is the allocation-free loop we wanted! Thus another way to write our function
would’ve been:

function f()
  x = [1;5;6]
  for i = 1:10
    x .= x .+ inner.(x)
  end
  return x
end
function inner(x)
  return 2x
end

Therefore we still get the concise vectorized syntax of MATLAB/R/Python, but this version doesn’t create temporary arrays and thus will be faster. This is how you can use “scripting language syntax” but still get C/Fortran-like speeds. If you don’t watch for temporaries, they will bite away at your performance (the same is true in the other languages; it’s just that there, vectorized code is still faster than devectorized code, whereas in Julia we have the luxury of something even faster being available).

**** Note: Some operators do not fuse in v0.5. For example, “.*” won’t fuse yet. This is still a work in progress but should be all together by v0.6 ****

Gotcha #7: Not Building the System Image for your Hardware

This is actually something I fell prey to for a very long time. I was following all of these rules thinking I was a Julia champ, and then one day I realized that not every compiler optimization was actually happening. What was going on?

It turns out that the pre-built binaries that you get via the downloads off the Julia site are toned-down in their capabilities in order to be usable on a wider variety of machines. This includes the binaries you get from Linux when you do “apt-get install” or “yum install”. Thus, unless you built Julia from source, your Julia is likely not as fast as it could be.

Luckily there’s an easy fix provided by Mustafa Mohamad (@musm). Just run the following code in Julia:

include(joinpath(dirname(JULIA_HOME),"share","julia","build_sysimg.jl")); build_sysimg(force=true)

If you’re on Windows, you may need to run this code first:

Pkg.add("WinRPM"); WinRPM.install("gcc")

And on any system, you may need to have administrator privileges. This will take a little bit but when it’s done, your install will be tuned to your system, giving you all of the optimizations available.

Conclusion: Learn the Rules, Understand Them, Then Profit

To reiterate one last time: Julia doesn’t have compiler magic, just simple rules. Learn the rules well and all of this will be second nature. I hope this helps you as you learn Julia. The first time you encounter a gotcha like this, it can be a little hard to reason it out. But once you understand it, it won’t hurt again. Once you really understand these rules, your code will compile down to essentially C/Fortran, while being written in a concise high-level scripting language. Put this together with broadcast fusing and metaprogramming, and you get an insane amount of performance for the amount of code you’re writing!

Here’s a question for you: what Julia gotchas did I miss? Leave a comment explaining a gotcha and how to handle it. Also, just for fun, what are your favorite gotchas from other languages? [Mine has to be the fact that, in Javascript inside of a function, “var x=3” makes “x” local, while “x=3” makes “x” global. Automatic globals inside of functions? That gave some insane bugs that makes me not want to use Javascript ever again!]


Introducing DifferentialEquations.jl

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/introducing-differentialequations-jl/

Differential equations are ubiquitous throughout mathematics and the sciences. In fact, I myself have studied various forms of differential equations stemming from fields including biology, chemistry, economics, and climatology. What was interesting is that, although many different people are using differential equations for many different things, pretty much everyone wants the same thing: to quickly solve differential equations in their various forms, and make some pretty plots to describe what happened.

The goal of DifferentialEquations.jl is to do exactly that: to make it easy to solve differential equations with the latest and greatest algorithms, and put out a pretty plot. The core idea behind DifferentialEquations.jl is that, while it is easy to describe a differential equation, they have such diverse behavior that experts have spent over a century compiling different ways to think about and handle differential equations. Most users will want to brush past all of the talk about algorithms and simply ask: “I have this kind of differential equation. What does the solution look like?”

DifferentialEquations.jl’s User Interface

To answer that question, the user should just have to say what their problem is, tell the computer to solve it, and then tell the computer to plot it. In DifferentialEquations.jl, we use exactly those terms. Let’s look at an Ordinary Differential Equation (ODE): the linear ODE. It is described by the equation

 y^\prime = \alpha y

To use DifferentialEquations.jl, you first have to tell the computer what kind of problem you have, and what your data is for the problem. Recall the general ordinary differential equation is of the form

 y^\prime = f(y)

and initial condition u_0, so in this case we have an ODE with data f(y)=\alpha y and u_0. DifferentialEquations.jl is designed as software for a high-level language, Julia. There are many reasons for this choice, but one large reason is its type system and multiple dispatch. For our example, we tell the machine what type of problem we have by building a DEProblem type. The code looks like this:

using DifferentialEquations
alpha = 0.5 #Setting alpha to 1/2
f(y,t) = alpha*y
u0 = 1.5
prob = ODEProblem(f,u0)

where prob contains everything about our problem. You can then tell the computer to solve it and give you a plot by, well, solve and plot:

timespan = [0,1] # Solve from time = 0 to time = 1
sol = solve(prob,timespan) # Solves the ODE
plot(sol) # Plots the solution using Plots.jl

And that’s the key idea: the user should simply have to tell the program what the problem is, and the program should handle the details. That doesn’t mean that the user won’t have access to all of the details. For example, we can control the solver and plotters in more detail, using something like

sol = solve(prob,alg=:RK4) # Unrolled and optimized RK4
plot(sol,lw=3) # All of the Plots.jl attributes are available

However, in many cases a user may approach a problem without necessarily knowing anything about the algorithms involved in approximating it, and so obfuscating the API with these names is simply confusing. One place where this occurs is solving stochastic differential equations (SDEs). These have been recently growing in popularity in many of the sciences (especially systems biology) due to their added realism and their necessity when modeling rare and random phenomena. In DifferentialEquations.jl, you can get started by simply knowing that an SDE problem is defined by the functions f and g in the form

 dX_t = f(X_t,t)dt + g(X_t,t)dW_t,

with initial condition u_0, and so the steps for defining and solving the linear SDE is

g(u,t) = 0.3u
prob = SDEProblem(f,g,u0)
sol = solve(prob,timespan)
plot(sol)

If you wish to dig into the manual, you will see that the default solver that is used is a Rossler-SRI type of method and will (soon) have adaptivity which is complex enough to be a numerical analysis and scientific computing research project. And you can dig into the manual to find out how to switch to different solvers, but the key idea is that you don’t have to. Everything is simply about defining a problem, and asking for solutions and plots.

And that’s it. For more about the API, take a look at the documentation or the tutorial IJulia notebooks. What I want to discuss is why I believe this is the right choice, where we are, and where we can go with it.

What exactly does that give us?

Julia was created to solve the many-language problem in scientific computing. Before, people would have to write the inner loops in C/Fortran and bind them to a scripting language that was never designed with performance in mind. Julia has done extremely well at solving this problem via multiple dispatch. Multiple dispatch is not just about ease of use; it is also the core of what allows Julia to be fast. From a quote I am stealing from IRC: “Julia: come for the fast loops, stay for the type-system”.

In my view, the many-language problem always had an uglier cousin: the many-API problem. Every package has its own way of interacting with the user, and it becomes a burden to remember how all of them work. However, in Julia there seems to be a beautiful emergence of packages which solve the many-API problem via Julia’s multiple-dispatch and metaprogramming functionalities. Take for example Plots.jl. There are many different plotting packages in Julia. However, through Plots.jl, you can plot onto any “backend” (other plotting package) using just one API. You can mix and match plotting in PyPlot (matplotlib), GR, Plotly, and unicode. It’s all the same commands. Another example of this is JuMP. Its core idea is solver independence: you take your optimization problem, define the model in JuMP’s lingo, and then plug into many different solvers all by flipping a switch.

DifferentialEquations.jl is extending this idea to the realm of differential equations. By using the keyword `alg=:ode45`, the solver can call functions from ODE.jl. And changing it to `alg=:dopri5`, DifferentialEquations.jl will solve your ODE problem using the coveted dopri5 Fortran software. The complexity of learning and understanding many different APIs is no longer a requirement for trying different algorithms.
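
A hedged sketch of what that switching looks like with the early API described in this post (the keyword and symbol names follow the text above and may differ in later releases of the package):

sol = solve(prob,timespan,alg=:ode45)   # dispatch to ODE.jl's ode45
sol = solve(prob,timespan,alg=:dopri5)  # dispatch to the dopri5 Fortran code
plot(sol)                               # same plotting call either way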

But why “Differential Equations”? Isn’t that broad?

Sure, there are packages for solving various types of differential equations, each specializing in one little part. But when I was beginning my PhD, I quickly found that these packages were missing something. The different types of differential equations that we encounter are often not that different, and many times they embody the same problem: a PDE when discretized is a system of ODEs, the probability distribution of evolving SDEs is a PDE (a form of the Heat Equation), and all of the tools that they use to get good performance are the same. Indeed, many contemporary research questions can be boiled down to addressing the question: what happens if we change the type of differential equation? What happens if we add noise to our ODEs which describe population dispersal? What happens if we add to our model that RNA production is reading a delayed signal? Could we make this high-dimensional PDE computationally feasible via a Monte Carlo experiment combined with Feynman-Kac’s theorem?

Yet, our differential equations libraries are separate. Our PDEs are kept separate from our SDEs, while our delay equations hang out in their own world. Mixing and matching solvers requires learning complex APIs, usually large C/Fortran libraries with opaque function names. That is what DifferentialEquations.jl is looking to solve. I am building DifferentialEquations.jl as a hub for differential equations, the general sense of the term.

If you have defined an SDE problem, then via the Forward Kolmogorov equation there is a PDE associated to the SDE. In many cases like the Black-Scholes model, both the SDE and the PDE are canonical ways of looking at the same problem. The solver should translate between them, and the solver should handle both types of differential equations. With one API and the functionality for these contained within the same package, no longer are they separate entities to handle computationally.

Where are we currently?

DifferentialEquations.jl is still very young. Indeed, the project only started a few months ago, and during that time period I was taking 6 courses. However, the package already has a strong core.

In fact, there are already a lot of features which are unique to DifferentialEquations.jl.

You may have been thinking, “but I am a numerical analyst. How could this program help me?”. DifferentialEquations.jl has a bunch of functionalities for quickly designing and testing algorithms. All of the DEProblems allow for one to give them the analytical solution, and the solvers will then automatically calculate the errors. Thus by using some simple macros, one can define new algorithms in just a few lines of code, test the convergence, benchmark the timings, and have the algorithm available as an `alg` option in no time (note: all of the ODE solvers were written in one morning!). Thus it is easy to define the loop, and the rest of the functionality will come by default. It’s both a great way to test algorithms and a great way to share algorithms. Contributing will help both you and DifferentialEquations.jl!

Where are we going?

I have big plans for DifferentialEquations.jl. For example:

  • I will be rolling out an efficiency testing suite so that one could just specify the algorithms with which to solve a problem, and along what convergence axis (i.e. choose a few \Delta t’s, or changing tolerances), and it will output comparisons of the computational efficiencies and make some plots. It will be similar in functionality to the ConvergenceSimulation suite.
  • Finite difference methods for Heat and Poisson equation. These are long overdue for the research I do.
  • Changing the tableaus in ODEs and SDEs to StaticArrays so they are stack allocated. This has already been tested and works well on v0.5.
  • Higher-order methods for parabolic SPDEs (a research project with promising results!).
  • Blazing fast adaptivity for SDEs. (Once the paper I have submitted for it is published, it will be available. It’s already implemented!)
  • High-stability high order methods for SDEs (another research project).
  • Parallel methods. I have already implemented parallel (Xeon Phi) solvers and described them in previous blog posts. They simply need to be integrated into DifferentialEquations.jl. I would like to have native GPU solvers as well.
  • Delay and algebraic differential equations.
  • Wrapping more popular solvers. I’d like to add Sundials, LSODE, and PETSc to the list.
  • A web interface via Escher.jl to define DEProblems and get the solution plots. I am looking to have this hosted as an XSEDE Gateway.

If you’d like to influence where this project is going, please file an issue on the Github repository. I am always open for suggestions.

I hope this gives you a good idea on what my plans are for DifferentialEquations.jl. Check out the documentation and give the package a whirl!


Using Julia’s Type System For Hidden Performance Gains

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/using-julias-type-system-hidden-performance-chunkedarrays-growablearrays-ellipsisnotation/

What I want to share today is how you can use Julia’s type system to hide performance gains in your code. What I mean is this: in many cases you may find out that the optimal way to do some calculation is not a “clean” solution. What do you do? What I want to do is show how you can define special arrays which are wrappers such that these special “speedups” are performed in the background, while not having to keep all of that muck in your main algorithms. This is easiest to show by example.

The examples I will be building towards are useful for solving ODEs and SDEs. Indeed, these tricks have all been implemented as part of DifferentialEquations.jl and so these examples come from a real use case! They really highlight a main feature of Julia: since Julia code is fast (as long as you have type stability!), you don’t need to worry about writing code outside of Julia, and you can take advantage of higher-level features (with caution). In Python, normally one would only get speed benefits by going down to C, and so utilizing these complex objects would not get speed benefits over simply using numpy arrays in the most vectorized fashion. The same holds for R. In MATLAB… it would be really tough to implement most of this in MATLAB!

Taking Random Numbers in Chunks: ChunkedArrays

Let’s say we need to take random numbers in a loop like the following:

for i = 1:N
  dW = randn(size(u))
  #Do some things and add dW to u
end

While this is the “intuitive” code to write, it’s not necessarily the best. While there have been some improvements made since early Julia, in principle it’s just slower to make 1000 random numbers by calling randn() 1000 times than to call randn(1000) once. This is because of internal speedups due to caching, SIMD, etc., and you can find mentions of this fact all over the web, especially when people are talking about fast random number generators like those from the VSL library.
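
A rough way to check this claim on your own machine (a sketch; the function names are just for illustration and the timings vary):

function one_at_a_time(n)
  s = 0.0
  for i = 1:n
    s += randn()    # one call per random number
  end
  return s
end
in_one_chunk(n) = sum(randn(n))   # one call generates the whole chunk

one_at_a_time(10); in_one_chunk(10)   # compile first
@elapsed one_at_a_time(10^6)
@elapsed in_one_chunk(10^6)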

So okay, what we really want to do is the following. Every “bufferSize” steps, create a new random number dW which is of size size(u)*bufferSize, and go through using the buffer until it is all used up, and then grab another buffer.

for i = 1:N
  if i%bufferSize == 0
    dW = randn(size(u)...,bufferSize)
  end
  #Do some things and add dW[..,i] to u
end

But wait: what if we don’t always use one random number? Sometimes the algorithm may need to use more than one! So you can make an integer k which tracks the current position in the buffer, and then at each point where it gets incremented, you add the conditional to grab a new buffer, etc. Also, what if you want to have the buffer generated in parallel? As you can see, the code complexity explodes, just to go a little faster.

This is where ChunkedArrays come in. What I did is define an array type which essentially does the chunking/buffering in the background, so that the code in the algorithm can stay clean. A ChunkedArray is a wrapper over an array which uses the next command to hide all of this complexity. Thus, to generate random numbers in chunks to get this speed improvement, you can use code like this:

rands = ChunkedArray(u)
for i = 1:N
  dW = next(rands)
  #Do some things and add dW to u
end

Any time another random number is needed, you just call next. It internally stores an array and the state of the buffer, and the next function automatically checks/replenishes the buffer, and can launch another process to do this in parallel if the user wants. Thus we get the optimal solution without sacrificing cleanliness. I chopped off about 10% of the runtime of the Euler-Maruyama code in DifferentialEquations.jl by switching to ChunkedArrays, and haven’t thought about doing a benchmark since.

Safe Vectors of Arrays and Conversion: GrowableArrays

First let’s look at the performance difference between Vectors of Arrays and higher-dimensional contiguous arrays when using them in a loop. Julia’s arrays can take in a parametric type which makes the array hold arrays; this makes the array essentially an array of pointers. The issue here is that this adds an extra cost every time the array is dereferenced. On the other hand, for higher-dimensional arrays, the “:” way of referencing has to generate a slice each time. Which way is more performant?

function test1()
  u = Array{Int}(4,4,3)
  u[:,:,1] = [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  u[:,:,2] = [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  u[:,:,3] = [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  j = 1
  for i = 1:100000
    j += sum(u[:,:,1] + u[:,:,2] + 3u[:,:,3] + u[:,:,i%3+1] -  u[:,:,(i%j)%2+1])
  end
end
 
 
 
function test2()
  u = Vector{Matrix{Int}}(3)
  u[1] = [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  u[2] = [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  u[3] = [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  j = 1
  for i = 1:100000
    j += sum(u[1] + u[2] + 3u[3] + u[i%3+1] - u[(i%j)%2+1])
  end
end
 
 
function test3()
  u = Array{Int}(4,4,4)
  u[1,:,:] = reshape([1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1],(1,4,4))
 
  u[2,:,:] = reshape([1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1],(1,4,4))
 
  u[3,:,:] = reshape([1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1],(1,4,4))
 
  j = 1
  for i = 1:100000
    j += sum(u[1,:,:] + u[2,:,:] + 3u[3,:,:] + u[i%3+1,:,:] - u[(i%j)%2+1,:,:])
  end
end
 
#Pre-compile
test1()
test2()
test3()
 
t1 = @elapsed for i=1:10 test1() end
t2 = @elapsed for i=1:10 test2() end
t3 = @elapsed for i=1:10 test3() end
 
println("Test results: t1=$t1, t2=$t2, t3=$t3")
#Test results: t1=1.239379946, t2=0.576053075, t3=1.533462129

So using Vectors of Arrays is fast for dereferencing.

Now think about adding to an array. If you have a Vector of pointers and need to resize the array, it’s much easier to resize and copy over some pointers than it is to copy over all of the arrays. So, if you’re going to grow an array in a loop, the Vector of Arrays is the fastest implementation! Here’s a quick benchmark from GrowableArrays.jl:

using GrowableArrays, EllipsisNotation
using Base.Test
 
tic()
const NUM_RUNS = 100
const PROBLEM_SIZE = 1000
function test1()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = u
  for i = 1:PROBLEM_SIZE
    uFull = hcat(uFull,u)
  end
  uFull
end
 
function test2()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = u
 
  for i = 1:PROBLEM_SIZE
    uFull = vcat(uFull,u)
  end
  uFull
end
 
function test3()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = Vector{Int}(0)
  sizehint!(uFull,PROBLEM_SIZE*16)
  append!(uFull,vec(u))
 
  for i = 1:PROBLEM_SIZE
    append!(uFull,vec(u))
  end
  reshape(uFull,4,4,PROBLEM_SIZE+1)
  uFull
end
 
function test4()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = Vector{Array{Int}}(0)
  push!(uFull,copy(u))
 
  for i = 1:PROBLEM_SIZE
    push!(uFull,copy(u))
  end
  uFull
end
 
function test5()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = Vector{Array{Int,2}}(0)
  push!(uFull,copy(u))
 
  for i = 1:PROBLEM_SIZE
    push!(uFull,copy(u))
  end
  uFull
end
 
function test6()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = Vector{typeof(u)}(0)
  push!(uFull,u)
 
  for i = 1:PROBLEM_SIZE
    push!(uFull,copy(u))
  end
  uFull
end
 
function test7()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = GrowableArray(u)
  for i = 1:PROBLEM_SIZE
    push!(uFull,u)
  end
  uFull
end
 
function test8()
  u =    [1 2 3 4
          1 3 3 4
          1 5 6 3
          5 2 3 1]
 
  uFull = GrowableArray(u)
  sizehint!(uFull,PROBLEM_SIZE)
  for i = 1:PROBLEM_SIZE
    push!(uFull,u)
  end
  uFull
end
 
println("Run Benchmarks")
println("Pre-Compile")
#Compile Test Functions
test1()
test2()
test3()
test4()
test5()
test6()
test7()
test8()
 
println("Running Benchmarks")
t1 = @elapsed for i=1:NUM_RUNS test1() end
t2 = @elapsed for i=1:NUM_RUNS test2() end
t3 = @elapsed for i=1:NUM_RUNS test3() end
t4 = @elapsed for i=1:NUM_RUNS test4() end
t5 = @elapsed for i=1:NUM_RUNS test5() end
t6 = @elapsed for i=1:NUM_RUNS test6() end
t7 = @elapsed for i=1:NUM_RUNS test7() end
t8 = @elapsed for i=1:NUM_RUNS test8() end
 
println("Benchmark results: $t1 $t2 $t3 $t4 $t5 $t6 $t7 $t8")
 
#Benchmark results: 1.923640854 2.131108443 0.012493308 0.00866045 0.005246504 0.00532613 0.00773568 0.00819909

As you can see in test7 and test8, I created a “GrowableArray” which is an array that acts like a Vector of Arrays. However, it has the added functionality that if you copy(G), what you get is the contiguous array. Therefore in the loop you can grow the array in the quickest way as a storage machine, and then after the loop (say to plot the array), or at any time, you can copy it to a contiguous array which is better suited for interop with C and other goodies.

It also hides a few painful things. Notice that in the code we pushed a copy of u (copy(u)). This is because when u is an array, its value is only the reference to the array, so if we simply push!(uFull,u), every element of uFull is actually the same array! This benchmark won’t catch this issue, but try changing u and you will see that every element of uFull changes if you don’t use copy. This can be a nasty bug, so instead we build copy() into the push!() command for the GrowableArray. This gives another issue: since copying a GrowableArray changes it (into a contiguous array), you need to make sure push! doesn’t copy arguments which are themselves GrowableArrays (so that you can create GrowableArrays of GrowableArrays). However, this is easily managed via dispatch.
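
A small sketch of that workflow, assuming the GrowableArrays.jl API exactly as described in this post:

using GrowableArrays

u = [1 2; 3 4]
G = GrowableArray(u)   # internally stored like a Vector of matrices
push!(G, 2u)           # push! stores a copy, so later changes to u are safe
A = copy(G)            # contiguous array, handy for plotting or C interop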

Helping Yourself and the Julia Community

Simple projects like these lead to re-usable solutions to improve performance while allowing for ease of use. I have just detailed some projects I have personally done (and have more to do!), but there are others that should be pointed out. I am fond of projects like VML.jl which speedup standard functions, and DoubleDouble.jl which implements efficient quad-precision numbers that you can then use in place of other number types.

I think Julia will succeed not by the “killer packages” that are built in Julia, but by a rich type ecosystem that will make everyone want to build their “killer package” in Julia.
