
Some Fun With Julia Types: Symbolic Expressions in the ODE Solver

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/fun-julia-types-symbolic-expressions-ode-solver/

In Julia, you can naturally write generic algorithms which work on any type that supports specific “actions”. For example, an “AbstractArray” is a type which has a specific set of functions implemented. This means that in any generically-written algorithm that wants an array, you can give it an AbstractArray and it will “just work”. This kind of abstraction makes it easy to write a simple algorithm and then use that same exact code for other purposes. For example, distributed computing can be done by just passing in a DistributedArray, and the algorithm can run on the GPU by passing in a GPUArray. Because Julia’s functions auto-specialize on the types you give them, Julia automatically compiles efficient versions specifically for the types you pass in, which strips away the costs of the abstraction at compile time.
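As a hedged illustration of that idea (my own example, not from the original post), here is a generically-written function that runs unchanged on a Vector, a range, and a view, because each of them implements the AbstractArray interface:

# A generic "rescale to [0,1]" that only assumes the AbstractArray interface.
function rescale(A::AbstractArray)
    lo, hi = extrema(A)
    return (A .- lo) ./ (hi - lo)
end

rescale([1.0, 2.0, 4.0])          # plain Vector
rescale(1:10)                     # range
rescale(view(rand(5, 5), :, 1))   # view into a matrix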

This means that one way to come up with new efficient algorithms in Julia is to simply pass a new type to an existing algorithm. Does this kind of stuff really work? Well, I wanted to show that you can push this to absurd places by showing one of my latest experiments.

Note

The ODE solver part of this post won’t work on release until some tags go through (I’d say at least in a week?). The fix that I had to do was to make the “internal instability checks”, which normally default to checking whether the number is NaN, instead always return false for numbers which aren’t AbstractFloats (meaning: no matter what, don’t halt integration due to instability). Do you understand why that would be a problem when trying to use a symbolic expression? This is a good question to keep in mind while reading.
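To make the note concrete, here is a minimal sketch of the kind of check being described (my own illustration, not the actual DifferentialEquations.jl internals): the NaN test only makes sense for floating-point numbers, so anything else should simply never be flagged as unstable.

# Sketch: only apply the NaN-based instability test to floats.
# For a SymEngine.Basic (or any non-AbstractFloat), isnan is not meaningful,
# so the check returns false and integration is never halted.
unstable(u::AbstractFloat) = isnan(u)
unstable(u) = false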

SymEngine.jl

Let me first give a little bit of an introduction to SymEngine.jl. SymEngine is a re-write of SymPy into C++ for increased efficiency, and SymEngine.jl is a wrapper for Julia. The only part of the package I will use is its symbolic expression type, the SymEngine.Basic.

In many cases, this is all you need out of the package. This is because dispatch and generic algorithms tend to handle everything else you need. For example, what if you want the symbolic expression for the inverse of a matrix? Just make a matrix of expressions, and call the Julia inverse function:

using SymEngine
y1,y2,y3,y4 = @vars y1 y2 y3 y4
A = [y1 y1+y2
     y3+y2 y4]
 
println(inv(A))
 
#SymEngine.Basic[(1 + y2*y3/(y1*(y4 - y2*y3/y1)))/y1 -y2/(y1*(y4 - y2*y3/y1)); -y3/(y1*(y4 - y2*y3/y1)) (y4 - y2*y3/y1)^(-1)]

Notice that the only thing that’s directly used here from SymEngine is the declaration of the variables for the symbolic expressions (the first line). The second line just creates a Julia matrix of these expressions. The last line prints the inv function from Julia’s Base library applied to this matrix of expressions.

Julia’s inverse function is generically written, and thus it just needs the elements to satisfy some properties of numbers. For example, they need to be able to add, subtract, multiply, and divide. Since our “number” type can do all of this, the inverse function “just works”. But also, by Julia’s magic, a separate function is created and compiled just for doing this, which makes this into a highly efficient function. So easy + efficient: the true promise of Julia is satisfied.
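As a small check of my own (not from the post), the arithmetic that inv relies on works directly on SymEngine.Basic values:

a, b = @vars a b
ex = a + 2b
println(ex * ex)   # a symbolic product such as (a + 2*b)^2 (exact printed form depends on SymEngine)
println(ex / a)    # division works too, which is what lets generic linear algebra pivot and scale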

Going Deep: The ODE Solver

Now let’s push this. Let’s say we had a problem where we wanted to find out which initial condition is required in order to get out a specific value. One way to calculate this is to use Boundary Value Problem (BVP) solvers, but let’s say we will want to do this at a bunch of different timepoints.
How about using a numerical solver for ODEs, except instead of using numbers, let’s use symbolic expressions for our initial conditions so that way we can calculate approximate functions for the timepoints. Sounds fun and potentially useful, so let’s give it a try.

The ODE solvers for Julia are in the package DifferentialEquations.jl. Let’s solve the linear ODE:

 \frac{dy}{dt} = 2y

with an initial condition which is a symbolic variable. Following the tutorial, let’s swap out the numbers for symbolic expressions. To do this, we simply make the problem type and solve it:

using DifferentialEquations, SymEngine
y0 = symbols(:y0)
u0 = y0
f = (t,y) -> 2y
prob = ODEProblem(f,u0,(0.0,1.0))
sol = solve(prob,RK4(),dt=1/10)
 
println(sol)
# SymEngine.Basic[y0,1.2214*y0,1.49181796*y0,1.822106456344*y0,2.22552082577856*y0,2.71825113660594*y0,3.32007193825049*y0,4.05513586537915*y0,4.95294294597409*y0,6.04952451421275*y0,7.38888924165946*y0,7.38888924165946*y0]

The solution is an array of symbolic expressions for what the RK4 method (the order 4 Runge-Kutta method) gives at each timepoint, starting at 0.0 and stepping 0.1 units at a time up to 1.0. We can then use the SymEngine function lambdify to turn the solution at the end into a function, and check it. For our reference, I will re-solve the differential equation at a much higher accuracy. We can test this against the true solution, which for the linear ODE we know is:

 y(t) = y_0 \exp(2t)

sol = solve(prob,RK4(),dt=1/1000)
end_solution  = lambdify(sol[end])
 
println(end_solution(2)) # 14.778112197857341
println(2exp(2)) # 14.7781121978613
println(end_solution(3)) # 22.167168296786013
println(3exp(2)) # 22.16716829679195

We have successfully computed a really high accuracy approximation to our solution which is a function of the initial condition! We can even do this for systems of ODEs. For example, let’s get a function which approximates a solution to the Lotka-Volterra equation:

using SymEngine, DifferentialEquations
# Make our initial condition be symbolic expressions from SymEngine
x0,y0 = @vars x0 y0
u0 = [x0;y0]
f = function (t,y,dy)
  dy[1] = 1.5y[1] - y[1]*y[2]
  dy[2] = -3y[2] + y[1]*y[2]
end
prob = ODEProblem(f,u0,(0.0,1.0))
sol = solve(prob,RK4(),dt=1/2)

The result is a stupidly large expression because it grows exponentially and SymEngine doesn’t have a simplification function yet (it’s still pretty new!), but hey this is super cool anyways.
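If you want to see just how quickly the expressions blow up, one rough check (a sketch of my own, assuming sol.u holds the saved states as in DifferentialEquations.jl) is to look at the printed length of one component at each saved step:

# Rough measure of expression growth: length of the printed first component.
for (i, u) in enumerate(sol.u)
    println("step $i: ", length(string(u[1])), " characters")
end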

Going Past The Edge: What Happens With Incompatibility?

There are some caveats here. You do need to work through MethodErrors. For example, if you wanted to use the more efficient version of ode45/dopri5, you’ll get an error:

sol = solve(prob,Tsit5())
 
 
MethodError: no method matching abs2(::SymEngine.Basic)
Closest candidates are:
  abs2(!Matched::Bool) at bool.jl:38
  abs2(!Matched::ForwardDiff.Dual{N,T<:Real}) at C:\Users\Chris\.julia\v0.5\ForwardDiff\src\dual.jl:325
  abs2(!Matched::Real) at number.jl:41
  ...

What this is saying is that there is no abs2 function for symbolic expressions. The reason abs2 is needed is that the adaptive algorithm uses normed errors in order to find out how to change the stepsize. However, in order to get a normed error, we’d actually have to know the initial condition… so this just isn’t going to work.

Conclusion

This is kind of a silly example just for fun, but there’s a more general statement here. In Julia, new algorithms can often be created just by passing new types to pre-written functionality. This vastly decreases the amount of work that you actually have to do, without a loss of efficiency! You can add this to your Julia bag of tricks.


Timing in Julia

By: pkofod

Re-posted from: http://www.pkofod.com/2017/04/24/timing-in-julia/

Timing code is important when you want to benchmark or profile your code. Is it the solution of a linear system or the Monte Carlo integration scheme that takes up most of the time? Is version A or version B of a function faster? Questions like that show up all the time. Let us have a look at a few of the possible ways of timing things in Julia.

The basics

The most basic timing functionalities in Julia are the ones included in the Base language. The standard way of timing things in Julia is by use of the @time macro.

julia> function test(n)
           A = rand(n, n)
           b = rand(n)
           @time A\b
       end
test (generic function with 1 method)

Do note that the code we want to time is put in a function. This is because everything we do at the top level in the REPL is in global scope. It’s a mistake a lot of people make all the time, but currently it is a very bad idea performance-wise. Anyway, let’s see what happens for n = 1, 10, 100, and 1000.

julia> test(1);
  0.000002 seconds (7 allocations: 320 bytes)
 
julia> test(10);
  0.000057 seconds (9 allocations: 1.313 KB)
 
julia> test(100);
  0.001425 seconds (10 allocations: 80.078 KB)
 
julia> test(1000);
  0.033573 seconds (10 allocations: 7.645 MB, 27.81% gc time)
 
julia> test(1000);
  0.045214 seconds (10 allocations: 7.645 MB, 47.66% gc time)

The first run is mainly there to compile test, and then we have a look at what happens when the dimension of our problem increases. Elapsed time seems to increase, and we also see that the number of allocations and the amount of memory that was allocated increase. For the runs with dimension 1000 we see something else in the output: 30-50% of the time was spent in “gc”. What is this? Julia is a garbage collected language. This means that Julia keeps track of current allocations, and frees the memory if it isn’t needed anymore. It doesn’t do this all the time, though. Running the 1000-dimensional problem once more gives us

julia> test(1000)
  0.029277 seconds (10 allocations: 7.645 MB)

We see it runs slightly faster, and there is no GC time this time around. Of course, these things will look slightly different if you try to replicate them.
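If you want to take the garbage collector out of the picture before timing, you can trigger a collection manually first (gc() in the Julia 0.5/0.6 era used here; this is my own addition, not part of the original run):

julia> gc(); test(1000);   # force a collection, then time without pending garbage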

So now we can time. But what if we want to store this number? We could be tempted to try

t = @time 3+3

but we will realize that what is returned is the return value of the expression, not the elapsed time. To save the time, we can either use @timed or @elapsed. Let us try changing @time to @timed and look at the output when we have our new test2 function return the return value.

julia> function test2(n)
           A = rand(n, n)
           b = rand(n)
           @timed A\b
       end
test2 (generic function with 1 method)
 
julia> test2(3)
([0.700921,-0.120765,0.683945],2.7889e-5,512,0.0,Base.GC_Diff(512,0,0,9,0,0,0,0,0))

We see that it returns a tuple with: the return value of A\b, followed by the elapsed time, then the bytes allocated, time spent in garbage collection, and lastly some further memory counters. This is great, as we can now work with the information @time printed while still having access to the results of our calculations. Of course, it is a bit involved to do it this way. If we simply wanted to see the elapsed time in order to act on it, then we would just use @time as we did above.
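Since @timed returns a plain tuple, you can destructure it to keep the result and the timing data separately (a small sketch of my own, using the same field order as printed above):

val, t, bytes, gctime, memallocs = @timed rand(100, 100) \ rand(100)
# val is the solution vector, t the elapsed seconds, bytes the allocations,
# gctime the seconds spent in garbage collection, and memallocs the detailed counters.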

Before we move on to some simpler macros, let us consider the last “time*-family” macro: @timev. As we saw above, @timed contained more information about memory allocation than @time printed. If we want the “verbose” version, we use @timev (v for verbose):

julia> function test3(n)
           A = rand(n, n)
           b = rand(n)
           @timev A\b
       end
test3 (generic function with 1 method)

Running test3 on a kinda large problem, we see that it does indeed print the contents of Base.GC_Diff:

julia> test3(5000);
  1.923164 seconds (12 allocations: 190.812 MB, 4.67% gc time)
elapsed time (ns): 1923164359
gc time (ns):      89733440
bytes allocated:   200080368
pool allocs:       9
malloc() calls:    3
GC pauses:         1
full collections:  1

If any of the entries are zero, the corresponding lines are omitted.

julia> test3(50);
  0.001803 seconds (10 allocations: 20.828 KB)
elapsed time (ns): 1802811
bytes allocated:   21328
pool allocs:       9
malloc() calls:    1

Of the three macros, you’ll probably not use @timev a lot.

Simpler versions

If we only want the elapsed time or only want the allocations, then we use either the @elapsed or @allocated macros. However, these do not return the results of our calculations, so in many cases it may be easier to just use @timed, so we can grab the results, the elapsed time, and the allocation information. “MATLAB”-style tic();toc()’s are also available. toc() prints the time, while toq() is used if we want only the returned time without the printing. It is also possible to use time_ns() to do what time.time() would do in Python, although for practically all purposes, the above macros are recommended.
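For completeness, here are minimal sketches of those simpler tools (my own examples, Julia 0.5-era API, assuming A is a matrix and b a vector as in the earlier examples):

t  = @elapsed A\b                          # elapsed seconds only; the result is discarded
m  = @allocated A\b                        # bytes allocated by the expression
tic(); A\b; t2 = toq()                     # MATLAB-style; toq() returns seconds without printing
t0 = time_ns(); A\b; dt = time_ns() - t0   # raw nanosecond counter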

More advanced functionality

Moving on to more advanced features, we venture into the package ecosystem.

Nested timings

The first package I will present is the nifty TimerOutputs.jl by Kristoffer Carlsson. This package essentially allows you to nest @time calls. The simplest way to show how it works is to use the example posted at the announcement (so credit to Kristoffer for the example).

using TimerOutputs
 
# Create the timer object
to = TimerOutput()
 
# Time something with an assigned label
@timeit to "sleep" sleep(0.3)
 
# Data is accumulated for multiple calls
for i in 1:100
    @timeit to "loop" 1+1
end
 
# Nested sections are possible
@timeit to "nest 1" begin
    @timeit to "nest 2" begin
        @timeit to "nest 3.1" rand(10^3)
        @timeit to "nest 3.2" rand(10^4)
        @timeit to "nest 3.3" rand(10^5)
    end
    rand(10^6)
end

Basically we’re timing the sleep call in one time counter, all the additions in the loop in another counter, and then we do some nested generation of random numbers. Displaying the to instance gives us something like the following

 ───────────────────────────────────────────────────────────────────────
                                Time                   Allocations      
                        ──────────────────────   ───────────────────────
    Tot / % measured:        6.48s / 5.60%           77.4MiB / 12.0%    
 
 Section        ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────
 sleep               1    338ms  93.2%   338ms    804KiB  8.43%   804KiB
 nest 1              1   24.7ms  6.80%  24.7ms   8.52MiB  91.5%  8.52MiB
   nest 2            1   9.10ms  2.51%  9.10ms    899KiB  9.43%   899KiB
     nest 3.1        1   3.27ms  0.90%  3.27ms   8.67KiB  0.09%  8.67KiB
     nest 3.3        1   3.05ms  0.84%  3.05ms    796KiB  8.34%   796KiB
     nest 3.2        1   2.68ms  0.74%  2.68ms   92.4KiB  0.97%  92.4KiB
 loop              100   6.97μs  0.00%  69.7ns   6.08KiB  0.06%      62B
 ───────────────────────────────────────────────────────────────────────

which nicely summarizes absolute and relative time and memory allocation of the individual @timeit calls. A real use case could be to see what the effect is of using finite differencing to construct the gradient for the Generalized Rosenbrock (GENROSE) problem from CUTEst.jl using a conjugate gradient solver in Optim.jl.

using CUTEst, Optim, TimerOutputs
 
nlp = CUTEstModel("GENROSE")
 
const to = TimerOutput()
 
f(x    ) =  @timeit to "f"  obj(nlp, x)
g!(g, x) =  @timeit to "g!" grad!(nlp, x, g)
 
begin
reset_timer!(to)
@timeit to "Conjugate Gradient" begin
    res = optimize(f, g!, nlp.meta.x0, ConjugateGradient(), Optim.Options(iterations=5*10^10));
    println(Optim.minimum(res))
end
@timeit to "Conjugate Gradient (FiniteDiff)" begin
    res = optimize(f, nlp.meta.x0, ConjugateGradient(), Optim.Options(iterations=5*10^10));
    println(Optim.minimum(res))
end
show(to; allocations = false)
end

the output is a table as before, this time without the allocations (notice the use of the allocations keyword in the show method)

 ────────────────────────────────────────────────────────────────
                                                   Time          
                                           ──────────────────────
             Tot / % measured:                  33.3s / 100%     
 
 Section                           ncalls     time   %tot     avg
 ────────────────────────────────────────────────────────────────
 Conjugate Gradient (FiniteDiff)        1    33.2s  99.5%   33.2s
   f                                1.67M    32.6s  97.9%  19.5μs
 Conjugate Gradient                     1    166ms  0.50%   166ms
   g!                               1.72k   90.3ms  0.27%  52.6μs
   f                                2.80k   59.1ms  0.18%  21.1μs
 ────────────────────────────────────────────────────────────────

And we conclude: finite differencing is very slow when you’re solving a 500 dimensional unconstrained optimization problem, and you really want to use the analytical gradient if possible.

Benchmarking

Timing individual pieces of code can be very helpful, but when we’re timing small function calls, this way of measuring performance can be heavily influenced by noise. To remedy that, we use proper benchmarking tools. The package for that, well, it’s called BenchmarkTools.jl and is mainly written by Jarrett Revels. The package is quite advanced in its feature set, but its basic functionality is straightforward to use. Please see the manual for more details than we provide here.

Up until now, we’ve asked Julia to tell us how much time some code took to run. Unfortunately for us, the computer is doing lots of stuff besides the raw calculations we’re trying to time. From the example earlier, this means that we have a lot of noise in our measure of the time it takes to solve A\b. Let us try to run test(1000) a few times

julia> test(1000);
  0.029859 seconds (10 allocations: 7.645 MB)
 
julia> test(1000);
  0.033381 seconds (10 allocations: 7.645 MB, 6.41% gc time)
 
julia> test(1000);
  0.024345 seconds (10 allocations: 7.645 MB)
 
julia> test(1000);
  0.039585 seconds (10 allocations: 7.645 MB)
 
julia> test(1000);
  0.037154 seconds (10 allocations: 7.645 MB, 2.82% gc time)
 
julia> test(1000);
  0.024574 seconds (10 allocations: 7.645 MB)
 
julia> test(1000);
  0.022185 seconds (10 allocations: 7.645 MB)

There’s a lot of variance here! Let’s benchmark instead. The @benchmark macro won’t work inside a function as above. This means that we have to be a bit careful (thanks to Fengyang Wang for clarifying this). Consider the following

julia> n = 200;
 
julia> A = rand(n,n);
 
julia> b = rand(n);
 
julia> @benchmark A\b
BenchmarkTools.Trial: 
  memory estimate:  316.23 KiB
  allocs estimate:  10
  --------------
  minimum time:     531.227 μs (0.00% GC)
  median time:      718.527 μs (0.00% GC)
  mean time:        874.044 μs (3.12% GC)
  maximum time:     95.515 ms (0.00% GC)
  --------------
  samples:          5602
  evals/sample:     1

This is fine, but since A and b are globals (remember, if it ain’t wrapped in a function, it’s a global when you’re working from the REPL), we’re also measuring the time dynamic dispatch takes. Dynamic dispatch happens here because Julia cannot be sure what the types of A and b are when we invoke A\b, since they’re globals. Instead, we should use interpolation of the non-constant variables, or mark them as constants using const A = rand(n,n) and const b = rand(n). Let us use interpolation.

julia> @benchmark $A\$b
BenchmarkTools.Trial: 
  memory estimate:  316.23 KiB
  allocs estimate:  10
  --------------
  minimum time:     531.746 μs (0.00% GC)
  median time:      717.269 μs (0.00% GC)
  mean time:        786.240 μs (3.22% GC)
  maximum time:     12.463 ms (0.00% GC)
  --------------
  samples:          6230
  evals/sample:     1

We see that the memory information is identical to the information we got from the other macros, but we now get a much more robust estimate of the time it takes to solve our A\b problem. We also see that dynamic dispatch was negligible here, as the solution takes much longer to compute than for Julia to figure out which method to call. The @benchmark macro will do various things automatically, to try to give as accurate results as possible. It is also possible to provide custom tuning parameters, say if you’re running these benchmarks over an extended period of time and want to track performance regressions, but that is beyond this blog post.
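As a brief, hedged illustration of those tuning parameters (my own example; see the BenchmarkTools.jl manual for the full list), they can be passed as keyword-style arguments to the macro:

# Cap the measurement at 1 second and at most 200 samples.
@benchmark $A\$b seconds=1 samples=200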

Dynamic dispatch

Before we conclude, let’s have a closer look at the significance of dynamic dispatch. When using globals, it has to be determined at run time which method to call. If there are only a few methods, this may not be a problem, but the problem begins to show itself when a function has a lot of methods. For example, on Julia v0.5.0, identity has one method, but + has 291 methods. Can we measure the significance of dynamic dispatch then? Sure. Just benchmark with and without interpolation (thanks again to Fengyang Wang for cooking up this example). To keep output from being too verbose, we’ll use the @btime macro – again from BenchmarkTools.jl.

julia> x = 0
0
 
julia> @btime identity(x)
  1.540 ns (0 allocations: 0 bytes)
0
 
julia> @btime +x
  15.837 ns (0 allocations: 0 bytes)
0
 
julia> @btime identity($x)
  1.540 ns (0 allocations: 0 bytes)
0
 
julia> @btime +$x
  1.548 ns (0 allocations: 0 bytes)
0

As we can see, calling + on the global x takes around 10 times as long as calling the single-method function identity. To show that declaring the input a const and interpolating the variable give the same result, consider the example below.

julia> const y = 0
0
 
julia> @btime identity(y)
  1.539 ns (0 allocations: 0 bytes)
0
 
julia> @btime +y
  1.540 ns (0 allocations: 0 bytes)
0
 
julia> @btime identity($y)
  1.540 ns (0 allocations: 0 bytes)
0
 
julia> @btime +$y
  1.540 ns (0 allocations: 0 bytes)
0

We see that interpolation is not needed, as long as we remember to use constants.

Conclusion

There are quite a few ways of measuring performance in Julia. I’ve presented some of them here, and hopefully you’ll be able to put the tools to good use. The functionality from Base is good for many purposes, but I really like the nested time measuring in TimerOutputs.jl a lot, and for serious benchmarking it is impossible to ignore BenchmarkTools.jl.

Deep Learning on the New Ubuntu-Based Data Science Virtual Machine for Linux

Authored by Paul Shealy, Senior Software Engineer, and Gopi Kumar, Principal Program Manager, at Microsoft.

Deep learning has received significant attention recently for its ability to create machine learning models with very high accuracy. It’s especially popular in image and speech recognition tasks, where the availability of massive datasets with rich information makes it feasible to train ever-larger neural networks on powerful GPUs and achieve groundbreaking results. Although there are a variety of deep learning frameworks available, getting started with one means taking time to download and install the framework, libraries, and other tools before writing your first line of code.

Microsoft’s Data Science Virtual Machine (DSVM) is a family of popular VM images published on the Azure marketplace with a broad choice of machine learning and data science tools. Microsoft is extending it with the introduction of a brand-new offering in this family – the Data Science Virtual Machine for Linux, based on Ubuntu 16.04 LTS – that also includes a comprehensive set of popular deep learning frameworks.

Deep learning frameworks in the new VM include:

  • Microsoft Cognitive Toolkit
  • Caffe and Caffe2
  • TensorFlow
  • H2O
  • MXNet
  • NVIDIA DIGITS
  • Theano
  • Torch, including PyTorch
  • Keras

The image can be deployed on VMs with GPUs or CPU-only VMs. It also includes OpenCV, matplotlib and many other libraries that you will find useful.

Run dsvm-more-info at a command prompt or visit the documentation for more information about these frameworks and how to get started.

Sample Jupyter notebooks are included for most frameworks. Start Jupyter or log in to JupyterHub to browse the samples for an easy way to explore the frameworks and get started with deep learning.

GPU Support

Training a deep neural network requires considerable computational resources, so things can be made significantly faster by running on one or more GPUs. Azure now offers NC-class VM sizes with 1-4 NVIDIA K80 GPUs for computational workloads. All deep learning frameworks on the VM are compiled with GPU support, and the NVIDIA driver, CUDA and cuDNN are included. You may also choose to run the VM on a CPU if you prefer, and that is supported without code changes. And because this is running on Azure, you can choose a smaller VM size for setup and exploration, then scale up to one or more GPUs for training.

The VM comes with nvidia-smi to monitor GPU usage during training and help optimize parameters to make full use of the GPU. It also includes NVIDIA Docker if you want to run Docker containers with GPU access.

Data Science Virtual Machine

The Data Science Virtual Machine family of VM images on Azure includes the DSVM for Windows, a CentOS-based DSVM for Linux, and an Ubuntu-based DSVM for Linux. These images come with popular data science and machine learning tools, including Microsoft R Server Developer Edition, Microsoft R Open, Anaconda Python, Julia, Jupyter notebooks, Visual Studio Code, RStudio, xgboost, and many more. A full list of tools for all editions of the DSVM is available here. The DSVM has proven popular with data scientists as it helps them focus on their tasks and skip mundane steps around tool installation and configuration.


To try deep learning on Windows with GPUs, the Deep Learning Toolkit for DSVM contains all tools from the Windows DSVM plus GPU drivers, CUDA, cuDNN, and GPU versions of CNTK, MXNet, and TensorFlow.

Get Started Today

We invite you to use the new image to explore deep learning frameworks or for your machine learning and data science projects – DSVM for Linux (Ubuntu) is available today through the Marketplace. Free Azure credits are available to help get you started.

Paul & Gopi