Julia GPU Programming with WSL2

By: Fabian Becker

Re-posted from: https://geekmonkey.org/julia-gpu-programming-with-wsl2/

I use Windows for gaming. It's been a long time since I've last done any serious development work on my Windows machine and yet I still spent a good chunk of money on building out a beefy machine for my efforts to learn machine learning. It has taken me a few months to finally sit down and get this machine ready for anything other than gaming.

With the recent announcement of WSLg (GUI support for WSL) I got really excited to try out WSL and see how good the GPU support actually is, but that's not the main reason. I've been shying away from developing on Windows because I'm used to a *NIX environment. WSL gives you that, but until recently you wouldn't have been able to interact with any GPU – and this all changed with this announcement!

You can watch the video below to see what's coming for WSL2.

The first half of this article will show you how to get everything set up, and in the second half we'll set up CUDA.jl in Julia. Since the latter part is about CUDA I'll assume that you have a compatible NVIDIA GPU.

Installation

Here's a summary of what we need to go through to prepare our environment:

  • Update Windows 10 to the latest release on the dev channel
  • Install NVIDIA CUDA drivers
  • Install Ubuntu 20.04 in WSL2
  • Install Linux CUDA packages
  • 🎉

Windows 10 Insider Preview

At the time of writing all of these features are only available through the Windows Insider Program, which allows you to receive new Windows features before they hit the main update line. The program is split into three channels: Dev, Beta and Release Preview.

To receive the update with WSLg and GPU support we will need to switch to the dev channel.

Note: The dev channel comes with some rough edges and potential for system instability. Be mindful of this when you switch and make sure you have backups!

After installing all downloaded updates you should end up with OS Build 21364 or higher. You can check your OS Build by running winver in PowerShell/cmd.
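
If you'd rather check from the terminal, you can also read the build number directly in PowerShell (this uses the standard .NET Environment API):

[System.Environment]::OSVersion.Version.Build
Query the OS build number from PowerShell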

[Screenshot: Run winver using the Windows Run command]

With this all set we can move on to installing the latest WSL2-compatible CUDA drivers.

CUDA drivers

NVIDIA provides special CUDA drivers for Windows 10 WSL. The link below will take you to the download page. You'll need to sign up for the NVIDIA Developer Program, which is free.

GPU in Windows Subsystem for Linux (WSL)
CUDA on Windows Subsystem for Linux (WSL) – Public Preview

Follow the setup wizard (I chose the express installation which keeps existing settings in place).

Note: NVIDIA also provides a documentation page on setting up your GPU for WSL. I found the docs to be outdated and they didn't work on my machine.

Installing Ubuntu 20.04 LTS with WSL2

Before we can proceed with installing Ubuntu I advise updating the WSL kernel by running:

wsl --update

In case you're like me and don't enjoy the default terminal Windows comes with, I suggest you install Windows Terminal from the Microsoft Store. It is a lot more pleasant to use than either cmd or the PowerShell console ever were.

It's also a good idea to set WSL to default to version 2:

wsl --set-default-version 2

Finally let's install Ubuntu with:

wsl --install --distribution Ubuntu-20.04

In case you were wondering what other distributions are available you can simply run: wsl --list --online.

Ubuntu 20.04 LTS

With Ubuntu installed we have a couple of final steps. First we will add an upstream repo to apt for getting the latest CUDA builds directly from NVIDIA:

sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
Add CUDA apt repository for Ubuntu 20.04

We also need to add NVIDIA's GPG key for the apt repo:

sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
Add NVIDIA GPG key

And finally to make sure that we prefer the packages provided by NVIDIA over packages in mainline Ubuntu we need to pin the apt repo:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
Pin file to prioritize CUDA apt repository
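
Once the repo lists have been fetched (the apt update in the next step), you can verify that the pin took effect – the NVIDIA repository should be listed with priority 600:

apt-cache policy cuda-drivers
Check that the NVIDIA repository wins for CUDA packages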

With this out of the way, we're ready to install the CUDA drivers inside our WSL Ubuntu installation:

sudo apt update && sudo apt install -y cuda-drivers
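
To sanity-check the setup you can try nvidia-smi from inside Ubuntu. Recent WSL drivers mount it into the distro (often under /usr/lib/wsl/lib); if it prints your GPU, passthrough is working:

nvidia-smi
Confirm the GPU is visible inside WSL2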

In case setting up WSL2 with GPU support was all you wanted to do – we're done!

Julia + WSL2 + CUDA.jl

Ok, so as promised in the headline, we're now going to install Julia inside our Ubuntu 20.04 and set up CUDA.jl.

Install Julia

At the time of writing Julia 1.6.1 is the latest version available (make sure to check for updates on Julia's download page).

First let's fetch the latest Julia tarball:

wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.1-linux-x86_64.tar.gz

Extract the .tar.gz:

tar -xvzf julia-1.6.1-linux-x86_64.tar.gz

Move the extracted folder to /opt:

sudo mv julia-1.6.1 /opt/

Finally, create a symbolic link to julia inside the /usr/local/bin folder:

sudo ln -s /opt/julia-1.6.1/bin/julia /usr/local/bin/julia
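
A quick check that everything is wired up correctly:

julia --version
Verify that Julia is on the PATH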

You may pick a different target directory for your installation of Julia or use a version manager like asdf-vm.

Install CUDA.jl

At this point simply run julia in your terminal and you should be dropped into the Julia REPL. I assume you've worked with Julia before and know how to operate its package manager.

Hit ] to enter pkg mode and install CUDA with:

activate --temp
add CUDA

From here hit backspace to leave pkg mode and import CUDA. CUDA.jl provides a useful function, CUDA.functional(), which will confirm that we've done everything right (well, that's the hope at least, right?).

using CUDA
CUDA.functional()
[Screenshot: Ensuring Julia CUDA.jl is functional inside WSL2]

You can additionally run CUDA.versioninfo() to get a more detailed breakdown of the supported features on your GPU.
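
If you want to go one step further than CUDA.functional(), here's a minimal smoke test (my own sketch, not part of the original setup): it allocates two arrays on the GPU and adds them with a broadcast, which compiles and launches an actual CUDA kernel:

using CUDA
a = CUDA.rand(1024)   # CuArray{Float32} living on the GPU
b = CUDA.rand(1024)
c = a .+ b            # the broadcast runs as a GPU kernel
@assert Array(c) ≈ Array(a) .+ Array(b)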

At this point you should have a working installation with WSL2, Ubuntu 20.04, Julia and CUDA.jl. In case you're new to CUDA.jl I suggest you follow the excellent introduction to GPU programming by JuliaGPU or jump in at the deep end with FluxML's GPU support.


If you like articles like this one, please consider subscribing to my free newsletter where at least once a week I send out my latest work covering Julia, Python, Machine Learning and other tech.

You can also follow me on Twitter.

Neural Jump SDEs (Jump Diffusions) and Neural PDEs

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/neural-jump-sdes-jump-diffusions-and-neural-pdes/

This is just an exploration of some new neural models I decided to jot down for safe keeping. DiffEqFlux.jl gives you the differentiable programming tools to allow you to use any DifferentialEquations.jl problem type (DEProblem) mixed with neural networks. We demonstrated this before, not just with neural ordinary differential equations, but also with things like neural stochastic differential equations and neural delay differential equations.

At the time we made DiffEqFlux, we were the “first to the gate” for many of these differential equations types and left it as an open question for people to find a use for these tools. And judging by the Arxiv papers that went out days after NeurIPS submissions were due, it looks like people now have justified some machine learning use cases for them. There were two separate papers on neural stochastic differential equations, showing them to be the limit of deep latent Gaussian models. Thus when you stick these new mathematical results on our existing adaptive high order GPU-accelerated neural SDE solvers, you get some very interesting and fast ways to learn some of the most cutting edge machine learning methods.

So I wanted to help you guys stay one step ahead of the trend by going to the next class of differential equations. One of the interesting NeurIPS-timed Arxiv papers was on jump ODEs. Following the DiffEqFlux.jl spirit, you can just follow the DifferentialEquations.jl tutorials on these problems, implement them, add a neural network, and it will differentiate through them. So let’s take it one step further and show an example of how you’d do that. I wanted to take a look at jump diffusions, or jump stochastic differential equations, which are exactly what they sound like: a mixture of these two methods. After that, I wanted to show how using some methods for stiff differential equations plus a method of lines discretization gives a way to train neural partial differential equations.

Instead of being fully defined by neural networks, I will also be showcasing how you can selectively make parts of a differential equation neuralitized and other parts pre-defined, something we’ve been calling mixed neural differential equations, so we’ll demonstrate a mixed neural jump stochastic differential equation and a mixed neural partial differential equation with fancy GPU-accelerated adaptive etc. methods. I’ll then leave as homework how to train a mixed neural jump stochastic partial differential equation with the fanciest methods, which should be easy to see from this blog post (so yes, that will be the MIT 18.337 homework). This blog post will highlight that these equations are all already possible within our framework, and will also show the specific places we see that we need to accelerate to really put these types of models into production.

Neural Jump Stochastic Differential Equations (Jump Diffusions)

To get to jump diffusions, let’s start with a stochastic differential equation. A stochastic differential equation is defined via

$$dX_t = f(t, X_t)\,dt + g(t, X_t)\,dW_t$$

which is essentially saying that there is a deterministic term $f$ and a continuous randomness term $g$ driven by a Brownian motion. Theorems like Donsker’s theorem can be thought of as a generalization of the central limit theorem, saying that continuous stochastic processes of some large class can be reinterpreted as this kind of process (due to the Gaussian-ness of Brownian motion), so in some sense this is a very large encompassing class. If you haven’t seen the previous blog post which mentions how to define neural SDEs, please check that out now. Let’s start with a code that uses reverse-mode automatic differentiation through a GPU-accelerated high order adaptive SDE solver. The code looks like:

using Flux, DiffEqFlux, StochasticDiffEq, Plots, DiffEqMonteCarlo
 
u0 = Float32[2.; 0.] |> gpu
datasize = 30
tspan = (0.0f0,1.0f0)
 
function trueODEfunc(du,u,p,t)
    true_A = [-0.1 2.0; -2.0 -0.1] |> gpu
    du .= ((u.^3)'true_A)'
end
t = range(tspan[1],tspan[2],length=datasize)
mp = Float32[0.2,0.2] |> gpu
function true_noise_func(du,u,p,t)
    du .= mp.*u
end
prob = SDEProblem(trueODEfunc,true_noise_func,u0,tspan)
 
# Take a typical sample from the mean
monte_prob = MonteCarloProblem(prob)
monte_sol = solve(monte_prob,SOSRI(),num_monte = 100)
monte_sum = MonteCarloSummary(monte_sol)
sde_data = Array(timeseries_point_mean(monte_sol,t))
 
dudt = Chain(x -> x.^3,
             Dense(2,50,tanh),
             Dense(50,2)) |> gpu
ps = Flux.params(dudt)
n_sde = x->neural_dmsde(dudt,x,mp,tspan,SOSRI(),saveat=t,reltol=1e-1,abstol=1e-1)
 
pred = n_sde(u0) # Get the prediction using the correct initial condition
 
dudt_(u,p,t) = Flux.data(dudt(u))
g(u,p,t) = mp.*u
nprob = SDEProblem(dudt_,g,u0,(0.0f0,1.2f0),nothing)
 
monte_nprob = MonteCarloProblem(nprob)
monte_nsol = solve(monte_nprob,SOSRI(),num_monte = 100)
monte_nsum = MonteCarloSummary(monte_nsol)
#plot(monte_nsol,color=1,alpha=0.3)
p1 = plot(monte_nsum, title = "Neural SDE: Before Training")
scatter!(p1,t,sde_data',lw=3)
 
scatter(t,sde_data[1,:],label="data")
scatter!(t,Flux.data(pred[1,:]),label="prediction")
 
function predict_n_sde()
  n_sde(u0)
end
loss_n_sde1() = sum(abs2,sde_data .- predict_n_sde())
loss_n_sde10() = sum([sum(abs2,sde_data .- predict_n_sde()) for i in 1:10])
Flux.back!(loss_n_sde1())
 
data = Iterators.repeated((), 10)
opt = ADAM(0.025)
cb = function () #callback function to observe training
  sample = predict_n_sde()
  # loss against current data
  display(sum(abs2,sde_data .- sample))
  # plot current prediction against data
  cur_pred = Flux.data(sample)
  pl = scatter(t,sde_data[1,:],label="data")
  scatter!(pl,t,cur_pred[1,:],label="prediction")
  display(plot(pl))
end
 
# Display the SDE with the initial parameter values.
cb()
 
Flux.train!(loss_n_sde1 , ps, Iterators.repeated((), 100), opt, cb = cb)
Flux.train!(loss_n_sde10, ps, Iterators.repeated((), 100), opt, cb = cb)
 
dudt_(u,p,t) = Flux.data(dudt(u))
g(u,p,t) = mp.*u
nprob = SDEProblem(dudt_,g,u0,(0.0f0,1.2f0),nothing)
 
monte_nprob = MonteCarloProblem(nprob)
monte_nsol = solve(monte_nprob,SOSRI(),num_monte = 100)
monte_nsum = MonteCarloSummary(monte_nsol)
#plot(monte_nsol,color=1,alpha=0.3)
p2 = plot(monte_nsum, title = "Neural SDE: After Training", xlabel="Time")
scatter!(p2,t,sde_data',lw=3,label=["x" "y" "z" "y"])
 
plot(p1,p2,layout=(2,1))
 
savefig("neural_sde.pdf")
savefig("neural_sde.png")

This just uses the diffeq_rd layer function to tell Flux to use reverse-mode AD (using Tracker.jl, unless you check out a bunch of weird Zygote.jl branches: wait for Zygote) and then trains the neural network using a discrete adjoint. While the previously posted example uses forward-mode, we have found that this is much much faster on neural SDEs, so if you’re trying to train them, I would recommend using this code instead (and I’ll get the examples updated).

Now to this equation let’s add jumps. A jump diffusion is defined like:

$$du = f(u,p,t)\,dt + g_i(u,t)\,dW_i + c_i(u,p,t)\,dp_i$$

where the $dp_i$ are the jump terms. The jump terms differ from the Brownian terms because they are non-continuous: they are zero except at countably many time points where you “hit” the equation with an amount $c_i(u,p,t)$. The timing at which these occur is based on an internal rate $\lambda_i$ of the jump $dp_i$.
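
To make the rate concrete: with a constant rate $\lambda$, the jump times arrive as a Poisson process with exponentially distributed inter-arrival times, so you expect about $\lambda T$ jumps on $[0,T]$. A tiny self-contained sketch (not part of the model code):

# Jump times of a rate-λ Poisson process on [0,T],
# sampled via exponential inter-arrival times.
function poisson_jump_times(λ, T)
    ts = Float64[]
    t = -log(rand())/λ
    while t < T
        push!(ts, t)
        t += -log(rand())/λ
    end
    ts
end
length(poisson_jump_times(2.0, 1.0)) # on average λ*T ≈ 2 jumps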

Jump diffusions are important because, just as there is a justification for the universality of stochastic differential equations, there is a justification here as well. The Lévy decomposition says that essentially any Markov process can be decomposed into something of this form. They also form the basis for many financial models because, for example, changing regimes into a recession isn’t gradual but rather sudden. Models like Merton’s model thus use these as an essential tool in quantitative finance. So let’s train a neural network on that!

What we have to do is define jump processes and append them onto an existing differential equation. The documentation shows how to use the different jump definitions along with their pros and cons, so for now we will use ConstantRateJump. Let’s define a ConstantRateJump which has a constant rate and a neural network that decides what the effect of the jump ($c_i(u,p,t)$) will be. To do this, you’d simply put the neural network in there:

rate(u,p,t) = 2.0
affect!(integrator) = (integrator.u = dudt2(integrator.u))
jump = ConstantRateJump(rate,affect!)

where dudt2 is another neural network, and then wrap that into a jump problem:

prob = SDEProblem(dudt_,g,gpu(param(x)),tspan,nothing)
jump_prob = JumpProblem(prob,Direct(),jump,save_positions=(false,false))

And of course you can make this fancier: just replace that rate 2.0 with another neural network, make the g(u,p,t) term also have a neural network, etc.: explore this as you wish and go find some cool stuff. Let’s just stick with this as our example though, but please go ahead and make these changes and allow DiffEqFlux.jl to help you to explore your craziest mathematical idea!
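
For instance, here is a minimal sketch of a learned jump rate (the rate_net name is mine, not from the original code). One caveat: a rate that depends continuously on u is, in DiffEqJump's terminology, a VariableRateJump rather than a ConstantRateJump, since the latter assumes the rate only changes at discrete events:

rate_net = Chain(Dense(2,32,tanh), Dense(32,1))
neural_rate(u,p,t) = softplus(rate_net(u)[1]) # softplus keeps the rate nonnegative
neural_jump = VariableRateJump(neural_rate,affect!)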

Now when you solve this, the jumps also occur along with the stochastic differential equation. To show what that looks like, let’s define a jump diffusion and solve it 100 times, taking its mean as our training data:

using Flux, DiffEqFlux, StochasticDiffEq, Plots, DiffEqMonteCarlo,
    DiffEqJump
 
u0 = Float32[2.; 0.]
datasize = 30
tspan = (0.0f0,1.0f0)
 
function trueODEfunc(du,u,p,t)
  true_A = [-0.1 2.0; -2.0 -0.1]
  du .= ((u.^3)'true_A)'
end
t = range(tspan[1],tspan[2],length=datasize)
const mp = Float32[0.2,0.2]
function true_noise_func(du,u,p,t)
  du .= mp.*u
end
 
true_rate(u,p,t) = 2.0
true_affect!(integrator) = (integrator.u[1] = integrator.u[1]/2)
true_jump = ConstantRateJump(true_rate,true_affect!)
prob = SDEProblem(trueODEfunc,true_noise_func,u0,tspan)
jump_prob = JumpProblem(prob,Direct(),true_jump,save_positions=(false,false))
 
# Take a typical sample from the mean
monte_prob = MonteCarloProblem(jump_prob)
monte_sol = solve(monte_prob,SOSRI(),num_monte = 100,parallel_type=:none)
plot(monte_sol,title="Training Data")
 
monte_sum = MonteCarloSummary(monte_sol)
sde_data = Array(timeseries_point_mean(monte_sol,t))

From the plot you can see wild discontinuities mixed in with an equation with continuous randomness. Just lovely.

A full code for training a neural jump diffusion thus is:

using Flux, DiffEqFlux, StochasticDiffEq, Plots, DiffEqMonteCarlo,
    DiffEqJump
 
u0 = Float32[2.; 0.] |> gpu
datasize = 30
tspan = (0.0f0,1.0f0)
 
function trueODEfunc(du,u,p,t)
  true_A = [-0.1 2.0; -2.0 -0.1] |> gpu
  du .= ((u.^3)'true_A)'
end
t = range(tspan[1],tspan[2],length=datasize)
const mp = Float32[0.2,0.2] |> gpu
function true_noise_func(du,u,p,t)
  du .= mp.*u
end
 
true_rate(u,p,t) = 2.0
true_affect!(integrator) = (integrator.u[1] = integrator.u[1]/2)
true_jump = ConstantRateJump(true_rate,true_affect!)
prob = SDEProblem(trueODEfunc,true_noise_func,u0,tspan)
jump_prob = JumpProblem(prob,Direct(),true_jump,save_positions=(false,false))
 
# Take a typical sample from the mean
monte_prob = MonteCarloProblem(jump_prob)
monte_sol = solve(monte_prob,SOSRI(),num_monte = 100,parallel_type=:none)
monte_sum = MonteCarloSummary(monte_sol)
sde_data = Array(timeseries_point_mean(monte_sol,t))
 
dudt = Chain(x -> x.^3,
           Dense(2,50,tanh),
           Dense(50,2)) |> gpu
dudt2 = Chain(Dense(2,50,tanh),
            Dense(50,2)) |> gpu
ps = Flux.params(dudt,dudt2)
 
g(u,p,t) = mp.*u
n_sde = function (x)
    dudt_(u,p,t) = dudt(u)
    rate(u,p,t) = 2.0
    affect!(integrator) = (integrator.u = dudt2(integrator.u))
    jump = ConstantRateJump(rate,affect!)
    prob = SDEProblem(dudt_,g,param(x),tspan,nothing)
    jump_prob = JumpProblem(prob,Direct(),jump,save_positions=(false,false))
    solve(jump_prob, SOSRI(); saveat=t, abstol=0.1, reltol=0.1) |> Tracker.collect
end
 
pred = n_sde(u0) # Get the prediction using the correct initial condition
 
dudt__(u,p,t) = Flux.data(dudt(u))
rate__(u,p,t) = 2.0
affect!__(integrator) = (integrator.u = Flux.data(dudt2(integrator.u)))
jump = ConstantRateJump(rate__,affect!__)
nprob = SDEProblem(dudt__,g,u0,(0.0f0,1.0f0),nothing)
njump_prob = JumpProblem(nprob,Direct(),jump, save_positions = (false,false))
 
monte_nprob = MonteCarloProblem(njump_prob)
monte_nsol = solve(monte_nprob,SOSRI(),num_monte = 1000,parallel_type=:none, abstol = 0.1, reltol = 0.1)
monte_nsum = MonteCarloSummary(monte_nsol)
 
#plot(monte_nsol,color=1,alpha=0.3)
p1 = plot(monte_nsum, title = "Neural Jump Diffusion: Before Training")
scatter!(p1,t,sde_data',lw=3)
 
scatter(t,sde_data[1,:],label="data")
scatter!(t,Flux.data(pred[1,:]),label="prediction")
 
function predict_n_sde()
    n_sde(u0)
end
 
loss_n_sde1() = sum(abs2,sde_data .- predict_n_sde())
function loss_n_sde100()
    loss = sum([sum(abs2,sde_data .- predict_n_sde()) for i in 1:100])
    @show loss
    loss
end
function loss_n_sde500()
    loss = sum([sum(abs2,sde_data .- predict_n_sde()) for i in 1:500])
    @show loss
    loss
end
Flux.back!(loss_n_sde1())
 
data = Iterators.repeated((), 10)
opt = ADAM(0.025)
cb = function () #callback function to observe training
    sample = predict_n_sde()
    # loss against current data
    display(sum(abs2,sde_data .- sample))
    # plot current prediction against data
    cur_pred = Flux.data(sample)
    pl = scatter(t,sde_data[1,:],label="data")
    scatter!(pl,t,cur_pred[1,:],label="prediction")
    display(plot(pl))
end
 
# Display the SDE with the initial parameter values.
cb()
 
Flux.train!(loss_n_sde1 , ps, Iterators.repeated((), 100), opt, cb = cb)

Notice how it’s almost exactly the same as the SDE code but with the definition of the jumps. You still get the same high order adaptive GPU-accelerated (choice of implicit, etc.) SDE solvers, but now applied to this more generalized class of problems. Using the GPU gives a good speedup in the neural network case, but slows it down quite a bit when generating the training data since that part isn’t very parallel. Finding new ways to use GPUs is one thing I am interested in pursuing here. Additionally, using a lower tolerance stack-overflows Tracker.jl, which is something we have fixed with Zygote.jl and will be coming to releases once Zygote.jl on the differential equation solvers is more robust. Lastly, the plotting with GPU-based arrays is wonky right now; we’ll need to make the interface a little bit nicer. However, this is a proof of concept that this stuff does indeed work, though it takes a while to train it to a “decent” loss (way more than the number of repetitions showcased here).

[Note: you need to add using CuArrays to enable the GPU support. I turned it off by default because I was training this on my dinky laptop :)]

Neural Partial Differential Equations

Now let’s do a neural partial differential equation (PDE). We can start by pulling code from this older blog post on solving systems of stochastic partial differential equations with GPUs. Here I’m going to strip the stochastic part off, simply because I want to train this on my laptop before the flight ends, so again I’ll leave it as an exercise to do the same jump diffusion treatment to this PDE. Let’s start by defining the method of lines discretization for our PDE. If you don’t know what that is, please go read that blog post on defining SPDEs. What happens is the discretization gives you a set of ODEs to solve, which looks like:

using OrdinaryDiffEq, RecursiveArrayTools, LinearAlgebra,
      DiffEqOperators, Flux, CuArrays
 
# Define the constants for the PDE
const α₂ = 1.0f0
const α₃ = 1.0f0
const β₁ = 1.0f0
const β₂ = 1.0f0
const β₃ = 1.0f0
const r₁ = 1.0f0
const r₂ = 1.0f0
const D = 100.0f0
const γ₁ = 0.1f0
const γ₂ = 0.1f0
const γ₃ = 0.1f0
const N = 100
const X = reshape([i for i in 1:N for j in 1:N],N,N) |> gpu
const Y = reshape([j for i in 1:N for j in 1:N],N,N) |> gpu
const α₁ = 1.0f0.*(X.>=80)
 
const Mx = Array(Tridiagonal([1.0f0 for i in 1:N-1],[-2.0f0 for i in 1:N],[1.0f0 for i in 1:N-1])) |> gpu
const My = copy(Mx)
Mx[2,1] = 2.0
Mx[end-1,end] = 2.0
My[1,2] = 2.0
My[end,end-1] = 2.0
 
# Define the initial condition as normal arrays
u0 = rand(Float32,N,N,3) |> gpu
const MyA = zeros(Float32,N,N) |> gpu
const AMx = zeros(Float32,N,N) |> gpu
const DA = zeros(Float32,N,N) |> gpu
 
# Define the discretized PDE as an ODE function
function f(_du,_u,p,t)
  u = reshape(_u,N,N,3)
  du= reshape(_du,N,N,3)
  A = @view u[:,:,1]
  B = @view u[:,:,2]
  C = @view u[:,:,3]
  dA = @view du[:,:,1]
  dB = @view du[:,:,2]
  dC = @view du[:,:,3]
  mul!(MyA,My,A)
  mul!(AMx,A,Mx)
  @. DA = D*(MyA + AMx)
  @. dA = DA + α₁ - β₁*A - r₁*A*B + r₂*C
  @. dB = α₂ - β₂*B - r₁*A*B + r₂*C
  @. dC = α₃ - β₃*C + r₁*A*B - r₂*C
end
 
# Solve the ODE
prob = ODEProblem(f,vec(u0),(0.0f0,100.0f0))
@time sol = solve(prob,BS3(),  progress=true,saveat = 5.0)
@time sol = solve(prob,ROCK2(),progress=true,saveat = 5.0)
 
 
using Plots; pyplot()
p1 = surface(X,Y,reshape(sol[end],N,N,3)[:,:,1],title = "[A]")
p2 = surface(X,Y,reshape(sol[end],N,N,3)[:,:,2],title = "[B]")
p3 = surface(X,Y,reshape(sol[end],N,N,3)[:,:,3],title = "[C]")
plot(p1,p2,p3,layout=grid(3,1))
savefig("neural_pde_training_data.png")
 
using DiffEqFlux, Flux
 
u0 = param(u0)
tspan = (0.0f0,100.0f0)
 
ann = Chain(Dense(3,50,tanh), Dense(50,3)) |> gpu
p1 = DiffEqFlux.destructure(ann)
ps = Flux.params(ann)
 
_ann = (u,p) -> reshape(p[3*50+51 : 2*3*50+50],3,50)*
                    tanh.(reshape(p[1:3*50],50,3)*u + p[3*50+1:3*50+50]) + p[2*3*50+51:end]
 
function dudt_(_u,p,t)
  u = reshape(_u,N,N,3)
  A = u[:,:,1]
  DA = D .* (A*Mx + My*A)
  _du = mapslices(x -> _ann(x,p),u,dims=3) |> gpu
  du = reshape(_du,N,N,3)
  x = vec(cat(du[:,:,1]+DA,du[:,:,2],du[:,:,3],dims=3))
end
 
prob = ODEProblem(dudt_,vec(Flux.data(u0)),tspan,Flux.data(p1))
@time diffeq_fd(p1,Array,length(u0)*length(0.0f0:5.0f0:100.0f0),prob,ROCK2(),progress=true,
                saveat=0.0f0:5.0f0:100.0f0)
 
function predict_fd()
  diffeq_fd(p1,Array,length(u0)*length(0.0f0:5.0f0:100.0f0),prob,ROCK2(),progress=true,
                  saveat=0.0f0:5.0f0:100.0f0)
end
 
function loss_fd()
  _sol = predict_fd()
  loss = sum(abs2,Array(sol) .- _sol)
  @show loss
  loss
end
loss_fd()
 
data = Iterators.repeated((), 10)
opt = ADAM(0.025)
 
Flux.train!(loss_fd, ps, data, opt)

The interesting part of this neural differential equation is the local/global aspect of its parts. The mapslices call makes it so that there’s a local nonlinear function of 3 variables applied at each point in space. While it keeps the neural network small, this currently does not play well with reverse-mode automatic differentiation or GPUs. That isn’t a major problem here because, since the neural network is kept small in this architecture, the number of parameters is also quite small. That said, reverse-mode AD will be required for fast adjoint passes, so this is still a work in progress / proof of concept, with a very specific point made (all that’s necessary here is overloads to make mapslices work well).
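
To make the local/global split concrete, here is a tiny self-contained illustration of what the mapslices pattern does, with a plain function standing in for the neural network:

A = rand(Float32, 4, 4, 3)
nl(x) = 2 .* x                  # stand-in for the small pointwise neural network
B = mapslices(nl, A, dims=3)    # nl sees each length-3 vector A[i,j,:] independently
@assert size(B) == size(A)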

One point that really came out of this was the ODE solver methods. The ROCK2 method is much faster when generating the training data and when running diffeq_fd. It was a difference of 3 minutes with ROCK2 vs 40 minutes with BS3 (on the CPU), showing how specialized methods really are the difference between the problem being solvable or not. The standard implicit methods like Rodas5 aren’t performing well here either, since they would have to handle a 30,000×30,000 dense matrix, and I didn’t take the time to specify sparsity patterns or the like to actually make them viable competitors. So for the lazy neural ODE use with sparsity, ROCK2 seems like a very interesting option. This is a testament to our newest GSoC crew’s results since it’s one of the newer methods, implemented by our student Deepesh Thakur. There are still a few improvements that need to be made to make the eigenvalue estimates more GPU-friendly, making this performance result soon carry over to GPUs as well (currently, the indexing in this part of the code gives it trouble, so a PR is coming probably in a week or so). Lastly, I’m not sure what’s a good picture for these kinds of things, so I’m going to have to think about how to represent a global neural PDE fit.
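
For reference, here is roughly what declaring a sparsity pattern would look like (a sketch: the diagonal jac_sparsity below is a placeholder, not the true Jacobian structure of this discretization):

using SparseArrays
jac_sparsity = spdiagm(0 => ones(Float32, 3N^2))  # placeholder pattern, 30,000×30,000
f_sparse = ODEFunction(f, jac_prototype = jac_sparsity)
prob_sparse = ODEProblem(f_sparse, vec(u0), (0.0f0, 100.0f0))
# sol = solve(prob_sparse, Rodas5(), saveat = 5.0)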

Conclusion

Have fun with this. There are still some rough edges – for example, plotting is still a little wonky because all of the automatic DiffEq solution plotting seems to index, which GPU-based arrays don’t like (I’ll update that soon now that it’s becoming a standard part of the workflow). Use it as starter code and find some cool stuff. Note that the examples shown here are not the only ones that are possible. This all just uses Julia’s generic programming and differentiable programming infrastructure in order to automatically generate code that is compatible with GPUs and automatic differentiation, so it’s impossible for me to enumerate all of the possible combinations. That means there’s plenty of things to explore. These are very early preliminary results, but they show that these equations are all possible, and they point out the specific places where we want to continue accelerating by both improving the methods and their implementation details. I look forward to doing an update with Zygote soon.

CITATION:

Christopher Rackauckas, Neural Jump SDEs (Jump Diffusions) and Neural PDEs, The Winnower 6:e155975.53637 (2019). DOI: 10.15200/winn.155975.53637

The post Neural Jump SDEs (Jump Diffusions) and Neural PDEs appeared first on Stochastic Lifestyle.