Author Archives: Josh Day

Performance Tips

By: Josh Day

Re-posted from: https://www.juliafordatascience.com/performance-tips/


The Julia Language Docs have a great page on performance tips.  It's worth a read, but the writing is aimed more towards a computer scientist than a data scientist. Here we'll cover some easy performance wins.


We find the best way to code in Julia is to:

  1. Start with code that works (without worrying about performance).
  2. Find and replace bottlenecks.  

Speeding up bottlenecks is usually straightforward in Julia, often involving drop-in replacements (like using StaticArrays for small, fixed arrays).  There are, however, some performance gotchas you should avoid from the start so that you don't need to refactor code later on.

🏗️ Do your "heavy lifting" inside functions.

Any code that is performance critical or being benchmarked should be inside a function.  - Julia Docs

  • Try to compartmentalize the parts of the problem you are trying to solve into functions.  Resist the temptation to write long scripts.  

Avoid (non-constant) globals.

A "global" is essentially any variable that is defined outside of a function.  

  • If you are using a variable as a parameter inside a function, make it an argument, e.g.
# not great 
x = 1

f(y) = x + y

# much better 
f(x, y) = x + y
  • If the variable is a constant setting, declare it as const.  However, you won't be able to change its value later on.
const x = 1

f(y) = x + y
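
One way to see the difference is to ask Julia which return type it infers for each version. The following is a minimal sketch, not a benchmark: the names `offset_global`, `OFFSET_CONST`, and the three `add_*` functions are made up for illustration, and it assumes `Base.return_types` reports inference results the same way on your Julia version.

```julia
# Hypothetical names for illustration; run at top level (REPL or script).
offset_global = 1          # non-constant global: its type could change at any time
const OFFSET_CONST = 1     # constant global: its type is fixed

add_global(y) = offset_global + y   # reads a non-constant global
add_const(y)  = OFFSET_CONST + y    # reads a constant global
add_arg(x, y) = x + y               # everything arrives as an argument

# Ask the compiler which return type it infers for Int inputs.
inferred_global = Base.return_types(add_global, (Int,))
inferred_const  = Base.return_types(add_const, (Int,))
inferred_arg    = Base.return_types(add_arg, (Int, Int))

println("global:   ", inferred_global)
println("const:    ", inferred_const)
println("argument: ", inferred_arg)
```

All three functions return the same value. The non-constant global version should be inferred as Any, while the other two should be inferred as Int, which is what lets the compiler generate fast code for them.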

Concern yourself with "type stability".

Type stability is the idea that the return type of a function depends only on the types of the inputs, not on their values.  Here is an example of a type-unstable function:

function f(x)
    x < 10 ? "string" : 1
end

f(0) == "string"

f(100) == 1

The issue with the function above is that Julia's compiler needs to handle multiple cases (output could be either a String or an Int).
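
If you want to check a function for this problem, the `@inferred` macro from the Test standard library is one option. The sketch below uses the hypothetical names `unstable` and `stable`; the "fix" shown (returning a String from both branches) is only one possible choice and depends on what your function is meant to return.

```julia
using Test  # standard library; provides @inferred

# Same logic as the type-unstable f above, under a hypothetical name.
unstable(x) = x < 10 ? "string" : 1

# A type-stable variant: both branches return a String.
stable(x) = x < 10 ? "string" : "1"

# @inferred returns the value when its type matches the inferred return type
# and throws an error otherwise.
result = @inferred stable(0)

is_flagged = try
    @inferred unstable(0)
    false
catch
    true
end

println("stable(0) = ", result, ", unstable flagged: ", is_flagged)
```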

Don't change the type of a variable inside a function

Try not to initialize a variable with a type that will change later because of some computation.  For example, starting with x = 1 but then performing division x = x/2 will change x from Int to Float64.
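
Here is a small sketch of that x = 1 example, using the hypothetical function names `halve_unstable` and `halve_stable`. The only difference between the two is the type of the initial value.

```julia
# Hypothetical functions illustrating the x = 1, x = x / 2 example.
function halve_unstable(n)
    x = 1              # starts as an Int
    for _ in 1:n
        x = x / 2      # becomes a Float64 after the first iteration
    end
    return x
end

function halve_stable(n)
    x = 1.0            # starts as a Float64 and stays one
    for _ in 1:n
        x = x / 2
    end
    return x
end

println(typeof(halve_unstable(0)), " vs ", typeof(halve_unstable(3)))
println(typeof(halve_stable(0)), " vs ", typeof(halve_stable(3)))
```

The first function returns an Int when the loop never runs and a Float64 otherwise, so it is also type-unstable. The second always returns a Float64.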

Avoid abstract types in containers and structs.

When you use abstract types (e.g. Real) for things like arrays and struct fields, Julia must reason about a set of types rather than a specific type.  

Side note: Here is a short snippet for printing the subtypes of an abstract type using AbstractTrees (outside the REPL, subtypes requires using InteractiveUtils):

julia> using AbstractTrees

julia> AbstractTrees.children(x::Type) = subtypes(x)

julia> print_tree(Real)
Real
├─ AbstractFloat
│  ├─ BigFloat
│  ├─ Float16
│  ├─ Float32
│  └─ Float64
├─ AbstractIrrational
│  └─ Irrational
├─ Integer
│  ├─ Bool
│  ├─ Signed
│  │  ├─ BigInt
│  │  ├─ Int128
│  │  ├─ Int16
│  │  ├─ Int32
│  │  ├─ Int64
│  │  └─ Int8
│  └─ Unsigned
│     ├─ UInt128
│     ├─ UInt16
│     ├─ UInt32
│     ├─ UInt64
│     └─ UInt8
└─ Rational
Real type tree.
  • If you are initializing an empty array, but know the type in advance, let Julia know about it!
x = []         # eltype(x) == Any 

x = Float64[]  # eltype(x) == Float64
  • For structs, you can make them parametric to avoid ambiguous field types.
# Ambiguous field type!
struct A 
    thing::Real
end 

# Unambiguous field type!
struct B{T <: Real}
    thing::T 
end
  • If you are unsure about whether a type is abstract or concrete, you can check with isconcretetype.  You can think of abstract types as things that "don't exist" themselves; instead, they define a set of concrete types that do exist.
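
The checks below are a quick sketch of how you might inspect these cases yourself. `LooseBox` and `TightBox` are hypothetical names that mirror the A and B structs above.

```julia
# Hypothetical struct names mirroring A and B above.
struct LooseBox
    thing::Real          # abstract field type
end

struct TightBox{T <: Real}
    thing::T             # concrete once T is known
end

println(isconcretetype(Real))       # is Real concrete?
println(isconcretetype(Float64))    # is Float64 concrete?

# Declared field type of each struct.
println(fieldtype(LooseBox, :thing))
println(fieldtype(typeof(TightBox(1.5)), :thing))

# Element types of an untyped and a typed empty array.
println(eltype([]))
println(eltype(Float64[]))
```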

⏱️ Use @time and watch your allocations.  

One of the easiest speed gains is to avoid creating temporary arrays in a computation, since Julia needs to spend time cleaning these up (garbage collection).

  • NOTE!  The first time you call @time do_something(), the result will also include some overhead from the JIT (just-in-time) compiler.  Run it a second time for a more representative result.  For the most accurate measurements, see the fantastic BenchmarkTools package and its @btime macro.

Take a look at this poorly written loop to add 100 to each element of an array:

julia> function add100!(x)
           for i in 1:100
               x[:] = x .+ 1
           end
           return x
       end
add100! (generic function with 1 method)

julia> data = randn(10^6);

julia> @time add100!(data);
  1.115484 seconds (295.91 k allocations: 780.613 MiB, 58.96% gc time, 18.67% compilation time)

julia> @time add100!(data);
  0.386620 seconds (200 allocations: 762.947 MiB, 21.76% gc time)
😟

There are a number of red flags in the last line above:

  • The number of allocations (200).  Lots of temporary vectors are being created!
  • The memory being allocated (762.947 mebibytes).  
  • The percentage of time spent in garbage collection (21.76%).

Let's try again and see that these metrics can be improved by a lot.

julia> function add100!(x)
           x .+= 100
       end
add100! (generic function with 1 method)

julia> data = randn(10^6);

julia> @time add100!(data);
  0.040681 seconds (140.36 k allocations: 8.008 MiB, 98.36% compilation time)

julia> @time add100!(data);
  0.001023 seconds
😃
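
If you only care about memory, `@allocated` reports the bytes allocated by a single expression. The following sketch uses two hypothetical helpers, `add_copy` and `add_inplace!`; exact byte counts will vary between Julia versions, so treat the numbers as relative.

```julia
# Hypothetical helper names; both add 1 to every element.
add_copy(x) = x .+ 1               # builds and returns a new array
add_inplace!(x) = (x .+= 1; x)     # overwrites x in place

data = randn(10^4)

# Warm up so that compilation is not counted.
add_copy(data)
add_inplace!(data)

bytes_copy    = @allocated add_copy(data)
bytes_inplace = @allocated add_inplace!(data)

println("copy: ", bytes_copy, " bytes, in-place: ", bytes_inplace, " bytes")
```

The copying version has to allocate at least one new array of 10^4 Float64 values, so it should report far more memory than the in-place version.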

If you are seeing unexpectedly poor @time results, try asking yourself some of the following questions:

Can I use broadcasting?

Are you creating temporary arrays as an intermediate step in a computation?  In Julia, you can fuse multiple element-wise computations with the dot syntax:

x = randn(100)

# No intermediate vectors here!  Thanks broadcast fusion!
y = sin.(abs.(x)) .+ 1 ./ (1:100)
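
As a small extension of the example above, the `@.` macro adds the dots for you, and assigning into a preallocated array avoids allocating a result vector as well. The names `r` and `out` are just for illustration.

```julia
x = randn(100)
r = 1:100

# Fused broadcast that allocates one result vector.
y = sin.(abs.(x)) .+ 1 ./ r

# Same computation written with @. and stored into a preallocated vector.
out = similar(x)
@. out = sin(abs(x)) + 1 / r

println(out ≈ y)
```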

Is there a mutating (!) version of a function I can use?

For example, sort!(x) will sort the items of an array x in-place whereas sort(x) will return a sorted copy of x.
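
A tiny sketch of the difference (the variable names are arbitrary):

```julia
x = [3, 1, 2]

y = sort(x)                     # returns a sorted copy
unchanged = (x == [3, 1, 2])    # x still has its original order here

sort!(x)                        # sorts x itself; no copy is made

println(unchanged, " ", x, " ", y)
```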

Working with arrays?  

Can I use views?

  • Accessing sections of an array, e.g. x[:, 1:2], creates a copy!  Try using a view of the data instead, which does not copy.  You can do this with the view function or @views macro.
x = randn(3, 3)

# The view equivalent of x[:, 1:2] (all rows, first 2 cols)
view(x, :, 1:2)  

# These are all views
@views begin 
    x[1:2, 1]
    x[:, 1:2]
end
Creating Views
  • Most Julia package developers write functions in terms of abstract types, e.g. f(x::AbstractVector{<:Integer}) vs. f(x::Vector{<:Integer}) (there's no performance penalty because of Julia's JIT).  This means views can be used as easy drop-in replacements for array copies!
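
The sketch below illustrates both points: a view shares memory with its parent array, and a function written against an abstract array type accepts copies and views alike. `col_sums` is a hypothetical function made up for this example.

```julia
# Hypothetical function written against an abstract array type.
col_sums(m::AbstractMatrix) = vec(sum(m, dims=1))

x = collect(reshape(1.0:9.0, 3, 3))

c = x[:, 1:2]         # copy: independent of x
v = view(x, :, 1:2)   # view: shares memory with x

c[1, 1] = -1.0        # does not touch x
v[2, 1] = 100.0       # writes through to x[2, 1]

println(x[1, 1], " ", x[2, 1])
println(col_sums(c), " ", col_sums(v))
```

Writing to the copy leaves x alone, while writing to the view changes x. Keep this in mind when passing views to mutating functions.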

Am I accessing elements in order?

Julia's arrays are stored in column-major order.  If you are, for example, iterating through the elements of a Matrix, make sure the inner loop iterates over the row index (that is, down a column), since the elements of a column are stored next to each other in your computer's memory:

x = rand(3,3)

for j in 1:3      # column j
    for i in 1:3  # row i
        x_ij = x[i, j]
        perform_calculation_on_element(x_ij)
    end
end
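
To try this on your own machine, here is a rough sketch that times both loop orders. The function names are hypothetical, and `@elapsed` gives only a single noisy measurement, so the exact timings (and the size of the gap) will vary.

```julia
# Hypothetical functions: both compute the same sum, only the loop order differs.
function sum_column_major(x::AbstractMatrix)
    total = zero(eltype(x))
    for j in axes(x, 2)        # outer loop: columns
        for i in axes(x, 1)    # inner loop: rows (contiguous in memory)
            total += x[i, j]
        end
    end
    return total
end

function sum_row_major(x::AbstractMatrix)
    total = zero(eltype(x))
    for i in axes(x, 1)        # outer loop: rows
        for j in axes(x, 2)    # inner loop: columns (strided access)
            total += x[i, j]
        end
    end
    return total
end

x = rand(500, 500)
sum_column_major(x); sum_row_major(x)   # warm up (compile) first

t_col = @elapsed sum_column_major(x)
t_row = @elapsed sum_row_major(x)
println("column-major: ", t_col, " s, row-major: ", t_row, " s")
```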

🚀 That's It!

You now know how to achieve some easy performance wins in Julia.  The Julia community is very concerned (some might say obsessed) with performance, so there is a long rabbit hole to go down if this has piqued your interest.  If you want to know more details and even more performance tips, see the Performance Tips section of the Julia manual.

Enjoying Julia For Data Science?  Please share us with a friend and follow us on Twitter at @JuliaForDataSci.


Animations with Plots.jl

By: Josh Day

Re-posted from: https://www.juliafordatascience.com/animations-with-plots-jl/



Why Animations are Great


The ability to communicate results is an under-appreciated skill in data science.  An important analysis can go unheard or be misunderstood if it's not presented well.  Let's borrow the model for data science projects proposed by R for Data Science, in which Communicate is the final step.  

R for Data Science: Model for Data Science Projects

Animations tell a story that static images are unable to tell by adding an extra dimension (often time).  They are also more engaging to an audience (many social marketing blogs cover the topic of engagement from video vs. static images).  Getting your audience to pay attention is a part of communicating your results, so animations are a great tool.

Animations with Plots.jl

Plots.jl is a Julia package that provides a unified syntax for multiple plotting backends.  It also provides some super simple and powerful utilities for creating animations in the GIF format.  There are several ways to create animations in Plots, with varying levels of complexity.  We recommend using Pluto (see our Pluto introduction) to make Plots animations because they'll appear in the notebook.

The simplest way is the @gif macro.

  • Place @gif in front of a for loop that generates a plot in each iteration.  Each plot will be saved as a single frame in the animation.
using Plots 

@gif for i in 1:50
    plot(sin, 0, i * 2pi / 10)
end
  • You can add "flags" such as every n to only save a frame every n images, or when <condition> to only save certain frames.
@gif for i in 1:50
    plot(sin, 0, i * 2pi / 10)
end when i > 30
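
For completeness, here is what the every flag could look like on the same loop. This sketch assumes Plots.jl is installed and a working backend is available; the variable name `result` is arbitrary.

```julia
using Plots   # assumes Plots.jl is installed

# Same loop as above, but only every 5th plot is saved as a frame.
result = @gif for i in 1:50
    plot(sin, 0, i * 2pi / 10)
end every 5
```

Saving fewer frames keeps the file smaller, at the cost of a choppier animation.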

To control frames-per-second, use @animate.

  • Works just like @gif, but creates a Plots.Animation rather than a gif directly.
using Random

anim = @animate for i in 1:50
    Random.seed!(123)  # reseed so every frame draws the same random walk
    scatter(cumsum(randn(i)), ms=i, lab="", alpha = 1 - i/50, 
        xlim=(0,50), ylim=(-5, 7))
end
  • You can then use the gif function on your Animation.
gif(anim, fps=50)

For the most control, use Plots.Animation directly.

  • Save each frame explicitly with frame.
a = Animation()
	
for i in 1:10
    plt = bar(1:i, ylim=(0,10), xlim=(0,10), lab="")
    frame(a, plt)
end
	
gif(a)

🚀 That's It!

You now know how to make some cool animations with Julia and Plots.jl.


First Steps #5: Pluto.jl 🎈

By: Josh Day

Re-posted from: https://www.juliafordatascience.com/first-steps-5-pluto/

What's Pluto?


Notebook environments (e.g. Jupyter and Observable) have become extremely popular in the last decade.  They give programmers a way to intersperse code with markup, add interactive UI elements, and show off code in a format more interesting than text files.  People love them (well, not everyone).

Pluto.jl is a newcomer (PlutoCon 2021 was just held to celebrate its one-year anniversary!) to the world of notebook environments.  It provides a reactive environment specific to Julia.  People are doing some very cool things with Pluto.  Check out MIT's Introduction to Computational Thinking course for some fantastic public lectures with Pluto.

Pluto Quickstart

  • Installing Pluto:
] add Pluto
  • Starting the Pluto Server:
using Pluto

Pluto.run()
  • The above command will open up the following page.  
Pluto Welcome Screen
  • To get back to this page from an opened notebook, click the Pluto.jl icon in the top navbar.  
  • For a deeper introduction to Pluto, go through the sample notebooks (we highly recommend them!).
  • Press ctrl + ? to view keyboard shortcuts.

Key Points about Pluto

1. Your Code is Reactive.  

When you change a variable, that change gets propagated through all cells which reference that variable.


2. Returned Values will Render as HTML.

That means things like the markdown string macro (md) will look nice.  Note that output gets displayed above the code.


3. Code can be Hidden.

Click the eye icon on the top left of a cell to hide the code.  It only appears if your cursor is hovering over the cell.


4. You can @bind HTML Inputs to Julia Variables.

You can use Pluto.@bind along with the html string macro to create, for example, a simple text input and bind it to a Julia variable my_input.  The @bind macro works with any HTML input type.


5. You can Avoid Writing HTML by using PlutoUI.

  • First:
] add PlutoUI
  • Then load it with using PlutoUI and bind one of its widgets (for example, a Slider) to a variable with @bind.
  • To see all of the UI options in PlutoUI, open the PlutoUI.jl sample notebook.

Notes, Tips, and Tricks

Multiple Expressions

  • Pluto will try to get you to split multiple expressions into multiple cells (you can also put multiple expressions in a begin ... end block).  This helps Pluto manage the dependencies between cells and avoids unnecessary re-running of code that "reacts" to something it doesn't need to.
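
A begin ... end block is plain Julia, so a multi-expression cell could look like the sketch below (the variable names are made up for illustration):

```julia
# One cell, several expressions: wrap them in begin ... end.
begin
    radius = 2
    area = pi * radius^2
end
```

Keep in mind that everything in the block then belongs to a single cell, so a change to anything the block depends on re-runs all of it.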

Custom Display Methods

  • If you want something to make use of Pluto's rich HTML display, you need to define your own Base.show method for the text/html MIME type.
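
As a minimal sketch of such a method, the example below defines a hypothetical `Badge` type and an HTML show method for it. The markup and styling are arbitrary choices, and the label is not HTML-escaped, so this is only suitable for trusted input.

```julia
# Hypothetical type used only for illustration.
struct Badge
    label::String
end

# Rich display: used by front ends that ask for the text/html MIME type.
function Base.show(io::IO, ::MIME"text/html", b::Badge)
    print(io, "<span style=\"border: 1px solid gray; padding: 2px 6px;\">", b.label, "</span>")
end

# Outside a notebook, the generated HTML can be inspected as a string.
html_output = repr("text/html", Badge("Hello"))
println(html_output)
```

In a notebook, returning Badge("Hello") from a cell should then render the styled span rather than the default text representation.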

Interpolating UI Elements into Markdown

You can use Julia's string interpolation syntax to interpolate values into a markdown block that will then get rendered as HTML.  This includes html strings and PlutoUI elements!  You can even define the UI element somewhere else to keep your markdown block looking cleaner.

x_ui = @bind x Slider(1:10)

md"My UI Element: $x_ui"

# Provides the same result: 

md"My UI Element: $(@bind x Slider(1:10))"
Cool Little Pluto UI

Final Thoughts

On a personal note, I've found Pluto particularly useful for making:

  1. Lightweight user interfaces for customers without strong Julia skills.  I simply teach the customer to run Pluto.run() and then I don't need to deal with the overhead of developing a full web app.  The downside is that Pluto notebooks can't (yet) be deployed as a web app.
  2. Interactive presentations.  Pluto works great for demonstrating code and more.  A huge benefit is that thanks to reactivity, you'll never get in an awkward state with cells run out of order!
  3. Data Visualization.  Data visualization is often an iterative process that takes many incremental changes to get the plot you want.  The reactivity of Pluto provides instant feedback and greatly speeds up this process.

🚀 That's It!

You now know how to do some really cool stuff with Pluto.  What will you build with it?
