By: Josh Day
Re-posted from: https://www.juliafordatascience.com/first-steps-3-primer-on-plots/
Visualizing data is an essential skill for a data scientist. Unlike R, Julia does not ship with plotting functionality built-in. If you search for ways to make plots in Julia, you'll discover a lot of options. So what should you use?
📊 Plots.jl
We recommend the Plots package (especially for beginners).
Plots is a unified interface for creating visualizations with different backends (such as GR, Plotly.js, and UnicodePlots). It's great for both beginners and power users, and it's designed so that a lot of things you try will "just work".
💻 Install Plots
In the Julia REPL, add the Plots package if you haven't already done so. Recall that you enter Pkg Mode by pressing ]:
(@v1.6) pkg> add Plots
📈 Create Your First Plot
Back in Julia mode (press backspace, labeled delete on Mac keyboards, at an empty pkg> prompt), enter:
julia> using Plots
julia> plot(randn(10), title="My First Plot")
🎉 Congrats! You made your first plot 📈! You created it using:
- randn(10): A Vector of 10 random samples from a Normal(0,1) distribution.
- The GR backend (Plots' default).
✨ Core Principles
The main function you'll use, as you may have guessed, is
plot(args...; kw...)
Here args... means any number of positional arguments and kw... is any number of keyword arguments. Look back at the first plot we created and notice that we provided the data randn(10) as a positional argument and the title title="My First Plot" as a keyword argument. Another function you'll use is
plot!(args...; kw...)
In Julia, ! is used as a convention to identify functions that mutate at least one of their arguments. With Plots, this lets you make changes or additions to an existing plot.
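As a small illustration of the ! convention, the sketch below first contrasts Base Julia's sort and sort!, then applies the same idea to plot and plot!. The variable names (v, w, p) and the labels are ours, chosen only for this example.

```julia
using Plots

# Base Julia: sort returns a new vector, sort! changes its argument in place.
v = [3, 1, 2]
w = sort(v)    # w is sorted, v is unchanged
sort!(v)       # v itself is now sorted

# Plots follows the same convention.
p = plot(randn(10), label = "first")    # creates a new plot with one series
plot!(p, randn(10), label = "second")   # adds a second series to p
```

If you leave out the plot object and call plot!(randn(10)), Plots modifies the current (most recently created) plot instead.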
Now that we know the functions we are using, let's look at the core principles:
Principle #1: Everything You Plot is a Series
When you give data to the plot function (like randn(10) above), the seriestype determines how Plots will interpret the data. By default this is :path.
plot(1:10, seriestype = :path, label = "Series 1")
plot!(rand(1:10,10), seriestype = :scatter, label = "Series 2")
Principle #2: Plot Attributes have Aliases
Plot attributes are passed by keyword arguments. Because of aliases, you can often guess at the name of an attribute and Plots will interpret it correctly. For example, the following commands are equivalent:
plot(randn(10), seriestype = :scatter)
plot(randn(10), st = :scatter)
scatter(randn(10))
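The same idea extends to other attributes. As a sketch, the two calls below should describe the same plot, once with full attribute names and once with the short aliases st, ms, and lab. The data y and the names p_long and p_short are ours.

```julia
using Plots

y = randn(10)

# Full attribute names...
p_long = plot(y, seriestype = :scatter, markersize = 8, label = "data")

# ...and the same attributes through their shorter aliases.
p_short = plot(y, st = :scatter, ms = 8, lab = "data")
```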
Principle #3: Columns are Mapped to Series
For both data and attributes, the columns of matrices will be mapped to individual series. In this example, we create two series by providing a 10 x 2 matrix. Now look at the difference between p1 and p2. If the st (seriestype) attribute is a vector, the whole vector is applied to each of the available series. If the st attribute is a matrix, the attribute in the i-th column will be mapped to the i-th series. This provides a very succinct way of providing attributes to series.
x = randn(10, 2)
# Series 1 --> :scatter & :line
# Series 2 --> :scatter & :line
p1 = plot(x, st=[:scatter, :line])
# Series 1 --> :scatter
# Series 2 --> :line
p2 = plot(x, st=[:scatter :line])
plot(p1, p2)
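The only visible difference between p1 and p2 is a comma versus a space, so it helps to check what each literal builds. The sketch below uses Base Julia's size to show it, then applies the row-matrix form to other attributes. The names col and row and the chosen labels and colors are ours.

```julia
using Plots

col = [:scatter, :line]   # commas: a 2-element Vector
row = [:scatter :line]    # spaces: a 1 x 2 Matrix, one column per series

size(col)   # (2,)
size(row)   # (1, 2)

# The same column-to-series mapping works for other attributes.
x = randn(10, 2)
p = plot(x, st = row, label = ["Series 1" "Series 2"], color = [:red :blue])
```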
Principle #4: Some Attributes are Magic 🪄
Some attributes can be provided with multiple values all at once and Plots will figure out what to do with them. For example, using m=(10, .5, "blue") will set the marker size to 10, the marker alpha (opacity) to 0.5, and the marker color to "blue".
plot(randn(10), m = (10, .5, "blue"))
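Markers are not the only magic attribute. As a sketch, the line attribute accepts a similar tuple, and the second call spells out what the first one should resolve to. The specific values (3, :dash, :red) are ours.

```julia
using Plots

y = randn(10)

# Magic form: Plots sorts the values into line width, style, and color.
p_magic = plot(y, line = (3, :dash, :red))

# Explicit form with one keyword per attribute.
p_explicit = plot(y, linewidth = 3, linestyle = :dash, linecolor = :red)
```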
Principle #5: Many Types have Plot Recipes
This is best seen through example. Let's add the RDatasets and OnlineStats packages via Pkg Mode in the REPL:
(@v1.6) pkg> add OnlineStats RDatasets
Now load the packages and retrieve the diamonds dataset that comes packaged with R's ggplot2. The diamonds data is a collection of variables on diamond price and quality.
using RDatasets, OnlineStats
df = dataset("ggplot2", "diamonds")
Suppose the first thing we want to see is the distribution of the :Cut variable in our diamonds data. We'll use OnlineStats.CountMap to count the number of occurrences for each unique value in the :Cut column.
When we plot the CountMap, a recipe is invoked to turn it into data that Plots knows how to display. What recipes provide, compared with say a dedicated plot_countmap function, is the ability to hook into plot attributes just as if you were plotting raw numbers.
o = CountMap(String)
fit!(o, string.(df.Cut))
plot(o, title="Neat!")
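To get a feel for what a recipe looks like, here is a minimal sketch for a made-up type. Tally is hypothetical and the counts are invented for illustration. This is not how OnlineStats implements its CountMap recipe, only an example of the @recipe macro that Plots re-exports from RecipesBase.

```julia
using Plots   # re-exports the @recipe macro from RecipesBase

# Hypothetical type for illustration only (not part of OnlineStats).
struct Tally
    counts::Dict{String,Int}
end

# The recipe turns a Tally into data and default attributes Plots understands.
@recipe function f(t::Tally)
    ks = sort(collect(keys(t.counts)))   # category names in a stable order
    seriestype --> :bar                  # a default the caller can override
    ks, [t.counts[k] for k in ks]        # x and y values
end

# Invented counts, not taken from the diamonds data.
t = Tally(Dict("Ideal" => 5, "Good" => 3, "Fair" => 1))
p = plot(t, title = "Neat!", legend = false)
```

Because the type goes through plot, attributes such as title and legend work without the recipe mentioning them.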
Try This!
Use a Different Backend
The backends of Plots can be changed interactively. Try typing plotly() to switch to the interactive JavaScript library Plotly.js, then rerun the above examples.
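A session that switches backends and back again might look like the sketch below. It assumes the Plotly backend works in your environment. gr() returns you to the default backend, and backend() reports which one is active.

```julia
using Plots

plotly()                                  # switch to the Plotly.js backend
plot(randn(10), title = "My First Plot")  # same code, different backend

gr()                                      # switch back to GR, the default
current_backend = backend()               # the backend object now in use
```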
That's It!
Now you know Plots' core principles. Time to try a few things on your own!