By: Josh Day
Re-posted from: https://www.juliafordatascience.com/quickstart/
This post is something between a FAQ and a lightning-fast introduction to Julia. Think of it as "First Steps #0: I've heard of Julia. What's it Like to Code in It?". After you've read this, check out our First Steps series to keep on learning!
This page was last updated June 10, 2021.
🤔 I'm Stuck. Where Can I Find Help?
1. Try the Julia REPL's help mode.
- Help mode is enabled by typing `?`. You can search for anything and display its documentation: functions, macros, types, variables, etc.
2. Help mode didn't answer your question?
- Search on Julia Discourse.
3. Still stuck? Time to ask for help! 🙋
- Do you think other people have the same question?
- Yes: Please post your question on Julia Discourse for posterity! Slack messages disappear after a time and we'd love to keep our shared knowledge searchable.
- No: Ask on the Julia Slack.
The Julia community is full of people who like to help! We'll note that it's beneficial for everyone if you ask good questions.
Working with Arrays
Creating Vectors
x = [1, 2, 3, 4]
# A "Range" doesn't store the values between 1 and 4.
y = 1:4
# `1:4` -> `[1, 2, 3, 4]`
collect(y)
# 1 to 100 with a step size of 3: [1, 4, 7, ..., 94, 97, 100]
1:3:100
Creating Matrices
# Row-vector (1 x 4)
[1 2 3 4]
# Matrix (2 x 3)
[1 2 3 ; 3 4 5]
# Matrix (100 x 3) of random Normal(0, 1) samples
randn(100, 3)
Indexing (1-Based)
If someone tells you a language is unusable because it uses 1 (or 0)-based indexing, they are just plain wrong.
1-based indexing is a big deal for same reason most other fad topics are a big deal: it’s such a simple idea that everyone can have an opinion on it, and everyone seems to think they can “help” by telling their personal experience about how this arbitrary choice has affected them at one time in their life.
x = rand(100, 2)
x[3, 2] # retrieve 3rd row of column 2
Arrays are Column-Major
This means that data in a matrix is stored in computer memory with column elements next to each other.
x = rand(100, 2)
x[105] == x[5, 2]
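To make the mapping between linear and row/column indices concrete, here is a small sketch using `LinearIndices`, `CartesianIndices`, and `vec` from Base. The helper `colsum` is a hypothetical name used only for illustration; it shows the loop order that walks memory in storage order.

```julia
x = rand(100, 2)

# Linear index 105 maps to row 5, column 2 because columns are stored back to back
li = LinearIndices(x)[5, 2]        # 105
ci = CartesianIndices(x)[105]      # CartesianIndex(5, 2)

# `vec` flattens the matrix in the same column-first order
same = vec(x)[105] == x[5, 2]      # true

# Hypothetical helper: the innermost loop (i) runs down a column,
# so consecutive iterations touch neighboring memory locations
function colsum(a)
    total = zero(eltype(a))
    for j in axes(a, 2), i in axes(a, 1)
        total += a[i, j]
    end
    return total
end

colsum(ones(3, 2))                 # 6.0
```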
Working With Strings
- A big difference from some languages is that `"` is different from `'`.
- Strings are made `"like this"`.
- Character literals are made like this: `'s'`.
- String concatenation is achieved via `*`.
- String interpolation is achieved through `$`.
- String macros change the interpretation of a string:
julia> r"[a-z]" # I'm a regular expression!
r"[a-z]"
julia> html"<div>I'm html</div>" # I'm HTML!
HTML{String}("<div>I'm html</div>")
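The bullets above can be sketched in a few lines. The variable names (`greeting`, `mark`, `n`, and so on) are made up for illustration.

```julia
greeting = "Hello"    # a String (double quotes)
mark = '!'            # a Char (single quotes)

# Concatenation with `*` accepts both strings and characters
shout = greeting * ", world" * mark

# Interpolation with `$`; wrap expressions in `$(...)`
n = 3
msg = "$greeting has $(length(greeting)) letters and n + 1 = $(n + 1)"

# A regex string macro in use: does the string contain a lowercase letter?
has_lowercase = occursin(r"[a-z]", greeting)

println(shout)
println(msg)
println(has_lowercase)
```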
📦 How do I Find/Install/Load Packages?
Finding Packages
JuliaHub is a great resource for discovering packages. We find it's a bit easier to find stuff compared to Googling.
It's hard to know which Julia packages are "the good ones" at first glance. However, good packages tend to have similar characteristics:
- Active development. GitHub's pulse feature shows a summary of package activity.
- Quality documentation. It's a good sign when the docs are both understandable and thorough, as they are for DataFrames.
- Other people are interested in it. On GitHub, the Watch number is how many people receive notifications for activity, the Star number is how many people have "liked" it, and the Fork number is how many people have created their own copy of the package to potentially make changes to it. It's typically a good sign when these numbers are large.
Installing Packages
The simplest way to add packages is to use Pkg Mode in the REPL by pressing `]`. You'll notice the prompt will change to `(current environment) pkg>`:
(@v1.6) pkg> add DataFrames, StatsBase
Loading Packages
using DataFrames, StatsBase
# Only bring certain names into the namespace
using StatsBase: countmap, zscore
Using Environments
Julia lets you work in separate environments, each with its own collection of packages and package versions. The default environment is `v1.6` (note the Pkg Mode prompt above). You can activate a new environment with:
] activate <dir>
If you make changes (e.g. add a package) to an environment, two files will be created: Project.toml and Manifest.toml.
- What's Project.toml? How the user tells Julia what they want installed. Version bounds for packages go here.
- What's Manifest.toml? How Julia tells the user what is installed.
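Environments can also be managed from ordinary Julia code through the `Pkg` standard library. The sketch below activates a throwaway environment in a temporary directory (the names `env_dir` and `project_file` are made up for illustration) and leaves the package installation commented out because it needs network access.

```julia
using Pkg

# Create and activate a fresh environment in a temporary directory
env_dir = mktempdir()
Pkg.activate(env_dir)

# The active project file now points inside `env_dir`
project_file = Base.active_project()
println(project_file)

# Adding a package would write Project.toml and Manifest.toml there, e.g.:
# Pkg.add("DataFrames")   # requires network access, so it is not run here
```

Calling `Pkg.activate()` with no arguments switches back to the default environment.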
What are Types?
- Everything in Julia has a type.
julia> typeof(1)
Int64
- Types can be parameterized by another type. For example, an `Array` is parameterized by the type of its elements and number of dimensions. Therefore, a vector of 64-bit integers is an `Array{Int64, 1}`.
julia> typeof([1,2,3])
Vector{Int64} (alias for Array{Int64, 1})
- If we follow `Int64` "up the type tree" we'll eventually run into `Any`, the top level abstract type.
julia> supertype(Int64)
Signed
julia> supertype(Signed)
Integer
julia> supertype(Integer)
Real
julia> supertype(Real)
Number
julia> supertype(Number)
Any
- Abstract types "don't exist", but they define a set of concrete types (things that exist). For example, you can create an instance of `Int64`, but not `Real`. Inside the set of all `Real` numbers, `Int64` is one of many concrete types.
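The abstract/concrete distinction can be checked directly with functions from Base. The helper `type_chain` below is a hypothetical name; it just repeats the `supertype` calls shown earlier until it reaches `Any`.

```julia
# Hypothetical helper: collect a type and all of its supertypes up to Any
function type_chain(T::Type)
    chain = Type[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

chain = type_chain(Int64)
println(chain)

isabstracttype(Real)     # true: Real only groups other types
isconcretetype(Int64)    # true: values can actually have this type
Int64 <: Real            # true: Int64 sits below Real in the type tree
3 isa Real               # true: a value of a concrete subtype "is a" Real
```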
🎉 What is Multiple Dispatch? 🎉
- Multiple dispatch is a major part of why people love Julia. The gist of it is that you can write functions so that different/specialized code is called depending on the types of the arguments.
- A type annotation is written `::Type`. If a function `f` only has methods for `::Int` and `::Float64`, it can only be called on `Int`s and `Float64`s. However, type annotations are not necessary.
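A minimal sketch of the idea, assuming a one-argument function `f` with hypothetical method bodies, plus a second hypothetical function `g` that shows an unannotated method acting as a fallback:

```julia
# One method per argument type; Julia picks the method from the argument's type
f(x::Int) = "Int: $x"
f(x::Float64) = "Float64: $x"

println(f(1))      # uses the Int method
println(f(1.0))    # uses the Float64 method

# No method of `f` accepts a String, so this call throws a MethodError
threw = try
    f("one")
    false
catch err
    err isa MethodError
end
println(threw)

# Without an annotation, a method accepts any type and serves as a fallback
g(x) = "something else"
g(x::Int) = "an Int"

println(g(2))
println(g("two"))
```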
Automatic Specialized Code
- Julia uses a just-in-time compiler: the first time you call a function with a new combination of argument types, Julia compiles a specialized version of the method for exactly those types. Thus the following two functions will have the same performance!
function f(x::Type1, y::Type2, z::Type3)
# big computation
end
function f(x, y, z)
# big computation
end
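To try this yourself, here is a runnable sketch with two hypothetical functions, `sumsq_typed` and `sumsq_untyped`, that differ only in their annotations. Each is called once first so that compilation time is not included in the measurement; the timings should then be in the same ballpark, though exact numbers will vary by machine.

```julia
# Annotated version: only accepts a Vector{Float64}
function sumsq_typed(x::Vector{Float64})
    total = 0.0
    for v in x
        total += v^2
    end
    return total
end

# Unannotated version: Julia still compiles a specialization for each input type
function sumsq_untyped(x)
    total = zero(eltype(x))
    for v in x
        total += v^2
    end
    return total
end

data = rand(1_000)

# Warm-up calls trigger compilation for Vector{Float64}
sumsq_typed(data)
sumsq_untyped(data)

@time sumsq_typed(data)
@time sumsq_untyped(data)

# The unannotated version also works for other element types
sumsq_untyped([1, 2, 3])   # 14
```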
What is Broadcasting?
Broadcasting is a way of applying a function to multiple inputs at once.
- For example, there is no mathematical definition for calling the sine function on a vector, but many languages will automatically apply sine to each element. In Julia, you must explicitly broadcast a function over multiple inputs by adding a dot (`.`).
julia> sin([1,2,3])
ERROR: MethodError: no method matching sin(::Vector{Int64})
julia> sin.([1,2,3])
3-element Vector{Float64}:
0.8414709848078965
0.9092974268256817
0.1411200080598672
- You can even fuse broadcasted computations, which removes the need to create temporary vectors:
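A short sketch of fusion, using made-up variable names. Every dotted call and operator in one expression is combined into a single loop over the elements, so no intermediate arrays are allocated for `sin.(x)` or `cos.(x)`.

```julia
x = [1.0, 2.0, 3.0]

# All dotted calls fuse into one loop over x
y = sin.(x) .+ 2 .* cos.(x)

# The `@.` macro adds the dots to every call and operator for you
z = @. sin(x) + 2 * cos(x)

# `.=` writes the fused result into an existing array instead of allocating a new one
out = similar(x)
out .= sin.(x) .+ 2 .* cos.(x)

println(y)
```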
How do I Code in Julia?
According to the 2020 Julia User & Developer Survey (PDF), Julia programmers use the following editors/IDEs "frequently":
- 39% use Juno.
- 35% use VS Code with the Julia Plug-in.
- 31% use JupyterLab.
A new coding environment on the scene is Pluto.jl, which we love! If you are new to Julia or programming in general, we recommend starting with Pluto 🎈.
What are Macros?
Macros (names that start with `@`) are functions of expressions. They let you change an expression before it gets run. For example, `@time` will record both the time elapsed and allocations generated from an expression.
julia> @time begin
           sleep(1)
           sleep(2)
       end
  3.008873 seconds (8 allocations: 256 bytes)
Metaprogramming (writing code that writes other code) is a pretty advanced topic. It's also a super powerful tool.
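For a small taste, the sketch below quotes an expression, inspects it, and defines a toy macro. The macro name `@twice` and the variable `counter` are hypothetical and exist only for this illustration.

```julia
# Quoting turns code into a data structure (an Expr) instead of running it
ex = :(1 + 2)
println(ex.head)       # call
println(ex.args)       # Any[:+, 1, 2]
result = eval(ex)      # evaluating the expression gives 3

# A toy macro: it receives the expression and returns new code that runs it twice
macro twice(ex)
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1
println(counter[])     # 2
```

Here `esc` tells Julia to evaluate the expression in the caller's scope, which is why `counter` resolves to the variable defined outside the macro.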
That's it!
Did you like this post? Have a question? Did we miss something important?
Ping us on Twitter at @JuliaForDataSci 🚀