Julia Quickstart

By: Josh Day

Re-posted from: https://www.juliafordatascience.com/quickstart/




This post is something between an FAQ and a lightning-fast introduction to Julia.  Think of it as "First Steps #0: I've heard of Julia. What's it Like to Code in It?".  After you've read this, check out our First Steps series to keep on learning!

This page was last updated June 10, 2021.


🤔 I'm Stuck.  Where Can I Find Help?

1. Try the Julia REPL's help mode.

2. Help mode didn't answer your question?

3. Still stuck? Time to ask for help! 🙋

  • Do you think other people have the same question?
    • Yes: Please post your question on Julia Discourse for posterity! Slack messages disappear after a time and we'd love to keep our shared knowledge searchable.
    • No: Ask on the Julia Slack.

The Julia community is full of people who like to help! We'll note that it's beneficial for everyone if you ask good questions.


Working with Arrays

Creating Vectors

x = [1, 2, 3, 4]

# A "Range" doesn't store the values between 1 and 4.
y = 1:4  

# `1:4` -> `[1, 2, 3, 4]`
collect(y)  

# 1 to 100 with a step size of 3: [1, 4, 7, ..., 94, 97, 100]
1:3:100
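
To make the difference between a range and a vector concrete, here is a small sketch. The variable names `r` and `v` are ours, not part of any API.

```julia
# A range only stores its start, step, and stop.
r = 1:3:100

n = length(r)        # 34 elements: 1, 4, 7, ..., 100
lo, hi = first(r), last(r)

# `collect` materializes the range into a Vector{Int64}.
v = collect(r)

# A range behaves like a vector for indexing and comparison.
same_values = (r == v)
fifth = r[5]         # 1 + 3 * 4

# The range itself takes far less memory than the collected vector.
range_is_smaller = sizeof(r) < sizeof(v)
```

Most functions accept a range anywhere they accept a vector, so calling `collect` is rarely required.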

Creating Matrices

# Row-vector (1 x 4)
[1 2 3 4] 

# Matrix (2 x 3)
[1 2 3 ; 3 4 5]

# Matrix (100 x 3) of random Normal(0, 1) samples
randn(100, 3)
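
A few more ways to build matrices, as a quick sketch. All functions used here (`size`, `zeros`, `hcat`, `vcat`) are in Base; the variable names are just for illustration.

```julia
A = [1 2 3; 3 4 5]
dims = size(A)                 # (rows, columns)

# Spaces build a 1 x 4 Matrix; commas build a 4-element Vector.
row = [1 2 3 4]
col = [1, 2, 3, 4]

# A matrix filled with zeros (Float64 by default).
Z = zeros(2, 3)

# A two-dimensional comprehension: T[i, j] = i * j
T = [i * j for i in 1:2, j in 1:3]

# Gluing pieces together horizontally and vertically.
H = hcat([1, 2], [3, 4])       # columns side by side
V = vcat([1 2], [3 4])         # rows stacked
```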

Indexing (1-Based)

If someone tells you a language is unusable because it uses 1 (or 0)-based indexing, they are just plain wrong.

1-based indexing is a big deal for same reason most other fad topics are a big deal: it’s such a simple idea that everyone can have an opinion on it, and everyone seems to think they can “help” by telling their personal experience about how this arbitrary choice has affected them at one time in their life.

Chris Rackauckas via Julia Discourse

x = rand(100, 2)

x[3, 2]  # retrieve 3rd row of column 2
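
Here is a slightly longer indexing sketch. It uses a small fixed matrix `A` (our own example) so the results are easy to check by eye.

```julia
A = [10 20;
     30 40;
     50 60]

a = A[3, 2]        # row 3, column 2
b = A[:, 2]        # all rows of column 2 (a Vector)
c = A[2, :]        # all columns of row 2 (also a Vector)
d = A[end, 1]      # `end` means the last index along that dimension
e = A[1:2, :]      # a range selects a sub-matrix

# The first valid index of a standard Array is 1, not 0.
start = firstindex(b)
```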

Arrays are Column-Major

This means that data in a matrix is stored in computer memory with column elements next to each other.

x = rand(100, 2)

x[105] == x[5, 2]
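
The sketch below shows how a single "linear" index maps onto rows and columns. `reshape`, `vec`, `LinearIndices`, and `CartesianIndices` are all in Base; the matrix `A` is our own example.

```julia
# Fill a 2 x 3 matrix with 1:6. Values run down each column first.
A = reshape(collect(1:6), 2, 3)    # [1 3 5; 2 4 6]

# Flattening the matrix gives back the memory order.
flat = vec(A)

# Linear index 4 is the 2nd row of the 2nd column.
fourth = A[4]
linear = LinearIndices(A)[2, 2]
cartesian = CartesianIndices(A)[4]

# When looping, putting the row index in the inner loop
# visits elements in the order they sit in memory.
function column_order_sum(M)
    total = zero(eltype(M))
    for j in axes(M, 2), i in axes(M, 1)   # the last iterator (i) varies fastest
        total += M[i, j]
    end
    return total
end

s = column_order_sum(A)
```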

Working With Strings

  • A big difference from some languages is that " is different from '.
  • Strings are made "like this".
  • Character literals are made like this: 's'.
  • String concatenation is achieved via *:
julia> "Hello, " * "World!"
"Hello, World!"
  • String interpolation is achieved through $.
julia> x = "World!"
"World!"

julia> "Hello, $x"
"Hello, World!"
  • String macros change the interpretation of a string:
julia> r"[a-z]"  # I'm a regular expression!
r"[a-z]"

julia> html"<div>I'm html</div>"  # I'm HTML!
HTML{String}("<div>I'm html</div>")
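
The string features above can be sketched together in one short script. Everything used here is in Base, and the variable names are ours.

```julia
# Double quotes make a String, single quotes make a Char.
s_type = typeof("s")
c_type = typeof('s')

# Concatenation with `*` (or the `string` function) and repetition with `^`.
hello = "Hello, " * "World!"
also_hello = string("Hello, ", "World!")
repeated = "ab"^3

# Interpolation also accepts whole expressions inside `$( )`.
x = "World!"
greeting = "Hello, $x"
arithmetic = "1 + 2 = $(1 + 2)"

# A regular expression created with the `r` string macro.
has_lowercase = occursin(r"[a-z]", "ABC def")
digits = match(r"\d+", "abc123").match
```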

📦 How do I Find/Install/Load Packages?

Finding Packages

JuliaHub is a great resource for discovering packages.  We find it's a bit easier to find stuff compared to Googling.

It's hard to know which Julia packages are "the good ones" at first glance.  However, good packages tend to have similar characteristics:

  • Active development.  GitHub's pulse feature shows a summary of package activity.
  • Quality documentation.  It's a good sign when the docs are both understandable and thorough, as they are for DataFrames.
  • Other people are interested in it.  On GitHub, the Watch number is how many people receive notifications for activity, the Star number is how many people have "liked" it, and Fork is how many people have created their own copy of the package to potentially make changes to it.  It's typically a good sign when these numbers are large.
[Figure: "Interest" in DataFrames.jl]

Installing Packages

The simplest way to add packages is to use Pkg Mode in the REPL by pressing ].  The prompt will change to (current environment) pkg>.  Pressing backspace on an empty line returns you to the normal julia> prompt.

(@v1.6) pkg> add DataFrames, StatsBase

Loading Packages

using DataFrames, StatsBase

# Only bring certain names into the namespace
using StatsBase: countmap, zscore
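
Here is a sketch of the different loading styles. To keep it runnable without installing anything, it assumes the standard-library modules Statistics and LinearAlgebra in place of DataFrames and StatsBase; the loading syntax is the same.

```julia
# Bring only `mean` and `std` into the namespace.
using Statistics: mean, std

# `import` loads the module but brings in no function names;
# everything must be qualified with the module name.
import LinearAlgebra

data = [1.0, 2.0, 3.0, 4.0]

avg = mean(data)
len = LinearAlgebra.norm([3.0, 4.0])   # unqualified `norm` is not defined here
```

Selective imports make it easier to see where a name came from when reading code later.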

Using Environments

Julia lets you use different environments that use different collections of packages/package versions.  The default environment is named after your Julia version, e.g. v1.6 (note the Pkg Mode prompt above).  You can activate a new environment with:

] activate <dir>

If you make changes (e.g. add a package) to an environment, two files will be created: Project.toml and Manifest.toml.

  • What's Project.toml?  How the user tells Julia what they want installed.  Version bounds for packages go here.
  • What's Manifest.toml?  How Julia tells the user what is installed.  
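
Environments can also be managed from code with the Pkg standard library. This is a minimal sketch that assumes a throwaway temporary directory as the environment; the `Pkg.add` call is left commented out because it needs network access.

```julia
using Pkg

# Create a scratch directory and make it the active environment.
env_dir = mktempdir()
Pkg.activate(env_dir)

# The active project file lives inside that directory.
project_file = Base.active_project()

# Nothing is written until the environment changes, e.g.:
# Pkg.add("DataFrames")   # would create Project.toml and Manifest.toml

files_before_changes = readdir(env_dir)

# Calling `Pkg.activate()` with no arguments returns to the default environment.
Pkg.activate()
```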

What are Types?

  • Everything in Julia has a type.
julia> typeof(1)
Int64
  • Types can be parameterized by another type.  For example, an Array is parameterized by the type of its elements and number of dimensions.  Therefore, a vector of 64-bit integers is an Array{Int64, 1}.
julia> typeof([1,2,3])
Vector{Int64} (alias for Array{Int64, 1})
  • If we follow Int64 "up the type tree" we'll eventually run into Any, the top level abstract type.  
julia> supertype(Int64)
Signed

julia> supertype(Signed)
Integer

julia> supertype(Integer)
Real

julia> supertype(Real)
Number

julia> supertype(Number)
Any
  • Abstract types "don't exist", but they define a set of concrete types (things that exist).  For example, you can create an instance of Int64, but not Real.  Inside the set of all Real numbers, Int64 is one of many concrete types.
[Figure: Real Numbers]
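
The type relationships above can be checked directly. In this sketch, `type_chain` is a hypothetical helper we wrote for illustration; the other functions (`supertype`, `isabstracttype`, `isconcretetype`) are in Base.

```julia
# Walk "up the type tree" from T until we reach Any.
function type_chain(T)
    chain = Type[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

chain = type_chain(Int64)

# `<:` asks "is a subtype of", `isa` asks "is an instance of".
int_is_real = Int64 <: Real
one_is_real = 1 isa Real
string_is_real = "1" isa Real

# Abstract types describe sets of types; concrete types can be instantiated.
real_is_abstract = isabstracttype(Real)
int_is_concrete = isconcretetype(Int64)

# Vector{T} is just another name for Array{T, 1}.
same_type = Vector{Int64} === Array{Int64, 1}
```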

🎉 What is Multiple Dispatch? 🎉

  • Multiple dispatch is a major part of why people love Julia.  The gist of it is that you can write functions so that different/specialized code is called depending on the types of the arguments.
julia> f(x::Int) = 1
f (generic function with 1 method)

julia> f(x::Float64) = 2
f (generic function with 2 methods)

julia> f(1)
1

julia> f(1.0)
2
Super Simple Multiple Dispatch Example
  • Above we used ::Type to add a type annotation.  Since we only added methods for ::Int and ::Float64, our function f can only be called on Ints and Float64s; anything else throws a MethodError.  However, type annotations are not necessary for performance, as the next section explains.
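
To see why it is called multiple dispatch, here is a slightly larger sketch. The functions `kind` and `strict` are hypothetical names of our own. Julia picks the most specific method that matches the types of all arguments.

```julia
kind(x::Int) = "an Int"
kind(x::Float64) = "a Float64"
kind(x) = "something else"                      # fallback: no annotation means ::Any

# Dispatch considers every argument, not just the first.
kind(x::Number, y::Number) = "two numbers"
kind(x::Number, y::AbstractString) = "a number and a string"

results = (kind(1), kind(1.0), kind("hi"), kind(1, 2.0), kind(1, "a"))

# Without a fallback method, an unsupported type is a MethodError.
strict(x::Int) = 1
threw_method_error = try
    strict("not an Int")
    false
catch err
    err isa MethodError
end
```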

Automatic Specialized Code

  • Julia uses a just-in-time compiler, meaning that the first time you call a function with a new combination of argument types, Julia compiles a specialized version of the method for exactly those types.  Thus the following two functions will have the same performance!
function f(x::Type1, y::Type2, z::Type3)
    # big computation
end

function f(x, y, z)
    # big computation
end
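
As a concrete (and hypothetical) version of the two functions above, here is a sketch with an annotated and an unannotated sum of squares. The names `sumsq_annotated` and `sumsq_generic` are ours. The unannotated version is compiled separately for each input type it meets, and it also works on inputs the annotated one rejects.

```julia
function sumsq_annotated(x::Vector{Float64})
    s = 0.0
    for v in x
        s += v^2
    end
    return s
end

function sumsq_generic(x)
    s = zero(eltype(x))      # start from a zero of the element type
    for v in x
        s += v^2
    end
    return s
end

data = [1.0, 2.0, 3.0]

a = sumsq_annotated(data)
b = sumsq_generic(data)      # compiled for Vector{Float64} on this first call

# The generic version also accepts other containers and element types.
c = sumsq_generic([1, 2, 3]) # compiled again, this time for Vector{Int64}
d = sumsq_generic(1:3)       # and again for a range
```

If you want to inspect what was compiled, the `@code_typed` and `@code_llvm` macros from the InteractiveUtils standard library show the specialized code for a given call.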

What is Broadcasting?

Broadcasting is a way of applying a function to multiple inputs at once.  

  • For example, there is no mathematical definition for calling the sine function on a vector, but many languages will automatically apply sine to each element.  In Julia, you must explicitly broadcast a function over multiple inputs by adding a dot .
julia> sin([1,2,3])
ERROR: MethodError: no method matching sin(::Vector{Int64})

julia> sin.([1,2,3])
3-element Vector{Float64}:
 0.8414709848078965
 0.9092974268256817
 0.1411200080598672
  • You can even fuse broadcasted computations, which removes the need to create temporary vectors:
julia> x = [1,2,3];

julia> y = [4,5,6];

julia> z = [7,8,9];

julia> x .+ (y .* sin.(z))
3-element Vector{Float64}:
 3.6279463948751562
 6.946791233116909
 5.472710911450539
Broadcast Fusion
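
Broadcasting works with your own functions too. The sketch below assumes a made-up scalar function `square_plus_one` and also shows the `@.` macro, which adds the dots for you, and `.=`, which writes results into an existing array.

```julia
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]

# Any function can be broadcast by adding a dot.
square_plus_one(v) = v^2 + 1
squares = square_plus_one.(x)

# Scalars are "stretched" to match the array.
shifted = x .+ 10

# `@.` turns every call and operator in the expression into a dotted one.
fused = x .+ (y .* sin.(z))
fused_macro = @. x + (y * sin(z))

# `.=` stores the result in a preallocated array instead of making a new one.
out = zeros(3)
out .= x .+ (y .* sin.(z))
```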

How do I Code in Julia?

According to the 2020 Julia User & Developer Survey (PDF), Julia programmers use the following editors/IDEs "frequently":

A new coding environment on the scene is Pluto.jl, which we love!  If you are new to Julia or programming in general, we recommend starting with Pluto 🎈.


What are Macros?

Macros (names that start with @) are functions of expressions.  They let you change an expression before it gets run.  For example, @time will record both the time elapsed and allocations generated from an expression.

julia> @time begin
          sleep(1)
          sleep(2)
       end
  3.008873 seconds (8 allocations: 256 bytes)

Metaprogramming (writing code that writes other code) is a pretty advanced topic.  It's also a super powerful tool.
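
To give a taste of what "functions of expressions" means, here is a small sketch. The macro `@twice` is a toy of our own invention, not something from Base.

```julia
# Quoting code with :( ) gives you an expression object instead of running it.
ex = :(1 + 2)
head = ex.head          # the kind of expression
args = ex.args          # the function being called and its arguments
value = eval(ex)        # run the expression

# A toy macro: it receives an expression and returns code that runs it twice.
macro twice(ex)
    return quote
        $(esc(ex))
        $(esc(ex))
    end
end

counter = Ref(0)
@twice counter[] += 1

# `@elapsed` is like `@time`, but returns the seconds as a number.
seconds = @elapsed sleep(0.1)
```

After `@twice counter[] += 1`, the increment has run two times even though it was written once.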


That's it!

Did you like this post?  Have a question?  Did we miss something important?

Ping us on Twitter at @JuliaForDataSci 🚀
