Re-posted from: https://white.ucc.asn.au/2019/04/03/Julia-Nomenclature.html
These are some terms that get thrown around a lot by Julia programmers.
This is a brief writeup of a few of them.
Closures
Closures are when a function is created (normally via returning from another function)
that references some variable in the enclosing scope.
We say that it closes over those variables.
Simple
This closure closes over count
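The notebook cells are lost in this re-post; a typical minimal example of a closure over count (a sketch, names assumed) looks like:

```julia
# A counter: the returned function closes over `count`.
function make_counter()
    count = 0
    return function ()
        count += 1   # mutates the closed-over variable
        return count
    end
end

counter = make_counter()
counter()  # 1
counter()  # 2
```

Each call to make_counter creates a fresh count, so independent counters do not interfere with each other.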
Useful
I use this to control early stopping when training neural networks.
This closes over best_loss and remaining_patience.
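The code cells are lost in this re-post; a minimal sketch of such an early stopper (names and exact logic assumed):

```julia
# Assumed sketch: stop when the loss has not improved for `patience` steps.
function make_earlystopper(patience)
    best_loss = Inf
    remaining_patience = patience
    function should_stop!(loss)
        if loss < best_loss
            best_loss = loss               # closes over best_loss
            remaining_patience = patience  # closes over remaining_patience
        else
            remaining_patience -= 1
        end
        return remaining_patience <= 0
    end
    return should_stop!
end

should_stop! = make_earlystopper(2)
should_stop!(1.0)  # false: an improvement resets the patience
should_stop!(2.0)  # false: one strike
should_stop!(3.0)  # true: patience exhausted
```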
You may be using closures without realising it.
E.g. the following closes over model:
function runall(dates)
    model = Model()
    pmap(dates) do the_day
        simulate(model, the_day)
    end
end
Parallelism
3 types:
- Multiprocessing / Distributed
- Multithreading / Shared Memory
- Asynchronous / Coroutines
Multiprocessing / Distributed
- this is pmap, remotecall, @spawn
- Actually starts separate Julia processes
- potentially on another machine
- Often has high communication overhead
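A small sketch (function name assumed) of farming work out to worker processes with pmap:

```julia
using Distributed
addprocs(2)  # start two separate worker processes

# @everywhere defines the function on all processes, not just the master.
@everywhere slow_double(x) = (sleep(0.1); 2x)

pmap(slow_double, 1:4)  # [2, 4, 6, 8], computed on the workers
```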
Multithreading / Shared Memory
- this is @threads
- Also coming in Julia 1.2 is PARTR
- Can be unsafe; care must always be taken to do things in a thread-safe way
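A minimal sketch (function name assumed) of a thread-safe parallel sum; the accumulator is an Atomic so concurrent updates do not race:

```julia
using Base.Threads

function threaded_sum(xs)
    total = Atomic{Int}(0)
    @threads for i in eachindex(xs)
        atomic_add!(total, xs[i])  # atomic update keeps this thread-safe
    end
    return total[]
end

threaded_sum(collect(1:100))  # 5050, regardless of thread count
```

Using `total += xs[i]` on a plain variable here would be a data race; the Atomic (or a lock) is what makes it safe.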
Asynchronous / Coroutines
- this is @async and asyncmap
- Does not actually allow two things to run at once, but allows tasks to take turns running
- Mostly safe
- Does not lead to speedup unless the “work” is done elsewhere
- e.g. in IO, the time is spent filling network buffers / spinning up disks
- e.g. if you are spawning extra processes, like with run, the time is spent in those processes
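A small sketch: asyncmap lets waiting tasks overlap, so four 0.1-second sleeps (standing in for IO work) finish in roughly 0.1 seconds rather than 0.4:

```julia
# sleep yields to the scheduler, so the tasks take turns waiting
results = asyncmap(1:4) do i
    sleep(0.1)  # stands in for IO work
    i^2
end
# results == [1, 4, 9, 16]
```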
Dynamic Dispatch vs Static Dispatch
- If which method to call needs to be decided at runtime, then it will be a dynamic dispatch
- i.e. if it needs to be decided by the values of the input, or by external factors
- If it can be decided at compile time, it will be a static dispatch
- i.e. if it can be decided only by the types of the input
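The original notebook cells are lost; a small sketch (the function names here are made up) of the difference:

```julia
half(x::Integer) = x ÷ 2
half(x::AbstractFloat) = x / 2

# Static dispatch: which `half` to call is known from the input types alone.
static_half(x::Int) = half(x)

# Dynamic dispatch: y's type depends on x's *value*,
# so the method has to be looked up at runtime.
function dynamic_half(x::Int)
    y = x >= 0 ? x : Float64(x)
    return half(y)
end
```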
Type Stability
Closely related to Dynamic vs Static Dispatch
- If the return type can be decided at compile time, then it is type stable
- i.e. if the return type is decided only by the types of the input
- If the return type can’t be decided until run time, then it is type unstable
- i.e. if the return type is decided by the values of the input, or by external factors
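Again the cells are lost in this re-post; a minimal sketch (function names assumed). Running @code_warntype on the unstable one will highlight the problem:

```julia
# Type stable: always returns an Int, whatever the value of x.
stable(x::Int) = x > 0 ? x : zero(x)

# Type unstable: returns an Int or a Float64 depending on x's *value*.
unstable(x::Int) = x > 0 ? x : 0.0
```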
Type Piracy
If your package did not define either
- the function (name); or
- at least 1 of the argument types
then you are committing type piracy, and this is a bad thing.
By committing type piracy you can break code in other modules, even if they don’t import your definitions.
Let’s define a new method to reduce the magnitude of the first element by the first argument, and of the second element by the second.
We are going to call it mapreduce,
because it is kind of mapping this reduction in magnitude.
And because this is a slightly forced example.
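The defining cell is lost in this re-post; judging from the stack trace below, it was something along these lines (a reconstruction, not the author’s exact code):

```julia
# Type piracy: we own neither the function `mapreduce` (it is from Base)
# nor any of the argument types.
function Base.mapreduce(y1, y2, xs::Array)
    return sign.(xs) .* (abs.(xs) .- abs.([y1, y2]))
end

Base.mapreduce(1, -2, [-10, 20])  # [-9, 18]
```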
Let’s sum some numbers:
Output:
DimensionMismatch("arrays could not be broadcast to a common size")
Stacktrace:
[1] _bcs1 at ./broadcast.jl:438 [inlined]
[2] _bcs at ./broadcast.jl:432 [inlined]
[3] broadcast_shape at ./broadcast.jl:426 [inlined]
[4] combine_axes at ./broadcast.jl:421 [inlined]
[5] _axes at ./broadcast.jl:208 [inlined]
[6] axes at ./broadcast.jl:206 [inlined]
[7] combine_axes at ./broadcast.jl:422 [inlined]
[8] combine_axes at ./broadcast.jl:421 [inlined]
[9] instantiate at ./broadcast.jl:255 [inlined]
[10] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(sign),Tuple{Array{Int64,1}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(-),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(abs),Tuple{Array{Int64,1}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(abs),Tuple{Array{Function,1}}}}}}}) at ./broadcast.jl:753
[11] mapreduce(::Function, ::Function, ::Array{Int64,1}) at ./In[19]:3
[12] _sum at ./reducedim.jl:653 [inlined]
[13] _sum at ./reducedim.jl:652 [inlined]
[14] #sum#550 at ./reducedim.jl:648 [inlined]
[15] sum(::Array{Int64,1}) at ./reducedim.jl:648
[16] top-level scope at In[21]:1
Glue Packages
Sometimes, to make two packages work together,
you have to make them aware of each other’s types.
For example, to implement
convert(::Type{DataFrame}, axisarray::AxisArray)
where
- convert is from Base
- DataFrame is from DataFrames.jl
- AxisArray is from AxisArrays.jl
then the only way to do this without type piracy is to do it in either DataFrames.jl or AxisArrays.jl.
But that isn’t possible without one of them adding a dependency on the other, which isn’t great.
So instead we have a glue package, e.g. DataFrameAxisArrayBuddies.jl,
that adds this method.
It is piracy, but it is fairly safe, since it is adding behaviour to types that would otherwise just throw a MethodError. Misdemeanor type piracy.
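As a self-contained illustration (these module, type, and field names are invented stand-ins, not the real packages):

```julia
module AxisArraysLike  # stands in for AxisArrays.jl
    struct AxisArray
        data::Matrix{Float64}
        colnames::Vector{Symbol}
    end
end

module DataFramesLike  # stands in for DataFrames.jl
    struct DataFrame
        columns::Dict{Symbol,Vector{Float64}}
    end
end

module GlueBuddies  # the glue package: owns neither type, nor `convert`
    using ..AxisArraysLike, ..DataFramesLike

    # Misdemeanor piracy: without this definition, the call below
    # would simply be a MethodError, so little existing code can break.
    function Base.convert(::Type{DataFramesLike.DataFrame}, a::AxisArraysLike.AxisArray)
        cols = Dict(name => a.data[:, i] for (i, name) in enumerate(a.colnames))
        return DataFramesLike.DataFrame(cols)
    end
end
```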
Wrapper Types and Delegation Pattern
I would argue that this is a core part of polymorphism via composition.
In the following example, we construct SampledVector,
which is a vector-like type that has fast access to the total, so that it can quickly calculate the mean.
It is a wrapper of the Vector type,
and it delegates several methods to it.
Even though it overloads Statistics.mean, and push!, size and getindex from Base,
we do not commit type piracy, as we always own one of the types: the SampledVector.
We also delegate size and getindex to the backing vector.
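The defining cells are lost in this re-post; a minimal sketch of what they plausibly contained (details assumed):

```julia
using Statistics

# A vector-like wrapper that keeps a running total of its elements.
mutable struct SampledVector{T} <: AbstractVector{T}
    total::T
    backing::Vector{T}
end
SampledVector{T}() where T = SampledVector{T}(zero(T), T[])

# Overloading mean is not piracy: we own the SampledVector type.
Statistics.mean(v::SampledVector) = v.total / length(v)

function Base.push!(v::SampledVector, x)
    v.total += x
    push!(v.backing, x)
    return v
end

# Delegate size and getindex to the backing Vector.
Base.size(v::SampledVector) = size(v.backing)
Base.getindex(v::SampledVector, i) = getindex(v.backing, i)
```

For example, pushing 1.0 and 3.0 and then calling mean gives 2.0, without re-summing the backing array.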
Views vs Copies
In Julia, indexing slices from arrays produces a copy.
ys = xs[1:3, :]
will allocate a new array with the first 3 rows of xs copied into it.
Modifying ys will not modify xs.
Further, ys is certain to work fast with suitable CPU operations, because of its striding.
However, allocating memory is itself quite slow.
In contrast, one can take a view into the existing array, with @view or the function view.
ys = @view xs[1:3, :]
will make a SubArray, which acts like an array that contains only the first 3 rows of xs.
But creating it will not allocate (in Julia 1.5 literally not at all; prior to that, it will allocate a handful of bytes for a pointer).
Further, mutating the content of ys will mutate the content of xs.
It may or may not be able to hit very fast operations, depending on the striding pattern.
It will probably be pretty good nonetheless, since allocation is very slow.
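A small demonstration of the difference, on a toy array:

```julia
xs = [1 2; 3 4; 5 6; 7 8]

ys = xs[1:3, :]          # indexing: allocates a copy
ys[1, 1] = 100
@assert xs[1, 1] == 1    # the original is untouched

zs = @view xs[1:3, :]    # a SubArray into the same memory
zs[1, 1] = 100
@assert xs[1, 1] == 100  # mutating the view mutates xs
```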
Note this is a difference from numpy, where indexing always gives views and you opt out by calling copy(x[...]),
and from MATLAB, where indexing always copies; it has been long enough that I don’t remember how to opt into views there.
The concept of views vs copies is more general than just arrays.
SubStrings are views into strings.
They also apply to DataFrames.
Indexing into DataFrames is fairly intuitive, though when you write it down it looks complex.
@view and indexing work as they do with Arrays: normal indexing creates a new DataFrame (or a vector, if just one column is indexed) with a copy of the selected region, and @view makes a SubDataFrame.
But there is an additional interesting case: accessing a DataFrame column, either by getproperty (as in df.name) or via ! indexing (as in df[!, :name]), creates what is conceptually a view into that column of the DataFrame.
Even though it is AbstractVector typed (rather than SubArray typed), it acts like a view, in that creating it is non-allocating, and mutating it mutates the original DataFrame.
Implementation-wise it is actually direct access to the DataFrame’s internal column storage, but semantically it is a view into that column of the DataFrame.
Tim Holy Traits
Traits are something that naturally falls out of having functions that can be performed on types at compile time, and of having multiple dispatch.
See the previous post for details, and a better future post.
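As a tiny sketch of the pattern (all names here are invented):

```julia
# Trait types: values of these are returned by the trait function.
struct ArrayLike end
struct NotArrayLike end

# The trait function is computable from the type alone, at compile time.
arraytrait(::Type) = NotArrayLike()
arraytrait(::Type{<:AbstractArray}) = ArrayLike()

# Dispatch on the trait value rather than on the type directly.
describe(x::T) where T = describe(arraytrait(T), x)
describe(::ArrayLike, x) = "array-like"
describe(::NotArrayLike, x) = "not array-like"
```

Because arraytrait is resolved from types, the compiler can usually fold the extra dispatch away entirely.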