Author Archives: Andrew Collier

#MonthOfJulia Day 5: Collections

Julia caters for various collection types including tuples and arrays, dictionaries, sets, dequeues, priority queues and heaps. There’s a lot of functionality distributed across these different structures, so we’ll only skim the surface and pick out a few interesting bits and pieces.

An Array is really the most important workhorse collection (IMHO). Julia can handle arrays of arbitrary dimension, but we’ll only have a look at the most commonly used, which are 1D and 2D.

julia> x = [-7, 1, 2, 3, 5]
5-element Array{Int64,1}:
 -7
  1
  2
  3
  5
julia> typeof(x)
Array{Int64,1}
julia> eltype(x)
Int64
julia> y = [3, "foo", 'a']                 # Elements can be of mixed type
3-element Array{Any,1}:
   3     
    "foo"
 'a'     
julia> typeof(y)                           # Type of the Array itself
Array{Any,1}
julia> eltype(y)                           # Type of the elements in the Array
Any

Type promotion is applied to an array with mixed content (like the second example above, which contains an integer, a string and a character), elevating the element type to a common ancestor, which in the example is Any.

The usual indexing operations apply, noting that in Julia indices are 1-based.

julia> x[1]                                # First element
-7
julia> getindex(x, [1, 3])                 # Alternative syntax
2-element Array{Int64,1}:
 -7
  2
julia> x[end]                              # Last element
5
julia> x[end-1]                            # Penultimate element
3
julia> x[2:4]                              # Slicing
3-element Array{Int64,1}:
 1
 2
 3
julia> x[2:4] = [1, 5, 9]                  # Slicing with assignment
3-element Array{Int64,1}:
 1
 5
 9

An Array can be treated like a stack or queue, where additional items can be popped from or pushed onto the “end” of the collection. Functions shift!() and unshift!() do analogous operations to the “front” of the collection.

julia> pop!(x)                             # Returns last element and remove it from array.
5
julia> push!(x, 12)                        # Append value to end of array.
5-element Array{Int64,1}:
 -7
  1
  5
  9
 12
julia> append!(x, 1:3)                     # Append one array to the end of another array.
8-element Array{Int64,1}:
 -7
  1
  5
  9
 12
  1
  2
  3

What about a 2D array (or matrix)? Not too many surprises here. With reference to the examples above we can see that a 1D array is effectively a column vector.

julia> M = [1 2 3; 4 5 6; 7 8 9]
3x3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9
julia> N = [1 2; 2 3; 3 4]
3x2 Array{Int64,2}:
 1  2
 2  3
 3  4
julia> M[2,2]                      # [row,column]
5
julia> M[1:end,1]
3-element Array{Int64,1}:
 1
 4
 7
julia> M[1,:]                      # : is the same as 1:end
1x3 Array{Int64,2}:
 1  2  3

Collections are copied by reference. A shallow copy can be created with copy(). If you want a truly distinct collection of objects you need to use deepcopy().

And now a taste of the other collection types, starting with the tuple.

julia> a, b, x, text = 1, 2, 3.5, "Hello"
(1,2,3.5,"Hello")
julia> a, b = b, a                         # I never get tired of this!
(1,2)

A dictionary is just a collection of key-value pairs.

ulia> stuff = {"number" => 43, 1 => "zap!", 2.5 => 'x'}
Dict{Any,Any} with 3 entries:
  "number" => 43
  2.5      => 'x'
  1        => "zap!"
julia> stuff["number"]
43

Sets are unordered collections which are not indexed and do not allow duplicates.

julia> S1 = Set([1, 2, 3, 4, 5])           # Set{Int64}
Set{Int64}({4,2,3,5,1})
julia> S2 = Set({3, 4, 5, 6, 7})           # Set{Any}
Set{Any}({7,4,3,5,6})
julia> union(S1, S2)
Set{Any}({7,4,2,3,5,6,1})
julia> intersect(S1, S2)
Set{Int64}({4,3,5})
julia> setdiff(S2, S1)
Set{Any}({7,6})

We’ll see more about collections when we look at Julia’s functional programming capabilities, which will be in the next but one installment. In the meantime you can find the full code for today’s flirtation with Julia on github.

The post #MonthOfJulia Day 5: Collections appeared first on Exegetic Analytics.

#MonthOfJulia Day 4: Functions

Julia performs Just-in-Time (JIT) compilation using a Low Level Virtual Machine (LLVM) to create machine-specific assembly code. The first time a function is called, Julia compiles the function’s source code and the results are cached and used for any subsequent calls to the same function. However, there are some additional wrinkles to this story.

Julia caters for generic functions which will operate on variables with a variety of data types. But it gains a lot of speed by compiling specialised versions of a function which are optimised to work on particular data types. As a result the code generated by Julia achieves performance which is comparable to that of lower level languages like C and FORTRAN, as illustrated by the benchmark results below.

Julia benchmarks.

Declaring Functions

There are multiple routes to defining a function in Julia, ranging from verbose to succinct. The last statement in a function definition becomes the return value, so it is not necessary to have an explicit return statement (although for clarity it is often a good idea to be explicit!).

# Verbose form.
#
function square(x)
    return x^2                  # The return keyword is redundant, but still a good idea.
end

# One-line form.
#
square(x) = x^2
hypotenuse(x, y) = sqrt(x^2 + y^2)
x² = x -> x*x                   # To get x² type x^2 followed by Tab in the REPL.

# Anonymous (lambda) form.
#
x -> x^2
#
anonsquare = function (x)       # Creates an anonymous function and assigns it to anonsquare.
    x^2
end

The functions defined above are generic in the sense that they do not pertain to arguments of any particular data type: they should work for all (reasonable) arguments.

Functions can return multiple values as a tuple.

julia> function divide(n, m)
       	div(n,m), n%m
       end
divide (generic function with 1 method)
julia> divide(7,3)
(2,1)

Julia has an interesting convention for functions with side effects: function names which end in a ! modify their first argument. For example, sort() will return a sorted version of its input, while sort!() will reorder the elements of the input in place. Maintaining this syntax is important since all function arguments are passed by reference and it is thus perfectly permissible that the values of arguments be modified within a function.

Introspection

We can get a glimpse of at what happens under the hood during the compilation process. code_llvm() returns the bytecode generated by the LLVM for a generic function when its applied to an argument of a given type. For example, below we see the bytecode for the function sqrt() applied to a 64 bit floating point argument.

julia> code_llvm(sqrt, (Float64,))

define double @julia_sqrt_20446(double) {
top:
  %1 = fcmp uge double %0, 0.000000e+00, !dbg !1150
  br i1 %1, label %pass, label %fail, !dbg !1150

fail:                                             ; preds = %top
  %2 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !1150, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %2, i32 131), !dbg !1150
  unreachable, !dbg !1150

pass:                                             ; preds = %top
  %3 = call double @llvm.sqrt.f64(double %0), !dbg !1150
  ret double %3, !dbg !1150
}

Digging deeper we can scrutinise the native assembly instructions using code_native().

julia> code_native(sqrt, (Float64,))
	.text
Filename: math.jl
Source line: 131
	push	RBP
	mov	RBP, RSP
	xorpd	XMM1, XMM1
	ucomisd	XMM1, XMM0
Source line: 131
	ja	6
	sqrtsd	XMM0, XMM0
	pop	RBP
	ret
	movabs	RAX, 140208263704872
	mov	RDI, QWORD PTR [RAX]
	movabs	RAX, 140208248879392
	mov	ESI, 131
	call	RAX

Not being a Computer Scientist, this information is of limited use to me. But it will serve to illustrate the next point. As one might expect, the assembly code generated for a particular function depends on type of the arguments being fed into that function. If, for example, we look at the assembly code for sqrt() applied to a 64 bit integer argument then we see that there are a number of differences.

julia> code_native(sqrt, (Int64,))
	.text
Filename: math.jl
Source line: 133
	push	RBP
	mov	RBP, RSP
Source line: 133
	cvtsi2sd	XMM0, RDI
	xorpd	XMM1, XMM1
	ucomisd	XMM1, XMM0
	ja	6
	sqrtsd	XMM0, XMM0
	pop	RBP
	ret
	movabs	RAX, 140208263704872
	mov	RDI, QWORD PTR [RAX]
	movabs	RAX, 140208248879392
	mov	ESI, 133
	call	RAX

So the JIT compiler is effectively generating versions of the generic function sqrt() which are specialised for different argument types. This is good for performance because the code being executed is always optimised for the appropriate argument types.

Type Specification and Multiple Dispatch

Julia implements multiple dispatch, which means that the code executed for a specific function depends dynamically on the data type of the arguments. In the case of generic functions, each time that a function is called with a new data type specialised code is generated. But it’s also possible to define different versions of a function which are selected on the basis of the argument data type. These would then be applied in lieu of the corresponding generic function. There is an obvious parallel with function overloading.

The Julia documentation articulates these ideas better than I can:

To facilitate using many different implementations of the same concept smoothly, functions need not be defined all at once, but can rather be defined piecewise by providing specific behaviors for certain combinations of argument types and counts. A definition of one possible behavior for a function is called a method. Thus far, we have presented only examples of functions defined with a single method, applicable to all types of arguments. However, the signatures of method definitions can be annotated to indicate the types of arguments in addition to their number, and more than a single method definition may be provided. When a function is applied to a particular tuple of arguments, the most specific method applicable to those arguments is applied. Thus, the overall behavior of a function is a patchwork of the behaviors of its various method definitions. If the patchwork is well designed, even though the implementations of the methods may be quite different, the outward behavior of the function will appear seamless and consistent.

Specific data types are indicated by the :: type annotation operator. So, for example, we can create a version of square() which does something unique (and inane) for integer arguments.

julia> function square(x::Int64)
           println("Squaring an integer.")
           x^2
       end
square (generic function with 2 methods)
julia> square(1.5)                         # Using the generic function
2.25
julia> square(2)                           # Using the function specialised for Int64 arguments
Squaring an integer.
4

You can get a list of the methods associated with a function using methods(). Below we have both the specialised (Int64) version as well as the fully generic version of square().

julia> methods(square)
# 2 methods for generic function "square":
square(x::Int64) at none:2
square(x) at none:1

Type annotations can also be used within the local scope of a function body (as well as for, while, try, let, and type blocks).

function foo(x)
	y::Float64 = 3
	return(x + y)
end

I’ve really only scratched the surface here. There’s a lot more to say about default, positional and keyword arguments, operators, parametric types and a host of other topics. You can find further details of today’s Julia ramblings on github and there’s even more in the documentation for functions and methods.

The post #MonthOfJulia Day 4: Functions appeared first on Exegetic Analytics.

#MonthOfJulia Day 3: Variables and Data Types

Most coding involves the assignment and manipulation of variables. Julia is dynamically typed, which means that you don’t need to declare explicitly a variable’s data type. It also means that a single variable name can be associated with different data types at various times. Julia has a sophisticated, yet extremely flexible, system for dealing with data types. covered in great detail by the official documentation. My notes below simply highlight some salient points I uncovered while digging around.

One of the major drawbacks of dynamically typed languages is that they generally sacrifice performance for convenience. This does not apply with Julia, as explained in the quote below.

The landscape of computing has changed dramatically over the years. Modern scientific computing environments such as MATLAB, R, Mathematica, Octave, Python (with NumPy), and SciLab have grown in popularity and fall under the general category known as dynamic languages or dynamically typed languages. In these programming languages, programmers write simple, high-level code without any mention of types like int, float or double that pervade statically typed languages such as C and Fortran.

Julia is dynamically typed, but it is different as it approaches statically typed performance. New users can begin working with Julia as they did in the traditional numerical computing languages, and work their way up when ready. In Julia, types are implied by the computation itself together with input values. As a result, Julia programs are often completely generic and compute with data of different input types without modification—a feature known as “polymorphism.”

Julia: A Fresh Approach to Numerical Computing

Variables and Assignment

Variable assignment and evaluation work in precisely the way you’d expect.

julia> x = 3
3
julia> x^2
9

Julia supports Unicode, so ß, Ɛ and Ȁ are perfectly legitimate (although possibly not too sensible) variable names.

julia> ß = 3; 2ß
6

Chained and simultaneous assignments are possible.

julia> a = b = c = 5
5
julia> d, e = 3, 5
(3,5)

Multiple expressions can be combined into a single compound expression. Julia supports both verbose and compact syntax.

julia> x = begin
           p = 2
           q = 3
           p + q
       end
5
julia> x = (p = 2; q = 3; p + q)
5

Constants, declared as const, are immutable variables (forgive the horrendous contradiction, but I trust that you know what I mean!).

julia> const SPEED_OF_LIGHT = 299792458;

Somewhat disconcertingly it is possible to change the value of a constant. You just can’t change its type. This restriction has more to do with performance than anything else.

There are numerous predefined constants too.

julia> pi
π = 3.1415926535897...
julia> e
e = 2.7182818284590...
julia> VERSION
v"0.3.10"

Data Types

Julia has an extensive type hierachy with its root at the universal Any type. You can query the current data type for a variable using typeof(). As mentioned above, this is dynamic and a variable can readily be reassigned a value with a different type.

julia> x = 3.5;
julia> typeof(x)
Float64
julia> x = "I am a string";
julia> typeof(x)
ASCIIString (constructor with 2 methods)

There are various other functions for probing the type hierarchy. For example, you can use isa() to check whether a variable or constant is of a particular type.

julia> isa(x, ASCIIString)
true
julia> isa(8, Int64)
true
julia> isa(8, Number)
true
julia> isa(8, ASCIIString)
false

So we see that 8 is both a In64 and a Number. Not only does that make mathematical sense, it also suggests that there is a relationship between the In64 and Number data types. In fact Int64 is a subtype of Signed, which derives from Integer, which is a subtype of Real, which derives from Number

julia> super(Int64)
Signed
julia> super(Signed)
Integer
julia> super(Integer)
Real
julia> super(Real)
Number

Formally this can be written as

julia> Int64 <: Signed <: Integer <: Real <: Number
true

where <: is the "derived from" operator. We can explore the hierarchy in the opposite direction too, where subtypes() descends one level in the type hierarchy.

julia> subtypes(Integer)
5-element Array{Any,1}:
 BigInt  
 Bool    
 Char    
 Signed  
 Unsigned

Numeric Types

Julia supports integers between Int8 and Int128, with Int32 or Int64 being the default depending on the hardware and operating system. A “U” prefix indicates unsigned variants, like UInt64. Arbitrary precision integers are supported via the BigInt type.

Floating point numbers are stored by Float16, Float32 and Float64 types. Arbitrary precision floats are supported via the BigFloat type. Single and double precision floating point constants are given with specific syntax.

julia> typeof(1.23f-1)
Float32
julia> typeof(1.23e-1)
Float64

In Julia complex and rational numbers are parametric types, for example Complex{Float32} and Rational{Int64}. More information on complex and rational numbers in Julia can be found in the documentation.

julia> (3+4im)^2
-7 + 24im
julia> typeof(3+4im)
Complex{Int64} (constructor with 1 method)
julia> typeof(3.0+4im)
Complex{Float64} (constructor with 1 method)
julia> typeof(3//4)
Rational{Int64} (constructor with 1 method)

An encyclopaedic selection of mathematical operations is supported on numeric types.

julia> 1 + 2
3
julia> 1 / 2
0.5
julia> div(5, 3), 5 % 3
(1,2)
julia> sqrt(2)
1.4142135623730951

Characters and Strings

Julia distinguishes between strings and characters. Strings are enclosed in double quotes, while individual characters are designated by single quotes. Strings are immutable and can be either ASCII (type ASCIIString) or unicode (type UTF8String). The indexing operator [] is used to extract slices from within strings. Evaluating variables within a string is done with a syntax which will be familiar to most shell warriors.

julia> name = "Julia"
"Julia"
julia> name[4]
'i'
julia> name[end]
'a'
julia> length(name)
5
julia> "My name is $name and I'm $(2015-2012) years old."
"My name is Julia and I'm 3 years old."

There is a lot of functionality attached to the String class. To get an idea of the range, have a look at the output from methodswith(String).

Julia supports Perl regular expressions. Regular expression objects are of type Regex and defined by a string preceded by the character ‘r’ and possibly followed by a modifier character (‘i’, ‘s’, ‘m’ or ‘x’).

julia> username_regex = r"^[a-z0-9_-]{3,16}$"
r"^[a-z0-9_-]{3,16}$"
julia> ismatch(username_regex, "my-us3r_n4m3")
true
julia> ismatch(username_regex, "th1s1s-wayt00_l0ngt0beausername")
false
julia> hex_regex = r"#?([a-f0-9]{6}|[a-f0-9]{3})"i;
julia> m = match(hex_regex, "I like the color #c900b5.")
RegexMatch("#c900b5", 1="c900b5")
julia> m.match
"#c900b5"
julia> m.offset
18

Type Conversions

Explicit type conversion works either by using convert() or the lower-case type name as a function.

julia> convert(Float64, 3)
3.0
julia> float64(3)
3.0
julia> int64(2.5)
3
julia> string(2.5)
"2.5"

Type Specifications

Although Julia is dynamically typed, it’s still possible (and often desireable) to stipulate the type for a particular variable. Furthermore, from a performance perspective it’s beneficial that a variable retains a particular type once it has been assigned. This is known as type stability. We’ll find out more about these issues when we look at defining functions in Julia.

As before, my detailed notes on today’s foray into Julia can be found on github.

The post #MonthOfJulia Day 3: Variables and Data Types appeared first on Exegetic Analytics.