Tag Archives: #MonthOfJulia

#MonthOfJulia Day 26: Statistics

Julia-Logo-Statistics

JuliaStats is a meta-project which consolidates various packages related to statistics and machine learning in Julia. Well worth taking a look if you plan on working in this domain.

julia> x = rand(10);
julia> mean(x)
0.5287191472784906
julia> std(x)
0.2885446536178459

Julia already has some builtin support for statistical operations, so additional packages are not strictly necessary. However they do increase the scope and ease of possible operations (as we’ll see below).Julia already has some builtin support for statistical operations. Let’s kick off by loading all the packages that we’ll be looking at today.

julia> using StatsBase, StatsFuns, StreamStats

StatsBase

The documentation for StatsBase can be found here. As the package name implies, it provides support for basic statistical operations in Julia.

High level summary statistics are generated by summarystats().

julia> summarystats(x)
Summary Stats:
Mean:         0.528719
Minimum:      0.064803
1st Quartile: 0.317819
Median:       0.529662
3rd Quartile: 0.649787
Maximum:      0.974760

Weighted versions of the mean, variance and standard deviation are implemented. There’re also geometric and harmonic means.

julia> w = WeightVec(rand(1:10, 10));      # A weight vector.
julia> mean(x, w)                          # Weighted mean.
0.48819933297961043
julia> var(x, w)                           # Weighted variance.
0.08303843715334995
julia> std(x, w)                           # Weighted standard deviation.
0.2881639067498738
julia> skewness(x, w)
0.11688162715805048
julia> kurtosis(x, w)
-0.9210456851144664
julia> mean_and_std(x, w)
(0.48819933297961043,0.2881639067498738)

There’s a weighted median as well as functions for calculating quantiles.

julia> median(x)                           # Median.
0.5296622773635412
julia> median(x, w)                        # Weighted median.
0.5729104703595038
julia> quantile(x)
5-element Array{Float64,1}:
 0.0648032
 0.317819 
 0.529662 
 0.649787 
 0.97476  
julia> nquantile(x, 8)
9-element Array{Float64,1}:
 0.0648032
 0.256172 
 0.317819 
 0.465001 
 0.529662 
 0.60472  
 0.649787 
 0.893513 
 0.97476  
julia> iqr(x)                              # Inter-quartile range.
0.3319677541313941

Sampling from a population is also catered for, with a range of algorithms which can be applied to the sampling procedure.

julia> sample(['a':'z'], 5)                # Sampling (with replacement).
5-element Array{Char,1}:
 'w'
 'x'
 'e'
 'e'
 'o'
julia> wsample(['T', 'F'], [5, 1], 10)     # Weighted sampling (with replacement).
10-element Array{Char,1}:
 'F'
 'T'
 'T'
 'T'
 'F'
 'T'
 'T'
 'T'
 'T'
 'T'

There’s also functionality for empirical estimation of distributions from histograms and a range of other interesting and useful goodies.

StatsFuns

The StatsFuns package provides constants and functions for statistical computing. The constants are by no means essential but certainly very handy. Take, for example, twoπ and sqrt2.

There are some mildly exotic mathematical functions available like logistic, logit and softmax.

julia> logistic(-5)
0.0066928509242848554
julia> logistic(5)
0.9933071490757153
julia> logit(0.25)
-1.0986122886681098
julia> logit(0.75)
1.0986122886681096
julia> softmax([1, 3, 2, 5, 3])
5-element Array{Float64,1}:
 0.0136809
 0.101089 
 0.0371886
 0.746952 
 0.101089 

Finally there is a suite of functions relating to various statistical distributions. The functions for the Normal distribution are illustrated below, but there’re functions for Beta and Binomial distribution, the Gamma and Hypergeometric distribution and many others. The function naming convention is consistent across all distributions.

julia> normpdf(0);                         # PDF
julia> normlogpdf(0);                      # log PDF
julia> normcdf(0);                         # CDF
julia> normccdf(0);                        # Complementary CDF
julia> normlogcdf(0);                      # log CDF
julia> normlogccdf(0);                     # log Complementary CDF
julia> norminvcdf(0.5);                    # inverse-CDF
julia> norminvccdf(0.99);                  # inverse-Complementary CDF
julia> norminvlogcdf(-0.693147180559945);  # inverse-log CDF
julia> norminvlogccdf(-0.693147180559945); # inverse-log Complementary CDF

StreamStats

Finally, the StreamStats package supports calculating online statistics for a stream of data which is being continuously updated.

julia> average = StreamStats.Mean()
Online Mean
 * Mean: 0.000000
 * N:    0
julia> variance = StreamStats.Var()
Online Variance
 * Variance: NaN
 * N:        0
julia> for x in rand(10)
       	update!(average, x)
       	update!(variance, x)
       	@printf("x = %3.f: mean = %.3f | variance = %.3fn", x, state(average),
                                                                state(variance))
       end
x = 0.928564: mean = 0.929 | variance = NaN
x = 0.087779: mean = 0.508 | variance = 0.353
x = 0.253300: mean = 0.423 | variance = 0.198
x = 0.778306: mean = 0.512 | variance = 0.164
x = 0.566764: mean = 0.523 | variance = 0.123
x = 0.812629: mean = 0.571 | variance = 0.113
x = 0.760074: mean = 0.598 | variance = 0.099
x = 0.328495: mean = 0.564 | variance = 0.094
x = 0.303542: mean = 0.535 | variance = 0.090
x = 0.492716: mean = 0.531 | variance = 0.080

In addition to the mean and variance illustrated above, the package also supports online versions of min() and max(), and can be used to generate incremental confidence intervals for Bernoulli and Poisson processes.

That’s it for today. Check out the full code on github and watch the video below.

The post #MonthOfJulia Day 26: Statistics appeared first on Exegetic Analytics.

#MonthOfJulia Day 25: Interfacing with Other Languages

Julia-Logo-Other-Languages

Julia has native support for calling C and FORTRAN functions. There are also add on packages which provide interfaces to C++, R and Python. We’ll have a brief look at the support for C and R here. Further details on these and the other supported languages can be found on github.

Why would you want to call other languages from within Julia? Here are a couple of reasons:

  • to access functionality which is not implemented in Julia;
  • to exploit some efficiency associated with another language.

The second reason should apply relatively seldom because, as we saw some time ago, Julia provides performance which rivals native C or FORTRAN code.

C

C functions are called via ccall(), where the name of the C function and the library it lives in are passed as a tuple in the first argument, followed by the return type of the function and the types of the function arguments, and finally the arguments themselves. It’s a bit klunky, but it works!

julia> ccall((:sqrt, "libm"), Float64, (Float64,), 64.0)
8.0

It makes sense to wrap a call like that in a native Julia function.

julia> csqrt(x) = ccall((:sqrt, "libm"), Float64, (Float64,), x);
julia> csqrt(64.0)
8.0

This function will not be vectorised by default (just try call csqrt() on a vector!), but it’s a simple matter to produce a vectorised version using the @vectorize_1arg macro.

julia> @vectorize_1arg Real csqrt;
julia> methods(csqrt)
# 4 methods for generic function "csqrt":
csqrt{T<:Real}(::AbstractArray{T<:Real,1}) at operators.jl:359
csqrt{T<:Real}(::AbstractArray{T<:Real,2}) at operators.jl:360
csqrt{T<:Real}(::AbstractArray{T<:Real,N}) at operators.jl:362
csqrt(x) at none:6

Note that a few extra specialised methods have been introduced and now calling csqrt() on a vector works perfectly.

julia> csqrt([1, 4, 9, 16])
4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0

R

I’ll freely admit that I don’t dabble in C too often these days. R, on the other hand, is a daily workhorse. So being able to import R functionality into Julia is very appealing. The first thing that we need to do is load up a few packages, the most important of which is RCall. There’s great documentation for the package here.

julia> using RCall
julia> using DataArrays, DataFrames

We immediately have access to R’s builtin data sets and we can display them using rprint().

julia> rprint(:HairEyeColor)
, , Sex = Male

       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

, , Sex = Female

       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8

We can also copy those data across from R to Julia.

julia> airquality = DataFrame(:airquality);
julia> head(airquality)
6x6 DataFrame
| Row | Ozone | Solar.R | Wind | Temp | Month | Day |
|-----|-------|---------|------|------|-------|-----|
| 1   | 41    | 190     | 7.4  | 67   | 5     | 1   |
| 2   | 36    | 118     | 8.0  | 72   | 5     | 2   |
| 3   | 12    | 149     | 12.6 | 74   | 5     | 3   |
| 4   | 18    | 313     | 11.5 | 62   | 5     | 4   |
| 5   | NA    | NA      | 14.3 | 56   | 5     | 5   |
| 6   | 28    | NA      | 14.9 | 66   | 5     | 6   |

rcopy() provides a high-level interface to function calls in R.

julia> rcopy("runif(3)")
3-element Array{Float64,1}:
 0.752226
 0.683104
 0.290194

However, for some complex objects there is no simple way to translate between R and Julia, and in these cases rcopy() fails. We can see in the case below that the object of class lm returned by lm() does not diffuse intact across the R-Julia membrane.

julia> "fit <- lm(bwt ~ ., data = MASS::birthwt)" |> rcopy
ERROR: `rcopy` has no method matching rcopy(::LangSxp)
 in rcopy at no file
 in map_to! at abstractarray.jl:1311
 in map_to! at abstractarray.jl:1320
 in map at abstractarray.jl:1331
 in rcopy at /home/colliera/.julia/v0.3/RCall/src/sexp.jl:131
 in rcopy at /home/colliera/.julia/v0.3/RCall/src/iface.jl:35
 in |> at operators.jl:178

But the call to lm() was successful and we can still look at the results.

julia> rprint(:fit)

Call:
lm(formula = bwt ~ ., data = MASS::birthwt)

Coefficients:
(Intercept)          low          age          lwt         race  
    3612.51     -1131.22        -6.25         1.05      -100.90  
      smoke          ptl           ht           ui          ftv  
    -174.12        81.34      -181.95      -336.78        -7.58 

You can use R to generate plots with either the base functionality or that provided by libraries like ggplot2 or lattice.

julia> reval("plot(1:10)");                # Will pop up a graphics window...
julia> reval("library(ggplot2)");
julia> rprint("ggplot(MASS::birthwt, aes(x = age, y = bwt)) + geom_point() + theme_classic()")
julia> reval("dev.off()")                  # ... and close the window.

Watch the videos below for some other perspectives on multi-language programming with Julia. Also check out the complete code for today (including examples with C++, FORTRAN and Python) on github.



The post #MonthOfJulia Day 25: Interfacing with Other Languages appeared first on Exegetic Analytics.

#MonthOfJulia Day 24: Graphs

Julia-Logo-Graphs

If you’re not too familiar Graph Theory, then it might be an idea to take a moment to get the basics. Graphs are an extremely versatile data structure for storing data consisting of linked entities. I’m going to look at two packages for managing graphs in Julia: LightGraphs and Graphs.

LightGraphs

As usual, the first step is to load the package.

julia> using LightGraphs

LightGraphs has methods which generate a selection of standard graphs like StarGraph(), WheelGraph() and FruchtGraph(). There are also functions for random graphs, for example, erdos_renyi() and watts_strogatz(). We’ll start off by creating two small graphs. One will have 10 nodes connected by 20 random edges. The other will be a directed star graph consisting of four nodes, the central node being connected to every other node.

julia> g1 = Graph(10, 20)
{10, 20} undirected graph
julia> g2 = StarDiGraph(4)
{4, 3} directed graph
julia> edges(g2)
Set{Pair{Int64,Int64}}({edge 1 - 2,edge 1 - 4,edge 1 - 3})

It’s simple to find the degree and neighbours of a given node.

julia> degree(g1, 4)                       # How many neighbours for vertex 4?
6
julia> neighbors(g1, 4)                    # Find neighbours of vertex 4
6-element Array{Int64,1}:
 1
 3
 6
 2
 9
 7

There’s a straightforward means to add and remove edges from the graph.

julia> add_edge!(g1, 4, 8)                 # Add edge between vertices 4 and 8
edge 4 - 8
julia> rem_edge!(g1, 4, 6)                 # Remove edge between vertices 4 and 6
edge 6 - 4

The package has functionality for performing high level tests on the graph (checking, for instance, whether it is cyclic or connected). There’s also support for path based algorithms, but we’ll dig into those when we look at the Graphs package.

Graphs

Before we get started with the Graphs package you might want to restart your Julia session to purge all of that LightGraphs goodness. Take a moment to browse the Graphs.jl documentation, which is very comprehensive.

julia> using Graphs

As with LightGraphs, there are numerous options for generating standard graphs.

julia> g1a = simple_frucht_graph()
Undirected Graph (20 vertices, 18 edges)
julia> g1b = simple_star_graph(8)
Directed Graph (8 vertices, 7 edges)
julia> g1c = simple_wheel_graph(8)
Directed Graph (8 vertices, 14 edges)

Graphs uses the GraphViz library to generate plots.

julia> plot(g1a)

sample-graph

Of course, a graph can also be constructed manually.

julia> g2 = simple_graph(4)
Directed Graph (4 vertices, 0 edges)
julia> add_edge!(g2, 1, 2)
edge [1]: 1 -- 2
julia> add_edge!(g2, 1, 3)
edge [2]: 1 -- 3
julia> add_edge!(g2, 2, 3)
edge [3]: 2 -- 3

Individual vertices (a vertex is the same as a node) can be interrogated. Since we are considering a directed graph we look separately at the edges exiting and entering a node.

julia> num_vertices(g2)
4
julia> vertices(g2)
1:4
julia> out_degree(1, g2)
2
julia> out_edges(1, g2)
2-element Array{Edge{Int64},1}:
 edge [1]: 1 -- 2
 edge [2]: 1 -- 3
julia> in_degree(2, g2)
1
julia> in_edges(2, g2)
1-element Array{Edge{Int64},1}:
 edge [1]: 1 -- 2

Vertices can be created with labels and attributes.

julia> V1 = ExVertex(1, "V1");
julia> V1.attributes["size"] = 5.0
5.0
julia> V2 = ExVertex(2, "V2");
julia> V2.attributes["size"] = 3.0
3.0
julia> V3 = ExVertex(3, "V3")
vertex [3] "V3"

Those vertices can then be used to define edges, which in turn can have labels and attributes.

julia> E1 = ExEdge(1, V1, V2)
edge [1]: vertex [1] "V1" -- vertex [2] "V2"
julia> E1.attributes["distance"] = 50
50
julia> E1.attributes["color"] = "green"
"green"

Finally the collection of vertices and edges can be gathered into a graph.

julia> g3 = edgelist([V1, V2], [E1], is_directed = true)
Directed Graph (2 vertices, 1 edges)

It’s possible to systematically visit all connected vertices in a graph, applying an operation at every vertex. traverse_graph() performs the graph traversal using either a depth first or breadth first algorithm. In the sample code below the operation applied at each vertex is LogGraphVisitor(), which is a simple logger.

julia> traverse_graph(g1c, DepthFirst(), 1, LogGraphVisitor(STDOUT))
discover vertex: 1
examine neighbor: 1 -> 2 (vertexcolor = 0, edgecolor= 0)
discover vertex: 2
open vertex: 2
examine neighbor: 2 -> 3 (vertexcolor = 0, edgecolor= 0)
discover vertex: 3
open vertex: 3
examine neighbor: 3 -> 4 (vertexcolor = 0, edgecolor= 0)
discover vertex: 4
open vertex: 4
examine neighbor: 4 -> 5 (vertexcolor = 0, edgecolor= 0)
discover vertex: 5
open vertex: 5
examine neighbor: 5 -> 6 (vertexcolor = 0, edgecolor= 0)
discover vertex: 6
open vertex: 6
examine neighbor: 6 -> 7 (vertexcolor = 0, edgecolor= 0)
discover vertex: 7
open vertex: 7
examine neighbor: 7 -> 8 (vertexcolor = 0, edgecolor= 0)
discover vertex: 8
open vertex: 8
examine neighbor: 8 -> 2 (vertexcolor = 1, edgecolor= 0)
close vertex: 8
close vertex: 7
close vertex: 6
close vertex: 5
close vertex: 4
close vertex: 3
close vertex: 2
examine neighbor: 1 -> 3 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 4 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 5 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 6 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 7 (vertexcolor = 2, edgecolor= 0)
examine neighbor: 1 -> 8 (vertexcolor = 2, edgecolor= 0)
close vertex: 1

We can use Dijkstra’s Algorithm to calculate the distance from a given vertex to all other vertices in the graph. We see, for instance, that the distance from vertex 1 to vertex 4 is three steps. Since vertex 1 and vertex 20 are not connected, the distance between them is infinite. There are a couple of other algorithms available for calculating shortest paths.

julia> distances = ones(num_edges(g1a));   # Assign distance of 1 to each edge.
julia> d = dijkstra_shortest_paths(g1a, distances, 1);
julia> d.dists                             # Vector of distances to all other vertices.
20-element Array{Float64,1}:
   0.0
   1.0
   2.0
   3.0
   3.0
   2.0
   1.0
   1.0
   3.0
   4.0
   2.0
   2.0
 Inf  
 Inf  
 Inf  
 Inf  
 Inf  
 Inf  
 Inf  
 Inf  

As with the most of the packages that I have looked at already, the functionality summarised above is just a small subset of what’s available. Have a look at the home pages for these packages and check out the full code for today (which looks at a number of other features) on github. Some time in the future I plan on looking at the EvolvingGraphs which caters for graphs where the structure changes with time.

493-drawing-stars-0

The post #MonthOfJulia Day 24: Graphs appeared first on Exegetic Analytics.