Tag Archives: Programming

Julia and Apache

By: Terence Copestake

Re-posted from: http://thenewphalls.wordpress.com/2014/02/15/julia-and-apache/

Click here if you’re just looking for a “how to”

Julia caught my attention recently. As I understand it, it’s intended to be a fast, all-purpose language, so that we need not have one language for scripting, one for parallel programming, another for mathematics, etc. As someone who works primarily with the web – an area where Julia is reportedly lacking – I decided to see how it would fare in that arena.

I’ll clarify this early: Julia is still in its infancy. The stable release at the time of writing is version 0.2. It’s likely that difficulties like those I encountered will be ironed out as the project evolves.

I figured (very wrongly) that the quickest and easiest way to get up and running would be to configure Apache to run .jl files as CGI. That didn’t work out the way that I hoped it would – and after an hour of basic debugging, it became clear that I had two choices: 1) use a Julia HTTP server or 2) potentially spend hours of my life figuring out why it wasn’t working – and just maybe find a workaround. If you’re familiar with this blog, you’ll know which option I went for.

Debugging – Apache

Apache was giving me the “End of script output before headers” 500 error page. The Julia code I was testing was supposed to write logs to a file to confirm that it was running, which worked fine in the console window, but there was no file activity when I loaded the page in my browser, suggesting the program/script wasn’t even being executed.

I ruled out file permissions as a cause. The CGI error log was useless (for reasons which now make sense). I was flying blind – so I tried something else…

Debugging – CGI proxy

I wrote a C program to be invoked by Apache instead of Julia. The proxy would execute Julia itself, giving me a console window with some I/O so that I could hopefully get a better idea of what was going on. Using this, I was able to identify two issues:

Issue 1: Environment variables.

Julia is hard-coded to access the APPDATA environment variable. This isn’t an issue when running from the console, but the environment created by mod_cgi doesn’t contain this variable. I was able to work around this by having an APPDATA variable be set in either the .htaccess or httpd.conf using SetEnv (see far below).

Issue 2: Julia’s bizarre I/O

I don’t mind admitting that I’m unfamiliar with the I/O library used by Julia. However, given some of the errors I was getting in my console window (inability to open I/O pipe, unsupported stdio handles, etc) and one of the pull requests on the repository, I figured that Julia 0.2 must just be incompatible with the way mod_cgi was trying to redirect the child process’ I/O. It was with this that I decided to try the (potentially unstable) build of Julia 0.3.

Julia 0.3

At the time of writing, version 0.3 is a prerelease version – which is why I first chose to use 0.2.

Using my CGI proxy, I was able to see that Julia 0.3 requires two more environment variables: HOMEDRIVE and HOMEPATH. These can be set in either the httpd.conf or .htaccess as with APPDATA.

I could also see straight away that 0.3 had a better rapport with I/O handles. After some more experimentation, I felt confident enough to direct mod_cgi to invoke julia-basic.exe directly.

Putting it all together…

Download Julia 0.3+

Configure Apache to handle .jl files as CGI scripts and set the required environment variables (APPDATA, HOMEDRIVE, HOMEPATH). Below is an example from my own httpd-vhosts.conf:

<VirtualHost localhost:80>
    Options +ExecCGI
    AddHandler cgi-script .jl
    SetEnv APPDATA "G:/Dev/appdata"
    SetEnv HOMEDRIVE "C:"
    SetEnv HOMEPATH "/Users/Phalls"
    ScriptLog logs/julia_cgi.log
    DocumentRoot "D:/htdocs"
    ServerName localhost
</VirtualHost>

Restart Apache (or just reload config if you have that option).

… and that should be all there is to it. As with the likes of Perl and Ruby, the first line of your .jl files should be the path to the Julia binary. I chose to use julia-basic.exe, but I also tried it with julia-readline.exe and it was no different.

Below is some example code:

#!"G:\\Julia3\\bin\\julia-basic"
redirect_stderr(STDOUT)
print("Content-Type: text/html\n\n")
print("<html>
 <body>
 <b>Hello, world!</b>
 </body>
</html>")

(language=”scala” is the closest I could get to Julia for syntax highlighting)

On line 2 (i.e. 1) I redirect STDERR to STDOUT because errors (e.g. syntax errors) sent to STDERR cause Apache to hang and display a 500 error page. Redirecting to STDOUT shows the error message on the page, which is cleaner and more helpful for debugging. It may even be possible to redirect STDERR to something else, such as a stream or an IOBuffer where it can be handled in a more controlled manner.

Line 3 (i.e. 2) just outputs a header. Bear in mind that, with Julia not being strictly CGI-compatible, that becomes your app’s responsibility.

Next on the “to do” list:

  • Compare performance vs Ruby and PHP
  • Compare Apache/CGI performance vs HttpServer.jl
  • Write basic web framework (despite this)

Estimated delivery date: I’ll get to it when I get to it.

Julia language documentation

Slowly moving to the Julia language from R

Julia is a new computing language that’s gotten a lot of attention lately (e.g., this Wired piece) and that I’ve ignored until recently. But I checked it out a few days ago and, holy crap, it’s a nice language. I’m rewriting the code in my book to use Julia instead of R and I’m almost certainly going to use it instead of R in my PhD class next fall.

So, why Julia and why not R? (And, I suppose why not Python/Matlab/other languages?)

  • Multiple dispatch. So you can define a function
    function TrickyAlgorithm(aThing, bThing)

    differently to depend on whether aThing is a matrix of real numbers and bThing is is a vector, or aThing is a vector and bThing is a matrix, or any other combinations of data types. And you can do this without lots of tedious, potentially slow, and confusing (to people reading and maintaining the code) argument checks and conversion within the function.

    Note that this is kind of similar to Object Oriented Programming, but in OOP TrickyAlgorithm would need to be a method of aThing or bThing. Also note that this is present in R as well.

  • Homoiconicity — the code can be operated on by other parts of the code. Again, R kind of has this too! Kind of, because I’m unaware of a good explanation for how to use it productively, and R’s syntax and scoping rules make it tricky to pull off. But I’m still excited to see it in Julia, because I’ve heard good things about macros and I’d like to appreciate them (I’ve started playing around with Clojure and like it a lot too…). And because stuff like this is amazing:
    @devec r = exp(-abs(x-y))

    which devectorizes x and y (both vectors) and evaluates as

    for i = 1:length(x)
        r[i] = exp(-abs(x[i]-y[i]))
    end
    

    (this example and code is from Dahua Lin’s blog post, Fast Numeric Computation in Julia). Note that “evaluates as” does not mean “gives the same answer as,” it means that the code r = exp(-abs(x-y)) is replaced with the loop by @devec and then the loop is what’s run.

  • Decent speed. Definitely faster than well written R; I don’t have a great feel for how well it compares to highly optimized R (using inline C++, for example), but I write one-off simulation programs and don’t write highly optimized R.And the language encourages loops, which is a relief. R discourages loops and encourages “vectorized” operations that operate on entire objects at once (which are then converted to fast loops in C…). But I use loops all the time anyway, because avoiding loops in time series applications is impossible. R’s poor support for recursion doesn’t help either.And, more to the point, I teach econometrics to graduate students. Many of them haven’t programmed before. Most of them are not going to write parts of their analysis in C++.
  • The syntax is fine and unthreatening, which will help for teaching. It basically looks like Matlab done right. Matlab’s not a bad language because its programs look like they’re built out of Legos, it’s a bad language because of its horrendous implementation of functions, anonymous functions, objects, etc. Compared to R, Matlab and Julia look downright friendly. Compared to Clojure… I can’t even imagine asking first year PhD students (some with no programming experience at all) to work with a Lisp.
  • The last point that’s always mentioned in these language comparisons. What about all of the R packages? There are thousands and thousands of statistical packages coded up for R, and you’re giving that up by moving to a different language.This is apparently a big concern for a lot of people, but… have you looked at the source code for these packages? Most of them are terrible! But some are good, and it might take some time to port them to Julia. Not that much time, I think, because most high-performance popular R packages are a thin layer of interoperability over a fast implementation in C or C++, so the port is just a matter of wrapping it up for Julia. And most of the well designed packages are tools for other package developers.That’s not quite true of R’s statistical graphics, though. They’re really great and could be hard to port. And that’s more or less the only thing that I’m sure that I’ll miss in Julia. (But hopefully not for too long.)
  • Lastly, and this is important: the same massive quantity of packages for R is a big constraint on its future development. Breaking backwards compatibility is a big deal but avoiding it too much imposes costs.

Anyway, since I converted some R code to Julia I thought it would be fun to compare speeds. The first example is used to show the sampling distribution of an average of uniform(0,1) random variables. In R, we have

rstats <- function(rerror, nobs, nsims = 500) {
  replicate(nsims, mean(rerror(nobs)))}

which is (I think) pretty idiomatic R (and is vectorized, so it’s supposed to be fast). Calling it gives

R> system.time(rstats(runif, 500))

[out]:  user  system elapsed 
       0.341   0.002   0.377

For comparison to the Julia results, we’re going to care about the “elapsed” result of 0.377 seconds; the “system” column isn’t relevant here.  Calling it for more observations and more simulations (50,000 of each) gives

R> system.time(rstats(runif, 50000, 50000))

[out]:   user  system elapsed 
      204.184   0.217 215.526

so 216 seconds overall. And, just to preempt criticism, I ran these simulations a few times each and these results are representative; and I ran a byte-compiled version that got (unexpectedly) slightly worse performance.

Equivalent Julia code is

function rmeans(dist, nobs; nsims = 500)
    means = Array(Float64,nsims)
    for i in 1:nsims
        means[i] = mean(rand(dist, nobs))
    end
    return means
end

which is pretty easy to read, but I have no idea if it’s idiomatic. This is my first code in Julia. If you like to minimize lines of code and preallocation of arrays, Julia has list comprehensions and you can write the stylish one line definition (that gave similar times)

rmeans_pretty(dist, nobs; nsims = 500) =
    [ mean(rand(dist, nobs)) for i = 1:nsims ]

We can time  (after loading the Distributions packages):

julia> @elapsed rmeans(Uniform(), 500)

[out]: 0.093662961

so 0.09 seconds, or about a quarter the time as R. But (I forgot to mention earlier), Julia uses a Just In Time compiler, so the 0.09 seconds includes compilation and execution. Running it a second time gives

julia>  @elapsed rmeans(Uniform(), 500)

[out]: 0.004334132

which is half the time again. (Update on 3/17: as Jules pointed out in the comments, 0.004 is 1/20th of 0.09, so this is substantially faster than I’d initially thought. So we are getting into the ~100 times faster range. That’s actually a pretty exciting speed increase, but I’ll need to look into it some more. Well, that was embarrassing.)

Running the larger simulation, we have

julia> @elapsed rmeans_loop(Uniform(), 50000, nsims = 50000)

[out]: 77.318591953

so the R code is a little less than three times slower here. (The compilation step doesn’t make a meaningful difference.) So, Julia isn’t hundreds of times faster, but it is noticeably faster than R, which is nice.

But speed in this sort of test isn’t the main factor. I’m really excited about multiple dispatch — it’s one of the few things in R that I really, really liked from a language standpoint. I really like what I’ve read about Julia’s support for parallelism (but need to learn more). And I like metaprogramming, even if I can’t do it myself yet. So Julia’s trying to be a fast, easy to learn, and elegantly designed language. That’s awesome. I want it to work.

ps: and it’s open source! Can’t forget that.

Filed under: Blog Tagged: julia, programming, R

The Relationship between Vectorized and Devectorized Code

By: John Myles White

Re-posted from: http://www.johnmyleswhite.com/notebook/2013/12/22/the-relationship-between-vectorized-and-devectorized-code/

Introduction

Some people have come to believe that Julia’s vectorized code is unusably slow. To correct this misconception, I outline a naive benchmark below that suggests that Julia’s vectorized code is, in fact, noticeably faster than R’s vectorized code. When experienced Julia programmers suggest that newcomers should consider devectorizing code, we’re not trying to beat R’s speed — our vectorized code does that already. Instead, we’re trying to match C’s speed.

As the examples below indicate, a little bit of devectorization goes a long way towards this loftier goal. In the specific examples I show, I find that:

  • Julia’s vectorized code is 2x faster than R’s vectorized code
  • Julia’s devectorized code is 140x faster than R’s vectorized code
  • Julia’s devectorized code is 1350x faster than R’s devectorized code

Examples of Vectorized and Devectorized Code in R

Let’s start by contrasting two pieces of R code: a vectorized and a devectorized implementation of a trivial snippet of code that does repeated vector addition.

First, we consider an example of idiomatic, vectorized R code:

vectorized <- function()
{
    a <- c(1, 1)
    b <- c(2, 2)
    x <- c(NaN, NaN)

    for (i in 1:1000000)
    {
        x <- a + b
    }

    return()
}

time <- function (N)
{
    timings <- rep(NA, N)

    for (itr in 1:N)
    {
        start <- Sys.time()
        vectorized()
        end <- Sys.time()
        timings[itr] <- end - start
    }

    return(timings)
}

mean(time(10))

This code takes, on average, 0.49 seconds per iteration to compute 1,000,000 vector additions.

Having considered the vectorized implementation, we can then consider an unidiomatic devectorized implementation of the same operation in R:

devectorized <- function()
{
    a <- c(1, 1)
    b <- c(2, 2)
    x <- c(NaN, NaN)

    for (i in 1:1000000)
    {
        for (index in 1:2)
        {
            x[index] <- a[index] + b[index]
        }
    }

    return()
}

time <- function (N)
{
    timings <- rep(NA, N)

    for (itr in 1:N)
    {
        start <- Sys.time()
        devectorized()
        end <- Sys.time()
        timings[itr] <- end - start
    }

    return(timings)
}

mean(time(10))

This takes, on average, 4.72 seconds per iteration to compute 1,000,000 vector additions.

Examples of Vectorized and Devectorized Code in Julia

Let’s now consider two Julia implementations of this same snippet of code. We’ll start with a vectorized implementation:

function vectorized()
    a = [1.0, 1.0]
    b = [2.0, 2.0]
    x = [NaN, NaN]

    for i in 1:1000000
        x = a + b
    end

    return
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    vectorized()

    for itr in 1:N
        timings[itr] = @elapsed vectorized()
    end

    return timings
end

mean(time(10))

This takes, on average, 0.236 seconds per iteration to compute 1,000,000 vector additions.

Next, let’s consider a devectorized implementation of this same snippet:

function devectorized()
    a = [1.0, 1.0]
    b = [2.0, 2.0]
    x = [NaN, NaN]

    for i in 1:1000000
        for index in 1:2
            x[index] = a[index] + b[index]
        end
    end

    return
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    devectorized()

    for itr in 1:N
        timings[itr] = @elapsed devectorized()
    end

    return timings
end

mean(time(10))

This takes, on average, 0.0035 seconds per iteration to compute 1,000,000 vector additions.

Comparing Performance in R and Julia

We can summarize the results of the four examples above in a single table:

Approach Language Average Time
Vectorized R 0.49
Devectorized R 4.72
Vectorized Julia 0.24
Devectorized Julia 0.0035

All of these examples were timed on my 2.9 GHz Intel Core i7 MacBook Pro. The results are quite striking: Julia is uniformly faster than R. And a very small bit of devectorization produces huge performance improvements. Of course, it would be nice if Julia’s compiler could optimize vectorized code as well as it optimizes devectorized code. But doing so requires a substantial amount of work.

Why is Optimizing Vectorized Code Hard?

What makes automatic devectorization tricky to get right is that even minor variants of the snippet shown above have profoundly different optimization strategies. Consider, for example, the following two snippets of code:

function vectorized2()
    a = [1.0, 1.0]
    b = [2.0, 2.0]

    res = {}

    for i in 1:1000000
        x = [rand(), rand()]
        x += a + b
        push!(res, x)
    end

    return res
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    vectorized2()

    for itr in 1:N
        timings[itr] = @elapsed vectorized2()
    end

    return timings
end

mean(time(10))

This first snippet takes 1.29 seconds on average.

function devectorized2()
    a = [1.0, 1.0]
    b = [2.0, 2.0]

    res = {}

    for i in 1:1000000
        x = [rand(), rand()]
        for dim in 1:2
            x[dim] += a[dim] + b[dim]
        end
        push!(res, x)
    end

    return res
end

function time(N)
    timings = Array(Float64, N)

    # Force compilation
    devectorized2()

    for itr in 1:N
        timings[itr] = @elapsed devectorized2()
    end

    return timings
end

mean(time(10))

This second snippet takes, on average, 0.27 seconds.

The gap between vectorized and devectorized code is much smaller here because this second set of code snippets uses memory in a very different way than our original snippets did. In the first set of snippets, it was possible to entirely avoid allocating any memory for storing changes to x. The devectorized code for the first set of snippets explicitly made clear to the compiler that no memory needed to be allocated. The vectorized code did not make this clear. Making it clear that no memory needed to be allocated led to a 75x speedup. Explicitly telling the compiler what it can avoid spending time on goes a long way.

In contrast, in the second set of snippets, a new chunk of memory has to be allocated for every x vector that gets created. And the result is that even the devectorized variant of our second snippet cannot offer much of a performance boost over its vectorized analogue. The devectorized variant is slightly faster because it avoids allocating any memory during the steps in which x has a and b added to it, but this makes less of a difference when there is still a lot of other work being done that cannot be avoided by devectorizing operations.

This reflects a more general statement: the vectorization/devectorization contrast is only correlated, not causally related, with the actual performance characteristics of code. What matters for computations that take place on modern computers is the efficient utilization of processor cycles and memory. In many real examples of vectorized code, it is memory management, rather than vectorization per se, that is the core causal factor responsible for performance.

The Reversed Role of Vectorization in R and Julia

Part of what makes it difficult to have a straightforward discussion about vectorization is that vectorization in R conflates issues that are logically unrelated. In R, vectorization is often done for both (a) readability and (b) performance. In Julia, vectorization is only used for readability; it is devectorization that offers superior performance.

This confuses some people who are not familiar with the internals of R. It is therefore worth noting how one improves the speed of R code. The process of performance improvement is quite simple: one starts with devectorized R code, then replaces it with vectorized R code and then finally implements this vectorized R code in devectorized C code. This last step is unfortunately invisible to many R users, who therefore think of vectorization per se as a mechanism for increasing performance. Vectorization per se does not help make code faster. What makes vectorization in R effective is that it provides a mechanism for moving computations into C, where a hidden layer of devectorization can do its mgic.

In other words, R is doing exactly what Julia is doing to get better performance. R’s vectorized code is simply a thin wrapper around completely devectorized C code. If you don’t believe me, go read the C code for something like R’s distance function, which involves calls to functions like the following:

static double R_euclidean(double *x, int nr, int nc, int i1, int i2)
{
    double dev, dist;
    int count, j;

    count= 0;
    dist = 0;
    for(j = 0 ; j < nc ; j++) {
    if(both_non_NA(x[i1], x[i2])) {
        dev = (x[i1] - x[i2]);
        if(!ISNAN(dev)) {
        dist += dev * dev;
        count++;
        }
    }
    i1 += nr;
    i2 += nr;
    }
    if(count == 0) return NA_REAL;
    if(count != nc) dist /= ((double)count/nc);
    return sqrt(dist);
}

It is important to keep this sort of thing in mind: the term vectorization in R actually refers to a step in which you write devectorized code in C. Vectorization, per se, is a red herring when reasoning about performance.

To finish this last point, let’s summarize the performance hierarchy for R and Julia code in a simple table:

Worst Case Typical Case Best Case
Julia Vectorized Code Julia Devectorized Code
R Devectorized Code R Vectorized Code C Devectorized Code

It is the complete absence of one column for Julia that makes it difficult to compare vectorization across the two languages. Nothing in Julia is as bad as R’s devectorized code. On the other end of the spectrum, the performance of Julia’s devectorized code simply has no point of comparison in pure R: it is more similar to the C code used to power R behind the scenes.

Conclusion

Julia aims to (and typically does) provide vectorized code that is efficient as the vectorized code available in other high-level languages. What sets Julia apart is the possibility of writing, in pure Julia, high performance code that uses CPU and memory resources as effectively as can be done in C.

In particular, vectorization and devectorization stand in the opposite relationship to one another in Julia as they do in R. In R, devectorization makes code unusably slow: R code must be vectorized to perform at an acceptable level. In contrast, Julia programmers view vectorized code as a convenient prototype that can be modified with some clever devectorization to produce production-performance code. Of course, we would like prototype code to perform better. But no popular language offers that kind of functionality. What Julia offers isn’t the requirement for devectorization, but the possibility of doing it in Julia itself, rather than in C.