Alvaro “Blag” Tejada Galindo | juliabloggers.com

Re-posted from: http://blagrants.blogspot.com/2014/05/julia-versus-r-playing-around.html

So…as time goes by, I’m getting more proficient with Julia…which is fairly easy, as the language is pretty quick to learn…

I decided to load a file with 590,209 records that I got from Freebase…the file in question contains Actors and Actresses from movies…you can have a quick look here…

For this test, I’m using my Linux box on VMware with 2 GB of RAM…running Ubuntu 12.04.4 (Precise)

For R, I’m not using any special package…just plain R version 2.14.1…and for Julia, version 0.2.1 with the DataFrames package…

Let’s take a look at the R source code first along with its runtime processing…

Actors_Info.R

# Time the whole script, including the CSV load on the first run
start.time <- Sys.time()
# Load the file only if "Actors" is not already in the workspace
if(!exists("Actors")){
  Actors<-read.csv("Actors_Table.csv", header=TRUE,
                   stringsAsFactors=FALSE, colClasses="character", na.strings = "")
}
# Drop duplicate rows, then rows that contain any NA
Actors<-unique(Actors)
Actors<-Actors[complete.cases(Actors),]
# Keep three columns, sort by Gender and write the result
Actor_Info<-data.frame(Actor_Id=Actors$Actor_Id,Name=Actors$Name,Gender=Actors$Gender)
Actor_Info<-Actor_Info[order(Actor_Info$Gender),]
write.csv(Actor_Info,"Actor_Info_R.csv",row.names=TRUE)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

This source will first check whether the file was already loaded…if not, it will load it…then it will eliminate the repeated records, delete all rows with null or NA values, create a new Data Frame, sort it by “Gender” and write a new CSV file…the run is timed to measure its speed…we will run it twice…the first time the file is not loaded…the second time it is…and that should greatly improve the execution time…

As we can see…the times are really good…and the difference between the first and second runs is pretty obvious…for the record…the generated file contains 105874 records…
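As a rough illustration of the steps described above (drop duplicates, drop incomplete rows, keep three columns, sort by “Gender”, write CSV, time it), here is a minimal Python sketch using only the standard library. It is not part of the original benchmark and says nothing about R or Julia timings; the sample rows and the helper names `clean_actors` and `to_csv` are made up for illustration, and only the column names come from the scripts in this post.

```python
import csv
import io
import time

# Tiny in-memory stand-in for Actors_Table.csv (hypothetical rows).
SAMPLE_CSV = """Actor_Id,Name,Gender
/m/001,Jane Example,Female
/m/002,John Example,Male
/m/001,Jane Example,Female
/m/003,No Gender,
/m/004,Alex Example,Female
"""

COLUMNS = ["Actor_Id", "Name", "Gender"]


def clean_actors(csv_text):
    """Deduplicate, drop incomplete rows, keep three columns, sort by Gender."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Drop exact duplicate rows while keeping the first occurrence.
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

    # Treat an empty field as missing, like na.strings = "" in the R script.
    complete = [row for row in unique_rows if all(row.values())]

    selected = [{col: row[col] for col in COLUMNS} for row in complete]
    # sorted() is stable, so rows with the same Gender keep their order.
    return sorted(selected, key=lambda row: row["Gender"])


def to_csv(rows):
    """Serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


start = time.perf_counter()
actor_info = clean_actors(SAMPLE_CSV)
csv_out = to_csv(actor_info)
elapsed = time.perf_counter() - start

print(csv_out)
print("Time:", elapsed)
```

The sketch works on a string instead of a file so it can be run anywhere; swapping `io.StringIO` for `open("Actors_Table.csv", newline="")` would be the obvious next step with the real data.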

Now…let’s see the Julia version of the code…

Actors_Info.jl

using DataFrames
start = time()
isdefined(:Actors) || (Actors = readtable("Actors_Table.csv", header=true, nastrings=["","NA"]))
drop_duplicates!(Actors)
complete_cases!(Actors)
Actor_Info = DataFrame(Actor_Id=Actors["Actor_Id"],Name=Actors["Name"],Gender=Actors["Gender"])
sortby!(Actor_Info, [:Gender])
writetable("Actor_Info_Julia.csv", Actor_Info)
finish = time()
println("Time: ", finish-start)

Here…we’re doing the same…we load the DataFrames package (but exclude that from the execution time), check if the file is loaded so we don’t load it again on the second run…eliminate duplicates, delete all rows with null or NA values, create a new DataFrame, sort it by “Gender” and finally write a new CSV file…

Well…the difference between the first and second runs is very significant…but of course…way slower than R…

But…let me tell you one simple thing…Julia is still a brand new language…the DataFrames package is not part of the core Julia language, which means…that it’s even newer…and optimizations are being performed as we speak…I would say that for a young language…18 seconds to process 590,209 records is pretty awesome…and of course…my R experience greatly surpasses my Julia experience…

So…I don’t really want to leave you with the impression that Julia is not good or not fast enough…because believe me…it is…and you’re going to love my next experiment -;)

Let’s take a look at the R source code first…

Random_Names.R

start.time <- Sys.time()
names<-c("Anne","Gigi","Blag","Juergen","Marek","Ingo","Lars","Julia",
         "Danielle","Rocky","Julien","Uwe","Myles","Mike", "Steven")

last_names<-c("Hardy","Read","Tejada","Schmerder","Kowalkiewicz","Sauerzapf",
              "Karg","Satsuta","Keene","Ongkowidjojo","Vayssiere","Kylau",
              "Fenlon","Flynn","Taylor")
full_names<-c()
for(i in 1:100000){
  name<-sample(1:15, 1)
  last_name<-sample(1:15, 1)
  full_name<-paste(names[name],last_names[last_name],sep=" ")
  full_names<-append(full_names,full_name)
}
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

So this code is fairly simple…we have a couple of vectors with names and last names…then we loop 100000 times, each time generating a couple of random numbers to index into the vectors, create a full name and append it to a new vector…ending up with some random funny name combinations…

Well…the difference between the two runs is not really good…the second time was a little bit slower…and 1 minute is kind of a lot…let’s see how Julia behaves…

Here’s the Julia source code…

Random_Numbers.jl

start = time()
names=["Anne","Gigi","Blag","Juergen","Marek","Ingo","Lars","Julia",
       "Danielle","Rocky","Julien","Uwe","Myles","Mike", "Steven"]
last_names=["Hardy","Read","Tejada","Schmerder","Kowalkiewicz","Sauerzapf",
            "Karg","Satsuta","Keene","Ongkowidjojo","Vayssiere","Kylau","Fenlon","Flynn","Taylor"]
full_names=String[]
full_name = ""
for i = 1:100000
        name=rand(1:15)
        last_name=rand(1:15)
        full_name = names[name] * " " * last_names[last_name]
        push!(full_names,full_name)
end
finish = time()
println("Time: ", finish-start)

So this code, as well, creates two arrays with names and last names, loops 100000 times, generates a couple of random numbers, combines a name with a last name and then populates a new array with the mixed full names…

Just like in the R code…the second run took Julia a little bit longer…but…less than a second?! That’s something like…amazingly fast and really took R by storm…
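Both scripts follow the same recipe: pick a random first name, pick a random last name, join them and append the result to a growing collection. For readers who want to see that recipe in a third language, here is a small Python sketch. It was not part of the original experiment, so no timing is claimed for it; the function name `random_full_names` and the use of `random.Random` are my own choices for illustration.

```python
import random
import time

names = ["Anne", "Gigi", "Blag", "Juergen", "Marek", "Ingo", "Lars", "Julia",
         "Danielle", "Rocky", "Julien", "Uwe", "Myles", "Mike", "Steven"]
last_names = ["Hardy", "Read", "Tejada", "Schmerder", "Kowalkiewicz", "Sauerzapf",
              "Karg", "Satsuta", "Keene", "Ongkowidjojo", "Vayssiere", "Kylau",
              "Fenlon", "Flynn", "Taylor"]


def random_full_names(count, rng):
    """Build `count` random 'Name Last_name' strings, one append per iteration."""
    full_names = []
    for _ in range(count):
        first = rng.choice(names)       # same role as sample(1:15, 1) / rand(1:15)
        last = rng.choice(last_names)
        full_names.append(first + " " + last)
    return full_names


start = time.perf_counter()
full_names = random_full_names(100000, random.Random())
elapsed = time.perf_counter() - start

print("Time:", elapsed)
print(full_names[:3])
```

Passing the random generator in as a parameter keeps the function easy to seed (for example `random.Random(42)`) when repeatable output is wanted.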

Now…I believe you will start to take Julia more seriously -:D

Hope you liked this blog…

Greetings,

Blag.

Development Culture.

By: Alvaro "Blag" Tejada Galindo

Re-posted from: http://blagrants.blogspot.com/2014/05/my-first-post-on-julia.html

So…what is Julia? Just another nice programming language -;)

According to its creators…

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

I just started learning it a couple of days ago…and I must say that I really like it…it has a Python-like syntax so I felt comfortable from the very start…

Of course…it’s kind of a brand new language, so things are being added and fixed while we speak…but the community is growing and I’m glad to be amongst its “early” supporters -:)

What I did right after I read the documentation and watched a couple of videos was to simply port one of my old Python applications to Julia…the app was “LCD Numbers”, which asks for a number and returns it printed in LCD format…

This is the Python code…

LCD_Numbers.py

global line1, line2, line3

line1 = ""
line2 = ""
line3 = ""

zero = {1: ' _  ', 2: '| | ', 3: '|_| '}
one = {1: '  ', 2: '| ', 3: '| '}
two = {1: ' _  ', 2: ' _| ', 3: '|_  '}
three = {1: '_  ', 2: '_| ', 3: '_| '}
four = {1: '    ', 2: '|_| ', 3: '  | '}
five = {1: ' _  ', 2: '|_  ', 3: ' _| '}
six = {1: ' _  ', 2: '|_  ', 3: '|_| '}
seven = {1: '_   ', 2: ' |  ', 3: ' |  '}
eight = {1: ' _  ', 2: '|_| ', 3: '|_| '}
nine = {1: ' _  ', 2: '|_| ', 3: ' _| '}

num_lines = {0: zero, 1: one, 2: two, 3: three, 4: four,
             5: five, 6: six, 7: seven, 8: eight, 9: nine}

def Lines(number):
    global line1, line2, line3
    line1 += number.get(1, 0)
    line2 += number.get(2, 0)
    line3 += number.get(3, 0)

number = str(input("\nEnter a number: "))
length = len(number)
for i in range(0, length):
    Lines(num_lines.get(int(number[i:i+1]), 0))

print ("\n")
print(line1)
print(line2)
print(line3)
print ("\n")

And this is in turn…the Julia version of it…

LCD_Numbers.jl

zero = [1=> " _  ", 2=> "| | ", 3=> "|_| "]
one = [1=> "  ", 2=> "| ", 3=> "| "]
two = [1=> " _  ", 2=> " _| ", 3=> "|_  "]
three = [1=> "_  ", 2=> "_| ", 3=> "_| "]
four = [1=> "    ", 2=> "|_| ", 3=> "  | "]
five = [1=> " _  ", 2=> "|_  ", 3=> " _| "]
six = [1=> " _  ", 2=> "|_  ", 3=> "|_| "]
seven = [1=> "_   ", 2=> " |  ", 3=> " |  "]
eight = [1=> " _  ", 2=> "|_| ", 3=> "|_| "]
nine = [1=> " _  ", 2=> "|_| ", 3=> " _| "]

num_lines = [0=> zero, 1=> one, 2=> two, 3=> three, 4=> four,
             5=> five, 6=> six, 7=> seven, 8=> eight, 9=> nine]

line = ""; line1 = ""; line2 = ""; line3 = ""

function Lines(number, line1, line2, line3)
    line1 *= number[1]
    line2 *= number[2]
    line3 *= number[3]
    line1, line2, line3
end

println("Enter a number: "); number = chomp(readline(STDIN))
len = length(number)
for i in [1:len]
    line = Lines(num_lines[parseint(string(number[i]))],line1,line2,line3)
    line1 = line[1]; line2 = line[2]; line3 = line[3]
end

println(line1)
println(line2)
println(line3 * "\n")

As you can see…the code looks somewhat similar…but of course…I got rid of those ugly global variables…and used some of the neat Julia features, like multiple return values and defining several variables on one line… If you want to see the output…here it is…
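The “no globals, return several values at once” idea is not exclusive to Julia…as a minimal sketch, here is how the same approach could look back in Python, with each digit stored as a tuple of three rows and the helper returning the three updated lines as a tuple. The glyph strings are copied from the listings in this post; the names `GLYPHS`, `append_digit` and `render` are hypothetical and just for illustration.

```python
# Each digit maps to its (top, middle, bottom) rows, taken from the listings above.
GLYPHS = {
    "0": (" _  ", "| | ", "|_| "),
    "1": ("  ", "| ", "| "),
    "2": (" _  ", " _| ", "|_  "),
    "3": ("_  ", "_| ", "_| "),
    "4": ("    ", "|_| ", "  | "),
    "5": (" _  ", "|_  ", " _| "),
    "6": (" _  ", "|_  ", "|_| "),
    "7": ("_   ", " |  ", " |  "),
    "8": (" _  ", "|_| ", "|_| "),
    "9": (" _  ", "|_| ", " _| "),
}


def append_digit(lines, glyph):
    """Return new (line1, line2, line3) with one glyph appended; no globals."""
    line1, line2, line3 = lines
    top, middle, bottom = glyph
    return line1 + top, line2 + middle, line3 + bottom


def render(number):
    """Render a string of digits as three LCD-style text lines."""
    lines = ("", "", "")
    for digit in number:
        lines = append_digit(lines, GLYPHS[digit])
    return lines


for line in render("42"):
    print(line)
```

Anything that is not a digit raises a `KeyError` in this sketch, which is good enough for a toy but would need handling in a real program.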

Of course…this is just a test…things are going to become interesting when I port some R code into Julia and run some speed comparisons -;)

Greetings,

Blag.
Development Culture.
