Author Archives: Julia Computing, Inc.

Julia and Spark, Better Together

The use of Apache Spark as a distributed data and computation engine has grown rapidly in recent times. Leveraging the Hadoop ecosystem, enterprise workloads have swiftly migrated to Spark. Hosted Spark instances from AWS and Azure have made it even easier to get started, and to run large, on-demand clusters for dynamic workloads.

Scala, the primary language of Spark, is not everyone’s cup of tea when it comes to numeric computing problems. Mostly arising out of the JVM, the problems include floating-point inaccuracy, poor performance on user-defined mathematical constructs, and limited library support for complex optimisation or linear algebra routines.

Julia, by contrast, was built for numerical computing. It is well suited to creating fast and accurate numerical applications while leveraging the large-scale data handling capabilities of the Spark platform.

Spark.jl

The Spark.jl package, created by Andrei Zhabinsky, with subsequent contributions by a larger worldwide group of developers, enables the use of Julia programs on Spark. It allows you to connect to a Spark cluster from the Julia REPL, load data and submit jobs. The typical operating model involves creating a Spark RDD by loading a file, or from any Julia iterator. Julia functions can then be applied to the RDD using the standard Spark verbs, all from within Julia. This first-class integration is enabled by the JavaCall Julia package, which allows interoperability between Julia and Java codebases.

As an example, a typical session to compute a distributed word count (the “Hello World” of distributed computing) from Julia would look like this (all code is typed in the Julia REPL):

using Spark
Spark.init()
sc = SparkContext(master="local")
text = parallelize(sc, ["hello world", "the world is one", "we are the world"])
words = flat_map(text, split)
words_tuple = cartesian(words, parallelize(sc, [1]))
counts = reduce_by_key(words_tuple, +)
result = collect(counts)

   7-element Array{Any,1}:
   ("are", 1)
   ("is", 1)
   ("one", 1)
   ("we", 1)
   ("hello", 1)
   ("world", 3)
   ("the", 2)

A second example shows the code to calculate π using a simple Monte Carlo method.

NUM_SAMPLES = 10000
samples = parallelize(sc, 1:NUM_SAMPLES)
c = filter(samples, _ -> (x = rand(2); x[1]^2 + x[2]^2 < 1)) |> count
print(4 * c / NUM_SAMPLES)
    3.1432

It is important to note that in these examples, the core domain calculations are being done in Julia code: in the split and + functions of the first example, and in the anonymous function of the second example. In addition, familiar Spark API function names such as parallelize/map/reduce/reduce_by_key are being used to distribute the code and the data to the various Spark nodes that make up the cluster.
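To make that division of labour concrete, here is a minimal local sketch of the same two computations in plain Julia, with no Spark involved. It assumes Julia 1.0 or later, and the names `lines`, `counts` and `estimate` are illustrative only. Each step is annotated with the Spark.jl verb from the examples above that it stands in for.

```julia
# Local, single-process sketch: the same domain logic as above, without Spark.
lines = ["hello world", "the world is one", "we are the world"]

# Stand-in for flat_map(text, split): split each line, then concatenate the pieces.
words = [String(w) for line in lines for w in split(line)]

# Stand-in for pairing each word with 1 and calling reduce_by_key(..., +).
counts = Dict{String,Int}()
for w in words
    counts[w] = get(counts, w, 0) + 1
end

# Stand-in for filter(...) |> count: count random points inside the unit quarter circle.
NUM_SAMPLES = 100_000
inside = count(_ -> (x = rand(2); x[1]^2 + x[2]^2 < 1), 1:NUM_SAMPLES)
estimate = 4 * inside / NUM_SAMPLES

println(counts["world"])   # 3
println(estimate)          # varies from run to run, close to 3.14
```

In the Spark versions, only the orchestration changes: the data lives in an RDD and the same Julia functions are shipped to the worker nodes.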

A large proportion of the Spark RDD API is accessible from Julia, along with early support for the DataFrame and Spark SQL APIs. Detailed documentation is available at http://dfdx.github.io/Spark.jl/.

Installing and Running

Installing the Julia Spark bindings is as simple as adding the package via the Pkg.add("Spark") command from the Julia REPL. This will install a local standalone Spark environment for testing, in addition to the Julia bindings. Java and Maven are prerequisites, and the latter should be present in the system path.

When running this in a production setting, a Julia process is used as a driver, and it connects to an existing Spark cluster in client mode. Standalone, Mesos and YARN clusters are supported. On the cluster, Julia and its dependencies need to be installed on all nodes. This should be automated, and pre-built scripts are available for the major cloud providers. This makes the cloud-hosted Spark clusters provided by Amazon EMR and Azure HDInsight the easiest environments to run this on.

Julia on Azure HDInsight

Creating an HDInsight cluster on Azure is a matter of following the online wizard on the Azure portal. Choose Spark 2.1 on Linux (HDI 3.6) as the cluster type. Default settings can be used for everything else.

Create an Azure Data Lake Store principal if you intend to load data out of ADL Store. Choose a cluster size based on your requirements. By default HDInsight creates a cluster with 2 master nodes and 4 workers. Once the basic settings are provided, choose to edit the Advanced Settings and configure a script action. You can use the example supplied with the package to create a basic Julia installation on the cluster. For production use, you will want to edit the script to satisfy your requirements, for example by adding packages or installing JuliaPro.

Finally, once the cluster has been created, SSH to the master node, where you will find Julia available on the PATH. The cluster is running using the YARN cluster manager, where all endpoints are configured using property files. As a result, connecting to the cluster from Julia is simply a matter of specifying YARN as the cluster mode.
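As a sketch of that last step, the session on the master node might start as shown below. The `yarn-client` master string is an assumption here: it is a value that Spark 2.x accepts for YARN in client mode, but the right value may differ for other Spark versions or cluster setups, so check the Spark.jl documentation for your installation.

```julia
using Spark
Spark.init()

# Assumption: "yarn-client" selects YARN in client mode. The cluster endpoints
# are picked up from the YARN property files already configured on the node.
sc = SparkContext(master="yarn-client")
```

From here, the `sc` object is used exactly as in the local examples earlier in this post.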

This post was formatted for the Julia Computing blog by Rajshekar Behar

Newsletter December 2017

Happy holidays from Julia Computing and best wishes for a prosperous and productive 2018.

  1. New Julia Developments
    • Major New Release of DataFrames.jl v0.11
    • JuliaBox – Commercial Version Now Available for University and Corporate Users with Enhancements and Support
    • Improved C++ Interoperability Interface
    • JuliaPro Amazon Machine Image and Docker Image
  2. Julia Computing at SC17 and Intel HPC Developer Conference
  3. Julia Computing at Analytics Vidhya’s DataHack Summit, Sponsored by Intel
  4. Julia for Astrodynamics
  5. Julia in Linux Magazine
  6. Julia and Julia Computing in the News
  7. Julia en Français
  8. Upcoming Events Featuring Julia
  9. Recent Events Featuring Julia
  10. Contact Us


1. New Julia Developments

i. Major New Release of DataFrames.jl v0.11:

DataFrames v0.11 has been released by the Julia community with a number of important updates:

  • ‘NA’ has been replaced with ‘missing’, making it much easier to work with missing data
  • Faster joining and grouping
  • Better display of DataFrames
  • Improved documentation
  • Modeling features are now in StatsModels.jl, whereas data import/export features are now in CSV.jl
  • Many other improvements

Don’t let the version number fool you. This version has been in the works for a long time, and leverages important new features in the Julia compiler. See the release announcement and try it out!
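For readers who have not used ‘missing’ yet, here is a small sketch of how it behaves. It assumes Julia 1.0 or later, where `missing` is part of the language itself; on Julia 0.6, the same value comes from the Missings.jl package. The vector `x` is made-up illustration data.

```julia
using Statistics  # for mean

x = [1.0, missing, 3.0]

total  = sum(x)                 # missing: a missing value propagates through arithmetic
clean  = mean(skipmissing(x))   # 2.0: skipmissing iterates over the non-missing values only
filled = coalesce.(x, 0.0)      # [1.0, 0.0, 3.0]: replace each missing with a default

println(ismissing(total))
println(clean)
println(filled)
```

The same functions apply to a DataFrame column, since a column that contains missing values is an ordinary vector with `missing` entries.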

ii. JuliaBox – Commercial Version Now Available for University and Corporate Users with Enhancements and Support:

In response to user demand, Julia Computing has introduced a new and improved JuliaBox experience with increased memory and more support. For pricing and more information about the new commercial version of JuliaBox, please contact us. The free version of JuliaBox remains available for current and new users.

iii. Improved C++ Interoperability Interface:

Cxx.jl and CxxWrap.jl allow users to wrap C++ libraries in Julia. Use Cxx.jl to write the wrapper package in Julia code or CxxWrap.jl to write it entirely in C++ and call from Julia with a single line of Julia code. It is also possible to write and call Julia code from within C++, giving Julia and C++ complete two-way interoperability.

iv. JuliaPro Amazon Machine Image and Docker Image:

JuliaPro, the fastest on-ramp for quants, data scientists and researchers, is now available as an Amazon Machine Image on AWS EC2 (Red Hat Enterprise Linux v7.4 and Ubuntu 16.04) and as a Docker image (Ubuntu 16.04 and CentOS 7) for use in containerized environments such as Kubernetes. More information is available here.

2. Julia Computing at SC17 and Intel HPC Developer Conference

Denver hosted the Intel HPC Developer Conference November 11-12 and SC17 November 12-17. Julia Computing participated in both conferences and presented the Celeste case study, one of the latest and most exciting developments in high performance computing using Julia. Julia Computing’s Ranjan Anantharaman was recognized for providing the Best Tutorial at the Intel HPC Developer Conference.


Julia Computing’s Ranjan Anantharaman (left), winner of the Best Tutorial Award at the Intel HPC Developer Conference 2017

3. Julia Computing at Analytics Vidhya’s DataHack Summit, Sponsored by Intel

Julia Computing was featured as part of the Intel keynote presentation about the future of high performance computing at Analytics Vidhya’s DataHack Summit in Bangalore, India held November 9-11. Julia Computing’s Rajshekar Behar presented Julia’s work with Celeste, Intel and Intel Skylake architecture.

4. Julia for Astrodynamics

Helge Eichhorn, Software Engineer at Telespazio VEGA Deutschland, presented Astrodynamics.jl: An Open-Source Framework for Interactive High-Performance Mission Analysis at the Open Source Cubesat Workshop on Nov 23 at the European Space Operations Center (ESOC/ESA) in Darmstadt, Germany.

5. Julia in Linux Magazine

Professor Mark Vogelsberger, Theoretical Astrophysicist at MIT, published an article in Linux Magazine in Jan 2016 titled “Getting Parallel: Creating Parallel Applications with the Julia Programming Language.” According to Professor Vogelsberger: “The Julia code is … more than 100 times faster than the equivalent Python code. Multiple dispatch with function calls gives Julia extremely efficient code that is practically superior to any high-level language. Faster code in Julia can be achieved without any tricks like vectorization or outsourcing to C extensions. By contrast, such tricks are often necessary to speed up Python or R code.”

6. Julia and Julia Computing in the News

  • Tangent Works Uses Julia to Win IEEE Global Energy Forecasting Competition 2017: Tangent Works, a European machine learning company, used Julia to win the IEEE Global Energy Forecasting Competition 2017 (GEFCom2017).

  • Julia Featured in insideHPC’s “AI-HPC Is Happening Now” White Paper: insideHPC, a leading blog in the high performance computing community, featured Julia and Julia Computing in this white paper about artificial intelligence and high performance computing.

  • Julia Climbs to #35 on TIOBE Index of Most Popular Programming Languages: Julia entered the Top 50 most popular programming languages for the first time in September 2016, and has climbed to #35 since last year.

  • Julia Computing Featured Among 10 Most Innovative Startups in India That Will Rule in 2018 and Beyond: KnowStartup featured Julia Computing among the 10 Most Innovative Startups in India That Will Rule in 2018 and Beyond.

  • Julia Language Delivers Petascale HPC Performance: TheNextPlatform explains that “the Celeste team demonstrated that the Julia language can support both petascale compute and terascale big data analysis on a leadership HPC system plus scale to handle the seven petabytes of data expected to be produced by the Large Synoptic Survey Telescope (LSST) every year.”

  • Julia Computing CEO Viral Shah Featured in FactorDaily Outliers Podcast: FactorDaily’s Outliers Podcast with Pankaj Mishra featured an interview with Julia Computing CEO Viral Shah: “You can thank Viral, … along with Alan Edelman, Jeff Bezanson, Stefan Karpinski, Keno Fischer and Deepak Vinchhi … the next time you have a safe flight in US airspace.”

  • Intel Reports Faster Stock Price Estimation Using Julia: @IntelBusiness reports that Julia Computing’s stock price estimation tool runs up to 38% faster – “a big gain for a fast-moving industry.”

  • Plotting in Julia: Tom Breloff published a blog post titled “Plots: Past, Present and Future” about plotting in Julia.

  • Feigenbaum’s Alpha: Professor Stuart Brorson from Northeastern University’s Department of Mathematics published a blog post entitled “A High Precision Calculation of Feigenbaum’s Alpha in Julia”.

7. Julia en Français

Xavier Gandibleux, Professor of Operations Research and Computer Science at the Université de Nantes, is writing a book in French about using Julia and JuMP for modeling and solving linear optimization problems in Operations Research.

8. Upcoming Events Featuring Julia

Do you know of any upcoming conferences, meetups, trainings, hackathons, talks, presentations or workshops involving Julia? Would you like to organize a Julia event on your own, or in partnership with your company, university or other organization? Let us help you spread the word and support your event by sending us an email with details.

9. Recent Events Featuring Julia

Do you want to share photos, videos or details of your most recent conference, meetup, training, hackathon, talk, presentation or workshop involving Julia? Please send us an email with details and links.

10. Contact Us

Please contact us if you wish to:

  • Purchase or obtain license information for Julia products such as JuliaPro, JuliaPro Enterprise, JuliaRun, JuliaDB, JuliaFin or JuliaBox
  • Obtain pricing for Julia consulting projects for your enterprise
  • Schedule Julia training for your organization
  • Share information about exciting new Julia case studies or use cases
  • Spread the word about an upcoming conference, workshop, training, hackathon, meetup, talk or presentation involving Julia
  • Partner with Julia Computing to organize a Julia meetup, conference, workshop, training, hackathon, talk or presentation involving Julia

About Julia and Julia Computing

Julia is the fastest high performance open source computing language for data, analytics, algorithmic trading, machine learning, artificial intelligence, and many other domains. Julia solves the two language problem by combining the ease of use of Python and R with the speed of C++. Julia provides parallel computing capabilities out of the box and unlimited scalability with minimal effort. For example, Julia has run at petascale on 650,000 cores with 1.3 million threads to analyze over 56 terabytes of data using Cori, the world’s sixth-largest supercomputer. With more than 1.2 million downloads and +161% annual growth, Julia is one of the top programming languages developed on GitHub. Julia adoption is growing rapidly in finance, insurance, machine learning, energy, robotics, genomics, aerospace, medicine and many other fields.

Julia Computing was founded in 2015 by all the creators of Julia to develop products and provide professional services to businesses and researchers using Julia. Julia Computing offers the following products:

  • JuliaPro for data science professionals and researchers to install and run Julia with more than one hundred carefully curated popular Julia packages on a laptop or desktop computer.
  • JuliaRun for deploying Julia at scale on dozens, hundreds or thousands of nodes in the public or private cloud, including AWS and Microsoft Azure.
  • JuliaFin for financial modeling, algorithmic trading and risk analysis including Bloomberg and Excel integration, Miletus for designing and executing trading strategies and advanced time-series analytics.
  • JuliaDB for in-database in-memory analytics and advanced time-series analysis.
  • JuliaBox for students or new Julia users to experience Julia in a Jupyter notebook right from a Web browser with no download or installation required.

To learn more about how Julia users deploy these products to solve problems using Julia, please visit the Case Studies section on the Julia Computing Website.

Julia users, partners and employers hiring Julia programmers in 2017 include Amazon, Apple, BlackRock, Capital One, Citibank, Comcast, Disney, Facebook, Ford, Google, IBM, Intel, KPMG, Microsoft, NASA, Oracle, PwC, Uber, and many more.

Introduction to the packages Cxx.jl and CxxWrap.jl

Introduction

Cxx.jl is a Julia package that provides a C++ interoperability interface for Julia. It also provides an experimental C++ REPL mode for the Julia REPL. With Cxx.jl, it is possible to directly access C++ using the @cxx macro from Julia.

With Cxx.jl and CxxWrap.jl, authors facing the task of wrapping a C++ library in a Julia package now have two options:

  • Use Cxx.jl to write the wrapper package in Julia code
  • Use CxxWrap to write the wrapper completely in C++ (and one line of Julia code to load the .so)

Cxx.jl

Functionality

There are two ways to access the main functionality provided by this package. The first is using the @cxx macro, which puns on Julia syntax to provide C++ compatibility.

The macro supports two main usages:

  • A static function call @cxx mynamespace::func(args...)
  • A member call (where m is a CppPtr, CppRef or a CppValue) @cxx m->foo(args...)

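Additionally, this package provides the cxx"" and icxx"" custom string literals for inputting C++ syntax directly. The two string literals are distinguished by the C++ scope they represent: cxx"" is evaluated at global scope (includes, declarations and function definitions), while icxx"" is evaluated at function scope (statements and expressions).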

In summary, the two approaches to embedding C++ in Julia discussed above look like this:

# Using cxx"" with @cxx: define the function at global scope, then call it
cxx""" void cppfunction(args) { . . . } """
@cxx cppfunction(args)

# Using icxx"": embed C++ statements in the body of a Julia function
julia_function(args) = icxx""" *code here* """

The C++ REPL

This package contains an experimental C++ REPL feature. Using the package will automatically add a new pane to your REPL that is accessible by pressing the < key.

Installation

The package is installable on Julia 0.5 and newer and is available through Julia’s package manager:

Pkg.add("Cxx")

Building the C++ code requires the same system tools necessary for building Julia from source. Further, Debian/Ubuntu users should install libedit-dev and libncurses5-dev, and RedHat/CentOS users should install libedit-devel.

Using Cxx.jl with examples

Example 1: Embedding a simple C++ function in Julia

# include headers
julia> using Cxx
julia> cxx""" #include<iostream> """  

# Declare the function
julia> cxx"""  
         void mycppfunction() {   
            int z = 0;
            int y = 5;
            int x = 10;
            z = x*y + 2;
            std::cout << "The number is " << z << std::endl;
         }
      """
# Convert C++ to Julia function
julia> julia_function() = @cxx mycppfunction()
julia_function (generic function with 1 method)

# Run the function
julia> julia_function()
The number is 52

Example 2: Pass numeric arguments from Julia to C++

julia> jnum = 10
10

julia> cxx"""
           void printme(int x) {
              std::cout << x << std::endl;
           }
       """

julia> @cxx printme(jnum)
10

Example 3: Pass strings from Julia to C++

julia> cxx"""
         void printme(const char *name) {
            // const char* => std::string
            std::string sname = name;
            // print it out
            std::cout << sname << std::endl;
         }
     """

julia> @cxx printme(pointer("John"))
   John

Example 4: Pass a Julia expression to C++

julia> cxx"""
          void testJuliaPrint() {
              $:(println("\nTo end this test, press any key")::Nothing);
          }
       """

julia> @cxx testJuliaPrint()
       To end this test, press any key

Example 5: Embedding C++ code inside a Julia function

function playing()
    for i = 1:5
        icxx"""
            int tellme;
            std::cout<< "Please enter a number: " << std::endl;
            std::cin >> tellme;
            std::cout<< "\nYour number is "<< tellme << "\n" <<std::endl;
        """
    end
end
playing();

Click here for more information, examples, and documentation.

CxxWrap.jl

This package lets you write the code for the Julia wrapper in C++, and then use a one-liner on the Julia side to make the wrapped C++ library available there.

The mechanism behind this package is that functions and types are registered in C++ code that is compiled into a dynamic library. This dynamic library is then loaded into Julia, where the Julia part of this package uses the data provided through a C interface to generate functions accessible from Julia. The functions are passed to Julia either as raw function pointers (for regular C++ functions that don’t need argument or return type conversion) or std::functions (for lambda expressions and automatic conversion of arguments and return types). The Julia side of this package wraps all this into Julia methods automatically.
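CxxWrap generates this plumbing for you, but the primitive underneath is Julia's ability to call into a compiled library through a C interface. As a rough illustration only (this is plain `ccall` from Base Julia, not CxxWrap's own code), the sketch below calls `strlen` from the C standard library. It assumes a Linux or macOS system where that symbol is visible in the running process.

```julia
# ccall takes the C symbol, the return type, a tuple of argument types, and the arguments.
n = ccall(:strlen, Csize_t, (Cstring,), "hello, world")

println(Int(n))  # 12, the length of the string in bytes
```

CxxWrap adds what this sketch leaves out: registering many functions at once, converting C++ argument and return types, and exposing the results as ordinary Julia methods.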

Installation

Like any other registered Julia package, CxxWrap is installed by running the following package manager command:

Pkg.add("CxxWrap")

Features

  • Support for C++ functions, member functions and lambdas
  • Classes with single inheritance, using abstract base classes on the Julia side
  • Trivial C++ classes can be converted to a Julia isbits immutable
  • Template classes map to parametric types, for the instantiations listed in the wrapper
  • Automatic wrapping of default and copy constructor (mapped to deepcopy) if defined on the wrapped C++ class
  • Facilitate calling Julia functions from C++

A Hello, World example with CxxWrap.jl

Suppose we want to expose the following C++ function to Julia in a module called CppHello:

std::string greet()
{
   return "hello, world";
}

Using the C++ side of CxxWrap, this can be exposed as follows:

#include "jlcxx/jlcxx.hpp"

JULIA_CPP_MODULE_BEGIN(registry)
  jlcxx::Module& hello = registry.create_module("CppHello");
  hello.method("greet", &greet);
JULIA_CPP_MODULE_END

Once this code is compiled into a shared library (say libhello.so) it can be used in Julia as follows:

using CxxWrap

# Load the module and generate the functions
wrap_modules(joinpath("path/to/built/lib","libhello"))
# Call greet and show the result
@show CppHello.greet()

More such examples and documentation for the package can be found here.

This post was formatted for the Julia Computing blog by Rajshekar Behar