
Julia on Azure

By: pkofod

Re-posted from: http://www.pkofod.com/2017/02/09/julia-on-azure/

Microsoft recently added Julia to Azure, their cloud computing service. I got curious, and headed over to Azure to check it out. On the platform, Julia is provided in the form of the JuliaPro bundle shipped by Julia Computing. JuliaPro consists of the latest stable Julia release, the Juno IDE (Atom-based) + debugger, Jupyter notebooks, and a bundle of curated packages from different categories such as statistics (DataFrames.jl, Distributions.jl, …), optimization (JuMP.jl, Optim.jl), language interoperability (RCall.jl, JavaCall.jl, PyCall.jl), deep learning (Mocha.jl, MXNet.jl, Knet.jl), and more. The packages come precompiled, so when JuliaPro is installed, it should “Just Work” (of course, packages installed manually also work, but some packages take a long time to build).

As with many other services, you pay a price per hour you spend on the VMs. The smallest ones are quite cheap, but you can scale up to very powerful setups. Most Julia packages are quite quick to build (precompile), but when you’re paying by the hour, the precompiled packages that JuliaPro comes with can be quite neat. Luckily, there is a trial account where you get some “getting started” credit. If you don’t boot up the most powerful VM configuration right away, there is plenty of credit to get started. I won’t explain the process of setting up a VM here, as it is very easy and self-explanatory. I set up a Windows Data Science VM and connected using Remmina, and wouldn’t you know it, Julia is right there on the desktop: REPL, Juno, and Jupyter right next to each other.

Let’s try to open Juno, because… you’re worth it!

I’m just adding a package, loading it, creating some data, and plotting it (using a theme from PlotThemes.jl). Everything works fine, although I should note that Plots.jl took a while to load the first time around, as I’ve chosen the smallest VM available. Since we’re using JuliaPro, we could have used Gadfly or PyPlot instead, and it would have been compiled already.
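
In code, those steps look roughly like the following. This is a minimal sketch rather than the exact session: the `:dark` theme and the sine-curve data are assumptions picked for illustration, and on current Julia versions the package manager has to be loaded with `using Pkg` first.

```julia
using Pkg
Pkg.add("Plots")          # one-time install; recent Plots versions pull in PlotThemes.jl

using Plots               # the first load is slow on a small VM

theme(:dark)              # assumed theme; any PlotThemes.jl theme name can go here

# Made-up data for illustration: one period of a sine curve.
x = range(0, 2π; length = 100)
y = sin.(x)

plot(x, y; label = "sin(x)", xlabel = "x", ylabel = "y")
```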

From here on, it’s up to you what to do. Analyse the stars, try out MXNet.jl, predict company defaults, or whatever you think is interesting.

Julia – A Fresh Approach to Numerical Computing

This post is authored by Viral B. Shah, co-creator of the Julia language and co-founder and CEO at Julia Computing, and Avik Sengupta, head of engineering at Julia Computing.

The Julia language provides a fresh new approach to numerical computing, where there is no longer a compromise between performance and productivity. A high-level language that makes writing natural mathematical code easy, with runtime speeds approaching raw C, Julia has been used to model economic systems at the Federal Reserve, drive autonomous cars at the University of California, Berkeley, optimize the power grid, calculate solvency requirements for large insurance firms, model the US mortgage markets and map all the stars in the sky.

It would be no surprise then that Julia is a natural fit in many areas of machine learning. ML, and in particular deep learning, drives some of the most demanding numerical computing applications in use today. And the powers of Julia make it a perfect language to implement these algorithms.


One of the key promises of Julia is to eliminate the so-called “two language problem.” This is the phenomenon of writing prototypes in a high-level language for productivity, but having to dive down into C for performance-critical sections when working on real-life data in production. This is not necessary in Julia, because there is no performance penalty for using high-level or abstract constructs.

This means both the researcher and engineer can now use the same language. One can use, for example, custom kernels written in Julia that will perform as well as kernels written in C. Further, language features such as macros and reflection can be used to create high-level APIs and DSLs that increase the productivity of both the researcher and engineer.
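
To make this concrete, here is a minimal sketch of such a “custom kernel”: a dot product written as a plain loop, with no C involved. The function name `mydot` is made up for this illustration, and actual performance depends on the machine and the Julia version, so treat it as a demonstration of the style rather than a benchmark.

```julia
# A hand-written dot product. The loop is compiled to native code,
# specialized for the concrete element types it is called with.
function mydot(x::AbstractVector, y::AbstractVector)
    length(x) == length(y) || throw(DimensionMismatch("vectors must have equal length"))
    acc = zero(eltype(x)) * zero(eltype(y))
    @inbounds @simd for i in eachindex(x, y)
        acc += x[i] * y[i]
    end
    return acc
end

x = collect(1.0:5.0)
y = fill(2.0, 5)
result = mydot(x, y)   # 2 * (1 + 2 + 3 + 4 + 5) = 30.0
```

Timing it against `dot` from the LinearAlgebra standard library is a simple way to check the claim on your own hardware.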

GPU

Modern ML is heavily dependent on running on general-purpose GPUs in order to attain acceptable performance. As a flexible, modern, high-level language, Julia is well placed to take advantage of modern hardware to the fullest.

First, Julia’s exceptional FFI capabilities make it trivial to use the GPU drivers and CUDA libraries to offload computation to the GPU without any additional overhead. This allows Julia deep learning libraries to use GPU computation with very little effort.
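
The mechanism behind this is Julia’s built-in `ccall`, which calls a function in a shared C library directly, with no wrapper code to write or compile. The sketch below uses `strlen` from the C standard library as a stand-in, since a CUDA call would need a GPU and driver; the choice of function is an assumption made purely for illustration.

```julia
# Call the C standard library's strlen directly from Julia.
# Arguments: function name, return type, tuple of argument types, then the values.
n = ccall(:strlen, Csize_t, (Cstring,), "Julia")

println(n)   # number of bytes before the terminating NUL
```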

Beyond that, libraries such as ArrayFire allow developers to use natural-looking mathematical operations, while performing those operations on the GPU instead of the CPU. This is probably the easiest way to utilize the power of the GPU from code. Julia’s type and function abstractions make this possible with, once again, very little performance overhead.

Julia has a layered code generation and compilation infrastructure that leverages LLVM. (Incidentally, it also provides some amazing introspection facilities into this process.) Based on this, Julia has recently developed the ability to directly compile code onto the GPU. This is an unparalleled feature among high-level programming languages.
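
Those introspection facilities are available from ordinary Julia code. As a small sketch, the `InteractiveUtils` standard library (loaded automatically in the REPL) exposes the output of each compilation stage; the function `axpb` below is a made-up example.

```julia
using InteractiveUtils   # provides code_llvm, code_native and the @code_* macros

axpb(a, x, b) = a * x + b

# Capture the LLVM IR generated for Float64 arguments as a string.
llvm_ir = sprint(code_llvm, axpb, (Float64, Float64, Float64))
println(llvm_ir)

# At the REPL, the macro forms are more convenient:
#   @code_lowered axpb(2.0, 3.0, 1.0)   # lowered Julia code
#   @code_typed   axpb(2.0, 3.0, 1.0)   # after type inference
#   @code_llvm    axpb(2.0, 3.0, 1.0)   # LLVM IR
#   @code_native  axpb(2.0, 3.0, 1.0)   # machine code
```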

While the x86 CPU with a GPU is currently the most popular hardware setup for deep learning applications, there are other hardware platforms that have very interesting performance characteristics. Among them, Julia now fully supports the Power platform, as well as the Intel KNL architecture.

Libraries

The Julia ecosystem has, over the last few years, matured sufficiently to materialize these benefits in many domains of numerical computing. Thus, there is a rich set of libraries for ML available in Julia right now. Deep learning frameworks with natural bindings to Julia include MXNet and TensorFlow. Those wanting to dive into the internals can use the pure Julia libraries, Mocha and Knet. In addition, there are libraries for random forests, SVMs, and Bayesian learning.

Using Julia with all these libraries is now easier than ever. Thanks to the Data Science Virtual Machine (DSVM), running Julia on Azure is just a click away. The DSVM includes a full distribution of JuliaPro, the professional Julia development environment from Julia Computing Inc, along with many popular statistical and ML packages. It also includes the IJulia system, which brings Jupyter notebooks to the Julia language. Together, they create the perfect environment for data science, both for exploration and production.

Viral Shah
@Viral_B_Shah

New Year & New Updates to the Windows Data Science Virtual Machine

This post is authored by Gopi Kumar, Principal Program Manager in the Data Group at Microsoft.

First of all, a big thank you to all users of the Data Science Virtual Machine (DSVM) for your tremendous response to our offering in 2016. We’re looking forward to a similarly great year in 2017.

The new year also brings in some interesting new tools to our DSVM users, to help you be more productive with data science. In this post, we summarize key recent changes on the Windows Server side of our DSVM offering, below.

  1. Microsoft R Server 9.0.1 (MRS9) developer edition, a major update to the enterprise scalable R extension from Microsoft, is now available on the VM. This version brings a lot of exciting changes, including several fast ML / deep learning algorithms developed by Microsoft in a new library called Microsoft ML. There’s a new architecture and interface for deploying R models and functions as web services, which follows a paradigm and interface library very similar to Azure ML operationalization. The library is called mrsdeploy. We have some R deployment samples for notebooks, R Tools for Visual Studio (RTVS) and RStudio. The olapR package in Microsoft R Server lets you run MDX queries and connect directly to OLAP cubes on SQL Server 2016 Analysis Services from your R solution. SQL Server 2016 Developer edition and the associated Microsoft R In-DB analytics are also updated to Service Pack 1.
  2. RStudio Desktop open source edition is now preinstalled on the VM, by popular demand.
  3. R Tools for Visual Studio is now updated to version 0.5, bringing in multi-window plotting and SQL tooling to run R code on SQL Server 2016.
  4. Microsoft Cognitive Toolkit (formerly called CNTK) is now on Version 2 Beta 6, and features several improvements and sample notebooks to perform fast deep learning using the Python interface or the CNTK BrainScript interface.
  5. Apache Drill, a SQL-based query tool that can work with various data sources and formats (e.g. JSON, CSV), was part of our previous update. We now prepackage and configure drivers to access various Azure data services such as Blobs, SQLDW/Azure SQL, HDI and DocumentDB. See this tutorial in our gallery for information on how to query data in various Azure data sources from within the Drill SQL query language.
  6. JuliaPro is available to DSVM users and is now pre-installed and pre-configured on the VM, thanks to Julia Computing (a company founded by the creators of the Julia programming language). JuliaPro is a curated distribution of the open source Julia language along with a set of popular packages for scientific computing, data science, AI and optimization. The JuliaPro distribution comes with an Atom-based IDE, Jupyter notebooks and several sample notebooks on the DSVM Jupyter instance to help you get started. Julia Computing also provides an Enterprise edition with commercial support.
  7. The Deep Learning Toolkit for the Windows DSVM is an extension to help you jump-start deep learning on Azure GPU VMs, without having to spend time installing GPU framework dependencies and drivers or configuring the various deep learning tools. This extension has been updated to include the latest versions of CNTK 2 and MXNet for GPU, along with new samples. It also features the Windows version of TensorFlow.

We also offer a Linux Edition of the data science virtual machine and there will be a separate post on major updates there.

Meanwhile, here are some resources to get you started with the DSVM.

Windows Edition

Linux Edition

Webinar

I’d like to end this post with a graphical summary of the DSVM, showing a [non-exhaustive] list of the various tools that are preinstalled. DSVM helps you focus more on data science and spend less time on installing, configuring and administering tools, thereby making you more productive. Give DSVM a shot today and send us feedback on how we can make it even better for your data science needs.


Gopi