Tag Archives: Programming

HDF5 in Julia

So, last summer, my program was producing three dimensional data, and I needed a way to export and save that data from my C++ program. Simple ASCII files, my default method, no longer covered my needs. Of course, I wasn’t the first person to encounter this problem, so I discovered the HDF5 standard.

Instead of storing data in a human readable format like ASCII, the Hierarchical Data Format, HDF, stores data in binary format. This preserves the shape of the data in the computer and keeps it at its minimum size. WOHOO!!

Sadly, the syntax for HDF5 in C++ and Fortran is just as bad as FFTW or OpenBLAS. But happily, just like FFTW and OpenBLAS, HDF5 has wonderful syntax in Julia, Python, and MATLAB, among others.

So how does it work?

We don’t just print a single variable. Each HDF5 file is like its own file system. In my home directory, I have my documents folder, my programming folder, my pictures, configuration files,… and inside each folder I can have subfolders or files.

The same is true for an HDF5 file. We have the root, and then we have groups and subgroups. A group is like a folder. Then we can have datasets — the objects that actually hold the data, like files.

Installing the Package

While running Pkg.add("HDF5"); should hopefully add the HDF5 library, additional steps may be required. I remember having a horrible time with the HDF installation when using C++ a year ago. If at all possible, use a package manager, and do not try to install it from source! See the HDF5.jl or HDFGroup pages for details.

using HDF5;

Hello World

First, let's open a file and write some data to it.

We can open a file in three ways:

Symbol Meaning
“w” Write. Overwrites anything already there.
“r” Read-only.
“r+” Read-write. Preserves existing contents.

If we open a file with this syntax, we always have to remember to close it with close().

fid=h5open("test.h5","w")

fid["string"]="Hello World"

close(fid)
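If remembering close() is a worry, HDF5.jl also supports Julia's do-block form, which closes the file automatically even if an error is thrown inside — a minimal sketch of the same write as above:

```julia
using HDF5

# Equivalent to the open/write/close sequence above, but the file is
# guaranteed to be closed when the block exits.
h5open("test.h5", "w") do fid
    fid["string"] = "Hello World"
end
```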

Now let's see whether we were successful by reading the file back. Instead of reading the dataset straight away, we are going to check out the structure of the file first.

names(fid) tells us what is inside the location fid.

dump(fid) is much more in depth, exploring everything below fid. If we had a bunch of subdirectories, it would go down each one to see what was there.

Both these functions help you find your way around a file.

fid=h5open("test.h5","r")

println("names \n",names(fid))

println("\n dump")
println(dump(fid))

close(fid)
names
Union{ASCIIString,UTF8String}["string"]

 dump
HDF5.HDF5File len 1
  string: HDF5Dataset () : Hello World
nothing

Reading Data

Now when we are reading data, we need to know the difference between the dataset itself and the data the dataset contains.

Look at the example below:

fid=h5open("test.h5","r")

dset=fid["string"]
println("the dataset: \t", typeof(dset))

data=read(dset)
println("the string: \t", typeof(data),"\t",data)

data2=read(fid,"string")
println("read another way: \t", typeof(data2),"\t",data2)

close(fid)
the dataset: 	HDF5.HDF5Dataset
the string: 	ASCIIString	Hello World
read another way: 	ASCIIString	Hello World

A dataset is like the filename “fairytale.txt”, so we then need to read the file to get “Once upon a time …”.

Groups

I’ve talked about groups, but we haven’t done anything with them yet. Let’s make some!

Here we use g_create to create two groups, one inside the other. For the subgroup, its parent is g, so we have to create it at location g. Just like in a filesystem, its name/path is nested within its parent's path.

fid=h5open("test.h5","w")

g=g_create(fid,"mygroup");
h=g_create(g,"mysubgroup");

println(dump(fid))

println("\n path of h:  ",name(h))

close(fid)
HDF5.HDF5File len 1
  mygroup: HDF5.HDF5Group len 1
    mysubgroup: HDF5.HDF5Group len 0
nothing

 path of h:  /mygroup/mysubgroup
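Once a group exists, writing a dataset inside it looks just like writing at the root. A brief sketch — the dataset name "mydata" here is purely illustrative:

```julia
using HDF5

fid = h5open("test.h5", "w")
g = g_create(fid, "mygroup")
g["mydata"] = collect(1:10)   # creates the dataset /mygroup/mydata
println(name(g["mydata"]))    # the full path of the dataset in the file
close(fid)
```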

Attributes

Say in a file I want to include the information that I ran the simulation with 100 sites, at 1 Kelvin, for 100,000 timesteps. Instead of creating new datasets for each of these individual numbers, I can create attributes and tie them to either a group or a dataset.

fid=h5open("test.h5","w")

fid["data"]=randn(3,3);

attrs(fid["data"])["Temp"]="1";
attrs(fid["data"])["N Sites"]="100";

close(fid)

fid=h5open("test.h5","r")

dset=fid["data"];

println("typeof attrs: \t", typeof(attrs(dset)))
println("Temp: \t",read(attrs(dset),"Temp"))
println("N Sites: \t",read(attrs(dset),"N Sites"))

close(fid)
typeof attrs: 	HDF5.HDF5Attributes
Temp: 	1
N Sites: 	100
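If you don't know the attribute names in advance, names() also works on the attributes object (at least in this era of HDF5.jl), so you can loop over everything attached to a dataset — a sketch under that assumption:

```julia
using HDF5

fid = h5open("test.h5", "r")
dset = fid["data"]
# List every attribute name on the dataset, then read each one,
# without hard-coding "Temp" or "N Sites".
for a in names(attrs(dset))
    println(a, ": \t", read(attrs(dset), a))
end
close(fid)
```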

Final Tips

Before diving in to learn how to use this, think about whether you need it or not. How large and complex is your data? Is it worth the time to learn? While the syntax might be relatively simple in Julia, ASCII files are still much easier to deal with.

If you are going to play around with or use this format, I recommend getting an HDF viewer, like HDFView. While you can have much more control via code, sometimes it is just that much simpler to check that everything is working with a GUI.

For more information, check out the package page at HDF5.jl or the HDF Group's page at HDFGroup.

I’ve shown some of the basic functionality in simple test cases. If you want more control, you might just have to work a bit for it.

Julia with MKL on OSX

Introduction

One of the great things about Julia for those in scientific computing is the ease of accessing highly optimized libraries. For matrix operations, Julia ships with OpenBLAS, an open-source implementation of BLAS, the Basic Linear Algebra Subprograms.

For the majority of people, that’s wonderful. OpenBLAS is quite fast and optimized.

BUT, when you want to diagonalize matrices as large as I do, there's something better: Intel's Math Kernel Library, MKL.

Since Intel designs the chips and the hardware drivers for just about everyone, they can tailor their implementation of BLAS to the specifics of the hardware and get a speed boost. More to my purposes, it also doesn't start aborting on larger matrices while I still have plenty of RAM left. The downside: the boost comes from trade secrets, so the software is proprietary and behind closed doors. Moral objections for some, monetary objections for others.

If you want to get MKL for yourself, you have two possible routes:

  • A free community license through Intel Aviary. I got this for my workstation.

  • Convince your company/university/institute to get the fully supported and expensive version. For example, my institute's cluster has all of Intel's tools.

Do you need this?

Before you start trying to implement this on your system, take a second and decide whether or not it is worth your while. What kind of systems are you trying to diagonalize? Are you going to be diagonalizing systems at all? Or multiplying large matrices… that would count too…

I generated matrices with A=randn(n,n); and then diagonalized them with @time eigfact(A);.
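The whole timing loop looks roughly like this; the sizes here are illustrative, not the ones used for the plots (eigfact was the eigendecomposition routine in the Julia of this era):

```julia
# Time dense eigendecompositions for a range of matrix sizes.
# Each matrix has entries drawn from a standard normal distribution.
for n in (500, 1000, 2000, 4000)
    A = randn(n, n)
    @time eigfact(A)
end
```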

All of these specs are for my Mac Pro, Late 2013 model, running OSX El Capitan. Processor: 3.7 GHz Quad-Core Intel Xeon E5. Memory: 64 GB 1866 MHz DDR3 ECC. I would be interested in seeing data for other processors.

Time Scaling

Time scaling for MKL and OpenBLAS. Performed for a matrix with randomly generated values according to a normal distribution of unit standard deviation.

Factor Scaling

The ratio between OpenBLAS and MKL. While comparable at small system sizes, MKL shows a significant improvement for larger matrices.

Memory Scaling

Both MKL and OpenBLAS showed the same memory usage for a given calculation. The scaling appears quadratic, except for a deviation at small system sizes.

What you need to do

For Intel

So in my .zshrc, or .bashrc for those who haven’t discovered the wonders of zsh, I now have

export TBBROOT=fdsljkfds
source /opt/intel/mkl/bin/mklvars.sh intel64 ilp64

The value for TBBROOT is arbitrary gobbledygook; it just needs to be a non-empty string.
Once you have added that, either restart your terminal, or type

	source ~/.bashrc

to refresh your terminal.

Now, check that these environment variables are set up correctly:

  • MKLROOT
    • /opt/intel//compilers_and_libraries_2016.2.146/mac/mkl
  • DYLD_LIBRARY_PATH
    • /opt/intel//compilers_and_libraries_2016.2.146/mac/compiler/lib: /opt/intel//compilers_and_libraries_2016.2.146/mac/mkl/lib
  • LIBRARY_PATH
    • /opt/intel//compilers_and_libraries_2016.2.146/mac/compiler/lib: /opt/intel//compilers_and_libraries_2016.2.146/mac/mkl/lib
  • NLSPATH
    • /opt/intel//compilers_and_libraries_2016.2.146/mac/mkl/lib/locale/%l_%t/%N
  • MANPATH
    • /opt/intel//compilers_and_libraries_2016.2.146/mac/man/en_US
  • CPATH
    • /opt/intel//compilers_and_libraries_2016.2.146/mac/mkl/include

by typing

echo $NAME_OF_VARIABLE

Don't just copy your variables from mine! Find your installation on the system, and check out where the folders are.
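Rather than typing six echo commands, a small bash loop with indirect expansion checks them all at once (zsh users can run it via `bash`, since `${!v}` is bash syntax):

```shell
# Print each Intel-related variable with its current value.
for v in MKLROOT DYLD_LIBRARY_PATH LIBRARY_PATH NLSPATH MANPATH CPATH; do
  printf '%s=%s\n' "$v" "${!v}"
done
```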

For the Julia Installation

In the Julia source tree, edit Make.inc at this specific spot:

## Settings for various Intel tools
# Set to 1 to use MKL
USE_INTEL_MKL = 1
# Set to 1 to use MKL FFT
USE_INTEL_MKL_FFT = 1
# Set to 1 to use Intel LIBM
USE_INTEL_LIBM ?= 0
# Set to 1 to enable profiling with Intel VTune Amplifier
USE_INTEL_JITEVENTS ?= 0
# Set to 1 to use Intel C, C++, and FORTRAN compilers
USEICC  ?= 0
USEIFC  ?= 0

Now in the Julia folder try

make
make install

How I eventually figured this out

I was getting complaints when running make on Julia that

-L/opt/intel//compilers_and_libraries_2016.2.146/mac/tbb/lib

wasn't found. There was good reason it wasn't found: it didn't exist. TBB stands for Threading Building Blocks, another of Intel's products, this one meant for multicore C++ programs. Sounds fairly useful, but off topic for what I need right now.

So I wanted to figure out why it was trying to link to that directory. Looking in /opt/intel/mkl/bin/mklvars.sh, the program that sets environment variables for MKL, I discovered:

                if [ -z "${TBBROOT}" ]; then
                    mkl_ld_arch="${CPRO_PATH}/tbb/lib:${mkl_ld_arch}"
                fi

When the variable TBBROOT is empty (that's what the -z test checks), the script adds this tbb folder to the path. Since I can't change that file (proprietary stuff), my workaround is making TBBROOT non-empty. Then DYLD_LIBRARY_PATH, which gets linked during the Julia make process, only contains good locations.

Also, you can't just run source ... on the command line once and have the variables set for all eternity. When I restarted my terminal the next day, the variables had been cleared. So the lines need to be put in ~/.bashrc (or ~/.zshrc) instead of just run once.

Conclusions

I still got warnings when making Julia, but obviously none that broke the installation. Hopefully this work-around holds me over till this project is done, and hopefully it helps someone else too 🙂

Benchmarks of Multidimensional Stack Implementations in Julia

By: Christopher Rackauckas

Re-posted from: http://www.stochasticlifestyle.com/233-2/

DataStructures.jl claims it's fast. How does it do? I wrote some quick code to check it out. What I wanted to find out was which approach does best for implementing a stack where each element is three integers. I tried filling a pre-allocated array, pushing into three separate vectors, and different implementations of the stack from the DataStructures.jl package.

function baseline()
  stack = Array{Int64,2}(1000000,3)
  for i=1:1000000,j=1:3
    stack[i,j]=i
  end
end 
function baseline2()
  stack = Array{Int64,2}(1000000,3)
  for j=1:3,i=1:1000000
    stack[i,j]=i
  end
end
function f0()
  stack = Array{Int64}(1000000,3)
  for i = 1:1000000
    stack[i,:] = [i,i,i]
  end
end
function f02()
  stack = Array{Int64}(3,1000000)
  for i = 1:1000000
    stack[:,i] = [i;i;i]
  end
end
function f1()
  # Start from empty vectors; Vector{Int64}(1) would leave one
  # uninitialized element at the bottom of each stack.
  stack1 = Vector{Int64}(0)
  stack2 = Vector{Int64}(0)
  stack3 = Vector{Int64}(0)
  for i = 1:1000000
    push!(stack1,i)
    push!(stack2,i)
    push!(stack3,i)
  end
end
function f2()
  stack1 = Stack(Int)
  stack2 = Stack(Int)
  stack3 = Stack(Int)
  for i = 1:1000000
    push!(stack1,i)
    push!(stack2,i)
    push!(stack3,i)
  end
end
function f3()
  stack = Stack(Tuple{Int64,Int64,Int64})
  for i = 1:1000000
    push!(stack,(i,i,i))
  end
end
function f4()
  stack = Stack(Vector{Int64})
  for i = 1:1000000
    push!(stack,[i,i,i])
  end
end
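The results below also include a vectorTuple case whose code didn't survive in the listing; it was presumably a single native Vector holding 3-integer tuples, roughly like this hypothetical reconstruction:

```julia
# Hypothetical reconstruction of the missing "vectorTuple" benchmark:
# one plain Vector of 3-integer tuples, using only base Julia.
function f5()
  stack = Vector{Tuple{Int64,Int64,Int64}}(0)
  for i = 1:1000000
    push!(stack,(i,i,i))
  end
end
```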
using Benchmark
using DataStructures
base = benchmark(baseline,"baseline",1000)
println(base)
base2 = benchmark(baseline2,"baseline2",1000)
println(base2)
df0 = benchmark(f0,"array",1000)
println(df0)
df02 = benchmark(f02,"arrayTranspose",1000)
println(df02)
df1 = benchmark(f1,"vectorStack",1000)
println(df1)
df2 = benchmark(f2,"dsStacks",1000)
println(df2)
df3 = benchmark(f3,"dsStackTuple",1000)
println(df3)
df4 = benchmark(f4,"dsStackVector",1000)
println(df4)

The results were as follows. Each benchmark originally printed a 1x12 DataFrames.DataFrame; they are consolidated into one table here (all times in seconds; every run used Julia commit a2f713dea5ac6320d8dcf2835ac4a37ea751af05 on Windows with 8 CPU cores):

| Benchmark      | Iterations | TotalWall | AverageWall | MaxWall   | MinWall    |
|----------------|------------|-----------|-------------|-----------|------------|
| baseline       | 1000       | 11.7169   | 0.0117169   | 0.0158455 | 0.00978837 |
| baseline2      | 1000       | 9.84362   | 0.00984362  | 0.0126953 | 0.00694176 |
| array          | 1000       | 114.288   | 0.114288    | 0.172499  | 0.0775942  |
| arrayTranspose | 1000       | 110.981   | 0.110981    | 0.183495  | 0.0741138  |
| vectorStack    | 1000       | 34.4623   | 0.0344623   | 0.0455326 | 0.0285367  |
| dsStacks       | 1000       | 38.0762   | 0.0380762   | 0.0508213 | 0.0303853  |
| dsStackTuple   | 1000       | 19.3516   | 0.0193516   | 0.0296347 | 0.0140451  |
| dsStackVector  | 1000       | 184.126   | 0.184126    | 0.227575  | 0.16454    |
| vectorTuple    | 1000       | 23.65     | 0.02365     | 0.0375346 | 0.0200302  |
Things to learn from this are:

  • Among the dynamic structures, pushing tuples is by far the fastest.
  • DataStructures.jl beats out everything except the pre-allocated arrays.
  • The standard vector is pretty close to the DataStructures.jl result.

The end result is: use arrays when you can pre-allocate and need mutability, but if you want to throw and retrieve things from a dynamic data structure, using tuples is key. Datastructures.jl has some nice features and (obviously) implementations of data structures, and although they are slightly faster than the native implementation, don’t expect a massive speedup. Still, it’s a well-made package you should try out.

The post Benchmarks of Multidimensional Stack Implementations in Julia appeared first on Stochastic Lifestyle.