Deep Learning is now ubiquitous in the machine learning world, with useful
applications in a number of areas. In this blog post, we explore the use of
Julia for deep learning experiments on Power Systems + NVIDIA hardware.
We shall demonstrate:
- The ease of specifying deep neural network architectures in Julia and
visualizing them. We use MXNet.jl, a Julia package for deep learning.
- The ease of running Julia on Power Systems. We ran all our experiments on
a PowerNV 8335-GCA, which has 160 CPU cores, and a Tesla K80 (dual) GPU
accelerator. IBM and OSUOSL have generously provided us
with the infrastructure for this analysis.
Introduction
Deep neural networks have been around since the 1940s, but have only recently
been deployed in research and analytics because of advances in hardware and
computational power. Neural networks have a wide range of applications in
machine learning: vision, speech processing, and even self-driving cars.
An interesting use case for neural networks could be the ability to drive down
costs in medical diagnosis. Automated detection of diseases would be of immense
help to doctors, especially in places around the world where access to
healthcare is limited.
Diabetic retinopathy is an eye disease brought on by diabetes. There are over
126.2 million people in the world (as of 2010) with diabetic retinopathy, and
this is expected to rise to over 191.2 million by 2030. According to the WHO
in 2006, it accounted for 5% of world blindness.
Hence, early and automatic detection of diabetic retinopathy would be desirable.
To that end, we took up an image classification problem using real clinical data.
This data was provided to us by Drishti Care, which is
a social enterprise that provides affordable eye care in India. Drishti Care CEO
Kiran Anandampillai explains that “India is home to 62 million diabetics,
of whom live in rural areas with limited access to health facilities. Timely
screening for changes in the retina can help get them to treatment and prevent
vision loss.” He goes on to state that “Julia Computing’s work using
deep learning makes retinal screening an activity that can be performed by a
trained technician using a low cost fundus camera.”
We obtained a number of eye fundus
images from a variety of patients. The eyes affected by retinopathy are generally
marked by inflamed veins and cotton spots. The following picture on the left
is a normal fundus image whereas the one on the right is affected by diabetic retinopathy.
Setup
We built MXNet from source with CUDA and OpenCV. This was essential for training
our networks on GPUs with cuDNN, and for reading our image record files. We had to
build GCC 4.8 from source so that our various libraries could compile and link
without error, but once we did, we were set up and ready to start working with
the data.
The Hardware: IBM Power Systems
We chose to run this experiment on an IBM Power System because, at the time of
this writing, we believe it is among the best environments available for this sort of
work. The Power platform is ideal for deep learning, big data, and machine
learning due to its high performance, large caches, 2x-3x higher memory
bandwidth, very high I/O bandwidth, and of course, tight integration with
GPU accelerators. The parallel multi-threaded Power architecture with high
memory and I/O bandwidth is particularly well adapted to ensure that GPUs are
used to their fullest potential.
We’re also encouraged by the industry’s commitment to the platform, especially
with regard to AI, noting that NVIDIA made its premier machine learning-focused
GPU (the Tesla P100) available on Power well before x86, and that
innovations like NVLink are only available on Power.
The Model
The idea is to train a deep neural network to classify all these fundus images
into infected and uninfected images. Along with the fundus images, we have at
our disposal a number of training labels, identifying if the patient is infected or not.
We used MXNet.jl, a Julia package for deep learning. As a first step, it’s good
to load a pretrained model which is known to be good at classifying images. So
we decided to download and use the ImageNet model called Inception with weights
from its 39th epoch. On top of that we specify a simple classifier.
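```julia
# Extend model as we wish: take the pretrained network up to its global
# pooling output and stack a small two-class classifier on top.
arch = mx.@chain mx.get_internals(inception)[:global_pool_output] =>
       mx.Flatten() =>
       mx.FullyConnected(num_hidden = 128) =>
       mx.Activation(act_type = :relu) =>
       mx.FullyConnected(num_hidden = 2) =>
       mx.WSoftmax(name = :softmax)
```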
And now we train our model:
```julia
mx.fit(
    model,
    optimizer,
    dp,
    n_epoch = 100,
    eval_data = test_data,
    callbacks = [
        mx.every_n_epoch(save_acc, 1, call_on_0 = false),
        mx.do_checkpoint(prefix, save_epoch_0 = true),
    ],
    eval_metric = mx.MultiMetric([mx.Accuracy(), WMultiACE(2)])
)
```
One feature of the data is that it is highly imbalanced. For every 200
uninfected images, we have only 3 infected images. One way of approaching that
scenario is to penalize the network heavily for every infected case it gets
wrong. So we replaced the normal Softmax layer towards the end of the network
with a weighted softmax. To check whether we are overfitting, we decided to
track multiple performance metrics. However, from our cross-entropy measures,
we found that we were still overfitting. With fast training times on dual GPUs,
we were able to train our model quickly and understand the drawbacks of our
current approach.
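To make the idea of penalizing infected mistakes concrete, here is a minimal sketch of a class-weighted cross-entropy in plain Julia. It is not the `WSoftmax` layer used in our network, whose implementation is not shown in this post. The function names and the inverse-frequency weighting scheme are assumptions chosen for illustration.

```julia
# Illustrative sketch only: names and weighting scheme are assumptions,
# not the WSoftmax layer used in the experiments.

# Numerically stable softmax over a vector of raw scores.
function softmax(scores::AbstractVector{<:Real})
    shifted = exp.(scores .- maximum(scores))
    return shifted ./ sum(shifted)
end

# Cross-entropy for one sample, scaled by the weight of its true class.
# `label` is 1-based here: 1 = uninfected, 2 = infected.
function weighted_cross_entropy(scores, label::Integer, class_weights)
    probs = softmax(scores)
    return -class_weights[label] * log(probs[label])
end

# Weights inversely proportional to class frequency (200 uninfected : 3 infected).
counts = [200, 3]
class_weights = sum(counts) ./ (length(counts) .* counts)

# The same confident mistake, once on an uninfected and once on an infected image.
loss_uninfected = weighted_cross_entropy([0.0, 2.0], 1, class_weights)
loss_infected   = weighted_cross_entropy([2.0, 0.0], 2, class_weights)

# Accuracy obtained by always predicting "uninfected": 200 / 203.
baseline_accuracy = counts[1] / sum(counts)
```

With these weights, a mistake on an infected image costs 200/3 times as much as the same mistake on an uninfected one. The last line also shows why accuracy alone is a poor guide here: a classifier that never predicts "infected" is already right on 200 of every 203 images.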
Therefore we decided to employ a different approach.
The second way to deal with our imbalanced dataset is to generate smaller,
more balanced datasets that contained roughly equal numbers of uninfected
images and infected images. We produced two datasets: one for training and
another for cross validation, both of which had the same number of uninfected
and infected patients.
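Building such a balanced subset amounts to undersampling the majority class. The sketch below shows one way to do it in plain Julia. The function name `balanced_indices` is hypothetical, and the 0/1 label convention (0 for uninfected, 1 for infected) is taken from the data provider code later in this post.

```julia
using Random

# Hypothetical helper: return indices of a balanced subset containing every
# sample of the smaller class and an equally sized random draw from the other.
# Labels: 0 = uninfected, 1 = infected.
function balanced_indices(labels::AbstractVector{<:Integer}; rng = Random.default_rng())
    infected   = findall(x -> x == 1, labels)
    uninfected = findall(x -> x == 0, labels)
    n = min(length(infected), length(uninfected))
    picked = vcat(shuffle(rng, infected)[1:n], shuffle(rng, uninfected)[1:n])
    return shuffle(rng, picked)   # mix the two classes together
end

# Toy labels with the 200:3 ratio described earlier.
labels = vcat(zeros(Int, 200), ones(Int, 3))
idx = balanced_indices(labels; rng = MersenneTwister(42))
```

Calling the function again with a different random state draws a different set of uninfected images, so the same helper could be reused whenever a fresh balanced sample is needed.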
Additionally, we also decided to shuffle our data. Every epoch, we resampled
the uninfected images from the larger pool of uninfected images (and they were
many in number) in the training dataset to expose the model to a range of
uninfected images so that it can generalize well. Then we started doing the
same to the infected images. This was quite simple to implement in Julia: we
simply had to overload a particular function and modify the data.
Most of these steps were done incrementally. Our Julia setup and environment
made it easy for us to quickly change code and train models and incrementally
add more tweaks and modifications to our models as well as our training methods.
We also augmented our data by adding low levels of Gaussian noise
to random images from both the uninfected images and the infected images.
Additionally, some images were randomly rotated by 180 degrees. Rotations are
well suited to this use case because the important spatial features are
preserved. This artificially expanded our training set.
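The two transformations can be sketched in a few lines of plain Julia. These are hypothetical stand-ins, not the helpers used in our experiments. The names `add_gaussian_noise`, `rotate180` and `augment`, the default noise level, and the height x width x channel layout are all assumptions.

```julia
using Random

# Add zero-mean Gaussian noise with standard deviation `sigma` to an image
# stored as an array of floats, keeping the element type unchanged.
function add_gaussian_noise(img::AbstractArray{<:AbstractFloat}, sigma;
                            rng = Random.default_rng())
    T = eltype(img)
    return img .+ T(sigma) .* randn(rng, T, size(img)...)
end

# Rotate a height x width x channel image by 180 degrees: reverse the rows
# and the columns, and leave the channel dimension untouched.
rotate180(img::AbstractArray{<:Any,3}) = reverse(reverse(img, dims = 1), dims = 2)

# Apply each transformation independently with probability 1/2.
function augment(img; sigma = 0.05, rng = Random.default_rng())
    out = copy(img)
    if rand(rng, Bool)
        out = add_gaussian_noise(out, sigma; rng = rng)
    end
    if rand(rng, Bool)
        out = rotate180(out)
    end
    return out
end

img = rand(Float32, 4, 4, 3)   # stand-in for a small RGB fundus image
rotated = rotate180(img)
noisy = add_gaussian_noise(img, 0.05; rng = MersenneTwister(1))
```

A 180-degree rotation only moves pixels to new positions, so no interpolation is needed and applying it twice returns the original image.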
The following code augments the infected images. In the following code segment,
good images refer to uninfected images and bad images refer to infected images.
```julia
function mx.eachbatch(p::ShuffleDataProvider)
    # Find positions of all good/bad images
    # (`find` was renamed `findall` in Julia 1.0)
    gidx = find(x -> x == 0, vec(p.label_array))
    bidx = find(x -> x == 1, vec(p.label_array))

    # Generate indices of good/bad images from global pool
    goodidx = rand(1:size(good["data"], 4), length(gidx))
    badidx = rand(1:size(bad["data"], 4), length(bidx))

    # Add random noise to bad images or flip them
    for i = 1:length(bidx)
        flipping = rand(Bool)
        noise = rand(Bool)
        # Pick a random infected image from the global pool
        b = bad["data"][:, :, :, badidx[i]]
        if noise
            # Apply noise to each of the three colour channels
            for dim = 1:3
                b[:, :, dim] = make_noise(b[:, :, dim], 0.2, 0)
            end
        end
        if flipping
            for dim = 1:3
                b[:, :, dim] = flip(b[:, :, dim])
            end
        end
        # Write the flattened image back into the provider's data
        p.data_array[:, bidx[i]] = vec(b)
    end
    p
end
```
However, we found that while these measures stopped our model from overfitting,
we could not obtain adequate performance. We explore the possible reason for
this in the subsequent section.
Challenges
The initial challenge we faced was that our data is imbalanced, and so we
experimented with penalizing incorrect decisions made by the classifier. We
then tried generating a balanced (yet smaller) dataset, but it turned out that
we were overfitting. To counter this, we applied shuffling and data
augmentation. Even so, the model did not reach adequate performance.
Why is that so? Why is it that a model as deep as Inception wasn’t able to train
effectively on our dataset?
The answer, we believe, lies in the data itself. On a randomized sample
from the data, we found that there were two inherent problems with the data:
firstly, there are highly blurred images with no features among both the healthy
and the infected retinas.
Secondly, there are some features in the healthy images that one might find in the
infected images! For instance, in some images the veins are somewhat puffed, and in
others there are cotton spots. Below are some examples. While we note that the
picture on the left is undoubtedly infected, notice that the one on the right also
has a few cotton spots and inflamed veins. So how does one differentiate? More
importantly, how does our model differentiate?
So what do we do about this? For the training set, it would be helpful to have
each image, rather than each patient, independently diagnosed as healthy or
infected by a doctor, or by two or more doctors working independently. This
would likely improve the model’s predictions.
The Julia Advantage
Julia provides a distinct advantage at every stage for scientists engaged in
machine learning and deep learning.
- First, Julia is very efficient at preprocessing data. A very important first
step in any machine learning experiment is to organize, clean up and preprocess
large amounts of data. This was extremely efficient in our Julia environment,
which can be orders of magnitude faster than comparable environments
such as Python.
- Second, Julia enables elegant code. Our models were chained together
using Julia’s flexible syntax. Macros, metaprogramming and syntax familiar to
users of any technical environment allow for easy-to-read code.
- Third, Julia facilitates innovation. Since Julia is a first-class technical
computing environment, we can easily deploy the models we create without
changing any code. Julia hence solves the famous “two-language” problem, by
obviating the need for different languages for prototyping and production. This
leads to significant productivity gains and shortening of innovation cycles.
Due to all the aforementioned advantages, we were able to complete these
experiments in much less time than we would expect in comparable
technical computing environments.
Call for Collaboration
We have demonstrated in this blog post how to write an image classifier based
on deep neural networks in Julia and how easy it is to perform multiple
experiments. Unfortunately, there are challenges with the dataset that require
more fine-grained labelling. We have reached out to appropriate experts
for assistance in this regard.
Users who are interested in working with the dataset and possibly
collaborating with us are invited to reach out via email at
ranjan at juliacomputing.com to discuss access to the dataset.
Acknowledgements
I would like to thank a number of people for helping me with this work:
Valentin Churavy and
Pontus Stenetorp for guiding and mentoring me,
and Viral Shah of Julia Computing.
Thanks to IBM and OSUOSL too for providing the hardware, as well as Drishti Care
for providing the data.