This is a demonstration of using JuliaML and TensorFlow to train an LSTM network.
It is based on Aymeric Damien’s LSTM tutorial in Python.
All the explanations are my own, but the code is generally similar in intent.
There are also some differences in terms of network shape.
The task is to use an LSTM to classify MNIST digits.
That is image recognition.
The normal way to solve such problems is a ConvNet.
This is not a sensible use of an LSTM; after all, it is not a time series task.
The task is made into a time series task by having the images arrive one row at a time;
the network is asked to output the class at the end, after seeing the 28th row.
So the LSTM network must remember the 27 prior rows.
This is a toy problem to demonstrate that it can.
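To make the row-at-a-time framing concrete, here is a minimal sketch in plain Julia. It is only an illustration: the image is random stand-in data rather than a real MNIST digit, and the names `img`, `steps`, `n_steps` and `n_input` are my own for this sketch, not taken from the code later in the post.

```julia
# Stand-in for a single MNIST digit: a 28x28 matrix of pixel intensities.
# (Random values here, assumed purely for illustration.)
img = rand(Float32, 28, 28)

# Present the image one row per time step: step t carries the pixels of row t.
steps = [img[t, :] for t in 1:size(img, 1)]

n_steps = length(steps)      # number of time steps the network sees
n_input = length(steps[1])   # number of inputs at each time step

println("time steps: ", n_steps, ", inputs per step: ", n_input)
```

The network only gets to answer after the final step, so anything it needs from the earlier rows has to be carried forward in its state.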
To do this we are going to use a bunch of packages from the JuliaML Org, as well as a few others.
A lot of the packages in JuliaML are evolving fast, so some things here may be out of date.
You can install the packages used in this demo by running
Pkg.add.(["TensorFlow", "Distributions", "ProgressMeter", "MLLabelUtils", "MLDataUtils"])
and
Pkg.clone("https://github.com/JuliaML/MLDatasets.jl.git")
MLDatasets.jl is not yet registered, so you need to clone that one.
Also, right now (24/01/2017), we are using the dev branch of MLDataUtils.jl,
so you will need to do the git checkout stuff to make that work;
hopefully that will be merged into master very soon, so that just the normal Pkg.add will suffice.
You also need to install TensorFlow, as it is not automatically installed by the TensorFlow.jl package.
We will go through each package we use in turn.