Deep Learning: Exploring High Level APIs of Knet.jl and Flux.jl in comparison to TensorFlow-Keras

By: Estadistika -- Julia

Re-posted from: https://estadistika.github.io//julia/python/packages/knet/flux/tensorflow/machine-learning/deep-learning/2019/06/20/Deep-Learning-Exploring-High-Level-APIs-of-Knet.jl-and-Flux.jl-in-comparison-to-Tensorflow-Keras.html

When it comes to complex modeling, specifically in the field of deep learning, the go-to tool for most researchers is Google’s TensorFlow. There are a number of good reasons why, one of them being the fact that it provides both high- and low-level APIs that suit the needs of beginners and advanced users, respectively. I have used it in some of my projects, and indeed it was powerful enough for the task. This is also due to the fact that TensorFlow is one of the most actively developed deep learning frameworks, with Bayesian inference or probabilistic reasoning as a recent extension (see TensorFlow Probability; another extension is TensorFlow.js). While the library is written mostly in C++ for performance, the main API is served in Python for ease of use. This design is built around a static computational graph that needs to be defined declaratively before being executed. The static nature of this graph, however, makes models difficult to debug, since the code is itself data for defining the computational graph; you cannot use a debugger to check the results of the model line by line. Thankfully, it’s 2019 already and we have a stable Eager Execution that allows users to immediately check the results of any TensorFlow operation. Indeed, this is more intuitive and more pythonic. In this article, however, we’ll explore what else we have in 2019. In particular, let’s take a look at Julia’s deep learning libraries and compare them to the high-level API of TensorFlow, i.e. Keras’ model specification.

As a language that leans towards numerical computation, it’s no surprise that Julia offers a number of choices for deep learning. Here are the stable libraries:

  1. Flux.jl – The Elegant Machine Learning Stack.
  2. Knet.jl – Koç University deep learning framework.
  3. MLJ.jl – Julia machine learning framework by Alan Turing Institute.
  4. MXNet.jl – Apache MXNet Julia package.
  5. TensorFlow.jl – A Julia wrapper for TensorFlow.

Other related packages are maintained in JuliaML. For this article, we are going to focus on the usage of
Flux.jl and Knet.jl, and we are going to use the Iris dataset for a classification task with a Multilayer Perceptron. To start with, we need to install the following packages. I’m using Julia 1.1.0 and Python 3.7.3.


using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("Flux")
Pkg.add("Knet")
Pkg.add("Random")
Pkg.add("RDatasets")
Pkg.add("StatsBase")

Loading and Partitioning the Data

The Iris dataset is available in the RDatasets.jl Julia package and in Python’s Scikit-Learn. The following code loads the libraries and the data itself.



using CSV: write
using DataFrames: DataFrame
using Knet
using Random
using RDatasets
using StatsBase: sample
Random.seed!(123);
iris = dataset("datasets", "iris");    # load the Iris data
xdat = Matrix(iris[:, 1:4]);           # feature matrix, 150 × 4
ydat = iris[:, 5];                     # species column
ydat = map(x -> x == "setosa" ? 1 : x == "versicolor" ? 2 : 3, ydat); # integer-encode the species

The random seed set above is meant for reproducibility, as it will give us the same random initial values for model training. The iris variable above contains the data, and is a data frame with 150 × 5 dimensions, where the columns are: Sepal Length, Sepal Width, Petal Length, Petal Width, and Species. There are several ways to partition this data into training and testing datasets; one procedure is stratified sampling, with simple random sampling without replacement as the selection method within each stratum (the species). The following code defines a function that partitions the data using this sampling design:


function partition(xdat::Array{<:AbstractFloat, 2}, ydat::Array{<:Int, 1}, ratio::AbstractFloat = 0.3)
    # assumes observations are grouped by class in equal-sized, contiguous blocks (true for Iris)
    nclass = length(unique(ydat));                       # number of strata (species)
    scnt = size(xdat, 1) ÷ nclass;                       # observations per stratum
    ntst = Int(ceil((size(xdat, 1) * ratio) / nclass));  # test observations per stratum
    idx = sample(1:scnt, ntst, replace = false);         # sample from the first stratum
    for i in 2:nclass
        idx = vcat(idx, sample(((scnt * (i - 1)) + 1):(scnt * i), ntst, replace = false));
    end
    xtrn = xdat[.!in.(1:length(ydat), Ref(Set(idx))), :]; # training features
    ytrn = ydat[.!in.(1:length(ydat), Ref(Set(idx)))];    # training targets
    xtst = xdat[idx, :];                                  # testing features
    ytst = ydat[idx];                                     # testing targets
    return (xtrn, ytrn, xtst, ytst);
end
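To make the arithmetic concrete: each species in the Iris data forms a stratum of 50 contiguous rows, and with ratio = 0.3 the function draws ceil(150 × 0.3 / 3) = 15 test observations per stratum, leaving 45 rows for testing and 105 for training.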

Extract the training and testing datasets using the function above as follows:



xtrn, ytrn, xtst, ytst = partition(xdat, ydat);
dtrn = minibatch(Float32.(xtrn'), ytrn, 10);
dtst = minibatch(Float32.(xtst'), ytst, 10);

The code above extracts xtrn, the training feature matrix with 105 × 4 dimensions (105 observations by 4 features); ytrn, the corresponding training target variable of length 105; xtst, the testing feature matrix with 45 × 4 dimensions; and ytst, the testing target variable of length 45. Moreover, contrary to TensorFlow-Keras, Knet.jl and Flux.jl need further data preparation from the above partitions. In particular, Knet.jl takes a minibatch object as input data for model training, while Flux.jl needs one-hot encoding for the target variables ytrn and ytst. Further, unlike Knet.jl, which ships with a minibatch function, Flux.jl gives the user the flexibility to create their own.
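For instance, a minimal sketch of the Flux.jl preparation could use its onehotbatch utility (the variable names here are illustrative, not from the original code):

using Flux: onehotbatch

ytrn_hot = onehotbatch(ytrn, 1:3); # 3 × 105 one-hot matrix for the training targets
ytst_hot = onehotbatch(ytst, 1:3); # 3 × 45 one-hot matrix for the testing targets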

Specify the Model

The model that we are going to use is a Multilayer Perceptron with the architecture illustrated in the diagram below: 4 neurons for the input layer, 10 neurons for the hidden layer, and 3 neurons for the output layer. The hidden and output layers each carry a bias term, and their neurons are activated with the Rectified Linear Unit (ReLU) and softmax functions, respectively.
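Concretely, the forward pass computes h = ReLU(W₁x + b₁) for the hidden layer and ŷ = softmax(W₂h + b₂) for the output layer, where x is the 4 × 1 input vector, W₁ is a 10 × 4 weight matrix, W₂ is 3 × 10, and b₁, b₂ are the bias vectors. (In the Knet.jl code below, the softmax is folded into the nll loss, which is why the output layer uses the identity activation.)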


The code below specifies the model:



# Define the dense layer
struct Dense; w; b; f; end
Dense(i::Int, o::Int, f = relu) = Dense(param(o, i), param0(o), f); # constructor: random weights, zero bias, activation
(d::Dense)(x) = d.f.(d.w * x .+ d.b); # forward pass of a dense layer
# Define the chain layer
struct Chain; layers; end
(c::Chain)(x) = (for l in c.layers; x = l(x); end; x); # feed-forward through all layers
(c::Chain)(x, y) = nll(c(x), y, dims = 1); # negative log-likelihood loss on the raw scores
# Specify the model: 4 inputs -> 10 hidden (ReLU) -> 3 outputs (identity; nll normalizes internally)
model = Chain((Dense(size(xtrn, 2), 10), Dense(10, 3, identity)));
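For comparison, here is a minimal sketch of the same architecture written against Flux.jl’s high-level API (illustrative only; the name flux_model is mine, and the calls are qualified to avoid clashing with the hand-rolled Dense and Chain above):

import Flux

# 4 input features -> 10 hidden neurons (ReLU) -> 3 outputs, normalized with softmax
flux_model = Flux.Chain(
    Flux.Dense(4, 10, Flux.relu),
    Flux.Dense(10, 3),
    Flux.softmax
);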

Coming from TensorFlow-Keras, Flux.jl provides a Keras-like API for model specification, with Flux.Chain as the counterpart of Keras’ Sequential. This is different from Knet.jl, where the highest-level APIs you get are the nuts and bolts for constructing the layers. Having said that, Flux.Dense is defined almost exactly like the Dense struct in the Knet.jl code above (check the source code here). In addition, since both Flux.jl and Knet.jl are written purely in Julia, the source code under the hood is accessible to beginners, giving the user a full understanding not only of the code but also of the math. Check the screenshots below for the distribution of the file types in the Github repos of the three frameworks:



From the above figure, it’s clear that Flux.jl is 100% Julia. On the other hand, Knet.jl, while not apparent, is actually 100% Julia as well: the 41.4% of Jupyter Notebooks and the other small percentages account for the tutorials, tests, and examples, not the source code.


Train the Model

Finally, train the model as follows for 100 epochs:



# Accuracy: proportion of correct argmax predictions
function accuracy(m::Chain, d::Knet.Data)
    _, yidx = findmax(m(d.x), dims = 1);
    yprd = [i[1] for i in yidx];        # extract the predicted class indices
    return sum(yprd .== d.y) / length(d.y)
end
# Record the initial loss and accuracy, then train for 100 epochs
err = hcat(nll(model(dtrn.x), dtrn.y), nll(model(dtst.x), dtst.y))
acc = hcat(accuracy(model, dtrn), accuracy(model, dtst))
for x in adam(model, repeat(dtrn, 100))
    global err = vcat(err, hcat(nll(model(dtrn.x), dtrn.y), nll(model(dtst.x), dtst.y)))
    global acc = vcat(acc, hcat(accuracy(model, dtrn), accuracy(model, dtst)))
end
# Save loss and accuracy to CSV for visualization
write("error-knet.csv", DataFrame(err, [:training, :testing]))
write("accuracy-knet.csv", DataFrame(acc, [:training, :testing]))

The code above saves both the loss and accuracy at every iteration into a data frame and then into a CSV file; these will be used for visualization. Moreover, unlike Flux.jl and Knet.jl, which require minibatch preparation prior to training, TensorFlow-Keras specifies this in the fit method via its batch_size argument. Further, it is also possible to train the model in Knet.jl using a single function, without saving the metrics. This is done as follows:


adam!(model, repeat(dtrn, 100));
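For reference, here is a minimal sketch of the equivalent Flux.jl training loop (illustrative only; it assumes the flux_model and ytrn_hot names from the sketches above, and fx is likewise mine):

import Flux

fx = Float32.(xtrn');                             # features as a 4 × 105 matrix, matching the Knet layout
loss(x, y) = Flux.crossentropy(flux_model(x), y); # cross-entropy on the softmax probabilities
Flux.@epochs 100 Flux.train!(loss, Flux.params(flux_model), [(fx, ytrn_hot)], Flux.ADAM());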

The Flux.jl sketch above simply illustrates the use of the Flux.@epochs macro for looping, instead of the explicit for loop used with Knet.jl. The loss of the model for 100 epochs is visualized below across frameworks:

From the above figure, one can observe that Flux.jl had bad starting values set by the random seed earlier; fortunately, Adam drives the gradient vector rapidly toward the minimum. The figure was plotted using Gadfly.jl. Install this package using Pkg as described in the first code block, along with Cairo.jl and Fontconfig.jl. The latter two packages are used to save the plot in PNG format; see the code below to reproduce it:

using Cairo
using Compose
using DataFrames
using Fontconfig
using CSV
using Gadfly
using Measures

Gadfly.push_theme(:dark)
tf = CSV.read("error-tf.csv");
kn = CSV.read("error-knet.csv");
fl = CSV.read("error-flux.csv");
tf_tl = tf[:, :loss];
tf_vl = tf[:, :val_loss];
kn_tl = kn[:, :training];
kn_vl = kn[:, :testing];
fl_tl = fl[:, :training];
fl_vl = fl[:, :testing];
lossdf = DataFrame(
    Epochs = repeat(1:100, outer = 6),
    Loss = vcat(
        vcat(kn_tl[2:10:end], fl_tl[2:end], tf_tl),
        vcat(kn_vl[2:10:end], fl_vl[2:end], tf_vl)
    ),
    Dataset = repeat(["Train", "Test"], inner = 300),
    Frameworks = repeat(repeat(["Knet.jl", "Flux.jl", "TensorFlow"], inner = 100), outer = 2)
);
p = plot(
    lossdf,
    x = :Epochs,
    y = :Loss,
    xgroup = :Frameworks,
    color = :Dataset,
    Guide.colorkey(pos = [.05w, -0.35h]),
    Geom.subplot_grid(Geom.line)
);
p |> PNG("loss.png", 7inch, 4.7inch, dpi = 200)

Evaluate the Model

The output of the model is a vector of three neurons. The index or location of each neuron in this vector defines the corresponding integer encoding, with the 1st index as setosa, the 2nd as versicolor, and the 3rd as virginica. Thus, the code below takes the argmax of the vector to get the integer encoding for evaluation.



# Predict species
_, trn_yidx = findmax(model(dtrn.x), dims = 1); # training set
trn_yprd = [i[1] for i in trn_yidx];
_, tst_yidx = findmax(model(dtst.x), dims = 1); # testing set
tst_yprd = [i[1] for i in tst_yidx];
# Check accuracy of the model
accuracy(model, dtrn) #> 0.9714285714285714
accuracy(model, dtst) #> 0.9555555555555556
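Beyond raw accuracy, a hand-rolled confusion matrix can show which species get misclassified. Below is a small sketch; the confusion function is a hypothetical helper, not part of Knet.jl:

# Rows are true classes, columns are predicted classes
function confusion(y, yhat, k = 3)
    cm = zeros(Int, k, k)
    for (t, p) in zip(y, yhat)
        cm[t, p] += 1   # count each (true, predicted) pair
    end
    return cm
end
confusion(dtst.y, tst_yprd)  # off-diagonal entries are misclassifications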

The figure below shows the traces of the accuracy during training:

TensorFlow took 25 epochs before surpassing 50% accuracy again. To reproduce the figure, run the following code (make sure to load Gadfly.jl and the other related libraries mentioned earlier when generating the loss plots):

tf = CSV.read("error-tf.csv");
kn = CSV.read("accuracy-knet.csv");
fl = CSV.read("accuracy-flux.csv");
tf_ta = tf[:, :acc];
tf_va = tf[:, :val_acc];
kn_ta = kn[:, :training];
kn_va = kn[:, :testing];
fl_ta = fl[:, :training];
fl_va = fl[:, :testing];
lossdf = DataFrame(
Epochs = repeat(1:100, outer = 6),
Accuracy = vcat(
vcat(kn_ta[2:10:end], fl_ta[2:end], tf_ta),
vcat(kn_va[2:10:end], fl_va[2:end], tf_va)
),
Dataset = repeat(["Train", "Test"], inner = 300),
Frameworks = repeat(repeat(["Knet.jl", "Flux.jl", "TensorFlow"], inner = 100), outer = 2)
);
p = plot(
lossdf,
x = :Epochs,
y = :Accuracy,
xgroup = :Frameworks,
color = :Dataset,
Guide.colorkey(pos = [.86w, 0.21h]),
Geom.subplot_grid(Geom.line)
);
p |> PNG("accuracy.png", 7inch, 4.7inch, dpi = 200)

Benchmark

At this point, we are going to record the training time of each framework.



@time begin
    err = hcat(nll(model(dtrn.x), dtrn.y), nll(model(dtst.x), dtst.y))
    acc = hcat(accuracy(model, dtrn), accuracy(model, dtst))
    for x in adam(model, repeat(dtrn, 100))
        global err = vcat(err, hcat(nll(model(dtrn.x), dtrn.y), nll(model(dtst.x), dtst.y)))
        global acc = vcat(acc, hcat(accuracy(model, dtrn), accuracy(model, dtst)))
    end
end
#> 0.268033 seconds (579.36 k allocations: 94.214 MiB, 3.98% gc time)
# or even faster without saving the loss and accuracy
@time adam!(model, repeat(dtrn, 100));
#> 0.158206 seconds (393.18 k allocations: 22.853 MiB, 2.33% gc time)

The benchmark was done by running the above code about 10 times for each framework and taking the lowest time out of the results. In addition, before running the code for each framework, I started from a fresh boot of my machine.
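Incidentally, this repeat-and-take-the-minimum procedure can be automated with BenchmarkTools.jl (a sketch, assuming the package is installed; note that adam! mutates model, so later samples start from an already-trained state):

using BenchmarkTools
# Run the training call repeatedly and report timing statistics
@benchmark adam!($model, repeat($dtrn, 100)) samples=10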

The code for the above figure is given below (make sure to load Gadfly.jl and the other related libraries mentioned earlier when generating the loss plots):

benchmark = DataFrame(
    Time = [0.268033, 0.158206, 0.278393, 0.261845, 1.511277675628662, 0],
    Frameworks = repeat(["Knet.jl", "Flux.jl", "TensorFlow"], inner = 2),
    Type = repeat(["w/ Metrics Tracker", "w/o Metrics Tracker"], outer = 3)
)
p = plot(
    benchmark,
    x = :Time,
    y = :Frameworks,
    color = :Type,
    Guide.xlabel("Training Time in Seconds (Shorter is Better)"),
    Guide.colorkey(title = "Code Type", pos = [.7w, 0.35h]),
    Geom.bar(position = :dodge, orientation = :horizontal),
    Guide.annotation(
        compose(
            context(),
            (context(), Compose.text(0.05w, 0.27h, "Even if we set metrics=None, TensorFlow tracks loss by default."), Compose.fill("#a0a0a0"))
        )
    )
);
p |> PNG("benchmark.png", 6.5inch, 4inch, dpi = 200)

Conclusion

In conclusion, I would say Julia is worth investing in, even for deep learning, as illustrated in this article. The two frameworks, Flux.jl and Knet.jl, provide a clean API that introduces a new way of defining models, as opposed to the object-oriented approach of TensorFlow-Keras. One thing to emphasize here is the for loop that I plainly added when training the model, just to save the accuracy and loss metrics. The for loop did not compromise the speed (though Knet.jl is much faster without it). This is crucial, since it lets the user spend more time on solving the problem and less on optimizing the code. Further, between the two Julia frameworks, I find Knet.jl to be "Julia + little else", as described by Professor Deniz Yuret (the main developer): there are no special APIs for Dense, Chain, etc.; you have to code them yourself. Although this is also possible in Flux.jl, Knet.jl doesn't have these out of the box; it ships only with the nuts and bolts, and those are the highest-level APIs the user gets. Having said that, I think Flux.jl is a better recommendation for beginners coming from TensorFlow-Keras. This is not to say that Knet.jl is hard; it's not, if you already know Julia. In addition, I do love the extent of flexibility Knet.jl offers by default, which I think is best for advanced users. Lastly, just like the different extensions of TensorFlow, Flux.jl is flexible enough that it works well with Turing.jl for doing Bayesian deep learning, which is a good alternative to TensorFlow Probability. For Neural Differential Equations, Flux.jl works well with DifferentialEquations.jl; check out DiffEqFlux.jl.

Next Steps

In my next article, we will explore the low-level APIs of Flux.jl and Knet.jl in comparison to the low-level APIs of TensorFlow. One thing also missing from the above exercise is the use of a GPU for model training, which I hope to tackle in future articles. Finally, I plan to test these Julia libraries on real deep learning problems, such as computer vision and natural language processing (check out the workshops on these from JuliaCon 2018).

Complete Code

If you are impatient, here is the complete code, excluding the benchmarks and the plots. It should work after installing the required libraries shown above:



using CSV: write
using DataFrames: DataFrame
using Knet
using Random
using RDatasets
using StatsBase: sample
Random.seed!(123);
iris = dataset("datasets", "iris");
xdat = Matrix(iris[:, 1:4]);
ydat = iris[:, 5];
ydat = map(x -> x == "setosa" ? 1 : x == "versicolor" ? 2 : 3, ydat);
function partition(xdat::Array{<:AbstractFloat, 2}, ydat::Array{<:Int, 1}, ratio::AbstractFloat = 0.3)
    # assumes observations are grouped by class in equal-sized, contiguous blocks (true for Iris)
    nclass = length(unique(ydat));                       # number of strata (species)
    scnt = size(xdat, 1) ÷ nclass;                       # observations per stratum
    ntst = Int(ceil((size(xdat, 1) * ratio) / nclass));  # test observations per stratum
    idx = sample(1:scnt, ntst, replace = false);         # sample from the first stratum
    for i in 2:nclass
        idx = vcat(idx, sample(((scnt * (i - 1)) + 1):(scnt * i), ntst, replace = false));
    end
    xtrn = xdat[.!in.(1:length(ydat), Ref(Set(idx))), :]; # training features
    ytrn = ydat[.!in.(1:length(ydat), Ref(Set(idx)))];    # training targets
    xtst = xdat[idx, :];                                  # testing features
    ytst = ydat[idx];                                     # testing targets
    return (xtrn, ytrn, xtst, ytst);
end
xtrn, ytrn, xtst, ytst = partition(xdat, ydat);
dtrn = minibatch(Float32.(xtrn'), ytrn, 10, shuffle = true);
dtst = minibatch(Float32.(xtst'), ytst, 10);
# Define the dense layer
struct Dense; w; b; f; end
Dense(i::Int, o::Int, f = relu) = Dense(param(o, i), param0(o), f); # constructor: random weights, zero bias, activation
(d::Dense)(x) = d.f.(d.w * x .+ d.b); # forward pass of a dense layer
# Define the chain layer
struct Chain; layers; end
(c::Chain)(x) = (for l in c.layers; x = l(x); end; x); # feed-forward through all layers
(c::Chain)(x, y) = nll(c(x), y, dims = 1); # negative log-likelihood loss on the raw scores
# Specify the model: 4 inputs -> 10 hidden (ReLU) -> 3 outputs (identity; nll normalizes internally)
model = Chain((Dense(size(xtrn, 2), 10), Dense(10, 3, identity)));
# Accuracy: proportion of correct argmax predictions
function accuracy(m::Chain, d::Knet.Data)
    _, yidx = findmax(m(d.x), dims = 1);
    yprd = [i[1] for i in yidx];
    return sum(yprd .== d.y) / length(d.y)
end
# Record the initial loss and accuracy, then train for 100 epochs
err = hcat(nll(model(dtrn.x), dtrn.y), nll(model(dtst.x), dtst.y))
acc = hcat(accuracy(model, dtrn), accuracy(model, dtst))
for x in adam(model, repeat(dtrn, 100))
    global err = vcat(err, hcat(nll(model(dtrn.x), dtrn.y), nll(model(dtst.x), dtst.y)))
    global acc = vcat(acc, hcat(accuracy(model, dtrn), accuracy(model, dtst)))
end
# Save loss and accuracy to csv for visualization
write("error-knet.csv", DataFrame(err, [:training, :testing]))
write("accuracy-knet.csv", DataFrame(acc, [:training, :testing]))
# or even faster without saving the loss
# adam!(model, repeat(dtrn, 100));
# Predict species
_, trn_yidx = findmax(model(dtrn.x), dims = 1); # training set
trn_yprd = [i[1] for i in trn_yidx];
_, tst_yidx = findmax(model(dtst.x), dims = 1); # testing set
tst_yprd = [i[1] for i in tst_yidx];
# Check accuracy of the model
accuracy(model, dtrn)
accuracy(model, dtst)


Software Versions

========
Julia
========
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
JULIA = /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia
JULIA_EDITOR = "/Applications/Visual Studio Code.app/Contents/Frameworks/Code Helper.app/Contents/MacOS/Code Helper"
========
Python
========
3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)]