Author Archives: Dean Markwick's Blog -- Julia

Crypto Data using AlphaVantatge.jl

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2021/03/27/CryptoAlphaVantage.html

Julia 1.6 is hot off the press, so I’ve installed it and fired off this quick blog post to give 1.6 a test drive. So far, so good and there is a real decrease now in the latencies in both loading up packages and getting things going.

AlphaVantage have data on cryptocurrencies and not just stocks and fx. Each of which are implemented in AlphaVantage.jl. This is a simple blogpost that takes you through each function and how it might be useful to analyse cryptocurrencies.

Firstly, what coins are available? Over 500 (542 to be precise). Now as a crypto tourist, I’m only really familiar with the most popular ones that are causing the headlines. So I’ve taken the top 10 from coinmarketcap and will use those to demonstrate what AlphaVantage can do.

using AlphaVantage
using Plots
using DataFrames, DataFramesMeta
using CSV, Dates, Statistics

ccys = ["BTC", "ETH", "ADA", "DOT", "BNB", "USDT", "XRP", "UNI", "THETA", "LTC"]

FCAS Health Index from Flipside Crypto

AlphaVantage have partnered with Flipside Crypto to provide their ratings of different coins. This is designed to give some further info on different coins rather than just looking at what recently increased massively.

ratings = crypto_rating.(ccys);

Simple broadcasted call to get the ratings for each of the 10 currencies above. We format the response into a dataframe and get a nice table out. Not all the coins have a rating, so we have to filter out any empty ratings.

inds = findall(.!isempty.(ratings))
ratingsFrame = vcat(map(x->DataFrame(x["Crypto Rating (FCAS)"]), ratings[inds])...)
rename!(ratingsFrame, Symbol.(["Symbol", "Name", "Rating", "Score", "DevScore", "Maturity", "Utility", "LastRefresh", "TZ"]))
for col in (:Score, :DevScore, :Maturity, :Utility)
    ratingsFrame[!, col] .= parse.(Int64, ratingsFrame[!, col])
end
ratingsFrame

7 rows × 9 columns (omitted printing of 1 columns)

Symbol Name Rating Score DevScore Maturity Utility LastRefresh
String String String Int64 Int64 Int64 Int64 String
1 BTC Bitcoin Superb 910 868 897 965 2021-03-26 00:00:00
2 ETH Ethereum Superb 973 966 896 997 2021-03-26 00:00:00
3 ADA Cardano Superb 964 969 931 966 2021-03-26 00:00:00
4 BNB Binance Coin Attractive 834 745 901 932 2021-03-26 00:00:00
5 XRP XRP Attractive 842 881 829 794 2021-03-26 00:00:00
6 THETA THETA Caution 588 726 915 353 2021-03-26 00:00:00
7 LTC Litecoin Attractive 775 652 899 905 2021-03-26 00:00:00

Three superb, three attractive and one caution. THETA gets a lower utility score which is dragging down its overal rating. By the looks of it, THETA is some sort of streaming/YouTube-esque project, get paid their token by giving your excess computing power to video streams. There website is here and I’ll let you judge whether they deserve that rating.

To summarise briefly each of the ratings is on a 0 to 1000 scale in three different areas:

  • Developer Score (DevScore)

Things like code changes, improvements all taken from the repositories of the coins.

  • Market Maturity (Maturity)

This looks at the market conditions around the coin, so things like liquidity and volatility.

  • User Activity (Utility)

On chain activities, network activity and transactions, so is the coin being used for something actually useful. Hence why you can see why ETH is ranked the highest here.

More details are on their website here.

Timeseries Data

AlphaVantage also offer the usual time series data at daily, weekly and monthly frequencies. Hopefully you’ve read my other posts (basic market data and fundamental data), so this is nothing new!

Now for each 10 tokens we can grab their monthly data and calculate some stats and plot some graphs.

monthlyData = digital_currency_monthly.(ccys[inds], datatype = "csv");

Again formatting the returned data into a nice dataframe gives us a monthly view of the price action for each of the currencies. I format the date column, calculate the monthly log return and cumulative log return.

function format_data(x, ccy)
    df = DataFrame(x[1])
    rename!(df, Symbol.(vec(x[2])), makeunique=true)
    df[!, :timestamp] = Date.(df[!, :timestamp])
    sort!(df, :timestamp)
    df[!, :Return] = [NaN; diff(log.(df[!, Symbol("close (USD)")]))]
    df[!, :CumReturn] = [0; cumsum(diff(log.(df[!, Symbol("close (USD)")])))]
    df[!, :Symbol] .= ccy
    df
end

prices = vcat(map(x -> format_data(x[1], x[2]), zip(monthlyData, ccys[inds]))...)
first(prices, 5)

5 rows × 14 columns (omitted printing of 7 columns)

timestamp open (USD) high (USD) low (USD) close (USD) open (USD)_1 high (USD)_1
Date Any Any Any Any Any Any
1 2018-08-31 7735.67 7750.0 5880.0 7011.21 7735.67 7750.0
2 2018-09-30 7011.21 7410.0 6111.0 6626.57 7011.21 7410.0
3 2018-10-31 6626.57 7680.0 6205.0 6371.93 6626.57 7680.0
4 2018-11-30 6369.52 6615.15 3652.66 4041.32 6369.52 6615.15
5 2018-12-31 4041.27 4312.99 3156.26 3702.9 4041.27 4312.99
returnPlot = plot(prices[!, :timestamp], prices[!, :CumReturn], group=prices[!, :Symbol],
                  title="Cummulative Return",
                  legend=:topleft)
mcPlot = plot(prices[!, :timestamp], 
              prices[!, Symbol("market cap (USD)")] .* prices[!, Symbol("close (USD)")], 
              group=prices[!, :Symbol],
              title="Market Cap",
              legend=:none)

plot(returnPlot, mcPlot)

svg

There we go, solid cumulative monthly returns (to the moon!) but bit of a decline in market cap recently after a week of negative returns. If you want higher frequencies there is always

  • digital_currency_daily
  • digital_currency_weekly

which will return the same type of data, just indexed differently.

Is the Rating Correlated with Monthly Trading Volume?

We’ve got two data sets, now we want to see if we can explain some the crypto scores with how much is traded each month. For this we simply take the monthly data, average the monthly volume traded and join it with the ratings dataframe.

gdata = groupby(prices, :Symbol)
avgprices = @combine(gdata, MeanVolume = mean(:volume .* cols(Symbol("close (USD)"))))
avgprices = leftjoin(avgprices, ratingsFrame, on=:Symbol)

7 rows × 10 columns (omitted printing of 2 columns)

Symbol MeanVolume Name Rating Score DevScore Maturity Utility
String Float64 String? String? Int64? Int64? Int64? Int64?
1 BTC 2.473e10 Bitcoin Superb 910 868 897 965
2 ETH 9.6248e9 Ethereum Superb 973 966 896 997
3 ADA 2.6184e9 Cardano Superb 964 969 931 966
4 BNB 3.7598e9 Binance Coin Attractive 834 745 901 932
5 XRP 3.28003e9 XRP Attractive 842 881 829 794
6 THETA 6.36549e8 THETA Caution 588 726 915 353
7 LTC 1.71448e9 Litecoin Attractive 775 652 899 905

Visually, lets just plot the different scores on the x-axis and the monthly average volume on the y-axis. Taking logs of both variables stops BTC dominating the plots.

scorePlots = [plot(log.(avgprices[!, x]), 
                   log.(avgprices.MeanVolume), 
                   seriestype=:scatter, 
                   series_annotations = text.(avgprices.Symbol, :bottom),
                   legend=:none, 
                   title=String(x)) 
    for x in (:Score, :DevScore, :Maturity, :Utility)]
plot(scorePlots...)

svg

Solid linear relationship in the score and dev score metrics, not so much for the maturity and utility scores. Of course, as this is a log-log plot a linear relationship indicates power law behaviour.

Side note though, the graphs are a bit rough around the edges, labels are overlapping and even crossing though the axis. Julia needs a ggrepel equivalent.

Summary

Much like the other functions in AlphaVantage.jl everything comes through quite nicely and once you have the data its up to you to find something interesting!

Proper Bayesian Estimation of a Point Process in Julia

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2020/11/03/BayesPointProcess.html

I know how to use Stan and I know how to use Turing. But how do
those packages perform the posterior sampling for the underlying
models. Can I write a posterior distribution down and get
AdvancedHMC.jl to sample it? This is exactly what I want to do with
a point process where the posterior distribution of the model is a
touch more complicated than your typical regression problems.

This post will take you through my thought process and how you got from an idea, to a simulation of that idea, frequentist estimation of the simulated data and then a full Bayesian sampling of the problem.

But first, these are the Julia libraries that we will be using.

using Plots
using PlotThemes
using StatsPlots
using Distributions

Inhomogeneous Point Processes

A point process basically describes the time when something happens. That “thing” we can call an event and they happen between \(0\) and some maximum time \(T\). We describe the probability of an event happening at time \(t\) with an intensity \(\lambda\). Specifically we are going to use 4 different parameters for a polynomial.

\[\lambda (t) = \exp \left( \beta _0 + \beta _1 t + \beta _2 t^2 + \beta _3 ^2 t^3 \right)\]

We take the exponent to ensure that the function is positive throughout the time period. What does this look like? We can simple plot the function from 0 to 100 with some random values for the \(\beta _i\)s.

λ(t::Number, params::Array{<:Number}) = exp(params[1] + params[2]*(t/100) + params[3]*(t/100)^2 + params[4]*(t/100)^3)
λ(t::Array{<:Number}, params::Array{<:Number}) = map(x-> λ(x, params), t)

testParams = [3, -0.5, -0.8, -2.9]
maxT = 100

plot(λ(collect(0:maxT), testParams), label=:none)

This looks like something that definitely changes over time. When
\(\lambda(t)\) is high we expect more events and likewise when it is
low there will be fewer events.

Simulating by Thinning

Let us simulate a point process using this intensity function. To do
so we use a procedure called thinning. This can be explained as a
three step process:

  1. Firstly simulate a constant Poisson process with intensity \(\lambda ^\star\) which is greater than \(\lambda (t)\) for all \(t\). This gives the un-thinned events, \(t^*_i\).
  2. For each un-thinned event calculate the probability it will become one of the final events as \(\frac{\lambda (t^*_i)}{\lambda ^\star}\).
  3. Sample from these probabilities to get the final events.

Simple enough to code up in a few lines of Julia.

lambdaMax = maximum(λ(collect(0:0.1:100), testParams)) * 1.1
rawEvents = rand(Poisson(lambdaMax * maxT), 1)[1]
unthinnedEvents = sort(rand(Uniform(0, maxT), rawEvents))
acceptProb = λ(unthinnedEvents, testParams) / lambdaMax
events = unthinnedEvents[rand(length(unthinnedEvents)) .< acceptProb];
histogram(events,label=:none)

svg

A steady decreasing amount of events following the intensity function from above.

Maximum Likelihood Estimation

The log likelihood of a point process can be written as:

\[\mathcal{L} = \Sigma _{i = 1} ^N \log \lambda (t_i) – \int _0 ^T \lambda (t) \mathrm{d} t\]

Again, easy to write the code for this. The only technical difference is I am using the QuadGK.jl package to numerically integrate the function rather than doing the maths myself. This keeps it simple and also flexible if we decided to change the intensity function later.

function likelihood(params, rate, events, maxT)
    sum(log.(rate(events, params))) - quadgk(t-> rate(t, params), 0, maxT)[1]
end

For maximum likelihood estimation we simply pass this function through to an optimiser and find the maximum point. As optimize actually finds minimum points we have to invert the function.

using Optim
using QuadGK
opt = optimize(x-> -1*likelihood(x, λ, events, maxT), rand(4))
plot(λ(collect(0:maxT), testParams), label="True")
plot!(λ(collect(0:maxT), Optim.minimizer(opt)), label = "MLE")

svg

Not a bad result! Our estimated intensity function is pretty close to
the actual function. So now we know that we can both simulate from a inhomogeneous point
process and that our likelihood can infer the correct parameters.

Bayesian Inference

Now for the good stuff. All of the above is needed for the Bayesian inference procedure. If you can’t get the maximum likelihood working for a relatively simple problem like above, adding in the complications of Bayesian inference will just get you knotted up without any results. So with the good results from above let us proceed to the Bayes methods. With the AdvancedHMC.jl package I can use all the fancy MCMC algos and upgrade from the basic Metropolis Hastings sampling.

I’ve shamelessly copied the README from AdvancedHMC.jl and changed the bits needed for this problem.

using AdvancedHMC, ForwardDiff

D = 4; initial_params = rand(D)

n_samples, n_adapts = 5000, 2000

target(x) = likelihood(x, λ, events, maxT) + sum(logpdf.(Normal(0, 5), x))

metric = DiagEuclideanMetric(D)
hamiltonian = Hamiltonian(metric, target, ForwardDiff)

initial_ϵ = find_good_stepsize(hamiltonian, initial_params)
integrator = Leapfrog(initial_ϵ)
proposal = NUTS{MultinomialTS, GeneralisedNoUTurn}(integrator)
adaptor = StanHMCAdaptor(MassMatrixAdaptor(metric), StepSizeAdaptor(0.8, integrator))

samples1, stats1 = sample(hamiltonian, proposal, initial_params, 
                        n_samples, adaptor, n_adapts; progress=true);
samples2, stats2 = sample(hamiltonian, proposal, initial_params, 
                        n_samples, adaptor, n_adapts; progress=true);

Samples done, now to manipulate the results to get the parameter
estimation.

a11 = map(x -> x[1], samples1)
a12 = map(x -> x[1], samples2)
a21 = map(x -> x[2], samples1)
a22 = map(x -> x[2], samples2)
a31 = map(x -> x[3], samples1)
a32 = map(x -> x[3], samples2)
a41 = map(x -> x[4], samples1)
a42 = map(x -> x[4], samples2)

bayesEst = map( x -> mean(x[1000:end]), [a11, a21, a31, a41])
bayesLower = map( x -> quantile(x[1000:end], 0.25), [a11, a21, a31, a41])
bayesUpper = map( x -> quantile(x[1000:end], 0.75), [a11, a21, a31, a41])
density(a21, label="Chain 1")
density!(a22, label="Chain 2")
vline!([testParams[2]], label="True")
plot!(-4:4, pdf.(Normal(0, 5), -4:4), label="Prior")

svg

The chains have sampled correctly and are centered around the correct
value. Plus it’s suitably different from the prior, which shows it has
updated with the information from the events.

plot(a11, label="Chain 1")
plot!(a12, label="Chain 2")

svg

Looking at the convergence of the chains is also positive. So for this
simple model, everything looks like it has worked correctly.

plot(λ(collect(0:maxT), testParams), label="True")
plot!(λ(collect(0:maxT), Optim.minimizer(opt)), label = "MLE")
plot!(λ(collect(0:maxT), bayesEst), label = "Bayes")

svg

Again, the bayesian estimate of the function isn’t too far from the true intensity. Success!

Conclusion

So what have I learnt after writing all this:

  • AdvancedHMC.jl is easy to use and despite all the scary terms and settings you can get away with the defaults.

What I have hopefully taught you after reading this:

  • Point process simulation through thinning.
  • What the likelihood of a point process looks like.
  • Maximum likelihood using Optim.jl
  • How to use AdvancedHMC.jl for that point process likelihood to get the posterior distribution.

Hawkes Processes and DIC

By: Dean Markwick's Blog -- Julia

Re-posted from: https://dm13450.github.io/2020/08/26/Hawkes-and-DIC.html

My post on the deviance information criteria on my blog is one of the most popular ones I’ve ever written. So to take that theoretical concept and apply it to my new package HawkesProcesses.jl and show you how to construct the different functions needed to calculate the DIC.

Firstly, a recap on the DIC,

\[\begin{align*}
\text{DIC} & = – 2 \log p (y \mid \hat{\theta}) + 2 p_\text{DIC}, \\
p_\text{DIC} & = 2 \left( \log p(y \mid \hat{\theta} ) – \mathbb{E} \left[
\log p (y \mid \theta ) \right] \right),
\end{align*}\]

where these components are balancing up both the variation in the
parameters and how well the parameters fit the model to come up with a
number to assess the overall performance. Think of it as a more
intelligent likelihood calculation.

To calculate the DIC we need to construct the posterior distribution of the Hawkes process

\[p(\theta \mid y) = p(y \mid \theta) p(\theta),\]

where \(y\) are the event times. We’ve got the likelihood, \(p(y \mid \theta)\), already exposed from the package, so just have to add on the prior distribution of the parameters.

using HawkesProcesses
using Distributions

function posterior(events::Array{<:Number}, bg::Number, kappa::Number, kernelParam::Number, maxT::Number)
    kernel = Distributions.Exponential(1/kernelParam)
    lik::Float64 = HawkesProcesses.likelihood(events, bg, kappa, kernel, maxT)
    bgPrior = logpdf(Distributions.Gamma(0.01, 0.01), bg)
    kappaPrior = logpdf(Distributions.Gamma(0.01, 0.01), bg)
    kernPrior = logpdf(Distributions.Gamma(0.01, 0.01), kernelParam)
    lik + bgPrior + kappaPrior + kernPrior
end

This allows us evaluate the posterior for one sample, but what about multiple samples? Thankfully in Julia you don’t get punished for using for loops, so we can simply iterate through all the samples to calculate the posterior values.

function posterior(events::Array{<:Number}, bg::Array{<:Number}, kappa::Array{<:Number}, kernelParam::Array{<:Number}, maxT::Number)
    posteriorVals = Array{Float64}(undef, length(bg))
    for i in 1:length(bg)
        posteriorVals[i] = posterior(events, bg[i], kappa[i], kernelParam[i], maxT)
    end
    posteriorVals
end

We’ve got some functions, now just need some events and parameter samples to put everything into practise. Lets set up a standard Hawkes process.

bg = 0.5
kappa = 0.5
kernel = Distributions.Exponential(1/0.5)

simEvents = HawkesProcesses.simulate(bg, kappa, kernel, 1000)

bgSamples, kappaSamples, kernelSamples = HawkesProcesses.fit(simEvents, 1000, 1000)

bgSamples = bgSamples[500:end]
kappaSamples = kappaSamples[500:end]
kernelSamples = kernelSamples[500:end]

(mean(bgSamples), mean(kappaSamples), mean(kernelSamples))
(0.41608630082628845, 0.5735136278565907, 0.45633101066029247)

The final posterior samples are quite close the actually values, which is reassuring! We can now calculate the components of the DIC.

posteriorSamples = posterior(simEvents, bgSamples, kappaSamples, kernelSamples, 1000)
posteriorMean = posterior(simEvents, mean(bgSamples), mean(kappaSamples), mean(kernelSamples), 1000)
pdic = 2*(posteriorMean - mean(posteriorSamples))
dic = -2*mean(posteriorSamples) + 2*pdic
2147.693328671947

There we have it, simple to calculate and can now be used to critique the model. For example, we could fit another Hawkes model with a different kernel, calculate the DIC using the new samples and compare the values, the better fitting model will have a lower DIC value.

Bonus: Multithreading

using BenchmarkTools

The above function for calculating the posterior across the parameter samples can be easily parallelised in Julia 1.5 with some multithreading. Giving Julia access to the threads and the decorating the for loop with Threads.@threads will give us an easy speed boost in calculating the values.

To let Julia know you’ve got threads available you’ll need to prefix
your Julia startup:

> NUM_JULIA_THREADS=4 julia

To see if it worked then call (in Julia)

Threads.nthreads()

which should print out what ever number you set it too above (or the
maximum number of threads you’ve got on your machine).

We can now adapt the posterior function to take advantage of
threads.

function posterior_threaded(events::Array{<:Number}, bg::Array{<:Number}, kappa::Array{<:Number}, kernelParam::Array{<:Number}, maxT::Number)
    posteriorVals = Array{Float64}(undef, length(bg))
    Threads.@threads for i in 1:length(bg)
        posteriorVals[i] = posterior(events, bg[i], kappa[i], kernelParam[i], maxT)
    end
    posteriorVals
end

Call both functions to make sure they are compiled then we are ready to benchmark them using the same data that we calculated the DIC with.

posterior(simEvents, bgSamples, kappaSamples, kernelSamples, 1000)
posterior_threaded(simEvents, bgSamples, kappaSamples, kernelSamples, 1000);

And now to benchmark:

benchmarkBasic = @benchmarkable posterior($simEvents, $bgSamples, $kappaSamples, $kernelSamples, $1000)
benchmarkThreaded = @benchmarkable posterior_threaded($simEvents, $bgSamples, $kappaSamples, $kernelSamples, $1000)

benchmarkBasic = run(benchmarkBasic, seconds=300)
benchmarkThreaded = run(benchmarkThreaded, seconds=300)
judge(median(benchmarkThreaded), median(benchmarkBasic))
BenchmarkTools.TrialJudgement: 
  time:   -36.63% => improvement (5.00% tolerance)
  memory: +0.00% => minvariant (1.00% tolerance)

There we have it, using 4 threads instead of just the 1 gives us a 35% time improvement without too much hard work, which is nice.