
Emacs 25.2 on Ubuntu

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/emacs25-ubuntu/

Emacs is undoubtedly the most important program on my computers. On my laptop, I use it to keep track of stuff with org-mode, read mail with mu4e, edit LaTeX with AUCTeX, and of course program. On servers, the first alias I define is usually qe='emacs -Q -nw', which gives me a fast and responsive editor. With helm, doing just about anything (e.g. locating files, rgrep-ing for something) is orders of magnitude faster and more convenient than any alternative I have tried.

I also try to keep up with the latest versions of software in general. Usually, whatever Ubuntu stable/Debian testing has is good enough not to justify the extra effort, but when I really need something newer, I grab the source and compile. That is usually only a minor hassle, but I try to restrict it to a few critical programs, otherwise it adds up. The major issue is not the compilation itself, but the cruft it leaves in the filesystem (despite stow and checkinstall, it piles up). So I try to avoid it if I can.

Emacs 25.2 was released in April 2017, but there is no sign of an Ubuntu package for it yet. On various forums kelleyk's PPA is recommended, but that does not have 25.2 for 17.04 (some files clash if you install previous versions).

Fortunately, Robert Bruce Park has now added Emacs 25.2 to the Ubuntu Emacs Lisp PPA, so having the latest of your favorite editor is only an add-apt-repository away. You may want to add a file to /etc/apt/preferences.d with contents

Package: *
Pin: release o=LP-PPA-ubuntu-elisp
Pin-Priority: 600

to make sure the right packages are installed.

Sampling variation in effective sample size estimates (MCMC)

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/ess-sampling/

Introduction

MCMC samples, used in Bayesian statistics, are not independent — in fact, unless one uses specialized methods or modern HMC, posterior draws are usually highly autocorrelated. For independent draws,

\[
\text{variance of simulation mean} \propto \frac1N
\]

where \(N\) is the sample size, but for correlated draws, one has to scale the sample size with a factor

\[
\tau = \frac{1}{1+2\sum_{k=1}^\infty \rho_k}
\]

where \(\rho_k\) is the lag-\(k\) autocorrelation. \(\tau N\) is the effective sample size.
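For example, if \(\rho_1 = 0.5\) and the higher-order autocorrelations are negligible, then \(\tau = 1/(1 + 2 \cdot 0.5) = 1/2\): a chain of \(1000\) such draws estimates the posterior mean about as precisely as \(500\) independent draws would.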

Usually, \(\rho_k\) is estimated from the data using the variogram

\[
V_k = \frac{1}{N-k} \sum_{i=1}^{N-k} (x_i - x_{i+k})^2
\]

from which we obtain

\[
\rho_k = 1-\frac{V_k}{2\text{var}(x)}
\]

where an estimator of the variance is also used. Then, to avoid summing noisy estimates, we only include terms up to the last \(K\) where

\[
\rho_{K} + \rho_{K+1} \ge 0
\]

I will call \(K\) the last lag. Stan does something slightly different, using FFT for autocorrelations, and cutting off at the first negative \(\rho_K\), but for HMC this does not make a whole lot of difference.
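
Putting the pieces together, here is a minimal Julia sketch of the estimator described above. It is my own illustration, not the source code linked from the post; the names rho_k and tau_and_last_lag are made up, and the truncation follows the rule just stated (sum until the last lag \(K\) with \(\rho_K + \rho_{K+1} \ge 0\)).

using Statistics

# Lag-k autocorrelation estimate via the variogram: rho_k = 1 - V_k / (2 var(x)).
function rho_k(x, k)
    N = length(x)
    Vk = sum(abs2, x[1:(N - k)] .- x[(1 + k):N]) / (N - k)
    1 - Vk / (2 * var(x))
end

# Return (tau, K), where tau = 1 / (1 + 2 * (rho_1 + ... + rho_K)) and K is the
# last lag for which rho_K + rho_{K+1} is still non-negative.
function tau_and_last_lag(x; maxlag = length(x) ÷ 2)
    ρ = [rho_k(x, k) for k in 1:maxlag]
    K = 0
    # keep extending the sum while successive pairs of autocorrelations stay non-negative
    while K + 2 ≤ maxlag && ρ[K + 1] + ρ[K + 2] ≥ 0
        K += 1
    end
    1 / (1 + 2 * sum(ρ[1:K])), K
end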

The sampling variation

I was coding up the above calculation and needed some unit tests. Surprisingly, I could not find anything on the sampling variation of \(\tau\), so I wrote some simulations in Julia (source code for everything). I did the following simulation exercise:

  1. for a given autocorrelation coefficient \(\phi\), simulate \(N\) draws from the AR(1) process
     \[
     x_t = \phi x_{t-1} + \sigma \epsilon_t
     \qquad
     \epsilon_t \sim \text{Normal}(0,1), \text{IID}
     \]
  2. calculate \(\tau\) and \(K\),
  3. repeat 1000 times and plot the results (a minimal Julia sketch of a single replication follows this list).
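
Here is a hypothetical driver for one configuration (\(\phi=0.5\), \(N=1000\), 1000 replications), reusing rho_k and tau_and_last_lag from the sketch above; again, this is an illustration rather than the code behind the plots.

using Statistics

# Simulate N draws from x_t = phi * x_{t-1} + sigma * eps_t, eps_t ~ Normal(0,1) IID.
function simulate_ar1(ϕ, N; σ = 1.0)
    x = Vector{Float64}(undef, N)
    x[1] = σ * randn() / sqrt(1 - ϕ^2)   # start from the stationary distribution
    for t in 2:N
        x[t] = ϕ * x[t - 1] + σ * randn()
    end
    x
end

results = [tau_and_last_lag(simulate_ar1(0.5, 1000)) for _ in 1:1000]
τs, Ks = first.(results), last.(results)
println("mean τ = ", mean(τs), ", CV = ", std(τs) / mean(τs))
println("quartiles of the last lag K: ", quantile(Ks, [0.25, 0.5, 0.75]))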

I use \(N=1000\) and \(N=10000\), as these would be typical sample sizes, first for a fairly efficient algorithm, then for a more stubborn but still manageable posterior.

IID samples

Let \(\phi=0\); then we expect \(\tau=1\) (red line in the histograms, coefficient of variation on top).

Results with \(\phi=0\) (IID), \(N=1000\). (a) \(\tau\), (b) last lag \(K\), (c) scatterplot.

Results with \(\phi=0\) (IID), \(N=10000\). (a) \(\tau\), (b) last lag \(K\), (c) scatterplot.

With \(1000\) samples, there is a lot of variation in ESS: 800 could show up very easily in practice, and \(600\) is not improbable either. Using up to \(10\) lags is not uncommon. For \(10000\) samples, the precision improves considerably, and we commonly use \(2\) or \(4\) lags. For both sample sizes, notice the high correlation between the last lag \(K\) and \(\tau\): given the method above, using more lags increases \(\tau^{-1}\), so this is to be expected.

AR(1) samples with \(\phi=0.5\)

This is a more autocorrelated process; theory tells us that \(\tau = 1/3\).
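
Where this value comes from: for an AR(1) process with coefficient \(\phi\), the lag-\(k\) autocorrelation is \(\rho_k = \phi^k\), so

\[
\tau = \frac{1}{1 + 2\sum_{k=1}^\infty \phi^k} = \frac{1}{1 + \frac{2\phi}{1-\phi}} = \frac{1-\phi}{1+\phi},
\]

which equals \(1/3\) for \(\phi=0.5\), and \(1/9\) for the \(\phi=0.8\) case below.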

Results with \(\phi=0.5\), \(N=1000\). (a) \(\tau\), (b) last lag \(K\), (c) scatterplot.

Results with \(\phi=0.5\), \(N=10000\). (a) \(\tau\), (b) last lag \(K\), (c) scatterplot.

Notice that \(\tau\) is now more dispersed, compared to the IID case. Even with 10000 samples, the coefficient of variation is 6%; with 1000 it is around 1/6. In practice, expect effective sample sizes all over the place.

AR(1) samples with \(\phi=0.8\)

This is an even more autocorrelated process; theory tells us that \(\tau = 1/9\).

Results with \(\phi=0.8\), \(N=1000\). (a) \(\tau\), (b) last lag \(K\), (c) scatterplot.

Results with \(\phi=0.8\), \(N=10000\). (a) \(\tau\), (b) last lag \(K\), (c) scatterplot.

There is now so much variation that, to get an ESS estimate we can use for comparing various MCMC implementations, we need to run much more than \(1000\) samples.

Conclusion

  1. For unit testing ESS calculations, I will need to use 10000 samples, with \(\pm10\) or similar error bands.

  2. As a rule of thumb, I would ignore less than 1.5x variation in ESS for 1000 samples, or run longer chains: it may be just random noise.

Bibliography

  • Gelman, Andrew, et al. 2013. Bayesian Data Analysis. 3rd edition. Chapman & Hall/CRC.
  • Stan Development Team. 2016. Stan Modeling Language Users Guide and Reference Manual, Version 2.15.0. http://mc-stan.org

Blog redesign

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/blog-redesign/

I am in the process of rebuilding my personal website using Hugo. I tried various themes, including hugo-academic, but in the process of adapting them to my needs I realized that it is less work to write one from scratch.

The result is now 80% ready (the blog works, automatic listing of research papers will take some more work), and the source is on GitHub. It is available under the CC-BY-SA license; feel free to adapt parts of it, not that there is anything special in there.

Hugo is really an excellent framework: it is clean, logical, and allows a lot of code reuse. I spent most of the time fiddling with CSS, and I am still not 100% satisfied with the result, but at some point I decided to stop. SCSS was extremely useful for writing organized CSS.

I am especially satisfied with moving the code highlighting to the generator side. The only non-static parts are now Disqus and MathJax.