Author Archives: Tamás K. Papp

Setting up Julia with continuous integration on Gitlab

Re-posted from: https://tamaspapp.eu/post/julia-ci-gitlab/

As an academic, I picked up good practices for programming mostly by osmosis. My approach to “testing” software went through the following stages:

1. See if the code runs (this got me through my undergrad years).
2. Check if the results “look OK”, then forget about testing.
3. Compare results occasionally to known results from papers or other code (eg in a different language).
4. Write some unit tests ex post, as an afterthought after the code is finished (time pressure helps to ensure that overtesting is never a problem).
5. Use unit tests from the very beginning, especially before optimizing and refactoring code.
6. Set up automatic testing, as part of continuous integration.

I think that 1–3 above is a pretty usual path, naturally traversed after the recognition that some testing would be nice, but lacking the knowledge of how to implement it in a consistent manner. This is comparable to using copies of directories as crude attempts at “version control”.

Later, I picked up 4–6 while being exposed to these ideas when learning about new languages. Automated unit testing is one of those things one does not miss until learning about it, then subsequently cannot imagine doing without. In a research context, the two main advantages are scientific integrity (I should make a best effort to ensure that my results are correct) and dealing with bugs early. While the first one is abstract and philosophical, the second is a practical concern: I found that if I skimp on testing, the bugs show up much later, usually at an inconvenient time, and I will have to spend time locating the bug (not always trivial, especially with numerical code) and switch context to something I was working on months ago. It is my experience that while tests can be tedious to write, time spent on them is a very good investment.
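As a minimal sketch of what such tests can look like for numerical code, the following uses Julia's `Test` standard library to compare a routine against known results. The `trapezoid` function is a hypothetical example written for this illustration, not code from any of my packages.

```julia
using Test

# Hypothetical numerical routine: composite trapezoid rule on [a, b] with n panels.
function trapezoid(f, a, b, n)
    h = (b - a) / n
    s = (f(a) + f(b)) / 2
    for i in 1:(n - 1)
        s += f(a + i * h)
    end
    s * h
end

@testset "trapezoid" begin
    # the rule is exact for constant and linear integrands
    @test trapezoid(x -> 1.0, 0, 2, 10) ≈ 2.0
    @test trapezoid(x -> x, 0, 1, 4) ≈ 0.5
    # known analytical result: the integral of sin on [0, π] is 2
    @test trapezoid(sin, 0, π, 1000) ≈ 2 atol = 1e-5
end
```

Tests like these are cheap to write alongside the code, and they catch regressions when the routine is later optimized or refactored.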

I knew about unit tests before coming to Julia, but learned about automated CI in the Julia community. This was because package template libraries “do the right thing” by making it very easy to set up an automated testing framework: for example, PkgDev.generate creates the appropriate test configuration for Travis CI and various coverage services.

I never cease to be amazed by the fact that these services are available for free for public / open source projects, which is very generous of these providers. However, occasionally one would like to keep the project private for a little while. The usual scenario for me is working on code that is related to a paper, which I plan to make public with the latter; in this case one would need the pro (non-free) version of Travis and related tools.

Alternatively, Gitlab offers CI/CD with private repositories. I am exploring this at the moment for various projects, and boiled down the necessary configuration into the repository GitlabJuliaDemo.jl. It has

a CI setup for Pkg.test,
a coverage summary as a percentage.

While coverage analysis could be automated too with a custom Docker image, I leave this for future work.¹
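To illustrate what a coverage summary as a percentage amounts to, here is a rough sketch that computes one from lines in the layout of Julia's `.cov` files. The layout is an assumption on my part (a 9-character field holding an execution count, or `-` for lines that are not code, followed by the source line), and `coverage_percent` is a hypothetical helper, not the code used in the demo repository.

```julia
# Hypothetical helper: percentage of code lines executed at least once.
# Assumes each line starts with a 9-character field holding an execution
# count, or "-" for lines that do not count as code.
function coverage_percent(cov_lines)
    covered = 0
    total = 0
    for line in cov_lines
        length(line) < 9 && continue           # too short to hold a count field
        field = strip(first(line, 9))
        field == "-" && continue               # not a code line
        hits = tryparse(Int, field)
        hits === nothing && continue           # skip anything unparseable
        total += 1
        covered += hits > 0
    end
    total == 0 ? 100.0 : 100 * covered / total
end

sample = [
    "        - function f(x)",
    "        2     y = x + 1",
    "        0     error(\"unreachable\")",
    "        2     return y",
    "        - end",
]
println(round(coverage_percent(sample); digits = 2), "%")
```

In the sample, two of the three code lines were executed, so the summary is about two thirds.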

So far, I am very satisfied with Gitlab. The interface is well-designed, clean, and intuitive; tests complete in a few minutes (just like Travis).

¹ In the next post I will talk about local coverage analysis in Julia.

Continuous integration for Julia packages using Docker

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/travis-docker-julia-ci/

This post may be useful for maintainers of Julia packages that require large binary dependencies on CI services like Travis.

I have recently started using Kristoffer Carlsson’s excellent PGFPlotsX for plotting. The package is a thin wrapper which emits LaTeX code for use with pgfplots, which is extremely versatile and well-documented.¹ However, since most of the action happens in LaTeX, unit testing requires a lot of binary dependencies, including the TeXLive suite and some related packages. This is not a problem on one’s own machine where these would need to be installed just once, but when I submitted PRs, tests on Travis timed out more often than not because it had to install all of these for every run using apt-get.

The Travis documentation suggested that Docker may be a solution for such cases, and I have been looking for an opportunity to experiment with it anyway. After reading their tutorial, it was relatively quick to produce an image based on plain vanilla Ubuntu 17.10, which is available as a Docker image to build on, with the required TeXLive and related packages, plus some utilities.

While building the image, I download the binaries for the stable version of Julia, while nightly is downloaded on demand. This speeds up CI by 40–50 seconds for stable.

This is how it is run:

the directory of the Julia package is mounted in the container at /mnt,
Pkg.clone() and testing proceed as usual,
coverage results are copied back to /mnt when done.
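The last step can be sketched in Julia as follows. This is only an illustration under stated assumptions: `copy_coverage_back` is a hypothetical helper, coverage files are assumed to end in `.cov`, and temporary directories stand in for the working copy and for `/mnt`. The actual image may do this differently.

```julia
# Hypothetical helper (not taken from the image described above): copy every
# `.cov` file under `workdir` to the same relative path under `mountdir`.
function copy_coverage_back(workdir, mountdir)
    copied = String[]
    for (root, _, files) in walkdir(workdir)
        for file in files
            endswith(file, ".cov") || continue
            src = joinpath(root, file)
            rel = relpath(src, workdir)
            dest = joinpath(mountdir, rel)
            mkpath(dirname(dest))          # create missing subdirectories
            cp(src, dest; force = true)    # overwrite stale results
            push!(copied, rel)
        end
    end
    sort!(copied)
end

# Usage with temporary directories standing in for the container paths.
workdir, mountdir = mktempdir(), mktempdir()
mkpath(joinpath(workdir, "src"))
write(joinpath(workdir, "src", "Demo.jl"), "f(x) = x\n")
write(joinpath(workdir, "src", "Demo.jl.cov"), "        1 f(x) = x\n")
copy_coverage_back(workdir, mountdir)
```

Only the coverage files are copied, so the mounted package directory is otherwise left as it was.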

The resulting image runs in 3–4 minutes consistently. In case someone finds it useful for Julia packages with similarly large binary dependencies, I made it available as texlive-julia-minimal-docker on Github.² Naturally, for projects with other large binary dependencies, one would install different Ubuntu packages or binaries.

¹ Using this package accelerated my plotting workflow in Julia. A post on this will follow soon.
² “Minimal” turns out to be a misnomer, since some dependencies end up requiring X11 and the image is >700MB.

Working with large Julia source files in Emacs

By: Tamás K. Papp

Re-posted from: https://tamaspapp.eu/post/large-files-julia/

When writing software, especially libraries, a natural question is how to organize source code into files. Some languages, eg Matlab, encourage a very fragmented style (one function per file), while for some other languages (C/C++), a separation between the interface (.h) and the implementation (.c/.cpp) is traditional.

Julia has no such constraint: include allows the source code for a module to be organized into small pieces, possibly scattered in multiple directories, or it can be a single monolithic piece of code. The choice on this spectrum is up to the authors, and is largely a matter of personal preference.

When I started working with Julia, I was following the example of some prominent packages, and organized code into small pieces (~ 500 LOC). Lately, whenever I refactored my code, I ended up putting it in a single file.

I found the following Emacs tools very helpful for navigation.

Form feeds and page-break-lines-mode

Form feed, or \f, is an ASCII control character that was used to request a new page in line printers. Your editor may display it as ^L. It has a long history of being used as a separator, and Emacs supports it in various ways.

By default, C-x [ and C-x ] take you to the previous and next form feed separators. Combined with numerical prefixes, eg C-3 C-x [, you can jump across multiple ones very quickly. Other commands with page in their name allow narrowing, marking, and other functions.
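Because the separator is an ordinary character, the same page structure is easy to recover outside Emacs too. As a small sketch, the hypothetical helper `page_titles` below splits Julia source text on `\f` and returns the first non-blank line of each page, which works as a crude table of contents.

```julia
# Hypothetical helper: first non-blank line of each form-feed-separated page.
function page_titles(src::AbstractString)
    titles = String[]
    for page in split(src, '\f')
        for line in split(page, '\n')
            stripped = strip(line)
            if !isempty(stripped)
                push!(titles, stripped)
                break                      # only the first non-blank line
            end
        end
    end
    titles
end

src = "export\n    ML_estimator\n\f\n# general API\n\nfunction ML_estimator end\n"
foreach(println, page_titles(src))
```

Pages that contain only whitespace contribute no entry.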

Many Emacs packages provide extra functionality for page breaks. My favorite is page-break-lines, which replaces ^L with a horizontal line, so that the output looks like this:

export
    ML_estimator

# general API

"""
    ML_estimator(ModelType, data...)

Estimate `ModelType` using maximum likelihood
on `data`, which  is model-specific.
"""
function ML_estimator end

Finding things quickly

I am using helm pervasively. helm-occur is very handy for listing all occurrences of something, and navigating them. The following is an excerpt from base/operators.jl, looking for isless:

operators.jl:213:types with a canonical total order should implement `isless`.
operators.jl:227:<(x, y) = isless(x, y)
operators.jl:300:# this definition allows Number types to implement < instead of isless,
operators.jl:302:isless(x::Real, y::Real) = x<y
operators.jl:303:lexcmp(x::Real, y::Real) = isless(x,y) ? -1 : ifelse(isless(y,x), 1, 0)

You can move across these matches, jump to one in an adjacent buffer while keeping this list open, or save the list for later use. Its big brother helm-do-grep-ag is even more powerful, using ag to find something in a directory tree.
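For readers who do not use Emacs, the kind of listing shown above can be imitated with a few lines of Julia. This is just a sketch of the output format (`file:line:text`) and not how helm works internally. `list_occurrences` and the sample text are made up for illustration.

```julia
# Hypothetical helper: list lines of `text` containing `needle`,
# formatted as "filename:line number:line".
function list_occurrences(filename, text, needle)
    matches = String[]
    for (i, line) in enumerate(split(text, '\n'))
        if occursin(needle, line)
            push!(matches, string(filename, ":", i, ":", line))
        end
    end
    matches
end

text = """
<(x, y) = isless(x, y)
max(x, y) = ifelse(isless(y, x), x, y)
identity(x) = x
"""
foreach(println, list_occurrences("demo.jl", text, "isless"))
```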

With these two tools, I find navigating files around 5K LOC very convenient — the better I learn Emacs, the larger my threshold for a “large” file becomes.¹

¹ In case you are wondering, the largest files are around 6K LOC in Julia Base at the moment.

juliabloggers.com

A Julia Language Blog Aggregator