By: Mosè Giordano
Re-posted from: http://giordano.github.io/blog/2021-01-23-documentation-testing-julia/
Introduction
Documentation, testing, and continuous integration (CI) are some of the key
ingredients to make software maintainable and reliable in the long term.
During the first part of my (short) career in academia, when my daily software
tools were C, Fortran, and IDL, I barely knew these practices (I probably
didn’t know continuous integration at all), and never applied them in practice.
Sometime around 5 years ago I started my journey with the Julia programming
language, and I learned to value and use these principles.
I have the general feeling that Julia makes documentation and testing very
simple and lowers the entry barriers for newcomers. Today, the Julia ecosystem
offers many tools for this:
- Documenter.jl: a package to generate HTML and PDF versions of the documentation
- the Test standard library provides very basic tools for testing, but it’s
  also extremely simple to use: you don’t have a good reason for not testing
  your code (see the sketch after this list)
- in addition to Test, there are third-party packages which offer more advanced
  testing frameworks
- packages like PkgTemplates.jl and PkgSkeleton.jl let you easily generate a
  new package with minimal setup for documentation, testing, and continuous
  integration.
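To give an idea of how low the barrier is, a complete, albeit tiny, test suite
can look like the following. This is a minimal sketch with a made-up toy
function, not code from any specific package:

```julia
# test/runtests.jl -- a minimal test suite based on the Test standard library.
using Test

# A toy function standing in for the package's own code.
double(x) = 2x

@testset "double" begin
    @test double(2) == 4
    @test double(-1.5) == -3.0
    # Multiplying an Int by a String is undefined, so this throws.
    @test_throws MethodError double("a")
end
```

With this file in place, `using Pkg; Pkg.test()` runs the whole suite.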
If you’re looking for tips about testing workflow in Julia, see these best
testing practices with
Julia
by Erik Engheim.
Analysing Julia packages in the General registry
Is my feeling that Julia makes documentation and testing easy actually true?
Prompted by the recent analysis of the prevalence of continuous integration in
JOSS made by my colleague Jamie Quinn, I decided to look at how Julia packages
in the General registry fare with regard to documentation, testing, and
continuous integration. I quickly wrote a Julia package, AnalyzeRegistry.jl,
for this task: it clones all packages in the registry (only the last commit of
the default branch, to save time and bandwidth) and looks for some specific
files to decide whether a package uses documentation, testing, and continuous
integration.
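The gist of the heuristic is simple. The following is a simplified sketch of
the idea, with made-up function and field names; it is not the actual
implementation of AnalyzeRegistry.jl:

```julia
# Sketch of a file-based heuristic: the presence of well-known files in a
# cloned repository is used as a proxy for each practice. Function and
# field names here are illustrative, not AnalyzeRegistry.jl's API.
function inspect_clone(dir::AbstractString)
    has(parts...) = isfile(joinpath(dir, parts...))
    (
        docs           = has("docs", "make.jl") || has("doc", "make.jl"),
        tests          = has("test", "runtests.jl"),
        github_actions = isdir(joinpath(dir, ".github", "workflows")),
        travis         = has(".travis.yml"),
        appveyor       = has("appveyor.yml") || has(".appveyor.yml"),
    )
end
```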
The usage of the package is described in its README.md (yes, I haven’t set up
proper documentation yet!). I ran the analysis with 8 threads; it took less
than 30 minutes to analyse the entire General registry (I was mainly limited by
my Internet connection, so using fewer threads wouldn’t have changed much):
```julia
julia> using AnalyzeRegistry

julia> @time report = analyze(find_packages());
1567.008404 seconds (7.41 M allocations: 857.636 MiB, 0.01% gc time, 0.00% compilation time)
```
analyze returns a vector of Package objects, a data structure which describes a
package by the following criteria (a sketch of such a struct follows this
list):
- name
- URL of the git repository
- can the repository be cloned? Some repositories have been deleted or made
  private, so they cannot be accessed by the public anymore
- does it have documentation? This is the hardest criterion: I looked for the
  file docs/make.jl, or doc/make.jl, which is used to generate the
  documentation with Documenter.jl, but many packages may do something else;
  see more in the comments below
- does it have the test/runtests.jl file? This is what the Test standard
  library uses to launch the tests
- does it use GitHub Actions?
- does it use Travis CI?
- does it use AppVeyor?
- does it use Cirrus CI?
- does it use Circle CI?
- does it use Drone CI?
- does it use Buildkite?
- does it use Azure Pipelines?
- does it use GitLab Pipeline?
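A data structure collecting these criteria might look roughly like the
following sketch; the field names are illustrative and may differ from the real
definition in AnalyzeRegistry.jl:

```julia
# Illustrative sketch of the Package data structure described above;
# see the AnalyzeRegistry.jl source for the real definition.
struct Package
    name::String
    repo::String             # URL of the git repository
    reachable::Bool          # could the repository be cloned?
    docs::Bool               # docs/make.jl or doc/make.jl present?
    runtests::Bool           # test/runtests.jl present?
    github_actions::Bool
    travis::Bool
    appveyor::Bool
    cirrus::Bool
    circle::Bool
    drone::Bool
    buildkite::Bool
    azure_pipelines::Bool
    gitlab_pipeline::Bool
end
```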
Report of the analysis on 2021-01-23
Before summarising my findings, I saved the results of the
analysis
in a JLD2 archive, so that anyone can look
into them.
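Loading the archive back takes a couple of lines with JLD2; the file and
variable names below are assumptions, adjust them to whatever the archive
actually uses:

```julia
# Load the saved results; "analysis.jld2" and `report` are assumed names.
using JLD2
@load "analysis.jld2" report
length(report)  # one entry per analysed package
```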
Now, some statistics:
- total number of packages: 4312 (note: I excluded JLL packages, which are
  automatically generated, and not very useful to measure whether humans follow
  best programming practices)
- packages hosted on GitHub: 4296
- packages hosted on GitLab: 16
- packages that could be cloned: 4287 (99.4% of the total). The rest of the
  percentages will refer to this number, since I could look for documentation
  and testing only if I could actually clone the repository of a package
- packages using Documenter.jl to publish documentation: 1887 (44%)
- packages with testing: 4139 (96.5%)
- packages using any of the CI services below: 4105 (95.8%)
- packages using GitHub Actions: 3240 (75.6%)
- packages using Travis CI: 2512 (58.6%)
- packages using AppVeyor: 783 (18.3%)
- packages using Cirrus CI: 60 (1.4%)
- packages using Circle CI: 13 (0.3%)
- packages using Drone CI: 43 (1%)
- packages using Buildkite: 29 (0.7%)
- packages using Azure Pipelines: 7 (0.2%)
- packages using GitLab Pipeline: 85 (1.9%)
I ran the analysis on 2021-01-23, on this revision of the General registry.
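For the record, the percentages above can be recomputed from the report vector
along these lines, reusing the illustrative field names from the Package sketch
earlier (the real field names may differ):

```julia
# Fraction of cloneable packages satisfying a given predicate, in percent.
# Field names follow the illustrative Package sketch above.
cloned = filter(p -> p.reachable, report)
frac(pred) = round(100 * count(pred, cloned) / length(cloned); digits = 1)

frac(p -> p.docs)            # ≈ 44.0: Documenter.jl setup
frac(p -> p.runtests)        # ≈ 96.5: has test/runtests.jl
frac(p -> p.github_actions)  # ≈ 75.6
```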
Comments
- The results are biased by the criteria I’ve chosen to determine whether a
  package “has documentation” or “has tests”, and should be taken cum grano
  salis: these criteria aren’t bullet-proof; even an empty test/runtests.jl
  would count as “has tests”, but see below
- remember that the General registry is not a curated list of Julia packages,
  and at the moment there is no requirement to have documentation, testing, or
  continuous integration to be accepted
- about 44% of packages are set up to use Documenter.jl to publish
  documentation. While this fraction doesn’t look particularly high, consider
  that many packages don’t keep the documentation within the repository: it may
  be stored somewhere else (very notable examples are DifferentialEquations.jl
  and Plots.jl), or they are so simple that the entire documentation is written
  down in the README file (for example Tar.jl) or the wiki
- I looked at a random sample of 43 packages (1% of the total) “without
  documentation”; of these:
  - 4 didn’t have any documentation, docstrings, or examples of use whatsoever
    (but 2 of them still had meaningful tests!)
  - 1 didn’t have documentation or examples, but only docstrings
  - 1 only had an examples directory, with poorly commented samples of code
  - all the others had documentation in the README, in the wiki, or anyway
    published on a different website.

  So, about 6 packages out of 43 (14%) had very poor or completely missing
  documentation for users. Assuming this was a representative sample, my
  conclusion is that about 8% of packages in the registry (14% of the roughly
  56% without a Documenter.jl setup) have lacking documentation, while the rest
  have some documentation to get users started, and a bit less than half of the
  total are set up to use Documenter.jl to publish documentation
- an overwhelming majority of packages, 96.5%, have tests!
- I looked at a random sample of 43 packages “with tests”, to check whether
  they actually have tests: only two of them (4.7%) had a dummy
  test/runtests.jl file. If this sample was representative of all Julia
  packages in the General registry, we can say that about 92% of Julia packages
  do test their code. Interestingly enough, this is about the same fraction of
  packages with some documentation
- almost all packages with testing also set up continuous integration, even
  though a small fraction is probably running dummy tests. Compared to the
  roughly 80% of papers in JOSS found by Jamie not to have any obvious CI
  setup, the uptake of CI among Julia packages is remarkable (the audiences of
  authors of JOSS papers and of Julia packages have a large overlap, so I
  believe it makes sense to look at them together)
- GitHub Actions is the most used CI service (75.6%), followed by Travis CI
  (58.6%). Note that measuring usage of GitHub Actions for CI is tricky
  because, differently from all the other services, there isn’t a single
  standard name for the configuration script, and GitHub Actions is also often
  used for tasks other than pure CI. For example, I didn’t consider files for
  CompatHelper or TagBot; a sketch of this kind of filtering follows this list.
  It would have been interesting to look at these statistics before last
  November, when the new Travis CI pricing model pushed many open-source users
  away from their services.
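As an illustration of the kind of filtering this requires, the sketch below
counts a repository as using GitHub Actions for CI only if it has workflow
files other than the CompatHelper and TagBot ones; the file names used here are
the customary ones, and real repositories may deviate from them:

```julia
# Sketch: detect GitHub Actions used for CI, ignoring workflow files
# conventionally used by CompatHelper and TagBot.
function uses_github_actions_for_ci(dir::AbstractString)
    workflows = joinpath(dir, ".github", "workflows")
    isdir(workflows) || return false
    files = filter(f -> endswith(f, ".yml") || endswith(f, ".yaml"),
                   readdir(workflows))
    any(f -> lowercase(f) ∉ ("compathelper.yml", "tagbot.yml"), files)
end
```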
My take-away message is that despite the General registry not being curated and
not enforcing specific styles or best practices, a large fraction of Julia
packages do embrace documentation, testing, and continuous integration, showing
that the ecosystem provides easy-to-use tools for these practices.