Annotating columns of a data frame with DataFramesMeta.jl

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2024/04/26/labels.html

Introduction

Today I want to discuss a functionality that was recently added to DataFramesMeta.jl.
These utility macros and functions make it easy to add custom labels and notes to columns
of a data frame. This functionality is especially useful when working with wide data frames,
as is often the case when e.g. analyzing economic data.

This post is written under Julia 1.10.1, DataFrames.jl 1.6.1, and DataFramesMeta.jl 0.15.2.

Column labels

A column label is a short description of the contents of a column.
When using DataFramesMeta.jl you can use the following basic commands to work with them:

  • @label! attaches a label to a column;
  • label allows you to retrieve column label;
  • printlabels presents you labels of all annotated columns in a data frame.

Here is a simple example:

julia> using DataFramesMeta

julia> df = DataFrame(year=[2000, 2001], rev=[12, 17])
2×2 DataFrame
 Row │ year   rev
     │ Int64  Int64
─────┼──────────────
   1 │  2000     12
   2 │  2001     17

julia> @label!(df, :rev = "Revenue (USD)")
2×2 DataFrame
 Row │ year   rev
     │ Int64  Int64
─────┼──────────────
   1 │  2000     12
   2 │  2001     17

julia> label(df, :rev)
"Revenue (USD)"

julia> printlabels(df)
┌────────┬───────────────┐
│ Column │         Label │
├────────┼───────────────┤
│   year │          year │
│    rev │ Revenue (USD) │
└────────┴───────────────┘

Note that if some column did not get an explicit label (like :year in our example)
by default its name is its label.

Column notes

Column notes are meant to give more detailed information about a column in a data frame.
You can use the following basic commands to work with them:

  • @note! attaches a note to a column;
  • note allows you to retrieve column note;
  • printnotes presents you notes of all columns in a data frame.
julia> @note!(df, :rev = "Total revenue of a company in in a calendar year in nominal USD")
2×2 DataFrame
 Row │ year   rev
     │ Int64  Int64
─────┼──────────────
   1 │  2000     12
   2 │  2001     17

julia> note(df, :rev)
"Total revenue of a company in in a calendar year in nominal USD"

julia> printnotes(df)
Column: rev
───────────
Total revenue of a company in in a calendar year in nominal USD

julia> @note!(df, :year = "Calendar year")
2×2 DataFrame
 Row │ year   rev
     │ Int64  Int64
─────┼──────────────
   1 │  2000     12
   2 │  2001     17

julia> printnotes(df)
Column: year
────────────
Calendar year

Column: rev
───────────
Total revenue of a company in in a calendar year in nominal USD

Observe that printnotes only prints notes that were actually added to
a column (as opposed to printlabels which prints labels of all columns,
using the default fallback to column name).

Conclusions

Today I covered the basic functions allowing to work with column
metadata of data frames. If you are interested in learning more
advanced functionalities please refer to DataFrames.jl
and TableMetadataTools.jl documentations.

I hope that you will find the metadata functionality provided by
DataFramesMeta.jl useful in your work.