Tables.jl 1.0 Release

By: Jacob Quinn

Re-posted from: https://quinnj.home.blog/2020/02/12/tables-jl-1-0-release/

Since its inception on a couple of whiteboards at JuliaCon 2018 in London, the Tables.jl package has grown to become a foundational set of interfaces for data packages in Julia, even accumulating 74 direct dependents in the General registry as of this writing (Feb 2020)!

So why the 1.0 release? An official 1.0 release can signal stability and maturity, particularly in the case of an “interface package” like Tables.jl. With very little change in the API since the package started, we figured it was time to send that signal out to the broader community: hey! this package can be pretty useful and we’re not going to break it (intentionally)! It also gives us the opportunity to polish up a few APIs, iron out a few wrinkles, and clean up a lot of documentation.

For those who aren’t familiar, the Tables.jl package provides powerful interfaces that let “table types” in all their shapes, sizes, and formats talk with each other seamlessly and, most importantly, without direct knowledge of each other! At the highest level, it provides the Tables.rows and Tables.columns functions, which cover two of the most common access patterns for table data everywhere (row-by-row, or by entire columns). Table types become valid “sources” by implementing access to their data via Tables.rows or Tables.columns (or both!), and “sinks” can then operate on any source by calling Tables.rows or Tables.columns and processing the resulting data appropriately. A key feature for sinks is the “orientation agnostic” nature of Tables.rows and Tables.columns; i.e. sinks don’t need to worry whether the source is row- or column-oriented by nature, since performant, generic fallbacks are provided both ways. That is, calling Tables.rows on a source that only defined Tables.columns falls back to a generic lazy row iteration over the input columns. And vice versa, calling Tables.columns on a source that only defined Tables.rows will process the input rows to build up columns. This is a fair trade because it’s work the sink would have had to do anyway: if the sink really needs columns to do its work, then it would have to turn rows into columns itself, so Tables.jl provides that with the best, community-sourced implementation possible.
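To make the source/sink idea concrete, here is a minimal sketch. It assumes only Tables.jl itself and uses two built-in sources: a NamedTuple of vectors (column-oriented) and a Vector of NamedTuples (row-oriented). The sink functions total_by_rows and total_by_columns are hypothetical names made up for this illustration, not part of the package.

```julia
using Tables

# A column-oriented source: a NamedTuple of vectors.
coltable = (id = [1, 2, 3], score = [10.0, 20.0, 30.0])

# A row-oriented source: a Vector of NamedTuples.
rowtable = [(id = 1, score = 10.0), (id = 2, score = 20.0), (id = 3, score = 30.0)]

# Hypothetical sink that wants to consume rows.
# It never asks whether the source is row- or column-oriented.
function total_by_rows(table)
    total = 0.0
    for row in Tables.rows(table)
        total += Tables.getcolumn(row, :score)
    end
    return total
end

# Hypothetical sink that wants whole columns instead.
function total_by_columns(table)
    cols = Tables.columns(table)
    return sum(Tables.getcolumn(cols, :score))
end

# Both sinks accept both sources; the generic fallbacks
# bridge the orientation mismatch where there is one.
total_by_rows(coltable)
total_by_rows(rowtable)
total_by_columns(coltable)
total_by_columns(rowtable)
```

Neither sink mentions a concrete table type, which is the point: any type that implements Tables.rows or Tables.columns should be usable in place of the two built-in sources here.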

Another core concept, now clarified in the 1.0 release, is the interface for accessing data on an individual row, as well as on a set of columns. It turns out that viewing a “row” as an ordered set of named column values, and “columns” as an ordered set of named columns, lends itself naturally to a common interface, simplified implementations, and the ability to provide powerful default functionality. Check out the new docs, particularly the sections on Tables.AbstractRow and Tables.AbstractColumns, to learn more. You can also check out the Discourse announcement, which is geared a little more towards 1.0 release notes and an upgrade guide.
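As a rough illustration of that shared interface, the sketch below uses Tables.columnnames and Tables.getcolumn on both a single row and a set of columns. The helper named_values is a hypothetical name for this example only; the row and columns are plain NamedTuples, which work here through the package's default definitions.

```julia
using Tables

# Hypothetical helper: works on a row or on a set of columns,
# because both expose ordered names plus lookup by name.
function named_values(x)
    return [nm => Tables.getcolumn(x, nm) for nm in Tables.columnnames(x)]
end

# A single row: an ordered set of named column values.
row = (id = 1, name = "a")

# A set of columns: an ordered set of named columns.
cols = Tables.columns((id = [1, 2], name = ["a", "b"]))

row_pairs = named_values(row)    # name => value pairs
col_pairs = named_values(cols)   # name => column pairs

# Lookup by position is also part of the interface.
first_by_index = Tables.getcolumn(row, 1)
```

The same three calls (names, lookup by name, lookup by index) are what a custom row or columns type would need to support; see the Tables.AbstractRow and Tables.AbstractColumns docs for the authoritative requirements.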

A recent popular blog post highlighted features of Julia that lend themselves to such widespread composability between packages, and Tables.jl is a powerful example of this. Through just a few simple APIs, it allows a DataFrame to be serialized to disk in the binary Feather format, read back, converted to JSON to be sent over the web, and loaded into a MySQL database, all without any of those packages knowing about each other, and without needing a single intermediary table type to convert between. Applications and processing tasks can feel free to leverage any number of in-memory representations, disk formats, and databases to handle table manipulations.
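The full DataFrame-to-Feather-to-JSON-to-MySQL chain needs several packages, so the sketch below only stands in for it: it uses the two materializers that ship with Tables.jl, Tables.rowtable and Tables.columntable, to pass the same data between representations. Treat it as an assumption-laden miniature of the idea rather than the pipeline described above.

```julia
using Tables

# Start from a column-oriented table (NamedTuple of vectors).
original = (id = [1, 2, 3], score = [10.0, 20.0, 30.0])

# Materialize it as a row-oriented table (Vector of NamedTuples).
as_rows = Tables.rowtable(original)

# Materialize that back into a column-oriented table.
round_trip = Tables.columntable(as_rows)
```

Each step accepts any Tables.jl source, so in a real pipeline either end could be swapped for a file format or database package that implements the interface.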

Here’s to more integrations, more composability, and more table types in the future!