Tutorials for DataFrames.jl release 0.21. Part I

By: Blog by Bogumił Kamiński

Re-posted from: https://bkamins.github.io/julialang/2020/05/25/data-frames-part1.html

DataFrames.jl release 0.21

DataFrames.jl version 0.21 was a major release that introduced a number of
significant changes to the DataFrames.jl API. The list is long, so
here I briefly summarize the most significant changes in terms of functionality:

  • columns can now be selected using strings (selection using Symbols is
    still allowed);
  • a completely new design of the API for working with columns of a data frame
    or grouped data frame, covered by the select/select!, transform/transform!,
    and combine functions; it is consistent (so you learn it once and reuse it
    everywhere), more flexible, and faster than the old one; in particular,
    two wrappers, ByRow and AsTable, have been added to the API;
  • major enhancements to push! and append!, which make it easy to ingest
    heterogeneous data (varying element types, varying column sets)
    into a data frame;
  • GroupedDataFrame now supports fast lookup by grouping columns (so creating
    a GroupedDataFrame can now be seen as adding an index to a data frame);
  • filter/filter! are now fast when used with the Pair interface;
  • rules for pseudo-broadcasting (spreading single observations across multiple
    rows) have been established and are consistently applied in all methods that
    allow this operation.
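
To make the list above more concrete, here is a minimal sketch of several of
these features on a toy data frame. The data and all variable names are made up
for illustration, and the push! call assumes that the cols keyword behaves as
described in the DataFrames.jl documentation; treat it as a quick orientation
rather than a complete tour of the API.

```julia
using DataFrames

df = DataFrame(id = [1, 2, 3, 4], grp = ["a", "b", "a", "b"],
               x = [1.0, 2.0, 3.0, 4.0])

# Strings and Symbols are interchangeable as column selectors.
by_string = select(df, "id", "x")
by_symbol = select(df, :id, :x)

# ByRow applies the function to each row instead of to the whole column.
doubled = transform(df, :x => ByRow(v -> 2v) => :x2)

# AsTable passes the selected columns to the function as a NamedTuple.
totals = transform(df, AsTable([:id, :x]) => ByRow(r -> r.id + r.x) => :total)

# Pseudo-broadcasting: the scalar returned by sum is spread across all rows.
with_sum = transform(df, :x => sum => :x_sum)

# The same source => function => target syntax works on a GroupedDataFrame.
gdf = groupby(df, :grp)
per_group = combine(gdf, :x => sum => :x_sum)

# A GroupedDataFrame can be indexed by the values of the grouping columns.
group_a = gdf[("a",)]

# filter with the Pair interface passes only the named column to the predicate.
large = filter(:x => v -> v > 2, df)

# push! with cols = :union accepts a row that has a new column and a wider
# element type (assumption: keyword semantics as in the package documentation).
small = DataFrame(a = [1, 2])
push!(small, (a = 3.5, b = "x"); cols = :union)
```

The functions without an exclamation mark return a new data frame, while
select!, transform!, filter!, push!, and append! modify their argument in place.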

All these changes combined mean that all operations on data frames can now be
expressed via function chaining (and you have full control over whether to
make copies or perform operations in-place). Many users like this style of
expressing data transformations. If you want to go this way, you should
probably consider learning one of the packages that make it easier to work
with the |> operator. There are many excellent alternatives in the Julia
ecosystem. Let me mention two: Pipe.jl (easier) and Underscores.jl (more
powerful, but harder to master).
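
As a rough illustration of the chaining style, the sketch below uses only the
|> operator from Base together with anonymous functions, so it needs no extra
package. The data frame and the names are hypothetical; packages such as
Pipe.jl or Underscores.jl mainly exist to remove the anonymous-function
boilerplate seen here.

```julia
using DataFrames

df = DataFrame(grp = ["a", "b", "a", "b"], x = [1, 2, 3, 4])

# Each step takes the result of the previous one as its argument.
# The parentheses keep every anonymous function a separate pipeline stage.
result = df |>
    (d -> filter(:x => v -> v > 1, d)) |>
    (d -> groupby(d, :grp)) |>
    (g -> combine(g, :x => sum => :x_sum))
```

Because filter and combine return new objects, df itself is left unchanged by
this pipeline.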

After the release I got several questions asking me to show how things work in
practice. Therefore, in this post I list the tutorials that are currently
available and have been updated to show how DataFrames.jl v0.21 works.

In the Part II post (that I plan to prepare next week) I will show some new
material that was prepared under DataFrames.jl v0.21.

Tutorials for release 0.21

There are four sources of information about the functionality of DataFrames.jl
0.21 that you can check out (and I maintain them so that they should be
up to date):

  1. The official DataFrames.jl Manual.
  2. A notebook-based DataFrames.jl Tutorial.
  3. Video materials at JuliaAcademy.
  4. The recently updated materials about DataFrames.jl that I presented
    during the JuliaCon 2019 workshop. There you will find two notebooks
    with worked examples of how to process real-life data sets.

I hope these materials will be useful for exploring the latest release
of DataFrames.jl!