Re-posted from: https://bkamins.github.io/julialang/2020/05/25/data-frames-part1.html
DataFrames.jl release 0.21
DataFrames.jl version 0.21 was a major release that introduced a number of
significant changes to the DataFrames.jl API. The list is long, so
I briefly summarize here the most significant changes in terms of functionality:
- columns can now be selected using strings (selection using Symbols is
still allowed);
- a completely new design of the API for working with columns of a data frame or
grouped data frame, covered by the select/select!, transform/transform!,
and combine functions; it is consistent (so you learn it once and reuse it
everywhere), more flexible, and faster than the old one;
in particular, two wrappers, ByRow and AsTable, have been added to the API;
- major enhancements to push! and append!, which make it easy to
ingest heterogeneous data (varying element types, varying column sets)
into a data frame;
- GroupedDataFrame now supports fast lookup by grouping columns (so creating
a GroupedDataFrame can now be seen as adding an index to a data frame);
- filter/filter! are now fast when using the Pair interface;
- rules for pseudo-broadcasting (spreading single observations across multiple
rows) have been established and are consistently applied in all methods that
allow this operation.
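To make the list above more concrete, here is a minimal sketch that touches each item once. The data frame, its column names, and the variable names are hypothetical and chosen only for illustration; the calls are my reading of the DataFrames.jl 0.21 API, so check the manual for the exact behavior in your version.

```julia
using DataFrames

# Hypothetical toy data, used only for illustration.
df = DataFrame(id = [1, 2, 3, 4], grp = ["a", "b", "a", "b"], x = [10, 20, 30, 40])

# Strings and Symbols are interchangeable as column selectors.
same_column = df[!, "x"] === df[!, :x]

# select keeps only the requested columns; ByRow applies a function row by row.
selected = select(df, "id", "x" => ByRow(sqrt) => "sqrt_x")

# AsTable passes the chosen columns as a NamedTuple (one per row under ByRow).
with_total = transform(df, AsTable([:id, :x]) => ByRow(r -> r.id + r.x) => :total)

# Pseudo-broadcasting: the scalar sum of x is spread across all rows.
with_sum = transform(df, :x => sum => :x_sum)

# combine aggregates per group; a GroupedDataFrame also supports key lookup.
gdf = groupby(df, :grp)
per_group = combine(gdf, :x => sum => :x_sum)
group_a = gdf[("a",)]

# filter using the Pair interface (column => predicate).
large = filter(:x => x -> x > 15, df)

# push! with cols = :union adds missing columns and promotes element types.
records = DataFrame(a = [1, 2])
push!(records, (a = 3.5, b = "new"); cols = :union)
```

None of the calls on df mutate it; only push! changes records in place. The functions ending in ! (select!, transform!, filter!) are the in-place counterparts.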
All these changes combined mean that all operations on data frames can now be
expressed via function chaining (and you have full control over whether to
make copies or perform operations in place). Many users like this
style of expressing data transformations. If you want to go this way,
then you should probably consider learning one of the packages that make
it easier to work with the |> operator. There are many excellent alternatives
in the Julia ecosystem. Let me mention two: Pipe.jl (easier) and
Underscores.jl (more powerful, but harder to master).
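As a rough illustration of the chaining style, the sketch below uses only the |> operator from Base Julia with anonymous functions. The toy data and variable names are hypothetical, and packages such as Pipe.jl or Underscores.jl would let you write the same chain more compactly (their syntax is not shown here).

```julia
using DataFrames

# Hypothetical toy data, used only for illustration.
df = DataFrame(grp = ["a", "b", "a", "b"], x = [1, 2, 3, 4])

# Each step receives the result of the previous one; df itself is not modified.
# The parentheses keep each anonymous function a separate stage of the chain.
result = df |>
    (d -> filter(:x => x -> x > 1, d)) |>
    (d -> groupby(d, :grp)) |>
    (g -> combine(g, :x => sum => :x_sum))

rows_before = nrow(df)

# The ! variant performs the same filtering in place, without making a copy.
filter!(:x => x -> x > 1, df)
rows_after = nrow(df)
```

Choosing between filter and filter! (and likewise select/select!, transform/transform!) is how you decide whether a step copies or mutates.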
After the release I got several questions about showing how things work in
practice. Therefore in this post I list tutorials that are currently available
and have been updated to show how DataFrames.jl v0.21 works.
In the Part II post (that I plan to prepare next week) I will show some new
material that was prepared under DataFrames.jl v0.21.
Tutorials for release 0.21
There are four sources of information about the functionality of DataFrames.jl
0.21 that you can check out (and I maintain them so that they should be
up to date):
- The official DataFrames.jl Manual.
- A notebook-based DataFrames.jl Tutorial.
- Video materials at JuliaAcademy.
- The recently updated materials about DataFrames.jl that I presented
during the JuliaCon 2019 workshop. There you will find
two notebooks with worked examples of how to process
real-life data sets.
I hope these materials will be useful for exploring the latest release
of DataFrames.jl!