5.3 Column Transformation

Whereas the @select macro variants performs column selection, the @transform macro variants do not perform any column selection. It can either overwrite existent columns or create new columns that will be added to the right of our DataFrame.

For example, the previous operation on :grade can be invoked as a transformation with:

@rtransform df :grade_100 = :grade * 10
name grade grade_100
Sally 1.0 10.0
Bob 5.0 50.0
Alice 8.5 85.0
Hank 4.0 40.0
Bob 9.5 95.0
Sally 9.5 95.0
Hank 6.0 60.0

As you can see, @transform does not perform column selection, and the :grade_100 column is created as a new column and added to the right of our DataFrame.

DataFramesMeta.jl macros also support begin ... end statements. For example, suppose that you are creating two columns in a @transform macro:

@rtransform df :grade_100 = :grade * 10 :grade_5 = :grade / 2
name grade grade_100 grade_5
Sally 1.0 10.0 0.5
Bob 5.0 50.0 2.5
Alice 8.5 85.0 4.25
Hank 4.0 40.0 2.0
Bob 9.5 95.0 4.75
Sally 9.5 95.0 4.75
Hank 6.0 60.0 3.0

It can be cumbersome and difficult to read the performed transformations. To facilitate that, we can use begin ... end statements and put one transformation per line:

@rtransform df begin
    :grade_100 = :grade * 10
    :grade_5 = :grade / 2
end
name grade grade_100 grade_5
Sally 1.0 10.0 0.5
Bob 5.0 50.0 2.5
Alice 8.5 85.0 4.25
Hank 4.0 40.0 2.0
Bob 9.5 95.0 4.75
Sally 9.5 95.0 4.75
Hank 6.0 60.0 3.0

We can also use other columns in our transformations, which makes DataFramesMeta.jl more appealing than DataFrames.jl due to the easier syntax.

First, let’s revisit the leftjoined DataFrame from Chapter 4:

leftjoined = leftjoin(grades_2020(), grades_2021(); on=:name)
name grade_2020 grade_2021
Sally 1.0 9.5
Hank 4.0 6.0
Bob 5.0 missing
Alice 8.5 missing

Additionally, we’ll replace the missing values with 5 (Section 4.9, also note the ! in in-place variant @rtransform!):

@rtransform! leftjoined :grade_2021 = coalesce(:grade_2021, 5)
name grade_2020 grade_2021
Sally 1.0 9.5
Hank 4.0 6.0
Bob 5.0 5
Alice 8.5 5

This is how you calculate the mean of grades in both years using DataFramesMeta.jl:

@rtransform leftjoined :mean_grades = (:grade_2020 + :grade_2021) / 2
name grade_2020 grade_2021 mean_grades
Sally 1.0 9.5 5.25
Hank 4.0 6.0 5.0
Bob 5.0 5 5.0
Alice 8.5 5 6.75

This is how you would perform it in DataFrames.jl:

transform(leftjoined, [:grade_2020, :grade_2021] => ByRow((x, y) -> (x + y) / 2) => :mean_grades)
name grade_2020 grade_2021 mean_grades
Sally 1.0 9.5 5.25
Hank 4.0 6.0 5.0
Bob 5.0 5 5.0
Alice 8.5 5 6.75

As you can see, the case for easier syntax is not hard to argue for DataFramesMeta.jl.



Support this project
CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso