Group by and Mutate Fun

I wanted to write this post simply because I use this trick a lot and it’s not super complicated but for new programmers, this is often one of the first challenging tasks that can be made simple with dplyr and data.table. As someone who uses both packages, this post will give examples of how to use group by and mutate together in both packages.

In this post, we’re going to use the following data and focus on ranking the ages of each student based on their location.

data<-data.table(Location=c("Los Angeles","San Francisco","Los Angeles","New York","New York","New York","Los Angeles","New York","New York","Los Angeles"),
                 Age=c(13,24,13,24,25,18,17,4,29,13))


print(data)
##          Location Age
##  1:   Los Angeles  13
##  2: San Francisco  24
##  3:   Los Angeles  13
##  4:      New York  24
##  5:      New York  25
##  6:      New York  18
##  7:   Los Angeles  17
##  8:      New York   4
##  9:      New York  29
## 10:   Los Angeles  13

So first, let’s do this in dplyr:

data %>% 
  as.data.frame() %>% 
  group_by(Location) %>% 
  mutate(Location_Rank = 1:n()) %>% 
  arrange(Location, Location_Rank)
## # A tibble: 10 x 3
## # Groups:   Location [3]
##    Location        Age Location_Rank
##    <chr>         <dbl>         <int>
##  1 Los Angeles      13             1
##  2 Los Angeles      13             2
##  3 Los Angeles      17             3
##  4 Los Angeles      13             4
##  5 New York         24             1
##  6 New York         25             2
##  7 New York         18             3
##  8 New York          4             4
##  9 New York         29             5
## 10 San Francisco    24             1

This isn’t revolutionary but for the first couple weeks of consistently using the group by and summarize functions together to analysis/play around with data, it seemed crazy that you could use other functions with group by. So when I discovered that you could use mutate with group by it was really a gamer changer.

As is common in my R blog posts, you can also do this in data.table!!!. For example:

data <- as.data.table(data)

data[, Location_Rank := seq_len(.N), by = Location]
data[, Location_Rank := rowid(Location)]
print(data)
##          Location Age Location_Rank
##  1:   Los Angeles  13             1
##  2: San Francisco  24             1
##  3:   Los Angeles  13             2
##  4:      New York  24             1
##  5:      New York  25             2
##  6:      New York  18             3
##  7:   Los Angeles  17             3
##  8:      New York   4             4
##  9:      New York  29             5
## 10:   Los Angeles  13             4

Other useful functions to use alongside group by and mutate are lag, lead and cumsum.

Next
Previous