Grouped Sequences in dplyr Part 2

I just wrote a post about grouped sequences in dplyr and following that, I’ve been made aware of another couple of solutions to this problem (credit John Mackintosh). The solution involves using the consecutive_id() function, available in dplyr since v1.1.0. In the help page for this function, it’s mentioned that it was inspired by rleid() function from the data.table package. These functions work similarly to the rle() function I used last time (in what I called ‘the complicated solution’) but provide neater outputs. I’ll try them both out here.

library(tidyverse)

sports_df <- tibble::tribble(
  ~team,    ~date,        ~home_or_away, ~num_matches_away,
  "Team A", "07/10/2022", "H",           0L,
  "Team A", "14/10/2022", "A",           1L,
  "Team A", "21/10/2022", "A",           2L,
  "Team A", "28/10/2022", "H",           0L,
  "Team A", "04/11/2022", "A",           1L,
  "Team A", "11/11/2022", "A",           2L,
  "Team A", "18/11/2022", "A",           3L,
  "Team A", "25/11/2022", "H",           0L,
  "Team A", "02/12/2022", "H",           0L,
  "Team A", "09/12/2022", "H",           0L,
  "Team B", "07/10/2022", "H",           0L,
  "Team B", "14/10/2022", "A",           1L,
  "Team B", "21/10/2022", "A",           2L,
  "Team B", "28/10/2022", "A",           3L,
  "Team B", "04/11/2022", "H",           0L,
  "Team B", "11/11/2022", "A",           1L,
  "Team B", "18/11/2022", "A",           2L,
  "Team B", "25/11/2022", "H",           0L,
  "Team B", "02/12/2022", "H",           0L,
  "Team B", "09/12/2022", "A",           1L
) %>%      
  mutate(date = as.Date(date, "%d/%m/%Y")) %>% 
  arrange(team, date)

dplyr::consecutive_id()

sports_df2 <- sports_df %>% 
  group_by(team) %>% 
  mutate(consec_id = consecutive_id(home_or_away)) %>% 
  group_by(team, consec_id) %>% 
  mutate(days_on_road = 1:n(),
         days_on_road = if_else(home_or_away == "H", 0L, days_on_road)) %>% 
  ungroup()

sports_df2
## # A tibble: 20 × 6
##    team   date       home_or_away num_matches_away consec_id days_on_road
##    <chr>  <date>     <chr>                   <int>     <int>        <int>
##  1 Team A 2022-10-07 H                           0         1            0
##  2 Team A 2022-10-14 A                           1         2            1
##  3 Team A 2022-10-21 A                           2         2            2
##  4 Team A 2022-10-28 H                           0         3            0
##  5 Team A 2022-11-04 A                           1         4            1
##  6 Team A 2022-11-11 A                           2         4            2
##  7 Team A 2022-11-18 A                           3         4            3
##  8 Team A 2022-11-25 H                           0         5            0
##  9 Team A 2022-12-02 H                           0         5            0
## 10 Team A 2022-12-09 H                           0         5            0
## 11 Team B 2022-10-07 H                           0         1            0
## 12 Team B 2022-10-14 A                           1         2            1
## 13 Team B 2022-10-21 A                           2         2            2
## 14 Team B 2022-10-28 A                           3         2            3
## 15 Team B 2022-11-04 H                           0         3            0
## 16 Team B 2022-11-11 A                           1         4            1
## 17 Team B 2022-11-18 A                           2         4            2
## 18 Team B 2022-11-25 H                           0         5            0
## 19 Team B 2022-12-02 H                           0         5            0
## 20 Team B 2022-12-09 A                           1         6            1
identical(sports_df2$days_on_road, sports_df$num_matches_away)
## [1] TRUE

data.table::rleid()

sports_df3 <- sports_df %>% 
  group_by(team) %>% 
  mutate(rleid_id = data.table::rleid(home_or_away)) %>% 
  group_by(team, rleid_id) %>% 
  mutate(days_on_road = 1:n(),
         days_on_road = if_else(home_or_away == "H", 0L, days_on_road)) %>% 
  ungroup()

sports_df3
## # A tibble: 20 × 6
##    team   date       home_or_away num_matches_away rleid_id days_on_road
##    <chr>  <date>     <chr>                   <int>    <int>        <int>
##  1 Team A 2022-10-07 H                           0        1            0
##  2 Team A 2022-10-14 A                           1        2            1
##  3 Team A 2022-10-21 A                           2        2            2
##  4 Team A 2022-10-28 H                           0        3            0
##  5 Team A 2022-11-04 A                           1        4            1
##  6 Team A 2022-11-11 A                           2        4            2
##  7 Team A 2022-11-18 A                           3        4            3
##  8 Team A 2022-11-25 H                           0        5            0
##  9 Team A 2022-12-02 H                           0        5            0
## 10 Team A 2022-12-09 H                           0        5            0
## 11 Team B 2022-10-07 H                           0        1            0
## 12 Team B 2022-10-14 A                           1        2            1
## 13 Team B 2022-10-21 A                           2        2            2
## 14 Team B 2022-10-28 A                           3        2            3
## 15 Team B 2022-11-04 H                           0        3            0
## 16 Team B 2022-11-11 A                           1        4            1
## 17 Team B 2022-11-18 A                           2        4            2
## 18 Team B 2022-11-25 H                           0        5            0
## 19 Team B 2022-12-02 H                           0        5            0
## 20 Team B 2022-12-09 A                           1        6            1
identical(sports_df3$days_on_road, sports_df$num_matches_away)
## [1] TRUE

We now have 4 solutions to this problem - it’s often the case that there are many ways to do the same thing in R! It can be extra effort to remember different solutions but they work in different ways and can be applied to different problems so I do think it’s worthwhile trying these out. Which solution is your favourite?

Avatar
Alan Yeung
Research Fellow and Healthcare Scientist

Applied statistician, currently working mainly on blood borne viruses and drugs. Supporter of all things R.

Related