data manipulation

Trying out timeplyr

The timeplyr R package, created by my colleague Nick, was accepted on CRAN in October 2023. According to its CRAN page, it provides "a set of fast tidy functions for wrangling, completing and summarising date and date-time data". It looks like a really neat package for working with time series data in a way that is consistent with what people have become used to in the tidyverse. From my chats with Nick, I believe some of the ideas for this package were inspired by problems that came up repeatedly while working with COVID-19 data.

Grouped Sequences in dplyr Part 2

I just wrote a post about grouped sequences in dplyr and, following that, I’ve been made aware of another couple of solutions to this problem (credit John Mackintosh). The first involves the consecutive_id() function, available in dplyr since v1.1.0. The help page for this function mentions that it was inspired by the rleid() function from the data.table package. Both functions work similarly to the rle() function I used last time (in what I called ‘the complicated solution’) but provide neater outputs.
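To make the idea concrete, here is a minimal sketch of how consecutive_id() could be used to count away matches in a row. The matches data frame and its team and venue columns are hypothetical dummy data invented for illustration, not the data from the original post, and the approach shown is one plausible way to use the function rather than the exact solution described there.

```r
library(dplyr)

# Hypothetical dummy data: one team's fixtures in date order
matches <- tibble(
  team  = "A",
  venue = c("home", "away", "away", "home", "away", "away", "away")
)

result <- matches %>%
  group_by(team) %>%
  # consecutive_id() gives a new id every time venue changes
  mutate(run_id = consecutive_id(venue)) %>%
  # count rows within each run, keeping the count only for away runs
  group_by(team, run_id) %>%
  mutate(num_matches_away = if_else(venue == "away", row_number(), 0L)) %>%
  ungroup()

result
```

Grouping by the run id means row_number() restarts at 1 whenever the venue switches, which is what produces the grouped sequence.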

Grouped Sequences in dplyr

For a piece of work I had to calculate the number of matches that a team plays away from home in a row, which we will call days_on_the_road. I was not sure how to do this with dplyr but it’s basically a ‘grouped sequence’. For this post, I’ve created some dummy data to illustrate this idea. The num_matches_away variable is what we want to mimic using some data manipulation.
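As a rough illustration of what a grouped sequence looks like, the sketch below builds the counter with base R's rle() and sequence() inside a grouped mutate(). The matches data frame, its columns, and the intermediate run_position column are assumptions made up for this example; the dummy data and solution in the full post may differ.

```r
library(dplyr)

# Hypothetical dummy data: two teams, fixtures in date order
matches <- tibble(
  team  = rep(c("A", "B"), each = 5),
  venue = c("away", "away", "home", "away", "home",
            "home", "away", "away", "away", "home")
)

result <- matches %>%
  group_by(team) %>%
  mutate(
    # rle() finds the length of each run of identical venues;
    # sequence() turns those lengths into 1, 2, ... within each run
    run_position = as.integer(sequence(rle(venue)$lengths)),
    # keep the running count only while the team is away
    num_matches_away = if_else(venue == "away", run_position, 0L)
  ) %>%
  ungroup()

result
```

The counter resets both when a team returns home and when the data moves on to the next team, because the run lengths are computed separately within each group.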

A couple of case_when() tricks

Combining case_when() and across()

If you want to apply case_when() across several variables with across(), here is an example that does this with the help of the get() and cur_column() functions.

    library(tidyverse)

    # Add a flag column for each Petal measurement
    iris_df <- as_tibble(iris) %>%
      mutate(flag_Petal.Length = as.integer(Petal.Length > 1.5),
             flag_Petal.Width = as.integer(Petal.Width > 0.2))

    # For each Petal column, look up its matching flag column by name
    # and set the value to NA where the flag is 1
    iris_df %>%
      mutate(across(c(Petal.Length, Petal.Width), ~case_when(
        get(glue::glue("flag_{cur_column()}")) == 1 ~ NA_real_,
        TRUE ~ .x
      ))) %>%
      select(contains("Petal"))
    ## # A tibble: 150 × 4
    ## Petal.