Commit 272b320

feat: big time type handling refactor
1 parent 4ffe8b6 commit 272b320

30 files changed, +706 -943 lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Type: Package
 Package: epiprocess
 Title: Tools for basic signal processing in epidemiology
-Version: 0.7.12
+Version: 0.7.13
 Authors@R: c(
     person("Jacob", "Bien", role = "ctb"),
     person("Logan", "Brooks", email = "[email protected]", role = c("aut", "cre")),

NEWS.md

Lines changed: 9 additions & 0 deletions
@@ -43,6 +43,15 @@ Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.x.y will indicat
 - Added optional `decay_to_tibble` attribute controlling `as_tibble()` behavior
   of `epi_df`s to let `{epipredict}` work more easily with other libraries (#471).
 
+## Breaking Changes
+
+- `epix_slide` `before` argument now defaults to `Inf`, requires a `difftime`
+  that matches the type of the `version` column and the `time_step` argument has
+  been removed. Forbid `datetime` types in `version` column.
+- `epi_slide` `before` and `after` arguments now require a `difftime` type that
+  matches the type of the `time_value` column. The `time_step` argument has been
+  removed. Forbid `datetime` types in `time_value` column.
+
 # epiprocess 0.7.0
 
 ## Breaking changes:
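
The NEWS entries above say that `before` and `after` must now be a `difftime` matching the type of the time column. As a rough base-R illustration of what a day-valued window looks like for a `Date` column (this does not call epiprocess, and all variable names are made up for the example):

```r
# Base R only; this sketch does not call epiprocess.
# Hypothetical daily time values, stored as Date (not a datetime type).
time_values <- as.Date("2020-06-01") + 0:13

# A 7-day look-back window expressed as a difftime in days.
before <- as.difftime(7, units = "days")

# A reference time and the start of its window.
ref_time <- as.Date("2020-06-10")
window_start <- ref_time - before

# Time values that fall inside [ref_time - before, ref_time].
in_window <- time_values >= window_start & time_values <= ref_time
window_values <- time_values[in_window]
```

Subtracting a day-valued `difftime` from a `Date` yields another `Date`, which is presumably why the refactor ties the window type to the column type.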

R/archive.R

Lines changed: 11 additions & 22 deletions
@@ -181,15 +181,14 @@ NULL
 #' object:
 #'
 #' * `geo_type`: the type for the geo values.
-#' * `time_type`: the type for the time values.
 #' * `additional_metadata`: list of additional metadata for the data archive.
 #'
 #' Unlike an `epi_df` object, metadata for an `epi_archive` object `x` can be
-#' accessed (and altered) directly, as in `x$geo_type` or `x$time_type`,
-#' etc. Like an `epi_df` object, the `geo_type` and `time_type` fields in the
-#' metadata of an `epi_archive` object are not currently used by any
-#' downstream functions in the `epiprocess` package, and serve only as useful
-#' bits of information to convey about the data set at hand.
+#' accessed (and altered) directly, as in `x$geo_type` etc. Like an `epi_df`
+#' object, the `geo_type` field in the metadata of an `epi_archive` object are
+#' not currently used by any downstream functions in the `epiprocess` package,
+#' and serve only as useful bits of information to convey about the data set
+#' at hand.
 #'
 #' @section Generating Snapshots:
 #' An `epi_archive` object can be used to generate a snapshot of the data in
@@ -211,15 +210,12 @@ NULL
 #' @param geo_type Type for the geo values. If missing, then the function will
 #' attempt to infer it from the geo values present; if this fails, then it
 #' will be set to "custom".
-#' @param time_type Type for the time values. If missing, then the function will
-#' attempt to infer it from the time values present; if this fails, then it
-#' will be set to "custom".
 #' @param other_keys Character vector specifying the names of variables in `x`
 #' that should be considered key variables (in the language of `data.table`)
 #' apart from "geo_value", "time_value", and "version".
 #' @param additional_metadata List of additional metadata to attach to the
-#' `epi_archive` object. The metadata will have `geo_type` and `time_type`
-#' fields; named entries from the passed list or will be included as well.
+#' `epi_archive` object. The metadata will have the `geo_type` field; named
+#' entries from the passed list or will be included as well.
 #' @param compactify Optional; Boolean or `NULL`. `TRUE` will remove some
 #' redundant rows, `FALSE` will not, and missing or `NULL` will remove
 #' redundant rows, but issue a warning. See more information at `compactify`.
@@ -270,7 +266,6 @@ NULL
 #'
 #' toy_epi_archive <- tib %>% as_epi_archive(
 #'   geo_type = "state",
-#'   time_type = "day"
 #' )
 #' toy_epi_archive
 #'
@@ -296,14 +291,12 @@ NULL
 #'
 #' x <- df %>% as_epi_archive(
 #'   geo_type = "state",
-#'   time_type = "day",
 #'   other_keys = "county"
 #' )
 #'
 new_epi_archive <- function(
     x,
     geo_type = NULL,
-    time_type = NULL,
     other_keys = NULL,
     additional_metadata = NULL,
     compactify = NULL,
@@ -381,7 +374,6 @@ new_epi_archive <- function(
     list(
       DT = DT,
       geo_type = geo_type,
-      time_type = time_type,
       additional_metadata = additional_metadata,
       clobberable_versions_start = clobberable_versions_start,
       versions_end = versions_end
@@ -398,7 +390,6 @@ new_epi_archive <- function(
 validate_epi_archive <- function(
     x,
     geo_type = NULL,
-    time_type = NULL,
     other_keys = NULL,
     additional_metadata = NULL,
     compactify = NULL,
@@ -411,8 +402,8 @@ validate_epi_archive <- function(
   if (any(c("geo_value", "time_value", "version") %in% other_keys)) {
     cli_abort("`other_keys` cannot contain \"geo_value\", \"time_value\", or \"version\".")
   }
-  if (any(names(additional_metadata) %in% c("geo_type", "time_type"))) {
-    cli_warn("`additional_metadata` names overlap with existing metadata fields \"geo_type\", \"time_type\".")
+  if (any(names(additional_metadata) %in% c("geo_type"))) {
+    cli_warn("`additional_metadata` names overlap with existing metadata fields \"geo_type\".")
   }
 
   # Conduct checks and apply defaults for `compactify`
@@ -448,7 +439,6 @@ validate_epi_archive <- function(
 as_epi_archive <- function(
     x,
     geo_type = NULL,
-    time_type = NULL,
     other_keys = NULL,
     additional_metadata = NULL,
     compactify = NULL,
@@ -465,18 +455,17 @@ as_epi_archive <- function(
   }
 
   geo_type <- geo_type %||% guess_geo_type(x$geo_value)
-  time_type <- time_type %||% guess_time_type(x$time_value)
   other_keys <- other_keys %||% character(0L)
   additional_metadata <- additional_metadata %||% list()
   clobberable_versions_start <- clobberable_versions_start %||% NA
   versions_end <- versions_end %||% max_version_with_row_in(x)
 
   validate_epi_archive(
-    x, geo_type, time_type, other_keys, additional_metadata,
+    x, geo_type, other_keys, additional_metadata,
     compactify, clobberable_versions_start, versions_end
   )
   new_epi_archive(
-    x, geo_type, time_type, other_keys, additional_metadata,
+    x, geo_type, other_keys, additional_metadata,
     compactify, clobberable_versions_start, versions_end
   )
 }
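
The `as_epi_archive()` hunk above fills in defaults with `%||%`, so an argument is only replaced when it is `NULL`. A minimal sketch of that pattern follows; `fill_defaults_sketch()` is a hypothetical helper and not part of epiprocess, and `%||%` is redefined locally so the block runs on its own:

```r
# Local stand-in for the null-default operator (rlang exports `%||%`, and
# base R ships one from 4.4.0); defined here so the sketch is self-contained.
`%||%` <- function(lhs, rhs) if (is.null(lhs)) rhs else lhs

# Hypothetical constructor fragment: fill defaults only for NULL arguments.
fill_defaults_sketch <- function(other_keys = NULL, additional_metadata = NULL) {
  other_keys <- other_keys %||% character(0L)
  additional_metadata <- additional_metadata %||% list()
  list(other_keys = other_keys, additional_metadata = additional_metadata)
}

defaults <- fill_defaults_sketch()
custom <- fill_defaults_sketch(other_keys = "county")
```

With `time_type` gone from the signature, there is one fewer default to resolve this way before validation.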

R/epi_df.R

Lines changed: 14 additions & 31 deletions
@@ -15,16 +15,14 @@
 #' the following fields:
 #'
 #' * `geo_type`: the type for the geo values.
-#' * `time_type`: the type for the time values.
 #' * `as_of`: the time value at which the given data were available.
 #'
 #' Metadata for an `epi_df` object `x` can be accessed (and altered) via
-#' `attributes(x)$metadata`. The first two fields in the above list,
-#' `geo_type` and `time_type`, can usually be inferred from the `geo_value`
-#' and `time_value` columns, respectively. They are not currently used by any
-#' downstream functions in the `epiprocess` package, and serve only as useful
-#' bits of information to convey about the data set at hand. More information
-#' on their coding is given below.
+#' `attributes(x)$metadata`. The first field in the above list, `geo_type`,
+#' can usually be inferred from the `geo_value` columns. They are not
+#' currently used by any downstream functions in the `epiprocess` package,
+#' and serve only as useful bits of information to convey about the data set
+#' at hand. More information on their coding is given below.
 #'
 #' The last field in the above list, `as_of`, is one of the most unique aspects
 #' of an `epi_df` object. In brief, we can think of an `epi_df` object as a
@@ -88,13 +86,13 @@ NULL
 #' Creates an `epi_df` object
 #'
 #' Creates a new `epi_df` object. By default, builds an empty tibble with the
-#' correct metadata for an `epi_df` object (ie. `geo_type`, `time_type`, and `as_of`).
+#' correct metadata for an `epi_df` object (ie. `geo_type` and `as_of`).
 #' Refer to the below info. about the arguments for more details.
 #'
 #' @template epi_df-params
 #'
 #' @export
-new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of,
+new_epi_df <- function(x = tibble::tibble(), geo_type, as_of,
                        additional_metadata = list(), ...) {
   assert_data_frame(x)
   assert_list(additional_metadata)
@@ -106,11 +104,6 @@ new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of,
     geo_type <- guess_geo_type(x$geo_value)
   }
 
-  # If time type is missing, then try to guess it
-  if (missing(time_type)) {
-    time_type <- guess_time_type(x$time_value)
-  }
-
   # If as_of is missing, then try to guess it
   if (missing(as_of)) {
     # First check the metadata for an as_of field
@@ -135,7 +128,6 @@ new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of,
   # Define metadata fields
   metadata <- list()
   metadata$geo_type <- geo_type
-  metadata$time_type <- time_type
   metadata$as_of <- as_of
   metadata <- c(metadata, additional_metadata)
 
@@ -185,7 +177,7 @@ new_epi_df <- function(x = tibble::tibble(), geo_type, time_type, as_of,
 #'
 #' # The `other_keys` metadata (`"county_code"` in this case) is automatically
 #' # inferred from the `tsibble`'s `key`:
-#' ex1 <- as_epi_df(x = ex1_input, geo_type = "state", time_type = "day", as_of = "2020-06-03")
+#' ex1 <- as_epi_df(x = ex1_input, geo_type = "state", as_of = "2020-06-03")
 #' attr(ex1, "metadata")[["other_keys"]]
 #'
 #'
@@ -257,29 +249,23 @@ as_epi_df.epi_df <- function(x, ...) {
 #' be used.
 #' @importFrom rlang .data
 #' @export
-as_epi_df.tbl_df <- function(x, geo_type, time_type, as_of,
+as_epi_df.tbl_df <- function(x, geo_type, as_of,
                              additional_metadata = list(), ...) {
   if (!test_subset(c("geo_value", "time_value"), names(x))) {
     cli_abort(
       "Columns `geo_value` and `time_value` must be present in `x`."
     )
   }
 
-  new_epi_df(
-    x, geo_type, time_type, as_of,
-    additional_metadata, ...
-  )
+  new_epi_df(x, geo_type, as_of, additional_metadata, ...)
 }
 
 #' @method as_epi_df data.frame
 #' @describeIn as_epi_df Works analogously to `as_epi_df.tbl_df()`.
 #' @export
-as_epi_df.data.frame <- function(x, geo_type, time_type, as_of,
+as_epi_df.data.frame <- function(x, geo_type, as_of,
                                  additional_metadata = list(), ...) {
-  as_epi_df.tbl_df(
-    tibble::as_tibble(x), geo_type, time_type, as_of,
-    additional_metadata, ...
-  )
+  as_epi_df.tbl_df(tibble::as_tibble(x), geo_type, as_of, additional_metadata, ...)
 }
 
 #' @method as_epi_df tbl_ts
@@ -288,18 +274,15 @@ as_epi_df.data.frame <- function(x, geo_type, time_type, as_of,
 #' "geo_value") are added to the metadata of the returned object, under the
 #' `other_keys` field.
 #' @export
-as_epi_df.tbl_ts <- function(x, geo_type, time_type, as_of,
+as_epi_df.tbl_ts <- function(x, geo_type, as_of,
                              additional_metadata = list(), ...) {
   tsibble_other_keys <- setdiff(tsibble::key_vars(x), "geo_value")
   if (length(tsibble_other_keys) != 0) {
     additional_metadata$other_keys <- unique(
       c(additional_metadata$other_keys, tsibble_other_keys)
     )
   }
-  as_epi_df.tbl_df(
-    tibble::as_tibble(x), geo_type, time_type, as_of,
-    additional_metadata, ...
-  )
+  as_epi_df.tbl_df(tibble::as_tibble(x), geo_type, as_of, additional_metadata, ...)
 }
 
 #' Test for `epi_df` format

R/grouped_epi_archive.R

Lines changed: 3 additions & 3 deletions
@@ -252,10 +252,10 @@ epix_slide.grouped_epi_archive <- function(
     ref_time_values <- sort(ref_time_values)
   }
 
-  # Validate and pre-process `before`:
-  if (!(identical(before, Inf) || test_int(before, lower = 0L))) {
-    cli_abort("`before` must be a non-negative integer or Inf.")
+  if (!checkmate::test_scalar(before)) {
+    cli_abort("`before` is a required scalar value.")
   }
+  validate_slide_window_arg(before, guess_time_type(x$private$ungrouped$DT$version))
 
   # Symbolize column name
   new_col <- sym(new_col_name)
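
The hunk above replaces the integer check on `before` with a scalar check followed by `validate_slide_window_arg()`, whose body is not shown in this section. The sketch below is only a guess at the kind of rule such a validator might enforce; `check_window_arg_sketch()` and its accepted time types are assumptions for illustration, written in base R:

```r
# Hypothetical stand-in for the validator called in the diff; the real
# `validate_slide_window_arg()` is not shown in this section, so the rules
# below are assumptions made for illustration only.
check_window_arg_sketch <- function(arg, time_type) {
  # Roughly mirror the scalar check added in the diff, using base R only.
  if (length(arg) != 1L || is.na(arg)) {
    stop("`before` is a required scalar value.")
  }
  # An unbounded window is always acceptable in this sketch.
  if (identical(arg, Inf)) {
    return(invisible(TRUE))
  }
  ok <- switch(time_type,
    day = inherits(arg, "difftime") && units(arg) == "days" &&
      as.numeric(arg) >= 0,
    integer = is.numeric(arg) && !inherits(arg, "difftime") &&
      arg >= 0 && arg == round(arg),
    FALSE # any other time type is rejected here
  )
  if (!ok) {
    stop("`before` does not match time type \"", time_type, "\".")
  }
  invisible(TRUE)
}
```

Under these assumed rules, `check_window_arg_sketch(as.difftime(7, units = "days"), "day")` passes, while a bare `7` with a `"day"` time type is rejected.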

R/methods-epi_archive.R

Lines changed: 1 addition & 3 deletions
@@ -112,7 +112,6 @@ epix_as_of <- function(x, max_version, min_time_value = -Inf, all_versions = FAL
     dplyr::select(-"version") %>%
     as_epi_df(
       geo_type = x$geo_type,
-      time_type = x$time_type,
       as_of = max_version,
       additional_metadata = c(
         x$additional_metadata,
@@ -270,7 +269,7 @@ epix_merge <- function(x, y,
     cli_abort("`x` and `y` must have the same `$geo_type`")
   }
 
-  if (!identical(x$time_type, y$time_type)) {
+  if (!identical(guess_time_type(x$DT$version), guess_time_type(y$DT$version))) {
     cli_abort("`x` and `y` must have the same `$time_type`")
   }
 
@@ -451,7 +450,6 @@ epix_merge <- function(x, y,
   return(as_epi_archive(
     result_dt[], # clear data.table internal invisibility flag if set
     geo_type = x$geo_type,
-    time_type = x$time_type,
     other_keys = setdiff(key(result_dt), c("geo_value", "time_value", "version")),
     additional_metadata = result_additional_metadata,
     # It'd probably be better to pre-compactify before the merge, and might be
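
In the `epix_merge` hunk above, compatibility is now decided by inferring a time type from each archive's `version` column rather than reading a stored `$time_type` field (the abort message still names `$time_type`). A simplified base-R sketch of that idea follows; `guess_time_type_sketch()` is a hypothetical classifier, and the package's own `guess_time_type()` is not shown here and may recognise more types:

```r
# Hypothetical, simplified classifier; the package's `guess_time_type()` is
# not shown in this section and may recognise more types than this.
guess_time_type_sketch <- function(values) {
  if (inherits(values, "Date")) {
    "day"
  } else if (is.numeric(values) && all(values == round(values), na.rm = TRUE)) {
    "integer"
  } else {
    "custom"
  }
}

# Three toy "archives", reduced to plain lists that hold only a version column.
x <- list(version = as.Date("2020-06-01") + 0:2)
y <- list(version = as.Date("2020-07-01") + 0:2)
z <- list(version = 1:3)

# Compatibility is decided from the data itself, not from a stored field.
same_xy <- identical(guess_time_type_sketch(x$version), guess_time_type_sketch(y$version))
same_xz <- identical(guess_time_type_sketch(x$version), guess_time_type_sketch(z$version))
```

Deriving the type on demand means it cannot drift out of sync with the column, at the cost of recomputing it at each call site.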

R/methods-epi_df.R

Lines changed: 8 additions & 11 deletions
@@ -58,7 +58,6 @@ print.epi_df <- function(x, ...) {
     prettyNum(ncol(x), ","), "with metadata:\n"
   )
   cat(sprintf("* %-9s = %s\n", "geo_type", attributes(x)$metadata$geo_type))
-  cat(sprintf("* %-9s = %s\n", "time_type", attributes(x)$metadata$time_type))
   cat(sprintf("* %-9s = %s\n", "as_of", attributes(x)$metadata$as_of))
   # Conditional output (silent if attribute is NULL):
   cat(sprintf("* %-9s = %s\n", "decay_to_tibble", attr(x, "decay_to_tibble")))
@@ -83,7 +82,6 @@ print.epi_df <- function(x, ...) {
 summary.epi_df <- function(object, ...) {
   cat("An `epi_df` x, with metadata:\n")
   cat(sprintf("* %-9s = %s\n", "geo_type", attributes(object)$metadata$geo_type))
-  cat(sprintf("* %-9s = %s\n", "time_type", attributes(object)$metadata$time_type))
   cat(sprintf("* %-9s = %s\n", "as_of", attributes(object)$metadata$as_of))
   cat("----------\n")
   cat(sprintf("* %-27s = %s\n", "min time value", min(object$time_value)))
@@ -118,15 +116,14 @@ decay_epi_df <- function(x) {
 }
 
 # Implementing `dplyr_extending`: we have a few metadata attributes to consider:
-# `as_of` is an attribute doesn't depend on the rows or columns, `geo_type` and
-# `time_type` are scalar attributes dependent on columns, and `other_keys` acts
-# like an attribute vectorized over columns; `dplyr_extending` advice at time of
-# writing says to implement `dplyr_reconstruct`, 1d `[`, `dplyr_col_modify`, and
-# `names<-`, but not `dplyr_row_slice`; however, we'll also implement
-# `dplyr_row_slice` anyway to prevent a `arrange` on grouped `epi_df`s from
-# dropping the `epi_df` class. We'll implement `[` to allow either 1d or 2d.
-# We'll also implement some other methods where we want to (try to) maintain an
-# `epi_df`.
+# `as_of` is an attribute doesn't depend on the rows or columns, `geo_type` is a
+# scalar attribute dependent on columns, and `other_keys` acts like an attribute
+# vectorized over columns; `dplyr_extending` advice at time of writing says to
+# implement `dplyr_reconstruct`, 1d `[`, `dplyr_col_modify`, and `names<-`, but
+# not `dplyr_row_slice`; however, we'll also implement `dplyr_row_slice` anyway
+# to prevent a `arrange` on grouped `epi_df`s from dropping the `epi_df` class.
+# We'll implement `[` to allow either 1d or 2d. We'll also implement some other
+# methods where we want to (try to) maintain an `epi_df`.
 
 #' @param data tibble or `epi_df` (`dplyr` feeds in former, but we may
 #' directly feed in latter from our other methods)
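
The print and summary hunks in this file rely on two `sprintf()` behaviours: `%-9s` left-justifies the field name in a 9-character column, and a `NULL` value produces a zero-length result, which is what makes the `decay_to_tibble` line "silent if attribute is NULL". A small base-R sketch with example values chosen for illustration:

```r
# Left-justify the field name in a 9-character column, as in the print method.
line_geo <- sprintf("* %-9s = %s\n", "geo_type", "state")
line_as_of <- sprintf("* %-9s = %s\n", "as_of", "2020-06-03")

# With a NULL value, sprintf() returns a zero-length character vector,
# so cat() prints nothing for that line.
line_null <- sprintf("* %-9s = %s\n", "decay_to_tibble", NULL)

cat(line_geo, line_as_of, line_null, sep = "")
```

Only the two populated lines are printed; no placeholder appears for the `NULL` attribute.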
