Skip to content

fit.epi_workflow silently drops epi_workflow class when built or fit on a non-epi_df or non-epi_recipe in most cases #363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brookslogan opened this issue Jul 18, 2024 · 2 comments · Fixed by #377

Comments

@brookslogan
Copy link
Contributor

I assume this will cause issues down the line when users are unexpectedly being missing out on slathering, or even earlier in the operation any step_epi*.

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(epipredict)
#> Loading required package: epiprocess
#> Registered S3 method overwritten by 'tsibble':
#>   method               from 
#>   as_tibble.grouped_df dplyr
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> Loading required package: parsnip
#> Registered S3 method overwritten by 'epipredict':
#>   method            from   
#>   print.step_naomit recipes
library(testthat)
#> 
#> Attaching package: 'testthat'
#> The following object is masked from 'package:dplyr':
#> 
#>     matches

tbl <- tibble::tibble(
  geo_value = 1,
  time_value = 1:100,
  x = 1:100,
  y = x + rnorm(100L)
)
edf <- as_epi_df(tbl)

rec_tbl <- recipe(y ~ x, data = tbl)
rec_edf <- recipe(y ~ x, data = edf)
erec_tbl <- epi_recipe(y ~ x, data = tbl) # warning, good; not testing further; why not error though?
#> Warning: epi_recipe has been called with a non-epi_df object, returning a
#> regular recipe. Various step_epi_* functions will not work.
erec_edf <- epi_recipe(y ~ x, data = edf)

ewf_rec_tbl <- epi_workflow(rec_tbl, linear_reg())
ewf_rec_edf <- epi_workflow(rec_edf, linear_reg())
ewf_erec_edf <- epi_workflow(erec_edf, linear_reg())

# above are all epi_workflows:

expect_s3_class(ewf_rec_tbl, "epi_workflow")
expect_s3_class(ewf_rec_edf, "epi_workflow")
expect_s3_class(ewf_erec_edf, "epi_workflow")

# but fitting drops the class or generates errors in many cases:

ewf_rec_tbl %>% fit(tbl) %>% class()
#> [1] "workflow"
ewf_rec_tbl %>% fit(edf) %>% class()
#> [1] "workflow"
ewf_rec_edf %>% fit(tbl) %>% class()
#> [1] "workflow"
ewf_rec_edf %>% fit(edf) %>% class()
#> [1] "workflow"
ewf_erec_edf %>% fit(tbl) %>% class()
#> Error in if (new_meta != old_meta) {: argument is of length zero
ewf_erec_edf %>% fit(edf) %>% class()
#> [1] "epi_workflow" "workflow"

Created on 2024-07-18 with reprex v2.1.1

@brookslogan
Copy link
Contributor Author

I assume the "correct" response here is to either output an epi_workflow or a clear error message. Not sure which cases should be which.

@brookslogan
Copy link
Contributor Author

I expect that this will cause some sort of breakage if using geo-grouped epi_slide to do naive backtesting, as .x there won't be an epi_df. Geo-grouped epix_slides should be okay, as they use keep = TRUE so they should feed in epi_dfs, but probably have some later annoyances from duplicate geo columns... I'm currently trying to remove the duplicate column complaining in epiprocess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant