Docs overhaul #431

Open · wants to merge 62 commits into base: dev
Conversation

@dsweber2 (Contributor) commented Jan 23, 2025:

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Make sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).
  • Consider pinning the epiprocess version in the DESCRIPTION file if
    • You anticipate breaking changes in epiprocess soon
    • You want to co-develop features in epipredict and epiprocess
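As a sketch of what such a pin might look like (the version number and tag here are hypothetical), the DESCRIPTION file would gain something like:

```
Imports:
    epiprocess (>= 0.9.0)
Remotes:
    cmu-delphi/epiprocess@v0.9.0
```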

Change explanations for reviewer

Draft ready for review:

  • Landing Page
  • Getting Started
  • Customized Forecasters
  • Reference
    • Using the add/update/remove and adjust functions
    • Smooth Quantile regression
  • Preprocessing and models examples
  • Backtesting forecasters

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@dsweber2 dsweber2 requested a review from dajmcdon as a code owner January 23, 2025 20:59
@dajmcdon (Contributor):

/preview-docs

github-actions bot commented Jan 23, 2025:

@dshemetov (Contributor) commented Jan 23, 2025:

Our setup is generating docs in dev/, so the link is off; this works: https://6792d4953137ef0ce0547a4f--epipredict.netlify.app/dev/ (Edit: this has been fixed.)

Also FYI: the bot edits its own comment for links. Each preview is a separate link, and the links stick around for about 90 days. You can see the previous links in the comment edit history.

@dsweber2 dsweber2 force-pushed the docsDraft branch 4 times, most recently from 8044b98 to d35363e Compare January 27, 2025 22:50
@dsweber2 (Contributor, Author):

/preview-docs

@dsweber2 (Contributor, Author):

So something weird is happening with the plot for flatline_forecaster, not really sure why. Going to dig into that next.

I added an option to replace the data for the autoplot, so you can compare with new data instead.

@dsweber2 (Contributor, Author) commented Feb 3, 2025:

Draft of the Getting Started page is ready; moving on to a draft of the "guts" page (the name is a placeholder), which is an overview of creating workflows by hand.

@dsweber2 (Contributor, Author) commented Feb 5, 2025:

So something weird is happening with the plot for flatline_forecaster, not really sure why. Going to dig into that next.
[screenshot of the flatline_forecaster plot]

After some digging, I don't think there are any bugs, just some edge-case behavior that we may not want:

  1. Thresholding and extrapolation don't interact well. In this case, the quantiles it fits are 0.05 and 0.95, and it correctly rounds the 5% quantile up to zero (without the constraint it is actually negative, because of the negative values in the data). But the plot also shows the 2.5% and 97.5% quantiles, and the extrapolation doesn't know those should also be zero. This results in quantiles with negative values.
  2. The other thing is that the interpolated quantiles change quite a bit after thresholding if there are not very many quantiles. For example, the median gets pushed up quite a bit, but the point prediction doesn't reflect that.

My takeaway: never fit just the 5% and 95% quantiles; at least include the 50%. That fixes most of the jank this uncovers.
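For illustration, that fix is just a matter of what gets passed as the quantile levels (a sketch, not the PR's code; the dataset name and argument list follow recent epipredict and may differ by version):

```r
library(epipredict)

# Fit the median along with the tails, so thresholding the tails at zero
# can't silently disagree with the interpolated median.
out <- flatline_forecaster(
  case_death_rate_subset, # example epi_df shipped with epipredict (assumed name)
  outcome = "death_rate",
  args_list = flatline_args_list(quantile_levels = c(0.05, 0.5, 0.95))
)
out$predictions
```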

@dsweber2 dsweber2 self-assigned this Feb 7, 2025
@dsweber2 (Contributor, Author) commented Feb 7, 2025:

/preview-docs

@dshemetov (Contributor):

Including 0.5 in the user's selection sounds simple and reasonable to me. They can always filter out what they don't want.

@dsweber2 (Contributor, Author) commented Feb 7, 2025:

/preview-docs

@dsweber2 (Contributor, Author):

/preview-docs

@nmdefries this also updates the backtesting vignette; I'm dropping the Canadian example because it basically had no revisions.

@dsweber2 (Contributor, Author):

/preview-docs

@dajmcdon (Contributor):

My linewrapping preference is largely due to code review:

  1. It starts being hard to read in the PR viewer;
  2. More importantly, git flags the whole line as changed, so shorter lines make it easier to track the diff.

But the specifics don't matter to me so much.

@dsweber2 (Contributor, Author):

/preview-docs

library(ggplot2)
library(purrr)
forecast_date <- as.Date("2021-08-01")
used_locations <- c("ca", "ma", "ny", "tx")
@dshemetov (Contributor) commented Apr 22, 2025:

suggestion: make these visible in the code, so the vignette code can be reproduced by someone following along.

@dsweber2 (Author):

Moved down a bit and done

@dajmcdon (Contributor) left a comment:

This is a partial review:

  • epipredict.Rmd
  • most of the root dir files
  • README.Rmd

Still to do:

  • other vignettes
  • any of the actual function docs in R/

@@ -4,21 +4,27 @@ development:
  mode: devel

template:
  light-switch: true
@dajmcdon (Contributor):

Suggested change:
- light-switch: true
+ light-switch: false

The dark option looks terrible. We'd need lots of adjustments in delphidocs, so I suggest we remove it.

@dsweber2 (Author):

I for one would like an option to not be blinded while reading the docs after dark. I'm guessing anyone who's defaulted to the dark option would also prefer that.

@dajmcdon (Contributor):

Perfectly reasonable. But before turning it on and publishing it, you would need
to ensure that the text isn't the same color as the background. Currently, the
menu bar is illegible in dark mode. This has to be adjusted in delphidocs.

Contributor:

This should probably not go in /inst. This directory gets installed as-is on every user's machine that installs the package. This is a developer script. I'm not sure where it should live, but likely outside the package.

@dsweber2 (Author):

oh yeah that doesn't make sense. Not sure where I picked up putting it in inst. How about just in the main directory? It's not like it will actively get in the way of someone trying to use the source otherwise.

@dsweber2 (Contributor, Author) commented May 2, 2025:

/preview-docs

@dajmcdon (Contributor) left a comment:

OK. I've made it through the other main vignettes.

Do I need to look at the ones under articles?

Do I need to look at the function documentation?


### Preprocessor

A preprocessor (also called a recipe) transforms the data before model training and prediction.
Contributor:

Not entirely accurate. The preprocessor can be various things. Currently the different types allowed are "recipe", "formula", or "variables".

At its most basic, a preprocessor identifies the features and response. Both recipes and formulas can be used to create transformations, but they operate differently.
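To make the three options concrete, here is a sketch (assuming `jhu` is an `epi_df` with `case_rate` and `death_rate` columns; exact constructors may differ between versions):

```r
library(epipredict)
library(parsnip)

# 1. A recipe: identifies roles *and* can transform (lags, leads, etc.).
rec <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7)
wf_recipe <- epi_workflow(rec, linear_reg())

# 2. A formula: identifies response and features directly.
wf_formula <- epi_workflow(death_rate ~ case_rate, linear_reg())

# 3. Variables: the bare minimum, just names the columns.
wf_vars <- epi_workflow(spec = linear_reg()) %>%
  workflows::add_variables(outcomes = death_rate, predictors = case_rate)
```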


### Trainer

A trainer (also called a model or engine) fits a `{parsnip}` model on data, and outputs a fitted model object.
Contributor:

Suggested change:
- A trainer (also called a model or engine) fits a `{parsnip}` model on data, and outputs a fitted model object.
+ A trainer (also called a model or engine) fits a `{parsnip}` model to training data, and outputs a fitted model object.

The "also called" is doing a lot of work here, and is not quite precise. Related to our previous conversation: the "model" here is a statistical one, like "linear regression" via `linear_reg()`, while the "engine" is computational, like `lm()`. The computational engine can be changed without altering the fundamental statistical model being estimated. For example, `rand_forest()` has 6 different possible computational engines whose behaviour can differ. But they all produce random forests.
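In parsnip terms, that separation looks like this (a sketch; each engine requires its backing package to be installed):

```r
library(parsnip)

# One statistical model: a regression random forest.
spec <- rand_forest(mode = "regression", trees = 500)

# Two computational engines for estimating that same model.
spec_ranger <- spec %>% set_engine("ranger")
spec_rf     <- spec %>% set_engine("randomForest")

# Either can then be fit; both produce random forests.
# Only the implementation underneath changes.
```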

.before = 120,
.versions = forecast_dates
)
forecast_wrapper <- function(
Contributor:

Can we format this chunk a bit better: there's lots of available width. It just
kind of looks funny.

@dsweber2 (Author):

mostly trying to rely on tabs to indicate sub-calls, since it's pretty deeply nested

scale_y_continuous(expand = expansion(c(0, 0.05))) +
labs(x = "Date", y = "smoothed, day of week adjusted covid-like doctors visits") +
theme(legend.position = "none")
```

Suggested change:
- ```{r}
+ ```{r plot_fl_forecasts, warning = FALSE}
Contributor:

I would strongly prefer a different color scheme.

@dsweber2 (Author) commented May 16, 2025:

It's definitely not a great choice. The tricky thing about choosing a color scheme for this plot is that we actually want adjacent dates to be far-ish apart, which is more typical for discrete palettes, but 15 is a lot of colors for those; I guess hue is the default? Anyways, the result for some Viridis types:

Inferno (the one you suggested above):

[plot omitted]

Turbo:

Maybe a bit too pastel? but it distinguishes Feb/March 2021 better.

[plot omitted]

Base Viridis:

[plot omitted]

We should probably port whatever we decide on back to epiprocess at some point.
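For reference, swapping among those palettes is a one-line change in ggplot2 (a sketch; `p` stands in for the forecast plot, with dates mapped to a discrete colour aesthetic):

```r
library(ggplot2)

# Discrete viridis scales; `option` selects the variant.
p + scale_color_viridis_d(option = "inferno")
p + scale_color_viridis_d(option = "turbo")
p + scale_color_viridis_d() # base viridis
```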

Comment on lines +371 to +395

Added:

The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons (although neither approach produces amazingly accurate forecasts).

In the version faithful case for California, the March 2021 forecast (turquoise) starts at a value just above 10, which is very well lined up with reported values leading up to that forecast. The measured and forecasted trends are also concordant (both increasing moderately fast).

Because the data for this time period was later adjusted down with a decreasing trend, the March 2021 forecast looks quite bad compared to finalized data.

The equivalent version un-faithful forecast starts at a value of 5, which is in line with the finalized data but would have been out of place compared to the version data.

Removed (the Canadian example):

### Example using case data from Canada

<details>

<summary>Data and forecasts. Similar to the above.</summary>

By leveraging the flexibility of `epiprocess`, we can apply the same techniques to data from other sources. Since some collaborators are in British Columbia, Canada, we'll do essentially the same thing for Canada as we did above.

The [COVID-19 Canada Open Data Working Group](https://opencovid.ca/) collects daily time series data on COVID-19 cases, deaths, recoveries, testing and vaccinations at the health region and province levels. Data are collected from publicly available sources such as government datasets and news releases. Unfortunately, there is no simple versioned source, so we have created our own from the Github commit history.

First, we load versioned case rates at the provincial level. After converting these to 7-day averages (due to highly variable provincial reporting mismatches), we then convert the data to an `epi_archive` object, and extract the latest version from it. Finally, we run the same forecasting exercise as for the American data, but here we compare the forecasts produced from using simple linear regression with those from using boosted regression trees.

```{r get-can-fc, warning = FALSE}
aheads <- c(7, 14, 21, 28)
canada_archive <- can_prov_cases
canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
  mutate(version = time_value) %>%
  as_epi_archive()
# This function will add the 7-day average of the case rate to the data
# before forecasting.
smooth_cases <- function(epi_df) {
  epi_df %>%
    group_by(geo_value) %>%
    epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
}
forecast_dates <- seq.Date(
  from = min(canada_archive$DT$version),
  to = max(canada_archive$DT$version),
  by = "1 month"
)
```
Contributor:

Food for thought: do we even really want to demonstrate un-faithful forecasting? We can just reference published work that says it's bad and do only the version faithful case.

Comment on lines +387 to +388
The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons
(although neither approach produces amazingly accurate forecasts).
Contributor:

(The extra spaces create a strange looking new line.)

Suggested change:
- The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons
- (although neither approach produces amazingly accurate forecasts).
+ The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons (although neither approach produces amazingly accurate forecasts).

@dajmcdon (Contributor) left a comment:

OK. I've made it through the other main vignettes.

Do I need to look at the ones under articles?

I've also reviewed all function documentation.

#' workflow
#' @return An `arx_fcast`, with the fields `predictions` and `epi_workflow`.
#' `predictions` is an `epi_df` of predicted values while `epi_workflow()` is
#' the fit workflow used to make those predictions
Contributor:

Suggested change:
- #' the fit workflow used to make those predictions
+ #' the trained workflow used to make those predictions

@dsweber2 (Author):

I thought we were using "fit" in the context of workflows?

Comment on lines +32 to +34
#' @details The step assumes that the data's `time_value` column is already _in
#' the proper sequential order_ for shifting.
#'
Contributor:

Is this true? I thought it just added to the time_value and joined. In that case,
it wouldn't matter what order it was in.

#' argument is a common *multiplier* of the selected variables.
#' `step_population_scaling()` creates a specification of a recipe step that
#' will perform per-capita scaling. Typical usage would set `df` to be a dataset
#' that contains state-level population, and use it to convert predictions made
Contributor:

Suggested change:
- #' that contains state-level population, and use it to convert predictions made
+ #' that contains population for each `geo_value`, and use it to convert predictions made

#' @param by A (possibly named) character vector of variables to join by.
#' @param role For model terms created by this step, what analysis role should
#' they be assigned?
#' @param df a data frame containing the scaling data (such as population). The
Contributor:

Suggested change:
- #' @param df a data frame containing the scaling data (such as population). The
+ #' @param df a data frame containing the scaling data (typically population). The

Comment on lines +18 to 20
#' @param by A (possibly named) character vector of variables to join `df` onto
#' the `epi_df` by.
#'
Contributor:

Suggested change:
- #' @param by A (possibly named) character vector of variables to join `df` onto
- #'   the `epi_df` by.
+ #' @param by A (possibly named) character vector of variables by which to join `df` to
+ #'   the `epi_df`.

Comment on lines +21 to +22
#' `step_epi_naomit()`. Typical usage will have this function applied after
#' every other step.
Contributor:

Suggested change:
- #' `step_epi_naomit()`. Typical usage will have this function applied after
- #' every other step.
+ #' `step_epi_naomit()`. Typical usage will use this step last
+ #'   in an `epi_recipe()`.
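That convention, sketched in context (a hedged example; `jhu` and the column names are assumptions, not from this PR):

```r
library(epipredict)

rec <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(case_rate, ahead = 7) %>%
  step_epi_naomit() # last: drops the NAs that the lag/ahead shifts introduce
```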

Successfully merging this pull request may close these issues: Vignettes

4 participants