Docs overhaul #431

Open · wants to merge 62 commits into base: dev
Conversation

@dsweber2 (Contributor) commented Jan 23, 2025:

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Make sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).
  • Consider pinning the epiprocess version in the DESCRIPTION file if
    • You anticipate breaking changes in epiprocess soon
    • You want to co-develop features in epipredict and epiprocess
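As a sketch of what such a pin might look like (the version number and tag here are hypothetical), the DESCRIPTION file would gain something like:

```
Imports:
    epiprocess (>= 0.9.0)
Remotes:
    cmu-delphi/epiprocess@v0.9.0
```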

Change explanations for reviewer

Draft ready for review:

  • Landing Page
  • Getting Started
  • Customized Forecasters
  • Reference
    • Using the add/update/remove and adjust functions
    • Smooth Quantile regression
  • Preprocessing and models examples
  • Backtesting forecasters

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@dsweber2 dsweber2 requested a review from dajmcdon as a code owner January 23, 2025 20:59
@dajmcdon (Contributor):

/preview-docs

github-actions bot commented Jan 23, 2025:

@dshemetov (Contributor) commented Jan 23, 2025:

Our setup is generating docs in dev/, so the link is off; this works: https://6792d4953137ef0ce0547a4f--epipredict.netlify.app/dev/ (Edit: this has been fixed.)

Also FYI: the bot edits its own comment for links. Each preview is a separate link, and the links stick around for about 90 days. You can see the previous links in the comment edit history.

@dsweber2 dsweber2 force-pushed the docsDraft branch 4 times, most recently from 8044b98 to d35363e Compare January 27, 2025 22:50
@dsweber2 (Contributor, Author):

/preview-docs

@dsweber2 (Contributor, Author):

So something weird is happening with the plot for flatline_forecaster, not really sure why. Going to dig into that next.

I added an option to replace the data for the autoplot, so you can compare with new data instead.

@dsweber2 (Contributor, Author) commented Feb 3, 2025:

Draft of the Getting Started page is ready; moving on to a draft of the "guts" page (the name is a placeholder), which is an overview of creating workflows by hand.

@dsweber2 (Contributor, Author) commented Feb 5, 2025:

So something weird is happening with the plot for flatline_forecaster, not really sure why. Going to dig into that next.
[screenshot of the flatline_forecaster plot]

After some digging, I don't think there are any bugs, just some edge-case behavior that we may not want:

  1. Thresholding and extrapolation don't interact well. In this case, the quantiles it fits are 0.05 and 0.95, and it correctly rounds the 5% quantile up to zero (without the constraint it is actually negative, because of the negative values in the data). But the plot also shows the 2.5% and 97.5% quantiles, and the extrapolation doesn't know those should also be zero. This results in quantiles with negative values.
  2. The other thing is that the interpolated quantiles change quite a bit after thresholding if there are not very many quantiles. For example, the median gets pushed up quite a bit, but the point prediction doesn't reflect that.

My takeaway: never fit just the 5% and 95% quantiles; at least include the 50%. That fixes most of the jank this uncovers.
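For illustration, that fix is just a matter of what gets passed as the quantile levels (a sketch, not the PR's code; the dataset name and argument list follow recent epipredict and may differ by version):

```r
library(epipredict)

# Fit the median along with the tails, so thresholding the tails at zero
# can't silently disagree with the interpolated median.
out <- flatline_forecaster(
  case_death_rate_subset, # example epi_df shipped with epipredict (assumed name)
  outcome = "death_rate",
  args_list = flatline_args_list(quantile_levels = c(0.05, 0.5, 0.95))
)
out$predictions
```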

@dsweber2 dsweber2 self-assigned this Feb 7, 2025
@dsweber2 (Contributor, Author) commented Feb 7, 2025:

/preview-docs

@dshemetov (Contributor):

Including 0.5 in the user's selection sounds simple and reasonable to me. They can always filter out what they don't want.

@dsweber2 (Contributor, Author) commented Feb 7, 2025:

/preview-docs

@dsweber2 (Contributor, Author):

/preview-docs

@nmdefries this also updates the backtesting vignette; I'm dropping the Canadian example because it basically had no revisions.

@dsweber2 (Contributor, Author):

/preview-docs

@dajmcdon (Contributor):

My linewrapping preference is largely due to code review:

  1. It starts being hard to read in the PR viewer;
  2. More importantly, git flags the whole line as changed, so shorter lines make it easier to track the diff.

But the specifics don't matter to me so much.

@dsweber2 (Contributor, Author):

/preview-docs

library(ggplot2)
library(purrr)
forecast_date <- as.Date("2021-08-01")
used_locations <- c("ca", "ma", "ny", "tx")
@dshemetov (Contributor) commented Apr 22, 2025:

suggestion: make these visible in the code, so the vignette code can be reproduced by someone following along.

@dsweber2 (Author):

Moved down a bit and done

@dajmcdon (Contributor) left a comment:

This is a partial review:

  • epipredict.Rmd
  • most of the root dir files
  • README.Rmd

Still to do:

  • other vignettes
  • any of the actual function docs in R/

@@ -4,21 +4,27 @@ development:
  mode: devel

template:
  light-switch: true
@dajmcdon (Contributor):

Suggested change:
- light-switch: true
+ light-switch: false

The dark option looks terrible. We'd need lots of adjustments in delphidocs, so I suggest we remove it.

@dsweber2 (Author):

I for one would like an option to not be blinded while reading the docs after dark. I'm guessing anyone who's defaulted to the dark option would also prefer that.

@dajmcdon (Contributor):

Perfectly reasonable. But before turning it on and publishing it, you would need
to ensure that the text isn't the same color as the background. Currently, the
menu bar is illegible in dark mode. This has to be adjusted in delphidocs.

Contributor:

This should probably not go in /inst. This directory gets installed as-is on every user's machine that installs the package. This is a developer script. I'm not sure where it should live, but likely outside the package.

@dsweber2 (Author):

oh yeah that doesn't make sense. Not sure where I picked up putting it in inst. How about just in the main directory? It's not like it will actively get in the way of someone trying to use the source otherwise.

@dsweber2 (Contributor, Author) commented May 2, 2025:

/preview-docs

@dajmcdon (Contributor) left a comment:

OK. I've made it through the other main vignettes.

Do I need to look at the ones under articles?

Do I need to look at the function documentation?


### Preprocessor

A preprocessor (also called a recipe) transforms the data before model training and prediction.
Contributor:

Not entirely accurate. The preprocessor can be various things. Currently the different types allowed are "recipe", "formula", or "variables".

At its most basic, a preprocessor identifies the features and response. Both recipes and formulas can be used to create transformations, but they operate differently.
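To make the three options concrete, here is a sketch (assuming `jhu` is an `epi_df` with `case_rate` and `death_rate` columns; exact constructors may differ between versions):

```r
library(epipredict)
library(parsnip)

# 1. A recipe: identifies roles *and* can transform (lags, leads, etc.).
rec <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7)
wf_recipe <- epi_workflow(rec, linear_reg())

# 2. A formula: identifies response and features directly.
wf_formula <- epi_workflow(death_rate ~ case_rate, linear_reg())

# 3. Variables: the bare minimum, just names the columns.
wf_vars <- epi_workflow(spec = linear_reg()) %>%
  workflows::add_variables(outcomes = death_rate, predictors = case_rate)
```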


### Trainer

A trainer (also called a model or engine) fits a `{parsnip}` model on data, and outputs a fitted model object.
Contributor:

Suggested change:
- A trainer (also called a model or engine) fits a `{parsnip}` model on data, and outputs a fitted model object.
+ A trainer (also called a model or engine) fits a `{parsnip}` model to training data, and outputs a fitted model object.

The "also called" is doing a lot of work here, and is not quite precise. Related to our previous conversation: the "model" here is a statistical one, like "linear regression" via `linear_reg()`, while the "engine" is computational, like `lm()`. The computational engine can be changed without altering the fundamental statistical model being estimated. For example, `rand_forest()` has 6 different possible computational engines whose behaviour can differ. But they all produce random forests.
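In parsnip terms, that separation looks like this (a sketch; each engine requires its backing package to be installed):

```r
library(parsnip)

# One statistical model: a regression random forest.
spec <- rand_forest(mode = "regression", trees = 500)

# Two computational engines for estimating that same model.
spec_ranger <- spec %>% set_engine("ranger")
spec_rf     <- spec %>% set_engine("randomForest")

# Either can then be fit; both produce random forests.
# Only the implementation underneath changes.
```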

.before = 120,
.versions = forecast_dates
)
forecast_wrapper <- function(
Contributor:

Can we format this chunk a bit better: there's lots of available width. It just
kind of looks funny.

@dsweber2 (Author):

mostly trying to rely on tabs to indicate sub-calls, since it's pretty deeply nested

scale_y_continuous(expand = expansion(c(0, 0.05))) +
labs(x = "Date", y = "smoothed, day of week adjusted covid-like doctors visits") +
theme(legend.position = "none")
```

Suggested change:
- ```{r}
+ ```{r plot_fl_forecasts, warning = FALSE}
Contributor:

I would strongly prefer a different color scheme.

@dsweber2 (Author) commented May 16, 2025:

It's definitely not a great choice. The tricky thing about choosing a color scheme for this plot is that we actually want adjacent dates to be far-ish apart, which is more typical for discrete palettes, but 15 is a lot of colors for those; I guess hue is the default? Anyways, the result for some Viridis types:

Inferno (the one you suggested above):

[plot omitted]

Turbo:

Maybe a bit too pastel? but it distinguishes Feb/March 2021 better.

[plot omitted]

Base Viridis:

[plot omitted]

We should probably port whatever we decide on back to epiprocess at some point.
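For reference, swapping among those palettes is a one-line change in ggplot2 (a sketch; `p` stands in for the forecast plot, with dates mapped to a discrete colour aesthetic):

```r
library(ggplot2)

# Discrete viridis scales; `option` selects the variant.
p + scale_color_viridis_d(option = "inferno")
p + scale_color_viridis_d(option = "turbo")
p + scale_color_viridis_d() # base viridis
```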

Comment on lines +371 to +395

Added:

The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons (although neither approach produces amazingly accurate forecasts).

In the version faithful case for California, the March 2021 forecast (turquoise) starts at a value just above 10, which is very well lined up with reported values leading up to that forecast. The measured and forecasted trends are also concordant (both increasing moderately fast).

Because the data for this time period was later adjusted down with a decreasing trend, the March 2021 forecast looks quite bad compared to finalized data.

The equivalent version un-faithful forecast starts at a value of 5, which is in line with the finalized data but would have been out of place compared to the version data.

Removed (the Canadian example):

### Example using case data from Canada

<details>

<summary>Data and forecasts. Similar to the above.</summary>

By leveraging the flexibility of `epiprocess`, we can apply the same techniques to data from other sources. Since some collaborators are in British Columbia, Canada, we'll do essentially the same thing for Canada as we did above.

The [COVID-19 Canada Open Data Working Group](https://opencovid.ca/) collects daily time series data on COVID-19 cases, deaths, recoveries, testing and vaccinations at the health region and province levels. Data are collected from publicly available sources such as government datasets and news releases. Unfortunately, there is no simple versioned source, so we have created our own from the Github commit history.

First, we load versioned case rates at the provincial level. After converting these to 7-day averages (due to highly variable provincial reporting mismatches), we then convert the data to an `epi_archive` object, and extract the latest version from it. Finally, we run the same forecasting exercise as for the American data, but here we compare the forecasts produced from using simple linear regression with those from using boosted regression trees.

```{r get-can-fc, warning = FALSE}
aheads <- c(7, 14, 21, 28)
canada_archive <- can_prov_cases
canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
  mutate(version = time_value) %>%
  as_epi_archive()
# This function will add the 7-day average of the case rate to the data
# before forecasting.
smooth_cases <- function(epi_df) {
  epi_df %>%
    group_by(geo_value) %>%
    epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
}
forecast_dates <- seq.Date(
  from = min(canada_archive$DT$version),
  to = max(canada_archive$DT$version),
  by = "1 month"
)
```
Contributor:

Food for thought: do we even really want to demonstrate un-faithful forecasting? We can just reference published work that says it's bad and do only the version faithful case.

Comment on lines +387 to +388
The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons
(although neither approach produces amazingly accurate forecasts).
Contributor:

(The extra spaces create a strange looking new line.)

Suggested change:
- The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons
- (although neither approach produces amazingly accurate forecasts).
+ The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons (although neither approach produces amazingly accurate forecasts).

@dajmcdon (Contributor) left a comment:

OK. I've made it through the other main vignettes.

Do I need to look at the ones under articles?

I've also reviewed all function documentation.

#' workflow
#' @return An `arx_fcast`, with the fields `predictions` and `epi_workflow`.
#' `predictions` is an `epi_df` of predicted values while `epi_workflow()` is
#' the fit workflow used to make those predictions
Contributor:

Suggested change:
- #' the fit workflow used to make those predictions
+ #' the trained workflow used to make those predictions

@dsweber2 (Author):

I thought we were using "fit" in the context of workflows?

Comment on lines +32 to +34
#' @details The step assumes that the data's `time_value` column is already _in
#' the proper sequential order_ for shifting.
#'
Contributor:

Is this true? I thought it just added to the time_value and joined. In that case,
it wouldn't matter what order it was in.

#' argument is a common *multiplier* of the selected variables.
#' `step_population_scaling()` creates a specification of a recipe step that
#' will perform per-capita scaling. Typical usage would set `df` to be a dataset
#' that contains state-level population, and use it to convert predictions made
Contributor:

Suggested change:
- #' that contains state-level population, and use it to convert predictions made
+ #' that contains population for each `geo_value`, and use it to convert predictions made

#' @param by A (possibly named) character vector of variables to join by.
#' @param role For model terms created by this step, what analysis role should
#' they be assigned?
#' @param df a data frame containing the scaling data (such as population). The
Contributor:

Suggested change:
- #' @param df a data frame containing the scaling data (such as population). The
+ #' @param df a data frame containing the scaling data (typically population). The

Comment on lines +18 to 20
#' @param by A (possibly named) character vector of variables to join `df` onto
#' the `epi_df` by.
#'
Contributor:

Suggested change:
- #' @param by A (possibly named) character vector of variables to join `df` onto
- #'   the `epi_df` by.
+ #' @param by A (possibly named) character vector of variables by which to join `df` to
+ #'   the `epi_df`.

Comment on lines +21 to +22
#' `step_epi_naomit()`. Typical usage will have this function applied after
#' every other step.
Contributor:

Suggested change:
- #' `step_epi_naomit()`. Typical usage will have this function applied after
- #' every other step.
+ #' `step_epi_naomit()`. Typical usage will use this step last
+ #'   in an `epi_recipe()`.
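That convention, sketched in context (a hedged example; `jhu` and the column names are assumptions, not from this PR):

```r
library(epipredict)

rec <- epi_recipe(jhu) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(case_rate, ahead = 7) %>%
  step_epi_naomit() # last: drops the NAs that the lag/ahead shifts introduce
```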

Successfully merging this pull request may close these issues: Vignettes

4 participants