Docs overhaul #431
base: dev
Conversation
/preview-docs
🚀 Deployed on https://68152e45625710260c913094--epipredict.netlify.app
Also FYI: the bot edits its own comment for links. Each preview is a separate link and the links stick around for like 90 days. You can see the previous links in the comment edit history.
Force-pushed from 8044b98 to d35363e (Compare)
/preview-docs
So something weird is happening with the plot. I added an option to replace the data for the autoplot so you can compare with new data instead.
Draft of the getting started is ready; moving on to a draft of the "guts" page (the name is a placeholder), which is an overview of creating workflows by hand.
/preview-docs
Including 0.5 in the user's selection sounds simple and reasonable to me. They can always filter out what they don't want.
/preview-docs
/preview-docs @nmdefries this also updates the backtesting vignette; I'm dropping the Canadian example because it basically had no revisions.
/preview-docs
My line-wrapping preference is largely due to code review:
But the specifics don't matter to me so much.
/preview-docs
vignettes/epipredict.Rmd
Outdated
library(ggplot2)
library(purrr)
forecast_date <- as.Date("2021-08-01")
used_locations <- c("ca", "ma", "ny", "tx")
suggestion: make these visible in the code, so the vignette code can be reproduced by someone following along.
Moved down a bit and done
This is a partial review:
- epipredict.Rmd
- most of the root dir files.
- README.Rmd.
Still to do:
- other vignettes
- any of the actual function docs in `R/`
@@ -4,21 +4,27 @@ development:
  mode: devel

template:
  light-switch: true
light-switch: true
light-switch: false
The dark option looks terrible. We'd need lots of adjustments in `delphidocs`, so I suggest we remove it.
I for one would like an option to not be blinded while reading the docs after dark. I'm guessing anyone who's defaulted to the dark option would also prefer that.
Perfectly reasonable. But before turning it on and publishing it, you would need
to ensure that the text isn't the same color as the background. Currently, the
menu bar is illegible in dark mode. This has to be adjusted in `delphidocs`.
inst/pkgdown-watch.R
Outdated
This should probably not go in `/inst`. This directory gets installed as-is on the machine of every user who installs the package, and this is a developer script. I'm not sure where it should live, but likely outside the package.
Oh yeah, that doesn't make sense. Not sure where I picked up putting it in `inst`. How about just in the main directory? It's not like it will actively get in the way of someone trying to use the source otherwise.
/preview-docs
OK. I've made it through the other main vignettes.
Do I need to look at the ones under articles?
Do I need to look at the function documentation?
### Preprocessor

A preprocessor (also called a recipe) transforms the data before model training and prediction.
Not entirely accurate. The preprocessor can be various things. Currently the different types allowed are "recipe", "formula", or "variables". At its most basic, a preprocessor identifies the features and response. Both recipes and formulas can be used to create transformations, but they operate differently.
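For concreteness, a minimal sketch of the three preprocessor types in a workflow; the `training_data` object and its `case_rate`/`death_rate` columns are hypothetical, and the calls are plain {workflows}/{recipes}/{parsnip}:

```r
library(workflows)
library(recipes)
library(parsnip)

# 1. A recipe: identifies the response and features, and can add transformation steps.
rec <- recipe(case_rate ~ death_rate + geo_value, data = training_data) %>%
  step_lag(death_rate, lag = 7)
wf_recipe <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# 2. A formula: identifies the response and features directly.
wf_formula <- workflow() %>%
  add_formula(case_rate ~ death_rate + geo_value) %>%
  add_model(linear_reg())

# 3. Bare variables: just names the outcome and predictors, with no transformations.
wf_vars <- workflow() %>%
  add_variables(outcomes = case_rate, predictors = c(death_rate, geo_value)) %>%
  add_model(linear_reg())
```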
### Trainer

A trainer (also called a model or engine) fits a `{parsnip}` model on data, and outputs a fitted model object.
A trainer (also called a model or engine) fits a `{parsnip}` model on data, and outputs a fitted model object.
A trainer (also called a model or engine) fits a `{parsnip}` model to training data, and outputs a fitted model object.
The "also called" is doing a lot of work here, and not quite precise. Related to our previous conversation. The "model" here is a statistical one, like "linear regression" via `linear_reg()`, while the "engine" is computational, like `lm()`. The computational engine can be changed without altering the fundamental statistical model being estimated. For example, `rand_forest()` has 6 different possible computational engines whose behaviour can differ. But they all produce random forests.
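A small illustration of that distinction (the engine names come from {parsnip}'s registry; nothing here is from the vignette itself):

```r
library(parsnip)

# One statistical model: a regression random forest with 500 trees.
rf_spec <- rand_forest(mode = "regression", trees = 500)

# Two different computational engines for that same model.
rf_ranger <- rf_spec %>% set_engine("ranger")        # fit via ranger::ranger()
rf_rf     <- rf_spec %>% set_engine("randomForest")  # fit via randomForest::randomForest()
```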
  .before = 120,
  .versions = forecast_dates
)
forecast_wrapper <- function(
Can we format this chunk a bit better: there's lots of available width. It just
kind of looks funny.
Mostly trying to rely on tabs to indicate sub-calls, since it's pretty deeply nested.
  scale_y_continuous(expand = expansion(c(0, 0.05))) +
  labs(x = "Date", y = "smoothed, day of week adjusted covid-like doctors visits") +
  theme(legend.position = "none")
```

```{r}
```{r plot_fl_forecasts, warning = FALSE}
I would strongly prefer a different color scheme.
It's definitely not a great choice. The tricky thing about choosing a color scheme for this plot is that we actually want adjacent dates to be far-ish apart, which is more typical for discrete palettes, but 15 is a lot of colors for those; I guess hue is the default? Anyway, the results for some viridis types:
- Inferno (the one you suggested above)
- Turbo: maybe a bit too pastel, but it distinguishes Feb/March 2021 better
- Base viridis

We should probably port whatever we decide on back to epiprocess at some point.
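For reference, each of those is a one-line swap with a recent ggplot2; the plot object `p` and its discrete colour mapping to forecast date are assumptions here:

```r
library(ggplot2)

# `p` is an existing ggplot whose colour aesthetic is the (discrete) forecast date.
p + scale_color_viridis_d(option = "inferno")
p + scale_color_viridis_d(option = "turbo")
p + scale_color_viridis_d()  # base viridis
```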
The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons
(although neither approach produces amazingly accurate forecasts).

### Example using case data from Canada
In the version faithful case for California, the March 2021 forecast (turquoise)
starts at a value just above 10, which is very well lined up with reported values leading up to that forecast.
The measured and forecasted trends are also concordant (both increasing moderately fast).

<details>
Because the data for this time period was later adjusted down with a decreasing trend, the March 2021 forecast looks quite bad compared to finalized data.

<summary>Data and forecasts. Similar to the above.</summary>

By leveraging the flexibility of `epiprocess`, we can apply the same techniques
to data from other sources. Since some collaborators are in British Columbia,
Canada, we'll do essentially the same thing for Canada as we did above.

The [COVID-19 Canada Open Data Working Group](https://opencovid.ca/) collects
daily time series data on COVID-19 cases, deaths, recoveries, testing and
vaccinations at the health region and province levels. Data are collected from
publicly available sources such as government datasets and news releases.
Unfortunately, there is no simple versioned source, so we have created our own
from the Github commit history.

First, we load versioned case rates at the provincial level. After converting
these to 7-day averages (due to highly variable provincial reporting
mismatches), we then convert the data to an `epi_archive` object, and extract
the latest version from it. Finally, we run the same forecasting exercise as for
the American data, but here we compare the forecasts produced from using simple
linear regression with those from using boosted regression trees.

```{r get-can-fc, warning = FALSE}
aheads <- c(7, 14, 21, 28)
canada_archive <- can_prov_cases
canada_archive_faux <- epix_as_of(canada_archive, canada_archive$versions_end) %>%
  mutate(version = time_value) %>%
  as_epi_archive()
# This function will add the 7-day average of the case rate to the data
# before forecasting.
smooth_cases <- function(epi_df) {
  epi_df %>%
    group_by(geo_value) %>%
    epi_slide_mean("case_rate", .window_size = 7, na.rm = TRUE, .suffix = "_{.n}dav")
}
forecast_dates <- seq.Date(
  from = min(canada_archive$DT$version),
  to = max(canada_archive$DT$version),
  by = "1 month"
)
The equivalent version un-faithful forecast starts at a value of 5, which is in line with the finalized data but would have been out of place compared to the version data.
Food for thought, do we even really want to demonstrate un-faithful forecasting?
We can just reference published work that says it's bad and do only the version
faithful case.
The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons
(although neither approach produces amazingly accurate forecasts).
(The extra spaces create a strange looking new line.)
The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons
(although neither approach produces amazingly accurate forecasts).
The version faithful and un-faithful forecasts look moderately similar except for the 1 day horizons (although neither approach produces amazingly accurate forecasts).
OK. I've made it through the other main vignettes.
Do I need to look at the ones under articles?
I've also reviewed all function documentation.
#' workflow
#' @return An `arx_fcast`, with the fields `predictions` and `epi_workflow`.
#' `predictions` is an `epi_df` of predicted values while `epi_workflow()` is
#' the fit workflow used to make those predictions
#' the fit workflow used to make those predictions
#' the trained workflow used to make those predictions
I thought we were using fit in the context of workflows?
#' @details The step assumes that the data's `time_value` column is already _in
#' the proper sequential order_ for shifting.
#'
Is this true? I thought it just added to the time_value and joined. In that case, it wouldn't matter what order it was in.
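A toy check of that reading, with made-up data (not from the package), showing that a shift-and-join doesn't depend on row order:

```r
library(dplyr)

x <- tibble(
  time_value = as.Date("2021-01-01") + c(2, 0, 1),  # deliberately out of order
  value      = c(30, 10, 20)
)

# "Lag by one day" via shift-and-join: rows line up by date, not by row position,
# so the original ordering of `x` doesn't matter.
lagged <- x %>%
  mutate(time_value = time_value + 1) %>%
  rename(value_lag1 = value) %>%
  right_join(x, by = "time_value") %>%
  arrange(time_value)
```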
#' argument is a common *multiplier* of the selected variables.
#' `step_population_scaling()` creates a specification of a recipe step that
#' will perform per-capita scaling. Typical usage would set `df` to be a dataset
#' that contains state-level population, and use it to convert predictions made
#' that contains state-level population, and use it to convert predictions made
#' that contains population for each `geo_value`, and use it to convert predictions made
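A rough sketch of that typical usage; the `cases_df` object, the `cases` column, and the population numbers are placeholders:

```r
library(epipredict)
library(dplyr)

# Hypothetical geo_value-level population table.
pop <- data.frame(
  geo_value  = c("ca", "ny"),
  population = c(39.5e6, 19.5e6)
)

# Scale the `cases` column by the matching population before modeling.
r <- epi_recipe(cases_df) %>%
  step_population_scaling(
    cases,
    df = pop,
    df_pop_col = "population",
    by = c("geo_value" = "geo_value")
  )
```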
#' @param by A (possibly named) character vector of variables to join by.
#' @param role For model terms created by this step, what analysis role should
#' they be assigned?
#' @param df a data frame containing the scaling data (such as population). The
#' @param df a data frame containing the scaling data (such as population). The
#' @param df a data frame containing the scaling data (typically population). The
#' @param by A (possibly named) character vector of variables to join `df` onto
#' the `epi_df` by.
#'
#' @param by A (possibly named) character vector of variables to join `df` onto
#' the `epi_df` by.
#'
#' @param by A (possibly named) character vector of variables by which to join `df` to
#' the `epi_df`.
#'
#' `step_epi_naomit()`. Typical usage will have this function applied after
#' every other step.
#' `step_epi_naomit()`. Typical usage will have this function applied after
#' every other step.
#' `step_epi_naomit()`. Typical usage will use this step last
#' in an `epi_recipe()`.
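For instance (the `training_data` epi_df and its `death_rate` column are illustrative only):

```r
library(epipredict)
library(dplyr)

r <- epi_recipe(training_data) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_naomit()  # used last, after every other step
```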
Checklist

Please:

- Request a review from one of the current main reviewers: dajmcdon.
- Bump the version number in `DESCRIPTION` and `NEWS.md`. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
- Describe changes made in `NEWS.md`, making sure breaking changes (backwards-incompatible changes to the documented interface) are noted. Collect the changes under the next release number (e.g. if you are on 0.7.2, then write your changes under the 0.8 heading).
- Consider pinning the `epiprocess` version in the `DESCRIPTION` file if changes in `epiprocess` are expected soon, or if you are co-developing `epipredict` and `epiprocess`.
Change explanations for reviewer

Draft ready for review:

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch:

- `symmetrize` for residuals #264
- `nafill_buffer` usage #320