Commit 524066a

nmdefries authored and dsweber2 committed
landing page wording and get code running
1 parent 5a5a24f commit 524066a

1 file changed

+61
-53
lines changed

README.md

Lines changed: 61 additions & 53 deletions
@@ -8,27 +8,28 @@
88
[![R-CMD-check](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/cmu-delphi/epipredict/actions/workflows/R-CMD-check.yaml)
99
<!-- badges: end -->
1010

11-
Epipredict is a framework for building transformation and forecasting
11+
`{epipredict}` is a framework for building transformation and forecasting
1212
pipelines for epidemiological and other panel time-series datasets. In
1313
addition to tools for building forecasting pipelines, it contains a
1414
number of “canned” forecasters meant to run with little modification as
1515
an easy way to get started forecasting.
1616

1717
It is designed to work well with
18-
[`epiprocess`](https://cmu-delphi.github.io/epiprocess/), a utility for
19-
handling various time series and geographic processing tools in an
18+
[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/), a utility for
19+
time series handling and geographic processing in an
2020
epidemiological context. Both of the packages are meant to work well
2121
with the panel data provided by
22-
[`epidatr`](https://cmu-delphi.github.io/epidatr/).
22+
[`{epidatr}`](https://cmu-delphi.github.io/epidatr/).
23+
Pre-compiled example datasets are also available in [`{epidatasets}`](https://cmu-delphi.github.io/epidatasets/).
2324

24-
If you are looking for more detail beyond the package documentation, see
25-
our [forecasting
26-
book](https://cmu-delphi.github.io/delphi-tooling-book/).
25+
If you are looking for detail beyond the package documentation, see
26+
our [forecasting book](https://cmu-delphi.github.io/delphi-tooling-book/).
2727

2828
## Installation
2929

30-
To install (unless you’re planning on contributing to package
31-
development, we suggest using the stable version):
30+
Unless you’re planning on contributing to package
31+
development, we suggest using the stable version.
32+
To install, run:
3233

3334
``` r
3435
# Stable version
@@ -44,25 +45,32 @@ is at <https://cmu-delphi.github.io/epipredict/dev>.
4445

4546
## Motivating example
4647

47-
To demonstrate the kind of forecast epipredict can make, say we’re
48-
predicting COVID deaths per 100k for each state on
48+
To demonstrate the kind of forecast `{epipredict}` can make, say we want to
49+
predict COVID-19 deaths per 100k people for each state on 2021-08-01.
4950

5051
``` r
52+
library(epipredict)
53+
library(epidatr)
54+
library(epiprocess)
55+
library(dplyr)
56+
library(ggplot2)
57+
5158
forecast_date <- as.Date("2021-08-01")
5259
```
5360

5461
Below the fold, we construct this dataset as an `epiprocess::epi_df`
55-
from JHU data.
62+
from [Johns Hopkins Center for Systems Science and Engineering deaths data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/jhu-csse.html).
5663

5764
<details>
5865
<summary>
5966
Creating the dataset using `{epidatr}` and `{epiprocess}`
6067
</summary>
6168

62-
This dataset can be found in the package as `covid_case_death_rates`; we
63-
demonstrate some of the typically ubiquitous cleaning operations needed
64-
to be able to forecast. First we pull both jhu-csse cases and deaths
65-
from [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
69+
This section is intended to demonstrate some of the ubiquitous cleaning operations needed
70+
to be able to forecast.
71+
The dataset prepared here is also included ready-to-go in `{epipredict}` as `covid_case_death_rates`.
72+
73+
First we pull both `jhu-csse` cases and deaths data from the [Delphi API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html) using the [`{epidatr}`](https://cmu-delphi.github.io/epidatr/) package:
6674

6775
``` r
6876
cases <- pub_covidcast(
@@ -87,7 +95,7 @@ deaths <- pub_covidcast(
8795
```
8896

8997
Since visualizing the results on every geography is somewhat
90-
overwhelming, we’ll only train on a subset of 5.
98+
overwhelming, we’ll only train on a subset of locations.
9199

92100
``` r
93101
used_locations <- c("ca", "ma", "ny", "tx")
@@ -113,12 +121,11 @@ cases_deaths |>
113121

114122
<img src="man/figures/README-date-1.png" width="90%" style="display: block; margin: auto;" />
115123

116-
As with basically any dataset, there is some cleaning that we will need
117-
to do to make it actually usable; we’ll use some utilities from
124+
As with most datasets, we will need to do some cleaning to make it usable; we’ll use some utilities from
118125
[`{epiprocess}`](https://cmu-delphi.github.io/epiprocess/) for this.
119126

120-
First, to eliminate some of the noise coming from daily reporting, we do
121-
7 day averaging over a trailing window[^1]:
127+
First, to reduce the noise from daily reporting, we will compute a
128+
7-day average over a trailing window[^1]:
122129

123130
``` r
124131
cases_deaths <-
@@ -134,7 +141,7 @@ cases_deaths <-
134141
rename(case_rate = cases_7dav, death_rate = death_rate_7dav)
135142
```
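
As a rough illustration of what a trailing window means here, the following base-R sketch averages each day with the six days before it, so no future values enter the average. The `daily` vector and the `trailing_mean()` helper are hypothetical and are not part of `{epiprocess}`; the code above relies on `{epiprocess}` utilities instead.

``` r
# Hypothetical daily counts for a single location (not real data)
daily <- c(10, 12, 9, 30, 11, 10, 13, 12, 40, 11)

# Trailing mean: the value for day i averages days (i - window + 1) to i.
# sides = 1 tells stats::filter() to use only current and past values.
trailing_mean <- function(x, window = 7) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 1))
}

smoothed <- trailing_mean(daily)

# The first six days have no full week behind them, so they are NA
smoothed
```

Because the window only looks backwards, the smoothed value on a given day uses nothing reported after that day.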
136143

137-
Then trimming outliers, most especially negative values:
144+
Then we'll trim outliers, especially negative values:
138145

139146
``` r
140147
cases_deaths <-
@@ -161,24 +168,25 @@ cases_deaths <-
161168

162169
</details>
163170

164-
After having downloaded and cleaned the data in `cases_deaths`, we plot
165-
a subset of the states, noting the actual forecast date:
171+
After downloading and cleaning the cases and deaths data, we can plot
172+
a subset of the states, marking the desired forecast date:
166173

167174
<details>
168175
<summary>
169176
Plot
170177
</summary>
171178

172179
``` r
180+
used_locations <- c("ca", "ma", "ny", "tx")
173181
forecast_date_label <-
174182
tibble(
175183
geo_value = rep(used_locations, 2),
176184
.response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
177185
dates = rep(forecast_date - 7 * 2, 2 * length(used_locations)),
178186
heights = c(rep(150, 4), rep(0.75, 4))
179187
)
180-
processed_data_plot <-
181-
covid_case_death_rates |>
188+
189+
covid_case_death_rates |>
182190
filter(geo_value %in% used_locations) |>
183191
autoplot(
184192
case_rate,
@@ -204,13 +212,13 @@ processed_data_plot <-
204212

205213
<img src="man/figures/README-show-processed-data-1.png" width="90%" style="display: block; margin: auto;" />
206214

207-
To make a forecast, we will use a “canned” simple auto-regressive
215+
To make a forecast, we will use a simple “canned” auto-regressive
208216
forecaster to predict the death rate four weeks into the future using
209-
lagged[^2] deaths and cases
217+
lagged[^2] deaths and cases.
210218

211219
``` r
212220
four_week_ahead <- arx_forecaster(
213-
cases_deaths |> filter(time_value <= forecast_date),
221+
covid_case_death_rates |> filter(time_value <= forecast_date),
214222
outcome = "death_rate",
215223
predictors = c("case_rate", "death_rate"),
216224
args_list = arx_args_list(
@@ -221,31 +229,31 @@ four_week_ahead <- arx_forecaster(
221229
)
222230
four_week_ahead
223231
#> ══ A basic forecaster of type ARX Forecaster ════════════════════════════════
224-
#>
232+
#>
225233
#> This forecaster was fit on 2025-02-10 12:09:58.
226-
#>
234+
#>
227235
#> Training data was an <epi_df> with:
228236
#> • Geography: state,
229237
#> • Time type: day,
230238
#> • Using data up-to-date as of: 2022-01-01.
231239
#> • With the last data available on 2021-08-01
232-
#>
240+
#>
233241
#> ── Predictions ──────────────────────────────────────────────────────────────
234-
#>
242+
#>
235243
#> A total of 4 predictions are available for
236244
#> • 4 unique geographic regions,
237245
#> • At forecast date: 2021-08-01,
238246
#> • For target date: 2021-08-29,
239-
#>
247+
#>
240248
```
241249

242-
In this case, we have used 0-3 days, a week, and two week lags for the
243-
case rate, while using only zero, one and two weekly lags for the death
244-
rate (as predictors). The result `four_week_ahead` is both a fitted
250+
In this model setup, the predictors are the case rate lagged 0-3 days, one week, and two weeks, and the death rate lagged zero, one, and two weeks.
251+
The result `four_week_ahead` is both a fitted
245252
model object which could be used any time in the future to create
246-
different forecasts, as well as a set of predicted values (and
253+
different forecasts, and a set of predicted values (and
247254
prediction intervals) for each location 28 days after the forecast date.
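
To make the lag structure concrete, here is a minimal base-R sketch of how such lagged predictors and a 28-day-ahead target could be lined up for one location, assuming daily data so that one week corresponds to seven rows. The `toy` data frame and the `lag_by()` and `lead_by()` helpers are hypothetical, and the plain `lm()` fit only stands in for a forecaster; this is not a description of how `arx_forecaster()` is implemented.

``` r
# Hypothetical toy series for a single location (not real data)
set.seed(1)
n <- 100
toy <- data.frame(
  time_value = as.Date("2021-04-01") + 0:(n - 1),
  case_rate = runif(n, 5, 50),
  death_rate = runif(n, 0, 1)
)

# Element i of lag_by(x, k) holds the value observed k days before day i
lag_by <- function(x, k) c(rep(NA, k), head(x, length(x) - k))
# Element i of lead_by(x, k) holds the value observed k days after day i
lead_by <- function(x, k) c(tail(x, length(x) - k), rep(NA, k))

case_lags <- c(0, 1, 2, 3, 7, 14) # 0-3 days, one week, two weeks
death_lags <- c(0, 7, 14)         # zero, one, and two weeks
ahead <- 28                       # four weeks ahead

features <- data.frame(
  sapply(case_lags, function(k) lag_by(toy$case_rate, k)),
  sapply(death_lags, function(k) lag_by(toy$death_rate, k))
)
names(features) <- c(
  paste0("case_rate_lag", case_lags),
  paste0("death_rate_lag", death_lags)
)

# Direct forecasting: the target is the death rate `ahead` days later
features$target <- lead_by(toy$death_rate, ahead)

# lm() drops rows with missing lags or a missing target by default
fit <- lm(target ~ ., data = features)
```

Rows near the start lack the longest lag and rows near the end lack a target, so only the rows in between contribute to the fit.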
248-
Plotting the prediction intervals on our subset above[^3]:
255+
256+
Plotting the prediction intervals on the true values for our location subset[^3]:
249257

250258
<details>
251259
<summary>
@@ -275,28 +283,29 @@ forecast_plot <-
275283

276284
<img src="man/figures/README-show-single-forecast-1.png" width="90%" style="display: block; margin: auto;" />
277285

278-
And as a tibble of quantile level-value pairs:
286+
And as a tibble of quantile-value pairs:
279287

280288
``` r
281289
four_week_ahead$predictions |>
282290
select(-.pred) |>
283291
pivot_quantiles_longer(.pred_distn)
284292
#> # A tibble: 20 × 5
285293
#> geo_value values quantile_levels forecast_date target_date
286-
#> <chr> <dbl> <dbl> <date> <date>
287-
#> 1 ca 0.199 0.1 2021-08-01 2021-08-29
288-
#> 2 ca 0.285 0.25 2021-08-01 2021-08-29
289-
#> 3 ca 0.345 0.5 2021-08-01 2021-08-29
290-
#> 4 ca 0.405 0.75 2021-08-01 2021-08-29
291-
#> 5 ca 0.491 0.9 2021-08-01 2021-08-29
292-
#> 6 ma 0.0285 0.1 2021-08-01 2021-08-29
294+
#> <chr> <dbl> <dbl> <date> <date>
295+
#> 1 ca 0.199 0.1 2021-08-01 2021-08-29
296+
#> 2 ca 0.285 0.25 2021-08-01 2021-08-29
297+
#> 3 ca 0.345 0.5 2021-08-01 2021-08-29
298+
#> 4 ca 0.405 0.75 2021-08-01 2021-08-29
299+
#> 5 ca 0.491 0.9 2021-08-01 2021-08-29
300+
#> 6 ma 0.0285 0.1 2021-08-01 2021-08-29
293301
#> # ℹ 14 more rows
294302
```
295303

296-
The black dot gives the median prediction, while the blue intervals give
304+
The orange dot gives the predicted median, and the blue intervals give
297305
the 25-75%, the 10-90%, and 2.5-97.5% inter-quantile ranges[^4]. For
298306
this particular day and these locations, the forecasts are relatively
299-
accurate, with the true data being at least within the 10-90% interval.
307+
accurate, with the true data being at worst within the 10-90% interval.
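
One way to check a statement like this is to test whether an observed value lies between a pair of predicted quantiles. The sketch below reuses the California quantiles printed in the tibble above; the `covers()` helper and the `observed` value are hypothetical and are not taken from the data.

``` r
# Predicted quantiles for "ca", copied from the tibble output above
ca_quantiles <- c(
  `0.1` = 0.199, `0.25` = 0.285, `0.5` = 0.345, `0.75` = 0.405, `0.9` = 0.491
)

# TRUE when the observed value lies inside the [lower, upper] quantile band
covers <- function(q, lower, upper, observed) {
  observed >= q[[lower]] && observed <= q[[upper]]
}

observed <- 0.42 # hypothetical observed death rate, for illustration only

in_50 <- covers(ca_quantiles, "0.25", "0.75", observed) # 25-75% band
in_80 <- covers(ca_quantiles, "0.1", "0.9", observed)   # 10-90% band
```

With this made-up value, the observation falls outside the 25-75% band but inside the 10-90% band.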
308+
300309
A couple of things to note:
301310

302311
1. Our methods are primarily direct forecasters; this means we don’t
@@ -310,12 +319,11 @@ A couple of things to note:
310319
## Getting Help
311320

312321
If you encounter a bug or have a feature request, feel free to file an
313-
[issue on our github
322+
[issue on our GitHub
314323
page](https://github.com/cmu-delphi/epipredict/issues). For other
315324
questions, feel free to reach out to the authors, either via this
316-
[contact
317-
form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
318-
email, or the Insightnet slack.
325+
[contact form](https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform),
326+
email, or the InsightNet Slack.
319327

320328
[^1]: This makes it so that any given day of the processed time-series
321329
only depends on the previous week, which means that we avoid leaking
