Visual checks: CPR num tests and test positivity at the county level #1513

Closed
nmdefries opened this issue Feb 4, 2022 · 9 comments · Fixed by #1548

@nmdefries
Contributor

Originating Asana task with line plots and choropleths for hospital admissions, test volume, and test positivity.

Outstanding question: Should we suppress test positivity for small counts? Small counts cause the signal to be highly variable.

Number of tests and test positivity at the county level need further visualization, stratified in some way to make plots easier to interpret.

nmdefries self-assigned this on Feb 4, 2022
@nmdefries
Contributor Author

Here are some additional county-level test positivity visualizations, broken down by high-medium-low test volume.

@ryantibs @krivard Let me know if any other plots would be useful.

In terms of variability of individual signals, things look okay (judging by eye) by roughly the 6th percentile mark. But if we used that as a threshold, we'd be discarding a lot of data. Is there precedent for this from any of the other indicators?

@krivard
Contributor

krivard commented Feb 16, 2022

Can you produce the following:

| x axis | y axis | line color |
| --- | --- | --- |
| day | number of counties with test volume <= Z | Z in 1, 2, 3, 4, 5, 10, 20, 50 |
| day | percentage of available counties with test volume <= Z | Z in 1, 2, 3, 4, 5, 10, 20, 50 |

where "available counties" are all counties with a numeric value reported for that day
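
A minimal sketch of how the two requested series could be computed, assuming the data is available as `(day, county, test_volume)` tuples with `None` for missing values. The function name, the input layout, and the sample rows are all hypothetical and are not taken from the pipeline.

```python
from collections import defaultdict

# Thresholds Z requested in the table above.
THRESHOLDS = (1, 2, 3, 4, 5, 10, 20, 50)


def low_volume_summary(rows, thresholds=THRESHOLDS):
    """Return {day: {Z: (count, percent)}} for counties with test volume <= Z.

    rows: iterable of (day, county, test_volume); test_volume is None when
    no numeric value was reported (hypothetical input layout).
    """
    volumes_by_day = defaultdict(list)
    for day, _county, volume in rows:
        # "Available counties" are those with a numeric value for that day.
        if volume is not None:
            volumes_by_day[day].append(volume)

    summary = {}
    for day, volumes in volumes_by_day.items():
        n_available = len(volumes)
        summary[day] = {}
        for z in thresholds:
            count = sum(1 for v in volumes if v <= z)
            summary[day][z] = (count, 100.0 * count / n_available)
    return summary


# Made-up rows for illustration only.
rows = [
    ("2022-01-04", "01001", 3),
    ("2022-01-04", "01003", 12),
    ("2022-01-04", "01005", None),  # not an "available" county
    ("2022-01-04", "01007", 60),
    ("2022-01-04", "01009", 5),
]
summary = low_volume_summary(rows)
```

Each `summary[day][Z]` pair gives the y values for the first and second plot respectively, with one line per Z.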

> if we used that as a threshold, we'd be discarding a lot of data. Is there precedent for this from any of the other indicators?

There's definitely precedent; almost all our sample-based indicators do not report if a minimum sample size is not met. For testing, test volume === sample size.

@nmdefries
Contributor Author

@krivard
Contributor

krivard commented Feb 18, 2022

My gut feeling is to go with a threshold of 6 (i.e., drop everything with 5 or fewer total tests). That would give us a worst-case minimum nonzero test positivity of ~17% (1 positive out of 6 tests), which is higher than I'd like, but I also don't like the idea of suppressing more than 20% of our data even if it's only during the slow times.
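
The worst-case figure follows from a single positive result among the minimum number of tests. The short sketch below shows that arithmetic for a few candidate thresholds; the function name is hypothetical.

```python
def min_nonzero_positivity(min_tests):
    """Smallest nonzero positivity possible with exactly min_tests tests.

    This is one positive result out of min_tests total tests.
    """
    if min_tests < 1:
        raise ValueError("min_tests must be at least 1")
    return 1 / min_tests


# Compare the candidate minimum sample sizes discussed in the thread.
for min_tests in (1, 2, 3, 4, 5, 6, 10, 20, 50):
    print(f"{min_tests:>3} tests -> {min_nonzero_positivity(min_tests):.1%}")
```

A higher threshold lowers the worst-case value but suppresses more county-days, which is the trade-off weighed above.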

@ryantibs thoughts?

@ryantibs
Member

As a working solution to get us to move forward: this is fine with me. Thanks!

@nmdefries
Contributor Author

Most of the time (when not backfilling new signals), the pipeline processes a single new spreadsheet, for the current day.

Within a spreadsheet, test positivity is reported for a time period that overlaps with but is not the same as the period that test volume is reported for. For example, the 2022-01-07 spreadsheet reports test positivity for Dec 29-Jan 4 and test volume for Dec 25-31. We report these as 7-day averages assigned to the last day in the range, so positivity would be for Jan 4 and test volume for Dec 31.

This means that sample size (test volume) values aren't the most appropriate to use for thresholding test positivity values found in the same spreadsheet. Potential approaches:

  • Do it anyway.
    • The simplest approach. The time periods are only a few days different.
  • Don't report new positivity values right away. Wait until test volume values for the right date range are published (4 days later), then threshold and publish positivity.
    • This requires re-processing the last few days of spreadsheets. Without additional logic, this will add duplicate entries to the API.
    • It's possible that a corresponding time period for test volume will never exist since spreadsheets aren't published on the weekend.
  • Do something else?
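
To make the date mismatch concrete, here is a small sketch using only the periods quoted above for the 2022-01-07 spreadsheet. The helper name is hypothetical, and nothing here assumes the offsets are the same for other spreadsheets.

```python
from datetime import date


def reference_date(period_start, period_end):
    """Return the date a 7-day average is assigned to: the last day in the range."""
    if (period_end - period_start).days != 6:
        raise ValueError("expected a 7-day period")
    return period_end


# Periods reported in the 2022-01-07 spreadsheet, as quoted in the comment above.
positivity_ref = reference_date(date(2021, 12, 29), date(2022, 1, 4))
volume_ref = reference_date(date(2021, 12, 25), date(2021, 12, 31))

# Number of days by which the test volume reference date trails positivity.
lag_days = (positivity_ref - volume_ref).days
print(positivity_ref, volume_ref, lag_days)
```

The resulting gap matches the "4 days later" wait mentioned in the second approach.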

@krivard
Contributor

krivard commented Feb 21, 2022

oh yuck, i'd forgotten about that.

@ryantibs this means that we can't easily fill in testing volume for the sample_size column in test positivity either. I'll add this as a discussion item for the next Leads meeting.

@krivard
Contributor

krivard commented Feb 24, 2022

Discussion of decision and alternatives in PRD

TL;DR: given test positivity reference date X and test volume reference date Y coming from a single CPR file,

  • generate test positivity signal files named for date X containing value from test positivity at X and stdev/sample_size from test volume at Y
  • censor based on test volume at Y
  • document this choice obsessively in code and API docs
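
A rough sketch of the censoring step in the TL;DR, assuming per-county dictionaries for positivity at X and test volume at Y. The function name, the output field names (`geo_id`, `val`, `sample_size`), and the choice to drop counties with no volume value are assumptions for illustration, not the pipeline's actual code; the stdev column is left out.

```python
# Minimum sample size proposed earlier in the thread (drop 5 or fewer tests).
MIN_TESTS = 6


def build_positivity_rows(positivity_at_x, volume_at_y, min_tests=MIN_TESTS):
    """Pair positivity at date X with test volume at date Y, then censor.

    NOTE: X and Y differ by a few days because both come from the same
    CPR file; volume at Y is used as a stand-in sample size for X.
    """
    rows = []
    for county, positivity in positivity_at_x.items():
        volume = volume_at_y.get(county)
        # Assumption: a county with no volume value is suppressed as well.
        if volume is None or volume < min_tests:
            continue
        rows.append({"geo_id": county, "val": positivity, "sample_size": volume})
    return rows


# Made-up values for illustration only.
positivity_at_x = {"01001": 0.10, "01003": 0.50, "01005": 0.20, "01007": 0.25}
volume_at_y = {"01001": 40, "01003": 5, "01007": 6}
kept = build_positivity_rows(positivity_at_x, volume_at_y)
```

In this example only the counties with a volume of at least 6 survive, and the rows would be written to a file named for date X.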

@nmdefries
Contributor Author

Add dsew keyword for searchability
