
nhsn doesn't appear on covidcast_epidata() source or signal lists #300

Closed
brookslogan opened this issue Jan 29, 2025 · 10 comments · Fixed by #306

@brookslogan
Contributor

nhsn doesn't appear in the covidcast_epidata() source or signal lists, but the doc pages (nhsn, hhs) suggest that it should. This might be an upstream metadata issue.

library(epidatr)
#> ! epidatr cache is being used (set env var EPIDATR_USE_CACHE=FALSE if not
#>   intended).
#> ℹ The cache directory is ~/.cache/R/epidatr.
#> ℹ The cache will be cleared after 14 days and will be pruned if it exceeds 4096
#>   MB.
#> ℹ The log of cache transactions is stored at ~/.cache/R/epidatr/logfile.txt.
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> 
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
cce <- covidcast_epidata()
"hhs" %in% names(cce$sources)
#> [1] TRUE
"nhsn" %in% names(cce$sources)
#> [1] FALSE
"hhs" %in% (cce$signals %>% as_tibble() %>% .$source)
#> [1] TRUE
"nhsn" %in% (cce$signals %>% as_tibble() %>% .$source)
#> [1] FALSE

Created on 2025-01-29 with reprex v2.1.1

@melange396
Contributor

nhsn is definitely in the metadata. You can see it listed as an available source in epivis, and the following Python returns "True":

import requests
print('nhsn' in {m['data_source'] for m in requests.get("https://api.delphi.cmu.edu/epidata/covidcast_meta/").json()['epidata']})

Perhaps this is a caching issue?

@brookslogan
Contributor Author

@melange396 This isn't based on covidcast_meta but on covidcast/meta, and nhsn doesn't seem to be in the latter.

@melange396
Contributor

Ah, OK! I should have checked which of the two covidcast meta endpoints you were using. I assumed it was /covidcast_meta because we made that one cacheable specifically for the old covidcast R client. (That is kind of a long story, but it was to ensure CRAN compliance.)

The endpoint you are using only lists signals that have individual write-ups in the "signals" Google doc, which includes description text and other details. NHSN apparently has not been added to that doc yet. I will see what I can do about getting that fixed.
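
To make the difference between the two endpoints concrete, here is a minimal Python sketch. It lists the data sources that /covidcast_meta reports but /covidcast/meta omits. It works on already-parsed JSON, so it runs offline.

It makes two assumptions:

* The /covidcast_meta payload has the {"epidata": [{"data_source": ...}, ...]} shape used in the one-liner earlier in this thread.
* The /covidcast/meta payload is a list of objects that each carry a "source" key, matching the field epidatr/R/covidcast.R reads. This shape is an assumption.

The function name and sample data are hypothetical.

```python
def sources_missing_from_described_meta(covidcast_meta, described_meta):
    """Return data sources listed by /covidcast_meta but absent from /covidcast/meta.

    covidcast_meta: parsed /covidcast_meta JSON, assumed {"epidata": [{"data_source": ...}, ...]}
    described_meta: parsed /covidcast/meta JSON, assumed [{"source": ...}, ...]
    """
    # Every source that has data, one row per (source, signal, ...) combination.
    available = {row["data_source"] for row in covidcast_meta["epidata"]}
    # Only sources with a write-up in the "signals" doc show up here.
    described = {entry["source"] for entry in described_meta}
    return sorted(available - described)


# Hypothetical, trimmed-down payloads mirroring the situation in this issue.
covidcast_meta = {"epidata": [{"data_source": "hhs"}, {"data_source": "nhsn"}]}
described_meta = [{"source": "hhs"}]

print(sources_missing_from_described_meta(covidcast_meta, described_meta))  # ['nhsn']
```

Against the live API, you would fill both arguments from the JSON responses of the two endpoints. A non-empty result would flag sources whose write-ups are still missing.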

If you are going to continue using this metadata endpoint, you might consider adding accessors in the epidatr client for some of the attributes it provides, like description, has_sample_size, has_stderr, is_cumulative, is_smoothed, is_weighted, active, and format.
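
A minimal sketch of such accessors follows, written in Python for brevity. It assumes each signal entry is a dict keyed by the attribute names listed above. signal_attributes is a hypothetical helper, not part of epidatr. Absent attributes come back as None rather than raising, so an incomplete write-up for one signal does not break the whole listing.

```python
# Attribute names taken from the covidcast/meta fields mentioned above.
SIGNAL_ATTRIBUTES = (
    "description", "has_sample_size", "has_stderr", "is_cumulative",
    "is_smoothed", "is_weighted", "active", "format",
)


def signal_attributes(signal_entry):
    """Return the selected attributes of one signal entry, using None for any that are missing."""
    return {name: signal_entry.get(name) for name in SIGNAL_ATTRIBUTES}


# Hypothetical entry; real entries carry more fields.
entry = {"signal": "confirmed_admissions_covid_ew", "active": True, "is_smoothed": False}
attrs = signal_attributes(entry)
print(attrs["active"], attrs["has_stderr"])  # True None
```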

@melange396
Contributor

Four NHSN signals went into the description metadata with cmu-delphi/delphi-epidata#1599. They will be publicly available as soon as a delphi-epidata release is done.

@melange396
Contributor

This should be fixed on the server-side now... Can you verify?

@brookslogan
Contributor Author

brookslogan commented Mar 4, 2025

It's sort of fixed, sort of more broken client-side:

library(epidatr)
#> ! epidatr cache is being used (set env var EPIDATR_USE_CACHE=FALSE if not
#>   intended).
#> ℹ The cache directory is ~/.cache/R/epidatr.
#> ℹ The cache will be cleared after 14 days and will be pruned if it exceeds 4096
#>   MB.
#> ℹ The log of cache transactions is stored at ~/.cache/R/epidatr/logfile.txt.
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> 
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
cce <- covidcast_epidata()
"hhs" %in% names(cce$sources)
#> [1] TRUE
"nhsn" %in% names(cce$sources)
#> [1] TRUE
"hhs" %in% (cce$signals %>% as_tibble() %>% .$source)
#> Error in `map_chr()` at epidatr/R/covidcast.R:80:3:
#> ℹ In index: 457.
#> ℹ With name: nhsn:confirmed_admissions_covid_ew.
#> Caused by error:
#> ! Result must be length 1, not 0.
"nhsn" %in% (cce$signals %>% as_tibble() %>% .$source)
#> Error in `map_chr()` at epidatr/R/covidcast.R:80:3:
#> ℹ In index: 457.
#> ℹ With name: nhsn:confirmed_admissions_covid_ew.
#> Caused by error:
#> ! Result must be length 1, not 0.
cce$sources
#> $chng
#> [1] "Change Healthcare"
#> [1] "chng"
#> [1] "Change Healthcare is a healthcare technology company that aggregates medical claims data from many healthcare providers. This source includes aggregated counts of claims with confirmed COVID-19 or COVID-related symptoms. All claims data has been de-identified in accordance with HIPAA privacy regulations. "
#> # A tibble: 8 × 2
#>   signal                        short_description                               
#>   <chr>                         <chr>                                           
#> 1 smoothed_outpatient_cli       Estimated percentage of outpatient doctor visit…
#> 2 smoothed_adj_outpatient_cli   Estimated percentage of outpatient doctor visit…
#> 3 smoothed_outpatient_covid     COVID-Confirmed Doctor Visits                   
#> 4 smoothed_adj_outpatient_covid COVID-Confirmed Doctor Visits                   
#> 5 smoothed_outpatient_flu       Estimated percentage of outpatient doctor visit…
#> 6 smoothed_adj_outpatient_flu   Estimated percentage of outpatient doctor visit…
#> 7 7dav_inpatient_covid          Ratio of inpatient hospitalizations associated …
#> 8 7dav_outpatient_covid         Ratio of outpatient doctor visits with confirme…
# [...]
# [...]
# [...]
#> $nssp
#> [1] "National Syndromic Surveillance Program"
#> [1] "nssp"
#> [1] "The National Syndromic Surveillance Program (NSSP) is an effort to track epidemiologically relevant conditions. This dataset in particular tracks emergency department (ED) visits arising from a subset of influenza-like illnesses, specifically influenza, COVID-19, and respiratory syncytial virus (RSV)."
#> # A tibble: 8 × 2
#>   signal                           short_description                            
#>   <chr>                            <chr>                                        
#> 1 pct_ed_visits_covid              Percent of ED visits that had a discharge di…
#> 2 pct_ed_visits_influenza          Percent of ED visits that had a discharge di…
#> 3 pct_ed_visits_rsv                Percent of ED visits that had a discharge di…
#> 4 pct_ed_visits_combined           Percent of ED visits that had a discharge di…
#> 5 smoothed_pct_ed_visits_covid     3-week moving average of percent of ED visit…
#> 6 smoothed_pct_ed_visits_influenza 3-week moving average of percent of ED visit…
#> 7 smoothed_pct_ed_visits_rsv       3-week moving average of percent of ED visit…
#> 8 smoothed_pct_ed_visits_combined  3-week moving average of percent of ED visit…
#> 
#> $nhsn
#> [1] "National Healthcare Safety Network"
#> [1] "nhsn"
#> [1] "The National Healthcare Safety Network (NHSN) is the nation’s most widely used healthcare-associated infection tracking system. "
#> Error in `map_chr()` at epidatr/R/covidcast.R:80:3:
#> ℹ In index: 1.
#> ℹ With name: confirmed_admissions_covid_ew.
#> Caused by error:
#> ! Result must be length 1, not 0.
cce$signals
#> Error in `map_chr()` at epidatr/R/covidcast.R:80:3:
#> ℹ In index: 457.
#> ℹ With name: nhsn:confirmed_admissions_covid_ew.
#> Caused by error:
#> ! Result must be length 1, not 0.

Created on 2025-03-04 with reprex v2.1.1

@brookslogan
Contributor Author

The current issue, at least for printing the signal list, seems to be the lack of a "value_label" entry for nhsn signals.

cce$signals %>% purrr::map("value_label")
#> $`chng:smoothed_outpatient_cli`
#> [1] "Value"
#> 
#> $`chng:smoothed_adj_outpatient_cli`
#> [1] "Value"
# [...]
# [...]
# [...]
#> $`nssp:smoothed_pct_ed_visits_rsv`
#> [1] "Percentage"
#> 
#> $`nssp:smoothed_pct_ed_visits_combined`
#> [1] "Percentage"
#> 
#> $`nhsn:confirmed_admissions_covid_ew`
#> NULL
#> 
#> $`nhsn:hosprep_confirmed_admissions_covid_ew`
#> NULL
#> 
#> $`nhsn:confirmed_admissions_covid_ew_prelim`
#> NULL
#> 
#> $`nhsn:hosprep_confirmed_admissions_covid_ew_prelim`
#> NULL
#> 
#> $`nhsn:confirmed_admissions_flu_ew`
#> NULL
#> 
#> $`nhsn:hosprep_confirmed_admissions_flu_ew`
#> NULL
#> 
#> $`nhsn:confirmed_admissions_flu_ew_prelim`
#> NULL
#> 
#> $`nhsn:hosprep_confirmed_admissions_flu_ew_prelim`
#> NULL
#> 
#> $`nhsn:confirmed_admissions_rsv_ew`
#> NULL
#> 
#> $`nhsn:hosprep_confirmed_admissions_rsv_ew`
#> NULL
#> 
#> $`nhsn:confirmed_admissions_rsv_ew_prelim`
#> NULL
#> 
#> $`nhsn:hosprep_confirmed_admissions_rsv_ew_prelim`
#> NULL
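
The same check can be automated, as in this minimal Python sketch. It assumes the signal metadata has been parsed into a dict keyed by source:signal, like the R list above. signals_missing_field and the sample data are hypothetical. Both an absent key and a key explicitly set to null are reported, since both print as NULL in the R output.

```python
def signals_missing_field(signals, field):
    """List the source:signal keys whose entry lacks `field` or has it set to None."""
    return [key for key, entry in signals.items() if entry.get(field) is None]


# Hypothetical sample mirroring the output above.
signals = {
    "nssp:smoothed_pct_ed_visits_combined": {"value_label": "Percentage"},
    "nhsn:confirmed_admissions_covid_ew": {},
    "nhsn:confirmed_admissions_flu_ew": {"value_label": None},
}
print(signals_missing_field(signals, "value_label"))
# ['nhsn:confirmed_admissions_covid_ew', 'nhsn:confirmed_admissions_flu_ew']
```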

@carlynvandyke

> The current issue, at least for printing the signal list, seems to be the lack of a "value_label" entry for nhsn signals.

I just added value labels in the spreadsheet.

@dsweber2
Contributor

Looking at https://cran.r-project.org/web/checks/check_results_epidatr.html, I guess we should check back on this next week. We've got until the 28th to have it working.

@melange396
Contributor

This newer problem should go away once cmu-delphi/delphi-epidata#1622 is approved and released. Still, it's not a bad idea to handle missing fields more gracefully, in case something like this happens again.

It looks like you can specify a .default argument to these calls to use an empty string or something similar instead of erroring out:

epidatr/R/covidcast.R, lines 72 to 83 in 0026856:

tib$source <- unname(map_chr(x, "source"))
tib$signal <- unname(map_chr(x, "signal"))
tib$name <- unname(map_chr(x, "name"))
tib$active <- unname(map_lgl(x, "active"))
tib$short_description <- unname(map_chr(x, "short_description"))
tib$description <- unname(map_chr(x, "description"))
tib$time_type <- unname(map_chr(x, "time_type"))
tib$time_label <- unname(map_chr(x, "time_label"))
tib$value_label <- unname(map_chr(x, "value_label"))
tib$format <- unname(map_chr(x, "format"))
tib$category <- unname(map_chr(x, "category"))
tib$high_values_are <- unname(map_chr(x, "high_values_are"))

epidatr/R/covidcast.R, lines 187 to 191 in 0026856:

tib$source <- unname(map_chr(x, "source"))
tib$name <- unname(map_chr(x, "name"))
tib$description <- unname(map_chr(x, "description"))
tib$reference_signal <- unname(map_chr(x, "reference_signal"))
tib$license <- unname(map_chr(x, "license"))
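
In purrr, the string-extractor form of map_chr() accepts a .default argument, for example map_chr(x, "value_label", .default = NA_character_). It returns the default when the element is NULL or absent instead of erroring.

The Python sketch below mirrors that strict-versus-default behaviour so the difference is easy to see. extract_field is a hypothetical helper. Whether NA or "" makes the better default is a design choice for the epidatr maintainers.

```python
_MISSING = object()  # sentinel: "no default supplied", distinct from None


def extract_field(records, field, default=_MISSING):
    """Pull `field` out of every record.

    Without a default, a missing or None value raises, like a bare
    map_chr(x, field). With a default, that value is substituted instead.
    """
    out = []
    for i, record in enumerate(records, start=1):
        value = record.get(field)
        if value is None:
            if default is _MISSING:
                raise ValueError(f"index {i}: field {field!r} is missing")
            out.append(default)
        else:
            out.append(value)
    return out


records = [{"value_label": "Percentage"}, {}]  # second record mimics an nhsn signal
print(extract_field(records, "value_label", default=None))  # ['Percentage', None]
try:
    extract_field(records, "value_label")
except ValueError as err:
    print(err)  # index 2: field 'value_label' is missing
```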

dsweber2 added a commit that referenced this issue Mar 14, 2025
dsweber2 added a commit that referenced this issue Mar 18, 2025
Fixing #300 by giving default values for fields