nssp pipeline code #1952


Merged: 71 commits into main on Jun 10, 2024

Conversation

minhkhul (Contributor)

Description

Add 8 signals from source nssp

@minhkhul requested review from dsweber2 and nmdefries on April 17, 2024

@dsweber2 (Contributor) left a comment:

Ran the pipeline and it seems to be pulling correctly, and the fields that are generated make sense. Generally it looks good; I have some minor cosmetic suggestions.

I guess the archive differ is the only part that really remains?

Contributor

Ran these as it suggests and things run fine. The linter is a little angry either way:

limit = 50000  # maximum limit allowed by SODA 2.0
while True:
    page = client.get("rdmq-nq56", limit=limit, offset=offset)
    if not page:
        break  # exit the loop if no more results
    results.extend(page)
    offset += limit

Contributor

Suggested change:
limit = 50000  # maximum limit allowed by SODA 2.0
while True:
    page = client.get("rdmq-nq56", limit=limit, offset=offset)
    if not page:
        break  # exit the loop if no more results
    results.extend(page)
    offset += limit
limit = 50_000
max_ii = 100  # sentinel: stays 100 if the loop never breaks
for ii in range(100):
    page = client.get("rdmq-nq56", limit=limit, offset=offset)
    if not page:
        max_ii = ii
        break  # exit the loop if no more results
    results.extend(page)
    offset += limit
if max_ii == 100:
    raise ValueError("client has pulled 100x the socrata limit")

This is probably fine, but while True freaks me out. Feel free to use it or not.

Contributor

@dsweber2 why did you choose 100 here for the limit in your rewrite? I believe 50k is the per-page item limit, not the total limit. So theoretically a result could span arbitrarily many pages.

Contributor

100 was maybe too low, though it would correspond to 5,000,000 items. If we're pulling more than that, it should be quite a while down the road, or something has gone wrong (or I may be misunderstanding how item counts work).
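
For reference, a more idiomatic sketch of the bounded loop (assuming the same client and dataset id as in the snippets above) uses Python's for/else, so the error fires only if all 100 pages are consumed without ever hitting an empty page:

results, offset, limit = [], 0, 50_000
for _ in range(100):  # cap at 100 pages of 50,000 rows = 5,000,000 items
    page = client.get("rdmq-nq56", limit=limit, offset=offset)
    if not page:
        break  # no more results
    results.extend(page)
    offset += limit
else:
    # all 100 iterations ran without ever seeing an empty page
    raise ValueError("client has pulled 100x the socrata limit")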

minhkhul and others added 3 commits on April 19, 2024 (Co-authored-by: David Weber <[email protected]>)

@nmdefries (Contributor) left a comment:

This file seems to be unused: nssp/tests/test_data/page.txt

Some nits and style suggestions, plus a question about which geos we're reporting.

@minhkhul requested a review from nmdefries on April 25, 2024

@melange396 (Contributor)

Oh, good! I thought I remembered having a discussion about that but wasn't sure if it was for this or something else.

@melange396 (Contributor) left a comment:

I'm gonna do a deeper look at the core code, but I did a first pass to see if I could narrow down which files I needed to inspect more closely; I knocked off 29/46 files that way!

There are a few things that seem to be out of scope for this PR, like all the max-line-length=120 settings in the pylint configs of the other indicators... Are they going to break the lint steps in other indicators?

local({

  # the requested version of renv
  version <- "1.0.7"

Contributor

This file seems to be imported from some external source. Will we need to do anything to keep it synced/up-to-date? I presume (alongside notebooks/renv.lock) it should continue to "just work" until we need it to do something it can't already do?

Contributor

I'm not 100% sure this is the right place to keep these notebooks, to be honest, but I feel like the current set of notes about adding indicators is way too scattered. None of this folder is really intended to be automated.

renv/ and renv.lock are maintained via a CLI in R, and allow for pinning versions of dependencies (in this case, the notebooks' dependencies).

@nmdefries (Contributor) commented Jun 4, 2024:

Statistical review from Roni and follow-up on Slack:

I just finished reviewing this. Overall it looks great -- thank you all for your work on this! I do have some comments/answers/questions:

Nat: Looks like there is data that is not getting included in the aggregation step -- we'll want to think about what to do with that.

Was this resolved? If so, how?

Nat: What is the meaning of a missing value in the incoming data? Censored for privacy? 0? Too small a sample size to accurately report?

Was this resolved? If so, how?

Nat: Think about whether we should do any filtering/cleaning, e.g. low sample size in covid tests causing high variability in test positivity rate.
Minh: There are some outliers in low-population counties, creating stats like 50-100% of ER visits being covid/flu related in those places. There are around 10 of these outliers spread among the signals. This makes sense because the signals are percentages: if county A has only one ER visit this week and that visit is covid-related, then the data shows 100% of all ER visits in county A as covid-related.

This is a known problem but one we cannot solve for now. I am fairly sure that CDC would want us to keep all the values, even extreme ones like 100% or 50%. But we should flag this in our documentation.

RE “Covid Lag analysis”, I think you have the wrong data viz here (it’s a copy of the one above it, but should be correlation vs. lag). Not crucial for releasing the signal, but we should fix it wherever you end up storing the analysis. Speaking of which: where do we usually store these analyses for new signals? Are they linked to from the documentation?

As you showed, there are a good number of discrepancies between the original state-level values and the state-level values reconstructed from lower geo levels. Did you figure out the full reason(s) for that? Are these possibly rounding errors? Something else? This too should be discussed in the documentation.

That's all I could find. To the substance of the signal: the correlations look awesome, esp. for flu -- this was far from assured! ED and hosp admissions are very different things, and come from very different reporting systems. But this bodes extremely well for estimating hosp trends without hosp reporting.

@nmdefries (Contributor) left a comment:

Looks good

@@ -0,0 +1,257 @@
---
title: "Correlation Analyses for COVID-19 Indicators"

Contributor

note: I am not checking this or other notebook items

from .pull import pull_nssp_data


def add_needed_columns(df, col_names=None):

@nmdefries (Contributor) commented Jun 4, 2024:

suggestion (optional): to make this more robust, add an assert to make sure that our set of missing column names doesn't include important ones (like geo_id and value).
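
A minimal sketch of that suggestion (the essential-column set and the default column list here are illustrative, not the indicator's actual ones):

ESSENTIAL_COLUMNS = {"geo_id", "value"}  # must already be present in the data

def add_needed_columns(df, col_names=None):
    """Add standard output columns that the source data does not provide."""
    if col_names is None:
        col_names = ["se", "sample_size"]  # illustrative defaults
    overlap = ESSENTIAL_COLUMNS.intersection(col_names)
    # refuse to fabricate essential columns like geo_id and value
    assert not overlap, f"refusing to fill essential columns: {overlap}"
    for col in col_names:
        df[col] = None
    return df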

@nmdefries requested a review from melange396 on June 5, 2024

@nmdefries (Contributor)

Next steps:

  • check that this runs and output looks correct on staging
  • loop in Brian to get this set up in production
  • I will review the docs

@nmdefries (Contributor)

From Roni, let's rename the signals by prefixing "visits" with "ed", e.g. pct_visits_covid to pct_ed_visits_covid.
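
A minimal sketch of the rename (the helper name is made up; only the example signal comes from the source):

def rename_signal(name: str) -> str:
    """Prefix "visits" with "ed", e.g. pct_visits_covid -> pct_ed_visits_covid."""
    return name.replace("pct_visits_", "pct_ed_visits_")

assert rename_signal("pct_visits_covid") == "pct_ed_visits_covid"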

This is ready to go whenever that is finished.

@minhkhul (Contributor, Author)

@nmdefries cool. I'll get that going.

@melange396 (Contributor)

When merging this, please do not forget to do a squash.

@minhkhul merged commit ae6f011 into main on Jun 10, 2024
16 checks passed