Return google symptoms to using current pandas version, with mitigations #1497

nmdefries · 2022-01-27T20:51:40Z

Description

Followup to #1496.

Adapt the pipeline to work correctly with the new version of pandas. Add a test that checks for nulled-out rows. Against the unpatched code, the test fails under pandas 1.4.0 and passes under 1.3.5, which matches the expected behavior.

Pass numeric_only=True to the groupby sum to suppress a FutureWarning about the default value changing.

Changelog

setup.py
pull.py
geo.py

Fixes

Replace deprecated DataFrame.append().
Explicitly convert dbdate to datetime64 to prevent data from being nulled out.

The bug is probably related to the datetime-like changes in pandas 1.4.0 and an issue they fixed, pandas-dev/pandas#42921.
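As a rough illustration of the explicit conversion, the sketch below assumes the date column arrives as plain Python date objects in an object-dtype column. The real pipeline receives the dbdate extension dtype from BigQuery, which is not reproduced here, and the column names and sample values are hypothetical.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for a query result: dates stored as Python
# `date` objects (object dtype) rather than the real `dbdate` dtype.
df = pd.DataFrame({
    "date": [datetime.date(2022, 9, 16), datetime.date(2022, 9, 17)],
    "symptom_value": [1.5, 2.5],
})

# Convert explicitly so downstream code always sees a datetime64 column.
df["date"] = pd.to_datetime(df["date"])

# Guard in the spirit of the new test: flag any row that is entirely null.
rows_all_null = df.isnull().all(axis=1)
```

A check such as `assert not rows_all_null.any()` after preprocessing would catch the nulled-out rows described above, whatever their exact cause turns out to be.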

dshemetov · 2022-09-30T20:08:30Z

google_symptoms/delphi_google_symptoms/geo.py

@@ -37,7 +37,7 @@ def generate_transition_matrix(geo_res):
    if geo_res == "hrr":
        map_df["population"] = map_df["population"] *  map_df["weight"]

-    aggregated_pop = map_df.groupby(geo_res).sum().reset_index()
+    aggregated_pop = map_df.groupby(geo_res).sum(numeric_only=True).reset_index()


Wondering about numeric_only=True. The type hint in the function signature shows no default, but the docs below it say it defaults to True. Guessing that it actually tries to use NaNs by default?

nmdefries · 2022-09-30

I changed this because I'm getting a warning about to-be-deprecated behavior. Thought I'd just throw it in here.

FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.

nmdefries · 2022-09-30

The current default does actually seem to be numeric_only=True.

dshemetov · 2022-09-30

Ah, thanks for that! Curious why they're moving away from numeric_only=True.
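To make the effect of the flag concrete, here is a minimal sketch using a made-up mapping frame. The state_id column and all values are hypothetical; only the groupby call mirrors the changed line in geo.py.

```python
import pandas as pd

# Hypothetical mapping table with one non-numeric column besides the key.
map_df = pd.DataFrame({
    "hrr": ["1", "1", "2"],
    "state_id": ["al", "al", "ga"],
    "population": [100.0, 50.0, 200.0],
    "weight": [0.5, 1.0, 1.0],
})
map_df["population"] = map_df["population"] * map_df["weight"]

# Being explicit keeps the result independent of the library default:
# non-numeric columns such as state_id are excluded from the sum.
aggregated_pop = map_df.groupby("hrr").sum(numeric_only=True).reset_index()
```

Passing the argument explicitly means the output columns stay the same whether the installed pandas defaults to True or, as the warning announces, to False.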

krivard

All good so long as we expect to only ever iterate over a small number of timestamps
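The concern about looping over timestamps ties in with replacing DataFrame.append() and with the later commit "preallocate output dfs list and concat outside loop for speed". A minimal sketch of that pattern follows, assuming one small frame per date; the frame contents and variable names are hypothetical and not taken from pull.py.

```python
import pandas as pd

# Hypothetical dates standing in for the timestamps being iterated over.
dates = pd.date_range("2022-09-16", periods=3, freq="D")

# Preallocate the list, fill it inside the loop, and concatenate once at the end.
frames = [None] * len(dates)
for i, date in enumerate(dates):
    # In the real pipeline this would be the result of a per-date query.
    frames[i] = pd.DataFrame({"date": [date], "value": [float(i)]})

combined = pd.concat(frames, ignore_index=True)
```

DataFrame.append() was deprecated in pandas 1.4.0 and copies the accumulated frame on every call, so collecting the pieces and calling pd.concat once avoids repeated copying as the number of timestamps grows.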

nmdefries · 2022-10-04T17:22:44Z

@krivard This is ready to merge.

nmdefries added 6 commits January 27, 2022 15:40

replace deprecated pandas functions

29e15e7

convert datetime type

3c25127

return to current pandas version

5dec58a

Merge branch 'main' into ndefries/gs-deprecated-pandas-fns

dd36917

test no rows dropped during preprocess

88dd5d3

Merge branch 'main' into ndefries/gs-deprecated-pandas-fns

beb0f4c

nmdefries marked this pull request as ready for review September 30, 2022 17:40

nmdefries added 3 commits September 30, 2022 13:59

add new test file

7d4d762

explicitly set sum::numeric_only to suppress warning

ab23148

make new test file smaller

2e9a978

nmdefries requested a review from krivard September 30, 2022 18:12

dshemetov reviewed Sep 30, 2022

krivard approved these changes Oct 3, 2022


nmdefries added 2 commits October 3, 2022 16:26

preallocate output dfs list and concat outside loop for speed

052bdbc

use existing test data for test_null_rows

1872b81

krivard merged commit 2d9df04 into main Oct 17, 2022

krivard deleted the ndefries/gs-deprecated-pandas-fns branch October 17, 2022 13:23

krivard mentioned this pull request Oct 19, 2022

Release covidcast-indicators 0.3.25 #1709

Merged
