Return google symptoms to using current pandas version, with mitigations #1497
@@ -37,7 +37,7 @@ def generate_transition_matrix(geo_res):
     if geo_res == "hrr":
         map_df["population"] = map_df["population"] * map_df["weight"]

-    aggregated_pop = map_df.groupby(geo_res).sum().reset_index()
+    aggregated_pop = map_df.groupby(geo_res).sum(numeric_only=True).reset_index()
Wondering about the numeric_only=True. The type hint in the function signature says there's no default, but the docs below it say it defaults to True. Guessing that it actually tries to use NaNs by default?
I changed this because I'm getting a warning about soon-to-be-deprecated behavior, so I thought I'd just throw the fix in here:
FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
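A minimal sketch of what the one-line change does, using a hypothetical stand-in for map_df (the real column set isn't shown in this diff; the state_id column here is an assumption). Passing numeric_only=True explicitly keeps the current behavior of summing only numeric columns, so the result no longer depends on which default a given pandas version uses and the FutureWarning goes away.

```python
import pandas as pd

geo_res = "hrr"

# Hypothetical stand-in for map_df: a geo key, a non-numeric label column,
# and the numeric population/weight columns from the diff.
map_df = pd.DataFrame({
    "hrr": [1, 1, 2],
    "state_id": ["ak", "ak", "al"],
    "population": [100.0, 50.0, 30.0],
    "weight": [0.5, 1.0, 1.0],
})

if geo_res == "hrr":
    map_df["population"] = map_df["population"] * map_df["weight"]

# Explicit numeric_only=True: only numeric columns are summed, and the
# non-numeric state_id column is dropped from the aggregate.
aggregated_pop = map_df.groupby(geo_res).sum(numeric_only=True).reset_index()
print(aggregated_pop)
```

Pinning the argument is the option the warning itself suggests ("Either specify numeric_only or select only columns which should be valid for the function").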
The current default does actually seem to be numeric_only=True.
Ah, thanks for that! Curious why they're moving away from numeric_only=True.
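The thread doesn't answer the "why", but the visible difference can be sketched. Assuming pandas 2.x, where the new default has landed, numeric_only=False no longer silently drops non-numeric columns; it tries to aggregate them instead, which for strings means concatenation. The frame below is purely illustrative.

```python
import pandas as pd

# Illustrative frame: one non-numeric ("nuisance") column next to a numeric one.
df = pd.DataFrame({
    "geo": [1, 1, 2],
    "label": ["a", "b", "c"],
    "population": [10, 20, 30],
})

# numeric_only=True: the string column is silently left out of the result.
numeric = df.groupby("geo").sum(numeric_only=True)

# numeric_only=False: every column is aggregated; summing strings concatenates them.
everything = df.groupby("geo").sum(numeric_only=False)

print(numeric.columns.tolist())      # ['population']
print(everything["label"].tolist())  # ['ab', 'c']
```

Either way, passing the argument explicitly makes the intended behavior visible in the code rather than depending on the library default.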
All good, so long as we only ever expect to iterate over a small number of timestamps.
google_symptoms/tests/test_data/state_data_20220916-20220924.csv
@krivard This is ready to merge.
Description
Follow-up to #1496.
Adapt the pipeline to work correctly with the new version of pandas. Add a test that checks for nulled-out rows; it fails with pandas 1.4.0 and passes with 1.3.5, which matches the expected behavior.
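The test itself isn't shown here. A minimal sketch of such a check, assuming "nulled-out" means a row whose value columns are all NaN; count_nulled_rows, the column names, and the inline frame are hypothetical. A real test would build the frame by running the pull step on saved test data such as google_symptoms/tests/test_data/state_data_20220916-20220924.csv.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def count_nulled_rows(df: pd.DataFrame, value_cols: Sequence[str]) -> int:
    """Count rows in which every value column is NaN (hypothetical helper)."""
    return int(df[list(value_cols)].isna().all(axis=1).sum())


def test_output_has_no_nulled_rows():
    # Hypothetical pipeline output; a single NaN is fine, an all-NaN row is not.
    output = pd.DataFrame({
        "geo_id": ["ak", "al"],
        "symptom_a": [1.0, 2.0],
        "symptom_b": [0.5, np.nan],
    })
    assert count_nulled_rows(output, ["symptom_a", "symptom_b"]) == 0
```

Checking "all value columns NaN" rather than "any NaN" keeps the test from flagging legitimately missing individual symptoms.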
Turn on numeric_only=True to suppress a future deprecation warning.

Changelog
- setup.py
- pull.py
- geo.py

Fixes
- DataFrame.append()
- Convert .dbdate to datetime64 to prevent data from being nulled out.

The bug is probably related to the datetime-like changes in pandas 1.4.0 and an issue they fixed, pandas-dev/pandas#42921.
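The Fixes list mentions DataFrame.append() and converting dbdate to datetime64. A minimal sketch of both patterns, assuming a per-day pull loop: fetch_one_day, the column names, and the date grid are hypothetical. Dates are simulated as Python datetime.date objects (object dtype) because the real dbdate dtype needs the db-dtypes package. DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, so the sketch collects pieces and calls pd.concat once.

```python
import datetime

import pandas as pd


def fetch_one_day(day: datetime.date) -> pd.DataFrame:
    """Hypothetical stand-in for one BigQuery pull.

    Dates come back as datetime.date objects (object dtype), standing in
    for the dbdate dtype.
    """
    return pd.DataFrame({
        "date": [day],
        "geo_id": ["ak"],
        "symptom_value": [float(day.day)],
    })


days = [datetime.date(2022, 9, 16) + datetime.timedelta(days=i) for i in range(3)]

# Instead of growing a frame with DataFrame.append() inside the loop,
# collect the pieces and concatenate once.
pieces = [fetch_one_day(day) for day in days]
df = pd.concat(pieces, ignore_index=True)

# Convert to datetime64 before aligning against pandas date ranges.
df["date"] = pd.to_datetime(df["date"])

# Hypothetical full date grid the pipeline might align against.
grid = pd.DataFrame({"date": pd.date_range("2022-09-16", periods=3), "geo_id": "ak"})
aligned = grid.merge(df, on=["date", "geo_id"], how="left")
print(aligned)
```

With both date columns as datetime64, the left merge matches every day, so no row comes back with a missing value.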