-
Notifications
You must be signed in to change notification settings - Fork 16
Doctor_visits patching code #1977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
doctor_visits/delphi_doctor_visits/download_claims_ftp_files.py
Outdated
Show resolved
Hide resolved
@@ -51,9 +51,13 @@ def change_date_format(name): | |||
name = '_'.join(split_name) | |||
return name | |||
|
|||
def download(ftp_credentials, out_path, logger): | |||
def download(ftp_credentials, out_path, logger, issue=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general thought: can we introduce using typing for the function parameters? I checked that typing was introduced as a part of the standard library at 3.5 so should be able to use it for this repo. As a newcomer it would be helpful to know the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aysim319 can you turn this request into an issue?
I know we mentioned testing in person yesterday and was wondering if you were planning to add a basic pytest for sanity checks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made some comments about adding example input/output.
Please add unit tests for download()
and get_latest_filename
that use the issue_date
arg to make sure that is working as desired. Merge at will once that's done.
Everything else looks good.
if not issue_date: | ||
current_time = datetime.datetime.now() | ||
else: | ||
current_time = datetime.datetime.strptime(issue_date, "%Y-%m-%d").replace(hour=23, minute=59, second=59) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (optional): since we're using this chunk in multiple places, consider factoring it out into a separate function. (2 duplicates is okay to leave, if we use this in a third place, we definitely should factor it out.)
To use this module, you need to specify the range of issue dates in params.json. | ||
|
||
It will generate data for that range of issue dates, and store them in batch issue format: | ||
[name-of-patch]/issue_[issue-date]/doctor-visits/actual_data_file.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: add example output
[name-of-patch]/issue_[issue-date]/doctor-visits/actual_data_file.csv | |
[name-of-patch]/issue_[issue-date]/doctor-visits/actual_data_file.csv | |
The run using the above example `params.json` will create a directory 'covidcast-indicators/doctor_visits/AprilPatch' containing: | |
AprilPatch/ | |
issue_20240420/ | |
doctor-visits/ | |
20240408_state_smoothed_adj_cli.csv | |
20240408_state_smoothed_cli.csv | |
... | |
issue_20240421/ | |
doctor-visits/ | |
20240409_state_smoothed_adj_cli.csv | |
20240408_state_smoothed_cli.csv | |
... |
question: do we have or do we need a helper script to put these files into the DB? That sounds useful. Due to the output dir structure defined here, this seems like it will need a specialized acquisition script. That would belong in |
Co-authored-by: nmdefries <[email protected]>
@nmdefries No we don't need new script to go from csv to db. Acquisition on epidata already includes code that allow files in this structure to get put into db using the flag |
Description
This change automates multi-day patching for
doctor_visits
indicators, producingcsv
files organized in batch issue format:[name-of-patch]/issue_[issue-date]/nssp/actual_data_file.csv
, making it easier to ingest large batches of patch data.Users start by specify in
params.json
the start issue and end issue of the patch, along with a directory where output is stored in. For example:the run patch.py will create a directory 'covidcast-indicators/doctor_visits/AprilPatch' containing: