Skip to content

fix logger bug for batch issue upload #622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 12, 2021
Merged

fix logger bug for batch issue upload #622

merged 12 commits into from
Jul 12, 2021

Conversation

TaraLakdawala
Copy link

@TaraLakdawala TaraLakdawala commented Jun 30, 2021

change logger.warning to logger.info to avoid error

still need to add/update unit and integration tests to catch this issue in the future

@TaraLakdawala TaraLakdawala linked an issue Jun 30, 2021 that may be closed by this pull request
@TaraLakdawala TaraLakdawala changed the title changed warning to info fix logger bug for batch issue upload Jun 30, 2021
@krivard
Copy link
Contributor

krivard commented Jul 1, 2021

The line referred to in the error message is actually this one:

logger.info('processing csv files from issue date: "' + str(issue_date) + '", directory', path)

but it looks like the line you're currently repairing has the same problem. So probably just extend the fix to both locations? and hopefully the test you write will catch either of them.

@TaraLakdawala
Copy link
Author

Not sure how I missed that, but I've extended the fix to line 95. Hopefully, that should cover everything!

Testing the logger is turning out to be a bit of a challenge, so we will open a separate issue to track that.

@TaraLakdawala TaraLakdawala marked this pull request as ready for review July 1, 2021 18:25
@sgratzl sgratzl added the acquisition changes acquisition logic label Jul 2, 2021
if issuedir_match and os.path.isdir(path):
if issuedir_match:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting -- what will happen if path is a text file called issue_12345678? will the find_csv_files call on line 96 handle that gracefully, or crash?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take a try on this approach by picking up the issue_12345 by a text filed called issue_12345, here is some code of my approach:

path='prefix/to/the/data/issue_20200408.csv' 
issuedir_match = CsvImporter.PATTERN_ISSUE_DIR.match(('.').join(path.split('.')[:-1]))

it will pass on the tests but will fail on integration tests. Because the change of form of issue_path will effect file load_sensors.py, so it will need some extra changes if I want to use the text file form as issue_123456.csv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, i meant that we want the following:

  1. if someone mistakenly creates a file which matches CsvImporter.PATTERN_ISSUE_DIR but is not a directory
  2. and we try to load the directory containing that bad file as a batch import
  3. then the code should ignore the bad file

This was originally accomplished using the and os.path.isdir(path) clause. Without that clause, will the code crash in the above scenario?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! I previously hadn't make dir path to a valid dir so the test won't pass. I already add back the check code and add one more test of the invalid dir path. One more question, can I know why in the function find_csv_files check the csv path(the path format) instead to check the dir path? Should I add thisos.path.isdir(path) also in the function find_csv_files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both find_issue_specific_csv_files and find_csv_files check that the paths they produce match a regular expression because the regex is used to extract information from the path (issue, source, time_value, geo_type, and signal).

find_issue_specific_csv_files checks that the paths it produces are directories, because we've fed it bad input before by mistake 😉 . It doesn't get run very often, and each run has to be configured by hand, which means it's easy to accidentally do it wrong.

We run find_csv_files 24 times a day, so if we ever did accidentally feed it a filesystem that had directories with names that should have been normal files, or normal files with names that should have been directories, we would have fixed it in early summer 2020. All the things that feed find_csv_files are automated and less susceptible to human error.

krivard
krivard previously approved these changes Jul 8, 2021
Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@@ -1,4 +1,8 @@
# start with the `delphi_python` image
FROM delphi_python

# copy over all source files
COPY repos repos
RUN chmod -R o+r repos/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @sgratzl I added this to the Dockerfile to avoid having to repeatedly rebuild both delphi_python and delphi_web_python during development. Will this break CI or can I leave it in?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you just add this as a volume into the docker container? then you just have to restart the container. At least that is what I'm doing.

Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still good, just dropped the unnecessary docker edit 👍

@krivard krivard merged commit a09889f into dev Jul 12, 2021
@krivard krivard deleted the logging-patch branch July 12, 2021 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acquisition changes acquisition logic
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repair logging statement in find_issue_specific_csv_files
4 participants