CI/BLD: Restrict ci/code_checks.sh to tracked repo files #36386
I don't think it's worth adding complexity here. This script is intended for our CI processes. For local development you can use pre-commit: https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#pre-commit
The point is to use […] Moreover, why should we require more effort from developers to set up their own local checks? The "homemade" local checks will probably be very different from what runs on CI, so they will generate a lot of false negatives (false check passes), as said above, and perhaps even some false positives (false check fails). We can easily kill two birds with one stone here. After all, what we're testing on CI is exactly the set of conditions we want the codebase to satisfy, isn't it? Finally, yes, this adds a tiny bit of complexity, but it also benefits the CI on its own: it makes the CI process less brittle. Under no circumstances should the CI check files that are not part of the Git repo. For example, suppose that in the future the CI process is changed so that some files are generated before the code checks are run; as it is right now,
These in particular are managed through the configuration file, so they won't differ when run in pre-commit, which has the added bonus of being cross-platform.
Thanks @plammens for the PR, although, just for the record, in ~1 year of contributing to pandas this has never happened to me. Sometimes tests pass locally but fail during CI, but I've found black, isort, and flake8 to pass/fail reliably.
Admittedly, the examples I gave were pretty terrible 🙂; it's true that […] And maybe I expressed myself poorly: the problem is not checks that run in exactly the same configuration locally and on CI yet produce different results (these are almost exclusively doctests and unit tests, and this PR doesn't fix that). The problem is running differently configured checks, or different (i.e. fewer) checks than those on CI, which is likely to happen if we require each developer to transcribe every single check as a pre-commit hook or whatever works for them locally. A quick Ctrl + F tells me that there are about 74 distinct checks being made by the […] A hypothetical (but maybe not-so-hypothetical 😉) example of what happens to me:
The obvious solution is to have some form of automation that runs all the necessary checks. But this already exists: it's […] That's why I believe it's beneficial to use a "centralized" checking script, available to all developers, that performs exactly the same checks as CI. And again, if you're not convinced by the "easier local checks" argument, the argument that this improves the CI process on its own still stands:
I still don't understand what the downside to these changes is 🤔. If you don't want to touch the CI script, here are some alternative ideas, just to throw them out:
By the way, the reason I used
I agree with @plammens on this one (haven't looked at the PR itself, so I'm agreeing with this specific statement). I regularly get into situations in which manually running flake8 passes, but then the pre-commit flake8 produces a bunch of spurious complaints. Side note: I recently added a "check" target to the Makefile that duplicates some of the checks in code_checks.sh. That's my bad; it should be changed to call code_checks.sh directly to keep the checks in sync.
@plammens can you merge master?
Extract common code for checking a single file path.
The previous behaviour filtered out too many paths: any subdirectory whose relative path *contained* any of the ignored paths (which could be arbitrary strings) would be ignored. E.g., if PATHS_TO_IGNORE contained "foo", all of "./foo", "./spam/foo", "./spam/foo/eggs", "./barfoobaz", "./spam/foo.py" would get filtered out. On the other hand, individual files that *did* appear in PATHS_TO_IGNORE were *not* ignored. Now the behaviour should be a bit more robust. Ignored file paths can be specified as relative or absolute paths (since they are all passed through os.path.abspath); any files below a subdirectory included in PATHS_TO_IGNORE will be filtered out, and so will any files that are explicitly mentioned in PATHS_TO_IGNORE.
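To make the difference concrete, here is a minimal Python sketch of the old and new filtering rules. The helper names `is_ignored_substring` and `is_ignored` are hypothetical, and the first function only models the substring over-matching of directories described above, not the full previous implementation.

```python
import os
from typing import Iterable


def is_ignored_substring(path: str, paths_to_ignore: Iterable[str]) -> bool:
    """Old, overly broad rule (hypothetical helper): ignore a path if any
    ignored entry occurs *anywhere* in it as a substring."""
    return any(ignored in path for ignored in paths_to_ignore)


def is_ignored(path: str, paths_to_ignore: Iterable[str]) -> bool:
    """New rule (hypothetical helper): ignore a path if it *is* an ignored
    entry or lies below an ignored directory.

    Both sides go through os.path.abspath, so ignored entries may be given
    as relative or absolute paths.
    """
    abs_path = os.path.abspath(path)
    for ignored in paths_to_ignore:
        abs_ignored = os.path.abspath(ignored)
        # Exact match: covers individual files listed explicitly.
        if abs_path == abs_ignored:
            return True
        # Match on whole path components only, so an entry "foo" does not
        # swallow "barfoobaz" or "spam/foo.py".
        if abs_path.startswith(abs_ignored + os.sep):
            return True
    return False
```

For example, with `PATHS_TO_IGNORE = ["foo"]`, the substring rule drops all of `foo/a.py`, `spam/foo/b.py`, `barfoobaz/c.py` and `spam/foo.py`, whereas `is_ignored` drops only `foo/a.py`.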
This flag controls whether individual files explicitly passed as arguments should override the --excluded-file-paths rule.
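A rough sketch of how such a flag could interact with `--excluded-file-paths`. The flag name `--explicit-overrides-exclusion` and the helper `files_to_check` are hypothetical (the commit message does not name the flag), and exclusion here is a plain exact-path match to keep the example short.

```python
import argparse
import os
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a file-checking CLI.")
    parser.add_argument("paths", nargs="*", default=["."])
    parser.add_argument(
        "--excluded-file-paths",
        default="",
        help="Comma-separated list of paths to skip.",
    )
    # HYPOTHETICAL flag name: when set, files named explicitly on the command
    # line are checked even if they also appear in --excluded-file-paths.
    parser.add_argument("--explicit-overrides-exclusion", action="store_true")
    return parser


def files_to_check(
    explicit_files: List[str],
    discovered_files: List[str],
    excluded: List[str],
    override: bool,
) -> List[str]:
    """Combine explicitly passed files with files found by walking directories."""
    excluded_abs = {os.path.abspath(p) for p in excluded}

    def is_excluded(path: str) -> bool:
        return os.path.abspath(path) in excluded_abs

    # Files discovered by walking directories always honour the exclusion list.
    selected = [f for f in discovered_files if not is_excluded(f)]
    # Explicitly passed files honour it too, unless the override flag is set.
    selected += [f for f in explicit_files if override or not is_excluded(f)]
    return selected
```

The design choice being illustrated: exclusions exist to skip noise found by recursive discovery, so a developer who names a file directly may reasonably want it checked anyway.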
Previously, some of the checks in code_checks.sh ran unrestricted on all the contents of the repository root (recursively), so that if any files extraneous to the repo were present (e.g. a virtual environment directory), they were checked too, potentially causing many false positives when a developer runs ./ci/code_checks.sh . The checker invocations that were already scoped (i.e. they were already restricted, in one way or another, to the actual pandas code, e.g. by restricting the search to the `pandas` subfolder) have been left as-is, while those that weren't are now given an explicit list of files that are tracked in the repo.
Is this still happening after #36412? EDIT: Never mind, they're still not pinned to the same version, sorry for the noise.
Hi @plammens, sorry for the delay. I've been busy, but also PRs which change multiple things aren't the easiest to review. There are a couple of proposed changes which I think we could take and merge quickly if you open separate PRs for them:
Then this PR can be left to discuss just running the checks on tracked files.
Opened #37110 for this.
Will do this soon. (If I understand correctly, I should undo the changes to the
That would be good, thanks!
Is this PR still needed, or is everything we need in pre-commit now?
Not everything is in pre-commit yet, but things are making their way there, and I'd be very much inclined to move as much as possible over (then they'll be cross-platform and will provide faster feedback to devs). It would also allow us to reduce the complexity of
rather than
Anyway, massive thanks @plammens for bringing up the issue, and if you'd like to help with moving checks over to pre-commit, that'd be welcome!
Closing this as it has been superseded by pre-commit configurations. |
git diff upstream/master -u -- "*.py" | flake8 --diff
Previously, some of the checks in `code_checks.sh` ran unrestricted on all the contents of the repository root (recursively), so that if any files extraneous to the repo were present (e.g. a virtual environment directory, or generated source files), they were checked too, potentially causing many false positives when a developer runs `./ci/code_checks.sh` locally to check that the code is ready to be put in a PR.

The checker invocations that were already scoped (i.e. they were already restricted, in one way or another, to the actual pandas code, e.g. by restricting the search to the `pandas` subfolder) have been left as-is, while those that weren't are now given an explicit list of files that are tracked in the repo.
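One way to obtain "an explicit list of files that are tracked in the repo" is `git ls-files`. The Python sketch below assumes Git is on `PATH` and that it is run from inside the repository; the helper names `parse_ls_files` and `tracked_files` are hypothetical and not taken from the PR. Passing `-z` makes Git terminate each entry with a NUL byte instead of quoting unusual names, so file names containing spaces or newlines survive intact.

```python
import os
import subprocess
from typing import List


def parse_ls_files(raw: bytes) -> List[str]:
    """Split NUL-terminated `git ls-files -z` output into a list of paths."""
    # os.fsdecode uses the filesystem encoding, so odd byte sequences in
    # file names do not raise while decoding.
    return [os.fsdecode(entry) for entry in raw.split(b"\0") if entry]


def tracked_files(*pathspecs: str, repo_root: str = ".") -> List[str]:
    """Return files tracked by Git under repo_root matching the pathspecs.

    Untracked files (e.g. a virtualenv or generated sources) are never
    listed, which is the point of restricting the checks this way.
    """
    completed = subprocess.run(
        ["git", "ls-files", "-z", "--", *pathspecs],
        cwd=repo_root,
        check=True,  # raise CalledProcessError if git fails
        capture_output=True,
    )
    return parse_ls_files(completed.stdout)
```

The resulting list could then be passed to a checker in place of `.`, e.g. `subprocess.run(["flake8", *tracked_files("*.py")], check=True)`; a rough shell equivalent is `git ls-files -z -- '*.py' | xargs -0 flake8`.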