BUG: pandas.DataFrame.index.map() works differently if debugpy debugger is attached #775

Closed
ember91 opened this issue Nov 1, 2021 · 3 comments

ember91 commented Nov 1, 2021

Environment data

  • debugpy version: 1.51
  • OS and version: Ubuntu 20.04.3 LTS
  • Python version (& distribution if applicable, e.g. Anaconda): 3.9.7
  • Using VS Code or Visual Studio: VS Code
  • pandas version: 1.3.3

Actual behavior

debugpy crashes where neither CPython 3.9 nor pdb does. It crashes at line 5 in the example below with the stack trace:

Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: map_func)
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
  File "/home/emil/test_error/test.py", line 5, in map_func (Current frame)
    if dt.month <= 2:
  File "/home/emil/test_error/test.py", line 17, in <module>
    print(f"Result: {df.index.map(map_func)}")

Expected behavior

It should print something similar to Result: Int64Index([2031, 2032], dtype='int64', name='Date'), which is what Python 3.9 and pdb do.

Steps to reproduce:

import pandas as pd


def map_func(dt):
    if dt.month <= 2:
        return dt.year + dt.month - 1
    return dt.year + dt.month


# Named `data` rather than `dict` to avoid shadowing the built-in type.
data = {
    "Name": ["Tom", "Joseph"],
    "Date": ["2021-10-01", "2021-11-01"],
}
df = pd.DataFrame(data)
df = df.set_index("Date")
df.index = pd.to_datetime(df.index)
print(f"Result: {df.index.map(map_func)}")

Looking at the pandas source code, map() first tries calling map_func() with the whole index as the argument. If that fails, the exception is caught and the mapper is applied to every single element instead. With debugpy, however, the except Exception: clause does not seem to catch the exception for some reason. From pandas/core/indexes/extension.py:374-389:

    @doc(Index.map)
    def map(self, mapper, na_action=None):
        # Try to run function on index first, and then on elements of index
        # Especially important for group-by functionality
        try:
            result = mapper(self)

            # Try to use this result if we can
            if isinstance(result, np.ndarray):
                result = Index(result)

            if not isinstance(result, Index):
                raise TypeError("The map function must return an Index object")
            return result
        except Exception:
            return self.astype(object).map(mapper)
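
To make that control flow concrete, here is a minimal pure-Python sketch of the same try-the-whole-index-then-fall-back pattern. ToyIndex and ToyMonths are hypothetical stand-ins written for illustration, not pandas classes; they only mimic the two behaviors that matter here (an element-wise comparison whose truth value is ambiguous, and the except Exception: fallback).

```python
from datetime import date


class ToyMonths:
    """Hypothetical stand-in for the array returned by an index's .month."""

    def __init__(self, values):
        self.values = list(values)

    def __le__(self, other):
        # Element-wise comparison: the result is again a multi-element container.
        return ToyMonths(v <= other for v in self.values)

    def __bool__(self):
        # Like a NumPy array, refuse to collapse several booleans into one.
        raise ValueError("truth value of a multi-element array is ambiguous")


class ToyIndex:
    """Hypothetical stand-in for an index with the map() fallback quoted above."""

    def __init__(self, dates):
        self.dates = list(dates)

    @property
    def month(self):
        return ToyMonths(d.month for d in self.dates)

    def map(self, mapper):
        try:
            # First attempt: call the mapper once with the whole index.
            result = mapper(self)
            if not isinstance(result, ToyIndex):
                raise TypeError("The map function must return an index object")
            return result
        except Exception:
            # Fallback: call the mapper once per element.
            return [mapper(d) for d in self.dates]


def map_func(dt):
    if dt.month <= 2:  # raises ValueError when dt is the whole ToyIndex
        return dt.year + dt.month - 1
    return dt.year + dt.month


index = ToyIndex([date(2021, 10, 1), date(2021, 11, 1)])
result = index.map(map_func)
print(result)  # [2031, 2032]
```

The ValueError raised inside map_func on the first attempt never reaches the caller, because the fallback handles it. A debugger can still pause at the moment it is raised.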

int19h (Contributor) commented Nov 3, 2021

Do you mean that the exception is reported? Exceptions that are caught can still be reported at the time they are raised. If you have User Unhandled Exceptions checked in the Breakpoints pane, this will happen here, because the except block catching it is in library code (i.e. not in user code). However, you can still click Continue, and the script should run normally.

To suppress the report, just uncheck that checkbox. We've already changed it to be off by default in debugpy 1.5.1, but I think the VS Code UI might preserve the checked state from previous versions.

ember91 (Author) commented Nov 4, 2021

You're probably right. I can continue running after breaking. But what confuses me is that if I run:

try:
    raise RuntimeError()
except RuntimeError:
    print("Here!")

it doesn't break on line 2. That's why I thought this was a crash. But I think your comment explains why this happens in the first example but not in the second.

I believe this can be closed.

fabioz (Collaborator) commented Nov 4, 2021

That's because User Unhandled Exceptions reports any exception that is raised in user code (here, on the first call to map_func) and then propagates into library code, which is exactly what happens here. So the behavior seems to be correct.

In your second case, the exception is raised and caught in user code, so it's not shown with this setting (it would only be shown if Raised Exceptions were checked).

The solution is to turn User Unhandled Exceptions off, or to just continue execution in this case.
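
As a rough illustration of that rule, the toy model below classifies the two cases from this thread. It is only a sketch of the rule as described in this comment, not debugpy's actual implementation; the Origin enum and the reported_as_user_unhandled function are hypothetical names introduced for the example.

```python
from enum import Enum


class Origin(Enum):
    USER = "user"
    LIBRARY = "library"


def reported_as_user_unhandled(stack, handler_index):
    """Toy rule: report if the exception leaves a user frame into a library frame.

    stack: frame origins from the outermost frame to the frame that raises.
    handler_index: position of the frame whose except block catches the
    exception, or None if nothing catches it.
    """
    stop = 0 if handler_index is None else handler_index
    # Frames the exception travels through, innermost first, ending at the handler.
    visited = stack[stop:][::-1]
    for inner, outer in zip(visited, visited[1:]):
        if inner is Origin.USER and outer is Origin.LIBRARY:
            return True
    return False


# First example: <module> (user) -> Index.map (library) -> map_func (user).
# map_func raises, and the except block in Index.map catches it.
pandas_case = reported_as_user_unhandled(
    [Origin.USER, Origin.LIBRARY, Origin.USER], handler_index=1
)

# Second example: raised and caught in the same user frame.
try_except_case = reported_as_user_unhandled([Origin.USER], handler_index=0)

print(pandas_case, try_except_case)  # True False
```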
