AttributeError: 'Index' object has no attribute 'head' - Regression in 2.15.* #1327

Closed
mwoods-familiaris opened this issue May 10, 2022 · 2 comments · Fixed by #1333

mwoods-familiaris commented May 10, 2022

Describe the bug

Regression of #1188 in awswrangler 2.15.*

wr.s3.to_parquet fails when writing a DataFrame with an ExtensionDtype index, resulting in the error:

AttributeError: 'Index' object has no attribute 'head'

Specifically, the failure happens when using pandas==1.4.* and awswrangler==2.15.*. This behavior is consistent on both pyarrow==5.0.0 and pyarrow==7.0.0. It appears to be a regression in the 2.15.0 release, as the same code works fine with awswrangler==2.14.*.
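For anyone checking whether a given DataFrame matches the condition described above, here is a minimal sketch. It only tests whether any index level carries a pandas extension dtype. `has_extension_index` is a hypothetical helper name (not part of pandas or awswrangler), and it will also flag other extension-backed indexes such as categorical ones, so treat it as a rough triage aid rather than a precise test for this bug.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def has_extension_index(df: pd.DataFrame) -> bool:
    """Return True if any level of df's index has an extension dtype.

    Hypothetical helper for triage only; works for plain and multi-level indexes.
    """
    index = df.index
    return any(
        isinstance(index.get_level_values(level).dtype, ExtensionDtype)
        for level in range(index.nlevels)
    )


# Same frame as in the report: a nullable-integer column promoted to the index.
df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [1, 2, 3]}, dtype=pd.Int64Dtype()
).set_index("col1")

# The index dtype depends on the pandas version, so print it instead of assuming it.
print(pd.__version__, df.index.dtype, has_extension_index(df))
```

Printing the pandas version next to the index dtype is deliberate: how pandas represents such an index varies by release, and the report above ties the failure to pandas 1.4.*.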

How to Reproduce

import pandas as pd
import awswrangler as wr


df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [1, 2, 3]}, dtype=pd.Int64Dtype()
).set_index("col1")

wr.s3.to_parquet(
    df,
    path="s3://bucket/tmp.parquet",
    index=True,
    dataset=False,
)
# throws error

Expected behavior

Successful write of parquet file to S3 location

Your project

No response

Screenshots

No response

OS

Linux / Debian 10.12

Python version

3.9

AWS DataWrangler version

2.15.0, 2.15.1

Additional context

Pandas 1.4.2

@mwoods-familiaris mwoods-familiaris added the bug Something isn't working label May 10, 2022
@kukushking kukushking self-assigned this May 12, 2022
kukushking (Contributor) commented

There are all sorts of strange issues when using extension types in indexes in pandas 1.4.*, e.g. this as well.

Let's see if the above fix helps with this particular case and doesn't break anything else.

@kukushking kukushking linked a pull request May 13, 2022 that will close this issue
updiversity commented

Hi,

It looks like it continues to fail when the index has no name.

import pandas as pd
import awswrangler as wr

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [1, 2, 3]}, dtype=pd.Int64Dtype()
).set_index("col1")

df.index.name = None

wr.s3.to_parquet(
    df,
    path="s3://bucket/tmp.parquet",
    index=True,
    dataset=False,
)

# throws error

This is a common case when dealing with DataFrames generated by user-defined functions.
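Until a fix covers the unnamed-index case, one possible workaround is to move the index into an ordinary column and write with `index=False`, so the index handling is not exercised at all. The sketch below is an assumption, not something confirmed by the maintainers and not tested against the affected releases; `index_to_columns` and its `fallback_name` parameter are hypothetical names introduced here.

```python
import pandas as pd


def index_to_columns(df: pd.DataFrame, fallback_name: str = "index") -> pd.DataFrame:
    """Return a new frame with the index moved into regular columns.

    Hypothetical helper: an unnamed single-level index is given
    `fallback_name` first so the resulting column has a predictable name.
    The input frame is left unchanged.
    """
    if df.index.nlevels == 1 and df.index.name is None:
        df = df.rename_axis(fallback_name)  # returns a new object
    return df.reset_index()


df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [1, 2, 3]}, dtype=pd.Int64Dtype()
).set_index("col1")
df.index.name = None  # the unnamed-index case from the comment above

flat = index_to_columns(df, fallback_name="col1")

# The write would then skip the index entirely (not executed here, needs S3 access):
# wr.s3.to_parquet(flat, path="s3://bucket/tmp.parquet", index=False, dataset=False)
```

The trade-off is that the stored file no longer records the column as an index, so readers have to call `set_index` themselves after loading. `reset_index` also raises a `ValueError` if a column with the chosen name already exists.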
