DataFrame corrupted after improper column creation #3010

Closed
zetyang opened this issue Mar 11, 2013 · 5 comments · Fixed by #3018
@zetyang
zetyang commented Mar 11, 2013

I wanted to add a column to a DataFrame using a list of arrays, but I forgot to convert the 2-d array into a list first, so I got a "Wrong number of items passed" error.

After that, the df seemed to be corrupted: whenever I tried to print it, it raised a "'NoneType' object is not iterable" error.
New columns could still be added, but the improperly assigned column could not be deleted (that also raised the NoneType error).

This is the code that reproduces my problem:

import numpy as np
from pandas import DataFrame

df = DataFrame(np.ones((4,4)))
df['foo'] = np.ones((4,2)).tolist()  # OK, and that's what I should have typed
df['test'] = np.ones((4,2))  # the improper column creation (2-d array, not a list)
AssertionError: Wrong number of items passed (1 vs 2)

df
TypeError: 'NoneType' object is not iterable, u'occured at index foo'

del df['test']
TypeError: 'NoneType' object is not iterable
@jreback
Contributor

jreback commented Mar 11, 2013

You realize you are creating an embedded list (that is what you are assigning to df['foo'])?
Is there a reason you are trying to put a 2-d array where you would normally put a 1-d one?

In [15]: pd.Series(np.ones((4,2)).tolist())
Out[15]: 
0    [1.0, 1.0]
1    [1.0, 1.0]
2    [1.0, 1.0]
3    [1.0, 1.0]
dtype: object

@zetyang
Author

zetyang commented Mar 12, 2013

Yes, you're right. I know your way works, and that is how I should do it. My point is that I assigned the column incorrectly and only then realized I had made a mistake. When I carried on with my work, the DataFrame was corrupted. I had assumed that DataFrame would be robust to my mistake.

@zetyang
Author

zetyang commented Mar 12, 2013

In [1]:  df._data.items
Out[1]:  [0,1,2,3,'foo', 'test']

loc = df._data.get_loc('test')
new_items = df._data.items.delete(loc)
df._data.set_items_norename(new_items)

The lines above finally fixed my problem (after referring to the source code in core/internals.py). The cause was that the data block corresponding to df['test'] was empty.

But I still wonder why DataFrame adds the item 'test' to its items list when the data block hasn't been added properly.

@jreback
Contributor

jreback commented Mar 12, 2013

See PR #3018. Essentially, the insertion operation has 2 stages; if the first fails, then the state of the existing
frame is irrecoverable (as you noted). What you showed above is essentially what I did (in the except block).
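
For readers following along, here is a rough sketch of the general idea, not the actual code from PR #3018 or from core/internals.py. It assumes a toy container (the class TwoStageTable and all of its method names are hypothetical) in which recording the column label and storing the column data are separate steps. If the data step fails after the label has been recorded, the label is left pointing at nothing unless the except block undoes it.

```python
class TwoStageTable:
    """Toy stand-in for a frame (hypothetical, not pandas internals)."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.items = []    # column labels
        self.blocks = {}   # label -> list of column values

    def _store_block(self, label, values):
        # Validate the data before storing it; this is the step that can fail.
        values = list(values)
        if len(values) != self.n_rows:
            raise AssertionError(
                "Wrong number of items passed (%d vs %d)" % (len(values), self.n_rows)
            )
        self.blocks[label] = values

    def insert_unsafe(self, label, values):
        # Label is recorded first; a failure below leaves it orphaned.
        self.items.append(label)
        self._store_block(label, values)

    def insert_safe(self, label, values):
        # Same two steps, but the label is rolled back if storing fails.
        self.items.append(label)
        try:
            self._store_block(label, values)
        except Exception:
            self.items.pop()  # undo the label we just appended
            raise

    def is_consistent(self):
        # Every recorded label must have a data block behind it.
        return all(label in self.blocks for label in self.items)


broken = TwoStageTable(n_rows=4)
try:
    broken.insert_unsafe("test", [1.0, 1.0])  # wrong length on purpose
except AssertionError:
    pass

safe = TwoStageTable(n_rows=4)
try:
    safe.insert_safe("test", [1.0, 1.0])  # same bad input
except AssertionError:
    pass
```

After the failed call, broken still lists 'test' in its items with no block behind it, which mirrors the state reported above, while safe is left unchanged because the except block removed the label before re-raising.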

thanks for the report.

@zetyang
Author

zetyang commented Mar 13, 2013

My pleasure, btw, pandas is really awesome :-)

@zetyang zetyang closed this as completed Mar 13, 2013