DataFrame corrupted after improper column creation #3010

zetyang · 2013-03-11T07:58:44Z

I wanted to add a column to a DataFrame using a list of arrays, but I forgot to convert the 2d array into a list first, so I got a "Wrong number of items passed" error.

After that, the df seemed to be corrupted: whenever I tried to print it, it raised a "'NoneType' object is not iterable" error.
New columns could still be added, but the improperly assigned column could not be deleted (that also raised the NoneType error).

This is the code that reproduces my problem:

import numpy as np
from pandas import DataFrame

df = DataFrame(np.ones((4,4)))
df['foo'] = np.ones((4,2)).tolist()  # OK, and that's what I should have typed
df['test'] = np.ones((4,2))  # the improper column creation
AssertionError: Wrong number of items passed (1 vs 2)

df
TypeError: 'NoneType' object is not iterable, u'occured at index foo'

del df['test']
TypeError: 'NoneType' object is not iterable
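
For anyone hitting this on an affected version, one defensive pattern is to try the assignment on a copy and keep the original only if it fails. The following is a minimal sketch; `try_assign` is a hypothetical helper name, not a pandas API, and the set of exception types caught is an assumption chosen to cover the error shown above and the errors other pandas versions may raise.

```python
import numpy as np
import pandas as pd


def try_assign(df, name, values):
    """Hypothetical helper (not part of pandas): assign on a copy first."""
    candidate = df.copy()
    try:
        candidate[name] = values
    except (AssertionError, ValueError, TypeError) as exc:
        # The copy may be in a bad state; discard it and keep the original.
        return df, exc
    return candidate, None


df = pd.DataFrame(np.ones((4, 4)))
df, err = try_assign(df, 'foo', np.ones((4, 2)).tolist())  # list of lists: accepted
df, err = try_assign(df, 'test', np.ones((4, 2)))          # 2-d array: rejected
```

After the second call, `err` holds the exception and `df` is still the frame that contains 'foo' but not 'test'. The copy costs memory, so this only makes sense for frames of moderate size.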

jreback · 2013-03-11T19:43:33Z

You realize you are creating an embedded list (what you are assigning to df['foo'])?
Is there a reason you are trying to put a 2-d array where you would normally put a 1-d one?

In [15]: pd.Series(np.ones((4,2)).tolist())
Out[15]: 
0    [1.0, 1.0]
1    [1.0, 1.0]
2    [1.0, 1.0]
3    [1.0, 1.0]
dtype: object
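
To make the two options concrete, here is a small sketch that contrasts a single object column of lists with one numeric column per column of the 2-d array. The column names 'test_0' and 'test_1' are illustrative assumptions, not something from the report.

```python
import numpy as np
import pandas as pd

arr = np.ones((4, 2))
df = pd.DataFrame(np.ones((4, 4)))

# Option 1: one object-dtype column whose cells are Python lists.
df['foo'] = arr.tolist()

# Option 2: one float column per column of the 2-d array.
# The 'test_0', 'test_1' names are illustrative only.
for i in range(arr.shape[1]):
    df['test_%d' % i] = arr[:, i]
```

Option 2 keeps the data numeric, so vectorized operations still work on it; Option 1 stores opaque Python objects in each cell.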

zetyang · 2013-03-12T01:30:02Z

Yes, you're right. I know your way works and that is what I should do. My point is that I assigned the column wrongly and then realized I had made a mistake. When I carried on with my work, the DataFrame was corrupted. I had thought that DataFrame would be robust to this kind of mistake.

zetyang · 2013-03-12T07:16:44Z

In [1]:  df._data.items
Out[1]:  [0,1,2,3,'foo', 'test']

loc = df._data.get_loc('test')
new_items = df._data.items.delete(loc)
df._data.set_items_norename(new_items)

The lines above finally fixed my problem (after referring to the source code in core/internals.py). The problem was that the data block corresponding to df['test'] was empty.

But I still wonder why DataFrame adds the item 'test' to its items list when the data block hasn't been added properly?

jreback · 2013-03-12T12:50:08Z

See PR #3018. Essentially, the insertion operation has 2 stages; if the first fails, the state of the existing
frame is unrecoverable (as you noted). What you showed above is essentially what I did (in the except block).

thanks for the report.
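
The failure mode discussed in this thread (a label is registered, the data step fails, and the label is left without data) can be illustrated with a toy model. This is only a sketch: `ToyFrame` and its methods are hypothetical and do not reflect pandas' internal code or the actual change in the PR; it only shows the general idea of rolling back in an except block.

```python
class ToyFrame:
    """Toy stand-in for a frame: a list of labels plus a dict of data.

    This is NOT pandas' internal structure; all names are hypothetical.
    """

    def __init__(self, nrows):
        self.nrows = nrows
        self.items = []   # column labels, in order
        self.blocks = {}  # label -> list of values

    def _validate(self, values):
        values = list(values)
        if len(values) != self.nrows:
            raise ValueError(
                "expected %d values, got %d" % (self.nrows, len(values))
            )
        return values

    def insert_unsafe(self, name, values):
        # Register the label first ...
        self.items.append(name)
        # ... then store the data. If this raises, the label stays behind
        # with no data attached to it.
        self.blocks[name] = self._validate(values)

    def insert_safe(self, name, values):
        self.items.append(name)
        try:
            self.blocks[name] = self._validate(values)
        except Exception:
            # Roll back the label so items and blocks stay in sync.
            self.items.pop()
            raise


bad = ToyFrame(nrows=4)
try:
    bad.insert_unsafe('test', [1.0, 1.0])  # wrong length
except ValueError:
    pass

good = ToyFrame(nrows=4)
try:
    good.insert_safe('test', [1.0, 1.0])  # same mistake, with rollback
except ValueError:
    pass
```

After running this, `bad.items` still lists 'test' even though `bad.blocks` has no entry for it, while `good` is back to its empty starting state.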

zetyang · 2013-03-13T03:23:33Z

My pleasure, btw, pandas is really awesome :-)

jreback mentioned this issue Mar 12, 2013

BUG: Bug in DataFrame column insertion when the column creation fails (GH3010) #3018

Merged

zetyang closed this as completed Mar 13, 2013
