regression in 0.10.1 with boolean indexing? #2745

ruidc · 2013-01-24T10:30:18Z

this used to work in 0.10 but now fails in 0.10.1:

import pandas
df = pandas.DataFrame(index=[1,2])
df['test'] = [1,2]
df['test'][[True, False]] = [0]

Now gives:
ValueError: Length of replacements must equal series length

possibly related to closed issue #2703

jreback · 2013-01-24T11:39:34Z

this worked by 'accident' before 0.10.1, see #2686
this will work if the rhs is a same length list/ndarray, constant expression, or alignable series
the problem with a list that is not the correct length is that it is ambiguous what should be assigned (e.g. do you cycle the values or not)
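For illustration, two of the right-hand sides listed above (a constant and an alignable Series) can be sketched as follows. This is a minimal sketch assuming a recent pandas rather than 0.10.1; it uses `.loc`, which does not appear in this thread, and the values and variable names are made up.

```python
import pandas as pd

mask = [True, False]

# Constant rhs: every selected label receives the same value.
s_const = pd.Series([1.0, 2.0], index=[1, 2])
s_const.loc[mask] = 0.0

# Series rhs: values are matched to the selected rows by label, not by
# position, so the order and the extra label on the rhs do not matter.
s_aligned = pd.Series([1.0, 2.0], index=[1, 2])
s_aligned.loc[mask] = pd.Series([20.0, 10.0], index=[2, 1])

print(s_const.tolist())    # [0.0, 2.0]
print(s_aligned.tolist())  # [10.0, 2.0]
```

In the second case only label 1 is selected by the mask, so only the rhs value stored under label 1 is written.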

ruidc · 2013-01-24T12:24:47Z

The provided reference is difficult for me to follow. Can you provide a simple example where this would be ambiguous?

jreback · 2013-01-24T12:46:40Z

your example is ambiguous, e.g. the rhs side is a 1 element list, you are assigning to 2 elements

should df['test'] = [0] work? (after your first assignment where 'test' is created)
what if there are 3 elements on the rhs?

numpy by default will take whatever elements that you supply (so if you have 2 on the left but 3 on the right it will take the first 2, with 1 element on the rhs it will cycle them).

since you have a series with defined labels on the lhs, the rhs needs to be aligned so that the labels match. In this case you don't have labels, so it's impossible to match unambiguously (a constant is a special case where all labels from the lhs get the value; an equal-length series or ndarray is unambiguous because there is a 1-1 match between lhs and rhs)
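The "take the first elements" and "cycle" behaviours described above can be reproduced with `numpy.place`, which is documented to do both. This is only an illustration of why a length-mismatched rhs is ambiguous; `numpy.place` is swapped in here and is not necessarily what plain boolean-index assignment does in any given NumPy version.

```python
import numpy as np

values = np.array([10, 20, 30])
mask = np.array([True, False, True])  # two positions selected

# Three replacement values for two selected positions:
# only the first two are used, the third is silently dropped.
too_many = values.copy()
np.place(too_many, mask, [1, 2, 3])

# One replacement value for two selected positions:
# the value is repeated to fill both.
too_few = values.copy()
np.place(too_few, mask, [7])

print(too_many.tolist())  # [1, 20, 2]
print(too_few.tolist())   # [7, 20, 7]
```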

jreback · 2013-01-24T12:49:29Z

sorry, the number got mixed up; it's PR #2686

ruidc · 2013-01-24T13:08:20Z

Thanks for the corrected link, that makes more sense.

I would not expect df['test'] = [0] to work after first assignment because of the length mismatch, but in the case where the result of the boolean vector on LHS matches the shape on the RHS it's unambiguous though.
I can understand if the lengths were different

jreback · 2013-01-24T13:21:20Z

the alignment happens before the indexing, so it IS ambiguous. As I said, you can simply make the rhs a series and it will work (your example was dtype int, so I changed to floats and it works; with ints I think this is a bug, because the reindexing should cast the ints to floats so you can put NaNs on the )

df['test'] = [1.,2.]
df['test'][[True,False]] = pandas.Series([0.],index=[1])

jreback · 2013-01-24T13:23:04Z

see issue #2746

jreback · 2013-01-24T13:27:09Z

I suppose that if you provide a list on the rhs that matches the indexed vector then it SHOULD work, but a priori you almost never know (otherwise why would you need to do the boolean indexing?) - e.g. in your example you are explicitly using True/False...using this is an expression though

ruidc · 2013-01-24T13:45:29Z

"but a priori you almost never know"

?
isn't it just a matter of testing the length AFTER the vector is applied?

"using this is an expression though"

?
how so?

"otherwise why would you need to do the boolean indexing"

In our code we are interested in doing multiple, separate operations on a slice that we refer to by using the boolean vector as a variable (an ndarray of dtype bool), which makes sense in our code.

jreback · 2013-01-24T14:04:59Z

yes, this could be tested AFTER the vector is applied
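A length check performed after the mask is applied could look roughly like the sketch below. `assign_by_position` is a hypothetical helper written for this illustration, assuming a recent pandas; it is not pandas' implementation and not the fix that was later merged.

```python
import numpy as np
import pandas as pd


def assign_by_position(series, mask, values):
    """Hypothetical helper: check the length after applying the mask, then assign."""
    positions = np.flatnonzero(np.asarray(mask, dtype=bool))
    if len(values) != len(positions):
        raise ValueError(
            f"{len(values)} values supplied for {len(positions)} selected rows"
        )
    # Purely positional write; labels play no part here.
    series.iloc[positions] = list(values)
    return series


s = pd.Series([1, 2], index=[1, 2])
assign_by_position(s, [True, False], [0])
print(s.tolist())  # [0, 2]
```

With this check, a rhs whose length differs from the number of selected rows raises instead of being truncated or repeated.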

what I meant (my language is unclear!) is that if you have a boolean vector that is already indicative of true/false (e.g. it's not a computed vector), then use reindex by that and assign directly to your ndarray. The point of an alignment is so you don't make errors by assigning an unlabeled vector to something; everything always has (or can be converted to something) like a series

you can certainly do what you are doing, but it seems a lot clearer to make your rhs a series (which semantically is very close to an ndarray), and has the BIG advantage of having labels for the values

ruidc · 2013-01-24T14:37:41Z

"yes, this could be tested AFTER the vector is applied"

I'm not clear on the internal mechanics, so why isn't it done this way?

"use reindex by that and assign directly to your ndarray"

can you clarify how? To elaborate on our usage:

import numpy
import pandas
df = pandas.DataFrame([1, 2, 3], index=[0, 1, 2], columns=['test'], dtype=object)
interesting_subset = numpy.greater(df['test'], 1)
df['test'][interesting_subset] = ['some extra work will happen here']

"and has the BIG advantage of having labels for the values"

in the above, why would having a Series/labels on RHS be an advantage? Thanks for trying to help and explain, perhaps we should move this to the ML ? My biggest concern is the change in behaviour that (to me at least) was not ambiguous and hard to identify in a large code-base.

jreback · 2013-01-24T14:42:33Z

what is the ['some extra work will happen here']?

here's a pseudo example

mask = df['test'] > 1
df['test'][mask] = df['test'] + 5

of course the rhs could be any series (from this df or other), that aligns by labels, that's the key
it makes it so you don't have to worry about sub-setting the rhs at all
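A runnable variant of the pseudo example, as a minimal sketch assuming a recent pandas. It uses `df.loc[mask, 'test']` instead of the chained `df['test'][mask]` form from the thread; the data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"test": [1, 2, 3]}, index=[0, 1, 2])
mask = df["test"] > 1

# The rhs is a full-length Series. It is aligned by label, and only the
# rows where mask is True are written, so the rhs needs no sub-setting.
df.loc[mask, "test"] = df["test"] + 5

print(df["test"].tolist())  # [1, 7, 8]
```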

ruidc · 2013-01-25T08:08:48Z

when you suggest a Series in RHS to "align by labels", I presume you mean, to have matching/valid index values?

['some extra work will happen here'] in my use case is a list returned from a server operation on the interesting_subset whose only relationship to the DataFrame is positional alignment of the rows, so I can work around the issue, but it still feels like a regression in a case like this where there is no ambiguity from lengths.
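One possible workaround for this use case, as a minimal sketch assuming a recent pandas: label the positional list with the index of the selected rows before assigning. `server_results` and its contents are hypothetical stand-ins for the list returned by the server, and the mask is written out by hand rather than computed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"test": [1, 2, 3]}, index=[0, 1, 2], dtype=object)
interesting_subset = np.array([False, True, True])

# Hypothetical stand-in for the list returned by the server.
server_results = ["b", "c"]

# Attach the labels of the selected rows to the positional results,
# then let the assignment align on those labels.
labelled = pd.Series(
    server_results, index=df.index[interesting_subset], dtype=object
)
df.loc[interesting_subset, "test"] = labelled

print(df["test"].tolist())  # [1, 'b', 'c']
```

If the list were one element too long or too short, constructing `labelled` would fail because the values and the index differ in length, so the mismatch surfaces before anything is written.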

jreback · 2013-01-25T12:58:15Z

yes, the issue you have is that you are providing a guarantee that the ['some extra work will happen here'] values are in exactly the same order and exactly the same length as the indexing array. This is an extremely strong statement; it might be true in your case, but in general it is not. What if you happen to be off by 1, or 1 extra value is returned? doesn't it make more sense to have the operation 'figure' it out by aligning by labels?

ruidc · 2013-01-25T13:48:13Z

of course, but that's more work than was previously required. Thanks for clarifying.

jreback · 2013-01-25T18:16:44Z

@changhiskhan or @wesm any comments on this?

wesm · 2013-02-07T20:36:58Z

This looks like a bug to me. Marking as such and will try to fix for 0.10.2/0.11

jreback · 2013-03-22T18:54:28Z

closed by #3139

jreback · 2013-04-02T14:31:25Z

@ruidc FYI #3236 fixes the more general issue of what you were doing here

ghost assigned wesm Feb 7, 2013

jreback mentioned this issue Mar 22, 2013

BUG: GH2745 Fix issue with indexing a series with a boolean key a 1-len list on rhs #3139

Merged

jreback closed this as completed Mar 22, 2013

This was referenced Apr 2, 2013

BUG: GH3235 fix setitem on Series with boolean indexing and rhs of list #3236

Merged

assigning list to series with boolean indexing #3235

Closed

SleepingPills mentioned this issue Jul 10, 2013

BUG: Boolean indexed assignment #4192

Closed

jreback mentioned this issue Sep 5, 2013

Bug in Fancy/Boolean Indexing with nested lists #2702

Closed
