Rank by multiple columns #4311

Closed
hayd opened this issue Jul 21, 2013 · 5 comments
Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Enhancement

Comments

@hayd
Contributor

hayd commented Jul 21, 2013

I don't think this is possible at the moment, but it would be a nice enhancement.

Similar to how you can pass a list of columns to sort*.

http://stackoverflow.com/questions/17775935/sql-like-window-functions-in-pandas-row-numbering-in-python-pandas-dataframe

*Probably there is a cheeky answer/hack to this question using sort...

@jreback
Contributor

jreback commented Jul 23, 2013

You could set the columns as a MultiIndex and then just sort the index (though I think you have an issue about multi-sort).
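A minimal sketch of this suggestion, assuming a small made-up frame in which `foo` and `bar` are the columns to order by (the data and the `row_number` column name are illustrative only, not from the issue):

```python
import pandas as pd

# Hypothetical data; 'foo' and 'bar' are the columns to order by.
df = pd.DataFrame({
    "foo": [2, 1, 2, 1],
    "bar": [1, 9, 0, 3],
    "val": ["a", "b", "c", "d"],
})

# Move the ordering columns into a MultiIndex and sort it lexicographically.
ordered = df.set_index(["foo", "bar"]).sort_index()

# After sorting, each row's position is its 1-based row number.
ordered["row_number"] = range(1, len(ordered) + 1)
print(ordered)
```

This gives a plain row numbering; it does not handle ties the way `rank(method=...)` does.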

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 1, 2015
@imre-kerr

This was possible until 0.23, but is semi-broken currently. (Understandable since I guess it's a bit of a hack.)

# Pack the ranking columns into one column of tuples, then rank it within each 'quux' group.
df['rankby'] = pd.Series(df[['foo', 'bar']].itertuples(index=False)).values
df['ranking'] = df.groupby('quux')['rankby'].rank(method='min')

Still works if you don't have a groupby.
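One possible way to avoid ranking tuples is to first map each distinct `(foo, bar)` pair to an integer that preserves the lexicographic order, and then rank that numeric column inside the group. The sketch below is an alternative to the snippet above, not the commenter's method; the sample data is made up, and it assumes there are no missing values in `foo` or `bar` (by default `groupby` drops rows whose keys are missing).

```python
import pandas as pd

# Hypothetical data reusing the column names from the comment above.
df = pd.DataFrame({
    "quux": ["x", "x", "x", "y", "y"],
    "foo":  [1, 1, 2, 3, 3],
    "bar":  ["b", "a", "a", "c", "c"],
})

# groupby sorts its keys by default, so ngroup() numbers the distinct
# (foo, bar) pairs in lexicographic order: (1,'a') -> 0, (1,'b') -> 1, ...
df["rankby"] = df.groupby(["foo", "bar"]).ngroup()

# The key is now numeric, so rank() within the 'quux' groups is well defined.
df["ranking"] = df.groupby("quux")["rankby"].rank(method="min")
print(df)
```

Because the intermediate key is an ordinary integer column, any `method` accepted by `rank` can be used in the last step.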

@WillAyd
Member

WillAyd commented Jun 21, 2018

@imre-kerr this is a pretty old issue that in the current iteration of pandas mixes a few concepts. If you want row-numbering within a group you should use cumcount:

https://stackoverflow.com/questions/37997668/pandas-number-rows-within-group

The behavior you are referring to, using rank, works only when dealing with numeric data. Rank within GroupBy operations on strings will raise an error, and in the future it should also raise when called from a Series / DataFrame (see #19560).
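As a rough illustration of the `cumcount` approach, assuming made-up data with a group column `quux` and ordering columns `foo` and `bar` (the `row_number` name is also just for illustration):

```python
import pandas as pd

# Hypothetical data: number rows within each 'quux' group, ordered by foo, then bar.
df = pd.DataFrame({
    "quux": ["x", "y", "x", "y", "x"],
    "foo":  [2, 1, 1, 1, 2],
    "bar":  ["b", "z", "a", "a", "a"],
})

# Sort first; cumcount then numbers rows in the order they appear in each group.
ordered = df.sort_values(["quux", "foo", "bar"])
ordered["row_number"] = ordered.groupby("quux").cumcount() + 1

# Assignment aligns on the index, so the original row order is kept.
df["row_number"] = ordered["row_number"]
print(df)
```

Note that this works for string columns such as `bar`, but tied rows still receive distinct numbers.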

@WillAyd
Member

WillAyd commented Jun 21, 2018

Closing as this issue is no longer relevant: rank can be used on numeric data in combination with groupby. For object data cumcount can be used, though it would be up to the user to specify the desired order first.

@WillAyd WillAyd closed this as completed Jun 21, 2018
@oulenz

oulenz commented Nov 20, 2019

@WillAyd cumcount isn't a proper substitute, since it doesn't take a method parameter. Also, it feels wrong that it requires sorting the whole dataframe when all you want is sorting within groups (even if in practice it doesn't make much difference in terms of performance).
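To make the gap concrete, here is one way the missing `method` behavior could be layered on top of `cumcount`. The helper `rank_within_groups` is hypothetical (it is not a pandas API), it still sorts the whole frame, and it assumes a unique index and no missing values in the key columns.

```python
import pandas as pd


def rank_within_groups(df, group_col, by, method="min"):
    """Hypothetical helper (not a pandas API): rank rows inside each group
    of `group_col`, ordering by the columns listed in `by`."""
    ordered = df.sort_values([group_col, *by])
    # Ordinal position inside each group; this is all cumcount provides.
    position = ordered.groupby(group_col).cumcount() + 1
    if method == "first":
        ranks = position
    elif method in ("min", "max"):
        # Rows tied on every key share the smallest (or largest) position.
        keys = [ordered[col] for col in [group_col, *by]]
        ranks = position.groupby(keys).transform(method)
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return ranks.reindex(df.index)  # back to the caller's row order


# Made-up data: rows 0 and 1 are tied on (foo, bar) inside group 'x'.
df = pd.DataFrame({
    "quux": ["x", "x", "x", "y", "y"],
    "foo":  [1, 1, 2, 3, 3],
    "bar":  ["b", "b", "a", "c", "a"],
})
df["rank_min"] = rank_within_groups(df, "quux", ["foo", "bar"], method="min")
df["rank_max"] = rank_within_groups(df, "quux", ["foo", "bar"], method="max")
print(df)
```

Only `first`, `min`, and `max` are covered here; `dense` and `average` would need extra steps.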
