ENH: support Categorical hist plotting #8712

Open
jreback opened this issue Nov 2, 2014 · 3 comments

Comments

@jreback
Contributor

jreback commented Nov 2, 2014

from pandas import Series
s = Series(list('abbc')).astype('category')

This raises

s.hist()

This works

s.value_counts().plot()

I think this should work. Beyond this, does anyone have ideas on what/how to plot categorical data?

cc @JanSchulz
cc @TomAugspurger
cc @jorisvandenbossche

@jreback jreback added this to the 0.16.0 milestone Nov 2, 2014
@jorisvandenbossche
Member

I don't know if that should work. I think the natural way to plot a categorical in something hist-like is a bar plot with the frequency of each category (s.value_counts().plot(kind='bar')). But that is not exactly what a histogram is (or at least, not the 'classical' definition). On the other hand: practicality beats purity?
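For reference, a minimal sketch of the bar-plot-of-frequencies idea, using the series from the original report. The helper name `plot_category_counts` is made up for illustration and is not part of pandas; calling it needs matplotlib installed.

```python
import pandas as pd

s = pd.Series(list("abbc")).astype("category")

# value_counts() orders by frequency by default; sort_index() puts the
# result back into category order (a, b, c), which reads more like a histogram.
counts = s.value_counts().sort_index()
print(counts)


def plot_category_counts(series):
    """Hypothetical helper: draw one bar per category (requires matplotlib)."""
    return series.value_counts().sort_index().plot(kind="bar")
```

Sorting by index is a design choice here; leaving the default frequency ordering is just as reasonable if the categories have no meaningful order.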

Also, in R:

> x <- c('a', 'a', 'b', 'c')
> hist(x)
Error in hist.default(x) : 'x' must be numeric
> hist(as.factor(x))
Error in hist.default(as.factor(x)) : 'x' must be numeric

So maybe we should rather have a better error message here? (This also holds true for string series):

In [14]: s.astype(object).hist()
...
TypeError: cannot concatenate 'str' and 'float' objects

So maybe we should check for numeric dtype in hist?
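A rough sketch of what such a check could look like, assuming `pandas.api.types.is_numeric_dtype` is an acceptable test. The function name `check_hist_input` and the error wording are hypothetical, not what pandas actually does.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_hist_input(series):
    """Hypothetical guard: fail early with a readable message for non-numeric data."""
    # Note: is_numeric_dtype also returns True for boolean dtypes.
    if not is_numeric_dtype(series.dtype):
        raise TypeError(
            f"hist requires numeric data, got dtype {series.dtype!r}; "
            "for categorical or string data consider "
            "series.value_counts().plot(kind='bar')"
        )
    return series
```

With this guard, a categorical or string series would raise a `TypeError` that names the dtype instead of the concatenation error shown above.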

@wesm
Member

wesm commented Jul 6, 2018

What do you think about making s.hist() do s.value_counts().plot.bar()?
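One way that dispatch might look, as a sketch only. `plot_kind_for` and `hist_or_bar` are hypothetical names, and this assumes the fallback should apply to categorical dtype only; the drawing calls need matplotlib installed.

```python
import pandas as pd


def plot_kind_for(series):
    """Hypothetical: choose 'bar' for categorical data, 'hist' otherwise."""
    if isinstance(series.dtype, pd.CategoricalDtype):
        return "bar"
    return "hist"


def hist_or_bar(series, **kwargs):
    """Hypothetical wrapper: bar chart of counts for categoricals, else a histogram."""
    if plot_kind_for(series) == "bar":
        return series.value_counts().plot.bar(**kwargs)
    return series.plot.hist(**kwargs)
```

Keeping the decision in its own function makes the rule easy to change, for example to cover object or string dtypes as well.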

@wcneill

wcneill commented Mar 17, 2020

Hi, newb here. I was going to submit a bug report, but I found this. It does seem the same, but before I walk away without reporting, I wanted to make sure that my issue is actually different:

The docs for DataFrame.hist() say:

This function calls matplotlib.pyplot.hist(), on each series in the DataFrame, resulting in one histogram per column.

However, what is displayed is a set of histograms for numerical data only. This led me to believe that maybe matplotlib.pyplot.hist() only works on numeric data. However, I tried it anyway and found that

matplotlib.pyplot.hist(df['categorical column with strings/objects'])

results in a beautiful histogram.

I also tried calling .hist directly on a series with categorical data like so:

df['categorical string data'].hist()

And also got great results.

Again, not trying to step on any toes, just a new programmer trying to contribute. Let me know if this issue is already covered here :)

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022