read_csv: Infers different column types in different runs #13604
If the dtype argument is not specified in read_csv, the result is not always the same in all runs. This is probably a pandas bug (pandas-dev/pandas#13604).
Thanks for the report. Unfortunately I couldn't reproduce it on my Mac. It looks to be always If no options are specified, dtypes are
Please post the output of pd.show_versions() and the exact code that you are running, and print the pandas version from within the running code.
Here's the program I'm running, which I call

#!/usr/bin/env python3
from io import StringIO

import pandas as pd

test_timeseries = """\
2008-02-07 09:40,1032.43
2008-02-07 09:50,1042.54
2008-02-07 10:00,1051.65
"""

df = pd.read_csv(StringIO(test_timeseries), parse_dates=[0],
                 usecols=['date', 'value'], index_col=0, header=None,
                 names=('date', 'value'))
print('Result: {}'.format(df.value.dtype))
pd.show_versions()

Here is some output:
I can reproduce this, but only when it is called from a script. If I repeat it multiple times in an interactive console, it always gives the same result. I only seem to see this with Python 3 and not with Python 2, but there are also many other differences between the two environments, so I'm not sure this is the cause of the difference.
I have been able to isolate it to https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L1457. We are building a list from a set and expecting a consistent order. I would love to help, but I don't know
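For anyone following along, here is a minimal sketch of why a list built from a set can differ between runs. It is not the pandas code itself, and `order_for_seed` and the two child snippets are hypothetical helpers made up for this illustration. Since Python 3.3, string hashing is randomized per process by default, so the iteration order of a set of strings may change from one interpreter start to the next while staying fixed within a single session. That would be consistent with the script-versus-console and Python 3-versus-Python 2 observations above. The sketch starts fresh interpreters with different PYTHONHASHSEED values and compares the resulting orders with a sorted version.

```python
import os
import subprocess
import sys

# Child snippets (illustrative only): one builds a list straight from a set
# of column names, the other sorts the set first.
UNORDERED = "print(list(set(['date', 'value'])))"
ORDERED = "print(sorted(set(['date', 'value'])))"


def order_for_seed(snippet, seed):
    """Run the snippet in a fresh interpreter with a fixed string-hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Collect the distinct orders seen across several hash seeds.
unordered = {order_for_seed(UNORDERED, seed) for seed in range(10)}
ordered = {order_for_seed(ORDERED, seed) for seed in range(10)}

print("list(set(...)) orders seen:  ", unordered)
print("sorted(set(...)) orders seen:", ordered)
```

Sorting (or otherwise keeping the caller's original order) removes the dependence on the hash seed; whether that is the right fix for the parser is a separate question.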
@gfyoung is there a replicating test for this?
I added a test here that replicates the same situation. Should I explicitly add this example to that set of tests as well?
yes i think a replica of this issue would be good
I run this program 10 times and the result is sometimes float64 and sometimes object. This happens with pandas 0.18.1 on Debian Jessie amd64 with Python 3.4.2 and numpy 1.11.1. I don't see it happening with Debian's packaged pandas 0.14.1.
I can work around this by specifying the dtype argument; but shouldn't pandas behave deterministically when it's omitted?
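A minimal sketch of that workaround, assuming the same data and column names as the program in this report: the only change to the call is an explicit dtype mapping for the value column, so pandas does not have to infer its type. The names are passed as a list here purely as a style choice.

```python
from io import StringIO

import pandas as pd

test_timeseries = """\
2008-02-07 09:40,1032.43
2008-02-07 09:50,1042.54
2008-02-07 10:00,1051.65
"""

# Same call as in the report, plus an explicit dtype for the 'value' column.
df = pd.read_csv(
    StringIO(test_timeseries),
    header=None,
    names=['date', 'value'],
    usecols=['date', 'value'],
    index_col=0,
    parse_dates=[0],
    dtype={'value': 'float64'},
)
print('Result: {}'.format(df['value'].dtype))
```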