Unable to store np.ndarray objects as elements and store this DF as HDF #20440
Could you try making a smaller example? It's hard to see what's going on. Is sqlite necessary to demonstrate the bug?
@TomAugspurger I've now removed most of the distracting parts; you can see the updated version in my main post. SQLite is necessary here, although it only takes 1-2 lines of code.
You might want to glance through http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports :)
@TomAugspurger I hope I've understood your point correctly now. I updated the minimal example and the error message it produces; I get the same error message. Sorry for the inconvenience.
Perfect, thanks. So you're storing arrays within the column. No, I don't think that's currently supported, and I'm not sure whether it's possible with pytables / HDF5. In this case, your
It seems like pytables and HDF5 do have some support for ragged (variable-length) arrays: http://www.pytables.org/usersguide/libref/homogenous_storage.html#the-vlarray-class, so that might be an option. In general, storing arrays inside a Series column isn't well supported by pandas at the moment.
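A minimal sketch of the VLArray approach mentioned above, writing ragged arrays with PyTables directly rather than through pandas (the file path, node name, and data are illustrative assumptions):

```python
import os
import tempfile

import numpy as np
import tables

# Illustrative temporary file path, not from the original issue.
path = os.path.join(tempfile.mkdtemp(), "signals.h5")

# Write two variable-length float64 arrays into a single VLArray node.
with tables.open_file(path, mode="w") as h5:
    vla = h5.create_vlarray(
        h5.root, "signals",
        atom=tables.Float64Atom(),
        title="variable-length time signals",
    )
    vla.append(np.array([1.0, 2.0, 3.0]))
    vla.append(np.array([4.0, 5.0]))

# Read them back; each row comes out as its own ndarray.
with tables.open_file(path, mode="r") as h5:
    rows = [np.asarray(r) for r in h5.root.signals]
```

Note that this sidesteps pandas entirely, so `where=` queries on other columns would have to be handled separately.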
this is not supported in the
I have databases storing time series of many channels in SQLite. At the moment I'm converting them to HDF5 via pandas. They also contain many characteristic quantities such as mass and time in seconds, as well as the time signal stored as a BLOB. When reading them with pd.read_sql(), the BLOBs are interpreted as strings.
When using to_hdf with format='table' to store the pandas DataFrame as HDF, the file grows to 6-7x the size of the SQLite database. I assumed the strings were the problem, which is why I convert them to numpy arrays.
I'm running a for-loop over all rows, converting each string into a numpy array with the corresponding data type; see the Python script following:
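A minimal sketch of such a conversion loop, assuming the BLOBs are raw float64 bytes (the column name "signal" and the dtype are assumptions, not taken from the original script):

```python
import numpy as np
import pandas as pd

# Stand-in for the result of pd.read_sql(): BLOBs arrive as raw bytes.
df = pd.DataFrame({
    "signal": [
        np.arange(3, dtype=np.float64).tobytes(),
        np.arange(5, dtype=np.float64).tobytes(),
    ],
})

# Reinterpret each BLOB as a float64 array; the real dtype would come
# from the database's per-row metadata.
df["signal"] = df["signal"].apply(lambda b: np.frombuffer(b, dtype=np.float64))
```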
When trying to save this as HDF, I receive the following error.
Something I don't understand is why it complains about a [mixed] object type even though every element within a row has the same data type. I already saw #8284, which did not help in my case. I couldn't find other reports of similar incidents (if I missed any, I'm very sorry). Interestingly, this works when changing from format='table' to format='fixed'. That is not an option here, though, because I need to be able to search in my database (e.g. where='time >= 132450000') as well as append data during the conversion process due to a lack of memory.

Is pandas able to store numpy arrays as elements with format='table'? This would be a very nice feature for time series in DataFrames.

Best,
fwillo
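One workaround sketch for keeping format='table' queryable: instead of storing a whole ndarray per cell, explode each signal into long format (one sample per row). The column names and data here are illustrative assumptions, and DataFrame.explode requires pandas 0.25+:

```python
import numpy as np
import pandas as pd

# Illustrative wide-format frame: one ndarray per cell (the problem case).
wide = pd.DataFrame({
    "channel": ["a", "b"],
    "signal": [np.array([0.1, 0.2, 0.3]), np.array([1.0, 2.0])],
})

# One sample per row; plain float64 cells are table-format friendly.
long = wide.explode("signal").reset_index(drop=True)
long["signal"] = long["signal"].astype(np.float64)
long["sample"] = long.groupby("channel").cumcount()

# This shape can then be appended and queried, e.g. (not executed here):
# long.to_hdf("signals.h5", key="signals", format="table", data_columns=True)
```

The trade-off is more rows and a join key (channel/sample) instead of one array per record, but where= filtering and appending both keep working.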
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None