to_sql is too slow #15276
Comments
See here: http://stackoverflow.com/questions/33816918/write-large-pandas-dataframes-to-sql-server-database. With SQL Server you need to import via CSV with a bulk upload for efficiency.
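For illustration, a rough sketch of that CSV-plus-bulk-load approach for SQL Server (the connection string, table, and file path are placeholders, and the file has to be readable from the SQL Server machine):

```python
import pandas as pd
import pyodbc

# Placeholder connection string
connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=my_server;DATABASE=my_db;UID=user;PWD=password"

# Dump the DataFrame to a CSV that the SQL Server instance can read ...
df = pd.DataFrame({"id": range(100_000), "value": "some text"})
df.to_csv(r"C:\temp\my_table.csv", index=False, header=False)

# ... then load it in one shot with BULK INSERT instead of row-by-row INSERTs
conn = pyodbc.connect(connection_string, autocommit=True)
conn.execute(r"""
    BULK INSERT dbo.my_table
    FROM 'C:\temp\my_table.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
""")
conn.close()
```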
you might find this useful: http://odo.pydata.org/en/latest/perf.html
ODO wouldn't work for me; it generated errors I wasn't able to fix. But d6tstack worked fine: https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb. You can preprocess with pandas, and it uses Postgres COPY FROM to make the import quick. Works well with RDS Postgres.
Add this code below. In my code:
Thanks @llautert!

```python
# don't forget to import event
from sqlalchemy import event, create_engine

engine = create_engine(connection_string)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        # pyodbc-specific flag: send all parameter sets in one round trip
        cursor.fast_executemany = True
        cursor.commit()
```
I've attempted to run this fix, but I ran into an error message:
Anyone know what's going on?
Hey @tim-sauchuk, I'm running into the same error as well, although I found a solution that's been working great. It involves a slight edit to the pandas.io.sql.py file (just delete the .pyc file from __pycache__ before importing again to make sure Python writes the new compiled version).
Issue #8953 that @bsaunders23 mentioned also shows a way to "monkey patch" it (fix it at run time). I tried it, and a 20k-row dataset that took 10+ minutes to upload then took only 4 seconds.
Does anyone know how I can implement this solution inside a class with a self.engine instance?
Works for me by referring to the example.
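In case it helps with the class question above, a minimal sketch of one way to attach the listener to a self.engine instance, assuming SQL Server via pyodbc (the class, connection string, and table name are placeholders):

```python
import pandas as pd
from sqlalchemy import create_engine, event


class SqlWriter:
    def __init__(self, connection_string):
        self.engine = create_engine(connection_string)

        # Register the fast_executemany listener on this instance's engine
        @event.listens_for(self.engine, "before_cursor_execute")
        def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
            if executemany:
                cursor.fast_executemany = True

    def write(self, df, table_name):
        df.to_sql(table_name, self.engine, if_exists="append", index=False)


# Usage (placeholder DSN):
# writer = SqlWriter("mssql+pyodbc://user:password@my_dsn")
# writer.write(pd.DataFrame({"a": [1, 2, 3]}), "my_table")
```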
Does not work for me. What pandas and SQLAlchemy versions are you using?
I tried running SQLAlchemy 1.2.4-py35h14c3975_0 and 1.2.11-py35h7b6447c_0, but I am getting:
What does the function call look like in this context? Or in other words, what are you using for the arguments to successfully upload the table?
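For what it's worth, a minimal sketch of what the whole call can look like on SQL Server with pyodbc and SQLAlchemy 1.3+, where fast_executemany can be passed straight to create_engine (the connection string, table name, and DataFrame are placeholders):

```python
import pandas as pd
import sqlalchemy

# Placeholder SQL Server connection string (pyodbc driver)
connection_string = (
    "mssql+pyodbc://user:password@server/database"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# SQLAlchemy 1.3+ exposes pyodbc's fast_executemany directly here
engine = sqlalchemy.create_engine(connection_string, fast_executemany=True)

df = pd.DataFrame({"id": range(100_000), "value": "some text"})
df.to_sql("my_table", engine, if_exists="append", index=False, chunksize=10_000)
```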
You are using psycopg2, which is a PostgreSQL driver. This issue and fix pertain to Microsoft SQL Server using the pyodbc driver.
What about adding the 'dtype' parameter?
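For reference, a small sketch of what passing dtype to to_sql looks like (column names, types, and the engine are placeholders); it controls the SQL column types that are created rather than the insert mechanism itself, so on its own it usually won't fix the speed problem:

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine("sqlite:///example.db")  # placeholder engine
df = pd.DataFrame({"name": ["a", "b"], "value": [1.0, 2.0]})

df.to_sql(
    "my_table",
    engine,
    if_exists="replace",
    index=False,
    dtype={
        "name": sqlalchemy.types.VARCHAR(100),  # avoid an unbounded TEXT column
        "value": sqlalchemy.types.Float(),
    },
)
```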
Have you found out how?
I think the correct answer should be to use https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-batch-mode-fast-execution if you're trying to save a DataFrame to Postgres.
A newer version of SQLAlchemy supports this directly when creating the engine:

```python
engine = sqlalchemy.create_engine(connection_string, fast_executemany=True)
```

Maybe it's worth mentioning it somewhere in the docs or with an example? It's a particular case not related to pandas itself, but it's a small addition that could drastically improve performance in many cases.
You'd think that setting the
An alternative for MS SQL users is to also use
For future readers on this, there are two options to use a batch mode for to_sql: two create_engine configurations, sketched below after the docs link.
Details on these arguments can be found here: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-fast-execution-helpers
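A sketch of what those two create_engine configurations look like, based on the linked SQLAlchemy 1.3 docs (the connection string is a placeholder):

```python
from sqlalchemy import create_engine

# Option 1: psycopg2's execute_batch helper
engine = create_engine(
    "postgresql+psycopg2://user:password@host/dbname",
    executemany_mode="batch",
)

# Option 2: psycopg2's execute_values helper (typically the faster of the two)
engine = create_engine(
    "postgresql+psycopg2://user:password@host/dbname",
    executemany_mode="values",
    executemany_values_page_size=10_000,
)
```

Either engine can then be passed to df.to_sql as usual.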
For Postgres users, I recommend setting the method argument to the COPY-based insertion callable from the example code here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#insertion-method
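For reference, a sketch of that COPY-based insertion callable, adapted from the linked pandas docs (the engine, table name, and DataFrame are placeholders):

```python
import csv
from io import StringIO

import pandas as pd
from sqlalchemy import create_engine


def psql_insert_copy(table, conn, keys, data_iter):
    # Write the chunk to an in-memory CSV buffer ...
    buf = StringIO()
    writer = csv.writer(buf)
    writer.writerows(data_iter)
    buf.seek(0)

    # ... then stream it into Postgres with COPY instead of row-by-row INSERTs
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        columns = ", ".join('"{}"'.format(k) for k in keys)
        table_name = "{}.{}".format(table.schema, table.name) if table.schema else table.name
        cur.copy_expert("COPY {} ({}) FROM STDIN WITH CSV".format(table_name, columns), buf)


engine = create_engine("postgresql+psycopg2://user:password@host/dbname")  # placeholder
df = pd.DataFrame({"id": range(500_000), "value": "some text"})
df.to_sql("my_table", engine, method=psql_insert_copy, if_exists="append", index=False)
```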
do you call
This is the only solution that worked for me for Postgres, thank you @feluelle! This took my query from 20 min to < 1 min. I had tried
Code Sample
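A minimal sketch of the kind of call being described, assuming a Postgres database reached through SQLAlchemy (the connection string, table name, and DataFrame are placeholders):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder RDS Postgres connection string
engine = create_engine("postgresql+psycopg2://user:password@my-instance.rds.amazonaws.com:5432/mydb")

df = pd.DataFrame({"id": range(500_000), "value": "some text"})

# The default insert path goes through executemany, which psycopg2 turns into
# one INSERT per row; that is what makes this so slow for large frames.
df.to_sql("my_table", engine, if_exists="append", index=False)
```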
Problem description
I'm writing a 500,000-row DataFrame to a Postgres database on AWS, and it takes a very, very long time to push the data through.
It is a fairly large SQL server and my internet connection is excellent, so I've ruled those out as contributing to the problem.
In comparison, csv2sql or using cat and piping into psql on the command line is much quicker.