to_sql is too slow #15276

dean12 · 2017-02-01T03:48:24Z

Code Sample,

df_name.to_sql('table_name',
                          schema = 'public',
                          con = engine,
                          index = False,
                          if_exists = 'replace')

Problem description

Im writing a 500,000 row dataframe to a postgres AWS database and it takes a very, very long time to push the data through.

It is a fairly large SQL server and my internet connection is excellent so I've ruled those out as contributing to the problem.

In comparison, csv2sql or using cat and piping into psql on the command line is much quicker.

jreback · 2017-02-01T03:59:51Z

see here: http://stackoverflow.com/questions/33816918/write-large-pandas-dataframes-to-sql-server-database

with SQLServer you need to import via csv with a bulk upload for efficiency

jreback · 2017-02-01T04:02:42Z

you might find this useful: http://odo.pydata.org/en/latest/perf.html

citynorman · 2018-10-14T21:52:56Z

ODO wouldn't work for me, it generates errors I wasn't able to fix, but d6tstack worked fine https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb. You can preprocess with pandas and it uses postgres COPY FROM to make the import quick. Works well with RDS postgres.

llautert · 2018-11-29T18:26:47Z

Add this code below engine = create_engine(connection_string):

from sqlalchemy import event

@event.listens_for(e, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()

In my code, to_sql function was taking 7 min to execute, and now it takes only 5 seconds ;)

dean12 · 2018-11-29T23:35:25Z

Thanks @llautert!
That helped a lot!

# dont forget to import event
from sqlalchemy import event, create_engine

engine = create_engine(connection_string)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()

tim-sauchuk · 2018-12-12T03:57:53Z

I've attempted to run this fix, but run into an error message:

AttributeError: 'psycopg2.extensions.cursor' object has no attribute 'fast_executemany'

Anyone know what's going on?

bsaunders23 · 2018-12-20T00:05:35Z

Hey @tim-sauchuk , running into the same error as well, although I found a solution that's been working great which involves a slight edit to the pandas.io.sql.py file (just delete the .pyc file from pycache before importing again to make sure it writes the new version to the compressed file)

#8953

scottcode · 2019-01-07T20:16:32Z

Hey @tim-sauchuk , running into the same error as well, although I found a solution that's been working great which involves a slight edit to the pandas.io.sql.py file (just delete the .pyc file from pycache before importing again to make sure it writes the new version to the compressed file)

#8953

Issue #8953 that @bsaunders23 mentioned also shows a way to "monkey patch" (fix it at run time). I tried it, and a 20k dataset that took 10+ minutes to upload then took only 4 seconds.

ghost · 2019-02-14T15:28:10Z

Thanks @llautert!
That helped a lot!

# dont forget to import event
from sqlalchemy import event, create_engine

engine = create_engine(connection_string)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()

Does anyone know how I can implement this solution inside a class with a self.engine instance?

DanielOverdevest · 2019-02-15T13:54:24Z

Does anyone know how I can implement this solution inside a class with a self.engine instance?

Works for me by refering to self.engine

Example:

    self.engine = sqlalchemy.create_engine(connectionString, echo=echo)
    self.connection = self.engine.connect()

    @event.listens_for(self.engine, 'before_cursor_execute')
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        print("Listen before_cursor_execute - executemany: %s" % str(executemany))
        if executemany:
            cursor.fast_executemany = True
            cursor.commit()

xnejed07 · 2019-02-20T22:11:43Z

Does not work for me. What pandas and sqlalchemy version are you using?

pedrovgp · 2019-03-31T12:05:54Z

I tried it running sqlalchemy: 1.2.4-py35h14c3975_0 and 1.2.11-py35h7b6447c_0

but I am getting

AttributeError: 'psycopg2.extensions.cursor' object has no attribute 'fast_executemany'

davefanch87 · 2019-04-23T16:50:48Z

@dean12 @llautert

What does the function call look like in this context? Or in other words, what are you using for the arguments to successfully upload the table?

<# dont forget to import event
from sqlalchemy import event, create_engine

engine = create_engine(connection_string)

@event.listens_for(engine, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()>``

lllong33 · 2019-05-14T02:13:16Z

see here：https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc

vaneseltine · 2019-06-04T14:59:45Z

I tried it running sqlalchemy: 1.2.4-py35h14c3975_0 and 1.2.11-py35h7b6447c_0

but I am getting

AttributeError: 'psycopg2.extensions.cursor' object has no attribute 'fast_executemany'

You are using psycopg2, which is a postgresql driver. This issue and fix pertain to Microsoft SQL Server using the pyodbc driver.

SincerelyUnique · 2019-06-26T07:49:03Z

what about add 'dtype' parameter

FukoH · 2019-07-08T08:12:03Z

Does anyone know how I can implement this solution inside a class with a self.engine instance?

Works for me by refering to self.engine

Example:

    self.engine = sqlalchemy.create_engine(connectionString, echo=echo)
    self.connection = self.engine.connect()

    @event.listens_for(self.engine, 'before_cursor_execute')
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        print("Listen before_cursor_execute - executemany: %s" % str(executemany))
        if executemany:
            cursor.fast_executemany = True
            cursor.commit()

Have you found out how?

pankgeorg · 2019-08-08T13:57:22Z

I think the correct answer should be to use https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-batch-mode-fast-execution, if you're trying to save a pandas dataframe to a postgres

KamranMK · 2019-08-12T12:08:58Z

A new version of pandas contains method parameter which could be chosen to be 'multi'. This makes the code run much faster.

giuliobeseghi · 2020-01-09T12:19:09Z

fast_executemany can be performed in a single step now (sqlalchemy >= 1.3.0):

engine = sqlalchemy.create_engine(connection_string, fast_executemany=True)

Maybe it's worth mentioning it somewhere in the docs or with an example? It's a particular case not related to pandas, but it's a small addition that could drastically improve the performance in many cases.

json2d · 2020-03-15T00:17:46Z

A new version of pandas contains method parameter which could be chosen to be 'multi'. This makes the code run much faster.

You'd think that setting the chunksize parameter would be enough to make to_sql batch insert but nope.

xhochy · 2020-07-01T07:26:36Z

An alternative for MS SQL users is to also use turbodbc.Cursor.insertmanycolumns, I've explained that in the linked StackOverflow post: https://stackoverflow.com/a/62671681/1689261

jpuerto-psc · 2020-08-24T14:43:33Z

For future readers on this, there are two options to use a 'batch_mode' for to_sql. The following are the two combinations:

create_engine(connection_string, executemany_mode='batch', executemany_batch_page_size=x)

or

create_engine(connection_string, executemany_mode='values', executemany_values_page_size=x)

Details on these arguments can be found here: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-fast-execution-helpers

feluelle · 2020-08-27T21:29:59Z

For postgres users, I recommend setting method to a callable:

callable with signature (pd_table, conn, keys, data_iter): This can be used to implement a more performant insertion method based on specific backend dialect features.

and call the function from the example code here https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#insertion-method and

Using COPY FROM is really a lot faster 🚀

Jakobhenningjensen · 2021-09-23T03:46:37Z

Add this code below engine = create_engine(connection_string):
from sqlalchemy import event

@event.listens_for(e, 'before_cursor_execute')
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True
        cursor.commit()
In my code, to_sql function was taking 7 min to execute, and now it takes only 5 seconds ;)

do you call receive_before_cursor_execute somewhere (since it needs the argument executemany to be True) or how is that used with df.to_sql() ?

mzw4 · 2023-03-03T20:34:24Z

For postgres users, I recommend setting method to a callable:

callable with signature (pd_table, conn, keys, data_iter): This can be used to implement a more performant insertion method based on specific backend dialect features.

and call the function from the example code here https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#insertion-method and

Using COPY FROM is really a lot faster 🚀

This is the only solution that worked for me for Postgres, thank you @feluelle! This took my query from 20 min to < 1 min.

I had tried method='multi' and various chunksizes, but it was slower than without it. Also tried executemany_mode='batch' in the sqlalchemy engine but it threw an "Invalid value" error and I couldn't find a solution to that.

jreback closed this as completed Feb 1, 2017

jreback added IO SQL to_sql, read_sql, read_sql_query Usage Question labels Feb 1, 2017

jreback added this to the No action milestone Feb 1, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

to_sql is too slow #15276

to_sql is too slow #15276

dean12 commented Feb 1, 2017 •

edited

Loading

jreback commented Feb 1, 2017

jreback commented Feb 1, 2017

citynorman commented Oct 14, 2018 •

edited

Loading

llautert commented Nov 29, 2018

dean12 commented Nov 29, 2018

tim-sauchuk commented Dec 12, 2018

bsaunders23 commented Dec 20, 2018

scottcode commented Jan 7, 2019

ghost commented Feb 14, 2019

DanielOverdevest commented Feb 15, 2019

xnejed07 commented Feb 20, 2019

pedrovgp commented Mar 31, 2019

davefanch87 commented Apr 23, 2019

lllong33 commented May 14, 2019

vaneseltine commented Jun 4, 2019

SincerelyUnique commented Jun 26, 2019

FukoH commented Jul 8, 2019

pankgeorg commented Aug 8, 2019

KamranMK commented Aug 12, 2019 •

edited

Loading

giuliobeseghi commented Jan 9, 2020

json2d commented Mar 15, 2020

xhochy commented Jul 1, 2020

jpuerto-psc commented Aug 24, 2020

feluelle commented Aug 27, 2020

Jakobhenningjensen commented Sep 23, 2021

mzw4 commented Mar 3, 2023

to_sql is too slow #15276

to_sql is too slow #15276

Comments

dean12 commented Feb 1, 2017 • edited Loading

Code Sample,

Problem description

jreback commented Feb 1, 2017

jreback commented Feb 1, 2017

citynorman commented Oct 14, 2018 • edited Loading

llautert commented Nov 29, 2018

dean12 commented Nov 29, 2018

tim-sauchuk commented Dec 12, 2018

bsaunders23 commented Dec 20, 2018

scottcode commented Jan 7, 2019

ghost commented Feb 14, 2019

DanielOverdevest commented Feb 15, 2019

xnejed07 commented Feb 20, 2019

pedrovgp commented Mar 31, 2019

davefanch87 commented Apr 23, 2019

lllong33 commented May 14, 2019

vaneseltine commented Jun 4, 2019

SincerelyUnique commented Jun 26, 2019

FukoH commented Jul 8, 2019

pankgeorg commented Aug 8, 2019

KamranMK commented Aug 12, 2019 • edited Loading

giuliobeseghi commented Jan 9, 2020

json2d commented Mar 15, 2020

xhochy commented Jul 1, 2020

jpuerto-psc commented Aug 24, 2020

feluelle commented Aug 27, 2020

Jakobhenningjensen commented Sep 23, 2021

mzw4 commented Mar 3, 2023

dean12 commented Feb 1, 2017 •

edited

Loading

citynorman commented Oct 14, 2018 •

edited

Loading

KamranMK commented Aug 12, 2019 •

edited

Loading