How to write DataFrame to postgres table?

There is a DataFrame.to_sql method, but it works only for mysql, sqlite and oracle databases. I can't pass a postgres connection or a sqlalchemy engine to this method.

Starting from pandas 0.14 (released end of May 2014), postgresql is supported. The sql module now uses sqlalchemy to support different database flavors. You can pass a sqlalchemy engine for a postgresql database (see docs). E.g.:

from sqlalchemy import create_engine
engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')
df.to_sql('table_name', engine)
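
If the table already exists, the bare call above will raise an error because if_exists defaults to 'fail'. A minimal sketch of the relevant options (the table name is a placeholder):

df.to_sql('table_name', engine, if_exists='append', index=False)   # add rows to an existing table
df.to_sql('table_name', engine, if_exists='replace', index=False)  # drop and recreate the table first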

You are correct that in pandas up to version 0.13.1 postgresql was not supported. If you need to use an older version of pandas, here is a patched version of pandas.io.sql: https://gist.github.com/jorisvandenbossche/10841234. I wrote this a while ago, so I cannot fully guarantee that it always works, but the basics should be there. If you put that file in your working directory and import it, then you should be able to do (where con is a postgresql connection):

import sql  # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')

Faster option:

The following code will copy your Pandas DataFrame to a postgres DB much faster than the df.to_sql method, and you won't need any intermediate csv file to store the DataFrame.

Create an engine based on your DB specifications.

Create a table in your postgres DB that has the same number of columns as the DataFrame (df).

Data in the DataFrame will get inserted into your postgres table.

from sqlalchemy import create_engine
import psycopg2 
import io

If you want to replace the table, you can first recreate it with the normal to_sql method using just the headers of the df, and then load the entire time-consuming df into the DB with COPY.

engine = create_engine('postgresql+psycopg2://username:password@host:port/database')

df.head(0).to_sql('table_name', engine, if_exists='replace', index=False)  # drops and recreates the table with just the df's columns

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)  # rewind the buffer so copy_from reads from the beginning
contents = output.getvalue()  # not needed for the COPY itself; can be removed
cur.copy_from(output, 'table_name', null="")  # null values become ''
conn.commit()
cur.close()
conn.close()

Pandas 0.24.0+ solution

In Pandas 0.24.0 a new feature was introduced that is specifically designed for fast writes to Postgres. You can learn more about it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method

import csv
from io import StringIO

from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)
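
If you don't need the raw speed of COPY, the built-in method='multi' (which packs many rows into each INSERT statement) is often fast enough, as one of the comments below also notes. A minimal sketch, reusing the engine above with a placeholder table name:

df.to_sql('table_name', engine, method='multi', chunksize=1000, index=False)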

This is how I did it.

It may be faster because it uses execute_batch:

import psycopg2.extras  # needs an explicit import; "import psycopg2" alone is not enough

# df is the dataframe; conn is an open psycopg2 connection; table is the target table name
if len(df) > 0:
    df_columns = list(df)
    # create "col1,col2,..."
    columns = ",".join(df_columns)

    # create "VALUES(%s, %s, ...)" -- one %s placeholder per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))

    # create "INSERT INTO table (columns) VALUES(%s, ...)"
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)

    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
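
The snippet above assumes an open psycopg2 connection (conn) and a target table name (table). A minimal sketch of that setup, with placeholder credentials:

import psycopg2

conn = psycopg2.connect(host='localhost', port=5432, dbname='mydatabase',
                        user='username', password='password')
table = 'table_name'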

For Python 2.7, Pandas 0.24.2, and Psycopg2

Psycopg2 Connection Module

import psycopg2
from psycopg2.extras import RealDictCursor

def dbConnect(db_parm, username_parm, host_parm, pw_parm):
    # Build the connection parameters
    credentials = {'host': host_parm, 'database': db_parm, 'user': username_parm, 'password': pw_parm}
    conn = psycopg2.connect(**credentials)
    conn.autocommit = True  # auto-commit each entry to the database
    conn.cursor_factory = RealDictCursor
    cur = conn.cursor()
    print ("Connected Successfully to DB: " + str(db_parm) + "@" + str(host_parm))
    return conn, cur

Connect to the database

conn, cur = dbConnect(databaseName, dbUser, dbHost, dbPwd)

Assuming the dataframe is already present as df

import io

output = io.BytesIO()  # For Python 3 use io.StringIO()
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0)  # rewind the buffer so copy_expert reads from the beginning
copy_query = "COPY mem_info FROM STDIN csv DELIMITER '\t' NULL '' ESCAPE '\\' HEADER"  # replace mem_info with your table name
cur.copy_expert(copy_query, output)
conn.commit()
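
For Python 3 the same approach works with a text buffer instead of BytesIO; a minimal sketch, assuming the same cur and conn and the placeholder table name mem_info:

import io

output = io.StringIO()
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0)
copy_query = "COPY mem_info FROM STDIN csv DELIMITER '\t' NULL '' ESCAPE '\\' HEADER"
cur.copy_expert(copy_query, output)
conn.commit()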

Comments
  • Did this make it to 0.14?
  • Yes, and also 0.15 is already released (release candidate). I will update the answer, thanks for asking.
  • This post solved the problem for me: stackoverflow.com/questions/24189150/…
  • Note: to_sql does not export array types in postgres.
  • Instead of creating a new Sqlalchemy engine, can I use an existing Postgres connection created using psycopg2.connect()? (A sketch using create_engine's creator argument follows after these comments.)
  • What does the variable contents do? Should this be the one that is written in copy_from()?
  • @n1000 Yeah just ignore the contents variable, everything else should work just fine
  • why do you do output.seek(0) ?
  • This is so fast that it's funny :D
  • Loading the table is failing for me because of newline characters in some fields. How do I handle this? df.to_csv(output, sep='\t', header=False, index=False, encoding='utf-8') cur.copy_from(output, 'messages', null="") # null values become ''
  • Most of the time, adding the method='multi' option is fast enough. But yes, this COPY method is the fastest way right now.
  • Is this for CSVs only? Can it be used with .xlsx as well? Some notes on what each part of this is doing would be helpful. The first part after the with is writing to an in-memory buffer. The last part of the with is using an SQL statement and taking advantage of copy_expert's speed to bulk load the data. What is the middle part that starts with columns = doing?
  • I get AttributeError: module 'psycopg2' has no attribute 'extras'. Ah, this needs to be explicitly imported. import psycopg2.extras
  • this function is much faster than the sqlalchemy solution
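
Regarding the comment above about reusing an existing psycopg2 connection instead of creating a new engine: SQLAlchemy's create_engine accepts a creator callable, so one possible sketch (untested, placeholder credentials) is to hand it a function that returns the connection you already have:

import psycopg2
from sqlalchemy import create_engine

conn = psycopg2.connect(host='localhost', dbname='mydatabase',
                        user='username', password='password')

# creator tells SQLAlchemy how to obtain a DBAPI connection; here it simply
# returns the existing connection instead of opening a new one
engine = create_engine('postgresql+psycopg2://', creator=lambda: conn)
df.to_sql('table_name', engine, if_exists='append', index=False)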