How to insert 100,000 parent rows each with 200 child rows super fast?

I have a parent entity called OrderEvent and a child entity called PreCondition. One OrderEvent can have many PreConditions (>= 200). I need to save 100,000 OrderEvents plus 100,000 * 200 PreConditions. I used Repository.save(listOfOrderEvents), saving to the DB every 1,000 records. It takes approximately 30 seconds to insert 1,000 OrderEvents.

It takes almost an hour to save all 100,000 OrderEvents.

Is there any way to bring this down to under 2 minutes?

I tried the repository's save-entities method:

    public  void parseOrder(String path, String collectionName) throws ParseException {
        BufferedReader reader;
        Connection conn = (Connection) em.unwrap(java.sql.Connection.class);
        System.out.println(conn);
        try {
            reader = new BufferedReader(new FileReader(
                    path));
            String line = reader.readLine();

            String jobNumber =  line.substring(0, 7).trim();
            String recordType =  line.substring(7, 9).trim();
            Integer len = line.length();
            preId = 0L;
            postId = 0L;
            eventId = 0L;

            OrderEvent orderEvent = this.paraseHeader(line,len,jobNumber,collectionName);
            Integer count = 1;
            Integer batch = 0;
            long startTime = System.nanoTime();

            List<OrderEvent> list = new ArrayList<OrderEvent>();
            while (line != null) {
                line = reader.readLine();
                if (line == null) {
                    continue;
                }
                jobNumber =  line.substring(0, 7).trim();
                recordType =  line.substring(7, 9).trim();
                len = line.length();

                if (recordType.equals("0H")) { 

                    count++;
                    batch++;
                    if (batch.equals(1000)) {
                        orderRepository.save(list);
                        list.clear();
                        long estimatedTime = System.nanoTime() - startTime;
                        System.out.println("Processed " +  batch + " records in " +  estimatedTime / 1_000_000_000.  +  " second(s).");

                        batch = 0;
                        startTime = System.nanoTime();
                    }


                    list.add(orderEvent);
                    //orderRepository.saveAndFlush(orderEvent);
                    orderEvent = this.paraseHeader(line,len,jobNumber,collectionName);

                } else if (recordType.equals("2F")) { 
                    this.paraseFeature(line,len,jobNumber,orderEvent);
                }
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    private OrderEvent paraseHeader(String line, Integer len, String jobNumber, String collectionName) throws ParseException {

        String model = line.substring(9, 16).trim();
        String processDate = line.substring(len - 11, len - 3).trim();
        String formattedProcessDate = processDate.substring(0, 4) + "-" +
                processDate.substring(4, 6) + "-" + processDate.substring(6, 8) + " 00:00:00";

        //eventId++;

        OrderEvent orderEvent = new OrderEvent(jobNumber, UUID.randomUUID().toString(), collectionName,
                formatter.parse(formattedProcessDate));

        //preId++;
        //postId++;
        orderEvent.fillPrecondition("Model", "Stimulus", "OP_EQ", model);
        orderEvent.fillPostcondition("Add_Fact", "Coded", "Response", "True");

        return orderEvent;
    }

    private void paraseFeature(String line, Integer len, String jobNumber, OrderEvent orderEvent) {

        //preId++;
        String feature = line.substring(len - 7, len).trim();
        orderEvent.fillPrecondition("Feature", "Stimulus", "OP_EQ", feature);
    }

This usually depends on the database setup: e.g. what the latency to the client is, what indexes exist on the tables, how queries lock the tables, and so on.

Make sure that you understand how much time is spent in network operations. It could be the limiting factor, especially if your database sits on the other side of the world.

First establish the latency between the client and the database server. If it is 10 ms, then inserting these rows one by one would take: 100,000 * 200 * 10 ms = 200,000 s ≈ 56 h. This is very slow, so make sure you are using batch inserts with JDBC.
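
A minimal sketch of what a batched JDBC insert could look like for the child rows (the table name, column names, and PreCondition getters are assumptions for illustration, not taken from the question):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Sketch: send child rows in batches of 1000 instead of one statement per row.
    // Table/column names and the PreCondition getters are assumed, not from the question.
    public class PreConditionBatchWriter {

        public void insertAll(Connection conn, List<PreCondition> preConditions) throws SQLException {
            conn.setAutoCommit(false);
            String sql = "INSERT INTO PRE_CONDITION (ORDER_EVENT_ID, NAME, TYPE, OPERATOR, VALUE) "
                       + "VALUES (?, ?, ?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int pending = 0;
                for (PreCondition pc : preConditions) {
                    ps.setLong(1, pc.getOrderEventId());
                    ps.setString(2, pc.getName());
                    ps.setString(3, pc.getType());
                    ps.setString(4, pc.getOperator());
                    ps.setString(5, pc.getValue());
                    ps.addBatch();              // queue the row locally
                    if (++pending == 1000) {
                        ps.executeBatch();      // one round trip for 1000 rows
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    ps.executeBatch();          // flush the leftover batch
                }
            }
            conn.commit();
        }
    }

If you stay with Spring Data JPA rather than dropping to raw JDBC, enabling hibernate.jdbc.batch_size and hibernate.order_inserts should have a similar effect, though Hibernate cannot batch inserts for entities with IDENTITY-generated keys.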

Sometimes the insertion process can be significantly sped up by creating shadow tables (a rough JDBC sketch follows the list):

  1. Create new tables that are identical to the OrderEvents and PreCondition tables. Some RDBMSs allow the CREATE TABLE ... AS SELECT ... FROM ... syntax.
  2. Disable foreign keys and indexes on the shadow tables.
  3. Bulk insert all the data.
  4. Enable foreign keys and indexes on the shadow tables. This will hopefully verify that the imported data is correct.
  5. Insert from shadow tables into the actual tables e.g. by running INSERT INTO ... SELECT ... FROM ....
  6. Delete the shadow tables.
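
A rough JDBC sketch of that flow (the exact DDL for creating shadow tables and disabling constraints is vendor-specific; the table names here are assumptions):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Rough sketch of the shadow-table flow above. Table names and DDL syntax are
    // assumptions; disabling/enabling constraints and indexes is vendor-specific.
    public class ShadowTableLoad {

        public void load(Connection conn) throws SQLException {
            try (Statement st = conn.createStatement()) {
                // 1. Shadow tables with the same structure, no rows copied
                st.execute("CREATE TABLE ORDER_EVENT_SHADOW AS SELECT * FROM ORDER_EVENT WHERE 1 = 0");
                st.execute("CREATE TABLE PRE_CONDITION_SHADOW AS SELECT * FROM PRE_CONDITION WHERE 1 = 0");

                // 2. Disable/drop foreign keys and indexes on the shadow tables (vendor-specific DDL)

                // 3. Bulk insert all the data into the shadow tables (batched JDBC or a bulk tool)

                // 4. Re-enable foreign keys and indexes so the imported data is validated

                // 5. Move the data into the real tables with set-based statements
                st.execute("INSERT INTO ORDER_EVENT SELECT * FROM ORDER_EVENT_SHADOW");
                st.execute("INSERT INTO PRE_CONDITION SELECT * FROM PRE_CONDITION_SHADOW");

                // 6. Clean up
                st.execute("DROP TABLE PRE_CONDITION_SHADOW");
                st.execute("DROP TABLE ORDER_EVENT_SHADOW");
            }
        }
    }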

However, the best option would be to skip JDBC and switch to the bulk-load utility provided by your database, e.g. Oracle has External Tables and SQL*Loader. These tools are specifically designed to ingest large quantities of data efficiently, while JDBC is a general-purpose interface.
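
Since your input is already a flat file parsed line by line, one way to use such a tool is to have the parser write out CSV files that the loader consumes; a minimal sketch (file names and column order are assumptions):

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.UUID;

    // Sketch: write parent and child rows to two CSV files that a bulk loader
    // (SQL*Loader, DB2 LOAD, ...) can ingest. File names and column order are assumptions.
    public class CsvExporter implements AutoCloseable {

        private final PrintWriter orders;
        private final PrintWriter preConditions;

        public CsvExporter(Path dir) throws IOException {
            this.orders = new PrintWriter(Files.newBufferedWriter(dir.resolve("order_events.csv")));
            this.preConditions = new PrintWriter(Files.newBufferedWriter(dir.resolve("pre_conditions.csv")));
        }

        // Generate the parent key in the application so child rows can reference it.
        public String writeOrderEvent(String jobNumber, String collectionName, String processDate) {
            String id = UUID.randomUUID().toString();
            orders.println(String.join(",", id, jobNumber, collectionName, processDate));
            return id;
        }

        public void writePreCondition(String orderEventId, String name, String type, String op, String value) {
            preConditions.println(String.join(",", orderEventId, name, type, op, value));
        }

        @Override
        public void close() {
            orders.close();
            preConditions.close();
        }
    }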

Something like this is better done with the DB server's bulk-processing facilities. Yes, it is a completely different process, but it takes seconds, not even minutes.

Unfortunately, the exact HOWTO depends heavily on the SQL server:

MS SQL: BULK INSERT: https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017

PostgreSQL: COPY: https://www.postgresql.org/docs/current/sql-copy.html
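
For PostgreSQL the COPY command can even be driven from Java through the driver's CopyManager; a minimal sketch (connection string, table name and CSV layout are assumptions):

    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    // Sketch: stream a CSV file into PostgreSQL with COPY from Java.
    // Connection string, table name and CSV layout are assumptions.
    public class PgCopyLoad {

        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "password");
                 FileReader csv = new FileReader("pre_conditions.csv")) {

                CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
                long rows = copyManager.copyIn(
                        "COPY pre_condition FROM STDIN WITH (FORMAT csv)", csv);
                System.out.println("Loaded " + rows + " rows");
            }
        }
    }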

In C# I can use SqlBulkCopy for this type of task.

Maybe in Java there is an equivalent API, something like this: com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.
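
A minimal sketch of loading a CSV with that class (the connection string, destination table, CSV layout and column types are assumptions):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    import com.microsoft.sqlserver.jdbc.SQLServerBulkCSVFileRecord;
    import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;

    // Sketch: bulk-copy a CSV file into SQL Server with the Microsoft JDBC driver.
    // Destination table, CSV layout and column types are assumptions.
    public class SqlServerBulkLoad {

        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:sqlserver://localhost;databaseName=mydb;user=sa;password=secret")) {

                SQLServerBulkCSVFileRecord csv =
                        new SQLServerBulkCSVFileRecord("pre_conditions.csv", null, ",", false);
                csv.addColumnMetadata(1, "ORDER_EVENT_ID", Types.VARCHAR, 36, 0);
                csv.addColumnMetadata(2, "NAME", Types.VARCHAR, 64, 0);
                csv.addColumnMetadata(3, "VALUE", Types.VARCHAR, 64, 0);

                try (SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn)) {
                    bulkCopy.setDestinationTableName("dbo.PRE_CONDITION");
                    bulkCopy.writeToServer(csv);
                }
            }
        }
    }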

Comments
  • If you are sure that you do not violate any constraints, you might disable the constraints, insert the rows, then reactivate the constraints.
  • BTW you are not saving the "leftover" batch, which may contain fewer than 1000 records. (Unless you always handle exactly n x 1000 orders...)
  • Client and DB server are on the same machine.
  • Also, the data is stored in a text file; the app reads it line by line, builds the OrderEvent and PreCondition objects, and saves them. So how can a bulk-load utility be used here?
  • Instead of using JDBC, convert the data to a format that the bulk tool can use, e.g. a correctly formatted CSV. Then load this CSV from the filesystem using the bulk tool.