MongoDB - what is the fastest way to update all records in a collection?


I have a collection with 9 million records. I am currently using the following script to update the entire collection:

simple_update.js

db.mydata.find().forEach(function(data) {
  db.mydata.update({_id:data._id},{$set:{pid:(2571 - data.Y + (data.X * 2572))}});
});

This is run from the command line as follows:

mongo my_test simple_update.js

So all I am doing is adding a new field pid based upon a simple calculation.

Is there a faster way? This takes a significant amount of time.

There are two things that you can do.

  1. Send an update with the 'multi' flag set to true.
  2. Store the function server-side and try using server-side code execution.

The documentation on server-side code execution also contains the following advice:

This is a good technique for performing batch administrative work. Run mongo on the server, connecting via the localhost interface. The connection is then very fast and low latency. This is friendlier than db.eval() as db.eval() blocks other operations.

This is probably the fastest you'll get. You have to realize that issuing 9M updates on a single server is going to be a heavy operation. Even if you could get 3k updates/second, you're still talking about running for nearly an hour (9M / 3k = 3,000 seconds, or 50 minutes).

And that's not really a "mongo problem", that's going to be a hardware limitation.
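For what it's worth, on MongoDB 4.2+ this kind of computed update can be expressed as a single updateMany with an aggregation pipeline, so the computation runs server-side and no documents travel over the wire at all. A sketch using the question's formula (field names X, Y and pid come from the question; the updateMany call is shown commented out since it needs a live server):

```javascript
// Server-side computed update (MongoDB 4.2+), one command for the whole collection:
// db.mydata.updateMany(
//   {},
//   [ { $set: { pid: { $add: [ { $subtract: [ 2571, "$Y" ] },
//                              { $multiply: [ "$X", 2572 ] } ] } } } ]
// );

// The pipeline above computes the same value as the original forEach script:
function pid(X, Y) {
  return 2571 - Y + X * 2572;
}

console.log(pid(0, 0)); // 2571
console.log(pid(1, 1)); // 5142
```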

I am using the db.collection.update method:

// db.collection.update( criteria, objNew, upsert, multi ) // --> for reference
db.collection.update( { "_id" : { $exists : true } }, objNew, upsert, true);

I wouldn't recommend using {multi: true} for a larger data set, because it is less configurable.

A better way is to use a bulk operation.

Bulk operations are really helpful for scheduler tasks. Say you have to delete data older than 6 months daily: use a bulk operation. It's fast and won't slow down the server. CPU and memory usage are barely noticeable when you insert, delete or update over a billion documents. I found {multi: true} slowing down the server when dealing with a million+ documents (this needs more research).
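As a sketch of that scheduler idea (the createdAt field name and the helper function are illustrative, not from the original; the deleteMany call is commented out since it needs a live connection):

```javascript
// Compute the cutoff date for "older than N months" in UTC:
function cutoffDate(now, months) {
  const d = new Date(now.getTime());
  d.setUTCMonth(d.getUTCMonth() - months);
  return d;
}

// The daily cleanup itself would then be a single call, e.g.:
// db.myCol.deleteMany({ createdAt: { $lt: cutoffDate(new Date(), 6) } });

console.log(cutoffDate(new Date(Date.UTC(2024, 6, 15)), 6).toISOString()); // 2024-01-15T00:00:00.000Z
```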

See a sample below. It's a js shell script; you can also run it on the server as a node program (use the npm module shelljs or similar to achieve this).

Update mongo to 3.2+.

The normal way of updating multiple unique documents is:

let counter = 0;
db.myCol.find({}).sort({$natural: 1}).limit(1000000).forEach(function(document) {
    counter++;
    document.test_value = "just testing" + counter;
    db.myCol.save(document);
});

It took 310-315 seconds when I tried. That's more than 5 minutes for updating a million documents.

My collection includes 100 million+ documents, so speed may differ for others.

The same using a bulk operation is:

let counter = 0;
// Magic number - depends on your hardware and document size (mine is around 1.5kb-2kb).
// Performance drops when this limit is outside the 1500-2500 range;
// try different ranges and find the fastest bulk limit for your document size, or take an average.
let limitNo = 2222;
let bulk = db.myCol.initializeUnorderedBulkOp();
let noOfDocsToProcess = 1000000;
db.myCol.find({}).sort({$natural:1}).limit(noOfDocsToProcess).forEach(function(document){
    counter++;
    noOfDocsToProcess --;
    limitNo--;
    bulk.find({_id:document._id}).update({$set:{test_value : "just testing .. " + counter}});
    if(limitNo === 0 || noOfDocsToProcess === 0){
        bulk.execute();
        bulk = db.myCol.initializeUnorderedBulkOp();
        limitNo = 2222;
    }
});

The best time was 8972 millis. So on average it took only about 10 seconds to update a million documents, 30 times faster than the old way.

Put the code in a .js file and execute as mongo shell script.

If someone finds a better way, please update this. Let's use mongo in a faster way.
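For anyone doing the same from Node.js rather than the shell: the driver's collection.bulkWrite() accepts an array of operations, and the batching trick above boils down to chunking that array. A sketch (the collection name, field name and batch size are illustrative; the driver calls are commented out since they need a live connection):

```javascript
// Split an array of bulk operations into fixed-size batches:
function toBatches(ops, size) {
  const batches = [];
  for (let i = 0; i < ops.length; i += size) {
    batches.push(ops.slice(i, i + size));
  }
  return batches;
}

// With a live connection it would be used roughly like this:
// const ops = docs.map(d => ({
//   updateOne: { filter: { _id: d._id }, update: { $set: { test_value: "just testing" } } }
// }));
// for (const batch of toBatches(ops, 2222)) {
//   await db.collection("myCol").bulkWrite(batch, { ordered: false });
// }

console.log(toBatches([1, 2, 3, 4, 5], 2).length); // 3
```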

Not sure if it will be any faster, but you could do a multi-update: use a filter that matches every document, set the 'multi' flag to true, and it should do the same without your client having to iterate through the entire collection. (A caveat: a filter like {_id: {$gt: 0}} only matches numeric _ids, because BSON comparisons are type-bracketed; the empty filter {} reliably matches everything.)

Check this out: MongoDB - Server Side Code Execution

Comments
  • So would having multiple instances (slave/master) make it faster?
  • Master / Slave won't improve your write time. Mongo only has one write thread and it's typically limited by the disk throughput when doing a massive update like this. The "multiple instances" that you need is sharding. With sharding, you'll have two machines with two separate disks and you'll get nearly double the write throughput. Again though, please look at your hardware and compare that with your expected throughput.
  • Ok. I understand that. What about reading? Is sharding a way of speeding up reading or querying as well?
  • The best way to have good read times is to keep everything in memory. That means having as much RAM as possible available. Sharding is a reasonable way to "add more RAM", because you get to use the RAM from multiple machines. In that sense, sharding is "faster".
  • How exactly is storing the function in a local file any different from storing it server-side? It runs in the server context either way.
  • You can also use {} (an empty BSON object) for the first argument. Using null will throw an error. This is an inconsistency in the API, as other methods accept null as search criteria (and interpret it as "match any"), for example the find and findOne functions.
  • In the shell, "let" cannot be used globally; it can only be used inside a function. So use "var" instead of "let".