How to remove duplicate values inside a list in MongoDB


I have a MongoDB collection. When I run:

db.bill.find({})

I get:

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  190215,  147708 ],
    "customer_name" : "abc"
}

I need an operation that removes the duplicate values from bill_codes, so that the document ends up as:

{ 
    "_id" : ObjectId("55695ea145e8a960bef8b87a"),
    "name" : "ABC. Net", 
    "code" : "1-98tfv",
    "abbreviation" : "ABC",
    "bill_codes" : [  190215,  44124,  147708 ],
    "customer_name" : "abc"
}

How can I achieve this in MongoDB?

Well, you can do this using the aggregation framework as follows:

collection.aggregate([
    { "$project": {
        "name": 1,
        "code": 1,
        "abbreviation": 1,
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] }
    }}
])

The $setUnion operator is a "set" operator, so its result is a "set" in which only the "unique" items are kept. Note that the order of the elements in the resulting array is not guaranteed.
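
As a rough illustration of why a union with an empty array removes duplicates, here is a minimal Python sketch. The function name set_union is hypothetical and only models the set semantics in plain Python; it does not talk to MongoDB.

```python
def set_union(*arrays):
    """Rough model of $setUnion: the unique elements found across all inputs."""
    result = set()
    for array in arrays:
        result.update(array)
    return result


bill_codes = [190215, 44124, 190215, 147708]

# Union with an empty array changes nothing except collapsing duplicates.
deduped = set_union(bill_codes, [])

# Sorted only to get stable output; a set carries no ordering of its own.
print(sorted(deduped))
```

Like the server-side operator, the sketch makes no promise about element order, so anything that relies on the original ordering of bill_codes needs a different approach.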

If you are still using a MongoDB version older than 2.6 then you would have to do this operation with $unwind and $addToSet instead:

collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": "$_id",
        "name": { "$first": "$name" },
        "code": { "$first": "$code" },
        "abbreviation": { "$first": "$abbreviation" },
        "bill_codes": { "$addToSet": "$bill_codes" }
    }}
])

It's not as efficient, but these operators have been supported since version 2.2.

Of course, if you actually want to modify the documents in your collection permanently, you can expand on this and issue an update for each affected document. How you iterate the "cursor" returned from .aggregate() depends on your driver, but it basically follows this shell example:

db.collection.aggregate([
    { "$project": {
        "bill_codes": { "$setUnion": [ "$bill_codes", [] ] },
        "same": { "$eq": [
            { "$size": "$bill_codes" },
            { "$size": { "$setUnion": [ "$bill_codes", [] ] } }
        ]}
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})

This is a bit more involved for earlier versions:

db.collection.aggregate([
    { "$unwind": "$bill_codes" },
    { "$group": {
        "_id": { 
            "_id": "$_id",
            "bill_code": "$bill_codes"
        },
        "origSize": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id._id",
        "bill_codes": { "$push": "$_id.bill_code" },
        "origSize": { "$sum": "$origSize" },
        "newSize": { "$sum": 1 }
    }},
    { "$project": {
        "bill_codes": 1,
        "same": { "$eq": [ "$origSize", "$newSize" ] }
    }},
    { "$match": { "same": false } }
]).forEach(function(doc) {
    db.collection.update(
        { "_id": doc._id },
        { "$set": { "bill_codes": doc.bill_codes } }
    )
})

Both pipelines include extra stages that compare the length of the "de-duplicated" array with the length of the original array, and return only those documents that had "duplicates" removed, so that only those documents are processed for updates.


A note for Python is probably worth adding here as well. If you don't care about "identifying" the documents that contain duplicate array entries and are prepared to "blast" the whole collection with updates, then just use Python's built-in set() in the client code to remove the duplicates:

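# `collection` is assumed to be an existing PyMongo Collection handle
for doc in collection.find():
    # update_one() replaces the legacy update(), which PyMongo 4 removed
    collection.update_one(
        { "_id": doc["_id"] },
        # set() drops duplicates but does not preserve the original order
        { "$set": { "bill_codes": list(set(doc["bill_codes"])) } }
    )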

That's quite simple, and the choice depends on which is the greater evil: the cost of finding the documents with duplicates, or the cost of updating every document whether it needs it or not.

This at least covers techniques.
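
One possible middle ground between the two "evils" is to read every document but only issue an update when the array actually changes. The sketch below shows that idea on plain dictionaries; plan_updates and sample_docs are hypothetical names, and the commented-out update_one call assumes a PyMongo collection handle that is not defined here.

```python
def plan_updates(docs, field="bill_codes"):
    """Return (filter, update) pairs only for documents that hold duplicates."""
    planned = []
    for doc in docs:
        values = doc.get(field) or []
        unique = list(set(values))  # original order is not preserved
        if len(unique) != len(values):
            planned.append(({"_id": doc["_id"]}, {"$set": {field: unique}}))
    return planned


# Stand-in for the result of collection.find()
sample_docs = [
    {"_id": 1, "bill_codes": [190215, 44124, 190215, 147708]},
    {"_id": 2, "bill_codes": [1, 2, 3]},
]

planned = plan_updates(sample_docs)
for flt, update in planned:
    print(flt, update)
    # With a real collection: collection.update_one(flt, update)
```

Only the first sample document produces an update here, which mirrors what the "same": false match does on the server side, at the cost of still transferring every document to the client.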


You can use a forEach loop with some JavaScript:

db.bill.find().forEach(function(entry){
     var arr = entry.bill_codes;
     var uniqueArray = arr.filter(function(elem, pos) {
        return arr.indexOf(elem) == pos;
     }); 
     entry.bill_codes = uniqueArray;
     db.bill.save(entry);
})
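
For readers doing the same thing from a Python client, the indexOf filter above keeps the first occurrence of each value and so preserves order. A minimal sketch of an equivalent follows; unique_in_order is a hypothetical helper name, and it assumes the array elements are hashable values such as numbers or strings.

```python
def unique_in_order(values):
    """Keep the first occurrence of each value, preserving the original order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


print(unique_in_order([190215, 44124, 190215, 147708]))
# prints [190215, 44124, 147708]
```

Tracking seen values in a set avoids rescanning the array for every element, which is what the indexOf lookup does for each position.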


MongoDB 3.4+ has the $addFields aggregation stage, which lets you avoid explicitly listing all the other fields in $project:

db.bill.aggregate([
    {"$addFields": {
        "bill_codes": {"$setUnion": ["$bill_codes", []]}
    }}
])

Just for reference, here is another (lengthier) way that uses $replaceRoot and also doesn't require listing all possible fields. It relies on $mergeObjects, which needs MongoDB 3.6 or later:

db.bill.aggregate([
    {'$unwind': {
        'path': '$bill_codes',
        // output the document even if its bill_codes array is empty
        'preserveNullAndEmptyArrays': true
    }},
    {'$group': {
        '_id': '$_id',
        'bill_codes': {'$addToSet': '$bill_codes'},
        // arbitrary name that doesn't exist on any document
        '_other_fields': {'$first': '$$ROOT'},
    }},
    {
      // the field, in the resulting document, has the value from the last document merged for the field. (c) docs
      // so the new deduped array value will be used
      '$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
    },
    {'$project': {'_other_fields': 0}}
])    


In MongoDB 4.2 and later, the update parameter of the collection updateMany method can also be an aggregation pipeline (instead of a document). The pipeline supports the $set, $unset and $replaceWith stages. Using the $setIntersection aggregation operator with the $set stage, you can remove the duplicates from an array field and update the collection in a single operation.

An example:

arrays collection:

{ "_id" : 0, "a" : [ 3, 5, 5, 3 ] }
{ "_id" : 1, "a" : [ 1, 2, 3, 2, 4 ] }

From the mongo shell:

db.arrays.updateMany(
   {  },
   [
      { $set: { a: { $setIntersection: [ "$a", "$a" ] } } }
   ]
)

The updated arrays collection:

{ "_id" : 0, "a" : [ 3, 5 ] }
{ "_id" : 1, "a" : [ 1, 2, 3, 4 ] }

The other update methods, update(), updateOne() and findAndModify(), also have this feature.
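
The same one-stage pipeline can be built from Python as ordinary lists and dictionaries. This is a minimal sketch: dedupe_update_pipeline is a hypothetical helper, and the commented-out call assumes a PyMongo collection handle named arrays, a MongoDB 4.2+ server, and a driver version that accepts pipeline updates.

```python
def dedupe_update_pipeline(field):
    """Build a one-stage update pipeline that de-duplicates an array field."""
    path = "$" + field
    # Intersecting the array with itself leaves only its unique elements.
    return [{"$set": {field: {"$setIntersection": [path, path]}}}]


pipeline = dedupe_update_pipeline("a")
print(pipeline)

# With a real collection handle (not defined in this sketch):
# arrays.update_many({}, pipeline)
```

Passing a list rather than a single document is what marks the update as a pipeline, matching the square brackets in the shell example above.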


Comments
  • This does not save back to the collection. I mean, running db.bill.find({}) again still retrieves the duplicate value.
  • @user567797 Your question did not state that you wanted to change the stored documents. This was answering in terms of "display only". You would have to process the results and update each document individually where the items are actually changed. Added an explanation of how to do this and identify the documents that had duplicates removed from the array, so you don't need to update every document in the collection.
  • How can I do this with the second query, as I am using MongoDB version 2.4? Also note that your update code has a missing comma.
  • @user567797 Very possible. Added the way to do this, along with corrections. MongoDB 2.4 is really quite old and you should consider upgrading to at least 2.6.x (you would have to before upgrading further). There are many advantages to doing this, including "Bulk updates" to make this work even faster.