How to query data efficiently in a large MongoDB collection?

I have one big MongoDB collection (3 million documents, 50 GB), and querying it is very slow even though I have created indexes.

db.collection.find({"C123":1, "C122":2})

For example, this query times out or is extremely slow (at least 10 s), even though I have created separate indexes on C123 and C122.

Should I create more indexes, or add physical memory, to speed up the queries?

For a query like this you should create a compound index covering both fields; the query should then be very efficient. Separate indexes won't help much: MongoDB will use one index to satisfy the first part of the query, and the second index, if it is used at all, adds little (in some cases it can even slow the query down, because of the extra lookup in the index followed by another lookup in the actual data). You can confirm which indexes are used by running .explain() on your query in the shell.

See the MongoDB documentation on compound indexes.

Also consider the sort direction of each field when you create the index.
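As a rough illustration of the advice above, the sketch below creates a compound index on the two fields from the question and reduces the explain() output to a few numbers. It assumes the collection is literally named `collection`, as in the question; `summarizeExplain` is a hypothetical helper, not part of MongoDB. The `db` calls run only inside the MongoDB shell.

```javascript
// Filter from the question and a compound index covering both fields.
const query = { C123: 1, C122: 2 };
const indexSpec = { C123: 1, C122: 1 };

// Hypothetical helper: reduce explain("executionStats") output to the
// numbers that show whether the index is doing the work.
function summarizeExplain(plan) {
  const s = plan.executionStats;
  return {
    returned: s.nReturned,
    keysExamined: s.totalKeysExamined,
    docsExamined: s.totalDocsExamined,
    // Documents read per document returned; null when nothing matched.
    docsPerResult: s.nReturned > 0 ? s.totalDocsExamined / s.nReturned : null
  };
}

// Runs only where the shell's `db` object exists.
if (typeof db !== "undefined") {
  db.collection.createIndex(indexSpec);
  const plan = db.collection.find(query).explain("executionStats");
  printjson(summarizeExplain(plan));
}
```

If `docsExamined` is far larger than `returned`, the query is probably still not served well by the index.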

Use skip and limit, and loop over the data 50,000 documents at a time. For example:

db.collection.aggregate([
  {
    $group: {
      _id: "$myDoc.homepage_domain",
      count: { $sum: 1 },
      entry: {
        $push: {
          location_city: "$myDoc.location_city",
          homepage_domain: "$myDoc.homepage_domain",
          country: "$", // field path is truncated in the original; fill in the real path
          employee_linkedin: "$myDoc.employee_linkedin",
          linkedin_url: "$myDoc.linkedin_url",
          homepage_url: "$myDoc.homepage_url",
          industry: "$myDoc.industry",
          read_at: "$myDoc.read_at"
        }
      }
    }
  },
  // Skip first, then limit: limiting to 50000 and then skipping 50000 returns nothing.
  { $skip: 50000 },
  { $limit: 50000 }
], { allowDiskUse: true })
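The "loop over 50,000 at a time" idea from the answer above can be sketched roughly as follows. `batchPipeline` is a hypothetical helper, the `$group` stage is a shortened stand-in for the one in the answer, and the `_id` sort is an assumption added so that consecutive batches see the groups in a stable order. The loop runs only inside the MongoDB shell.

```javascript
const BATCH_SIZE = 50000;

// Shortened stand-in for the $group stage from the answer.
const groupStage = {
  $group: { _id: "$myDoc.homepage_domain", count: { $sum: 1 } }
};

// Hypothetical helper: pipeline for the n-th batch (0-based).
function batchPipeline(batchIndex, batchSize = BATCH_SIZE) {
  return [
    groupStage,
    { $sort: { _id: 1 } },              // assumption: stable order across batches
    { $skip: batchIndex * batchSize },  // skip the batches already processed
    { $limit: batchSize }               // then take one batch
  ];
}

// Runs only where the shell's `db` object exists.
if (typeof db !== "undefined") {
  for (let i = 0; ; i++) {
    const docs = db.collection
      .aggregate(batchPipeline(i), { allowDiskUse: true })
      .toArray();
    if (docs.length === 0) break; // no more groups
    print("batch " + i + ": " + docs.length + " groups");
  }
}
```

Each iteration re-runs the grouping, and `$skip` still has to walk past the skipped results, so later batches are likely to cost more than earlier ones.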

The answer is really simple.

  1. You don't need to create more indexes; you need to create the right indexes. An index on field C124 won't help queries on field C123, so there is no point in creating it.

  2. Use better/more hardware. More RAM, more machines (sharding).

  • What's up with the aggregation-framework tag? The query in the question does not use it.
  • Sorry, I assumed aggregate-$match is the same as find()
  • MongoDB has been able to merge indexes for a couple of years now, I think. Still, a compound index should be better.
  • Good point, @SergioTulentsev. I've made an edit. I knew about merging, but in my experience it doesn't help much in most cases. To be honest, though, we should mention it.
  • It seems I should put more thought into designing the compound indexes, as there are more than 400 keys in this collection.
  • @ppn029012: 400 keys?! In this case, don't bother with compound indexes. Only maybe for the most frequent combinations (if that's a thing in your app, some combinations being significantly more frequent than others). Just get more hardware.
  • Index at most as many fields as your queries actually use, no more. If you are querying on 2 fields, don't create 400 indexes. The maximum number of indexes per collection is 64 anyway. You can build indexes in the background if you don't want to take servers down.
  • The problem is that MongoDB cannot finish the query even though I have created the right indexes for each key. Do I have to buy better hardware to run this statement?
  • @ppn029012: The best index to serve this exact query is a compound index on the two keys, as mentioned in Alan's answer. But it is quite likely that, even with it, your current hardware is just not up to the task.
  • How much RAM do I need to work with this 50 GB collection?
  • @ppn029012: Ideally, 50GB + whatever is needed for indexes. Say, another 10-15 GB. Ideally. Depending on specifics of your application, you could do with less.
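The rule of thumb in the last comment (data size plus index size) can be turned into a quick estimate. This is only a sketch: `idealRamBytes` is a hypothetical helper, and it assumes the `size` and `totalIndexSize` fields (both in bytes) reported by the shell's collection stats. The example figures are the ones from the thread, using the upper index estimate of 15 GB.

```javascript
const GB = 1e9; // decimal gigabytes, an assumption for this estimate

// Hypothetical helper: ideal RAM = data size + total index size.
function idealRamBytes(stats) {
  return stats.size + stats.totalIndexSize;
}

// Figures from the thread: 50 GB of data, up to 15 GB of indexes.
const example = { size: 50 * GB, totalIndexSize: 15 * GB };
const exampleGB = idealRamBytes(example) / GB; // 65

// Runs only where the shell's `db` object exists.
if (typeof db !== "undefined") {
  const stats = db.collection.stats();
  print((idealRamBytes(stats) / GB).toFixed(1) + " GB ideally");
}
```

As the comment says, this is the ideal case; an application that touches only part of the data could do with less.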