How does MongoDB select an index for a query?

Problem description

The collection has two indexes: a single-field index on {_id} and a compound index on {chat_id, _id}. When I run the query

db.collection.find({"chat_id": ObjectId("*"), "the_time": {"$lt": ISODate("2010-01-01T00:00:00Z")}}).sort({"the_time": -1}).explain("executionStats")

why do I get the following result?

......
"winningPlan" : {
            "stage" : "SORT",
            "sortPattern" : {
                "the_time" : -1
            },
            "inputStage" : {
                "stage" : "SORT_KEY_GENERATOR",
                "inputStage" : {
                    "stage" : "FETCH",
                    "filter" : {
                        "the_time" : {
                            "$lt" : ISODate("2010-01-01T00:00:00Z")
                        }
                    },
                    "inputStage" : {
                        "stage" : "IXSCAN",
                        "keyPattern" : {
                            "chat_id" : 1,
                            "_id" : 1
                        },
                        "indexName" : "chat_id_1__id_1"
......

Why is the compound index {chat_id, _id} used?

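This is what I have learned about compound indexes:

An index prefix is a beginning subset of the fields of a compound index. For example, take the compound index

{ "item": 1, "location": 1, "stock": 1 }

Its index prefixes are

{ item: 1 }
{ item: 1, location: 1 }

MongoDB can use this index for queries on these fields:
        item
        item + location
        item + location + stock

MongoDB cannot use it for queries on these fields alone, because without item none of them forms a prefix:
    location
    stock
    location + stock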
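
As a rough illustration of the prefix rule, the following sketch counts how many leading fields of a compound index a query covers. The function name usablePrefixLength is made up for this example, and the logic is a toy model under that assumption, not MongoDB's actual planner.

```javascript
// Toy model: count how many leading fields of a compound index are
// covered by the fields a query filters on. Illustration only; this is
// an assumption-based simplification, not MongoDB's planner logic.
function usablePrefixLength(keyPattern, queryFields) {
  const wanted = new Set(queryFields);
  let length = 0;
  for (const field of Object.keys(keyPattern)) {
    if (!wanted.has(field)) break; // the prefix ends at the first missing field
    length += 1;
  }
  return length;
}

const index = { chat_id: 1, _id: 1 };
console.log(usablePrefixLength(index, ["chat_id", "the_time"])); // 1
console.log(usablePrefixLength(index, ["the_time"]));            // 0
```

A result of 1 for the {chat_id, the_time} query means only the chat_id part of the index can narrow the scan; the_time has to be checked after the documents are fetched.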

Question 1: the query filters on {chat_id, the_time}. The the_time field is not in the compound index {chat_id, _id}, but chat_id is. Is that why this compound index gets used?

Question 2: I then added another index, {chat_id, third_field}, and ran the same {chat_id, the_time} query. This time the winning plan used {chat_id, third_field} instead of {chat_id, _id}. What strategy does MongoDB follow when selecting an index?

Sep.08,2021

Question 1
  1. If the query can hit an index, the index scan (IXSCAN) directly yields the addresses of all documents that match the indexed criteria. Because it yields addresses rather than the documents themselves, an extra step (FETCH) is needed to load the actual documents from those addresses.
  2. If the query has no index support, MongoDB can only load every document that might match into memory and check the conditions one by one to build the result set (COLLSCAN).

The second path is clearly expensive in both resources and time, so queries should hit an index whenever possible, or at least partially hit one. Your question probably comes from not understanding what a partial index hit is. Suppose your collection has 1,000,000 records and the query criteria are {"chat_id": ObjectId("*"), "the_time": {"$lt": ISODate("2010-01-01T00:00:00Z")}}:

  • In the worst case there is no index support, so all 1,000,000 records must be traversed and checked against both conditions to get the result set.
  • But now there is {chat_id: 1, _id: 1}. It does not fully satisfy your query, but it does satisfy the first condition. Suppose that condition filters out 900,000 records and leaves only 100,000; the remaining condition then only has to be checked against those 100,000. Isn't that better than traversing 1,000,000?

If MongoDB chose {_id: 1}, the index would not help the query at all, just as in the first case. Choosing {chat_id: 1, _id: 1} at least helps, so why not choose it?
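
The difference between the two paths can be sketched with a small in-memory simulation. Everything here is hypothetical: the 1,000-document collection, the integer chat_id values, and the function names only imitate COLLSCAN and IXSCAN + FETCH. It is not how MongoDB is implemented.

```javascript
// Hypothetical collection: 1,000 documents spread evenly over 10 chat ids,
// one document per day starting 2009-01-01 (UTC).
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: i, chat_id: i % 10, the_time: new Date(Date.UTC(2009, 0, 1 + i)) });
}
const cutoff = new Date("2010-01-01T00:00:00Z");

// Imitates COLLSCAN: every document is examined against both conditions.
function collScan(collection, chatId, before) {
  let examined = 0;
  const result = [];
  for (const doc of collection) {
    examined += 1;
    if (doc.chat_id === chatId && doc.the_time < before) result.push(doc);
  }
  return { examined, result };
}

// Imitates an index on chat_id: maps each value to document positions.
function buildChatIdIndex(collection) {
  const index = new Map();
  collection.forEach((doc, position) => {
    if (!index.has(doc.chat_id)) index.set(doc.chat_id, []);
    index.get(doc.chat_id).push(position);
  });
  return index;
}

// Imitates IXSCAN + FETCH: the index narrows the candidates by chat_id,
// then the_time is checked only on the fetched documents.
function indexScanThenFetch(collection, index, chatId, before) {
  let examined = 0;
  const result = [];
  for (const position of index.get(chatId) || []) {
    const doc = collection[position]; // the FETCH step
    examined += 1;
    if (doc.the_time < before) result.push(doc);
  }
  return { examined, result };
}

const full = collScan(docs, 3, cutoff);
const partial = indexScanThenFetch(docs, buildChatIdIndex(docs), 3, cutoff);
console.log(full.examined, full.result.length);       // 1000 37
console.log(partial.examined, partial.result.length); // 100 37
```

Both paths return the same 37 documents, but the partial index hit examines a tenth as many.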

Question 2

The first thing to understand is that even the same query run twice with the same index does not necessarily take the same time, because the load on the server varies. Beyond that, the two indexes you mention, {chat_id, third_field} and {chat_id, _id}, help this query in almost the same way: each one only supports the chat_id condition. When candidate indexes are about equally efficient, the planner simply picks whichever performs better when it evaluates them. Once a plan is selected, the plan cache lets MongoDB reuse it for queries of the same shape instead of re-evaluating every time, which would waste resources; the cached plan can be dropped by events such as a server restart or an index change. So there is little point in comparing the two here.

Other
db.collection.find({"chat_id" : ObjectId("*********"), "the_time":{"$lt" : ISODate("2010-01-01T00:00:00Z")}}).sort({"the_time":-1}).explain("executionStats")

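The best index for this query would be {chat_id: 1, the_time: -1}: it supports both the equality match on chat_id and the range on the_time, and it already returns entries in the_time descending order, so the SORT stage is no longer needed.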
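
To see why the key order matters, here is a minimal sketch that models such an index as an array sorted by chat_id ascending and then the_time descending. The documents, the string chat ids, and the function name scanInIndexOrder are all invented for illustration. In the shell, the index itself would be created with db.collection.createIndex({chat_id: 1, the_time: -1}).

```javascript
// Hypothetical documents; chat ids are plain strings here for brevity.
const docs = [
  { chat_id: "a", the_time: new Date("2009-03-01T00:00:00Z") },
  { chat_id: "b", the_time: new Date("2009-06-01T00:00:00Z") },
  { chat_id: "a", the_time: new Date("2011-01-01T00:00:00Z") },
  { chat_id: "a", the_time: new Date("2009-09-01T00:00:00Z") },
  { chat_id: "b", the_time: new Date("2008-01-01T00:00:00Z") },
  { chat_id: "a", the_time: new Date("2008-05-01T00:00:00Z") },
];

// Model the index { chat_id: 1, the_time: -1 } as entries sorted by
// chat_id ascending, then the_time descending.
const indexEntries = [...docs].sort(
  (x, y) =>
    (x.chat_id < y.chat_id ? -1 : x.chat_id > y.chat_id ? 1 : 0) ||
    y.the_time - x.the_time
);

// Walk the entries in index order and keep the matches. No sort is
// applied afterwards: the index order is already the requested order.
function scanInIndexOrder(entries, chatId, before) {
  const result = [];
  for (const entry of entries) {
    if (entry.chat_id !== chatId) continue; // a real index would seek straight to this block
    if (entry.the_time < before) result.push(entry);
  }
  return result;
}

const hits = scanInIndexOrder(indexEntries, "a", new Date("2010-01-01T00:00:00Z"));
console.log(hits.map((d) => d.the_time.toISOString()));
// [ '2009-09-01T00:00:00.000Z', '2009-03-01T00:00:00.000Z', '2008-05-01T00:00:00.000Z' ]
```

Because all entries for one chat_id sit next to each other and are already ordered by the_time descending, the matches come out in the order the sort asks for.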
