How does MongoDB load index data?


I set up a test collection, one, which holds 100 million documents with about 6.3 GB of document data. Three indexes have been created. As shown in the figure above, each of the three indexes is about 1 GB, and together they take 3.2 GB.

When I queried this collection for the first time, the query condition did not use any indexed field, so a full collection scan of all 100 million documents was performed. Memory usage climbed steadily during the scan, increasing by about 6 GB in total.

After stopping the mongod process and restarting it, I queried on an indexed field and the target document was found instantly, yet memory usage hardly changed (the only increase, about 100 MB, came when the mongod process started), even though each of the three indexes is 1 GB. So has MongoDB actually loaded the index data into memory or not?

How exactly does MongoDB use index data? If it loads the indexes into memory, why is the memory footprint basically unchanged? For the three indexes in the figure above, if my only query uses the index on field c, does MongoDB load only the 1.1 GB index on field c, or all 3.2 GB of the three indexes at once?

Mar.03,2021

In fact, most of this comes down to operating system principles. When the operating system reads a file, it keeps the file's contents in free memory, so the next time a program reads the same file, the data can be served directly from memory instead of from disk, which greatly improves read speed. This cache is the file system cache (on Linux, the page cache).
It's easy to understand: unused memory is wasted space, so why not cache something in it? Whatever is cached pays off as soon as it is hit even once. How much you gain depends on choosing what to keep in the limited memory and on making the cached content get hit as often as possible. That has nothing to do with your question, so I won't elaborate; if you are interested, look into operating system principles.
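To make the caching idea concrete, here is a toy model of a file system cache. It is a minimal sketch, not how any real kernel works: it assumes fixed-size pages and plain LRU eviction (real page caches use more refined policies), and `PageCache`, `fake_disk` and the file name `collection.wt` are hypothetical. A first full scan misses on every page, and a repeat scan is served from memory when the cache is large enough. When the file is larger than the cache, strict LRU evicts each page just before it is needed again, which shows why the choice of what to keep matters.

```python
from collections import OrderedDict
from typing import Callable


class PageCache:
    """Toy file system cache: keeps file pages in memory, evicts least recently used."""

    def __init__(self, capacity_pages: int, read_from_disk: Callable[[str, int], bytes]) -> None:
        self.capacity = capacity_pages
        self.read_from_disk = read_from_disk  # slow path: fetch one page from disk
        self.pages = OrderedDict()  # (path, page_no) -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, path: str, page_no: int) -> bytes:
        key = (path, page_no)
        if key in self.pages:
            # Served from memory: no disk I/O needed.
            self.hits += 1
            self.pages.move_to_end(key)
            return self.pages[key]
        # Miss: read from disk, then keep the page in otherwise-free memory.
        self.misses += 1
        data = self.read_from_disk(path, page_no)
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


def fake_disk(path: str, page_no: int) -> bytes:
    """Hypothetical stand-in for a disk read: returns one zero-filled 4 KiB page."""
    return bytes(4096)


def scan(cache: PageCache, path: str, n_pages: int) -> None:
    """Read every page of a file in order, like a full collection scan."""
    for page_no in range(n_pages):
        cache.read(path, page_no)


if __name__ == "__main__":
    big = PageCache(capacity_pages=8, read_from_disk=fake_disk)
    scan(big, "collection.wt", 6)
    scan(big, "collection.wt", 6)
    print("cache larger than file:", big.hits, "hits,", big.misses, "misses")

    small = PageCache(capacity_pages=4, read_from_disk=fake_disk)
    scan(small, "collection.wt", 6)
    scan(small, "collection.wt", 6)
    print("cache smaller than file:", small.hits, "hits,", small.misses, "misses")
```

With these numbers the first run reports 6 hits and 6 misses (the second scan never touches the disk), while the second run reports 0 hits and 12 misses.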
Back to your question: when you restart the MongoDB instance, the memory that MongoDB itself held is of course freed. But both the data and the indexes stay in the file system cache, because they are all read from data files and index files (as long as no other process needs that memory). Indexes are loaded on demand, which you can work out by simple reasoning: if your 10 GB index is read for the first time, should you have to wait for all 10 GB to be loaded into memory? And what if the index is larger than memory? Loading every index in full at once is clearly unreasonable. Even a single index is loaded partially, on demand, rather than all at once, so your query actually uses only a small part of that 1 GB. Keep in mind that an index lookup takes time proportional to log2(n): to find one document among 100 million, the worst case needs only about 27 comparisons (log2(10^8) ≈ 26.6), so of course the result comes back instantly.
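The on-demand argument and the 27-comparison figure can be checked with a small simulation. This is a sketch under stated assumptions: the index is modeled as a flat sorted array of 100 million integer keys split into pages of 1,000 keys (a made-up page size), and a page counts as "loaded" only when a binary search probes it. MongoDB's WiredTiger engine actually stores indexes as B-trees, which touch even fewer pages per lookup, so `LazyIndex` and its numbers are hypothetical illustrations, not MongoDB internals.

```python
import math


class LazyIndex:
    """Hypothetical sorted index of n_keys integer keys (key == position), split into pages.

    A page is recorded as loaded only when a lookup first touches it.
    """

    def __init__(self, n_keys: int, keys_per_page: int) -> None:
        self.n_keys = n_keys
        self.keys_per_page = keys_per_page
        self.loaded_pages: set[int] = set()

    def total_pages(self) -> int:
        return math.ceil(self.n_keys / self.keys_per_page)

    def key_at(self, pos: int) -> int:
        # Touching a position "loads" its page on demand (simulated by recording it).
        self.loaded_pages.add(pos // self.keys_per_page)
        return pos  # keys are 0..n_keys-1 in sorted order

    def find(self, target: int) -> tuple[int, int]:
        """Binary search; returns (position or -1, number of probes)."""
        lo, hi = 0, self.n_keys - 1
        probes = 0
        while lo <= hi:
            mid = (lo + hi) // 2
            probes += 1
            key = self.key_at(mid)
            if key == target:
                return mid, probes
            if key < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1, probes


if __name__ == "__main__":
    n = 100_000_000
    idx = LazyIndex(n, keys_per_page=1_000)
    pos, probes = idx.find(12_345_678)
    print("found at", pos, "after", probes, "probes")
    print("pages loaded:", len(idx.loaded_pages), "of", idx.total_pages())
    print("worst-case probes:", math.floor(math.log2(n)) + 1)
```

A lookup probes at most floor(log2(10^8)) + 1 = 27 positions, so it loads at most 27 of the 100,000 pages. That is why a point query needs only a tiny part of a 1 GB index in memory.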
