The data directory of a MongoDB secondary node differs in size from that of the primary node.


There are two MongoDB nodes in a primary/secondary setup. Looking at the data directory on each of the two nodes, we found that
the primary node's data directory is 23 GB, with the following directory details:
[screenshot: primary node data directory listing]

The secondary node's data directory is only 11 GB, with the following details:
[screenshot: secondary node data directory listing]

Yet db.collection.stats() reports the same amount of data on both nodes. What is the reason for this? Are there any relevant resources available?

Mar.15,2021

As an aside, master/slave replication did exist in MongoDB's history (and in fact still exists); strictly speaking, "master/slave" usually refers to that. What is used now is essentially always a replica set.

Besides, your situation is actually normal. The principle is the same as a disk becoming fragmented after long use, especially if you have deleted data on a large scale. To explain simply, suppose your collection has four documents doc1/doc2/doc3/doc4, stored on disk in this order:
doc1 | doc2 | doc3 | doc4
Now you delete doc2, and the disk space usage becomes:
doc1 | (blank) | doc3 | doc4
The system has no way to release this blank space unless it defragments the disk and moves the blank space to the end:
doc1 | doc3 | doc4 | (blank)
Then the system can truncate the blank space at the end of the file and free it. As you can see, moving the blank space to the end of the file is a time-consuming and laborious operation; the simplest way is to move every subsequent document forward to fill the gap left by doc2 (as shown above, doc3/doc4 have been moved forward). However, that involves a large amount of disk I/O and can seriously affect performance. There are other ways to defragment the disk, but whichever one is used, the I/O impact is severe, so this kind of defragmentation is not normally done. The way to defragment is the compact command. As mentioned, because of its performance impact, it is generally run only during a maintenance window. Even if you never do this, the system knows where the blank space is, and when new documents come in it tries to reuse those blanks to maximize space utilization. However, no matter how good the algorithm is, space reuse can never reach 100%, because a new document is almost never exactly the same size as a previously deleted one, so it can only be placed into a slot larger than itself. That leaves behind an even smaller fragment that is harder to reuse.
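If you do decide to reclaim space during a maintenance window, compact is issued per collection from the mongo shell. A minimal sketch, assuming a collection named orders in the current database (the name is hypothetical, and exact behaviour differs between versions and storage engines):

```js
// Rewrite and defragment a single collection's storage.
// I/O-heavy; run only in a maintenance window.
db.runCommand({ compact: "orders" });

// In older MongoDB versions, running compact on a replica-set primary
// additionally requires force: true.
// db.runCommand({ compact: "orders", force: true });
```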
Another workaround is to wipe the node's data and resync it. Because a resync is equivalent to fetching every document and rewriting it to disk one by one, the documents end up compactly arranged on disk once the resync completes, which amounts to defragmentation. During this process only the secondary node is affected, and it does not serve reads while resyncing, so the impact on production is minimal. Note, however, that it also affects the primary node, because all the data has to be read from the primary, so an increase in the primary's I/O is unavoidable.
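A minimal sketch of that resync, under the assumption of an ordinary replica set (the steps outside the shell depend on how mongod is managed on your hosts):

```js
// On the secondary to be resynced (outside the mongo shell):
//   1. Stop the mongod process.
//   2. Empty the contents of its dbPath data directory.
//   3. Start mongod again; it rejoins the set and performs an initial sync,
//      rewriting every document compactly to disk.

// From the mongo shell on any member, watch the member go
// STARTUP2 -> SECONDARY as the initial sync finishes.
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});
```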

Finally, back to your question: why the secondary node is smaller than the primary node should be explained by the above.
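To see this on your own cluster, compare the logical and on-disk sizes on each node; the logical figures should match while storageSize differs. A minimal check in the mongo shell (collection name orders is hypothetical):

```js
// Run on the primary and on the secondary, then compare the outputs.
var s = db.orders.stats();
print("documents:   " + s.count);        // should match on both nodes
print("dataSize:    " + s.size);         // logical size, should match
print("storageSize: " + s.storageSize);  // bytes held on disk, larger when fragmented
```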
