Build a MongoDB index in Python code

    from scrapy.exceptions import DropItem

    # Method of the Scrapy item pipeline class; self.db is assumed to be
    # a pymongo collection.
    def process_item(self, item, spider):
        print(item["file_url"], item["name"])

        # Two items count as duplicates when both file_url and name match.
        key_word = {"file_url": item["file_url"], "name": item["name"]}

        # find_one stops at the first match instead of building a full list.
        if self.db.find_one(key_word) is not None:
            raise DropItem("Duplicate item found: %s" % item)

        # Collection.insert was removed in PyMongo 4; insert_one replaces it.
        self.db.insert_one(key_word)
        return item

My friend says that looking items up like this is slow and suggested that I build an index. Is that the _id index?

Aug.18,2021

It is slow: every time you receive an item, you first query the database to see whether it exists, and only then decide whether to insert it.
Without an index on the queried fields, each of those lookups is a full collection scan, similar to a full table scan in MySQL, which is not cost-effective.

You need to either make the query fast or change the way you remove duplicates.
If you have fewer than 10,000 documents, the cost hardly matters.
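
One way to "change the way you remove duplicates" is to remember the keys you have already seen in memory, so the database is not queried for every item. A minimal sketch, assuming the same two fields as in the question; the class name `InMemoryDeduper` is made up for illustration:

```python
class InMemoryDeduper:
    """Remembers the (file_url, name) pairs seen so far in this run."""

    def __init__(self):
        self.seen = set()

    def is_duplicate(self, item):
        # A set membership test takes roughly constant time,
        # no matter how many keys have been stored.
        key = (item["file_url"], item["name"])
        if key in self.seen:
            return True
        self.seen.add(key)
        return False
```

In `process_item` you would raise `DropItem` when `is_duplicate(item)` returns True. Note that the set only lives for one crawl, so duplicates left over from earlier runs are not detected unless you load the existing keys at startup.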


An index is built on a frequently queried field of a MongoDB collection. It belongs to the database, not to the Python code, so the title is misleading.
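
That said, you can still ask MongoDB to build the index from Python. MongoDB creates the `_id` index automatically, but it does not help with a query on `file_url` and `name`, so those two fields need their own index. A sketch using pymongo's `create_index`; the helper name `ensure_dedup_index` is hypothetical, and `collection` is assumed to be the pymongo collection behind `self.db`:

```python
from typing import Any

# The fields the question queries on; 1 means ascending order.
DEDUP_FIELDS = [("file_url", 1), ("name", 1)]


def ensure_dedup_index(collection: Any) -> str:
    """Create a unique compound index on file_url + name.

    `collection` is assumed to be a pymongo Collection. create_index
    returns the index name and does nothing if an identical index
    already exists, so it is safe to call once at pipeline startup.
    """
    return collection.create_index(DEDUP_FIELDS, unique=True)
```

Because the index is unique, `insert_one` raises `pymongo.errors.DuplicateKeyError` for a repeated pair, so you could skip the lookup and catch that exception instead. Creating the index fails if the collection already contains duplicate pairs, so clean those up first.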


I suggest you learn the basics of databases. Simply put, an index can speed up queries dramatically, especially when the amount of data is large, because the database can jump to the matching entries instead of scanning every document. The principle cannot be explained in a few lines, but it is part of a programmer's basic knowledge. Search for the principles of database indexing and read the corresponding articles.
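
To give a rough feel for the principle, here is a toy comparison in plain Python. It is only a stand-in: MongoDB's indexes are B-trees, while this sketch uses a sorted list with binary search, and the function names are invented for the example. Both approaches keep keys in order so a lookup can discard half of the remaining candidates at each step.

```python
def scan_lookup(rows, url):
    """Full scan: compare against every row until a match is found."""
    steps = 0
    for row in rows:
        steps += 1
        if row == url:
            return True, steps
    return False, steps


def build_index(rows):
    """A toy 'index': the same keys, kept in sorted order."""
    return sorted(rows)


def index_lookup(index, url):
    """Binary search over the sorted keys, counting comparisons."""
    lo, hi, steps = 0, len(index), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if index[mid] < url:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(index) and index[lo] == url
    return found, steps
```

With 100,000 keys, the scan may need up to 100,000 comparisons, while the sorted lookup needs at most 17, and that gap keeps growing with the amount of data.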
