Deduplication in MongoDB - Codes Helper - Programming Question Answer

Deduplication in MongoDB

I have written a crawler. In the insert function of my Mongo class I want to add a check so that data that has already been crawled is not stored again, but in practice it does not work. Please take a look at the code first.

class MongoPipeLine():

    def __init__(self):
        self.client = pymongo.MongoClient(SETTINGS.MONGO_URI,connect=False)
        self.db = self.client[SETTINGS.MONGO_DB]
        self.collection = SETTINGS.COLLECTION

    def insert(self,data):
        if self.db[self.collection].find(data) == 1:
            print("Data has been existed")
        else:
            self.db[self.collection].insert(data)

    def close(self):
        self.client.close()

Scheduling function:

spider = Spider()
mongo = MongoPipeLine()
image = ImagePipeLine()

def run(i):
    for page in range(i,i+20):
        response = spider.get_page(page)
        data_list = spider.parse(response)
        for data in data_list:
            mongo.insert(data)
            image.download(data)

if __name__ == "__main__":
    pool = Pool(20)
    pool.map(run,[i*20 for i in range(10)])
    pool.close()
    pool.join()
    mongo.close()

Partial output from a run:

Isaiah Rustad Already download
Annie Spratt Already download
Alex Kalinin Already download
Jakob Owens Already download
Emily Henry Already download

As the output shows, some images have already been downloaded and the corresponding entries are already saved in MongoDB, yet the console never prints "Data has been existed". Evidently the branch after the if statement is never executed.
Strangely, after I changed the check to "if self.db[self.collection].find(data) == 0", the output was the same as before, and there was still no "Data has been existed".
Are there any basic mistakes that I have overlooked?

Thanks in advance, and happy National Day to everyone!

Python mongodb

Jul.23,2021

It is strange to expect find to return 0 or 1. Shouldn't you use countDocuments (count_documents in PyMongo) to get the number of matches? Try printing the result of the query.
Also, what query criteria do you use to detect duplicates? Check whether something is wrong with your query conditions.
Are you running concurrently? Have you considered the data inconsistency that concurrency can cause? For example, one thread inserts document A while another thread queries whether A exists, but the first thread's insert has not completed yet.
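To illustrate the two points above: in PyMongo, find() returns a Cursor object rather than a number, so comparing it with 0 or 1 is always false and the else branch runs every time. A minimal sketch of a check that counts matches on an identifying field could look like the following. The field name image_url and the helper insert_if_absent are assumptions for illustration, since the question does not show what the documents contain.

```python
def insert_if_absent(collection, data, key_fields=("image_url",)):
    """Insert `data` unless a document with the same key fields exists.

    `collection` is expected to behave like a pymongo Collection.
    `key_fields` is a hypothetical choice; the real key depends on the data.
    Returns True if the document was inserted, False if it was a duplicate.
    """
    # Query only on the identifying fields, not on the whole document.
    query = {field: data[field] for field in key_fields}

    # count_documents returns an int; limit=1 stops at the first match.
    if collection.count_documents(query, limit=1) > 0:
        print("Data already exists")
        return False

    collection.insert_one(data)
    return True


# Usage inside the pipeline from the question (requires a running MongoDB):
#     def insert(self, data):
#         insert_if_absent(self.db[self.collection], data)
```

Querying on a small key instead of the whole document also avoids false negatives when two crawls of the same item differ in some incidental field.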
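With 20 pool workers, a separate "check, then insert" is open to exactly this race. One common way around it is to let the server decide atomically: a unique index on the identifying field plus an upsert that only writes when no match exists. The sketch below assumes the same hypothetical image_url field; ensure_unique_key and insert_once are illustrative names, not part of the original code.

```python
def ensure_unique_key(collection, key_field="image_url"):
    # Unique index: the server rejects a second document with the same key,
    # even when two workers race each other. Call this once at startup.
    collection.create_index([(key_field, 1)], unique=True)


def insert_once(collection, data, key_field="image_url"):
    """Insert `data` only if no document has the same key, in one operation.

    `collection` is expected to behave like a pymongo Collection.
    Returns True when this call created the document, False otherwise.
    """
    result = collection.update_one(
        {key_field: data[key_field]},   # match on the identifying field
        {"$setOnInsert": data},         # written only when a new document is created
        upsert=True,
    )
    # upserted_id is set only when the upsert inserted a new document.
    return result.upserted_id is not None
```

Even with this pattern, two upserts that race on the same key can occasionally surface as pymongo.errors.DuplicateKeyError in one of the workers; catching that exception and treating it as "already exists" is usually sufficient.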

