Scrapy: handling different items with different pipelines

Problem description

How can I route different items to different pipelines for processing?

Background and what I have tried

There are multiple spiders in one Scrapy project, and each spider produces a different item type. The solutions I found online check the type of the received item and then run different code. I would like to call the corresponding pipeline after that check; for example, item A should be handed over to pipeline A.

Related code

import json

from items import AspiderItem, BspiderItem, CspiderItem

class myspiderPipeline(object):
    def __init__(self):
        pass

    def process_item(self, item, spider):
        if isinstance(item, AspiderItem):
            # This is where I want to hand the item over to AspiderPipeline
            pass
        elif isinstance(item, BspiderItem):
            pass
        elif isinstance(item, CspiderItem):
            print(item)
        # A pipeline must return the item so that later pipelines receive it
        return item

class AspiderPipeline(myspiderPipeline):
    def __init__(self):
        # Text mode with an explicit encoding, since json.dumps returns str
        self.file = open("myadata.json", "w", encoding="utf-8")

    def process_item(self, item, spider):
        content = json.dumps(dict(item), ensure_ascii=False) + "\n"
        self.file.write(content)
        return item

    def close_spider(self, spider):
        self.file.close()

class BspiderPipeline(myspiderPipeline):
    pass

What result do you expect? What error message do you actually see?

I want to know how to call the corresponding pipeline once the item type has been determined. Should I instantiate the corresponding class and then call its process_item method? If so, will methods of that class such as close_spider still be executed automatically?
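Regarding close_spider: Scrapy calls open_spider, process_item and close_spider only on the pipeline classes listed in the ITEM_PIPELINES setting. An object instantiated by hand inside another pipeline is invisible to Scrapy, so its close_spider does not run unless the owning pipeline forwards the call. A minimal sketch of that forwarding pattern, assuming hypothetical stand-in classes (AItem, BItem, APipeline, BPipeline, DispatchPipeline, defined as plain dict subclasses and plain classes so it runs without Scrapy); only DispatchPipeline would be registered in ITEM_PIPELINES:

```python
class AItem(dict):
    """Hypothetical stand-in for a scrapy.Item subclass."""


class BItem(dict):
    """Another hypothetical stand-in item type."""


class APipeline:
    """Hypothetical per-type pipeline that records what it processed."""

    def __init__(self):
        self.seen = []
        self.closed = False

    def process_item(self, item, spider):
        self.seen.append(dict(item))
        return item

    def close_spider(self, spider):
        self.closed = True


class BPipeline(APipeline):
    pass


class DispatchPipeline:
    """The only class that would be listed in ITEM_PIPELINES.

    It owns the per-type pipelines and forwards calls to them, because
    Scrapy never sees those inner objects and would not close them.
    """

    def __init__(self):
        self.routes = {AItem: APipeline(), BItem: BPipeline()}

    def process_item(self, item, spider):
        # Look up the handler by the item's exact class; unknown types pass through.
        handler = self.routes.get(type(item))
        if handler is None:
            return item
        return handler.process_item(item, spider)

    def close_spider(self, spider):
        # Forward the hook explicitly; nothing does this automatically.
        for handler in self.routes.values():
            handler.close_spider(spider)
```

The lookup uses the item's exact class, so a subclass of AItem would pass through untouched; loop over the routes with isinstance if subclass matching is needed. open_spider can be forwarded the same way.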

Oct.10,2021

My understanding:

The key feature of pipelines is that one item can be processed by several pipelines, one step after another, in the order configured in settings.py.
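In settings.py that order comes from the integer assigned to each class in ITEM_PIPELINES: items go from lower-valued to higher-valued classes (conventionally in the 0-1000 range), and each pipeline's return value is handed to the next one. The sketch below imitates that chain with a tiny runner, run_chain, which is a hypothetical stand-in for Scrapy's engine and not part of Scrapy; the module path in the commented setting and the CleanPipeline class are also hypothetical.

```python
# In a real project, settings.py would contain something like
# (the module path "myspider.pipelines" is a hypothetical example):
# ITEM_PIPELINES = {
#     "myspider.pipelines.CleanPipeline": 100,
#     "myspider.pipelines.AspiderPipeline": 300,
# }


class AspiderItem(dict):
    """Stand-in for a scrapy.Item subclass."""


class CleanPipeline:
    def process_item(self, item, spider):
        # Runs first (priority 100): strip whitespace on every item, whatever its type.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        return item


class AspiderPipeline:
    def __init__(self):
        self.stored = []

    def process_item(self, item, spider):
        # Runs second (priority 300): only acts on its own item type.
        if isinstance(item, AspiderItem):
            self.stored.append(dict(item))
        return item  # always return the item so later pipelines receive it


def run_chain(pipelines, item, spider=None):
    """Hypothetical stand-in for Scrapy's engine: apply pipelines by priority."""
    for _, pipeline in sorted(pipelines, key=lambda pair: pair[0]):
        item = pipeline.process_item(item, spider)
    return item
```

Because CleanPipeline has the lower number, AspiderPipeline only ever sees the cleaned values, regardless of the order the pairs are listed in.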

At each step, a pipeline either modifies part of the item (for example deduplication or repairing bad data), or does something different depending on the item's data (for example, one pipeline writes the item to a log, another writes it to a database, and another sends it over HTTP).
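A step can also stop an item from reaching the later pipelines. In Scrapy that is done by raising scrapy.exceptions.DropItem, as in this duplicate check. The try/except fallback exists only so the sketch runs without Scrapy installed, and keying on an "id" field is an assumption about the item's shape.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so this sketch runs without Scrapy; real projects import it from Scrapy.
    class DropItem(Exception):
        pass


class DuplicatesPipeline:
    """Drops any item whose "id" field (assumed to exist) was already seen."""

    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        item_id = item["id"]
        if item_id in self.ids_seen:
            # Raising DropItem stops the item here; later pipelines never see it.
            raise DropItem(f"Duplicate item found: {item_id!r}")
        self.ids_seen.add(item_id)
        return item
```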

If an item needs only one operation, there is no need for a separate pipeline class: just do that work (or call a member method) inside the single pipeline that uses isinstance to determine the item type.
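One way to keep that single pipeline tidy is to give each item type its own member method and pick it with isinstance. A minimal sketch, with stand-in item classes named after the question's (defined as dict subclasses so it runs without Scrapy); the handle_a and handle_b methods and the in-memory lists are hypothetical placeholders for real work such as writing to a file or database.

```python
class AspiderItem(dict):
    pass


class BspiderItem(dict):
    pass


class CspiderItem(dict):
    pass


class MyspiderPipeline:
    def __init__(self):
        self.a_rows = []
        self.b_rows = []
        # Pair each item class with the method that handles it; order matters
        # only if the classes inherit from one another.
        self.handlers = [
            (AspiderItem, self.handle_a),
            (BspiderItem, self.handle_b),
        ]

    def process_item(self, item, spider):
        for item_cls, handler in self.handlers:
            if isinstance(item, item_cls):
                handler(item, spider)
                break
        # Always hand the item on, whether or not a handler matched.
        return item

    def handle_a(self, item, spider):
        self.a_rows.append(dict(item))

    def handle_b(self, item, spider):
        self.b_rows.append(dict(item))
```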

The asker's code can be rewritten as:

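import json

from items import AspiderItem, BspiderItem, CspiderItem

class myspiderPipeline(object):
    def __init__(self):
        # Text mode with an explicit encoding, since json.dumps returns str
        self.file = open('myadata.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        if isinstance(item, AspiderItem):
            # Only AspiderItem is written to the JSON Lines file
            content = json.dumps(dict(item), ensure_ascii=False) + "\n"
            self.file.write(content)
        elif isinstance(item, CspiderItem):
            print(item)
        # Every item type (including BspiderItem) is returned for later pipelines
        return item

    def close_spider(self, spider):
        # Called by Scrapy because this class is registered in ITEM_PIPELINES
        self.file.close()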
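Since each spider in the question yields its own item type, another option is to give each spider its own pipeline list through Scrapy's per-spider custom_settings attribute, which overrides the project-wide ITEM_PIPELINES for that spider, so pipeline A only runs for spider A. A configuration fragment, assuming a hypothetical project package named myspider; it belongs in the body of the spider class (a scrapy.Spider subclass).

```python
# Inside the spider class body, e.g. class ASpider(scrapy.Spider): ...
# "myspider.pipelines.AspiderPipeline" is a hypothetical module path.
custom_settings = {
    "ITEM_PIPELINES": {
        "myspider.pipelines.AspiderPipeline": 300,
    },
}
```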