Scrapy cannot download images with a custom Pipeline

The crawler stops downloading images after switching to a custom Pipeline class that inherits from ImagesPipeline.

I am using Python 3.7 and the Scrapy framework to crawl and download images from a web page. Downloading works fine with the built-in ImagesPipeline, but after switching to my custom Pipeline class the image URLs are only printed on the command line and no images are saved locally.

Related code

items.py

import scrapy


class ImageItem(scrapy.Item):
    # image_names = scrapy.Field()
    # fold = scrapy.Field()
    # image_paths = scrapy.Field()
    image_urls = scrapy.Field()

pipelines.py

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class PicPipeline(ImagesPipeline):
    def process_item(self, item, spider):
        return item

    def get_media_requests(self, item, info):
        for image_url in item["image_urls"]:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x["path"] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item["image_paths"] = image_paths
        return item

settings.py

ITEM_PIPELINES = {
    "Pic.pipelines.PicPipeline": 300
}
IMAGES_STORE = r"D:\Pic"
IMAGES_URLS_FIELD = "image_urls"

PicSpider.py

import scrapy

from Pic.items import ImageItem


class PicSpider(scrapy.Spider):
    name = "picspider"
    allowed_domains = ["guokr.com"]
    start_urls = ["https://www.guokr.com/"]

    def parse(self, response):
        images = response.xpath("//img/@src").extract()
        item = ImageItem()
        item["image_urls"] = images
        yield item

I just want to write a small demo to practice crawling images. The built-in ImagesPipeline downloads them normally, but with the custom Pipeline nothing is downloaded; the command line only outputs the image URLs. Please advise.

The command-line output is as follows:

2019-02-19 12:11:06 [scrapy.core.engine] INFO: Spider opened
2019-02-19 12:11:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-02-19 12:11:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-02-19 12:11:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.guokr.com/robots.txt> (referer: None)
2019-02-19 12:11:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.guokr.com/> (referer: None)
2019-02-19 12:11:06 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.guokr.com/>
{"image_urls": ["https://3-im.guokr.com/vXVhDq_6nindVo2LqIloosK-2bHrkYpU8DEXP75DpnZKAQAA6wAAAEpQ.jpg",
                "https://2-im.guokr.com/hD7RoVC8IpQGnc2humofXMGyex-iSZH1VDaWLq2VWCE2BAAA7gMAAEpQ.jpg?imageView2/1/w/330/h/235",
                "https://2-im.guokr.com/AU-Q8pTYY_OffTqWyKfXTC5NV0RmarK_QJ9m6A6_7qhKAQAA6wAAAEpQ.jpg",
                "https://1-im.guokr.com/IIlEodManGB8jos3eP7KcrMhu3l8dtG6F5nrJczcrTiAAwAAUwIAAEpQ.jpg?imageView2/1/w/330/h/235",
                "https://1-im.guokr.com/klfXUFzwXV_jz42yk497oZ-RkLAJEc03spAKMg9AeIw4BAAADQMAAEpQ.jpg?imageView2/1/w/330/h/235",
                "https://1-im.guokr.com/BZ7R7bpcrwjOyFJ5kajc0tVHlOF8BUyEs3IpWB0l6Q4sAgAA2AEAAEpQ.jpg?imageView2/1/w/135/h/90",
                "https://1-im.guokr.com/1CJgQkib1ePSCpLBARUhOyMdf6THL2BGrkDj6WDc5eiGAQAABAEAAEpQ.jpg?imageView2/1/w/135/h/90",
                "https://1-im.guokr.com/4prMeIXxsaF2y6OTfpCB2IiI7udvwK8f_lsTcqbFcaeHAAAAWgAAAEpQ.jpg",
                "https://1-im.guokr.com/WPrAHjwbKwXNYqiYZgkaYEyh9i2R8zm9noog_AxfpHiaAgAAvAEAAEpQ.jpg?imageView2/1/w/135/h/90",
                "https://2-im.guokr.com/TNpsKxaaNGuIDTJWTpy2P5wfji_oG66rHUWGa8L7zFhKAQAAtQAAAFBO.png?imageView2/1/w/135/h/90",
                "https://2-im.guokr.com/gLbC7ix6NWlx3bz6ihFyOxsl_fWqwtB554NswEOmACFKAQAA8AAAAEpQ.jpg?imageView2/1/w/135/h/90",
                "https://2-im.guokr.com/Rx9MyfI6hndQBTyoGWvfOyb469BZ7ruf0w0k7V0aJ1pKAQAA6wAAAEpQ.jpg?imageView2/1/w/135/h/90",
                "https://2-im.guokr.com/-OmYOzUa0Nhm9vKimCFn2c2ZR9pHmgxqMiMxijD5KwkLAQAAngAAAFBO.png?imageView2/1/w/135/h/90",
                "https://1-im.guokr.com/mysULQspmaLPEMu-MQFZGHwaccTPPs9msjtLrYoDtGcsAQAAagAAAEpQ.jpg",
                "https://2-im.guokr.com/fSvqlLJ6wcRv8cCCc5Ehm5pgqZWg7TyiLZdEba34NTKgAAAAoAAAAEpQ.jpg?imageView2/1/w/48/h/48",
                "https://3-im.guokr.com/F9IifzSeB9OoKKIP-_2i3SnWHnUceIpmGyOMuwgRvgGgAAAAoAAAAEpQ.jpg?imageView2/1/w/48/h/48",
                "https://sslstatic.guokr.com/skin/imgs/dimensions-code.jpg?v=unknown",
                "https://3-im.guokr.com/0Al5wQUv5IAuo87evbERy190Y83ENmP9OpIs8Stm2lMUAAAAFAAAAFBO.png"]}
2019-02-19 12:11:06 [scrapy.core.engine] INFO: Closing spider (finished)
2019-02-19 12:11:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{"downloader/request_bytes": 434,
 "downloader/request_count": 2,
 "downloader/request_method_count/GET": 2,
 "downloader/response_bytes": 12316,
 "downloader/response_count": 2,
 "downloader/response_status_count/200": 2,
 "finish_reason": "finished",
 "finish_time": datetime.datetime(2019, 2, 19, 4, 11, 6, 755334),
 "item_scraped_count": 1,
 "log_count/DEBUG": 3,
 "log_count/INFO": 9,
 "response_received_count": 2,
 "robotstxt/request_count": 1,
 "robotstxt/response_count": 1,
 "robotstxt/response_status_count/200": 1,
 "scheduler/dequeued": 1,
 "scheduler/dequeued/memory": 1,
 "scheduler/enqueued": 1,
 "scheduler/enqueued/memory": 1,
 "start_time": datetime.datetime(2019, 2, 19, 4, 11, 6, 142378)}
2019-02-19 12:11:06 [scrapy.core.engine] INFO: Spider closed (finished)
Answer (Jun. 24, 2022):

A custom ImagesPipeline must not override process_item. The process_item inherited from MediaPipeline is what takes the requests yielded by get_media_requests and schedules the actual downloads; your override simply returns the item, so the download machinery never runs and you only see the URLs in the log. Delete your process_item method and the images will be saved.
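
A minimal sketch of the corrected pipeline, assuming the same project layout as above. Note that image_paths must also be uncommented in items.py, because item_completed assigns to it and scrapy.Item raises a KeyError for undeclared fields:

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class PicPipeline(ImagesPipeline):
    # No process_item override: the inherited MediaPipeline.process_item
    # drives the download machinery.

    def get_media_requests(self, item, info):
        # One download request per collected image URL.
        for image_url in item["image_urls"]:
            yield scrapy.Request(image_url)

    def item_completed(self, results, item, info):
        # Keep only the files that downloaded successfully.
        image_paths = [x["path"] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        # Requires image_paths = scrapy.Field() in ImageItem.
        item["image_paths"] = image_paths
        return item

With this in place, the downloaded files land under IMAGES_STORE in a "full" subdirectory, named by the SHA1 hash of the request URL.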
