Scrapy cannot extract the next page

problem description

The spider scrapes the first page correctly, but the request for the next page fails.

related code


import scrapy
from qsbk.items import QsbkItem
from scrapy.http.response.html import HtmlResponse
from scrapy.selector.unified import SelectorList


class QsbkSpiderSpider(scrapy.Spider):
    name = "qsbk_spider"
    allowed_domains = ["qiushibaike.com"]
    start_urls = ["https://www.qiushibaike.com/text/page/1/"]
    base_domain = "https://www.qiushibaike.com/"

    def parse(self, response):
        # Each child div of #content-left holds one post
        duanzidivs = response.xpath('//div[@id="content-left"]/div')
        for duanzidiv in duanzidivs:
            author = duanzidiv.xpath(".//h2/text()").get().strip()
            content = duanzidiv.xpath('.//div[@class="content"]//text()').getall()
            content = "".join(content).strip()

            item = QsbkItem(author=author, content=content)
            yield item

        # The last pagination entry links to the next page
        next_url = response.xpath('//ul[@class="pagination"]/li[last()]/a/@href').get()
        if not next_url:
            return
        else:
            yield scrapy.Request(self.base_domain + next_url, callback=self.parse)

The error occurs at this point: the first page is extracted normally, but the request for the second page fails with the following log.

2019-02-23 14:07:32 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting to <GET ...> from <GET https://www.qiushibaike.com//...>
2019-02-23 14:07:35 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://text/> (failed 1 times): DNS lookup failed: no results for hostname lookup: text.
2019-02-23 14:07:37 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://text/> (failed 2 times): DNS lookup failed: no results for hostname lookup: text.
2019-02-23 14:07:40 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://text/> (failed 3 times): DNS lookup failed: no results for hostname lookup: text.
2019-02-23 14:07:40 [scrapy.core.scraper] ERROR: Error downloading <GET https://text/>
Traceback (most recent call last):
  File "D:\venv\article_spider\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "D:\venv\article_spider\lib\site-packages\twisted\python\failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "D:\venv\article_spider\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "D:\venv\article_spider\lib\site-packages\twisted\internet\defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\venv\article_spider\lib\site-packages\twisted\internet\endpoints.py", line 975, in startConnectionAttempts
    "no results for hostname lookup: {}".format(self._hostStr)
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: text.
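
The log points at the URL rather than the XPath: the redirected request starts from https://www.qiushibaike.com//... (note the double slash), and the failing host is "text". A likely cause, assuming the extracted href is an absolute path such as /text/page/2/ (the post does not show the real value), is that base_domain already ends with "/" and the concatenation doubles it. The sketch below uses only the standard library to reproduce that effect; NEXT_URL and the redirect Location value are hypothetical stand-ins, not values taken from the site.

```python
from urllib.parse import urljoin, urlsplit

BASE_DOMAIN = "https://www.qiushibaike.com/"  # same value as base_domain in the spider
NEXT_URL = "/text/page/2/"  # assumed href; the real value is not shown in the post


def concatenated_url(base: str, href: str) -> str:
    """Reproduce the spider's plain string concatenation."""
    return base + href


def joined_url(base: str, href: str) -> str:
    """Resolve href against base following the usual URL resolution rules."""
    return urljoin(base, href)


def redirect_target(request_url: str, location: str) -> str:
    """Resolve a redirect Location value against the URL that was requested."""
    return urljoin(request_url, location)


bad = concatenated_url(BASE_DOMAIN, NEXT_URL)
good = joined_url(BASE_DOMAIN, NEXT_URL)
# Hypothetical Location value starting with "//": it is treated as a
# network-path reference, so its first segment becomes the host name.
hop = redirect_target(bad, "//text/page/2/")

print(bad)                     # URL with a doubled slash after the domain
print(good)                    # URL with a single slash
print(urlsplit(hop).hostname)  # host that the downloader would try to resolve
```

If that assumption holds, building the request with Scrapy's own resolver, for example `yield scrapy.Request(response.urljoin(next_url), callback=self.parse)`, should avoid the doubled slash without needing base_domain at all.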

Jul.05,2022