Help wanted: Scrapy fails to crawl data, and repeated debugging has not fixed it.

Goal: crawl the course information from a learning website. As a first debugging step, I am only trying to get the course name.
Crawler file:

import scrapy
from xtzx.items import XtzxItem

# from scrapy.http import Request

class LessonSpider(scrapy.Spider):
    name = "lesson"
    allowed_domains = ["xuetangx.com"]
    start_urls = ["http://www.xuetangx.com/courses/course-v1:TsinghuaX+80512073X+2018_T1/about"]
    """
    def start_requests(self):
        ua={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"}
        yield Request("www.xuetangx.com/courses/course-v1:TsinghuaX+80512073X+2018_T1/about",headers=ua)
    """
    def parse(self, response):
        item=XtzxItem()
        item["title"]=response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
        print(item["title"])

execution log:

2018-04-28 11:08:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: xtzx)
2018-04-28 11:08:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.9.0, Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.16299-SP0
2018-04-28 11:08:33 [scrapy.crawler] INFO: Overridden settings: {"SPIDER_MODULES": ["xtzx.spiders"], "BOT_NAME": "xtzx", "NEWSPIDER_MODULE": "xtzx.spiders", "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"}
2018-04-28 11:08:33 [scrapy.middleware] INFO: Enabled extensions:
["scrapy.extensions.corestats.CoreStats",
 "scrapy.extensions.telnet.TelnetConsole",
 "scrapy.extensions.logstats.LogStats"]
2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled downloadermiddlewares:
["scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware",
 "scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware",
 "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware",
 "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware",
 "scrapy.downloadermiddlewares.retry.RetryMiddleware",
 "scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware",
 "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware",
 "scrapy.downloadermiddlewares.redirect.RedirectMiddleware",
 "scrapy.downloadermiddlewares.cookies.CookiesMiddleware",
 "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware",
 "scrapy.downloadermiddlewares.stats.DownloaderStats"]
2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled spidermiddlewares:
["scrapy.spidermiddlewares.httperror.HttpErrorMiddleware",
 "scrapy.spidermiddlewares.offsite.OffsiteMiddleware",
 "scrapy.spidermiddlewares.referer.RefererMiddleware",
 "scrapy.spidermiddlewares.urllength.UrlLengthMiddleware",
 "scrapy.spidermiddlewares.depth.DepthMiddleware"]
2018-04-28 11:08:34 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-04-28 11:08:34 [scrapy.core.engine] INFO: Spider opened

(The problem seems to start from here.)

2018-04-28 11:08:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), Scraped 0 items (at 0 items/min)
2018-04-28 11:08:34 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-04-28 11:08:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.xuetangx.com/cours.:TsinghuaX+80512073X+2018_T1/about> (referer: None)
2018-04-28 11:08:34 [scrapy.core.scraper] ERROR: Spider error processing <GET http://www.xuetangx.com/cours.:TsinghuaX+80512073X+2018_T1/about> (referer: None)
Traceback (most recent call last):
  File "d:\python3.5\lib\site-packages\parsel\selector.py", line 228, in xpath
    **kwargs)
  File "src\lxml\etree.pyx", line 1577, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid predicate

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\python3.5\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "E:\python\xtzx\xtzx\spiders\lesson.py", line 16, in parse
    item["title"]=response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
  File "d:\python3.5\lib\site-packages\scrapy\http\response\text.py", line 119, in xpath
    return self.selector.xpath(query, **kwargs)
  File "d:\python3.5\lib\site-packages\parsel\selector.py", line 232, in xpath
    six.reraise(ValueError, ValueError(msg), sys.exc_info()[2])
  File "d:\python3.5\lib\site-packages\six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "d:\python3.5\lib\site-packages\parsel\selector.py", line 228, in xpath
    **kwargs)
  File "src\lxml\etree.pyx", line 1577, in lxml.etree._Element.xpath
  File "src\lxml\xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__
  File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
ValueError: XPath error: Invalid predicate in //div[@class='title_detail'/h3[@class='courseabout_title']/text()
2018-04-28 11:08:35 [scrapy.core.engine] INFO: Closing spider (finished)
2018-04-28 11:08:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{"downloader/request_bytes": 301,
 "downloader/request_count": 1,
 "downloader/request_method_count/GET": 1,
 "downloader/response_bytes": 24409,
 "downloader/response_count": 1,
 "downloader/response_status_count/200": 1,
 "finish_reason": "finished",
 "finish_time": datetime.datetime(2018, 4, 28, 3, 8, 35, 118088),
 "log_count/DEBUG": 2,
 "log_count/ERROR": 1,
 "log_count/INFO": 7,
 "response_received_count": 1,
 "scheduler/dequeued": 1,
 "scheduler/dequeued/memory": 1,
 "scheduler/enqueued": 1,
 "scheduler/enqueued/memory": 1,
 "spider_exceptions/ValueError": 1,
 "start_time": datetime.datetime(2018, 4, 28, 3, 8, 34, 418003)}
2018-04-28 11:08:35 [scrapy.core.engine] INFO: Spider closed (finished)

The program seems very simple, but it just doesn't work. Everything else in the project is the default setup: nothing has been added to pipelines, and in settings I only changed the value of ROBOTSTXT_OBEY.
I have searched online for this error for a long time without finding a fix, and disguising the crawler as a browser did not help either. I am self-taught and have no teacher to ask, so I am out of ideas. Any help would be appreciated.

Mar.06,2021

File "src\lxml\xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result
ValueError: XPath error: Invalid predicate in //div[@class='title_detail'/h3[@class='courseabout_title']/text()

The XPath is written incorrectly: it is missing a closing ].


In the XPath, isn't div[@class='title_detail' missing a ] here?

item["title"]=response.xpath("//div[@class='title_detail'/h3[@class='courseabout_title']/text()").extract()
