How does XPath match multiple times within one range?

The code is a little messy, but that's what the original page looks like. You can reformat it.
<li class="list__item"><div class="list__title">The world this week</div><a itemProp="url" class="link-button list__link" href="/node/21752687"><span class="print-edition__link-flytitle">Print-edition redesign</span><span class="print-edition__link-title">Introducing our new look</span></a><a itemProp="url" class="link-button list__link" href="/node/21752688"><span class="print-edition__link-title-sub">Politics this week</span></a><a itemProp="url" class="link-button list__link" href="/node/21752686"><span class="print-edition__link-title-sub">Business this week</span></a><a itemProp="url" class="link-button list__link" href="/node/21752685"><span class="print-edition__link-title-sub">KAL's cartoon</span></a></li><li class="list__item"><div class="list__title">Leaders</div><a itemProp="url" class="link-button list__link" href="/node/21752616"><span class="print-edition__link-flytitle">Politics and power</span><span class="print-edition__link-title">China v America</span></a><a itemProp="url" class="link-button list__link" href="/node/21752617"><span class="print-edition__link-flytitle">Germany</span><span class="print-edition__link-title">Not so grand</span></a><a itemProp="url" class="link-button list__link" href="/node/21752619"><span class="print-edition__link-flytitle">Criminal justice</span><span class="print-edition__link-title">Against pessimism</span></a><a itemProp="url" class="link-button list__link" href="/node/21752620"><span class="print-edition__link-flytitle">Oil markets</span><span class="print-edition__link-title">Beyond boom and bust?</span></a><a itemProp="url" class="link-button list__link" href="/node/21752618"><span class="print-edition__link-flytitle">In praise of the basics</span><span class="print-edition__link-title">Captain Sensible</span></a></li>

After the browser renders it, the page looks like this (screenshot: clipboard.png).

The structure is like this: each <li class="list__item"> is one block of content containing a <div class="list__title">. There are N such li tags, and each li tag has N hyperlinks.
Question:
I need to extract the text of the <div class="list__title"> together with the href values (for example href="/node/21752688") of the links inside the same <li class="list__item">. The result should look like this (the type does not have to be a list; this is just to make it more intuitive):
["The world this week","/node/21752687","/node/21752688","/node/21752685"], [" Against pessimism","/node/21752618","/node/21752617","/node/21752183"].

Sep.16,2021

XPath supports an "or" relationship: join multiple expressions with the | operator to select them at the same time.

Since your two elements extract different things, I suggest using multiple XPath expressions instead:

Extract the text of the div with one expression and the href attribute of each link with another, for example:

/path/to/li/a/@href

Of course, for both the text and the href you can instead select only down to the element level and then read the value with the corresponding Python attribute method, rather than handling it in XPath.
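The element-level approach can be sketched roughly as follows. This is only a minimal sketch and assumes Python's standard-library xml.etree.ElementTree, which supports just a subset of XPath, so it selects elements and then reads .text and .get("href") in Python. The SAMPLE markup is shortened from the question and wrapped in a ul tag (an addition, not in the original page) so that it parses as XML; real-world HTML usually needs a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Shortened sample from the question, wrapped in <ul> (an assumption)
# so that the fragment is well-formed XML.
SAMPLE = """
<ul>
  <li class="list__item"><div class="list__title">The world this week</div><a class="link-button list__link" href="/node/21752687"><span>Introducing our new look</span></a><a class="link-button list__link" href="/node/21752688"><span>Politics this week</span></a></li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# Expression 1: select the title elements, then read their text in Python.
titles = [div.text for div in root.findall(".//div[@class='list__title']")]

# Expression 2: select the link elements, then read the href attribute in Python.
hrefs = [a.get("href") for a in root.findall(".//li[@class='list__item']/a")]

print(titles)
print(hrefs)
```

Both queries here run over the whole document, so the two result lists are flat and not yet grouped per list item.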

Finally, the results extracted by these two XPath expressions can be concatenated.
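To get one group per list item, as the question asks, one option is to select each <li class="list__item"> first and then run the two expressions relative to that element before concatenating. The following is a hedged sketch, again assuming the standard-library xml.etree.ElementTree; the function name extract_groups and the shortened, ul-wrapped SAMPLE are illustrative and not from the original page.

```python
import xml.etree.ElementTree as ET

# Two shortened list items from the question, wrapped in <ul> (an assumption)
# so that the fragment is well-formed XML.
SAMPLE = """
<ul>
  <li class="list__item"><div class="list__title">The world this week</div><a href="/node/21752687"><span>Introducing our new look</span></a><a href="/node/21752688"><span>Politics this week</span></a></li>
  <li class="list__item"><div class="list__title">Leaders</div><a href="/node/21752616"><span>China v America</span></a><a href="/node/21752617"><span>Not so grand</span></a></li>
</ul>
"""


def extract_groups(markup):
    """Return one [title, href, href, ...] list per list__item element."""
    root = ET.fromstring(markup)
    groups = []
    for item in root.findall(".//li[@class='list__item']"):
        # Both lookups are relative to the current <li>, so matches
        # never leak across list items.
        title = item.findtext("div[@class='list__title']", default="")
        hrefs = [a.get("href") for a in item.findall("a")]
        # Concatenate the two results for this item.
        groups.append([title] + hrefs)
    return groups


for group in extract_groups(SAMPLE):
    print(group)
```

Running the queries relative to each li is what keeps a title together with only its own links; the same idea should carry over to a full XPath engine by starting the inner expressions from the selected element.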