Regular expression problem?

I ran into a problem using regular expressions to extract key-value pairs.

Take the following example:

      <td>dryad.dansEditIRI</td> 
          <td>https://easy.dans.knaw.nl/sword2/container/3e576bf7-26e1-404c-9cc8-bc8bd53c9591</td> 

The first <td> is the key and the second <td> is the value. How can I extract them with a regular expression? Or is there a more convenient way to get key-value pairs like this?

Mar.25,2021

Use BeautifulSoup:

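from bs4 import BeautifulSoup

# s holds the HTML that contains the table
s = """
<table>...</table>
"""

# The "lxml" parser needs the lxml package installed
soup = BeautifulSoup(s, "lxml")

# One {key: value} dict per row; rows with fewer than two <td> cells are skipped
result = [
    {tds[0].get_text(strip=True): tds[1].get_text(strip=True)}
    for tds in (tr.find_all("td") for tr in soup.find_all("tr"))
    if len(tds) >= 2
]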

Because the data is structured markup (HTML is closely related to XML, although it is not strictly a special case of it), a markup parser can extract it accurately.
There are many such parsers: BeautifulSoup from bs4 is one, and xml.dom.minidom is another, provided the input is well-formed XML. (This applies to Python.)
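
As a rough sketch of the xml.dom.minidom route mentioned above, the example below wraps the two cells from the question in a table row. The SAMPLE string and the helper names cell_text and table_to_dict are made up for illustration, and the approach assumes the input is well-formed XML; minidom raises a parse error on sloppy HTML.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: the question's two cells wrapped in a well-formed row.
SAMPLE = """<table>
  <tr>
    <td>dryad.dansEditIRI</td>
    <td>https://easy.dans.knaw.nl/sword2/container/3e576bf7-26e1-404c-9cc8-bc8bd53c9591</td>
  </tr>
</table>"""


def cell_text(node):
    """Join the text nodes directly inside an element and trim whitespace."""
    return "".join(
        child.data for child in node.childNodes if child.nodeType == child.TEXT_NODE
    ).strip()


def table_to_dict(markup):
    """Map the first <td> of each <tr> to the second <td> of the same row."""
    doc = parseString(markup)
    pairs = {}
    for tr in doc.getElementsByTagName("tr"):
        tds = tr.getElementsByTagName("td")
        if len(tds) >= 2:  # skip rows that do not hold a key and a value
            pairs[cell_text(tds[0])] = cell_text(tds[1])
    return pairs


pairs = table_to_dict(SAMPLE)
print(pairs)
```

Unlike the list of one-item dicts in the BeautifulSoup answer, this collects everything into a single dict, so a repeated key would overwrite the earlier value.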

Other languages have their own XML/HTML parsing modules as well. Don't limit yourself to regular expressions; in fact, regular expressions are a poor fit for nested markup like this.
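
That said, if a regular expression is still wanted for a snippet as small and predictable as the one in the question, something like the sketch below may be enough. It assumes each row holds exactly two plain <td> cells with no nested tags; the names TD_PAIR, td_pairs and SNIPPET are hypothetical, and the pattern will misbehave on anything more complex.

```python
import re

# Two adjacent <td> cells: group 1 is the key, group 2 is the value.
# Assumes plain-text cells; nested tags or stray '>' characters will break it.
TD_PAIR = re.compile(
    r"<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>",
    re.DOTALL | re.IGNORECASE,
)


def td_pairs(html):
    """Return {first cell: second cell} for every adjacent pair of <td> cells."""
    return {key.strip(): value.strip() for key, value in TD_PAIR.findall(html)}


# The snippet from the question, indentation included.
SNIPPET = """
      <td>dryad.dansEditIRI</td>
          <td>https://easy.dans.knaw.nl/sword2/container/3e576bf7-26e1-404c-9cc8-bc8bd53c9591</td>
"""

print(td_pairs(SNIPPET))
```

Matches are non-overlapping, so a row with four cells would be read as two separate pairs, which is one more reason to prefer a real parser for full tables.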


I think JavaScript is the most convenient: loop over the tr elements under the table and read the first two td cells of each row, using the first td as the key and the second td as the value.
