Extracting HTML text content with regular expressions in Python: how to get every match when there are multiple segments

I ran into this while cleaning web page data: how do you extract all of the content when a piece of HTML contains several target elements?

For example, take the following snippet:

<span style="mso-spacerun:"yes";font-family:;mso-ascii-font-family:Calibri;mso-hansi-font-family:Calibri;mso-bidi-font-family:"Times new roman";font-size:10.5000pt;mso-font-kerning:1.0000pt;">
<font face=""></font></span>

The goal is to extract the Chinese text inside the tags.

Current approach

Use a regular expression with re.findall to match the text between tags. The relevant code (an excerpt) is as follows:

import re
s = """
<span style="mso-spacerun:"yes";font-family:;mso-ascii-font-family:Calibri;mso-hansi-font-family:Calibri;mso-bidi-font-family:"Times new roman";font-size:10.5000pt;mso-font-kerning:1.0000pt;">
<font face=""></font></span>
"""
rs = re.findall(r"(?<=(>))[\d\D]*?(?=(<))", s, re.M)
for item in rs:
    print(item)

Result

The output is as follows, which is not the desired result:

('>', '<')
('>', '<')
('>', '<')
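
A note on why the output looks like this: when a pattern contains capture groups, re.findall returns the groups instead of the whole match, so each result is the pair captured inside the lookbehind and lookahead. The sketch below shows the difference on a simplified snippet. The HTML and its placeholder text are made up for illustration (the original text is not reproduced here), and the second pattern is only one possible rewrite, not a general HTML solution.

```python
import re

# Hypothetical stand-in for the real page: two <font> segments with placeholder text.
s = """
<span style="font-size:10.5pt;"><font face="宋体">第一段</font></span>
<span style="font-size:10.5pt;"><font face="宋体">第二段</font></span>
"""

# With capture groups, findall returns the groups, not the matched text.
with_groups = re.findall(r"(?<=(>))[\d\D]*?(?=(<))", s)

# Without capture groups, findall returns the text between '>' and '<'.
segments = re.findall(r"(?<=>)[^<>]*(?=<)", s)

# Drop the empty and whitespace-only matches that sit between adjacent tags.
texts = [seg.strip() for seg in segments if seg.strip()]
print(texts)  # ['第一段', '第二段']
```

This still breaks as soon as an attribute value contains '>' or '<', which is one reason a real parser is usually the safer choice.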

Don't use regular expressions for this; BeautifulSoup is much better suited to HTML.

from bs4 import BeautifulSoup
s = '''
<span style="mso-spacerun:'yes';font-family:;mso-ascii-font-family:Calibri;mso-hansi-font-family:Calibri;mso-bidi-font-family:'Times new roman';font-size:10.5000pt;mso-font-kerning:1.0000pt;">
<font face=""> </font> </span>
'''
clean_text = BeautifulSoup(s,"lxml").get_text()
print(clean_text)

Output

We strolled into a small courtyard with a rich rural flavor, which was clean and tidy. The yard is neatly covered with golden corn, even corn bones are neatly lined up, red chili peppers are hanging on both sides of the door, chickens, dogs and cats are walking leisurely in the courtyard, and there are two chicken nests on the chicken house. There happens to be an egg in one of the henhouses, and all kinds of flowers such as hydrangeas are in full bloom. The owners of the courtyard are all in their eighties. The master is 83 and the hostess is 85. They are still grabbing corn and seeing us burst into the yard. Instead of being nervous, they are very enthusiastic. They invite us to sit down and plan to pour us Scald. We just keep saying no. The two old men, unhurried and slow, have never stopped. According to them, most of their children and grandchildren are now independent and promising. Seeing such a clean and tidy small courtyard full of warm life, it must be the life of the elderly who is full of pursuit and interest to create the beauty of all this.
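
If you need each segment separately rather than one joined string, BeautifulSoup can return the text nodes one by one. This is a minimal sketch, again on made-up HTML with placeholder text. It uses the "html.parser" backend that ships with Python so that lxml is not required; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: placeholder text in two <font> tags.
html = """
<span style="font-size:10.5pt;"><font face="宋体">第一段</font></span>
<span style="font-size:10.5pt;"><font face="宋体">第二段</font></span>
"""

# "html.parser" is bundled with Python; "lxml" also works if it is installed.
soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields each text node with surrounding whitespace removed
# and skips nodes that contain only whitespace.
segments = list(soup.stripped_strings)
print(segments)  # ['第一段', '第二段']

# Alternatively, restrict the search to one kind of tag.
font_texts = [tag.get_text(strip=True) for tag in soup.find_all("font")]
```

The first form collects every text node in the document; the second is handy when only certain tags hold the content you care about.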