Crawling a web page in Python with BeautifulSoup: cells that contain a br tag return None from .string, and replacing the br with re has not worked. Any advice?

Question: I just started practicing Python crawlers. I am crawling a web page with BeautifulSoup; the page contains br tags, and the result for those cells is None.
I tried removing the br tags with the string's replace function, but it still returned None.
I tried replacing the br tags with an re regular expression, and it raised a type error.

Code:

from bs4 import BeautifulSoup

html_doc="""
<tr>
    <td>1</td>
    <td>2(<br>)</td>
    <td>3(<br/>)</td>
    <td>1<br/>        
    </td>
</tr>
"""
soup=BeautifulSoup(html_doc,"lxml")

for i in soup.find_all("td"):
    print(i.string)

(1) output result:

1
None
None
None

(2) I tried replacing the br tags with an re regular expression. The code is below; it raises a type error.

import re

re_br=re.compile("<br.*?/?>")
s=re_br.sub("\n",soup)  # replace br; soup is a BeautifulSoup object here, not a string

(3) After converting soup to a string with str(soup), calling find_all fails with an error saying the string has no find_all attribute.

Mar.15,2021

Remove <br> or <br/> from the HTML text with the replace function before parsing it:
#!/usr/bin/env python
# -*- coding:utf-8 -*-

from bs4 import BeautifulSoup

html_doc='''
<tr>
    <td>1</td>
    <td>2(<br>)</td>
    <td>3(<br/>)</td>
    <td>1<br/>        
    </td>
</tr>
'''
soup=BeautifulSoup((html_doc.replace('<br>','')).replace('<br/>',''),'lxml')

for i in soup.find_all('td'):
    print(i.string)
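
For context on why this works: .string returns None whenever a tag has more than one child, and a td holding text plus a br has several. Removing the br from the raw HTML string before parsing leaves a single text node, so .string works again. Below is a minimal sketch of two variants, not the only way to do it. It assumes the built-in "html.parser" instead of "lxml" so that it runs without extra packages, and the variable names (cleaned, cells_regex, cells_text) are made up for illustration. The first variant applies re to the HTML string (not to the soup object, which is what caused the type error); the second leaves the HTML alone and uses get_text(), which joins all text inside a tag.

```python
import re
from bs4 import BeautifulSoup

html_doc = """
<tr>
    <td>1</td>
    <td>2(<br>)</td>
    <td>3(<br/>)</td>
    <td>1<br/>        
    </td>
</tr>
"""

# Variant 1: strip <br>, <br/> and <br /> from the raw string, then parse.
# re.sub needs a str, so it is applied to html_doc, not to a BeautifulSoup object.
cleaned = re.sub(r"<br\s*/?>", "", html_doc, flags=re.IGNORECASE)
soup = BeautifulSoup(cleaned, "html.parser")  # assumption: html.parser instead of lxml
cells_regex = [td.string.strip() for td in soup.find_all("td")]
print(cells_regex)  # ['1', '2()', '3()', '1']

# Variant 2: parse the original HTML and collect the text with get_text().
# strip=True trims each text fragment and skips fragments that are only whitespace.
soup = BeautifulSoup(html_doc, "html.parser")
cells_text = [td.get_text(strip=True) for td in soup.find_all("td")]
print(cells_text)  # ['1', '2()', '3()', '1']
```

If the line breaks matter, get_text() also accepts a separator, for example td.get_text("\n", strip=True), which puts a newline between the text fragments on either side of each br.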
