How to use Beautiful Soup to scrape the movie names and links from the source code of the following web page

I am using Python 3 with bs4 to scrape the latest movies from Movie Paradise (http://www.dytt8.net/), but what comes back is raw page data and very messy. Can soup.findAll be used to find the link tags directly and extract them?

import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen("http://www.dytt8.net/")
bsObj = BeautifulSoup(html, "html.parser")
# Every div whose class is "co_content8"
a = bsObj.findAll("div", {"class": "co_content8"})
list1 = []
for i in a:
    # All link tags inside this div
    j = i.findAll("a")
    print(type(j))
    print("###")
    # list.append() returns None, so this prints None
    print(list1.append(str(j)))

print("list1 is:", list1)
print(type(list1))
print(len(list1))
for n in list1:
    print(n.split(","))

Part of the web page's source code is as follows:

Mar.11,2021

CSS selectors, XPath, or pyquery are the usual tools for this and are recommended. findAll also works, but the results are not as good.
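As a rough illustration of the CSS selector route, here is a minimal sketch using Beautiful Soup's select(). The SAMPLE_HTML markup and the extract_links name are made up for the example, since the real page source is not shown here; only the co_content8 class comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="co_content8">
  <ul>
    <li><a href="/html/movie/1.html">Movie One</a></li>
    <li><a href="/html/movie/2.html">Movie Two</a></li>
  </ul>
</div>
<div class="other"><a href="/about.html">About</a></div>
"""


def extract_links(html):
    """Return (title, href) pairs for links inside div.co_content8."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.co_content8 a[href]" matches every <a> that has an href
    # attribute and sits anywhere inside a div with that class.
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.select("div.co_content8 a[href]")
    ]


for title, href in extract_links(SAMPLE_HTML):
    print(title, href)
```

The selector does the filtering in one step, so there is no need to convert the tags to strings and split them afterwards.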


First locate the ul under the div, then use findAll to extract each entry under the ul, and then extract the href attribute of each link.
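The steps described in this answer could look roughly like the sketch below. It assumes each entry is an a tag somewhere inside the ul, because the exact tag names were lost from the answer; PAGE_HTML and links_under_ul are hypothetical names and the markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
PAGE_HTML = """
<div class="co_content8">
  <a href="/more.html">More</a>
  <ul>
    <li><a href="/html/movie/1.html">Movie One</a></li>
    <li><a href="/html/movie/2.html">Movie Two</a></li>
  </ul>
</div>
"""


def links_under_ul(html):
    """Return the href of every link inside the first ul of div.co_content8."""
    soup = BeautifulSoup(html, "html.parser")
    # Step 1: locate the div, then the ul under it.
    div = soup.find("div", class_="co_content8")
    if div is None:
        return []
    ul = div.find("ul")
    if ul is None:
        return []
    # Step 2: collect each link under the ul and read its href attribute.
    return [a["href"] for a in ul.find_all("a", href=True)]


print(links_under_ul(PAGE_HTML))
```

Narrowing to the ul first skips links that sit in the div but outside the list, such as the "More" link in the sample.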

You can take a look at the article, which is about using BeautifulSoup. I hope it will be helpful.

Try the following and see whether it works.


Use soup.findAll to find the a tags whose href contains /html.
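# -*- coding: utf-8 -*-

import re
import urllib.request

from bs4 import BeautifulSoup

html = urllib.request.urlopen('http://www.dytt8.net/')
bsObj = BeautifulSoup(html, 'html.parser')

# Keep only the <a> tags whose href contains "/html"
bsObj1 = bsObj.find_all('a', href=re.compile('/html'))
for i in bsObj1:
    # get_text() also works when the link text is wrapped in nested tags,
    # where .string would be None
    print(i['href'], i.get_text(strip=True))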
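If the href values turn out to be relative paths, which the /html pattern above suggests but which I have not verified against the live page, they can be joined with the site address to get full links. A small offline sketch, with LINKS_HTML and absolute_movie_links as made-up names and invented markup:

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.dytt8.net/"

# Hypothetical sample markup; the real page structure may differ.
LINKS_HTML = """
<a href="/html/movie/1.html">Movie One</a>
<a href="/index.html">Home</a>
<a href="/html/movie/2.html"><b>Movie</b> Two</a>
"""


def absolute_movie_links(html, base_url=BASE_URL):
    """Return (absolute_url, title) pairs for links whose href starts with /html/."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for a in soup.find_all("a", href=re.compile(r"^/html/")):
        # urljoin resolves the relative href against the site address.
        url = urljoin(base_url, a["href"])
        # get_text() gathers text from nested tags as well.
        title = a.get_text(" ", strip=True)
        results.append((url, title))
    return results


for url, title in absolute_movie_links(LINKS_HTML):
    print(url, title)
```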
