When crawling, the web page displays Chinese normally, but the text in the page source is garbled.

I started learning Python not long ago. I am using Selenium to crawl the Shixiseng internship website (searching for Mercedes-Benz). When I look at the page source with the browser's developer tools, it looks like this:

[screenshot: clipboard.png]

The code that reads this page is as follows (only a small part of it):

from pyquery import PyQuery as pq

def get_products():
    """Parse the current results page and print each listing."""
    # `browser` is the Selenium WebDriver created elsewhere in the script.
    # page_source is a str
    html = browser.page_source
    doc = pq(html)
    items = doc(".position .position-list li.font").items()
    for item in items:
        product = {
            "name": item.find(".name").text(),
            "release_time": item.find(".release-time").text(),
            "company": item.find(".company").text(),
            "area": item.find(".area").text(),
            "info": item.find(".more").text(),
        }
        print(product)

The output in the Spyder console (I am using Anaconda3) then looks like this. The "info" field corresponds to the text shown in the screenshot above:

{"name": "\ue222\uee04\uf627 \uf627\uee14\uee14\uebe3\ue321\ue817 internship \ue194", "release_time": "2 days ago", "company": "Daimler Mercedes-Benz", "area": "Beijing", "info": "\ue83b\uf591\uf591-\ue83b\uf825\uf591/ days | \uf825 days / week | \uecb6 months"}

I then wrote the results to a txt file with UTF-8 encoding and found that the text is still garbled.
Code:

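def save_to_text(product):
    # `word` is defined elsewhere in the script
    file = word + ".txt"
    # Append one "key:value" line per field, encoded as UTF-8
    with open(file, "a", encoding="utf-8") as k:
        for key, value in product.items():
            k.write(key + ":" + value + "\n")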

Opening the file shows:
name: / week
release_time:2
company: Daimler Mercedes-Benz
area: Beijing
info: customers / days | days / week | months

So is this still an encoding problem?

Mar.24,2021

This is font-based anti-crawling. You need to parse the site's font file to map the obfuscated characters back to the real ones.
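
A minimal sketch of what that could look like, using only the standard library. It first checks that the odd characters in the output (U+E83B and so on) fall in the Unicode Private Use Area, which is why writing the file as UTF-8 changes nothing, and then substitutes them through a lookup table. The names `private_use_chars`, `decode_obfuscated` and `HYPOTHETICAL_MAP` are made up for this example, and the digits in the table are invented. The real table has to be built from the font the page loads (for example from its cmap table, which a library such as fontTools can read), and it may change between page loads.

```python
import unicodedata
from typing import Dict, List


def private_use_chars(text: str) -> List[str]:
    """Return the distinct Private Use Area characters in text, sorted."""
    # Unicode general category "Co" means "private use".
    return sorted({ch for ch in text if unicodedata.category(ch) == "Co"})


def decode_obfuscated(text: str, glyph_map: Dict[int, str]) -> str:
    """Replace every mapped code point with its real character."""
    # str.translate accepts a dict of code point -> replacement string;
    # characters that are not in the dict are left unchanged.
    return text.translate(glyph_map)


# HYPOTHETICAL mapping, invented for illustration only.
# The real values must be derived from the site's font file.
HYPOTHETICAL_MAP = {0xE83B: "1", 0xF591: "0", 0xF825: "5"}

info = "\ue83b\uf591\uf591-\ue83b\uf825\uf591/ days"
found = [hex(ord(ch)) for ch in private_use_chars(info)]
decoded = decode_obfuscated(info, HYPOTHETICAL_MAP)
print(found)
print(decoded)
```

Any character that `private_use_chars` still reports after decoding is one the table does not cover yet.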


It's an anti-crawling measure. If you just want to practice, crawl a different website.


Try getting the web page's encoding first, then decode the result with it before re-encoding.

import requests

url = 'http://www.cea.gov.cn/publish.'
result = requests.get(url=url)
print(result.encoding)  # <-- get the web page's encoding
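
For reference, the decode-then-re-encode step might look roughly like the sketch below. `redecode` is a hypothetical helper, not part of requests, and the byte string is a stand-in for a response body so that the example runs without network access. With requests, `wrong` would be whatever `result.encoding` reported and `right` the page's real encoding (`result.apparent_encoding` is one way to guess it).

```python
def redecode(text: str, wrong: str, right: str) -> str:
    """Undo a decode made with the wrong codec, then decode correctly."""
    return text.encode(wrong).decode(right)


# Stand-in for a response body: UTF-8 bytes that were decoded as Latin-1.
raw = "北京".encode("utf-8")
garbled = raw.decode("latin-1")                # what a wrong codec produces
fixed = redecode(garbled, "latin-1", "utf-8")  # back to the original text
print(repr(garbled))
print(fixed)
```

This only repairs text that was decoded with the wrong codec. Private Use Area characters such as U+E83B are valid Unicode and pass through any re-encoding unchanged.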
