Garbled Chinese text (mojibake) in Python

Recently I have been using Python to process bus GPS track data, and I have run into some very annoying encoding problems. The original data is in .gz format. After decompressing it on Windows I got a .txt file, but the file contains a lot of garbled text. No matter which encoding I convert it to, it doesn't work, as shown below:

This is the .gz file and the decompressed file:
clipboard.png

Contents of the .txt file:
clipboard.png

The .txt file after converting it to utf-8, gbk, and gb2312:
clipboard.png

clipboard.png

So I would like to ask:
1. Which encoding will correctly decode the Chinese text in the .gz file?
2. I am processing the data line by line. Is there any way to detect that a string contains garbled text?

Feb.27,2021

It could also be some kind of Base64 encoding. Give that a try.


This is a cross-platform encoding problem. The .gz package was probably created with UTF-8 text on a Linux server, but when you decompress it locally it is being treated as GBK. Find a Linux server and extract it there to see whether it comes out normal. If it does, either find a decompression tool that handles the cross-platform encoding correctly, or try repackaging it as a zip on the Linux server.
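If no Linux machine is at hand, one way to take the decompression tool out of the picture is to read the .gz directly from Python and state the encoding explicitly. This is only a minimal sketch: `read_gz_lines` is a made-up helper name, and UTF-8 is an assumption about how the file was written, not something confirmed in the question.

```python
import gzip


def read_gz_lines(path, encoding="utf-8"):
    """Yield decoded lines straight from a .gz file, without extracting it first.

    The default encoding is an assumption; pass "gbk" or "gb18030" if the
    data turns out to have been written that way.
    """
    # "rt" opens the compressed stream in text mode; errors="strict" makes a
    # wrong encoding fail loudly instead of silently producing garbage.
    with gzip.open(path, "rt", encoding=encoding, errors="strict") as f:
        for line in f:
            yield line.rstrip("\n")
```

If this raises `UnicodeDecodeError`, the assumed encoding is wrong (or the source file itself is damaged), which is already more information than a text editor full of garbage gives you.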

If you are sure the source file is OK, extract the .gz file on Linux, or with the tar tool from Cygwin or Git. Do not use WinRAR to extract it, since that sometimes causes problems. Then open the extracted text in Notepad++ or Visual Studio Code and see whether you can find the correct encoding. First make sure there is nothing wrong with the source file, then determine the encoding by hand, rather than trying encodings one by one in Python.
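Once the editor has narrowed things down, the check can be repeated in code on the raw bytes. A rough sketch, assuming a short list of candidate encodings: `guess_encoding` and the candidate list are illustrative choices, not a standard API, and the order matters because permissive codecs such as gb18030 accept many byte sequences that are not really theirs.

```python
def guess_encoding(raw, candidates=("utf-8", "gb18030", "big5")):
    """Return the first candidate that decodes `raw` without error, else None.

    `raw` must be bytes. The candidate list is an assumption; put the
    strictest encoding (UTF-8) first.
    """
    for enc in candidates:
        try:
            raw.decode(enc)  # strict decoding: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return enc
    return None
```

Feed it whole lines read in binary mode (for example from `gzip.open(path, "rb")`), so that a multi-byte character is not cut in half at the end of the sample. A match only means the bytes are valid in that encoding, so still eyeball the decoded text.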

To check for garbled text, a simple approach is to test whether the string contains "锟斤拷". It is safe to assume that no bus stop is called 锟斤拷 Station.
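For the line-by-line check, a small sketch along those lines. The marker 锟斤拷 is what two Unicode replacement characters (U+FFFD, bytes EF BF BD in UTF-8) turn into when their bytes are read as GBK, so it is worth checking for U+FFFD itself as well. `looks_garbled` and `MOJIBAKE_MARKERS` are made-up names, and this heuristic only catches these two well-known signs of damage, not every kind of garbling.

```python
# Typical traces of a failed decode: the replacement character itself, and
# what its UTF-8 bytes look like after being misread as GBK.
MOJIBAKE_MARKERS = ("\ufffd", "锟斤拷")


def looks_garbled(line):
    """Return True if the line contains a known mojibake marker."""
    return any(marker in line for marker in MOJIBAKE_MARKERS)
```

Lines flagged this way have usually lost information for good (the original bytes were already replaced), so the real fix is still to decode the source file with the right encoding.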

Now arriving at 锟斤拷. Next stop: 烫烫烫 :)