Problems reading Unicode-encoded txt files into a pandas DataFrame in Python

I have a txt file encoded in Unicode:

clipboard.png


I tried the following ways of opening it; each fails with the error shown beneath it:

1. with open("STK_MKT_ValuationMetrics.txt", "rb") as f:
   Error: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
2. with open("STK_MKT_ValuationMetrics.txt", "rb", encoding="utf-8") as f:
   Error: binary mode doesn't take an encoding argument
3. with open("STK_MKT_ValuationMetrics.txt", "r", encoding="utf-8") as f:
   Error: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
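
The 0xff byte at position 0 in these errors is consistent with a byte order mark (BOM), which is how UTF-16 files usually start; files saved as "Unicode" by Windows tools are typically UTF-16. That is an assumption about this file, not something the errors prove. A minimal sketch for checking the first bytes is below; the helper name guess_encoding_from_bom is hypothetical and it only recognizes files that actually carry a BOM.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM begins with the same two bytes
# as the UTF-16 LE BOM, so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(first_bytes):
    """Return a codec name if first_bytes starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if first_bytes.startswith(bom):
            return name
    return None


# Demo on in-memory bytes; for a real file, pass the result of f.read(4)
# from a file opened in "rb" mode.
print(guess_encoding_from_bom(b"\xff\xfeS\x00"))
```

If the result is "utf-16", passing encoding="utf-16" instead of "utf-8" when opening the file in text mode is the natural next thing to try.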

After converting the txt file to UTF-8 it can be read, but the leading zeros are omitted (00000066):

clipboard.png

How can this be solved? I would prefer a solution that reads the original file without changing its encoding. If that is not possible and I have to convert the file to UTF-8 manually before reading it, how should I deal with the omitted leading zeros? Thanks for any answers!


pd.read_table('filename', sep='<separator>', encoding="utf-8", dtype={'<column1>': np.<type1>, '<column2>': np.<type2>})
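
A minimal sketch of how the original file might be read directly, assuming it is UTF-16 with a BOM and tab-separated (neither is confirmed in the question). The function name read_utf16_table, the column names Symbol and PE, and the sample rows are all made up for illustration. Declaring the code column as str is what keeps the leading zeros.

```python
import os
import tempfile

import pandas as pd


def read_utf16_table(path, code_column, sep="\t"):
    """Read a UTF-16 text table, keeping code_column as text."""
    # encoding="utf-16" consumes the BOM and takes the byte order from it.
    # dtype=str for the code column stops pandas from parsing "000066" as 66.
    return pd.read_csv(path, sep=sep, encoding="utf-16", dtype={code_column: str})


# Demo with a throwaway file containing made-up sample data.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_utf16.txt")
with open(sample_path, "wb") as f:
    f.write("Symbol\tPE\n000066\t12.5\n600000\t8.1\n".encode("utf-16"))

df = read_utf16_table(sample_path, code_column="Symbol")
print(df)
```

The same dtype argument also works after a manual conversion to UTF-8, so the leading zeros can be kept either way.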

Someone on Zhihu has encountered the same problem. The answer is as follows:

#!/usr/bin/python3
# -*- coding:utf8 -*-
import codecs

open("filename",'w',encoding="utf8")
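
The snippet above is incomplete as quoted. If the goal is to re-encode the file as UTF-8, one possible sketch is below. It assumes the source is UTF-16 with a BOM; the function name convert_utf16_to_utf8 and the sample data are hypothetical, and this is not necessarily what the Zhihu answer did.

```python
import os
import tempfile


def convert_utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8, leaving the text unchanged."""
    # newline="" on both files passes line endings through untranslated.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demo with a throwaway file containing made-up sample data.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "sample_utf16.txt")
dst_path = os.path.join(tmp_dir, "sample_utf8.txt")
with open(src_path, "wb") as f:
    f.write("Symbol\tPE\r\n000066\t12.5\r\n".encode("utf-16"))

convert_utf16_to_utf8(src_path, dst_path)
with open(dst_path, "rb") as f:
    converted = f.read()
print(converted)
```

Writing to a separate destination path keeps the original file intact, which matters because opening a file with mode 'w' truncates it immediately.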