Why does removing the combining marks (accents) from a string leave the corresponding ASCII characters?

There is an example of this in python3-cookbook:

>>> import unicodedata
>>> import sys
>>> cmb_chrs = dict.fromkeys(c for c in range(sys.maxunicode)
...                         if unicodedata.combining(chr(c)))
>>> a = "pýtĥöñ is awesome\n"
>>> b = unicodedata.normalize("NFD", a)
>>> b
"pýtĥöñ is awesome\n"
>>> b.translate(cmb_chrs)
"python is awesome\n"
>>>

In cmb_chrs, the value for every key is None, so why does executing b.translate(cmb_chrs) produce the string python is awesome\n?

Apr.25,2021

If you execute:

print([ord(x) for x in a])
# [112, 253, 116, 293, 246, 241, 32, 105, 115, 32, 97, 119, 101, 115, 111, 109, 101, 10]
print([ord(x) for x in b])
# [112, 121, 769, 116, 104, 770, 111, 776, 110, 771, 32, 105, 115, 32, 97, 119, 101, 115, 111, 109, 101, 10]

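you will find that although a and b print the same, their underlying code points are different. The reason is that unicodedata.normalize("NFD", ...) decomposes every accented character into a base letter followed by a separate combining mark (for example, 253 becomes 121, 769). The code points of those combining marks are the keys of cmb_chrs. str.translate deletes any character whose code point maps to None, so after executing b.translate(cmb_chrs) the marks are gone and only the base ASCII letters remain.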
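The two steps described above can be reproduced on a shorter string. This is only a minimal sketch: the two-entry `table` is a hand-written stand-in for cmb_chrs, and the names `table`, `decomposed`, and `stripped` are illustrative, not from the cookbook.

```python
import unicodedata

# A translation table maps code points (ints) to replacements.
# A value of None tells str.translate to delete that character.
# Hypothetical two-entry table standing in for cmb_chrs:
table = {0x301: None, 0x308: None}  # combining acute accent, combining diaeresis

# NFD splits each accented letter into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", "\u00fd\u00f6")  # "ýö"
print([hex(ord(ch)) for ch in decomposed])  # ['0x79', '0x301', '0x6f', '0x308']

# Deleting the marks leaves only the base letters.
stripped = decomposed.translate(table)
print(stripped)  # yo
```

Note that calling translate on the original string a (without normalizing first) would change nothing, because a contains the precomposed characters rather than separate combining marks.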


The magic of Unicode:

>>> list(a)
['p', 'ý', 't', 'ĥ', 'ö', 'ñ', ' ', 'i', 's', ' ', 'a', 'w', 'e', 's', 'o', 'm', 'e', '\n']
>>> list(b)
['p', 'y', '́', 't', 'h', '̂', 'o', '̈', 'n', '̃', ' ', 'i', 's', ' ', 'a', 'w', 'e', 's', 'o', 'm', 'e', '\n']
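Because the combining marks in list(b) are hard to see on their own, it may help to print each character's code point and Unicode name instead. The following is a small sketch, assuming the original string was "pýtĥöñ" as the ord values above indicate; the names `marks` and `base` are illustrative.

```python
import unicodedata

# Rebuild the decomposed form of the accented word via escapes,
# so the example does not depend on how the source file is encoded.
b = unicodedata.normalize("NFD", "p\u00fdt\u0125\u00f6\u00f1")

for ch in b:
    # unicodedata.combining returns 0 for base characters and a
    # non-zero combining class for combining marks.
    print(f"U+{ord(ch):04X}", unicodedata.combining(ch), unicodedata.name(ch))

# Split the string into combining marks and base letters.
marks = [ch for ch in b if unicodedata.combining(ch)]
base = "".join(ch for ch in b if not unicodedata.combining(ch))
print(base)  # python
```

The filter on unicodedata.combining is the same test used to build cmb_chrs, applied here to one string instead of the whole code point range.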