Find text encoding

3/7/2023

I can see some ways to fix it, but it will take some labor.

This is the second failure of this kind that I have come across at CodeProject.

A serialized Unicode string can be represented as two components: the array of bytes obtained by encoding the string, and the information about the encoding itself. You can think of this second piece of information as a reference to a concrete run-time Encoding class. The part of the Unicode standard dedicated to UTFs suggests a standard mechanism for keeping the encoding information together with the string data: a short sequence of bytes called the BOM (Byte Order Mark), different for each UTF, which allows the UTF encoding to be detected unambiguously. It looks like you failed to use this or any other mechanism, with fatal consequences.
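To make the BOM mechanism concrete, here is a minimal sketch in Python (not the language of the original question, just a convenient one for illustration). The helper names `detect_bom` and `decode_with_bom` are hypothetical; the BOM constants come from the standard `codecs` module. It only handles data that actually starts with a BOM and gives up otherwise.

```python
import codecs

# BOMs are checked longest first, because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes) -> str:
    """Decode data using the encoding named by its BOM; fail if there is none."""
    encoding, skip = detect_bom(data)
    if encoding is None:
        raise ValueError("no BOM found; the encoding cannot be determined reliably")
    # Skip the BOM bytes so they do not end up in the decoded string.
    return data[skip:].decode(encoding)
```

One caveat of this sketch: UTF-16 LE text whose first character after the BOM is U+0000 would be taken for UTF-32 LE, so even BOM detection has a rare edge case.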

You made a fatal mistake and increased the entropy of the system. The mistake is theoretically not reversible, in the same sense that the entropy of a closed system cannot be reduced. Strictly speaking, there is no regular, 100% certain way to tell the UTF from the encoded array of bytes alone.
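A small illustration of why the bytes alone are not enough, again as a Python sketch with a hypothetical helper name (`candidate_decodings`): the same four bytes, written without a BOM, decode without error under several UTFs, and each result is a different string.

```python
def candidate_decodings(data: bytes, encodings=("utf-8", "utf-16-le", "utf-16-be")):
    """Return every decoding of data that succeeds, keyed by encoding name."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This encoding is ruled out; the others remain candidates.
            pass
    return results


# "AB" encoded as UTF-16 LE, stored without a BOM.
raw = "AB".encode("utf-16-le")
candidates = candidate_decodings(raw)

for name, text in candidates.items():
    print(name, repr(text))
```

All three decodings succeed here, so any choice between them has to rest on heuristics (for example, which result looks like plausible text), not on certainty.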
