unicode - C# char/byte encoding equality


I have some code that dumps strings to stdout so I can check their encoding. It looks like this:

    private void dumpstring(string s)
    {
        System.Console.Write("{0}: ", s);
        // Note: foreach performs an explicit char -> byte conversion here.
        foreach (byte b in s)
        {
            System.Console.Write("{0}({1}) ", (char)b, b.ToString("x2"));
        }
        System.Console.WriteLine();
    }

Consider two strings, each of which appears as "ë" but (I assumed) with a different encoding. dumpstring produces the following output:

    ë: e(65)(08)
    ë: ë(eb)

The calling code looks like this:

    dumpstring(string1);
    dumpstring(string2);

How can I convert string2, using System.Text.Encoding, so that it is byte-for-byte equivalent to string1?

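They don't have different encodings. Strings in C# are always UTF-16, so you shouldn't iterate over a string with byte: the conversion from char keeps only the low 8 bits of each 16-bit code unit and silently drops the top 8 bits. That is why the combining diaeresis U+0308 shows up as (08) in your dump. What actually differs between the two strings is their normalization form.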
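A minimal sketch of a dump that avoids the truncation, assuming .NET with C# 9+ top-level statements: iterate over char (one UTF-16 code unit each) and print every code unit as four hex digits. The helper name DumpCodeUnits is hypothetical, not part of the original code.

```csharp
using System;
using System.Linq;

// Hypothetical helper: formats each UTF-16 code unit of s as 4 uppercase hex digits.
// Iterating with char (not byte) keeps all 16 bits of every code unit.
static string DumpCodeUnits(string s) =>
    string.Join(" ", s.Select(c => ((int)c).ToString("X4")));

string decomposed = "\u0065\u0308";
string precomposed = "\u00EB";

Console.WriteLine(DumpCodeUnits(decomposed));  // 0065 0308
Console.WriteLine(DumpCodeUnits(precomposed)); // 00EB

// What the original byte loop did: the char -> byte conversion keeps only the low 8 bits.
byte truncated = unchecked((byte)decomposed[1]);
Console.WriteLine(truncated.ToString("x2"));   // 08
```

Printing hex code units rather than the characters themselves also sidesteps console-font issues with combining marks.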

Your first string is "\u0065\u0308": LATIN SMALL LETTER E followed by COMBINING DIAERESIS. This is the decomposed form (NFD).

The second is "\u00EB": LATIN SMALL LETTER E WITH DIAERESIS. This is the precomposed form (NFC).
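To make the difference concrete, here is a small sketch (assuming C# 9+ top-level statements; the variable names are illustrative) showing that both strings render as "ë" yet differ in length and compare unequal ordinally, and that the extra code unit in the NFD string is a combining mark.

```csharp
using System;
using System.Globalization;

string first = "\u0065\u0308";  // LATIN SMALL LETTER E + COMBINING DIAERESIS (NFD)
string second = "\u00EB";       // LATIN SMALL LETTER E WITH DIAERESIS (NFC)

// Same rendered glyph, different sequences of UTF-16 code units.
Console.WriteLine(first.Length);  // 2
Console.WriteLine(second.Length); // 1
Console.WriteLine(string.Equals(first, second, StringComparison.Ordinal)); // False

// The second code unit of the decomposed form is a non-spacing (combining) mark.
Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(first[1])); // NonSpacingMark
```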

You can convert between them with String.Normalize, passing NormalizationForm.FormC (compose) or NormalizationForm.FormD (decompose).
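A minimal sketch of the conversion, assuming C# 9+ top-level statements (variable names are illustrative). It normalizes in both directions, checks the forms with IsNormalized, and then confirms that once the normalization forms match, System.Text.Encoding produces identical bytes, which is what the question was really after.

```csharp
using System;
using System.Linq;
using System.Text;

string decomposed = "\u0065\u0308"; // NFD: e + combining diaeresis
string precomposed = "\u00EB";      // NFC: e with diaeresis

// Compose the first string (NFC) so it matches the second...
string composed = decomposed.Normalize(NormalizationForm.FormC);
Console.WriteLine(composed == precomposed); // True

// ...or decompose the second string (NFD) so it matches the first.
string split = precomposed.Normalize(NormalizationForm.FormD);
Console.WriteLine(split == decomposed);     // True

// IsNormalized reports whether a string is already in a given form.
Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormC));  // False
Console.WriteLine(precomposed.IsNormalized(NormalizationForm.FormC)); // True

// The encoding was never the problem: UTF-8 bytes only match after normalizing.
byte[] before = Encoding.UTF8.GetBytes(decomposed);  // 3 bytes: 65 CC 88
byte[] target = Encoding.UTF8.GetBytes(precomposed); // 2 bytes: C3 AB
byte[] after = Encoding.UTF8.GetBytes(composed);
Console.WriteLine(after.SequenceEqual(target));      // True
```

Normalizing both inputs to the same form (usually NFC) before comparing or hashing them is the general pattern; String.Normalize() with no argument uses FormC.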

