Unicode Character Normalization

Some Unicode characters can be represented equivalently by more than one sequence of code points.

For example, the character ắ can be represented by any of the following code point sequences:

U+1EAF (ắ)
U+0103 U+0301 (ă + ◌́)
U+0061 U+0306 U+0301 (a + ◌̆ + ◌́)
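As a minimal sketch (Python is used here only for illustration; the variable names are made up), the three sequences above are distinct strings even though they render as the same character:

```python
# Three code point sequences that all render as the same character.
precomposed = "\u1EAF"               # U+1EAF
partly_composed = "\u0103\u0301"     # U+0103 U+0301
decomposed = "\u0061\u0306\u0301"    # U+0061 U+0306 U+0301

sequences = [precomposed, partly_composed, decomposed]

# len() counts code points in Python 3, so the lengths differ.
lengths = [len(s) for s in sequences]

# A plain comparison works on code points, so no two sequences are equal.
all_distinct = len(set(sequences)) == 3

print(lengths, all_distinct)
```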

This phenomenon complicates text operations such as searching, sorting and matching.

To make things somewhat easier, the Unicode standard defines processes by which the different representations of a character can be normalized into a single representation.

Once Unicode strings have been normalized to the same form, they can be compared or otherwise processed more easily.
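The following sketch shows one way such a comparison might look in Python, using the standard library function unicodedata.normalize. The helper name canonically_equal and the sample word are assumptions made for this example, and "NFC" is just one of the possible target forms:

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the same form (here: NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

needle = "\u1EAF"              # ắ as a single code point
text = "th\u0103\u0301ng"      # contains ắ as U+0103 U+0301

# Without normalization, the substring search misses the character.
found_raw = needle in text

# After normalizing both sides, the search succeeds.
found_normalized = (
    unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", text)
)

print(found_raw, found_normalized)
print(canonically_equal("\u1EAF", "\u0061\u0306\u0301"))
```

What matters is that both operands are normalized to the same form; which form is chosen is secondary for this kind of comparison.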

These processes, or algorithms, are referred to as normalization forms. There are four of them: C, D, KC and KD (also written NFC, NFD, NFKC and NFKD).
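To illustrate how the forms differ, here is a small Python sketch that applies each of them to ắ and, as an additional example not taken from the text above, to the ligature ﬁ (U+FB01). The helper code_points is hypothetical and exists only to make the results readable:

```python
import unicodedata

def code_points(s: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# C composes as far as possible, D decomposes fully.
a_breve_acute = {
    form: code_points(unicodedata.normalize(form, "\u0103\u0301"))
    for form in FORMS
}

# KC and KD additionally replace compatibility characters such as ligatures.
ligature = {
    form: code_points(unicodedata.normalize(form, "\uFB01"))
    for form in FORMS
}

for form in FORMS:
    print(form, "|", a_breve_acute[form], "|", ligature[form])
```

For ắ, the C and KC forms yield the single code point U+1EAF, while D and KD yield U+0061 U+0306 U+0301. The ligature is left untouched by C and D but becomes the two letters f and i under KC and KD.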

See also the .NET enumeration System.Text.NormalizationForm.