ucd

Files

`ArabicShaping.txt`
`BidiBrackets.txt`
`BidiCharacterTest.txt`
`BidiMirroring.txt`
`BidiTest.txt`
`Blocks.txt`	See also DB: Unicode Blocks
`CJKRadicals.txt`
`CaseFolding.txt`
`CompositionExclusions.txt`
`DerivedAge.txt`
`DerivedCoreProperties.txt`
`DerivedNormalizationProps.txt`
`EastAsianWidth.txt`
`EmojiSources.txt`
`EquivalentUnifiedIdeograph.txt`
`HangulSyllableType.txt`
`Index.txt`
`IndicPositionalCategory.txt`
`IndicSyllabicCategory.txt`
`Jamo.txt`
`LineBreak.txt`
`NameAliases.txt`
`NamedSequences.txt`
`NamedSequencesProv.txt`
`NamesList.html`
`NamesList.txt`
`NormalizationCorrections.txt`
`NormalizationTest.txt`
`NushuSources.txt`
`PropList.txt`
`PropertyAliases.txt`
`PropertyValueAliases.txt`
`ReadMe.txt`
`ScriptExtensions.txt`
`Scripts.txt`
`SpecialCasing.txt`
`StandardizedVariants.txt`
`TangutSources.txt`
`UCD.zip`
`USourceData.txt`
`USourceGlyphs.pdf`
`USourceRSChart.pdf`
`UnicodeData.txt`
`Unihan.zip`
`VerticalOrientation.txt`
`auxiliary/`
`emoji/`
`extracted/`

UnicodeData.txt

The UnicodeData.txt file is documented at https://www.unicode.org/L2/L1999/UnicodeData.html.

Columns

0	Code value	Code value in hexadecimal format, four to six digits (four digits for code points up to U+FFFF).
1	Character name	The name of the character as published in Chapter 7 of the Unicode Standard, Version 2.0 (except for the two additional characters).
2	General category	Compare with the .NET enum `System.Globalization.UnicodeCategory`
3	Canonical combining classes	The canonical combining class is used by the Canonical Ordering Algorithm during normalization. Its purpose is to put canonically equivalent sequences of combining marks into one consistent and predictable order (especially in complex, multilingual texts that use a variety of diacritical marks and combining characters), regardless of the specific font or rendering engine being used. Within a run of combining marks, characters with a lower non-zero combining class value (e.g. 1) are ordered before characters with a higher value (e.g. 230 or 240), and characters with the same value keep the order in which they appear in the text. A value of 0 marks a character that is never reordered. (Described in chapter 4 of the Unicode Standard?)
4	Bidirectional category	See the list below for an explanation of the abbreviations used in this field. These are the categories required by the Bidirectional Behavior Algorithm in the Unicode Standard. These categories are summarized in Chapter 3 of the Unicode Standard.
5	Character decomposition mapping	In the Unicode Standard, not all of the mappings are full (maximal) decompositions. Recursive application of look-up for decompositions will, in all cases, lead to a maximal decomposition. The decomposition mappings match exactly the decomposition mappings published with the character names in the Unicode Standard.
6	Decimal digit value	This is a numeric field. If the character has the decimal digit property, as specified in Chapter 4 of the Unicode Standard, the value of that digit is represented with an integer value in this field.
7	Digit value	This is a numeric field. If the character represents a digit, not necessarily a decimal digit, the value is here. This covers digits which do not form decimal radix forms, such as the compatibility superscript digits.
8	Numeric value	This is a numeric field. If the character has the numeric property, as specified in Chapter 4 of the Unicode Standard, the value of that character is represented with an integer or rational number in this field. This includes fractions, e.g. "1/5" for U+2155 VULGAR FRACTION ONE FIFTH. Also included are numerical values for compatibility characters such as circled numbers.
9	Mirrored	If the character has been identified as a "mirrored" character in bidirectional text, this field has the value "Y"; otherwise "N". The list of mirrored characters is also printed in Chapter 4 of the Unicode Standard.
10	Unicode 1.0 Name	This is the old name as published in Unicode 1.0. This name is only provided when it is significantly different from the Unicode 3.0 name for the character.
11	10646 comment field	This is the ISO 10646 comment field. It is in parentheses in the 10646 names list.
12	Uppercase mapping	Upper case equivalent mapping. If a character is part of an alphabet with case distinctions, and has an upper case equivalent, then the upper case equivalent is in this field. See the explanation below on case distinctions. These mappings are always one-to-one, not one-to-many or many-to-one. This field is informative.
13	Lowercase mapping	Similar to Uppercase mapping
14	Titlecase mapping	Similar to Uppercase mapping
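
To make the column layout concrete, here is a minimal Python sketch that splits one semicolon-separated record into the fifteen fields listed above and then applies the recursive decomposition look-up described for column 5. The field identifiers (`code_value`, `iso_comment`, ...) and the helpers `parse_line` and `full_decomposition` are names made up for this sketch, not official property names or an existing API. The sample records are written in the file's format for illustration and should be checked against the actual UnicodeData.txt before relying on them.

```python
from fractions import Fraction

# One identifier per column 0..14 of the table above (hypothetical names).
FIELDS = [
    "code_value", "name", "general_category", "combining_class",
    "bidi_category", "decomposition", "decimal_digit", "digit",
    "numeric", "mirrored", "unicode_1_name", "iso_comment",
    "uppercase", "lowercase", "titlecase",
]


def parse_line(line):
    """Split one UnicodeData.txt record into a dict keyed by FIELDS."""
    parts = line.rstrip("\n").split(";")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    rec = dict(zip(FIELDS, parts))
    rec["code_value"] = int(rec["code_value"], 16)
    rec["combining_class"] = int(rec["combining_class"])
    # Columns 6 and 7 hold plain integers when present.
    for key in ("decimal_digit", "digit"):
        rec[key] = int(rec[key]) if rec[key] else None
    # Column 8 may be an integer or a rational such as "1/5".
    rec["numeric"] = Fraction(rec["numeric"]) if rec["numeric"] else None
    rec["mirrored"] = rec["mirrored"] == "Y"
    # Column 5: an optional "<tag>" followed by hexadecimal code points.
    tokens = rec["decomposition"].split()
    tag = tokens.pop(0) if tokens and tokens[0].startswith("<") else None
    rec["decomposition"] = (tag, [int(t, 16) for t in tokens])
    # Columns 12 to 14 are single code points (one-to-one mappings).
    for key in ("uppercase", "lowercase", "titlecase"):
        rec[key] = int(rec[key], 16) if rec[key] else None
    return rec


def full_decomposition(cp, table):
    """Recursively expand cp via column 5 until nothing decomposes further.

    `table` maps code point -> parsed record. This sketch follows every
    mapping, tagged or not, and does no canonical reordering.
    """
    rec = table.get(cp)
    if rec is None or not rec["decomposition"][1]:
        return [cp]
    result = []
    for part in rec["decomposition"][1]:
        result.extend(full_decomposition(part, table))
    return result


# Illustrative records in the file's format (verify against the real file).
SAMPLE = [
    "2155;VULGAR FRACTION ONE FIFTH;No;0;ON;<fraction> 0031 2044 0035;;;1/5;N;FRACTION ONE FIFTH;;;;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "1E63;LATIN SMALL LETTER S WITH DOT BELOW;Ll;0;L;0073 0323;;;;N;;;1E62;;1E62",
    "1E69;LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE;Ll;0;L;1E63 0307;;;;N;;;1E68;;1E68",
]
TABLE = {rec["code_value"]: rec for rec in map(parse_line, SAMPLE)}

if __name__ == "__main__":
    print(TABLE[0x2155]["numeric"])
    print([f"{cp:04X}" for cp in full_decomposition(0x1E69, TABLE)])
```

Using `Fraction` keeps rational values such as "1/5" exact instead of rounding them to a float. The real file also contains range markers (records whose names end in `, First>` and `, Last>`), which this sketch treats as ordinary records.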