Search notes:

ucd

ucd is a directory of the Unicode Character Database (UCD). It stores the database's data files which specify the Unicode character properties and related data.
It also includes data files containing test data for conformance to several important Unicode algorithms.

Files

ArabicShaping.txt
BidiBrackets.txt
BidiCharacterTest.txt
BidiMirroring.txt
BidiTest.txt
Blocks.txt See also DB: Unicode Blocks
CJKRadicals.txt
CaseFolding.txt
CompositionExclusions.txt
DerivedAge.txt
DerivedCoreProperties.txt
DerivedNormalizationProps.txt
EastAsianWidth.txt
EmojiSources.txt
EquivalentUnifiedIdeograph.txt
HangulSyllableType.txt
Index.txt
IndicPositionalCategory.txt
IndicSyllabicCategory.txt
Jamo.txt
LineBreak.txt
NameAliases.txt
NamedSequences.txt
NamedSequencesProv.txt
NamesList.html
NamesList.txt
NormalizationCorrections.txt
NormalizationTest.txt
NushuSources.txt
PropList.txt
PropertyAliases.txt
PropertyValueAliases.txt
ReadMe.txt
ScriptExtensions.txt
Scripts.txt
SpecialCasing.txt
StandardizedVariants.txt
TangutSources.txt
UCD.zip
USourceData.txt
USourceGlyphs.pdf
USourceRSChart.pdf
UnicodeData.txt
Unihan.zip
VerticalOrientation.txt
auxiliary/
emoji/
extracted/

UnicodeData.txt

The UnicodeData.txt file is documented at https://www.unicode.org/L2/L1999/UnicodeData.html.

Columns

0 Code value Code value in 4-digit hexadecimal format.
1 Character name The name of the character as published in Chapter 7 of the Unicode Standard, Version 2.0 (except for the two additional characters).
2 General category Compare with the .NET enum System.Globalization.UnicodeCategory
3 Canonical combining classes The purpose of the canonical combining class property is to ensure that characters are rendered in a consistent and predictable manner (especially in complex, multilingua texts that use a variety of diacritical marks and combining characters), regardless of the specific font or rendering engine being used. Characters with a low combining class value (e.g. 0 or 1) are usually rendered before characters with a higher combining class value (e.g. 230 or 240) and characters with the same combining class value are usually rendered in the order in which they appear in the text. (Described in chapter 4 of the Unicode Standard?)
4 Bidirectional category See the list below for an explanation of the abbreviations used in this field. These are the categories required by the Bidirectional Behavior Algorithm in the Unicode Standard. These categories are summarized in Chapter 3 of the Unicode Standard.
5 Character decomposition mapping In the Unicode Standard, not all of the mappings are full (maximal) decompositions. Recursive application of look-up for decompositions will, in all cases, lead to a maximal decomposition. The decomposition mappings match exactly the decomposition mappings published with the character names in the Unicode Standard.
6 Decimal digit value This is a numeric field. If the character has the decimal digit property, as specified in Chapter 4 of the Unicode Standard, the value of that digit is represented with an integer value in this field
7 Digit value This is a numeric field. If the character represents a digit, not necessarily a decimal digit, the value is here. This covers digits which do not form decimal radix forms, such as the compatibility superscript digits
8 Numeric value This is a numeric field. If the character has the numeric property, as specified in Chapter 4 of the Unicode Standard, the value of that character is represented with an integer or rational number in this field. This includes fractions as, e.g., "1/5" for U+2155 VULGAR FRACTION ONE FIFTH Also included are numerical values for compatibility characters such as circled numbers.
8 Mirrored If the character has been identified as a "mirrored" character in bidirectional text, this field has the value "Y"; otherwise "N". The list of mirrored characters is also printed in Chapter 4 of the Unicode Standard.
10 Unicode 1.0 Name This is the old name as published in Unicode 1.0. This name is only provided when it is significantly different from the Unicode 3.0 name for the character.
11 10646 comment field This is the ISO 10646 comment field. It is in parantheses in the 10646 names list.
12 Uppercase mapping Upper case equivalent mapping. If a character is part of an alphabet with case distinctions, and has an upper case equivalent, then the upper case equivalent is in this field. See the explanation below on case distinctions. These mappings are always one-to-one, not one-to-many or many-to-one. This field is informative.
13 Lowercase mapping Similar to Uppercase mapping
14 Titlecase mapping Similar to Uppercase mapping

Index