Unicode: Code points

A Unicode code point is a number in the range 0 … 10FFFF₁₆ that identifies a character, not a glyph.

Each code point also has a unique name associated with it.

Examples of code points names include

Latin Capital letter Q
Latin Small Letter E with grave
Greek Small Letter Sigma
Per mille sign

The first 256 codepoints in Unicode are identical to ISO 8859-1 (aka latin-1).

Representation of code points

In running text, an individual code point is expressed as U+xxxx where xxxx is the four to six hexadecimal digits long hex-value of the code point, for example U+0051.

A range of code points is expresses as U+xxxx-U+yyyy or U+xxxx..U+yyyy.

The U+ portion can sometimes be omitted for brevity, but is required in rule NR2 for cases where characters have names algorithmically derived from their code points.

Some programming languages use \uxxxx to embed a Code point into a string literal. For example, in Python:

>>> print('\u0052\u0065\u006e\u00e9')
René

PowerShell

In PowerShell, the code point number for a character can be determined like so:

PS:>  [int][char] 'a'
97
PS:>  [int][char] 'λ'
955

Links

What Unicode character is this? identifies each character of a text pasted into an entry field.

Awesome codepoints on github: a curated list of characters in Unicode, that have interesting (and maybe not widely known) features or are awesome in some other way.

Unicode: Code points

Representation of code points

PowerShell

See also

Links