
UCS-4

ISO 10646 defines a 32-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set is represented by a 32-bit code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.
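
As a rough illustration of what one 32-bit code value per character looks like in practice, the Python sketch below packs each code value into four bytes. The big-endian byte order and the function names `encode_ucs4` and `decode_ucs4` are assumptions made for this example; the text above does not specify them.

```python
import struct

UCS4_MAX = 0x7FFFFFFF  # upper bound of the UCS-4 code space


def encode_ucs4(code_values):
    """Pack each code value into 4 bytes (big-endian is assumed here)."""
    out = bytearray()
    for value in code_values:
        if not 0 <= value <= UCS4_MAX:
            raise ValueError(f"code value outside UCS-4 range: {value:#x}")
        out += struct.pack(">I", value)
    return bytes(out)


def decode_ucs4(data):
    """Unpack a byte string into a list of UCS-4 code values."""
    if len(data) % 4 != 0:
        raise ValueError("length must be a multiple of 4 bytes")
    values = [v for (v,) in struct.iter_unpack(">I", data)]
    for value in values:
        if value > UCS4_MAX:
            raise ValueError(f"code value outside UCS-4 range: {value:#x}")
    return values
```

Every character occupies exactly four bytes, so the n-th character of a string always starts at byte offset 4n.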

UCS-4 is more than sufficient to represent all of Unicode, whose code points extend only up to hexadecimal 10FFFF. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the 0 to 10FFFF code space.
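
The subset relationship can be sketched as two range checks. This is a minimal Python example covering only the ranges stated above; the function names are hypothetical, and it does not attempt to model any further constraints that the Unicode Standard places on UTF-32.

```python
UCS4_MAX = 0x7FFFFFFF   # top of the UCS-4 code space
UTF32_MAX = 0x10FFFF    # top of the UTF-32 (Unicode) code space


def is_valid_ucs4(value):
    """True if value lies in the UCS-4 code space 0..7FFFFFFF."""
    return 0 <= value <= UCS4_MAX


def is_valid_utf32(value):
    """True if value lies in the narrower UTF-32 range 0..10FFFF."""
    return 0 <= value <= UTF32_MAX


# Every UTF-32 value is also a UCS-4 value, but not the other way round.
for sample in (0x41, 0x10FFFF, 0x110000, 0x7FFFFFFF):
    print(f"{sample:#010x}  UCS-4: {is_valid_ucs4(sample)}  UTF-32: {is_valid_utf32(sample)}")
```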

However, the Principles and Procedures document of JTC1/SC2/WG2 now states that all future assignments of characters to ISO 10646 will be constrained to the BMP (Basic Multilingual Plane) or the first 14 supplementary planes. This effectively makes UCS-4 identical to UTF-32, except that UTF-32 has the extra requirement that additional Unicode semantics be observed for all characters.

Related entries: