CJK Unified Ideographs Extension B

From Wikipedia, the free encyclopedia
Range: U+20000..U+2A6DF (42,720 code points)
Plane: SIP
Scripts: Han
Assigned: 42,720 code points
Unused: 0 reserved code points
Unicode version history:
  • 3.1 (2001): 42,711 (+42,711)
  • 13.0 (2020): 42,718 (+7)
  • 14.0 (2021): 42,720 (+2)
Note: [1][2]

CJK Unified Ideographs Extension B is a Unicode block containing rare and historic CJK ideographs for Chinese, Japanese, Korean, and Vietnamese.
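
As a quick check on the range given for the block, the following minimal Python sketch tests whether a character falls inside Extension B and recomputes the block size from its endpoints. The names `EXT_B_FIRST`, `EXT_B_LAST`, `BLOCK_SIZE`, and `in_extension_b` are illustrative and not part of any library.

```python
# Endpoints of the block, taken from the range listed for Extension B.
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6DF

# Number of code points in the range, counting both endpoints.
BLOCK_SIZE = EXT_B_LAST - EXT_B_FIRST + 1


def in_extension_b(ch: str) -> bool:
    """Return True if the single character ch lies in Extension B."""
    return EXT_B_FIRST <= ord(ch) <= EXT_B_LAST


print(BLOCK_SIZE)                    # 42720
print(in_extension_b(chr(0x20457)))  # True  (inside the block)
print(in_extension_b(chr(0x34A8)))   # False (a BMP code point)
```

The computed size agrees with the 42,720 code points stated for the block. Because the block lies outside the BMP, languages that store strings as UTF-16 represent each of these characters as a surrogate pair; Python 3 strings are indexed by code point, so `ord` works directly.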

The block has dozens of variation sequences defined for standardized variants.[3]

It also has thousands of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4][5] These sequences specify the desired glyph variant for a given Unicode character.
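
To illustrate how such sequences look in text, here is a minimal Python sketch that finds base characters followed by a variation selector. It relies on the selector ranges defined by Unicode: U+FE00..U+FE0F (VS1 to VS16, used by standardized variation sequences) and U+E0100..U+E01EF (VS17 to VS256, used by ideographic variation sequences). The function names are hypothetical, and the check is purely syntactic: it does not consult the IVD or the standardized variants data, so a reported pair is not necessarily a registered sequence.

```python
def selector_kind(ch):
    """Classify a character as a variation selector, or return None."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return "standardized"   # VS1..VS16
    if 0xE0100 <= cp <= 0xE01EF:
        return "ideographic"    # VS17..VS256
    return None


def find_variation_sequences(text):
    """Return (base, selector, kind) for each base followed by a selector."""
    found = []
    prev = None  # most recent non-selector character, if any
    for ch in text:
        kind = selector_kind(ch)
        if kind is None:
            prev = ch
        else:
            if prev is not None:
                found.append((f"U+{ord(prev):04X}", f"U+{ord(ch):04X}", kind))
            prev = None  # a selector applies to at most one base
    return found


# Illustrative input only: an Extension B code point followed by VS17.
# This pairing is not claimed to be registered in the IVD.
sample = chr(0x20000) + chr(0xE0100) + "A" + chr(0x20001)
print(find_variation_sequences(sample))
```

A renderer that does not support a given sequence is expected to ignore the selector and display the default glyph of the base character, which is why the selector can be stripped or kept without changing the identity of the character.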

It was the only CJK Unified Ideographs Extension block with a UCS2003 source identifier. Because Extension B contained too many characters, the original code charts were produced with a single glyph for all regions. The glyphs were designed by [...]. After the introduction of multi-column code charts in Unicode 5.2, the original glyphs were retained under the UCS2003 source identifier until Unicode 14.0.[6] The glyphs are packaged in the "SimSun-ExtB" font distributed with the Simplified Chinese versions of Windows, and do not match the glyphs shown for the Mainland China region.

Known issues

Unifiable variants and exact duplicates in Extension B

Hundreds of glyph variants were also encoded in CJK Unified Ideographs Extension B.[7] In addition to the deliberate encoding of close glyph variants, six exact duplicates (where the same character was inadvertently encoded twice) and two semi-duplicates (where the CJK-B character represents a de facto disunification of two glyph forms unified in the corresponding BMP character) were encoded by mistake:[8]

  • U+34A8 㒨 = U+20457