GB 18030
MIME / IANA | GB18030 |
---|---|
Alias(es) | Code page 54936 |
Language(s) | International, but primarily meant for Chinese |
Standard | GB 18030-2005, GB 18030-2000 |
Classification | Unicode Transformation Format, extended ASCII,[a] variable-width encoding, CJK encoding |
Extends | EUC-CN, GBK |
Transforms / Encodes | ISO 10646 (Unicode) |
Preceded by | GBK, GB2312 |
| |
GB 18030 is a Chinese government standard, titled "Information Technology — Chinese coded character set", that defines the language and character support required of software in China. GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC), superseding GB2312.[1] As a Unicode Transformation Format[a] (i.e. an encoding of all Unicode code points), GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB2312, CP936,[b] and GBK 1.0.
In addition to the "GB18030 character encoding", this standard contains requirements about which scripts must be supported, font support, etc.[2]
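The compatibility and coverage claims above can be checked with a short sketch. It assumes Python's built-in `gb18030`, `gbk`, and `gb2312` codecs; the sample characters and the helper name `describe` are illustrative choices, not part of the standard.

```python
# Sketch: how Python's built-in "gb18030" codec encodes a few sample characters.
# The samples and the helper name are illustrative assumptions.
SAMPLES = {
    "ASCII letter": "A",
    "simplified ideograph": "中",
    "traditional ideograph": "漢",
    "supplementary-plane code point": "\U00010000",
}


def describe(text):
    """Return the GB18030 bytes of `text` as spaced hex, plus the byte count."""
    data = text.encode("gb18030")
    return data.hex(" "), len(data)


for label, char in SAMPLES.items():
    hex_bytes, length = describe(char)
    print(f"{label}: U+{ord(char):04X} -> {hex_bytes} ({length} byte(s))")

# Characters that exist in the legacy sets keep their legacy byte sequences.
print("中".encode("gb18030") == "中".encode("gb2312"))
print("漢".encode("gb18030") == "漢".encode("gbk"))
```

ASCII stays one byte, characters inherited from GB2312 and GBK stay two bytes, and code points outside those sets, including the supplementary planes, take four bytes.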
History
The GB18030 character set is formally called "Chinese National Standard GB 18030-2005: Information Technology—Chinese coded character set". GB abbreviates Guójiā Biāozhǔn (国家标准), which means national standard in Chinese. The standard was published by the China Standard Press, Beijing, 8 November 2005. Only a portion of the standard is mandatory.[2] Since 1 May 2006, support for the mandatory subset is officially required for all software products sold in the PRC.
GB byte sequence | Unicode code point in GB 18030-2000 | Unicode code point in GB 18030-2005 |
---|---|---|
A8 BC (ḿ) | U+E7C7 | U+1E3F ḿ |
81 35 F4 37 | U+1E3F ḿ | U+E7C7 |
An older version of the standard, known as "Chinese National Standard GB 18030-2000: Information Technology—Chinese ideograms coded character set for information interchange—Extension for the basic set", was published on March 17, 2000. The encoding scheme is unchanged in the new version. The only difference in the GB-to-Unicode mapping is that GB 18030-2000 mapped the byte sequence A8 BC (ḿ) to the private use code point U+E7C7 and the byte sequence 81 35 F4 37 (without specifying any glyph) to U+1E3F (ḿ), whereas GB 18030-2005 swaps these two assignments.[3]: 534 More code points are now associated with characters because of updates to Unicode, especially the addition of CJK Unified Ideographs Extension B. Some characters used by ethnic minorities in China, such as Mongolian characters and Tibetan characters (-1997 and -2006), have been added as well, which accounts for the renaming of the standard.
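The swap described above is small enough to write out in full. The following sketch records both mappings as Python dictionaries and uses them to guess which revision a given decoder follows. The names `GB18030_2000`, `GB18030_2005`, and `codec_revision` are hypothetical, and which revision Python's built-in `gb18030` codec implements is not assumed here; it may differ between runtime versions.

```python
# The two byte sequences whose Unicode assignments differ between revisions,
# as given in the comparison above. All names here are illustrative.
GB18030_2000 = {
    bytes.fromhex("A8BC"): "\uE7C7",      # private use code point
    bytes.fromhex("8135F437"): "\u1E3F",  # ḿ
}
GB18030_2005 = {
    bytes.fromhex("A8BC"): "\u1E3F",      # ḿ
    bytes.fromhex("8135F437"): "\uE7C7",  # private use code point
}


def codec_revision(codec="gb18030"):
    """Guess which revision a decoder follows by decoding the swapped pair."""
    decoded = {seq: seq.decode(codec, errors="replace") for seq in GB18030_2000}
    if decoded == GB18030_2000:
        return "2000"
    if decoded == GB18030_2005:
        return "2005"
    return "unknown"


print("Built-in codec appears to follow revision:", codec_revision())
```

Because only these two assignments differ, text that avoids ḿ and U+E7C7 decodes identically under either revision.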
Compared with its ancestors, GB 18030 changes the Unicode mapping of the 81 characters that GBK 1.0 had provisionally assigned to Unicode Private Use Area (PUA) code points (U+E000–F8FF) and that have since been encoded in Unicode.[4] This is specified in Appendix E of GB 18030.[3]: 534 [5]: 499 There are 24 characters in GB 18030-2005 that are still mapped to the PUA.[6] According to Ken Lunde, the 2018 draft of a new revision of GB 18030 will finally eliminate these mappings.[7]
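Text decoded with an older mapping can still contain these private use code points. As a minimal sketch, the pairs listed in the table that follows can be turned into a translation table for `str.translate`. The names `PUA_TO_STANDARD` and `normalize_pua` are hypothetical, and only the rows shown in this section are covered. Applying such a table is safe only if the text does not use the same private use code points for other purposes.

```python
# Private use code point -> standard Unicode code point, for the rows listed
# in this section only. Names are illustrative, not from any library.
PUA_TO_STANDARD = {
    0xE7C7: 0x1E3F,   # ḿ
    0xE7C8: 0x01F9,   # ǹ
    0xE7E7: 0x303E,   # 〾
    0xE815: 0x2E81,   # ⺁
    0xE816: 0x20087,
}

# Vertical punctuation forms: U+E78D..U+E796. The order is not strictly
# sequential (U+E78E maps to U+FE12 and U+E78F to U+FE11).
VERTICAL_FORMS = (0xFE10, 0xFE12, 0xFE11, 0xFE13, 0xFE14,
                  0xFE15, 0xFE16, 0xFE17, 0xFE18, 0xFE19)
PUA_TO_STANDARD.update(zip(range(0xE78D, 0xE797), VERTICAL_FORMS))

# Ideographic description characters: U+E7E8..U+E7F3 -> U+2FF0..U+2FFB.
PUA_TO_STANDARD.update((0xE7E8 + i, 0x2FF0 + i) for i in range(12))


def normalize_pua(text):
    """Replace the listed private use code points with their standard ones."""
    return text.translate(PUA_TO_STANDARD)


sample = "\uE7E8 and \uE78E"
print([f"U+{ord(c):04X}" for c in normalize_pua(sample) if c != " "][:2])
```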
GB byte sequence | Private use code point (GBK 1.0[8][3]: 534) | Standard Unicode code point (GB 18030-2005[6] or Unicode 4.1) |
---|---|---|
A6 D9[9]: 108 | U+E78D | U+FE10 ︐ |
A6 DA | U+E78E | U+FE12 ︒ |
A6 DB | U+E78F | U+FE11 ︑ |
A6 DC | U+E790 | U+FE13 ︓ |
A6 DD | U+E791 | U+FE14 ︔ |
A6 DE | U+E792 | U+FE15 ︕ |
A6 DF | U+E793 | U+FE16 ︖ |
A6 EC | U+E794 | U+FE17 ︗ |
A6 ED | U+E795 | U+FE18 ︘ |
A6 F3 | U+E796 | U+FE19 ︙ |
A8 BC | U+E7C7 | U+1E3F ḿ |
A8 BF | U+E7C8 | U+01F9 ǹ |
A9 89 | U+E7E7 | U+303E 〾 |
A9 8A | U+E7E8 | U+2FF0 ⿰ |
A9 8B | U+E7E9 | U+2FF1 ⿱ |
A9 8C | U+E7EA | U+2FF2 ⿲ |
A9 8D | U+E7EB | U+2FF3 ⿳ |
A9 8E | U+E7EC | U+2FF4 ⿴ |
A9 8F | U+E7ED | U+2FF5 ⿵ |
A9 90 | U+E7EE | U+2FF6 ⿶ |
A9 91 | U+E7EF | U+2FF7 ⿷ |
A9 92 | U+E7F0 | U+2FF8 ⿸ |
A9 93 | U+E7F1 | U+2FF9 ⿹ |
A9 94[9]: 173 | U+E7F2 | U+2FFA ⿺ |
A9 95 | U+E7F3 | U+2FFB ⿻ |
FE 50 | U+E815 | U+2E81 ⺁ |
FE 51 | U+E816 | U+20087 |