ISO-IR-165

CCITT Chinese set (ISO-IR 165)
MIME / IANA: iso-ir-165
Alias(es): CN-GB-ISOIR165 (EUC form)[1]
Language(s): Simplified Chinese, English, Russian; partial support: Greek, Japanese
Standard: ITU T.101, annex C
Definitions: ISO-IR 165
Extends: GB 2312
Encoding formats: ISO-2022-CN-EXT, Videotex Data Syntax 2
Succeeded by: GB 18030

The CCITT Chinese Primary Set[2] is a multi-byte graphic character set for Chinese communications, created for the International Telegraph and Telephone Consultative Committee (CCITT) in 1992.[3] It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex.[2] It is registered in the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165,[4] and can be encoded using the ISO-2022-CN-EXT code version.[1]
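
As a rough illustration of how characters from this set might appear in an ISO-2022-CN-EXT stream, the following sketch wraps 7-bit row-cell bytes in a designation and shift sequence. It assumes the designation `ESC $ ) E` (shifted in with SO) that RFC 1922 lists for ISO-IR-165;[1] the function name `encode_iso2022_run` is hypothetical, and the per-line designation rules of the encoding are ignored.

```python
# Assumption: RFC 1922 designates ISO-IR-165 to G1 with ESC $ ) E.
ESC_DESIGNATE_IR165 = b"\x1b$)E"
SO = b"\x0e"  # shift out: switch to the designated multi-byte set
SI = b"\x0f"  # shift in: return to ASCII


def encode_iso2022_run(cells):
    """Encode a list of (row, cell) pairs as one shifted run (illustrative only)."""
    body = bytearray()
    for row, cell in cells:
        if not (1 <= row <= 94 and 1 <= cell <= 94):
            raise ValueError(f"row-cell out of range: {row}-{cell}")
        # The 7-bit form is the row and cell numbers offset by 0x20.
        body += bytes((row + 0x20, cell + 0x20))
    return ESC_DESIGNATE_IR165 + SO + bytes(body) + SI


if __name__ == "__main__":
    print(encode_iso2022_run([(3, 71)]).hex(" "))
```

A real encoder would also need to track which sets are currently designated and re-emit the designation where the encoding requires it.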

It is an extended modification of GB 2312-80, and corresponds to the union of the Mainland Chinese GB standards GB 6345.1-86 and GB 8565.2-88, with some further modifications and extensions. A subset of the GB 6345.1 extensions is incorporated into GB 18030, while GB 8565.2 serves as the Mainland Chinese source reference for certain CJK Unified Ideographs.

GB 6345.1

GB 6345.1-86 (32 × 32 Dot Matrix Font Set of Chinese Ideographs for Information Interchange) includes both a corrigendum and an extension for GB 2312. The corrigendum alters the following two characters:[3]

Alterations made to existing GB 2312 characters by GB 6345.1[3]
Row-cell | EUC | Unamended | GB 6345.1 | Notes
03-71 | 0xA3E7 | ɡ | ｇ | [a]
79-81 | 0xEFF1 | 鍾 | 锺 | [b]
  a. ^ Corresponds to U+FF47 in Unicode; however, the unamended reference glyph can also correspond to U+0261 ɡ. See below for how U+0261 is mapped to/from GB 6345.1, versus how it is mapped to/from ISO-IR-165.
  b. ^ The unamended reference glyph is a Traditional Chinese character corresponding to U+937E 鍾. The character in question is usually replaced with 钟 (U+949F, also the simplification of 鐘) in Simplified Chinese except in names of persons; the amended glyph is an alternate simplified form corresponding to U+953A 锺.
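
The relationship between the Row-cell and EUC columns can be sketched as simple arithmetic: the 7-bit form is the row and cell numbers plus 0x20, and the EUC form adds a further 0x80 to each byte. The function names below are hypothetical and exist only for illustration.

```python
def kuten_to_7bit(row, cell):
    """Return the 7-bit (ISO 2022) byte pair for a row-cell position."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row-cell out of range: {row}-{cell}")
    return bytes((row + 0x20, cell + 0x20))


def kuten_to_euc(row, cell):
    """Return the EUC byte pair: the 7-bit form with 0x80 added to each byte."""
    return bytes(b + 0x80 for b in kuten_to_7bit(row, cell))


def euc_to_kuten(data):
    """Recover (row, cell) from a two-byte EUC code."""
    if len(data) != 2 or not all(0xA1 <= b <= 0xFE for b in data):
        raise ValueError("expected two bytes in the range 0xA1-0xFE")
    return data[0] - 0xA0, data[1] - 0xA0


if __name__ == "__main__":
    for row, cell in [(3, 71), (79, 81)]:
        print(f"{row:02d}-{cell:02d} -> 0x{kuten_to_euc(row, cell).hex().upper()}")
```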

Deployed implementations incorporating GB 2312, such as Windows code page 936, generally follow these corrections when selecting their Unicode mappings.[5]
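
One way to observe this in practice is to decode the two affected EUC codes with an existing codec. The sketch below assumes that CPython's built-in `cp936` codec reflects the Microsoft mapping table; the helper `describe` is hypothetical.

```python
def describe(euc_bytes, codec="cp936"):
    """Decode a two-byte code and return its Unicode code point as 'U+XXXX'."""
    char = euc_bytes.decode(codec)
    return f"U+{ord(char):04X}"


if __name__ == "__main__":
    # 03-71 and 79-81, the two positions amended by GB 6345.1.
    for code in (b"\xa3\xe7", b"\xef\xf1"):
        print(code.hex().upper(), describe(code))
```

If the codec follows the amended glyphs, the first code decodes to the full-width letter (U+FF47) and the second to the alternate simplified form (U+953A).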

The extension adds half-width ISO 646-CN characters in row 10 (in addition to the existing full-width characters in row 3), extends the set of 26 non-ASCII pinyin characters in row 8 with six more, and adds half-width forms of all 32 pinyin characters to row 11.[3] These GB 6345.1 extensions are also incorporated into GB/T 12345, the Traditional Chinese counterpart to GB 2312, which additionally includes 29 vertical presentation forms in row 6.[3][6]

The six additional pinyin characters from GB 6345.1 and the vertical presentation forms from GB/T 12345, but not the half-width forms, are included in the classic Mac OS encoding for Simplified Chinese (a modification of EUC-CN),[7] and are also encoded as two-byte codes in GB 18030.[8] The additional pinyin characters are as follows:[7]

Extensions made by GB 6345.1 to GB 2312 row 8
Row-cell | EUC | Character[7][8] | Notes
08-27 | 0xA8BB | U+0251 ɑ |
08-28 | 0xA8BC | U+1E3F ḿ | [a]
08-29 | 0xA8BD | U+0144 ń |
08-30 | 0xA8BE | U+0148 ň |
08-31 | 0xA8BF | U+01F9 ǹ | [b]
08-32 | 0xA8C0 | U+0261 ɡ | [c]
  a. ^ Mapped to the Private Use Area U+E7C7 by the first (2000) edition of GB 18030; this was amended by the 2005 edition.[8]
  b. ^ This composed character was added in Unicode 3.0. Prior to this, this character was mapped to its composition sequence (i.e. U+006E+0300) by Apple.[7] This change predates the stabilisation of Unicode normalisation forms, which was introduced in Unicode 3.1.[9]
  c. ^ Matches the unamended reference glyph for 03-71 (see above). ISO-IR-165 differs here (see below).
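
The relationship between the composed character at 08-31 and its composition sequence can be checked with Python's standard `unicodedata` module. This is only a small illustration of canonical decomposition and recomposition; the helper `code_points` is hypothetical.

```python
import unicodedata


def code_points(text):
    """List the code points of a string as 'U+XXXX' labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


composed = "\u01f9"  # the precomposed character at 08-31
decomposed = unicodedata.normalize("NFD", composed)
recomposed = unicodedata.normalize("NFC", decomposed)

if __name__ == "__main__":
    print(code_points(composed), "->", code_points(decomposed))
    print(code_points(decomposed), "->", code_points(recomposed))
```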

GB 8565.2

GB 8565.2-88 (Information Processing - Coded Character Sets for Text Communication - Part 2: Graphic Characters) defines an extension for GB 2312, adding 705 characters in rows 13–15 and 90–94, of which 69 (all in row 15) are non-hanzi. It includes the GB 2312 corrections from GB 6345.1, but not its extensions.[3]

The Unihan database references GB 8565.2 as the Mainland Chinese source of several hanzi included in Unicode. Its Unihan source abbreviation is G8.[2]

CCITT changes

ISO-IR-165 incorporates the GB 2312 extensions from both GB 6345.1-86 and GB 8565.2-88.[3] Additionally, it adds 161 further characters (including 139 hanzi, identified as “general Chinese characters and variants”).[3][4] These CCITT hanzi extensions have on occasion been mistaken for standard GB 8565.2 characters, including in previous revisions of the Unihan database.[2] In total the set contains 8446 characters.

A number of patterned semigraphic characters are included in row 6.[4] This collides with the vertical presentation forms included in other extensions such as Mac OS Simplified Chinese[7] and GB 18030.[8]

The GB 6345.1 corrections to GB 2312 are only partly applied, resulting in two Unicode mappings being reversed compared to other encodings which include GB 2312 with GB 6345.1 extensions:

Row-cell | EUC | GB 2312 (unamended) | GB 6345.1 | GB 6345.1 mapping[7][8] | ISO-IR-165[4] | ISO-IR-165 mapping[10]
03-71 | 0xA3E7 | ɡ | ｇ | U+FF47 | ɡ | U+0261
08-32 | 0xA8C0 | (absent) | ɡ | U+0261 | ｇ | U+FF47
79-81 | 0xEFF1 | 鍾 | 锺 | U+953A | 锺 | U+953A
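
The reversal can be made concrete with two small lookup tables holding only the three rows above. This is a sketch of the comparison, not a complete mapping for either encoding; the names `GB6345_MAPPING`, `ISO_IR_165_MAPPING` and `differing_codes` are hypothetical.

```python
# EUC code -> Unicode character, limited to the three rows in the table above.
GB6345_MAPPING = {
    b"\xa3\xe7": "\uff47",  # 03-71
    b"\xa8\xc0": "\u0261",  # 08-32
    b"\xef\xf1": "\u953a",  # 79-81
}
ISO_IR_165_MAPPING = {
    b"\xa3\xe7": "\u0261",  # 03-71
    b"\xa8\xc0": "\uff47",  # 08-32
    b"\xef\xf1": "\u953a",  # 79-81
}


def differing_codes(first, second):
    """Return the codes whose Unicode mapping differs between two tables."""
    return sorted(code for code in first if first[code] != second.get(code))


if __name__ == "__main__":
    for code in differing_codes(GB6345_MAPPING, ISO_IR_165_MAPPING):
        a = ord(GB6345_MAPPING[code])
        b = ord(ISO_IR_165_MAPPING[code])
        print(f"0x{code.hex().upper()}: U+{a:04X} vs U+{b:04X}")
```

Text transcoded between the two conventions without accounting for this would exchange the two letter forms at 03-71 and 08-32 while leaving 79-81 unchanged.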

References

  1. ^ a b Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "Chinese Character Encoding for Internet Messages". Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922.
  2. ^ a b c d Chung, Jaemin (2018-01-24). "Pseudo-G8 characters" (PDF). ISO/IEC JTC 1/SC 2/WG 2/IRG N2276.
  3. ^ a b c d e f g h Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 94–111. ISBN 978-0-596-51447-1.
  4. ^ a b c d CCITT (1992-07-13). Codes of the Chinese graphic character set for communication (PDF). ITSCJ/IPSJ. ISO-IR-165.
  5. ^ Steele, Shawn (2000). "cp936 to Unicode table". Microsoft, Unicode Consortium.
  6. ^ Lunde, Ken (1998). Appendix F: GB/T 12345 (PDF). CJKV Information Processing. O'Reilly Media. ISBN 9781565922242.
  7. ^ a b c d e f "Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later". Apple, Inc.
  8. ^ a b c d e Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology—Chinese coded character set.
  9. ^ "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.
  10. ^ Viswanadha, Raghuram (2000-08-30). "Unicode to ISO-IR-165 table". International Components for Unicode. IBM. (Note: codes are listed in the source in 7-bit form: add 0x80 to each byte for EUC form, or subtract 0x20 for kuten form)
