ISO-IR-165

CCITT Chinese set (ISO-IR 165)
MIME / IANA	iso-ir-165
Alias(es)	CN-GB-ISOIR165 (EUC form)
Language(s)	Simplified Chinese, English, Russian; partial support: Greek, Japanese
Standard	ITU T.101, annex C
Definitions	ISO-IR 165
Extends	GB 2312
Encoding formats	ISO-2022-CN-EXT, Videotex Data Syntax 2
Succeeded by	GB 18030

The CCITT Chinese Primary Set^[2] is a multi-byte graphic character set for Chinese communications, created for the International Telegraph and Telephone Consultative Committee (CCITT) in 1992.^[3] It is defined in ITU T.101, annex C, which codifies Videotex Data Syntax 2.^[2] It is registered in the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165,^[4] and can be encoded in ISO-2022-CN-EXT.^[1]

It is an extended modification of GB 2312-80, and corresponds to the union of the Mainland Chinese GB standards GB 6345.1-86 and GB 8565.2-88, with some further modifications and extensions. A subset of the GB 6345.1 extensions is incorporated into GB 18030, while GB 8565.2 serves as the Mainland Chinese source reference for certain CJK Unified Ideographs.

GB 6345.1

GB 6345.1-86 (32 × 32 Dot Matrix Font Set of Chinese Ideographs for Information Interchange) includes both a corrigendum and an extension for GB 2312. The corrigendum alters the following two characters:^[3]

Alterations made to existing GB 2312 characters by GB 6345.1^[3]
Row-cell	EUC	Unamended	GB 6345.1	Notes
03-71	0xA3E7	ɡ	ｇ	^[a]
79-81	0xEFF1	鍾	锺	^[b]

[a] The amended glyph corresponds to U+FF47 ｇ in Unicode; however, the unamended reference glyph can also correspond to U+0261 ɡ. See below for how U+0261 is mapped to and from GB 6345.1, versus how it is mapped to and from ISO-IR-165.
[b] The unamended reference glyph is a Traditional Chinese character corresponding to U+937E. The character in question is usually replaced with 钟 (U+949F, also the simplification of 鐘) in Simplified Chinese except in names of persons; the amended glyph is an alternate simplified form corresponding to U+953A.
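
The row-cell and EUC columns in the table above are related by fixed byte offsets. The following is a minimal Python sketch of that arithmetic, assuming the offsets stated in the note on reference [10] (the 7-bit form plus 0x80 per byte gives the EUC form; the 7-bit form minus 0x20 per byte gives the row-cell, or kuten, form). The function names are hypothetical and chosen only for illustration.

```python
def kuten_to_7bit(row: int, cell: int) -> bytes:
    """Convert a row-cell pair (each 1..94) to the 7-bit two-byte form."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    return bytes([row + 0x20, cell + 0x20])


def seven_bit_to_euc(code: bytes) -> bytes:
    """Convert the 7-bit two-byte form to the EUC form (add 0x80 per byte)."""
    return bytes(b + 0x80 for b in code)


def euc_to_kuten(code: bytes) -> tuple[int, int]:
    """Convert an EUC two-byte code back to a (row, cell) pair."""
    lead, trail = code
    return lead - 0xA0, trail - 0xA0


# Row-cell 03-71 and 79-81 are the two corrected code points in the table.
for row, cell in [(3, 71), (79, 81)]:
    euc = seven_bit_to_euc(kuten_to_7bit(row, cell))
    print(f"{row:02d}-{cell:02d} -> 0x{euc.hex().upper()}")
```

Run as written, the loop reproduces the EUC values 0xA3E7 and 0xEFF1 listed in the table.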

Deployed implementations incorporating GB 2312, such as Windows code page 936, generally follow these corrections when selecting their Unicode mappings.^[5]

The extension adds half-width ISO 646-CN characters in row 10 (in addition to the existing full-width characters in row 3), extends the set of 26 non-ASCII pinyin characters in row 8 with six additional such characters, and adds half-width forms of these 32 pinyin characters to row 11.^[3] GB/T 12345, the Traditional Chinese counterpart to GB 2312, also incorporates these GB 6345.1 extensions, and additionally includes 29 vertical presentation forms in row 6.^[3]^[6]

The six additional pinyin characters from GB 6345.1 and the vertical presentation forms from GB/T 12345, but not the half-width forms, are included in the classic Mac OS encoding for Simplified Chinese (a modification of EUC-CN),^[7] and are also encoded as two-byte codes in GB 18030.^[8] The additional pinyin characters are as follows:^[7]

Extensions made by GB 6345.1 to GB 2312 row 8
Row-cell	EUC	Character^[7]^[8]	Notes
08-27	0xA8BB	U+0251 ɑ
08-28	0xA8BC	U+1E3F ḿ	^[a]
08-29	0xA8BD	U+0144 ń
08-30	0xA8BE	U+0148 ň
08-31	0xA8BF	U+01F9 ǹ	^[b]
08-32	0xA8C0	U+0261 ɡ	^[c]

[a] Mapped to the Private Use Area U+E7C7 by the first (2000) edition of GB 18030; this was amended by the 2005 edition.^[8]
[b] This composed character was added in Unicode 3.0. Prior to this, Apple mapped this character to its composition sequence (i.e. U+006E U+0300).^[7] This change predates the stabilisation of Unicode normalisation forms, which was introduced in Unicode 3.1.^[9]
[c] Matches the unamended reference glyph for 03-71 (see above). ISO-IR-165 differs here (see below).
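
The relationship between the composed character U+01F9 and the sequence U+006E U+0300 mentioned in the note for 08-31 can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch of canonical equivalence under current Unicode data; it does not reproduce Apple's mapping table, and the helper name `same_character` is hypothetical.

```python
import unicodedata

COMPOSED = "\u01f9"      # LATIN SMALL LETTER N WITH GRAVE (row-cell 08-31)
SEQUENCE = "n\u0300"     # U+006E followed by U+0300 COMBINING GRAVE ACCENT


def same_character(a: str, b: str) -> bool:
    """Return True if the two strings are canonically equivalent."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The composed form decomposes to the two-code-point sequence under NFD,
# and the sequence recomposes to the single code point under NFC.
print(unicodedata.normalize("NFD", COMPOSED) == SEQUENCE)
print(unicodedata.normalize("NFC", SEQUENCE) == COMPOSED)
print(same_character(COMPOSED, SEQUENCE))
```

A decoder that emits the sequence and one that emits the composed character therefore produce canonically equivalent text, although a byte-for-byte comparison of their output would differ unless both are normalised first.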

GB 8565.2

GB 8565.2-88 (Information Processing - Coded Character Sets for Text Communication - Part 2: Graphic Characters) defines an extension for GB 2312, adding 705 characters in rows 13–15 and 90–94, of which 69 (all in row 15) are non-hanzi. It includes the GB 2312 corrections from GB 6345.1, but not its extensions.^[3]

The Unihan database references GB 8565.2 as the Mainland Chinese source of several hanzi included in Unicode. Its Unihan source abbreviation is G8.^[2]

CCITT changes

ISO-IR-165 incorporates the GB 2312 extensions from both GB 6345.1-86 and GB 8565.2-88.^[3] Additionally, it adds 161 further characters (including 139 hanzi, identified as “general Chinese characters and variants”).^[3]^[4] These CCITT hanzi extensions have on occasion been mistaken for standard GB 8565.2 characters, including in previous revisions of the Unihan database.^[2] In total the set contains 8446 characters.

A number of patterned semigraphic characters are included in row 6.^[4] This collides with the vertical presentation forms included in other extensions such as Mac OS Simplified Chinese^[7] and GB 18030.^[8]

The GB 6345.1 corrections to GB 2312 are only partly applied, resulting in two Unicode mappings being reversed compared to other encodings which include GB 2312 with GB 6345.1 extensions:

Row-cell	EUC	GB 2312 (unamended)	GB 6345.1	GB 6345.1 mapping^[7]^[8]	ISO-IR-165^[4]	ISO-IR-165 mapping^[10]
03-71	0xA3E7	ɡ	ｇ	U+FF47	ɡ	U+0261
08-32	0xA8C0	(absent)	ɡ	U+0261	ｇ	U+FF47
79-81	0xEFF1	鍾	锺	U+953A	锺	U+953A
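
The reversal shown in the table can be expressed as two small lookup tables keyed by EUC code. The Python sketch below uses only the three rows given above; the dictionary and function names are hypothetical, and a real converter would need the complete mapping tables from the cited sources.

```python
# EUC code -> Unicode character, per the "GB 6345.1 mapping" column.
GB_6345_1 = {
    0xA3E7: "\uff47",  # FULLWIDTH LATIN SMALL LETTER G
    0xA8C0: "\u0261",  # LATIN SMALL LETTER SCRIPT G
    0xEFF1: "\u953a",  # 锺
}

# EUC code -> Unicode character, per the "ISO-IR-165 mapping" column.
ISO_IR_165 = {
    0xA3E7: "\u0261",
    0xA8C0: "\uff47",
    0xEFF1: "\u953a",
}


def differing_codes(a: dict[int, str], b: dict[int, str]) -> list[int]:
    """Return the codes present in both tables that map to different characters."""
    return sorted(code for code in a.keys() & b.keys() if a[code] != b[code])


for code in differing_codes(GB_6345_1, ISO_IR_165):
    print(f"0x{code:04X}: U+{ord(GB_6345_1[code]):04X} vs U+{ord(ISO_IR_165[code]):04X}")
```

Only 0xA3E7 and 0xA8C0 are reported as differing: the two tables swap U+FF47 and U+0261 between them, while 0xEFF1 maps to U+953A in both.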

References

[1] Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "Chinese Character Encoding for Internet Messages". Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922.
[2] Chung, Jaemin (2018-01-24). "Pseudo-G8 characters" (PDF). ISO/IEC JTC 1/SC 2/WG 2/IRG N2276.
[3] Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 94–111. ISBN 978-0-596-51447-1.
[4] CCITT (1992-07-13). Codes of the Chinese graphic character set for communication (PDF). ITSCJ/IPSJ. ISO-IR-165.
[5] Steele, Shawn (2000). "cp936 to Unicode table". Microsoft, Unicode Consortium.
[6] Lunde, Ken (1998). Appendix F: GB/T 12345 (PDF). CJKV Information Processing. O'Reilly Media. ISBN 9781565922242.
[7] "Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later". Apple, Inc.
[8] Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology—Chinese coded character set.
[9] "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.
[10] Viswanadha, Raghuram (2000-08-30). "Unicode to ISO-IR-165 table". International Components for Unicode. IBM. (Note: codes are listed in the source in 7-bit form: add 0x80 to each byte for EUC form, or subtract 0x20 for kuten form.)

External links

ISO-IR-165: Code of the Chinese graphic character set for communication (registered 1992, amended 1994)
Unicode mappings for ISO-IR-165
