ARIB STD B24 character set

ARIB STD-B24 Kanji set
	Weather symbols: a few of the extended symbols included.
Language(s)	Japanese, English, Russian; Partial support: Greek, Chinese
Standard	ARIB STD-B24 Volume 1
Classification	ISO-2022-structured CJK DBCS
Extends	JIS X 0208
Encoding formats	ARIB STD-B24 encoding (ISO 2022 based); Shift JIS (ARIB variant)^[1]

ARIB STD-B24 encoding
Standard	ARIB STD-B24 Volume 1
Classification	ISO 2022 profile/extension
Transforms / Encodes	ARIB STD-B24 Kanji, Kana and mosaic sets; JIS X 0201

Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language^[2] specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. It was introduced on 1999-10-26.^[2] The latest revision is version 6.3 as of 2016-07-06.

It includes a number of ARIB extended characters (ARIB外字, ARIB gaiji) not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.^[3] Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.^[4]

Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji.^[5] It also includes a mapping of utilised characters outside the Basic Multilingual Plane to the BMP's private use area.

Sets and codes

The ARIB STD-B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets of two distinct layouts and four mosaic sets.^[6] The sets are selected using ISO 2022 mechanisms for 94-sets, with each set identified by one of the following codes (proportional sets use the same layout as the corresponding non-proportional ones):^[7]

Set	Type	Code (column/line)	Code (hexadecimal)	Code (ASCII character)	Comments
Kanji	2-byte	4/2	42	`B`	The escape code `B` used for the ARIB Kanji set^[7] is also used in ISO-2022-JP for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension).^[8]^[9]
Alphanumeric	1-byte	4/10	4A	`J`	JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Similar to ASCII, but two assignments differ. Escape code `J` matches usage in ISO-2022-JP.^[9]
Proportional alphanumeric	1-byte	3/6	36	`6`
Hiragana	1-byte	3/0	30	`0`	Hiragana themselves follow the same layout as row 4 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Hiragana	1-byte	3/7	37	`7`
Katakana	1-byte	3/1	31	`1`	Katakana themselves follow the same layout as row 5 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Katakana	1-byte	3/8	38	`8`
JIS X 0201 Katakana	1-byte	4/9	49	`I`	JIS_C6220-jp (JIS X 0201 Kana set). Escape code matches usage in ISO-2022-JP-3.
Mosaic A	1-byte	3/2	32	`2`	Pseudographics
Mosaic B	1-byte	3/3	33	`3`	Pseudographics
Mosaic C	1-byte	3/4	34	`4`	Non-spacing pseudographics
Mosaic D	1-byte	3/5	35	`5`	Non-spacing pseudographics
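
The table above can be read as a lookup from code to set. The following is a minimal sketch of that lookup, not an implementation of the standard's designation procedure: the names `ARIB_GRAPHIC_SETS`, `column_line` and `lookup_set` are hypothetical, and only the codes listed in the table are covered.

```python
# Codes from the table above, keyed by their ASCII character.
# Each value is (set name, bytes per character).
# The dictionary and function names are illustrative, not from the standard.
ARIB_GRAPHIC_SETS = {
    "B": ("Kanji", 2),
    "J": ("Alphanumeric", 1),
    "6": ("Proportional alphanumeric", 1),
    "0": ("Hiragana", 1),
    "7": ("Proportional Hiragana", 1),
    "1": ("Katakana", 1),
    "8": ("Proportional Katakana", 1),
    "I": ("JIS X 0201 Katakana", 1),
    "2": ("Mosaic A", 1),
    "3": ("Mosaic B", 1),
    "4": ("Mosaic C", 1),
    "5": ("Mosaic D", 1),
}


def column_line(code: int) -> str:
    """Format a byte in column/line notation, e.g. 0x4A -> '4/10'."""
    return f"{code >> 4}/{code & 0x0F}"


def lookup_set(code: int):
    """Return (name, bytes_per_char) for a code, or None if it is not in the table."""
    return ARIB_GRAPHIC_SETS.get(chr(code))


if __name__ == "__main__":
    for char in ARIB_GRAPHIC_SETS:
        name, width = ARIB_GRAPHIC_SETS[char]
        print(f"{column_line(ord(char)):>5}  0x{ord(char):02X}  {char}  {name} ({width}-byte)")
```

The column/line form is simply the high and low nibble of the byte written in decimal, which is why `J` (0x4A) appears as 4/10 in the table.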

Code charts

Kanji (double-byte) set

This is a double-byte character set extending JIS X 0208.

Lead byte

Each encoding byte is the row or cell number plus 0x20, or 32 in decimal (see the chart below). Hence, row 1 has a lead byte of 0x21 (or 33), its cell 1 has a continuation byte of 0x21 (or 33), and so forth. Most of the code points correspond to JIS X 0208; exceptions are shown with a heavy border.
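
The offset arithmetic can be sketched as a pair of conversion functions. This is an illustrative example only: the function names `kuten_to_bytes` and `bytes_to_kuten` are hypothetical, and the range check assumes the 94 by 94 layout implied by lead bytes 0x21 through 0x7E.

```python
from typing import Tuple

OFFSET = 0x20  # 32 in decimal, added to both the row and the cell number


def kuten_to_bytes(row: int, cell: int) -> bytes:
    """Convert a row-cell pair (each 1..94) to its two encoding bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row and cell must be in 1..94, got {row}-{cell}")
    return bytes([row + OFFSET, cell + OFFSET])


def bytes_to_kuten(pair: bytes) -> Tuple[int, int]:
    """Convert two encoding bytes (each 0x21..0x7E) back to a row-cell pair."""
    if len(pair) != 2 or not all(0x21 <= b <= 0x7E for b in pair):
        raise ValueError("expected two bytes in the range 0x21..0x7E")
    return pair[0] - OFFSET, pair[1] - OFFSET


if __name__ == "__main__":
    # Row 1, cell 1 is encoded as 0x21 0x21.
    print(kuten_to_bytes(1, 1).hex())
    # Row 90 (traffic symbols) has the lead byte 0x7A.
    print(kuten_to_bytes(90, 1).hex())
```

For instance, the traffic symbol listed as 90-1 is encoded as the byte pair 0x7A 0x21 once the Kanji set is selected.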

ARIB STD-B24 Kanji (double-byte) set (lead bytes)
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	SP 0020	Punct. LEAD 1-_	Symbol LEAD 2-_	Alnum. LEAD 3-_	Hira. LEAD 4-_	Kata. LEAD 5-_	Greek LEAD 6-_	Cyrillic LEAD 7-_	Box LEAD 8-_	9-_	10-_	11-_	12-_	13-_	14-_	15-_
3_	Kanji L1 LEAD 16-_	Kanji L1 LEAD 17-_	Kanji L1 LEAD 18-_	Kanji L1 LEAD 19-_	Kanji L1 LEAD 20-_	Kanji L1 LEAD 21-_	Kanji L1 LEAD 22-_	Kanji L1 LEAD 23-_	Kanji L1 LEAD 24-_	Kanji L1 LEAD 25-_	Kanji L1 LEAD 26-_	Kanji L1 LEAD 27-_	Kanji L1 LEAD 28-_	Kanji L1 LEAD 29-_	Kanji L1 LEAD 30-_	Kanji L1 LEAD 31-_
4_	Kanji L1 LEAD 32-_	Kanji L1 LEAD 33-_	Kanji L1 LEAD 34-_	Kanji L1 LEAD 35-_	Kanji L1 LEAD 36-_	Kanji L1 LEAD 37-_	Kanji L1 LEAD 38-_	Kanji L1 LEAD 39-_	Kanji L1 LEAD 40-_	Kanji L1 LEAD 41-_	Kanji L1 LEAD 42-_	Kanji L1 LEAD 43-_	Kanji L1 LEAD 44-_	Kanji L1 LEAD 45-_	Kanji L1 LEAD 46-_	Kanji L1 LEAD 47-_
5_	Kanji L2 LEAD 48-_	Kanji L2 LEAD 49-_	Kanji L2 LEAD 50-_	Kanji L2 LEAD 51-_	Kanji L2 LEAD 52-_	Kanji L2 LEAD 53-_	Kanji L2 LEAD 54-_	Kanji L2 LEAD 55-_	Kanji L2 LEAD 56-_	Kanji L2 LEAD 57-_	Kanji L2 LEAD 58-_	Kanji L2 LEAD 59-_	Kanji L2 LEAD 60-_	Kanji L2 LEAD 61-_	Kanji L2 LEAD 62-_	Kanji L2 LEAD 63-_
6_	Kanji L2 LEAD 64-_	Kanji L2 LEAD 65-_	Kanji L2 LEAD 66-_	Kanji L2 LEAD 67-_	Kanji L2 LEAD 68-_	Kanji L2 LEAD 69-_	Kanji L2 LEAD 70-_	Kanji L2 LEAD 71-_	Kanji L2 LEAD 72-_	Kanji L2 LEAD 73-_	Kanji L2 LEAD 74-_	Kanji L2 LEAD 75-_	Kanji L2 LEAD 76-_	Kanji L2 LEAD 77-_	Kanji L2 LEAD 78-_	Kanji L2 LEAD 79-_
7_	Kanji L2 LEAD 80-_	Kanji L2 LEAD 81-_	Kanji L2 LEAD 82-_	Kanji L2 LEAD 83-_	Kanji L2 LEAD 84-_	85-_	86-_	87-_	88-_	89-_	Traffic LEAD 90-_	Map LEAD 91-_	Misc. LEAD 92-_	Misc. LEAD 93-_	List LEAD 94-_	DEL 007F
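
The lead-byte chart above can also be expressed as a range lookup. The sketch below uses the chart's own labels; the names `ROW_RANGES` and `describe_lead_byte` are hypothetical, and rows that carry no label in the chart (9 through 15 and 85 through 89) are reported as unlabelled rather than given an assumed meaning.

```python
# (first row, last row, label) taken from the lead-byte chart above.
# The names here are illustrative and do not come from the standard.
ROW_RANGES = [
    (1, 1, "Punct."),
    (2, 2, "Symbol"),
    (3, 3, "Alnum."),
    (4, 4, "Hira."),
    (5, 5, "Kata."),
    (6, 6, "Greek"),
    (7, 7, "Cyrillic"),
    (8, 8, "Box"),
    (16, 47, "Kanji L1"),
    (48, 84, "Kanji L2"),
    (90, 90, "Traffic"),
    (91, 91, "Map"),
    (92, 93, "Misc."),
    (94, 94, "List"),
]


def describe_lead_byte(lead: int) -> str:
    """Return the chart label for a lead byte in the range 0x21..0x7E."""
    if not 0x21 <= lead <= 0x7E:
        raise ValueError("lead byte must be in the range 0x21..0x7E")
    row = lead - 0x20  # row number is the lead byte minus 0x20
    for first, last, label in ROW_RANGES:
        if first <= row <= last:
            return label
    return "Unlabelled"


if __name__ == "__main__":
    for lead in (0x21, 0x30, 0x50, 0x75, 0x7A):
        print(f"0x{lead:02X} -> row {lead - 0x20}: {describe_lead_byte(lead)}")
```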

Character sets 0x21-0x74 (row numbers 1-84: punctuation, alphabets, numbers, Kana, Kanji)

Character set 0x7A (row number 90, traffic symbols)

Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below with a heavy border) are listed in the B24 standard only in table 7-10 (the list of extension characters). They are also the only characters in rows 90 and 91 which are not transport-related symbols, as the B24 standard notes in an endnote to table 7-10.^[10] The remainder of the extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.^[10]

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7A)^[5]^[11]
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_		⛌ 26CC 90-1	⛍ 26CD 90-2	❗︎ 2757 90-3	⛏ 26CF 90-4	⛐ 26D0 90-5	⛑ 26D1 90-6	90-7	⛒ 26D2 90-8	⛕ 26D5 90-9	⛓ 26D3 90-10	⛔︎ 26D4 90-11	90-12	90-13	90-14	90-15

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]
