ARIB STD B24 character set
Standard | ARIB STD-B24 Volume 1 |
---|---|
Classification | ISO 2022 profile/extension |
Transforms / Encodes | ARIB STD-B24 Kanji, Kana and mosaic sets, JIS X 0201 |
Language(s) | Japanese, English, Russian; partial support: Greek, Chinese |

Standard | ARIB STD-B24 Volume 1 |
---|---|
Classification | ISO-2022-structured CJK DBCS |
Extends | JIS X 0208 |
Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language[2] specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. It was introduced on 1999-10-26.[2] The latest revision is version 6.3 as of 2016-07-06.
It includes a number of ARIB extended characters (ARIB外字, ARIB gaiji) not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters that were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.[3] Its contributions partially overlap the emoji later encoded in Unicode 6.0, but were added a year earlier, in Unicode 5.2.[4]
Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji.[5] It also maps the characters it uses from outside the Basic Multilingual Plane (BMP) to the BMP's Private Use Area.
Sets and codes
The ARIB STD B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets of two distinct layouts and four mosaic sets.[6] The sets are selected using ISO 2022 mechanisms for 94-sets, using the following codes (proportional sets use the same layout as the corresponding non-proportional ones):[7]
Set | Type | Code (column/line) | Code (hexadecimal) | Code (ASCII character) | Comments |
---|---|---|---|---|---|
Kanji | 2-byte | 4/2 | 42 | B | Escape code B, used for the ARIB Kanji set,[7] is also used for the 1983 version of JIS C 6226 (JIS X 0208, which the ARIB Kanji set extends) in ISO-2022-JP.[8][9] |
Alphanumeric | 1-byte | 4/10 | 4A | J | JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Identical to ASCII except for two assignments: 0x5C is the yen sign and 0x7E is the overline. Escape code J matches usage in ISO-2022-JP.[9] |
Proportional alphanumeric | 1-byte | 3/6 | 36 | 6 | |
Hiragana | 1-byte | 3/0 | 30 | 0 | The hiragana follow the same layout as row 4 of JIS X 0208, but without a lead byte. The set also adds several punctuation assignments. |
Proportional Hiragana | 1-byte | 3/7 | 37 | 7 | |
Katakana | 1-byte | 3/1 | 31 | 1 | The katakana follow the same layout as row 5 of JIS X 0208, but without a lead byte. The set also adds several punctuation assignments. |
Proportional Katakana | 1-byte | 3/8 | 38 | 8 | |
JIS X 0201 Katakana | 1-byte | 4/9 | 49 | I | JIS_C6220-jp (JIS X 0201 Kana set). Escape code I matches usage in ISO-2022-JP-3. |
Mosaic A | 1-byte | 3/2 | 32 | 2 | Pseudographics |
Mosaic B | 1-byte | 3/3 | 33 | 3 | |
Mosaic C | 1-byte | 3/4 | 34 | 4 | Non-spacing pseudographics |
Mosaic D | 1-byte | 3/5 | 35 | 5 | |
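The table amounts to a lookup from final byte to set. The Python sketch below is illustrative rather than taken from the standard: `ARIB_SETS` and `designate` are hypothetical names, and it assumes the generic ISO 2022 designation forms (ESC I F for a single-byte 94-set, ESC $ F or ESC $ I F for the double-byte set), where the intermediate byte I selects buffer G0 to G3. Which forms and buffers a B24 receiver must actually accept should be checked against the standard itself.

```python
ESC = 0x1B

# Final byte (F) -> (set name, bytes per character), transcribed from the table above.
ARIB_SETS = {
    0x42: ("Kanji", 2),
    0x4A: ("Alphanumeric", 1),
    0x36: ("Proportional alphanumeric", 1),
    0x30: ("Hiragana", 1),
    0x37: ("Proportional Hiragana", 1),
    0x31: ("Katakana", 1),
    0x38: ("Proportional Katakana", 1),
    0x49: ("JIS X 0201 Katakana", 1),
    0x32: ("Mosaic A", 1),
    0x33: ("Mosaic B", 1),
    0x34: ("Mosaic C", 1),
    0x35: ("Mosaic D", 1),
}

# Generic ISO 2022 intermediate bytes selecting the target buffer for a 94-set:
# G0 = 0x28 '(', G1 = 0x29 ')', G2 = 0x2A '*', G3 = 0x2B '+'.
INTERMEDIATE_94 = (0x28, 0x29, 0x2A, 0x2B)


def designate(final_byte: int, g: int) -> bytes:
    """Build an ISO 2022 escape sequence designating an ARIB set into G0-G3."""
    if final_byte not in ARIB_SETS:
        raise ValueError(f"unknown final byte {final_byte:#04x}")
    if g not in range(4):
        raise ValueError("g must be 0, 1, 2 or 3")
    _, width = ARIB_SETS[final_byte]
    if width == 1:
        # Single-byte 94-set: ESC I F
        return bytes([ESC, INTERMEDIATE_94[g], final_byte])
    if g == 0:
        # Double-byte set into G0: short form ESC $ F, which ISO 2022
        # permits only for final bytes 0x40-0x42 (Kanji uses 0x42).
        return bytes([ESC, 0x24, final_byte])
    # Double-byte set into G1-G3: ESC $ I F
    return bytes([ESC, 0x24, INTERMEDIATE_94[g], final_byte])


print(designate(0x42, 0))  # b'\x1b$B'  (Kanji into G0)
print(designate(0x30, 1))  # b'\x1b)0'  (Hiragana into G1)
```

Keeping the byte width in the table lets one function choose between the single-byte and double-byte forms.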
Code charts
Kanji (double-byte) set
This is a double-byte character set extending JIS X 0208.
Lead byte
Each encoding byte is the row or cell number plus 0x20 (32 in decimal; see below). Hence, lead byte 0x21 selects row 1, cell 1 of any row has a trail byte of 0x21 (33), and so forth. Most of the code corresponds to JIS X 0208; exceptions are shown with a heavy border.
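The offset arithmetic can be sketched in a few lines of Python. This assumes the set is invoked into GL, so that both bytes fall between 0x21 and 0x7E as in the chart; `to_bytes` and `from_bytes` are hypothetical helper names, not part of any ARIB API.

```python
from typing import Tuple


def to_bytes(row: int, cell: int) -> bytes:
    """Encode a (row, cell) position of the double-byte set as two bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    # Each byte is the row or cell number plus 0x20 (32 decimal).
    return bytes([row + 0x20, cell + 0x20])


def from_bytes(pair: bytes) -> Tuple[int, int]:
    """Decode a two-byte code back into (row, cell)."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    lead, trail = pair
    if not (0x21 <= lead <= 0x7E and 0x21 <= trail <= 0x7E):
        raise ValueError("both bytes must be in 0x21..0x7E")
    return lead - 0x20, trail - 0x20


print(to_bytes(1, 1).hex())     # 2121: row 1, cell 1
print(from_bytes(b"\x7a\x21"))  # (90, 1): first cell of the traffic-symbol row
```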
 | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | SP 0020 | Punct. LEAD 1-_ | Symbol LEAD 2-_ | Alnum. LEAD 3-_ | Hira. LEAD 4-_ | Kata. LEAD 5-_ | Greek LEAD 6-_ | Cyrillic LEAD 7-_ | Box LEAD 8-_ | 9-_ | 10-_ | 11-_ | 12-_ | 13-_ | 14-_ | 15-_ |
3_ | Kanji L1 LEAD 16-_ | Kanji L1 LEAD 17-_ | Kanji L1 LEAD 18-_ | Kanji L1 LEAD 19-_ | Kanji L1 LEAD 20-_ | Kanji L1 LEAD 21-_ | Kanji L1 LEAD 22-_ | Kanji L1 LEAD 23-_ | Kanji L1 LEAD 24-_ | Kanji L1 LEAD 25-_ | Kanji L1 LEAD 26-_ | Kanji L1 LEAD 27-_ | Kanji L1 LEAD 28-_ | Kanji L1 LEAD 29-_ | Kanji L1 LEAD 30-_ | Kanji L1 LEAD 31-_ |
4_ | Kanji L1 LEAD 32-_ | Kanji L1 LEAD 33-_ | Kanji L1 LEAD 34-_ | Kanji L1 LEAD 35-_ | Kanji L1 LEAD 36-_ | Kanji L1 LEAD 37-_ | Kanji L1 LEAD 38-_ | Kanji L1 LEAD 39-_ | Kanji L1 LEAD 40-_ | Kanji L1 LEAD 41-_ | Kanji L1 LEAD 42-_ | Kanji L1 LEAD 43-_ | Kanji L1 LEAD 44-_ | Kanji L1 LEAD 45-_ | Kanji L1 LEAD 46-_ | Kanji L1 LEAD 47-_ |
5_ | Kanji L2 LEAD 48-_ | Kanji L2 LEAD 49-_ | Kanji L2 LEAD 50-_ | Kanji L2 LEAD 51-_ | Kanji L2 LEAD 52-_ | Kanji L2 LEAD 53-_ | Kanji L2 LEAD 54-_ | Kanji L2 LEAD 55-_ | Kanji L2 LEAD 56-_ | Kanji L2 LEAD 57-_ | Kanji L2 LEAD 58-_ | Kanji L2 LEAD 59-_ | Kanji L2 LEAD 60-_ | Kanji L2 LEAD 61-_ | Kanji L2 LEAD 62-_ | Kanji L2 LEAD 63-_ |
6_ | Kanji L2 LEAD 64-_ | Kanji L2 LEAD 65-_ | Kanji L2 LEAD 66-_ | Kanji L2 LEAD 67-_ | Kanji L2 LEAD 68-_ | Kanji L2 LEAD 69-_ | Kanji L2 LEAD 70-_ | Kanji L2 LEAD 71-_ | Kanji L2 LEAD 72-_ | Kanji L2 LEAD 73-_ | Kanji L2 LEAD 74-_ | Kanji L2 LEAD 75-_ | Kanji L2 LEAD 76-_ | Kanji L2 LEAD 77-_ | Kanji L2 LEAD 78-_ | Kanji L2 LEAD 79-_ |
7_ | Kanji L2 LEAD 80-_ | Kanji L2 LEAD 81-_ | Kanji L2 LEAD 82-_ | Kanji L2 LEAD 83-_ | Kanji L2 LEAD 84-_ | 85-_ | 86-_ | 87-_ | 88-_ | 89-_ | Traffic LEAD 90-_ | Map LEAD 91-_ | Misc. LEAD 92-_ | Misc. LEAD 93-_ | List LEAD 94-_ | DEL 007F |
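The lead-byte chart can likewise be read as a classification function. In this Python sketch the name `row_category` and the spelled-out labels are hypothetical; it assumes the chart's "L1" and "L2" denote the JIS X 0208 level-1 (rows 16 to 47) and level-2 (rows 48 to 84) kanji, and it reports rows the chart leaves blank as unlabelled rather than guessing their contents.

```python
# Labels transcribed from the lead-byte chart; rows not listed here are
# either Kanji ranges (handled below) or carry no label in the chart.
NAMED_ROWS = {
    1: "Punctuation", 2: "Symbol", 3: "Alphanumeric", 4: "Hiragana",
    5: "Katakana", 6: "Greek", 7: "Cyrillic", 8: "Box drawing",
    90: "Traffic", 91: "Map", 92: "Miscellaneous", 93: "Miscellaneous",
    94: "List",
}


def row_category(lead: int) -> str:
    """Return the chart's label for a lead byte in 0x21..0x7E."""
    if not 0x21 <= lead <= 0x7E:
        raise ValueError("lead byte must be in 0x21..0x7E")
    row = lead - 0x20
    if row in NAMED_ROWS:
        return NAMED_ROWS[row]
    if 16 <= row <= 47:
        return "Kanji level 1"
    if 48 <= row <= 84:
        return "Kanji level 2"
    return "Unlabelled"  # rows 9-15 and 85-89


for lead in (0x21, 0x30, 0x50, 0x7A, 0x75):
    print(hex(lead), row_category(lead))
```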
Character sets 0x21-0x74 (row numbers 1-84: punctuation, alphabets, numbers, Kana, Kanji)
Character set 0x7A (row number 90, traffic symbols)
Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below with a heavy border) are listed in the B24 standard only in table 7-10 (the list of extension characters). They are also the only characters in rows 90 and 91 that are not transport-related symbols; the B24 standard notes this in an endnote to table 7-10.[10] The remaining extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.[10]
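The rule above, together with the first cells of the chart below, fits in a small lookup. This Python sketch is illustrative only: `ROW90_PARTIAL`, `only_in_table_7_10` and `decode_row90` are hypothetical names, the mapping covers only cells 90-1 to 90-15 as shown in the chart, and it returns bare code points without the text-presentation variation selector the chart displays for some cells.

```python
from typing import Optional

# Code points for cells 90-1 to 90-15, transcribed from the chart;
# None marks cells for which the chart shows no code point.
ROW90_PARTIAL = {
    1: 0x26CC, 2: 0x26CD, 3: 0x2757, 4: 0x26CF, 5: 0x26D0,
    6: 0x26D1, 7: None, 8: 0x26D2, 9: 0x26D5, 10: 0x26D3,
    11: 0x26D4, 12: None, 13: None, 14: None, 15: None,
}


def only_in_table_7_10(cell: int) -> bool:
    """True for row-90 cells that B24 lists only in table 7-10."""
    if not 1 <= cell <= 94:
        raise ValueError("cell must be in 1..94")
    return 45 <= cell <= 63 or 66 <= cell <= 84


def decode_row90(trail: int) -> Optional[str]:
    """Map a row-90 trail byte to a character using ROW90_PARTIAL."""
    cp = ROW90_PARTIAL.get(trail - 0x20)
    return chr(cp) if cp is not None else None


print(only_in_table_7_10(50), only_in_table_7_10(64))  # True False
print(decode_row90(0x21) == "\u26cc")                  # True
```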
 | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | | ⛌ 26CC 90-1 | ⛍ 26CD 90-2 | ❗︎ 2757 90-3 | ⛏ 26CF 90-4 | ⛐ 26D0 90-5 | ⛑ 26D1 90-6 | 90-7 | ⛒ 26D2 90-8 | ⛕ 26D5 90-9 | ⛓ 26D3 90-10 | ⛔︎ 26D4 90-11 | 90-12 | 90-13 | 90-14 | 90-15 |
3_ | WIKI |