ARIB STD B24 character set

From Wikipedia, the free encyclopedia
ARIB STD-B24 encoding
Standard: ARIB STD-B24 Volume 1
Classification: ISO 2022 profile/extension
Transforms / Encodes: ARIB STD-B24 Kanji, Kana and mosaic sets, JIS X 0201

ARIB STD-B24 Kanji set
Weather symbols: a few of the extended symbols included.
Language(s): Japanese, English, Russian
Partial support: Greek, Chinese
Standard: ARIB STD-B24 Volume 1
Classification: ISO-2022-structured CJK DBCS
Extends: JIS X 0208
Encoding formats:
  • ARIB STD-B24 encoding (ISO 2022 based)
  • Shift JIS (ARIB variant)[1]

Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language[2] specifies, among other details, a character encoding for use in Japanese-language broadcasting. The standard was introduced on 1999-10-26.[2] As of 2016-07-06, the latest revision was version 6.3.

The encoding includes a number of ARIB extended characters (ARIB外字, ARIB gaiji) not found in its base standards, JIS X 0208 and JIS X 0201. STD-B24 was the source standard for many symbol characters added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.[3] These additions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.[4]

Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji.[5] It also maps those characters it uses that lie outside the Basic Multilingual Plane (BMP) to code points in the BMP's Private Use Area.

Sets and codes

The ARIB STD-B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, two Katakana sets with distinct layouts, and four mosaic sets.[6] The sets are selected with the ISO 2022 designation mechanisms for 94-character sets (94 × 94 for the double-byte Kanji set), using the following codes (proportional sets use the same layout as the corresponding non-proportional ones):[7]

Set | Type | Final byte (column/line) | Final byte (hex) | Final byte (ASCII) | Comments
Kanji | 2-byte | 4/2 | 0x42 | B | The final byte B used for the ARIB Kanji set[7] is the one ISO-2022-JP uses for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension).[8][9]
Alphanumeric | 1-byte | 4/10 | 0x4A | J | JIS_C6220-ro (ISO646-JP, the JIS X 0201 Roman set). Similar to ASCII, with two assignments differing. Final byte J matches its usage in ISO-2022-JP.[9]
Proportional alphanumeric | 1-byte | 3/6 | 0x36 | 6 |
Hiragana | 1-byte | 3/0 | 0x30 | 0 | The hiragana follow the same layout as row 4 of JIS X 0208, but without a lead byte. The set also adds several punctuation assignments.
Proportional Hiragana | 1-byte | 3/7 | 0x37 | 7 |
Katakana | 1-byte | 3/1 | 0x31 | 1 | The katakana follow the same layout as row 5 of JIS X 0208, but without a lead byte. The set also adds several punctuation assignments.
Proportional Katakana | 1-byte | 3/8 | 0x38 | 8 |
JIS X 0201 Katakana | 1-byte | 4/9 | 0x49 | I | JIS_C6220-jp (the JIS X 0201 Kana set). Final byte I matches its usage in ISO-2022-JP-3.
Mosaic A | 1-byte | 3/2 | 0x32 | 2 | Pseudographics
Mosaic B | 1-byte | 3/3 | 0x33 | 3 |
Mosaic C | 1-byte | 3/4 | 0x34 | 4 | Non-spacing pseudographics
Mosaic D | 1-byte | 3/5 | 0x35 | 5 |
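As a sketch of how the codes in this table are used, the Python below builds ISO/IEC 2022 designation escape sequences from them. It assumes ARIB applies the generic ISO/IEC 2022 forms (ESC ( F and its siblings for one-byte sets; ESC $ F or ESC $ ( F and siblings for the double-byte Kanji set). The names `FINAL_BYTES` and `designate` are hypothetical, and DRCS sets are not covered.

```python
# Sketch: ISO/IEC 2022 designation escape sequences for the ARIB STD-B24
# graphic sets, built from the codes (final bytes) in the table above.
# Assumption: the generic ISO/IEC 2022 forms apply; DRCS sets are not covered.

ESC = 0x1B

# Hypothetical lookup: set name -> (bytes per character, final byte F).
FINAL_BYTES = {
    "Kanji": (2, 0x42),
    "Alphanumeric": (1, 0x4A),
    "Proportional alphanumeric": (1, 0x36),
    "Hiragana": (1, 0x30),
    "Proportional Hiragana": (1, 0x37),
    "Katakana": (1, 0x31),
    "Proportional Katakana": (1, 0x38),
    "JIS X 0201 Katakana": (1, 0x49),
    "Mosaic A": (1, 0x32),
    "Mosaic B": (1, 0x33),
    "Mosaic C": (1, 0x34),
    "Mosaic D": (1, 0x35),
}

# Intermediate bytes '(' ')' '*' '+' select buffers G0..G3 for 94-sets.
G_INTERMEDIATE = (0x28, 0x29, 0x2A, 0x2B)


def designate(set_name: str, g: int) -> bytes:
    """Return the escape sequence that designates set_name to buffer Gg."""
    if g not in (0, 1, 2, 3):
        raise ValueError("g must be 0, 1, 2 or 3")
    width, final = FINAL_BYTES[set_name]
    if width == 1:
        # One-byte 94-set: ESC I F.
        return bytes([ESC, G_INTERMEDIATE[g], final])
    if g == 0 and final in (0x40, 0x41, 0x42):
        # Short form ESC $ F, which ISO/IEC 2022 keeps for F = 4/0..4/2
        # into G0 (ISO-2022-JP's ESC $ B is an instance).
        return bytes([ESC, 0x24, final])
    # Multi-byte 94^n-set: ESC $ I F.
    return bytes([ESC, 0x24, G_INTERMEDIATE[g], final])


if __name__ == "__main__":
    for name in ("Kanji", "Hiragana", "Katakana", "Mosaic A"):
        print(f"{name}: {designate(name, 0)!r}")
```

Designation only loads a set into one of the buffers G0 to G3; a decoder still has to invoke that buffer (for example into GL) before its bytes are interpreted with that set.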

Code charts

Kanji (double-byte) set

This is a double-byte character set extending JIS X 0208.

Lead byte

Each encoding byte is the row number (lead byte) or cell number (trail byte) plus 0x20, or 32 in decimal (see below). Hence lead byte 0x21 selects row 1, cell 1 of that row has the trail byte 0x21 (33), and so forth. Most of the code space corresponds to JIS X 0208; exceptions are shown with a heavy border.
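The row/cell arithmetic can be sketched in Python as follows; the helper names are hypothetical. The cross-check at the end relies on row 4 (Hiragana) being shared with JIS X 0208 and on Python's standard `euc_jp` codec, which stores each JIS X 0208 byte with its high bit set.

```python
from typing import Tuple

OFFSET = 0x20  # each byte = row or cell number + 0x20


def kuten_to_bytes(row: int, cell: int) -> bytes:
    """Encode a (row, cell) pair, each 1..94, as a two-byte code."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([row + OFFSET, cell + OFFSET])


def bytes_to_kuten(code: bytes) -> Tuple[int, int]:
    """Decode a two-byte code (each byte 0x21..0x7E) into (row, cell)."""
    if len(code) != 2 or not all(0x21 <= b <= 0x7E for b in code):
        raise ValueError("expected two bytes in 0x21..0x7E")
    return code[0] - OFFSET, code[1] - OFFSET


if __name__ == "__main__":
    print(kuten_to_bytes(90, 1).hex())  # 7a21
    print(bytes_to_kuten(b"\x21\x21"))  # (1, 1)
    # Row 4 is shared with JIS X 0208, so compare against Python's euc_jp
    # codec, which stores each JIS X 0208 byte with the high bit set.
    euc = bytes(b | 0x80 for b in kuten_to_bytes(4, 2))
    print(euc.decode("euc_jp"))  # あ
```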

ARIB STD-B24 Kanji (double-byte) set (lead bytes)
Lead byte | Row | Contents
0x20 | (none) | SP (U+0020)
0x21 | 1 | Punctuation
0x22 | 2 | Symbols
0x23 | 3 | Alphanumerics
0x24 | 4 | Hiragana
0x25 | 5 | Katakana
0x26 | 6 | Greek
0x27 | 7 | Cyrillic
0x28 | 8 | Box drawing
0x29–0x2F | 9–15 | (unassigned)
0x30–0x4F | 16–47 | Kanji level 1
0x50–0x74 | 48–84 | Kanji level 2
0x75–0x79 | 85–89 | (unassigned)
0x7A | 90 | Traffic
0x7B | 91 | Map
0x7C | 92 | Miscellaneous
0x7D | 93 | Miscellaneous
0x7E | 94 | List
0x7F | (none) | DEL (U+007F)
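The lead-byte chart can also be expressed as a range lookup. This Python sketch uses row descriptions adapted from the chart (the names and the use of `None` for empty rows are illustrative choices, not part of the standard) and finds the range containing a lead byte with `bisect`.

```python
from bisect import bisect_right
from typing import Optional

# (first lead byte, description) pairs adapted from the lead-byte chart; each
# range extends up to the next entry's first byte. None marks empty rows.
LEAD_RANGES = [
    (0x21, "Punctuation"),
    (0x22, "Symbols"),
    (0x23, "Alphanumerics"),
    (0x24, "Hiragana"),
    (0x25, "Katakana"),
    (0x26, "Greek"),
    (0x27, "Cyrillic"),
    (0x28, "Box drawing"),
    (0x29, None),              # rows 9-15
    (0x30, "Kanji level 1"),
    (0x50, "Kanji level 2"),
    (0x75, None),              # rows 85-89
    (0x7A, "Traffic"),
    (0x7B, "Map"),
    (0x7C, "Miscellaneous"),   # rows 92 and 93
    (0x7E, "List"),
]
STARTS = [start for start, _ in LEAD_RANGES]


def classify_lead_byte(lead: int) -> Optional[str]:
    """Return the chart's description of the row selected by a lead byte."""
    if not 0x21 <= lead <= 0x7E:
        raise ValueError("lead byte must be in 0x21..0x7E")
    # bisect_right locates the last range whose first byte is <= lead.
    return LEAD_RANGES[bisect_right(STARTS, lead) - 1][1]


if __name__ == "__main__":
    for lead in (0x24, 0x2F, 0x4F, 0x50, 0x7D):
        print(hex(lead), classify_lead_byte(lead))
```

Storing only each range's first byte keeps the table as short as the chart itself, at the cost of an O(log n) search instead of a direct index.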

Character sets 0x21-0x74 (row numbers 1-84: punctuation, alphabets, numbers, Kana, Kanji)

Character set 0x7A (row number 90, traffic symbols)

Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below with a heavy border) are listed in the B24 standard only in table 7-10, the list of extension characters. They are also the only characters in rows 90 and 91 that are not transport-related symbols, as the B24 standard notes in an endnote to table 7-10.[10] The remaining extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.[10]
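These distinctions can be checked mechanically. The Python sketch below, with hypothetical names, decodes a row-90 code into its cell number, flags the cells that the paragraph above says appear only in table 7-10, and returns a Unicode character for the cells whose mapping is shown in the chart that follows. Only cells 1 to 11 are mapped; no other mappings are assumed.

```python
from typing import Optional, Tuple

# Cells of row 90 listed only in table 7-10: 90-45..90-63 and 90-66..90-84.
TABLE_7_10_ONLY = frozenset(range(45, 64)) | frozenset(range(66, 85))

# Partial mapping copied from the row-90 chart (cells 1-11 only; cell 7 has
# no code point listed). Cells missing from this dict are treated as unknown.
ROW90_UNICODE = {
    1: 0x26CC, 2: 0x26CD, 3: 0x2757, 4: 0x26CF, 5: 0x26D0, 6: 0x26D1,
    8: 0x26D2, 9: 0x26D5, 10: 0x26D3, 11: 0x26D4,
}


def describe_row90(code: bytes) -> Tuple[int, bool, Optional[str]]:
    """Return (cell, listed only in table 7-10?, Unicode char or None)."""
    if len(code) != 2 or code[0] != 0x7A or not 0x21 <= code[1] <= 0x7E:
        raise ValueError("expected 0x7A followed by a byte in 0x21..0x7E")
    cell = code[1] - 0x20
    cp = ROW90_UNICODE.get(cell)
    char = chr(cp) if cp is not None else None
    return cell, cell in TABLE_7_10_ONLY, char


if __name__ == "__main__":
    for trail in (0x21, 0x23, 0x27, 0x20 + 45):
        print(describe_row90(bytes([0x7A, trail])))
```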

ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7A)[5][11]
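Bytes | Row-cell | Unicode | Glyph
0x7A 0x21 | 90-1 | U+26CC |
0x7A 0x22 | 90-2 | U+26CD |
0x7A 0x23 | 90-3 | U+2757 | ❗︎
0x7A 0x24 | 90-4 | U+26CF |
0x7A 0x25 | 90-5 | U+26D0 |
0x7A 0x26 | 90-6 | U+26D1 |
0x7A 0x27 | 90-7 | (none listed) |
0x7A 0x28 | 90-8 | U+26D2 |
0x7A 0x29 | 90-9 | U+26D5 |
0x7A 0x2A | 90-10 | U+26D3 |
0x7A 0x2B | 90-11 | U+26D4 | ⛔︎
0x7A 0x2C | 90-12 | (none listed) |
0x7A 0x2D | 90-13 | (none listed) |
0x7A 0x2E | 90-14 | (none listed) |
0x7A 0x2F | 90-15 | (none listed) |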
3_ WIKI