JIS X 0208
Alias(es) | JIS C 6226 |
---|---|
Language(s) | Japanese, English, Russian; partial support: Greek, Chinese |
Standard | JIS X 0208:1978 through 1997 |
Classification | ISO 2022, DBCS, CJK encoding |
Extensions | ARIB STD B24 Kanji, NEC PC98 DBCS |
Encoding formats | |
Preceded by | JIS X 0201 |
Succeeded by | JIS X 0213 |
Other related encoding(s) | KS X 1001, GB 2312, JIS X 0212 |
JIS X 0208 is a 2-byte character set specified as a Japanese Industrial Standard, containing 6879 graphic characters suitable for writing text, place names, personal names, and so forth in the Japanese language. The official title of the current standard is 7-bit and 8-bit double byte coded KANJI sets for information interchange (7ビット及び8ビットの2バイト情報交換用符号化漢字集合, Nana-Bitto Oyobi Hachi-Bitto no Ni-Baito Jōhō Kōkan'yō Fugōka Kanji Shūgō). It was originally established as JIS C 6226 in 1978, and has been revised in 1983, 1990, and 1997. It is also called Code page 952 by IBM. The 1978 version is also called Code page 955 by IBM.
Scope of use and compatibility
The character set established by JIS X 0208 is intended primarily for information interchange (情報交換, jōhō kōkan) between data processing systems and the devices connected to them, or between data communication systems. It can also be used for data processing and text processing.
Partial implementations of the character set are not considered compatible. Even so, it is conjectured that Japanese computer systems implementing only the non-kanji and level 1 kanji were at one time envisaged, at least at the time of the first and second standards: the committee that drafted the first standard took care in dividing characters between level 1 and level 2, and the second standard then moved some variant characters (異体字, itaiji) between the levels. However, such implementations have never been specified as compatible, although examples such as the early NEC PC-9801 did exist.[1]
Although JIS X 0208:1997 contains provisions concerning compatibility, it is currently generally held that the standard neither certifies compatibility nor serves as an official manufacturing standard against which a self-declaration of compatibility could be made.[2] Consequently, JIS X 0208-"compatible" products are, de facto, not considered to exist. Terms such as "conformant" (準拠, junkyo) and "support" (対応, taiō) appear in JIS X 0208, but their meaning varies from person to person.
Code charts
Lead byte
In the 7-bit form, the first (lead) byte is the row number plus 0x20 (32 in decimal), and the second (trail) byte is likewise the cell number plus 0x20 (see below). Hence, lead byte 0x21 corresponds to row 1, cell 1 of that row has a trail byte of 0x21 (33), and so forth.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | SP 0020 |
Punct. LEAD 1-_ |
Symbol LEAD 2-_ |
Alnum. LEAD 3-_ |
Hiragana LEAD 4-_ |
Katakana LEAD 5-_ |
Greek LEAD 6-_ |
Cyrillic LEAD 7-_ |
Box LEAD 8-_ |
9-_ |
10-_ |
11-_ |
12-_ |
13-_ |
14-_ |
15-_ |
3_ | Kanji L1 LEAD 16-_ |
Kanji L1 LEAD 17-_ |
Kanji L1 LEAD 18-_ |
Kanji L1 LEAD 19-_ |
Kanji L1 LEAD 20-_ |
Kanji L1 LEAD 21-_ |
Kanji L1 LEAD 22-_ |
Kanji L1 LEAD 23-_ |
Kanji L1 LEAD 24-_ |
Kanji L1 LEAD 25-_ |
Kanji L1 LEAD 26-_ |
Kanji L1 LEAD 27-_ |
Kanji L1 LEAD 28-_ |
Kanji L1 LEAD 29-_ |
Kanji L1 LEAD 30-_ |
Kanji L1 LEAD 31-_ |
4_ | Kanji L1 LEAD 32-_ |
Kanji L1 LEAD 33-_ |
Kanji L1 LEAD 34-_ |
Kanji L1 LEAD 35-_ |
Kanji L1 LEAD 36-_ |
Kanji L1 LEAD 37-_ |
Kanji L1 LEAD 38-_ |
Kanji L1 LEAD 39-_ |
Kanji L1 LEAD 40-_ |
Kanji L1 LEAD 41-_ |
Kanji L1 LEAD 42-_ |
Kanji L1 LEAD 43-_ |
Kanji L1 LEAD 44-_ |
Kanji L1 LEAD 45-_ |
Kanji L1 LEAD 46-_ |
Kanji L1 LEAD 47-_ |
5_ | Kanji L2 LEAD 48-_ |
Kanji L2 LEAD 49-_ |
Kanji L2 LEAD 50-_ |
Kanji L2 LEAD 51-_ |
Kanji L2 LEAD 52-_ |
Kanji L2 LEAD 53-_ |
Kanji L2 LEAD 54-_ |
Kanji L2 LEAD 55-_ |
Kanji L2 LEAD 56-_ |
Kanji L2 LEAD 57-_ |
Kanji L2 LEAD 58-_ |
Kanji L2 LEAD 59-_ |
Kanji L2 LEAD 60-_ |
Kanji L2 LEAD 61-_ |
Kanji L2 LEAD 62-_ |
Kanji L2 LEAD 63-_ |
6_ | Kanji L2 LEAD 64-_ |
Kanji L2 LEAD 65-_ |
Kanji L2 LEAD 66-_ |
Kanji L2 LEAD 67-_ |
Kanji L2 LEAD 68-_ |
Kanji L2 LEAD 69-_ |
Kanji L2 LEAD 70-_ |
Kanji L2 LEAD 71-_ |
Kanji L2 LEAD 72-_ |
Kanji L2 LEAD 73-_ |
Kanji L2 LEAD 74-_ |
Kanji L2 LEAD 75-_ |
Kanji L2 LEAD 76-_ |
Kanji L2 LEAD 77-_ |
Kanji L2 LEAD 78-_ |
Kanji L2 LEAD 79-_ |
7_ | Kanji L2 LEAD 80-_ |
Kanji L2 LEAD 81-_ |
Kanji L2 LEAD 82-_ |
Kanji L2 LEAD 83-_ |
Kanji L2 LEAD 84-_ |
85-_ |
86-_ |
87-_ |
88-_ |
89-_ |
90-_ |
91-_ |
92-_ |
93-_ |
94-_ |
DEL 007F |
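The lead-byte chart above amounts to a simple lookup. The following Python sketch assumes the 7-bit form (lead byte = row number + 0x20) and uses the block boundaries shown in the chart; the function name and category labels are illustrative, not taken from the standard. Row 13 is reported as unassigned because it is empty in JIS X 0208 itself, even though vendor extensions use it.

```python
def lead_byte_block(lead: int) -> str:
    """Classify a 7-bit JIS X 0208 lead byte by the block it opens (hypothetical helper)."""
    if not 0x21 <= lead <= 0x7E:
        raise ValueError(f"0x{lead:02X} is not a 7-bit JIS X 0208 lead byte")
    row = lead - 0x20  # lead byte = row number + 0x20
    non_kanji = {1: "punctuation", 2: "symbols", 3: "alphanumerics", 4: "hiragana",
                 5: "katakana", 6: "Greek", 7: "Cyrillic", 8: "box drawing"}
    if row in non_kanji:
        return non_kanji[row]
    if 16 <= row <= 47:
        return "kanji level 1"
    if 48 <= row <= 84:
        return "kanji level 2"
    return "unassigned"  # rows 9-15 and 85-94


for lead in (0x21, 0x24, 0x2D, 0x30, 0x50, 0x74, 0x75):
    print(f"0x{lead:02X} -> row {lead - 0x20}: {lead_byte_block(lead)}")
```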
Non-Kanji rows
Character set 0x21 (row number 1, special characters)
Some vendors use slightly different Unicode mapping for this set than the one below. For example, Microsoft maps kuten 1-29 (JIS 0x213D) to U+2015 (Horizontal Bar),[3] whereas Apple maps it to U+2014 (Em Dash).[4] Similarly, Microsoft maps kuten 1-61 (JIS 0x215D) to U+FF0D[3] (the fullwidth form of U+002D Hyphen-Minus), and Apple maps it to U+2212 (Minus Sign).[4] Unicode mapping of the wave dash also differs between vendors. See the cells with footnotes below.
ASCII and JISCII punctuation may use alternative mappings to the Halfwidth and Fullwidth Forms block when used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as Shift JIS, EUC-JP or ISO 2022-JP.
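The vendor divergence is easy to observe with CPython's standard codecs, where `shift_jis` follows a strict JIS X 0208 mapping table and `cp932` follows Microsoft's. This is a minimal sketch (Python 3.9+ for the type hints); the byte pairs are the Shift JIS forms of the listed kuten cells, the results reflect CPython's codec tables rather than the standard itself, and Apple's mapping is not shown.

```python
# Shift JIS byte pairs for row 1 cells whose Unicode mapping differs by vendor.
CELLS = {
    "1-29": b"\x81\x5c",
    "1-33": b"\x81\x60",  # wave dash
    "1-34": b"\x81\x61",
    "1-61": b"\x81\x7c",  # minus sign
}


def mappings(raw: bytes) -> dict[str, str]:
    """Return the code point ("U+XXXX") each codec assigns to one byte pair."""
    return {codec: f"U+{ord(raw.decode(codec)):04X}"
            for codec in ("shift_jis", "cp932")}


for kuten, raw in CELLS.items():
    print(kuten, mappings(raw))
```

For the wave dash, for instance, `shift_jis` yields U+301C while `cp932` yields U+FF5E (FULLWIDTH TILDE).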
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | IDSP 3000 1-1 |
、 3001 1-2 |
。 3002 1-3 |
, 002C 1-4 |
. 002E 1-5 |
・ 30FB 1-6 |
: 003A 1-7 |
; 003B 1-8 |
? 003F 1-9 |
! 0021 1-10 |
゛ 309B 1-11 |
゜ 309C 1-12 |
´ 00B4 1-13 |
` 0060 1-14 |
¨ 00A8 1-15 | |
3_ | ^ 005E 1-16 |
‾ 203E 1-17 |
_ 005F 1-18 |
ヽ 30FD 1-19 |
ヾ 30FE 1-20 |
ゝ 309D 1-21 |
ゞ 309E 1-22 |
〃 3003 1-23 |
仝 4EDD 1-24 |
々 3005 1-25 |
〆 3006 1-26 |
〇 3007 1-27 |
ー 30FC 1-28 |
—[b] 2014 1-29 |
‐ 2010 1-30 |
/ 002F 1-31 |
4_ | \ 005C 1-32 |
〜[c] 301C 1-33 |
‖[d] 2016 1-34 |
| 007C 1-35 |
… 2026 1-36 |
‥ 2025 1-37 |
‘ 2018 1-38 |
’ 2019 1-39 |
“ 201C 1-40 |
” 201D 1-41 |
( 0028 1-42 |
) 0029 1-43 |
〔 3014 1-44 |
〕 3015 1-45 |
[ 005B 1-46 |
] 005D 1-47 |
5_ | { 007B 1-48 |
} 007D 1-49 |
〈 3008 1-50 |
〉 3009 1-51 |
《 300A 1-52 |
》 300B 1-53 |
「 300C 1-54 |
」 300D 1-55 |
『 300E 1-56 |
』 300F 1-57 |
【 3010 1-58 |
】 3011 1-59 |
+ 002B 1-60 |
−[e] 2212 1-61 |
± 00B1 1-62 |
× 00D7 1-63 |
6_ | ÷ 00F7 1-64 |
= 003D 1-65 |
≠ 2260 1-66 |
< 003C 1-67 |
> 003E 1-68 |
≦ 2266 1-69 |
≧ 2267 1-70 |
∞ 221E 1-71 |
∴ 2234 1-72 |
♂ 2642 1-73 |
♀ 2640 1-74 |
° 00B0 1-75 |
′ 2032 1-76 |
″ 2033 1-77 |
℃ 2103 1-78 |
¥ 00A5 1-79 |
7_ | $ 0024 1-80 |
¢ 00A2 1-81 |
£ 00A3 1-82 |
% 0025 1-83 |
# 0023 1-84 |
& 0026 1-85 |
* 002A 1-86 |
@ 0040 1-87 |
§ 00A7 1-88 |
☆ 2606 1-89 |
★ 2605 1-90 |
○ 25CB 1-91 |
● 25CF 1-92 |
◎ 25CE 1-93 |
◇ 25C7 1-94 |
Character set 0x22 (row number 2, special characters)
Most of the characters in this set were added in 1983, except for characters 0x2221–0x222E (kuten 2-1 through 2-14, or the first line of the chart below), which were included in the original 1978 version of the standard.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | ◆ 25C6 2-1 |
□ 25A1 2-2 |
■ 25A0 2-3 |
△ 25B3 2-4 |
▲ 25B2 2-5 |
▽ 25BD 2-6 |
▼ 25BC 2-7 |
※ 203B 2-8 |
〒 3012 2-9 |
→ 2192 2-10 |
← 2190 2-11 |
↑ 2191 2-12 |
↓ 2193 2-13 |
〓 3013 2-14 |
2-15 | |
3_ | 2-16 |
2-17 |
2-18 |
2-19 |
2-20 |
2-21 |
2-22 |
2-23 |
2-24 |
2-25 |
∈ 2208 2-26 |
∋ 220B 2-27 |
⊆ 2286 2-28 |
⊇ 2287 2-29 |
⊂ 2282 2-30 |
⊃ 2283 2-31 |
4_ | ∪ 222A 2-32 |
∩ 2229 2-33 |
2-34 |
2-35 |
2-36 |
2-37 |
2-38 |
2-39 |
2-40 |
2-41 |
∧ 2227 2-42 |
∨ 2228 2-43 |
¬ 00AC 2-44 |
⇒ 21D2 2-45 |
⇔ 21D4 2-46 |
∀ 2200 2-47 |
5_ | ∃ 2203 2-48 |
2-49 |
2-50 |
2-51 |
2-52 |
2-53 |
2-54 |
2-55 |
2-56 |
2-57 |
2-58 |
2-59 |
∠ 2220 2-60 |
⊥ 22A5 2-61 |
⌒ 2312 2-62 |
∂ 2202 2-63 |
6_ | ∇ 2207 2-64 |
≡ 2261 2-65 |
≒ 2252 2-66 |
≪ 226A 2-67 |
≫ 226B 2-68 |
√ 221A 2-69 |
∽ 223D 2-70 |
∝ 221D 2-71 |
∵ 2235 2-72 |
∫ 222B 2-73 |
∬ 222C 2-74 |
2-75 |
2-76 |
2-77 |
2-78 |
2-79 |
7_ | 2-80 |
2-81 |
Å 212B 2-82 |
‰ 2030 2-83 |
♯ 266F 2-84 |
♭ 266D 2-85 |
♪ 266A 2-86 |
† 2020 2-87 |
‡ 2021 2-88 |
¶ 00B6 2-89 |
2-90 |
2-91 |
2-92 |
2-93 |
◯ 25EF 2-94 |
Character set 0x23 (row number 3, digits and Roman)
This set includes a subset of the ISO 646 invariant set (and therefore also a subset of both ASCII and the JIS X 0201 Roman set), minus punctuation and symbols, comprising western Arabic numerals and both cases of the Basic Latin alphabet. Characters in this set may use alternative Unicode mappings to the Halfwidth and Fullwidth Forms block if used in an encoding which combines JIS X 0208 with ASCII or with JIS X 0201, such as EUC-JP, Shift JIS or ISO 2022-JP.
Compare row 3 of KPS 9566, which this row exactly matches. Compare and contrast row 3 of KS X 1001 and of GB 2312, which include their entire national variants of ISO 646 in this row, rather than only the alphanumeric subset.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | 3-1 |
3-2 |
3-3 |
3-4 |
3-5 |
3-6 |
3-7 |
3-8 |
3-9 |
3-10 |
3-11 |
3-12 |
3-13 |
3-14 |
3-15 | |
3_ | 0 0030 3-16 |
1 0031 3-17 |
2 0032 3-18 |
3 0033 3-19 |
4 0034 3-20 |
5 0035 3-21 |
6 0036 3-22 |
7 0037 3-23 |
8 0038 3-24 |
9 0039 3-25 |
3-26 |
3-27 |
3-28 |
3-29 |
3-30 |
3-31 |
4_ | 3-32 |
A 0041 3-33 |
B 0042 3-34 |
C 0043 3-35 |
D 0044 3-36 |
E 0045 3-37 |
F 0046 3-38 |
G 0047 3-39 |
H 0048 3-40 |
I 0049 3-41 |
J 004A 3-42 |
K 004B 3-43 |
L 004C 3-44 |
M 004D 3-45 |
N 004E 3-46 |
O 004F 3-47 |
5_ | P 0050 3-48 |
Q 0051 3-49 |
R 0052 3-50 |
S 0053 3-51 |
T 0054 3-52 |
U 0055 3-53 |
V 0056 3-54 |
W 0057 3-55 |
X 0058 3-56 |
Y 0059 3-57 |
Z 005A 3-58 |
3-59 |
3-60 |
3-61 |
3-62 |
3-63 |
6_ | 3-64 |
a 0061 3-65 |
b 0062 3-66 |
c 0063 3-67 |
d 0064 3-68 |
e 0065 3-69 |
f 0066 3-70 |
g 0067 3-71 |
h 0068 3-72 |
i 0069 3-73 |
j 006A 3-74 |
k 006B 3-75 |
l 006C 3-76 |
m 006D 3-77 |
n 006E 3-78 |
o 006F 3-79 |
7_ | p 0070 3-80 |
q 0071 3-81 |
r 0072 3-82 |
s 0073 3-83 |
t 0074 3-84 |
u 0075 3-85 |
v 0076 3-86 |
w 0077 3-87 |
x 0078 3-88 |
y 0079 3-89 |
z 007A 3-90 |
3-91 |
3-92 |
3-93 |
3-94 |
Character set 0x24 (row number 4, Hiragana)
This row contains Japanese Hiragana.
Compare row 4 of GB 2312, which matches this row. Compare and contrast row 10 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | ぁ 3041 4-1 |
あ 3042 4-2 |
ぃ 3043 4-3 |
い 3044 4-4 |
ぅ 3045 4-5 |
う 3046 4-6 |
ぇ 3047 4-7 |
え 3048 4-8 |
ぉ 3049 4-9 |
お 304A 4-10 |
か 304B 4-11 |
が 304C 4-12 |
き 304D 4-13 |
ぎ 304E 4-14 |
く 304F 4-15 | |
3_ | ぐ 3050 4-16 |
け 3051 4-17 |
げ 3052 4-18 |
こ 3053 4-19 |
ご 3054 4-20 |
さ 3055 4-21 |
ざ 3056 4-22 |
し 3057 4-23 |
じ 3058 4-24 |
す 3059 4-25 |
ず 305A 4-26 |
せ 305B 4-27 |
ぜ 305C 4-28 |
そ 305D 4-29 |
ぞ 305E 4-30 |
た 305F 4-31 |
4_ | だ 3060 4-32 |
ち 3061 4-33 |
ぢ 3062 4-34 |
っ 3063 4-35 |
つ 3064 4-36 |
づ 3065 4-37 |
て 3066 4-38 |
で 3067 4-39 |
と 3068 4-40 |
ど 3069 4-41 |
な 306A 4-42 |
に 306B 4-43 |
ぬ 306C 4-44 |
ね 306D 4-45 |
の 306E 4-46 |
は 306F 4-47 |
5_ | ば 3070 4-48 |
ぱ 3071 4-49 |
ひ 3072 4-50 |
び 3073 4-51 |
ぴ 3074 4-52 |
ふ 3075 4-53 |
ぶ 3076 4-54 |
ぷ 3077 4-55 |
へ 3078 4-56 |
べ 3079 4-57 |
ぺ 307A 4-58 |
ほ 307B 4-59 |
ぼ 307C 4-60 |
ぽ 307D 4-61 |
ま 307E 4-62 |
み 307F 4-63 |
6_ | む 3080 4-64 |
め 3081 4-65 |
も 3082 4-66 |
ゃ 3083 4-67 |
や 3084 4-68 |
ゅ 3085 4-69 |
ゆ 3086 4-70 |
ょ 3087 4-71 |
よ 3088 4-72 |
ら 3089 4-73 |
り 308A 4-74 |
る 308B 4-75 |
れ 308C 4-76 |
ろ 308D 4-77 |
ゎ 308E 4-78 |
わ 308F 4-79 |
7_ | ゐ 3090 4-80 |
ゑ 3091 4-81 |
を 3092 4-82 |
ん 3093 4-83 |
4-84 |
4-85 |
4-86 |
4-87 |
4-88 |
4-89 |
4-90 |
4-91 |
4-92 |
4-93 |
4-94 |
Character set 0x25 (row number 5, Katakana)
This row contains Japanese Katakana.
Compare row 5 of GB 2312, which matches this row. Compare and contrast row 11 of KPS 9566 and of KS X 1001, which use the same layout, but in a different row. Contrast the considerably different Katakana layout used by JIS X 0201.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | ァ 30A1 5-1 |
ア 30A2 5-2 |
ィ 30A3 5-3 |
イ 30A4 5-4 |
ゥ 30A5 5-5 |
ウ 30A6 5-6 |
ェ 30A7 5-7 |
エ 30A8 5-8 |
ォ 30A9 5-9 |
オ 30AA 5-10 |
カ 30AB 5-11 |
ガ 30AC 5-12 |
キ 30AD 5-13 |
ギ 30AE 5-14 |
ク 30AF 5-15 | |
3_ | グ 30B0 5-16 |
ケ 30B1 5-17 |
ゲ 30B2 5-18 |
コ 30B3 5-19 |
ゴ 30B4 5-20 |
サ 30B5 5-21 |
ザ 30B6 5-22 |
シ 30B7 5-23 |
ジ 30B8 5-24 |
ス 30B9 5-25 |
ズ 30BA 5-26 |
セ 30BB 5-27 |
ゼ 30BC 5-28 |
ソ 30BD 5-29 |
ゾ 30BE 5-30 |
タ 30BF 5-31 |
4_ | ダ 30C0 5-32 |
チ 30C1 5-33 |
ヂ 30C2 5-34 |
ッ 30C3 5-35 |
ツ 30C4 5-36 |
ヅ 30C5 5-37 |
テ 30C6 5-38 |
デ 30C7 5-39 |
ト 30C8 5-40 |
ド 30C9 5-41 |
ナ 30CA 5-42 |
ニ 30CB 5-43 |
ヌ 30CC 5-44 |
ネ 30CD 5-45 |
ノ 30CE 5-46 |
ハ 30CF 5-47 |
5_ | バ 30D0 5-48 |
パ 30D1 5-49 |
ヒ 30D2 5-50 |
ビ 30D3 5-51 |
ピ 30D4 5-52 |
フ 30D5 5-53 |
ブ 30D6 5-54 |
プ 30D7 5-55 |
ヘ 30D8 5-56 |
ベ 30D9 5-57 |
ペ 30DA 5-58 |
ホ 30DB 5-59 |
ボ 30DC 5-60 |
ポ 30DD 5-61 |
マ 30DE 5-62 |
ミ 30DF 5-63 |
6_ | ム 30E0 5-64 |
メ 30E1 5-65 |
モ 30E2 5-66 |
ャ 30E3 5-67 |
ヤ 30E4 5-68 |
ュ 30E5 5-69 |
ユ 30E6 5-70 |
ョ 30E7 5-71 |
ヨ 30E8 5-72 |
ラ 30E9 5-73 |
リ 30EA 5-74 |
ル 30EB 5-75 |
レ 30EC 5-76 |
ロ 30ED 5-77 |
ヮ 30EE 5-78 |
ワ 30EF 5-79 |
7_ | ヰ 30F0 5-80 |
ヱ 30F1 5-81 |
ヲ 30F2 5-82 |
ン 30F3 5-83 |
ヴ 30F4 5-84 |
ヵ 30F5 5-85 |
ヶ 30F6 5-86 |
5-87 |
5-88 |
5-89 |
5-90 |
5-91 |
5-92 |
5-93 |
5-94 |
Character set 0x26 (row number 6, Greek)
This row contains basic support for the modern Greek alphabet, without diacritics or the final sigma.
Compare row 6 of GB 2312 and GB 12345 and row 6 of KPS 9566, which include the same Greek letters in the same layout, although GB 12345 adds vertical presentation forms and KPS 9566 adds Roman numerals. Compare and contrast row 5 of KS X 1001, which offsets the Greek letters to include the Roman numerals first.
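Because the row omits the final sigma, its layout lines up with the Unicode Greek block, which leaves U+03A2 unassigned and places the final sigma at U+03C2. The Python sketch below rebuilds row 6 from those two code point ranges and checks the result against CPython's `euc_jp` codec; the helper names are hypothetical.

```python
def row6_table() -> dict[int, int]:
    """Cell number -> code point for JIS X 0208 row 6 (Greek)."""
    # 24 capitals: U+0391-U+03A9, skipping the unassigned U+03A2.
    capitals = [cp for cp in range(0x0391, 0x03AA) if cp != 0x03A2]
    # 24 small letters: U+03B1-U+03C9, skipping the final sigma U+03C2.
    smalls = [cp for cp in range(0x03B1, 0x03CA) if cp != 0x03C2]
    table = dict(enumerate(capitals, start=1))  # cells 6-1 to 6-24
    table.update(enumerate(smalls, start=33))   # cells 6-33 to 6-56
    return table


def matches_codec(table: dict[int, int]) -> bool:
    # EUC-JP stores row 6 as byte 0xA6 and cell n as 0xA0 + n.
    return all(bytes([0xA6, 0xA0 + cell]).decode("euc_jp") == chr(cp)
               for cell, cp in table.items())


table = row6_table()
print(len(table), matches_codec(table))  # 48 True
```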
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | Α 0391 6-1 |
Β 0392 6-2 |
Γ 0393 6-3 |
Δ 0394 6-4 |
Ε 0395 6-5 |
Ζ 0396 6-6 |
Η 0397 6-7 |
Θ 0398 6-8 |
Ι 0399 6-9 |
Κ 039A 6-10 |
Λ 039B 6-11 |
Μ 039C 6-12 |
Ν 039D 6-13 |
Ξ 039E 6-14 |
Ο 039F 6-15 | |
3_ | Π 03A0 6-16 |
Ρ 03A1 6-17 |
Σ 03A3 6-18 |
Τ 03A4 6-19 |
Υ 03A5 6-20 |
Φ 03A6 6-21 |
Χ 03A7 6-22 |
Ψ 03A8 6-23 |
Ω 03A9 6-24 |
6-25 |
6-26 |
6-27 |
6-28 |
6-29 |
6-30 |
6-31 |
4_ | 6-32 |
α 03B1 6-33 |
β 03B2 6-34 |
γ 03B3 6-35 |
δ 03B4 6-36 |
ε 03B5 6-37 |
ζ 03B6 6-38 |
η 03B7 6-39 |
θ 03B8 6-40 |
ι 03B9 6-41 |
κ 03BA 6-42 |
λ 03BB 6-43 |
μ 03BC 6-44 |
ν 03BD 6-45 |
ξ 03BE 6-46 |
ο 03BF 6-47 |
5_ | π 03C0 6-48 |
ρ 03C1 6-49 |
σ 03C3 6-50 |
τ 03C4 6-51 |
υ 03C5 6-52 |
φ 03C6 6-53 |
χ 03C7 6-54 |
ψ 03C8 6-55 |
ω 03C9 6-56 |
6-57 |
6-58 |
6-59 |
6-60 |
6-61 |
6-62 |
6-63 |
6_ | 6-64 |
6-65 |
6-66 |
6-67 |
6-68 |
6-69 |
6-70 |
6-71 |
6-72 |
6-73 |
6-74 |
6-75 |
6-76 |
6-77 |
6-78 |
6-79 |
7_ | 6-80 |
6-81 |
6-82 |
6-83 |
6-84 |
6-85 |
6-86 |
6-87 |
6-88 |
6-89 |
6-90 |
6-91 |
6-92 |
6-93 |
6-94 |
Character set 0x27 (row number 7, Cyrillic)
This row contains the modern Russian alphabet and is not necessarily sufficient for representing other forms of the Cyrillic script.
Compare row 7 of GB 2312, which matches this row. Compare and contrast row 12 of KS X 1001 and row 5 of KPS 9566, which use the same layout (but in a different row).
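The row follows Russian alphabetical order, so Ё sits between Е and Ж even though Unicode encodes it apart from the other capitals (U+0401, outside U+0410 to U+042F). A small Python check through CPython's `euc_jp` codec; the function name is illustrative.

```python
def decode_row7_capitals() -> str:
    """Decode cells 7-1 to 7-33 (the capitals) through CPython's euc_jp codec.

    EUC-JP stores row 7 as byte 0xA7 and cell n as 0xA0 + n.
    """
    return "".join(bytes([0xA7, 0xA0 + cell]).decode("euc_jp")
                   for cell in range(1, 34))


capitals = decode_row7_capitals()
print(capitals)                   # JIS X 0208 order: Ё directly after Е
print("".join(sorted(capitals)))  # code point order: Ё (U+0401) moves to the front
```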
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | А 0410 7-1 |
Б 0411 7-2 |
В 0412 7-3 |
Г 0413 7-4 |
Д 0414 7-5 |
Е 0415 7-6 |
Ё 0401 7-7 |
Ж 0416 7-8 |
З 0417 7-9 |
И 0418 7-10 |
Й 0419 7-11 |
К 041A 7-12 |
Л 041B 7-13 |
М 041C 7-14 |
Н 041D 7-15 | |
3_ | О 041E 7-16 |
П 041F 7-17 |
Р 0420 7-18 |
С 0421 7-19 |
Т 0422 7-20 |
У 0423 7-21 |
Ф 0424 7-22 |
Х 0425 7-23 |
Ц 0426 7-24 |
Ч 0427 7-25 |
Ш 0428 7-26 |
Щ 0429 7-27 |
Ъ 042A 7-28 |
Ы 042B 7-29 |
Ь 042C 7-30 |
Э 042D 7-31 |
4_ | Ю 042E 7-32 |
Я 042F 7-33 |
7-34 |
7-35 |
7-36 |
7-37 |
7-38 |
7-39 |
7-40 |
7-41 |
7-42 |
7-43 |
7-44 |
7-45 |
7-46 |
7-47 |
5_ | 7-48 |
а 0430 7-49 |
б 0431 7-50 |
в 0432 7-51 |
г 0433 7-52 |
д 0434 7-53 |
е 0435 7-54 |
ё 0451 7-55 |
ж 0436 7-56 |
з 0437 7-57 |
и 0438 7-58 |
й 0439 7-59 |
к 043A 7-60 |
л 043B 7-61 |
м 043C 7-62 |
н 043D 7-63 |
6_ | о 043E 7-64 |
п 043F 7-65 |
р 0440 7-66 |
с 0441 7-67 |
т 0442 7-68 |
у 0443 7-69 |
ф 0444 7-70 |
х 0445 7-71 |
ц 0446 7-72 |
ч 0447 7-73 |
ш 0448 7-74 |
щ 0449 7-75 |
ъ 044A 7-76 |
ы 044B 7-77 |
ь 044C 7-78 |
э 044D 7-79 |
7_ | ю 044E 7-80 |
я 044F 7-81 |
7-82 |
7-83 |
7-84 |
7-85 |
7-86 |
7-87 |
7-88 |
7-89 |
7-90 |
7-91 |
7-92 |
7-93 |
7-94 |
Character set 0x28 (row number 8, box drawing)
All characters in this set were added in 1983, and were not present in the original 1978 revision of the standard.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | ─ 2500 8-1 |
│ 2502 8-2 |
┌ 250C 8-3 |
┐ 2510 8-4 |
┘ 2518 8-5 |
└ 2514 8-6 |
├ 251C 8-7 |
┬ 252C 8-8 |
┤ 2524 8-9 |
┴ 2534 8-10 |
┼ 253C 8-11 |
━ 2501 8-12 |
┃ 2503 8-13 |
┏ 250F 8-14 |
┓ 2513 8-15 | |
3_ | ┛ 251B 8-16 |
┗ 2517 8-17 |
┣ 2523 8-18 |
┳ 2533 8-19 |
┫ 252B 8-20 |
┻ 253B 8-21 |
╋ 254B 8-22 |
┠ 2520 8-23 |
┯ 252F 8-24 |
┨ 2528 8-25 |
┷ 2537 8-26 |
┿ 253F 8-27 |
┝ 251D 8-28 |
┰ 2530 8-29 |
┥ 2525 8-30 |
┸ 2538 8-31 |
4_ | ╂ 2542 8-32 |
8-33 |
8-34 |
8-35 |
8-36 |
8-37 |
8-38 |
8-39 |
8-40 |
8-41 |
8-42 |
8-43 |
8-44 |
8-45 |
8-46 |
8-47 |
5_ | 8-48 |
8-49 |
8-50 |
8-51 |
8-52 |
8-53 |
8-54 |
8-55 |
8-56 |
8-57 |
8-58 |
8-59 |
8-60 |
8-61 |
8-62 |
8-63 |
6_ | 8-64 |
8-65 |
8-66 |
8-67 |
8-68 |
8-69 |
8-70 |
8-71 |
8-72 |
8-73 |
8-74 |
8-75 |
8-76 |
8-77 |
8-78 |
8-79 |
7_ | 8-80 |
8-81 |
8-82 |
8-83 |
8-84 |
8-85 |
8-86 |
8-87 |
8-88 |
8-89 |
8-90 |
8-91 |
8-92 |
8-93 |
8-94 |
Extension character set 0x2D (row number 13, NEC special characters)
Rows 9 through 15 of the JIS X 0208 standard are left empty.
However, the following layout for row 13, first introduced by NEC, is a common extension. It is used (with minor variations, noted in footnotes) by Windows-932[3] (which the WHATWG Encoding Standard used by HTML5 matches), by the PostScript variant of MacJapanese (but, since KanjiTalk version 7, not the regular variant),[5] and by JIS X 0213 (the successor to JIS X 0208).[5][6] Whereas the other extensions made by Windows-932/WHATWG and by JIS X 0213 collide with each other, the two agree on this row, so decoding of most of this row is better supported than that of JIS X 0213's other extensions.
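CPython's standard codecs illustrate this agreement: the plain `shift_jis` codec covers JIS X 0208 alone, while `cp932` (Windows-932) and `shift_jis_2004` (JIS X 0213) both decode row 13. A minimal sketch; `decode_or_none` is a hypothetical helper, and the byte pair is the Shift JIS form of kuten 13-1.

```python
from typing import Optional


def decode_or_none(raw: bytes, codec: str) -> Optional[str]:
    """Decode raw with the named codec, returning None if the codec rejects it."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


# Kuten 13-1 in Shift JIS form: row 13 gives lead byte 0x87, cell 1 gives 0x40.
CIRCLED_ONE = b"\x87\x40"
for codec in ("shift_jis", "cp932", "shift_jis_2004"):
    print(codec, decode_or_none(CIRCLED_ONE, codec))
# shift_jis None
# cp932 ①
# shift_jis_2004 ①
```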
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ | ① 2460 13-1 |
② 2461 13-2 |
③ 2462 13-3 |
④ 2463 13-4 |
⑤ 2464 13-5 |
⑥ 2465 13-6 |
⑦ 2466 13-7 |
⑧ 2467 13-8 |
⑨ 2468 13-9 |
⑩ 2469 13-10 |
⑪ 246A 13-11 |
⑫ 246B 13-12 |
⑬ 246C 13-13 |
⑭ 246D 13-14 |
⑮ 246E 13-15 | |
3_ | ⑯ 246F 13-16 |
⑰ 2470 13-17 |
⑱ 2471 13-18 |
⑲ 2472 13-19 |
⑳ 2473 13-20 |
Ⅰ 2160 13-21 |
Ⅱ 2161 13-22 |
Ⅲ 2162 13-23 |
Ⅳ 2163 13-24 |
Ⅴ 2164 13-25 |
Ⅵ 2165 13-26 |
Ⅶ 2166 13-27 |
Ⅷ 2167 13-28 |
Ⅸ 2168 13-29 |
Ⅹ 2169 13-30 |
Ⅺ[f] 216A 13-31 |
4_ | ㍉ 3349 13-32 |
㌔ 3314 13-33 |
㌢ 3322 13-34 |
㍍ 334D 13-35 |
㌘ 3318 13-36 |
㌧ 3327 13-37 |
㌃ 3303 13-38 |
㌶ 3336 13-39 |
㍑ 3351 13-40 |
㍗ 3357 13-41 |
㌍ 330D 13-42 |
㌦ 3326 13-43 |
㌣ 3323 13-44 |
㌫ 332B 13-45 |
㍊ 334A 13-46 |
㌻ 333B 13-47 |
5_ | ㎜ 339C 13-48 |
㎝ 339D 13-49 |
㎞ 339E 13-50 |
㎎ 338E 13-51 |
㎏ 338F 13-52 |
㏄ 33C4 13-53 |
㎡ 33A1 13-54 |
Ⅻ[f] 216B 13-55 |
13-56 |
13-57 |
13-58 |
13-59 |
13-60 |
13-61 |
13-62 |
㍻[g] 337B 13-63 |
6_ | 〝 301D 13-64 |
〟 301F 13-65 |
№ 2116 13-66 |
㏍ 33CD 13-67 |
℡ 2121 13-68 |
㊤ 32A4 13-69 |
㊥ 32A5 13-70 |
㊦ 32A6 13-71 |
㊧ 32A7 13-72 |
㊨ 32A8 13-73 |
㈱ 3231 13-74 |
㈲ 3232 13-75 |
㈹ 3239 13-76 |
㍾ 337E 13-77 |
㍽ 337D 13-78 |
㍼ 337C 13-79 |
7_ | ≒[h] 2252 13-80 |
≡[h] 2261 13-81 |
∫[h] 222B 13-82 |
∮ 222E 13-83 |
∑ 2211 13-84 |
√[h] 221A 13-85 |
⊥[h] 22A5 13-86 |
∠[h] 2220 13-87 |
∟ 221F 13-88 |
⊿ 22BF 13-89 |
∵[h] 2235 13-90 |
∩[h] 2229 13-91 |
∪[h] 222A 13-92 |
❖[f] 2756 13-93 |
☞[f] 261E 13-94 |
Kanji rows
Code structure
Code points are identified by column/line numbers for single-byte codes and by kuten numbers for double-byte codes. To identify a character independently of any code, character names are used.
Single byte codes
Almost all JIS X 0208 graphic character codes are represented with two bytes of at least seven bits each. However, every control character, as well as the plain space – although not the ideographic space – is represented with a one-byte code. In order to represent the bit combination (ビット組合せ, bitto kumiawase) of a one-byte code, two decimal numbers – a column number and a line number – are used. Three high-order bits out of seven or four high-order bits out of eight, counting from zero to seven or from zero to fifteen respectively, form the column number. Four low-order bits counting from zero to fifteen form the line number. Each decimal number corresponds to one hexadecimal digit. For example, the bit combination corresponding to the graphic character "space" is 010 0000 as a 7-bit number, and 0010 0000 as an 8-bit number. In column/line notation, this is represented as 2/0. Other representations of the same single-byte code include 0x20 as hexadecimal, or 32 as a single decimal number.
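The column/line arithmetic can be sketched in Python as follows; the function names are illustrative. The column is the byte shifted right by four bits (which yields the three high-order bits of a 7-bit code or the four of an 8-bit code) and the line is the low nibble, which is why each decimal number corresponds to one hexadecimal digit.

```python
def column_line(byte: int, bits: int = 8) -> str:
    """Return the column/line notation of a single-byte code, e.g. 0x20 -> "2/0"."""
    if not 0 <= byte < (1 << bits):
        raise ValueError(f"{byte:#x} does not fit in {bits} bits")
    return f"{byte >> 4}/{byte & 0x0F}"


def from_column_line(notation: str) -> int:
    """Inverse of column_line: "7/14" -> 0x7E."""
    column, line = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError(f"invalid column/line notation: {notation!r}")
    return column << 4 | line


print(column_line(0x20), column_line(0x7E), hex(from_column_line("2/13")))
# 2/0 7/14 0x2d
```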
Code points and code numbers
The double-byte codes are laid out in 94 numbered groups, each called a row (区, ku, lit. "section"). Every row contains 94 numbered codes, each called a cell (点, ten, lit. "point").[i] This makes a total of 8836 (94 × 94) possible code points (although not all are assigned, see below); these are laid out in the standard in a 94-line, 94-column code table.
A row number and a cell number (each numbered from 1 to 94, for a standard JIS X 0208 code) form a kuten (区点) point, which is used to represent double-byte code points. A code number or kuten number (区点番号, kuten bangō) is expressed in the form "row-cell", the row and cell numbers being separated by a hyphen. For example, the character "亜" has a code point at row 16, cell 1, so its code number is represented as "16-01".
In 7-bit JIS X 0208 (as might be switched to in JIS X 0202 / ISO-2022-JP), both bytes must be from the 94-byte range of 0x21 (used for row or cell number 1) through 0x7E (used for row or cell number 94), exactly corresponding to the range used for 7-bit ASCII printing characters, not counting the space. Accordingly, the encoded bytes are obtained by adding 0x20 (32) to each number.[7] For instance, the above example of 16-01 ("亜") would be represented by the bytes 0x30 0x21. The 8-bit EUC-JP instead uses the range 0xA1 through 0xFE (setting the high bit to 1), whereas other encodings such as Shift JIS use more complicated transforms. Shift JIS includes more encoding space than is needed for JIS X 0208 itself; some Shift JIS-specific extensions to JIS X 0208 make use of row numbers above 94.[8]
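The byte arithmetic for the three common encodings can be sketched as follows. The 7-bit and EUC-JP rules come straight from the text; the Shift JIS formula is the commonly published row-pairing transform, added here as an assumption rather than taken from this article. All function names are hypothetical, and the type hints need Python 3.9+.

```python
def parse_kuten(code: str) -> tuple[int, int]:
    """Split a "row-cell" code number such as "16-01" into two integers."""
    row, cell = (int(part) for part in code.split("-"))
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"{code!r} is outside the 94 x 94 table")
    return row, cell


def to_jis7(row: int, cell: int) -> bytes:
    # 7-bit form: add 0x20 to each number, giving bytes in 0x21-0x7E.
    return bytes([row + 0x20, cell + 0x20])


def to_euc_jp(row: int, cell: int) -> bytes:
    # EUC-JP: the same bytes with the high bit set, giving 0xA1-0xFE.
    return bytes(b | 0x80 for b in to_jis7(row, cell))


def to_shift_jis(row: int, cell: int) -> bytes:
    # Assumed transform: two rows share one lead byte (0x81-0x9F, then 0xE0-0xEF).
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F.
        trail = cell + (0x3F if cell <= 63 else 0x40)
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


row, cell = parse_kuten("16-01")
for encode in (to_jis7, to_euc_jp, to_shift_jis):
    print(encode.__name__, encode(row, cell).hex(" "))
# to_jis7 30 21
# to_euc_jp b0 a1
# to_shift_jis 88 9f
```

Decoding the Shift JIS output with Python's `shift_jis` codec should give back 亜, which makes a convenient cross-check of the assumed transform.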
This structure is also used in the Mainland Chinese GB 2312 (where it is natively known as 区位; qūwèi) and the South Korean KS C 5601 (currently KS X 1001; the ku and ten are respectively known as hang and yol).[9] The later JIS X 0213 extends this structure by having more than one plane (面, men, lit. "face") of rows, which is also the structure used by CNS 11643.
Unassigned code points
Among the double-byte codes, rows 9 to 15 and 85 to 94 are unassigned code points (空き領域, aki ryōiki); that is, no characters are assigned to them. Some cells in other rows are likewise unassigned.
In principle, the code points in these empty areas should not be used: unless the relevant parties have agreed in advance, characters (gaiji) for information interchange should not be assigned to them.
Even when characters are assigned to unassigned code points, graphic characters already defined in the standard should not be assigned to them, and the same character should not be assigned to more than one unassigned code point; that is, no character should be duplicated in the set.
Furthermore, when assigning characters to unassigned code points, care must be taken with the unification of kanji glyphs. For example, row 25 cell 66 corresponds to the kanji meaning "high" or "expensive"; both the form with a component resembling the "mouth" character (口) in the middle (高) and the less common form with a ladder-like construction in the same location (髙) are unified into the same code point. Consequently, restricting 25-66 to the "mouth" form and assigning the "ladder" form to an unassigned code point would technically violate the standard.
In practice, however, several vendor-specific Shift JIS variants, including Windows-932 and MacJapanese, encode vendor extensions in unallocated rows of the encoding space for JIS X 0208. Also, most of the codes unassigned in JIS X 0208 are assigned by the newer JIS X 0213 standard.
Character names
Each JIS X 0208 character is given a name. By using a character's name, it is possible to identify characters without relying on their codes. The names of characters are coordinated with other character set standards, notably the Universal Coded Character Set (UCS/Unicode), so this is one possible source of character mappings to character sets such as Unicode. For example, both the character at ISO/IEC 646 International Reference Version (US-ASCII) column 4 line 1 and the one at JIS X 0208 row 3 cell 33 have the name "LATIN CAPITAL LETTER A". Therefore, the character at 4/1 in ASCII and the character at 3-33 in JIS X 0208 can be regarded as the same character (although, in practice, an alternative mapping is often used for the JIS X 0208 character because encodings provide ASCII separately). Conversely, ASCII characters 2/2 (quotation mark), 2/7 (apostrophe), 2/13 (hyphen-minus), and 7/14 (tilde) can be determined not to exist in this standard.
Character names of non-kanji characters consist of uppercase Roman letters, spaces, and hyphens. Non-kanji characters are also given a Japanese-language common name (日本語通用名称, Nihongo tsūyō meishō), although such names are not specified for every character.[j] The names of kanji, on the other hand, are derived mechanically from the hexadecimal value of their UCS/Unicode code point: the name is formed by prefixing the code point with "CJK UNIFIED IDEOGRAPH-". For example, row 16 cell 1 (亜) corresponds to U+4E9C in UCS, so its name is "CJK UNIFIED IDEOGRAPH-4E9C". Kanji are not given Japanese common names.
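The naming rule for kanji is mechanical enough to check against Python's `unicodedata` module, which exposes Unicode character names. The sketch below decodes kuten 16-01 through CPython's `euc_jp` codec, derives the name by the rule above, and compares the two; `kanji_name` and `from_kuten_euc` are hypothetical helpers. The last line shows the alternative mapping in action: the codec maps 3-33 to FULLWIDTH LATIN CAPITAL LETTER A rather than to LATIN CAPITAL LETTER A.

```python
import unicodedata


def kanji_name(ch: str) -> str:
    """Build a kanji's name by prefixing its hex code point with the fixed string."""
    return f"CJK UNIFIED IDEOGRAPH-{ord(ch):04X}"


def from_kuten_euc(row: int, cell: int) -> str:
    """Decode one kuten cell via EUC-JP, where each number is stored plus 0xA0."""
    return bytes([row + 0xA0, cell + 0xA0]).decode("euc_jp")


kanji = from_kuten_euc(16, 1)
print(kanji, kanji_name(kanji))                      # 亜 CJK UNIFIED IDEOGRAPH-4E9C
print(unicodedata.name(kanji) == kanji_name(kanji))  # True
print(unicodedata.name(from_kuten_euc(3, 33)))       # FULLWIDTH LATIN CAPITAL LETTER A
```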
Kanji set
Overview
JIS X 0208 prescribes a set of 6879 graphical characters that correspond to two-byte codes with either seven or eight bits to the byte; in JIS X 0208, this is called the kanji set (漢字集合, kanji shūgō), which includes 6355 kanji as well as 524 non-kanji (非漢字, hikanji), including characters such as Latin letters, kana, and so forth.
- Special characters
- Occupies rows 1 and 2. There are 18 descriptor symbols (記述記号, kijutsu kigō) such as the "ideographic space" ( ), and the Japanese comma and period; eight diacritical marks such as dakuten and handakuten; 10 characters treated like kana or kanji (仮名又は漢字に準じるもの, kana mata wa kanji ni junjiru mono) such as the iteration mark; 22 bracket symbols (括弧記号, kakko kigō); 45 mathematical symbols (学術記号, gakujutsu kigō); and 32 unit symbols, which include the currency sign and the postal mark, for a total of 147 characters.
- Numerals
- Occupies part of row 3. The ten digits from "0" to "9".
- Latin letters
- Occupies part of row 3. The 26 letters of the English alphabet in uppercase and lowercase form for a total of 52.
- Hiragana
- Occupies row 4. Contains 48 unvoiced kana (including the obsolete wi and we), 20 voiced kana (dakuten), 5 semi-voiced kana (handakuten), 10 small kana for palatalized and assimilated sounds, for a total of 83 characters.
- Katakana
- Occupies row 5. There are 86 characters; in addition to the katakana equivalents of the hiragana characters, the small ka/ke kana (ヵ/ヶ) and the vu kana (ヴ).
- Greek letters
- Occupies row 6. The 24 letters of the Greek alphabet in uppercase and lowercase form (minus the final sigma) for a total of 48.
- Cyrillic letters
- Occupies row 7. The 33 letters of the Russian alphabet in uppercase and lowercase form for a total of 66.
- Box-drawing characters
- Occupies row 8. Thin segments, thick segments, and mixed thin and thick segments, 32 total.
- Kanji
- The 2965 characters of level 1 (第1水準, dai ichi suijun) from row 16 to row 47, and the 3390 characters of level 2 (第2水準, dai ni suijun) from row 48 to row 84 for a total of 6355.
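The totals in the list can be checked with a few lines of arithmetic. The sketch below also derives how many kanji sit in the last row of each level, and how many special characters spill into row 2, assuming (as the figures suggest) that every earlier row is full; the variable names are illustrative.

```python
# Figures copied from the list above.
NON_KANJI = {
    "special characters": 147, "numerals": 10, "Latin letters": 52,
    "hiragana": 83, "katakana": 86, "Greek": 48, "Cyrillic": 66,
    "box drawing": 32,
}
LEVEL_1, LEVEL_2 = 2965, 3390
ROW_SIZE = 94

non_kanji = sum(NON_KANJI.values())
kanji = LEVEL_1 + LEVEL_2
print(non_kanji, kanji, non_kanji + kanji)  # 524 6355 6879

# Assumption: every row before the last one of each group is full (94 cells).
last_row_level_1 = LEVEL_1 - (47 - 16) * ROW_SIZE  # rows 16-46 full, remainder in 47
last_row_level_2 = LEVEL_2 - (84 - 48) * ROW_SIZE  # rows 48-83 full, remainder in 84
row_2_special = NON_KANJI["special characters"] - ROW_SIZE  # row 1 full, rest in row 2
print(last_row_level_1, last_row_level_2, row_2_special)  # 51 6 53
```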
Special characters, numerals, and Latin characters
As for the special characters in the kanji set, some characters from the graphic character set of the International Reference Version (IRV) of ISO/IEC 646:1991 (equivalent to ASCII) are absent from JIS X 0208. These are the four characters mentioned above: "QUOTATION MARK", "APOSTROPHE", "HYPHEN-MINUS", and "TILDE". The first three are each split into several distinct characters in the kanji set (Nishimura, 1978; JIS X 0221-1:2001 standard, Section 3.8.7). The "TILDE" of IRV has no corresponding character in the kanji set.
In the following table, the ISO/IEC 646:1991 IRV characters in question are compared with their multiple equivalents in JIS X 0208, except for the IRV character "TILDE", which is compared with the "WAVE DASH" of JIS X 0208. The entries under the "Symbol" columns utilize UCS/Unicode code points, so the specifics of display may differ.
The ASCII/IRV characters without exact JIS X 0208 equivalents were later assigned code points by JIS X 0213; these are also listed below, as are Microsoft's mappings of the four characters.
Column/Line (IRV) | JIS X 0213[10] | Microsoft | Symbol (IRV) | Name (IRV) | Kuten (JIS X 0208) | Symbol (JIS X 0208) | Name (JIS X 0208) |
---|---|---|---|---|---|---|---|
2/2 | 1-2-16 | 92-94[A], 115-24[B] | " | QUOTATION MARK | 1-15 | ¨ | DIAERESIS |
2/2 | | | | | 1-40 | “ | LEFT DOUBLE QUOTATION MARK |
2/2 | | | | | 1-41 | ” | RIGHT DOUBLE QUOTATION MARK |
2/2 | | | | | 1-77 | ″ | DOUBLE PRIME |
2/7 | 1-2-15 | 92-93[A], 115-23[B] | ' | APOSTROPHE | 1-13 | ´ | ACUTE ACCENT |
2/7 | | | | | 1-38 | ‘ | LEFT SINGLE QUOTATION MARK |
2/7 | | | | | 1-39 | ’ | RIGHT SINGLE QUOTATION MARK |
2/7 | | | | | 1-76 | ′ | PRIME |
2/13 | 1-2-17 | 1-61[C] | - | HYPHEN-MINUS | 1-30 | ‐ | HYPHEN |
2/13 | | | | | 1-61 | − | MINUS SIGN |
7/14 | 1-2-18 | 1-33[D] | ~ | TILDE | (no corresponding character) | | |
(no corresponding character) | | | | | 1-33 | 〜 | WAVE DASH[D] |
- [A] From "NEC selection of IBM extensions". Occupies a code point unallocated in JIS X 0208.
- [B] From "IBM extensions". Outside the range of JIS X 0208, but encodable in Shift_JIS.
- [C] Microsoft treats the JIS minus sign as a fullwidth form of the hyphen-minus.
- [D] The wave dash is sometimes treated as a fullwidth form of the tilde, e.g. by Microsoft (see Tilde § Unicode and Shift JIS encoding of wave dash). The ASCII/IRV tilde is an ambiguous code point which may appear either as a tilde accent mark (˜) or as a dash with the same curvature (∼), although the dash is more common because the spacing accent has a separate code point in Windows-1252; there is no JIS X 0208 character for a tilde accent. Character 1-2-18 in JIS X 0213 is shown as a tilde accent in the code chart.[10]
As a result, the kanji set is not upward-compatible with ISO/IEC 646, which makes it the most widespread non-upward-compatible character set in the world. This is counted as one of the weak points of this standard.
This standard does not follow the arrangement of ISO/IEC 646, even for the 90 special characters, numerals, and Latin letters that the kanji set shares with the IRV set. These 90 characters are split between row 1 (punctuation) and row 3 (letters and numbers). However, row 3 does follow the ISO 646 arrangement for the 62 letters and numbers alone. For example, 4/1 ("A") in ISO 646 becomes 2/3 4/1 (i.e. 3-33) in JIS X 0208.
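The row 3 rule can be written out directly. The first byte is always 2/3 (0x23), and the second byte is the character's own ISO 646 code, so the cell number is that code minus 0x20. The Python sketch below uses a hypothetical helper, `irv_alnum_to_jis`. The last step wraps the resulting bytes in ISO-2022-JP escape sequences and decodes them with Python's built-in `iso2022_jp` codec. That codec maps the character to the Unicode fullwidth form.

```python
def irv_alnum_to_jis(ch: str) -> tuple[tuple[int, int], bytes]:
    """Return (kuten, 7-bit JIS bytes) for a single ASCII letter or digit.

    In row 3 the second byte is the character's own ISO 646 code and the
    first byte is always 2/3 (0x23), so the cell is simply code - 0x20.
    """
    if len(ch) != 1 or not (ch.isascii() and ch.isalnum()):
        raise ValueError(f"{ch!r} is not a single ASCII letter or digit")
    code = ord(ch)
    kuten = (0x23 - 0x20, code - 0x20)  # row 3, cell = IRV code - 32
    return kuten, bytes([0x23, code])


kuten, raw = irv_alnum_to_jis("A")
print(kuten, raw)  # (3, 33) b'#A'

# Wrapped in ISO-2022-JP escapes (ESC $ B ... ESC ( B), Python's
# iso2022_jp codec decodes the pair to the Unicode fullwidth form.
decoded = (b"\x1b$B" + raw + b"\x1b(B").decode("iso2022_jp")
print(f"U+{ord(decoded):04X}")  # U+FF21
```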
These incompatibilities are thought to explain two things. First, the numerals, Latin letters, and similar characters in the kanji set became "full-width alphanumeric characters" (全角英数字, zenkaku eisūji). Second, the original implementations interpreted these characters differently from their IRV counterparts.
Since the first standard, it has been possible to represent composite characters (合成, gōsei) by composition. These include encircled numbers, ligatures for measurement unit names, and Roman numerals.[11] They were not given independent kuten code points. Manufacturers of information systems could have represented these characters by composition for customers who needed them. However, none requested that they be added to the standard. Instead, manufacturers chose to offer them as proprietary gaiji.
In the fourth standard (1997), all these characters were explicitly defined as characters that advance the current position; that is, they are spacing characters. Furthermore, the standard ruled that they must not be formed by composing characters. As a result, Latin characters with diacritics can no longer be represented at all. The possible sole exception is the ångström symbol (Å) at row 2, cell 82.
Hiragana and katakana
Unlike the katakana of JIS X 0201, the hiragana and katakana in JIS X 0208 include dakuten and handakuten markings as part of a single character. The katakana wi (ヰ) and we (ヱ), both obsolete in modern Japanese, are also included, as is the small wa (ヮ). None of these three are in JIS X 0201.
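Python's built-in codecs show the difference. EUC-JP encodes JIS X 0208 by setting the high bit of both bytes. In EUC-JP, ガ is a single two-byte code at row 5, cell 12. The JIS X 0201 half-width form needs two characters: ｶ followed by a separate voiced-sound mark ﾞ. The sketch uses the `shift_jis` codec only as a convenient carrier for single-byte JIS X 0201 katakana. It then uses Unicode NFKC normalization (`unicodedata.normalize`) to show that the two-character sequence corresponds to the one precomposed character. The variable names are illustrative.

```python
import unicodedata

# JIS X 0208: "ga" is one precomposed character at row 5, cell 12.
ga = "ガ"
print(ga.encode("euc_jp").hex())  # a5ac  (0xA0 + 5, 0xA0 + 12)

# JIS X 0201: "ka" (0xB6) followed by a separate voiced-sound mark (0xDE).
halfwidth = "\uff76\uff9e"  # ｶﾞ
print(halfwidth.encode("shift_jis").hex())  # b6de

# NFKC folds the two-character half-width sequence into the single kana.
print(unicodedata.normalize("NFKC", halfwidth) == ga)  # True
```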
The arrangement of kana in JIS X 0208 is different from the arrangement of katakana in JIS X 0201. In JIS X 0201, the syllabary starts with wo (ヲ), followed by the small kana sorted by gojūon order, followed by the full-size kana, also in gojūon order (ヲァィゥェォャュョッーアイウエオ......ラリルレロワン). In JIS X 0208, on the other hand, the kana are sorted first by gojūon order. Within that, they follow the order "small kana, full-size kana, kana with dakuten, and kana with handakuten", so that each fundamental kana is grouped with its derivatives (ぁあぃいぅうぇえぉお......っつづ......はばぱひびぴふぶぷへべぺほぼぽ......ゎわゐゑをん). This ordering was chosen to simplify the sorting of kana-based dictionary look-ups (Yasuoka, 2006).[k]
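Because the kana rows use this grouped gojūon order, sorting by kuten keeps each kana next to its small, dakuten, and handakuten variants. The Python sketch below derives kuten from Python's built-in `euc_jp` codec: the row is the first byte minus 0xA0, and the cell is the second byte minus 0xA0. `kuten` is a hypothetical helper, not part of any library. The last two lines contrast this ordering with JIS X 0201, using the `shift_jis` codec for its single-byte katakana.

```python
def kuten(ch: str) -> tuple[int, int]:
    """Return the (row, cell) of a JIS X 0208 character via Python's EUC-JP codec.

    EUC-JP stores JIS X 0208 as two bytes in 0xA1-0xFE, so subtracting 0xA0
    from each byte gives the row and the cell.
    """
    b = ch.encode("euc_jp")
    if len(b) != 2 or not all(0xA1 <= x <= 0xFE for x in b):
        raise ValueError(f"{ch!r} is not a two-byte JIS X 0208 character")
    return b[0] - 0xA0, b[1] - 0xA0


# Sorting by kuten keeps each base kana next to its derived forms.
kana = ["ひ", "ぱ", "は", "ば", "ゎ", "わ"]
print(sorted(kana, key=kuten))  # ['は', 'ば', 'ぱ', 'ひ', 'ゎ', 'わ']

# In JIS X 0201 small ｧ and full-size ｱ are ten code points apart;
# in JIS X 0208 ァ and ア are adjacent cells of row 5.
print("\uff67".encode("shift_jis"), "\uff71".encode("shift_jis"))  # b'\xa7' b'\xb1'
print(kuten("ァ"), kuten("ア"))  # (5, 1) (5, 2)
```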
As mentioned above, JIS X 0208 did not follow the katakana order previously defined in JIS X 0201. This incompatibility with the katakana of this standard is thought to be why the JIS X 0201 katakana came to be treated as "half-width kana". It is also counted as one of the weaknesses of this standard.
Kanji
The fourth standard (1997) explains in detail where the kanji in this standard came from, why they are split into level 1 and level 2, and how they are arranged. According to that explanation, the 6349 kanji of the first standard (1978) reflect the kanji in the following four listings.
- Kanji Listing for Standard Code (Tentative) (標準コード用漢字表 (試案), Hyōjun Kōdo-yō Kanjihyō (Shian))
- The Information Processing Society of Japan kanji code committee compiled this list in 1971. In the "Correspondence Analysis Results" described below, it appears with 6086 characters.
- Basic Kanji for Administrative Data Processing Use (行政情報処理用基本漢字, Gyōsei Jōhō Shoriyō Kihon Kanji)
- Selected by an agency of the Japanese government in 1975, it consists of 2817 characters. As supporting data for the selection, the Agency produced a report that compared several kanji listings, beginning with the "Kanji Listing for Standard Code (Tentative)". That report is the "Correspondence Analysis Results and Frequency of Use of Kanji for Administrative Data Processing Use Standard Kanji Selection" (行政情報処理用標準漢字選定のための漢字の使用頻度および対応分析結果, Gyōsei Jōhō Shoriyō Hyōjun Kanji Sentei no Tame no Kanji no Shiyō Hindo Oyobi Taiō Bunseki Kekka), or "Correspondence Analysis Results" (対応分析結果, Taiō Bunseki Kekka) for short.
- Personal-Name Kanji Recorded by Nippon Life (日本生命収容人名漢字, Nihon Seimei Shūyō Jinmei Kanji)
- One of the kanji listings compared in the "Correspondence Analysis Results", consisting of 3044 characters. The original list no longer exists, and it was not available to the original drafting committee. Its kanji were reflected in the standard by following the "Correspondence Analysis Results".
- Kanji for National Administrative District Listing (国土行政区画総覧使用漢字, Kokudo Gyōsei Kukaku Sōran Shiyō Kanji)
- One of the kanji listings compared in the "Correspondence Analysis Results", consisting of 3251 characters. These are the kanji used in the "National Administrative District Listing" (国土行政区画総覧, Kokudo Gyōsei Kukaku Sōran), a list of all administrative place names. The original drafting committee did not examine the listing itself. The kanji taken from this list were based on the "Correspondence Analysis Results".
The second and third standards added four and two characters to level 2, respectively, bringing the total number of kanji to 6355. The second standard also changed character forms and moved some characters between the levels. The third standard likewise changed character forms. These changes are described further below.
Level partitioning
The 2,965 Level 1 kanji occupy rows 16 to 47. The 3,390 Level 2 kanji occupy rows 48 to 84.
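The row boundaries translate directly into a level lookup. The sketch below again obtains kuten from Python's built-in `euc_jp` codec. All helper names are hypothetical. `count_assigned` counts the cells in each level that the codec can decode. If the codec's table follows the 1990 repertoire, the counts should match the totals given above. Level 1 fills rows 16-46 plus 51 cells of row 47 (31 × 94 + 51 = 2965). Level 2 fills rows 48-83 plus 6 cells of row 84 (36 × 94 + 6 = 3390).

```python
LEVEL1_ROWS = range(16, 48)  # rows 16-47
LEVEL2_ROWS = range(48, 85)  # rows 48-84


def kuten(ch: str) -> tuple[int, int]:
    """Return the (row, cell) of a JIS X 0208 character via Python's EUC-JP codec."""
    b = ch.encode("euc_jp")
    if len(b) != 2 or not all(0xA1 <= x <= 0xFE for x in b):
        raise ValueError(f"{ch!r} is not a two-byte JIS X 0208 character")
    return b[0] - 0xA0, b[1] - 0xA0


def kanji_level(ch: str) -> int:
    """Return 1 or 2 according to the kanji row ranges; raise otherwise."""
    row, _cell = kuten(ch)
    if row in LEVEL1_ROWS:
        return 1
    if row in LEVEL2_ROWS:
        return 2
    raise ValueError(f"{ch!r} is in row {row}, outside the kanji rows")


def count_assigned(rows: range) -> int:
    """Count the cells in `rows` that Python's EUC-JP table can decode."""
    total = 0
    for row in rows:
        for cell in range(1, 95):
            try:
                bytes([row + 0xA0, cell + 0xA0]).decode("euc_jp")
            except UnicodeDecodeError:
                continue  # unassigned code point
            total += 1
    return total


print(count_assigned(LEVEL1_ROWS), count_assigned(LEVEL2_ROWS))  # 2965 3390
for ch in "亜腕弌":
    print(ch, kuten(ch), kanji_level(ch))
```

The same lookup can be applied to the kanji cited in the paragraphs on level partitioning, such as `kanji_level("翔")`.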
For level 1, characters common to multiple kanji glyph listings were chosen, using the tōyō kanji, the tōyō kanji correction draft, and the jinmeiyō kanji as a basis. Also, JIS C 6260 ("To-Do-Fu-Ken (Prefecture) Identification Code"; currently ) and JIS C 6261 ("Identification code for cities, towns and villages"; currently ) were consulted; kanji for nearly all Japanese prefectures, cities, districts, wards, towns, villages, and so forth were intentionally placed in level 1.[l] Furthermore, amendments by experts were added.
Level 2 was assigned the kanji that appeared in the four listings above but were not selected for level 1. As noted below, the kanji of level 1 were ordered by pronunciation. For that reason, some kanji whose readings were difficult to determine were moved from level 1 to level 2 (Nishimura, 1978).
Due to these decisions, for the most part, level 1 contains more frequently used kanji, and level 2 contains more infrequently used kanji, but of course, those were judged by the standards of the day; over the passage of time, some level 2 kanji have become more frequently used, such as one meaning "to soar" (翔) and one meaning "to glitter" (煌); and inversely, some level 1 kanji have become infrequent, notably the ones meaning "centimeter" (糎) and "millimeter" (粍). Of the current jōyō kanji, 30 fall into level 2,[m] while three are missing altogether (塡