Cangjie input method

From Wikipedia, the free encyclopedia
Coding of "倉頡輸入法" (i.e. Cangjie method) in traditional Chinese characters
Traditional Chinese: 倉頡輸入法
Simplified Chinese: 仓颉输入法

The Cangjie input method (Tsang-chieh input method, sometimes called Changjie, Cang Jie, Changjei[1] or Chongkit) is a system for entering Chinese characters into a computer using a standard computer keyboard. In filenames and elsewhere, the name Cangjie is sometimes abbreviated as cj.

The input method was invented in 1976 by Chu Bong-Foo and was named after Cangjie (Tsang-chieh), the mythological inventor of the Chinese writing system, at the suggestion of Chiang Wei-kuo, the former Defense Minister of Taiwan. Chu Bong-Foo released the patent for Cangjie in 1982, as he believed that the method should belong to Chinese cultural heritage.[2] As a result, Cangjie is freely available and is included on nearly every computer system that supports traditional Chinese characters, and it has been extended to support the simplified Chinese character set.

A Chinese keyboard in Shek Tong Tsui Municipal Services Building, Hong Kong with Cangjie hints printed on the lower-left corners of the keys. (Printed on the lower-right and upper-right corners are Dayi hints and Zhuyin symbols respectively.)

Cangjie was the first Chinese input method to use the QWERTY keyboard. Chu saw that the QWERTY keyboard had become an international standard and therefore believed that Chinese-language input had to be based on it.[3] Earlier methods used large keyboards with 40 to 2,400 keys, with the exception of the Four-Corner Method, which used only the number keys.

Unlike the Pinyin input method, which is based on pronunciation, Cangjie is based on the graphological aspect of the characters: each graphical unit, called a "radical" (not to be confused with Kangxi radicals), is represented by one of 24 basic character components, each mapped to a particular letter key on a standard QWERTY keyboard. An additional "difficult character" function is mapped to the X key. The keys are divided into four groups to facilitate learning and memorization. A character is assigned its code by decomposing it into its constituent "radicals".

Overview

Keys and "radicals"

The basic character components in Cangjie are called "radicals" (字根) or "letters" (字母). There are 24 radicals but 26 keys; the 24 radicals (the basic shapes 基本字形) are associated with roughly 76 auxiliary shapes (輔助字形), which in many cases are rotated or transposed versions of components of the basic shapes. For instance, the letter A (日) can represent itself, the slightly wider 曰, or a 90° rotation of itself. (For a more complete account of the 76-odd transpositions and rotations than the ones listed below, see the article on Cangjie entry in Chinese Wikibooks.)

The 24 keys are placed in four groups:

  • Philosophical Group — corresponds to the letters 'A' to 'G' and represents the sun, the moon, and the five elements
  • Strokes Group — corresponds to the letters 'H' to 'N' and represents the brief and subtle strokes
  • Body-Related Group — corresponds to the letters 'O' to 'R' and represents various parts of the human anatomy
  • Shapes Group — corresponds to the letters 'S' to 'Y' (excluding 'X') and represents complex and enclosed character forms
Group Key Name Primary meaning Shapes represented
Philosophical group A 日 sun 日, 曰, 90° rotated 日 (as in 巴)
B 月 moon the top four strokes of 目; 冂; 爫; 冖; the top and top-left part of 炙, 然, and 祭; the top-left four strokes of 豹 and 貓; and the top four strokes of 骨
C 金 gold itself, 丷, 八, and the penultimate two strokes of 四 and 匹
D 木 wood itself, the first two strokes of 寸 and 才, the first two strokes of 也 and 皮
E 水 water 氵, the last five strokes of 暴 and 康, 又
F 火 fire the shape 小, 灬, the first three strokes in 當 and 光
G 土 land itself, or 士 for soldier
Stroke group H 竹 bamboo The slant and short slant, the Kangxi radical 竹, namely the upper parts in 笨 and 節
I 戈 dagger axe The dot, the first three strokes in 床 and 庫, and the shape 厶
J 十 ten The cross shape and the shape 宀
K 大 big The X shape, including 乂 and the first two strokes of 右, as well as 疒
L 中 centre The vertical stroke, as well as 衤 and the first four strokes of 書 and 盡
M 一 one The horizontal stroke, as well as the final stroke of 孑 and 刁, the shape 厂, and the shape 工
N 弓 bow The crossbow and the hook
Body parts group O 人 person The dismemberment; the Kangxi radical 人; the first two strokes of 丘 and 乓; the first two strokes of 知, 攻, and 氣; and the final two strokes of 兆
P 心 heart The Kangxi radical 忄; the second stroke in 心; the last four strokes in 恭, 慕, and 忝; the shape 匕; the shape 七; the penultimate two strokes in 代; and the shape 勹
Q 手 hand The Kangxi radical 手
R 口 mouth The Kangxi radical 口
Character shapes group S 尸 corpse 匚, the first two strokes of 己, the first stroke of 司 and 刀, the third stroke of 成 and 豕, the first four strokes of 長 and 髟
T 廿 twenty Two vertical strokes connected by a horizontal stroke; the Kangxi radical 艸 when written as 艹 (whether the horizontal stroke is connected or broken)
U 山 mountain Three-sided enclosure with an opening on the top
V 女 woman A hook to the right; a V shape; the last three strokes in 艮, 衣, and 長
W 田 field Itself, as well as any four-sided enclosure with something inside it, including the first two strokes in 母 and 毋
Y 卜 fortune telling The 卜 shape and rotated forms, the shape 辶, the first two strokes in 斗
Collision/Difficult key* X 重/難 collision/difficult (1) disambiguation of Cangjie code decomposition collisions, (2) code for a "difficult-to-decompose" part
Special character key* Z (See note) Auxiliary code used for entering special characters (no meaning on its own). In most cases, this key combined with other keys produces Chinese punctuation marks (such as 。, 、, 「 」, 『 』).

Note: Some variants use Z as a collision key instead of X. In those systems, Z has the name "collision" (重) and X has the name "difficult" (難); but the use of Z as a collision key is neither in the original Cangjie nor used in the current mainstream implementations. In other variants, Z may have the name "user-defined" (造) or some other name.
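The key-to-radical assignment in the table can be captured as a simple lookup. The following Python sketch is illustrative only: the names `CANGJIE_RADICALS`, `KEY_GROUPS`, `key_group` and `code_to_radicals` are hypothetical, and the sketch covers only the 24 basic radicals, not the auxiliary shapes or the special X and Z keys.

```python
# Basic radical assigned to each of the 24 radical keys (from the table above).
# X (collision/difficult) and Z (special characters) are deliberately left out.
CANGJIE_RADICALS = {
    "A": "日", "B": "月", "C": "金", "D": "木", "E": "水", "F": "火", "G": "土",
    "H": "竹", "I": "戈", "J": "十", "K": "大", "L": "中", "M": "一", "N": "弓",
    "O": "人", "P": "心", "Q": "手", "R": "口",
    "S": "尸", "T": "廿", "U": "山", "V": "女", "W": "田", "Y": "卜",
}

# The four mnemonic groups of radical keys.
KEY_GROUPS = {
    "Philosophical": "ABCDEFG",
    "Strokes": "HIJKLMN",
    "Body-related": "OPQR",
    "Shapes": "STUVWY",
}


def key_group(key: str) -> str:
    """Return the group a single key belongs to, or 'Special' for X, Z and others."""
    for group, keys in KEY_GROUPS.items():
        if len(key) == 1 and key.upper() in keys:
            return group
    return "Special"


def code_to_radicals(code: str) -> str:
    """Spell out a letter code such as 'YRHHI' using the basic radicals."""
    try:
        return "".join(CANGJIE_RADICALS[k] for k in code.upper())
    except KeyError as err:
        raise ValueError(f"no basic radical for key {err.args[0]!r}") from None


if __name__ == "__main__":
    print(code_to_radicals("YRHHI"))  # 卜口竹竹戈 (the code for 謝)
    print(key_group("H"))             # Strokes
```

Keeping the groups as strings of keys mirrors the mnemonic layout (A to G, H to N, O to R, S to Y) and makes it easy to check that exactly 24 keys carry radicals.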

Wildcard Shift + 8 (*) Wildcard The wildcard can replace any key in the 2nd to 5th positions, and the input method returns a list of characters matching the combination. It is useful when the typist is sure of the first and last codes but not of those in between. For example, entering 竹*竹 returns a list that includes 身, 物, 秒, and 第.
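A minimal sketch of wildcard lookup over a code table, in Python. `SAMPLE_CODES` is a small hypothetical table: the codes for 車, 謝, 谢 and 文 come from this article, while those for 身, 物, 秒 and 第 are the commonly used codes, included only for illustration. The article does not say how many keys * may stand for; because 物 (assumed here to be HQPHH) matches 竹*竹, the sketch assumes * matches one or more keys, and it treats codes as at most five keys long.

```python
import re

# Hypothetical sample code table (character -> Cangjie code).
SAMPLE_CODES = {
    "身": "HHH", "物": "HQPHH", "秒": "HDFH", "第": "HNLH",
    "車": "JWJ", "謝": "YRHHI", "谢": "IVHHI", "文": "YK",
}


def wildcard_lookup(pattern: str, codes: dict[str, str]) -> list[str]:
    """Return characters whose code matches a pattern such as 'H*H'.

    Assumption: '*' stands for one or more keys and may not be the
    first key of the pattern.
    """
    pattern = pattern.upper()
    if not pattern or pattern[0] == "*":
        raise ValueError("the first key must be given explicitly")
    # Translate the pattern into a regular expression, key by key.
    regex = re.compile(
        "".join("[A-Z]+" if ch == "*" else re.escape(ch) for ch in pattern)
    )
    return [
        char
        for char, code in codes.items()
        if len(code) <= 5 and regex.fullmatch(code)
    ]


if __name__ == "__main__":
    print(wildcard_lookup("H*H", SAMPLE_CODES))  # ['身', '物', '秒', '第']
```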

The auxiliary shapes of each Cangjie radical have changed slightly across versions of the Cangjie method, which is one reason why different versions of the method are not fully compatible.

Chu Bong-Foo has provided alternate names for some letters according to their characteristics. For example, H (竹) is also called 斜, which means "slant". The names form a rhyme, one line per group, to help learners memorize the letters (the pronunciations of the final characters are given in parentheses):

日 月 金 木 水 火 土 (tǔ)
斜 點 交 叉 縱 橫 鈎 (gōu)
人 心 手 口 (kǒu)
側 並 仰 紐 方 卜 (bǔ)

Keyboard layout

A typical keyboard layout for the Cangjie method, based on the United States keyboard layout. Note the non-standard use of Z as the collision key.

Basic rules

The typist must be familiar with several decomposition rules (拆字規則) that define how to analyze a character to arrive at a Cangjie code.

  • Direction of decomposition: left to right, top to bottom, and outside to inside
  • Geometrically connected forms: take up to four Cangjie codes, namely the first, second, third, and last codes
  • Geometrically unconnected forms that can be broken into two subforms (e.g., 你): identify the two geometrically connected subforms according to the direction of decomposition rules (i.e., 人 and 尔), then take the first and last codes of the first subform and the first, second, and last codes of the second subform.
  • Geometrically unconnected forms that can be broken into multiple subforms (e.g., 謝): identify the first geometrically connected subform according to the direction of decomposition rules (i.e., 言) and take the first and last codes of that form. Next, break the remainder (i.e., 射) into subforms (i.e., 身 and 寸) and take the first and last codes of the first subform and the last code of the last subform.
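The code-selection part of these rules can be sketched in Python, assuming each geometrically connected subform has already been decomposed, in order, into its full sequence of Cangjie codes (the judgment-based part of the method, which the sketch does not attempt). The function names are hypothetical, and for characters with more than three subforms the sketch assumes the middle subforms of the remainder are simply skipped, which the rules above do not state explicitly.

```python
def first_last(codes: list[str]) -> list[str]:
    """First and last codes of a subform (a one-code subform yields one code)."""
    return codes[:1] + codes[-1:] if len(codes) > 1 else list(codes)


def assemble_code(subforms: list[list[str]]) -> str:
    """Select codes from the full decompositions of a character's subforms."""
    if not subforms or not all(subforms):
        raise ValueError("every subform needs at least one code")
    if len(subforms) == 1:
        # Connected form: first, second, third and last codes.
        codes = subforms[0]
        picked = codes if len(codes) <= 4 else codes[:3] + codes[-1:]
    elif len(subforms) == 2:
        # Two subforms: first and last of the first subform;
        # first, second and last of the second subform.
        first, second = subforms
        tail = second if len(second) <= 3 else second[:2] + second[-1:]
        picked = first_last(first) + tail
    else:
        # Three or more: first and last of the first subform, first and last
        # of the next subform, and the last code of the final subform.
        picked = first_last(subforms[0]) + first_last(subforms[1]) + subforms[-1][-1:]
    return "".join(picked)


if __name__ == "__main__":
    # 謝 = 言 + 身 + 寸, with full decompositions assumed to be YMMR, HHH and DI.
    print(assemble_code([["Y", "M", "M", "R"], ["H", "H", "H"], ["D", "I"]]))  # YRHHI
    print(assemble_code([["J", "W", "J"]]))  # JWJ (車)
```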

The rules are subject to various principles:

  • Conciseness (精簡) – if multiple ways of decomposition are possible, the shorter decomposition is considered correct.
  • Completeness (完整) – if multiple ways of decomposition with the same length of code are possible, the one that identifies a more complex form first is the correct decomposition.
  • Reflection of the form of the radical (字型特徵) – the decomposition should reflect the shape of the radical, meaning (a) using the same code twice or more should be avoided if possible, and (b) the shape of the character should not be "cut" at a corner in the form.
  • Omission of codes (省略)
    • Partial omission (部分省略) – when the number of codes in a complete decomposition exceeds the permitted number of codes, the extra codes are ignored.
    • Omission in enclosed forms (包含省略) – when the part of the character being decomposed is an enclosed form, only the shape of the enclosure is decomposed; the enclosed forms are omitted.
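The first two principles amount to an ordering over candidate decompositions. The Python sketch below assumes the candidates have already been enumerated, and it stands in for "a more complex form first" with a hypothetical `first_form_strokes` count (the number of strokes covered by the first form); the rules do not define complexity numerically, and the remaining principles are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    codes: str               # candidate decomposition, e.g. "YMMR"
    first_form_strokes: int  # hypothetical measure of how complex the first form is


def choose_decomposition(candidates: list[Candidate]) -> Candidate:
    """Pick the decomposition preferred by conciseness, then completeness."""
    if not candidates:
        raise ValueError("no candidate decompositions")
    # Conciseness: fewer codes wins. Completeness: on a tie in length, the
    # candidate whose first form is more complex (covers more strokes) wins.
    return min(candidates, key=lambda c: (len(c.codes), -c.first_form_strokes))


if __name__ == "__main__":
    # Placeholder codes, not real decompositions.
    options = [Candidate("ABC", 5), Candidate("AB", 1), Candidate("CD", 3)]
    print(choose_decomposition(options).codes)  # CD
```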

Examples

Typing Chinese with the Cangjie input method, version 5
Typing Chinese with the Cangjie input method on an Android device
  • 車 (chē: vehicle)
    • This character is geometrically connected, consisting of a single vertical structure, so we take the first, second, and last Cangjie codes from top to bottom.
    • The Cangjie code is thus 十 田 十 (JWJ), corresponding to the basic shapes of the codes in this example.
  • 謝 (xiè: to thank, to wither)
    • This character consists of geometrically unconnected parts arranged horizontally. For the initial decomposition, we treat it as two parts, 言 and 射.
    • The first part, 言, is geometrically unconnected from top to bottom; we take the first (亠, auxiliary shape of 卜 Y) and last parts (口, basic shape of 口 R) and arrive at 卜 口 (YR).
    • The second part is again geometrically unconnected, arranged horizontally. The two parts are 身 and 寸.
      • For the first part of this second part, 身, we take the first and last codes. Both are slants and therefore H; the first and last codes are thus 竹 竹 (HH).
      • For the second part of the original second part, 寸, we take only the last part. Because this is geometrically unconnected and consists of two parts, the first part is the outer form while the second part is the dot in the middle. The dot is I, and therefore the last code is 戈 (I).
    • The Cangjie code is thus 卜 口 (YR) 竹 竹 (HH) 戈 (I), or 卜 口 竹 竹 戈 (YRHHI).
  • 谢 (simplified version of 謝)
    • This example is identical to the example just above, except that the first part is 讠; the first and last codes are 戈 (I) and 女 (V).
    • Repeating the same steps as in the above example, we get 戈 女 (IV) 竹 竹 (HH) 戈 (I), or 戈 女 竹 竹 戈 (IVHHI).

Exceptions

Some forms are always decomposed in the same way, regardless of whether the general rules would produce that decomposition. The number of such exceptions is small:

Form Fixed decomposition
Version 2 Version 3 Version 5
門 (door) 日 弓 (AN)
目 (eye) 月 山 (BU)
鬼 (ghost) 竹 戈 (HI) 竹 戈 (HI) or HUI
几 (small table) 竹 山 (HU) 竹 弓 (HN)
贏 (win) 卜 口 月 月 弓 (YRBBN) 卜 弓 月 山 金 (YNBUC)
虍 (tiger [radical]) 卜 心 (YP)
亡 on top of 口 (吂) 卜 口 (YR) 卜 女 口 (YVR)
隹 (fowl) 人 土 (OG)
气 (air [radical]) 人 山 (OU) 人 弓 (ON) 人 一 弓 (OMN)
畿 minus the 田 女 戈 (VI)
鬥 (compete) 中 弓 (LN)
阝 (mound or city radical) 弓 中 (NL)
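In an implementation, such exceptions behave like a lookup table consulted before the general rules. A minimal Python sketch, in which `rule_based` is a hypothetical callback standing in for a rule-driven decomposer; only the rows shown with a single code are included, so the version-dependent rows and the 畿 row are left out.

```python
from typing import Callable

# Fixed decompositions shown with a single code in the table above.
# Version-dependent rows (鬼, 几, 贏, 吂, 气) and the 畿 row are omitted.
FIXED_DECOMPOSITIONS = {
    "門": "AN", "目": "BU", "虍": "YP", "隹": "OG", "鬥": "LN", "阝": "NL",
}


def component_code(component: str, rule_based: Callable[[str], str]) -> str:
    """Use the fixed decomposition if one exists, else fall back to the rules."""
    if component in FIXED_DECOMPOSITIONS:
        return FIXED_DECOMPOSITIONS[component]
    return rule_based(component)


if __name__ == "__main__":
    print(component_code("門", lambda c: "?"))  # AN
```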

Some forms cannot be decomposed. They are represented by an X, which is the 難 key on a Cangjie keyboard.[4]

Form Fixed decomposition (v5)
HX
HXYC
HXBC
HXBT
VLXH
YX
TXC
鹿 IXP
HXH
NX
RXU
NXU
IXF
IXE
ELXL
LX

Early development

Initially, the Cangjie input method was not intended to produce a character in any character set. Instead, it was part of an integrated system consisting of the Cangjie input rules and a Cangjie controller board. This controller board contains character generator firmware, which dynamically generates Chinese characters from Cangjie codes when characters are output, using the hi-res graphics mode of the Apple II computer. In the preface of the Cangjie user's manual, Chu Bong-Foo wrote in 1982:

[in translation]
In terms of output: The output and input, in fact, [form] an integrated whole; there is no reason that [they should be] dogmatically separated into two different facilities.… This is in fact necessary.…

In this early system, when the user typed "yk", for example, to get the Chinese character 文, the Cangjie code was not converted to any character encoding; the actual string "yk" was stored. The Cangjie code for each character (a string of 1 to 5 lowercase letters followed by a space) served as the encoding of that character.
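The storage scheme described above can be modeled in a few lines of Python. This is an illustrative sketch, not the original firmware: `encode_text` and `split_codes` are hypothetical names, the code table is a sample (文 → yk is from this article; 車 → jwj is the code from the examples above), and the 1 to 5 letter check follows the paragraph's description.

```python
import re

# Sample code table; in the early system the code itself was the stored form.
SAMPLE_CODES = {"文": "yk", "車": "jwj"}

# A valid stored code: 1 to 5 lowercase letters.
_VALID_CODE = re.compile(r"[a-z]{1,5}")


def encode_text(text: str, codes: dict[str, str]) -> str:
    """Store each character as its lowercase Cangjie code plus a space."""
    out = []
    for ch in text:
        code = codes[ch].lower()
        if not _VALID_CODE.fullmatch(code):
            raise ValueError(f"invalid Cangjie code {code!r}")
        out.append(code + " ")
    return "".join(out)


def split_codes(stored: str) -> list[str]:
    """Recover the per-character codes, e.g. for a character generator to draw."""
    return stored.split()


if __name__ == "__main__":
    stored = encode_text("文車", SAMPLE_CODES)
    print(repr(stored))         # 'yk jwj '
    print(split_codes(stored))  # ['yk', 'jwj']
```

Because the space terminates every code, codes of different lengths can be stored back to back without any separate length field.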

Demonstration of character generator Mingzhu's capability to generate characters according to the codes. The first character is