KOI8-RU

From Wikipedia, the free encyclopedia
KOI8-RU
Language(s): Belarusian, Ukrainian, Russian, Bulgarian
Classification: 8-bit KOI, extended ASCII
Extends: KOI8-B
Based on: KOI8-U, KOI8-R
Other related encoding(s): KOI8-E, KOI8-F

KOI8-RU is an 8-bit character encoding designed to cover Russian, Ukrainian, and Belarusian, all of which are written in the Cyrillic script. It is closely related to KOI8-R, which covers Russian and Bulgarian, but replaces ten box-drawing characters with five Ukrainian and Belarusian letters, Ґ, Є, І, Ї, and Ў, in both upper and lower case. It is even more closely related to KOI8-U, which does not include Ў but otherwise makes the same replacements. KOI8-E matches these additional letter allocations except for Ґ, which KOI8-F adds.
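One way to see the difference in letter coverage is to ask Python's standard `koi8_r` and `koi8_u` codecs which of the ten added letters they can encode. This is a minimal sketch: the helper name `encodable` is hypothetical, and it only checks the KOI8-R and KOI8-U side of the comparison, assuming CPython's codecs follow those two encodings' published tables.

```python
# The ten letters KOI8-RU adds relative to KOI8-R (upper case, then lower case).
EXTRA_LETTERS = "ҐЄІЇЎґєіїў"


def encodable(codec: str, letters: str = EXTRA_LETTERS) -> str:
    """Return the characters from `letters` that `codec` can encode (hypothetical helper)."""
    supported = []
    for ch in letters:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue  # the codec has no byte value for this letter
        supported.append(ch)
    return "".join(supported)


for codec in ("koi8_r", "koi8_u"):
    print(f"{codec}: {encodable(codec) or '(none)'}")
```

KOI8-R should report none of the ten letters, while KOI8-U should report all of them except Ў and ў, the pair that KOI8-RU adds on top of KOI8-U.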

IBM assigns KOI8-RU code page/CCSID 1167.[1][2]

KOI8 has been much more widely used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. Both have since largely given way to Unicode, in particular UTF-8.

KOI8 stands for Kod obmena informatsiey, 8 bit (Russian: Код обмена информацией, 8 бит), meaning "Code for Information Exchange, 8 bit".

In the KOI8 character sets, the Russian Cyrillic letters are arranged in pseudo-Roman order rather than in the natural Cyrillic alphabetical order used by ISO 8859-5. Most Cyrillic letters sit exactly 0x80 above a phonetically similar Latin letter of the opposite case. Although this may seem unnatural, it means that if the eighth bit is stripped, the text can still be read (or at least deciphered) as a case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" ("Russian Text") in KOI8-RU becomes rUSSKIJ tEKST when the eighth bit is stripped.
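The eighth-bit stripping described above can be reproduced in a few lines. This sketch uses Python's standard `koi8_r` codec as a stand-in for KOI8-RU; that substitution is an assumption that holds here only because the Russian letters occupy the same byte positions (rows Cx to Fx of the table) in both encodings. The function name `strip_eighth_bit` is hypothetical.

```python
def strip_eighth_bit(text: str, codec: str = "koi8_r") -> str:
    """Encode `text` in a KOI8 codec, clear bit 7 of every byte, and read the result as ASCII."""
    raw = text.encode(codec)
    seven_bit = bytes(b & 0x7F for b in raw)  # what a 7-bit channel would deliver
    return seven_bit.decode("ascii")


print(strip_eighth_bit("Русский Текст"))  # rUSSKIJ tEKST
```

For example, Р is 0xF2 in KOI8, and 0xF2 & 0x7F is 0x72, ASCII "r"; lowercase у is 0xD5, which becomes 0x55, ASCII "U". This is where the case reversal comes from.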

Character set

The following table shows the KOI8-RU encoding. Each character is shown with its equivalent Unicode code point.

KOI8-RU[3][4][5]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x
2x  SP  ! " # $ % & ' ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~
8x
9x NBSP » ® « · ¤
Ax ё є і ї ґ ў
Bx Ё Є І Ї Ґ Ў ©
Cx ю а б ц д е ф г х и й к л м н о
Dx п я р с т у ж в ь ы з ш э щ ч ъ
Ex Ю А Б Ц Д Е Ф Г Х И Й К Л М Н О
Fx П Я Р С Т У Ж В Ь Ы З Ш Э Щ Ч Ъ
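Rows Cx to Fx of the table are fully recoverable, so they can be turned directly into a byte-to-character map. The sketch below copies those four rows verbatim and checks them against Python's standard `koi8_r` codec. The names `KOI8_RU_LETTER_ROWS` and `decode_letters` are hypothetical, and the decoder deliberately handles only ASCII and 0xC0-0xFF, since the positions of some characters in rows 8x to Bx are not recoverable from this text.

```python
# Rows Cx-Fx of the KOI8-RU table, one string per row, one character per column 0-F.
KOI8_RU_LETTER_ROWS = {
    0xC0: "юабцдефгхийклмно",
    0xD0: "пярстужвьызшэщчъ",
    0xE0: "ЮАБЦДЕФГХИЙКЛМНО",
    0xF0: "ПЯРСТУЖВЬЫЗШЭЩЧЪ",
}


def build_letter_table() -> dict:
    """Map each byte 0xC0-0xFF to its character, following the rows above."""
    table = {}
    for row_start, chars in KOI8_RU_LETTER_ROWS.items():
        assert len(chars) == 16, "each row must have 16 cells"
        for column, ch in enumerate(chars):
            table[row_start + column] = ch
    return table


def decode_letters(data: bytes) -> str:
    """Decode ASCII bytes and bytes 0xC0-0xFF; any other byte raises KeyError."""
    table = build_letter_table()
    return "".join(chr(b) if b < 0x80 else table[b] for b in data)


high_quarter = bytes(range(0xC0, 0x100))
print(decode_letters(high_quarter) == high_quarter.decode("koi8_r"))  # True
print(decode_letters(b"\xf2\xd5\xd3"))  # Рус
```

The comparison printing True shows that the letter rows are identical to KOI8-R, consistent with the differences being confined to positions that KOI8-R uses for box-drawing and other symbols.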

Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.

Some references contain a typo that maps character 0xB4 to U+0403 (Ѓ) rather than the correct U+0404 (Є). This typo is present in Appendix A of RFC 2319, although the table in the main text of the RFC gives the correct mapping.
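Both notes can be cross-checked against Python's standard codecs. This sketch assumes that Python's `koi8_u` codec follows the main table of RFC 2319, and uses it because KOI8-RU shares these two positions with KOI8-U; it is a consistency check, not an authoritative KOI8-RU mapping.

```python
import unicodedata

# Byte 0x95 in KOI8-U versus Windows-1251, and byte 0xB4 in KOI8-U.
bullet_koi8 = b"\x95".decode("koi8_u")
bullet_cp1251 = b"\x95".decode("cp1251")
ie_koi8 = b"\xb4".decode("koi8_u")

checks = [
    ("KOI8-U 0x95", bullet_koi8),
    ("cp1251 0x95", bullet_cp1251),
    ("KOI8-U 0xB4", ie_koi8),
    ("typo U+0403", "\u0403"),  # the erroneous value from RFC 2319 Appendix A
]
for label, ch in checks:
    print(f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The character names make the difference explicit: U+2219 is BULLET OPERATOR and U+2022 is BULLET, while U+0404 is CYRILLIC CAPITAL LETTER UKRAINIAN IE and the mistaken U+0403 is CYRILLIC CAPITAL LETTER GJE.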

See also

  • KOI character encodings

References

  1. "Code page 1167 information document". Archived from the original on 2017-01-16.
  2. "CCSID 1167 information document". Archived from the original on 2016-03-27.
  3. Leisher, Mark (1999-12-20). KOI8-RU Belorusian/Ukrainian Cyrillic to Unicode 2.1 mapping table (KOI8RU.TXT).
  4. Code Page CPGID 01167 (PDF). IBM.
  5. Code Page CPGID 01167 (TXT). IBM.
