Cork encoding

From Wikipedia, the free encyclopedia

The Cork encoding (also known as T1 or EC) is a character encoding used for encoding glyphs in fonts.[1] It is named after the city of Cork in Ireland, where a new encoding for LaTeX was introduced during a TeX Users Group (TUG) conference in 1990.[1] It contains 256 characters and supports most Western and Eastern European languages written in the Latin alphabet.[2]

Details

In 8-bit TeX engines the font encoding has to match the encoding of the hyphenation patterns, which is where this encoding is most commonly used.[3] In LaTeX one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.

Character set

Cork encoding
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x ` ´ ˆ ˜ ¨ ˝ ˚ ˇ ˘ ¯ ˙ ¸ ˛ ‚ ‹ ›
1x “ ” „ « » – — ZWSP ‰[a] ı[b] ȷ[b] ﬀ ﬁ ﬂ ﬃ ﬄ
2x SP ! " # $ % & ’ ( ) * + , - . /
3x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \ ] ^ _
6x ‘ a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { | } ~ SHY[c]
8x Ă Ą Ć Č Ď Ě Ę Ğ Ĺ Ľ Ł Ń Ň Ŋ Ő Ŕ
9x Ř Ś Š Ș Ť Ț Ű Ů Ÿ Ź Ž Ż IJ İ đ §
Ax ă ą ć č ď ě ę ğ ĺ ľ ł ń ň ŋ ő ŕ
Bx ř ś š ș ť ț ű ů ÿ ź ž ż ij ¡ ¿ £
Cx À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Dx Ð[d] Ñ Ò Ó Ô Õ Ö Œ Ø Ù Ú Û Ü Ý Þ SS[e]
Ex à á â ã ä å æ ç è é ê ë ì í î ï
Fx ð ñ ò ó ô õ ö œ ø ù ú û ü ý þ ß
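The upper half of the chart can be turned into a small lookup table. The following is a minimal sketch, not a reference implementation: the names ROWS_80_BF, OVERRIDES, CORK_UPPER and decode_cork are hypothetical, and it relies on the observation (readable from the chart itself) that rows Cx to Fx agree with ISO 8859-1 except in four slots. Slots below 0x80 are only approximated by passing printable ASCII through unchanged.

```python
# Rows 0x8_ to 0xB_ of the chart, 16 characters per row.
# IJ and ij are written as the single code points U+0132 and U+0133.
ROWS_80_BF = (
    "ĂĄĆČĎĚĘĞĹĽŁŃŇŊŐŔ"   # 0x80-0x8F
    "ŘŚŠȘŤȚŰŮŸŹŽŻĲİđ§"   # 0x90-0x9F
    "ăąćčďěęğĺľłńňŋőŕ"   # 0xA0-0xAF
    "řśšșťțűůÿźžżĳ¡¿£"   # 0xB0-0xBF
)

# Slots in rows 0xC_ to 0xF_ that differ from ISO 8859-1 (Latin-1).
OVERRIDES = {0xD7: "Œ", 0xDF: "SS", 0xF7: "œ", 0xFF: "ß"}


def build_upper_half():
    """Map every slot 0x80-0xFF to its Unicode text."""
    table = {0x80 + i: ch for i, ch in enumerate(ROWS_80_BF)}
    for code in range(0xC0, 0x100):
        table[code] = OVERRIDES.get(code, bytes([code]).decode("latin-1"))
    return table


CORK_UPPER = build_upper_half()


def decode_cork(data: bytes) -> str:
    """Decode Cork-encoded bytes; a simplification for slots below 0x80."""
    out = []
    for b in data:
        if b >= 0x80:
            out.append(CORK_UPPER[b])
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))      # assumption: treat as plain ASCII
        else:
            out.append("\ufffd")    # accents, ligatures etc. not handled here
    return "".join(out)
```

For example, decode_cork(bytes([0x8A, 0xF3, 0x64, 0xB9])) looks up Ł, ó, d and ź in turn.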

Notes

  • Hexadecimal values under the characters in the table are the Unicode character codes.
  • The first 13 characters (0x00–0x0C) are accents and are often used as combining characters.
  a. 0x18 is just a "trailing zero", used to compose ‰ or ‱ (or arbitrarily smaller quantities) out of the percent sign (%).
  b. Dotless i and dotless j may be used to compose accented variants such as i with macron (ī).
  c. 0x7F is the hyphenation character (not really a soft hyphen).
  d. 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can be a problem in some situations (such as copying text from a PDF or hyphenation).
  e. 0xDF contains SS (two letters S). It allows TeX to automatically convert the German lowercase ß into its uppercase form.

Supported languages

The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:

  • Esperanto (using IL3)
  • Latvian language and Lithuanian language (using L7X)
  • Welsh language

Languages with slightly suboptimal support include:

  • Galician language, Portuguese language and Spanish language – due to the lack of the characters ª and º, which are not simply superscript versions of lowercase "a" and "o" (superscripts are thinner) and are often underlined
  • Croatian language, Bosnian language, Serbian language – due to Đ sharing its slot (0xD0) with Ð
  • Turkish language – due to dotted and dotless i pairing their uppercase and lowercase forms differently than in other languages (i/İ and ı/I)
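The Turkish case can be illustrated at the Unicode level. The sketch below is only an illustration in Python, not a description of how TeX handles case changes; the helper names turkish_upper and turkish_lower are hypothetical. It contrasts the default case mapping, which pairs i with I, with the Turkish pairing of i with İ and ı with I.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish pairing: i -> İ (U+0130), ı (U+0131) -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish pairing: İ (U+0130) -> i, I -> ı (U+0131)."""
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


word = "diyarbak\u0131r"            # contains both dotted i and dotless ı

default_result = word.upper()       # language-neutral mapping: both become I
turkish_result = turkish_upper(word)  # dotted i becomes İ, dotless ı becomes I
```

With the default mapping the distinction between the two letters is lost in uppercase, which is why a single fixed uppercase/lowercase pairing cannot serve Turkish and other languages at the same time.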


References

  1. Petrlik, Lukas (1996-06-19). "The Czech and Slovak Character Encoding Mess Explained". cs-encodings-faq. 1.10. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
  2. Ferguson, Michael (1990). "Report on Multilingual Activities" (PDF). TUGboat. 11 (4): 514–516.
  3. TeX hyphenation patterns
