Cork encoding
The Cork (also known as T1 or EC) encoding is a character encoding used to arrange the glyphs of TeX fonts.[1] It is named after the city of Cork in Ireland, where a new encoding for LaTeX was introduced at the 1990 TeX Users Group (TUG) conference.[1] It contains 256 characters and supports most Western and Eastern European languages written in the Latin alphabet.[2]
Details
In 8-bit TeX engines, the font encoding has to match the encoding of the hyphenation patterns, and this is the encoding in which those patterns are most commonly used.[3] In LaTeX, one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.
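A minimal sketch of a pdfLaTeX document that selects the encoding, assuming a standard TeX distribution with the babel package and Polish hyphenation patterns installed. The Polish sample text is only an illustration: every accented letter in it has a precomposed slot in the table, so the hyphenation patterns see each one as a single character.

```latex
\documentclass{article}
% Switch the font encoding from LaTeX's default OT1 to T1 (Cork).
\usepackage[T1]{fontenc}
% Load Polish hyphenation patterns; in an 8-bit engine they must
% match the font encoding selected above.
\usepackage[polish]{babel}
\begin{document}
% UTF-8 input is LaTeX's default since 2018; each accented letter
% below is typeset from a single precomposed T1 glyph.
Zażółć gęślą jaźń.
\end{document}
```

Under XeLaTeX or LuaLaTeX the fontenc line is normally omitted, since those engines default to the Unicode-based TU encoding.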
Character set[]
Cork encoding
   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | ` | ´ | ˆ | ˜ | ¨ | ˝ | ˚ | ˇ | ˘ | ¯ | ˙ | ¸ | ˛ | ‚ | ‹ | › |
1x | “ | ” | „ | « | » | – | — | ZWSP | ₀[a] | ı[b] | ȷ[b] | ff | fi | fl | ffi | ffl |
2x | SP | ! | " | # | $ | % | & | ’ | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ‘ | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | SHY[c] |
8x | Ă | Ą | Ć | Č | Ď | Ě | Ę | Ğ | Ĺ | Ľ | Ł | Ń | Ň | Ŋ | Ő | Ŕ |
9x | Ř | Ś | Š | Ș | Ť | Ț | Ű | Ů | Ÿ | Ź | Ž | Ż | IJ | İ | đ | § |
Ax | ă | ą | ć | č | ď | ě | ę | ğ | ĺ | ľ | ł | ń | ň | ŋ | ő | ŕ |
Bx | ř | ś | š | ș | ť | ț | ű | ů | ÿ | ź | ž | ż | ij | ¡ | ¿ | £ |
Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
Dx | Ð[d] | Ñ | Ò | Ó | Ô | Õ | Ö | Œ | Ø | Ù | Ú | Û | Ü | Ý | Þ | SS[e] |
Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
Fx | ð | ñ | ò | ó | ô | õ | ö | œ | ø | ù | ú | û | ü | ý | þ | ß |
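As an illustration of reading the table, the sketch below (assuming pdfLaTeX with the T1 encoding active) addresses a few slots directly with LaTeX's \symbol command, which takes a slot number, written here in TeX's hexadecimal notation ("XY, with uppercase hex digits). It also uses \textcompwordmark, which LaTeX's T1 definitions map to the ZWSP slot 0x17.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% \symbol{"XY} typesets the glyph in slot 0xXY of the current font:
% X is the row digit and Y the column digit from the table.
\symbol{"8A} \symbol{"AA} % row 8x/Ax, column A: capital and small L with stroke
\symbol{"9E}              % row 9x, column E: small d with stroke
\symbol{"1B}              % row 1x, column B: ff ligature

% The zero-width mark from slot 0x17 sits between f and l,
% so TeX does not form the fl ligature here.
shelf\textcompwordmark ful
\end{document}
```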
Notes
- Row and column labels give the hexadecimal slot number: a character's code is its row prefix followed by its column digit (for example, Ł in row 8x, column A, is 0x8A).
- The first 13 characters (0x00–0x0C) are accents, used to build accented letters that have no precomposed slot.
- [a] 0x18 is a "trailing zero" placed after the percent sign (%) to compose ‰ (one zero) or ‱ (two zeros), or even smaller fractions with more zeros.
- [b] Dotless i and dotless j can carry accents to compose variants such as i with macron (ī).
- [c] 0x7F is the glyph TeX uses as its hyphenation character; despite the SHY label, it is not a true soft hyphen.
- [d] 0xD0 serves both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations, such as copying text from a PDF or hyphenation.
- [e] 0xDF contains SS (two capital S letters in one slot), which lets TeX automatically convert the German lowercase ß into its uppercase form.
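The notes can be tried out in a short pdfLaTeX document. This is a sketch that assumes LaTeX's standard T1 definitions, in which \textperthousand and \textpertenthousand are expected to be built from the percent sign followed by the 0x18 zero, and \DH and \DJ both point to slot 0xD0.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% [a] per mille and per ten thousand: % followed by one or two 0x18 zeros
\textperthousand\ \textpertenthousand

% [b] dotless i (slot 0x19) under the macron accent (slot 0x09): i with macron
\={\i}

% [d] Eth and D with stroke: both are drawn from slot 0xD0
\DH\ \DJ

% [e] the SS glyph (slot 0xDF) next to lowercase sharp s (slot 0xFF)
\SS\ \ss
\end{document}
```

Because note [d] maps two characters to one glyph, copying the \DH and \DJ line out of the resulting PDF may yield the same character twice.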
Supported languages
The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:
- Esperanto (using IL3)
- Latvian and Lithuanian (using L7X)
- Welsh, whose letters ŵ and ŷ have no slot
Languages with slightly suboptimal support include:
- Galician, Portuguese and Spanish: the ordinal indicators ª and º are missing. They are not simply superscript lowercase "a" and "o" (superscript letters are thinner), and they are often underlined.
- Croatian, Bosnian and Serbian: Đ shares its slot with Eth (Ð).
- Turkish: dotless ı and dotted İ follow different uppercase and lowercase pairings than in other languages (ı pairs with I, and i with İ).
References
- [1] Petrlik, Lukas (1996-06-19). "The Czech and Slovak Character Encoding Mess Explained". cs-encodings-faq. 1.10. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
- [2] Ferguson, Michael (1990). "Report on Multilingual Activities". TUGboat. 11 (4): 514–516.
- [3] TeX hyphenation patterns.