Unicode - grep.ro
Alex Morega
ROSEdu Tech Talks, Saturday, 2 April 2011
Punched cards
http://en.wikipedia.org/wiki/Punched_card
(in practice: a printer protocol)
http://en.wikipedia.org/wiki/Teleprinter
ASCII
http://en.wikipedia.org/wiki/ASCII
ISO-8859-1 (Latin-1)
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
Shift-JIS
http://en.wikipedia.org/wiki/Shift-JIS
(in practice: a universal code)
q #0071
ø #00F8
ș #0219 (or #0073, #0326)
ω #03C9
ش #0634
➶ #27B6
ヂ #30C2
森 #68EE
𝄞 #1D11E
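A minimal sketch of the character-to-number mapping in the list above, assuming Python 3, where the built-in `ord` returns a character's code point. The helper name `code_point_label` is made up for this illustration and is not from the talk.

```python
import unicodedata


def code_point_label(ch):
    """Format one character's code point in the '#XXXX' style used above."""
    return "#%04X" % ord(ch)


# Print the label and the official Unicode name of each sample character.
for ch in "qøșωش➶ヂ森𝄞":
    print(code_point_label(ch), unicodedata.name(ch))

# The same visible letter can also be written as a base letter followed by
# a combining mark: "s" plus COMBINING COMMA BELOW.
decomposed = "s\u0326"
labels = [code_point_label(c) for c in decomposed]
print(labels)
```

Only ASCII text is printed, so the sketch does not depend on the terminal's encoding.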
Basic Multilingual Plane
http://en.wikipedia.org/wiki/Plane_(Unicode)
• Plane 0 (0000–FFFF): Basic Multilingual Plane
• Plane 1 (10000–1FFFF): Supplementary Multilingual Plane
• Plane 2 (20000–2FFFF): Supplementary Ideographic Plane
• Planes 3 – 13 (30000–DFFFF): Unassigned
• Plane 14 (E0000–EFFFF): Supplementary Special-purpose Plane
• Planes 15 – 16 (F0000–10FFFF): Private Use Area
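Since every plane in the list above spans exactly 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. A small sketch, with `plane_of` as a hypothetical helper name:

```python
def plane_of(code_point):
    """Return the plane number (0 to 16) that a code point belongs to."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % code_point)
    return code_point >> 16


print(plane_of(ord("q")))   # Basic Multilingual Plane
print(plane_of(0x1D11E))    # Supplementary Multilingual Plane
print(plane_of(0x10FFFF))   # last private-use plane
```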
Normalization
• Canonical (NFC, NFD): combining diacritics, etc. (visual equivalence): [ș]⇔[s + U+0326], [ç]⇔[c + U+0327]
• Compatibility (NFKC, NFKD): sub/superscripts, ligatures, etc. (semantic equivalence): [⁵]⇒[5], [ﬃ]⇒[ffi]
• NFK ⊃ NF (compatibility equivalence includes canonical equivalence)
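The two kinds of equivalence can be tried out with Python's standard `unicodedata.normalize`. This is only a sketch using the examples from the slide; the variable names are illustrative.

```python
import unicodedata

composed = "\u0219"      # ș as a single code point
decomposed = "s\u0326"   # s followed by COMBINING COMMA BELOW

# The two spellings look the same but compare as different strings...
assert composed != decomposed
# ...until both are brought to the same canonical form.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed

superscript_five = "\u2075"   # ⁵
ffi_ligature = "\ufb03"       # the single-character "ffi" ligature

# Canonical normalization leaves compatibility characters alone.
assert unicodedata.normalize("NFC", superscript_five) == superscript_five
# Compatibility normalization replaces them with plain equivalents.
assert unicodedata.normalize("NFKC", superscript_five) == "5"
assert unicodedata.normalize("NFKC", ffi_ligature) == "ffi"
# Compatibility normalization also applies the canonical mappings.
assert unicodedata.normalize("NFKC", decomposed) == composed
```

Note that the compatibility forms lose information (a superscript becomes an ordinary digit), so they suit searching and comparison better than storage.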
UCS-2, UTF-16
• usually 2 bytes per code point
• 4 bytes for non-BMP code points in UTF-16, using surrogate pairs: U+D800 – U+DFFF
• endianness, BOM
• contains NUL bytes (not safe for C strings)
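The surrogate-pair arithmetic can be sketched as follows: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The function name `to_surrogates` is an assumption made for this example; the result is checked against Python's own `utf-16-be` codec.

```python
def to_surrogates(code_point):
    """Split a non-BMP code point into a (high, low) UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only non-BMP code points need surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1D11E)         # the G clef from the earlier list
print("%04X %04X" % (high, low))

# Cross-check against the built-in big-endian encoder (no BOM is written).
encoded = "\U0001D11E".encode("utf-16-be")
assert encoded == bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])

# Byte order matters, and plain ASCII text picks up NUL bytes.
assert "a".encode("utf-16-le") == b"a\x00"
assert "a".encode("utf-16-be") == b"\x00a"
```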
UTF-8
• Variable-width
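• ASCII-compatible and NUL-safe
• Sorting without decoding (byte order matches code point order)
• Resynchronization (character boundaries can be found from any byte offset)
• Invalid values (some byte sequences are never valid)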
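The properties listed above can be checked with a few lines of Python. This is a sketch, not a full validator: `resync` and `is_valid_utf8` are hypothetical helper names, and validity is delegated to Python's strict built-in decoder.

```python
def resync(data, pos):
    """Advance from an arbitrary byte offset to the next character start.

    Continuation bytes always look like 10xxxxxx, so they can be skipped.
    """
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def is_valid_utf8(data):
    """Report whether the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII-compatible and NUL-safe: ASCII is unchanged, no zero bytes appear.
assert "abc".encode("utf-8") == b"abc"
assert 0 not in "ș森𝄞".encode("utf-8")

# Sorting the encoded bytes gives the same order as sorting by code point.
words = ["z", "ș", "森", "𝄞", "a", "ø"]
by_code_point = sorted(words)
by_bytes = [b.decode("utf-8") for b in sorted(w.encode("utf-8") for w in words)]
assert by_code_point == by_bytes

# Resynchronization: offset 2 lands in the middle of "ș".
data = "aș森".encode("utf-8")
start = resync(data, 2)
assert data[start:].decode("utf-8") == "森"

# Invalid values: an overlong form, a stray 0xFF, and a truncated character.
assert not is_valid_utf8(b"\xc0\xaf")
assert not is_valid_utf8(b"\xff")
assert not is_valid_utf8("ș".encode("utf-8")[:1])
```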
http://en.wikipedia.org/wiki/UTF-8
Programming
• C: char, wchar_t, iconv
• Java: char (UTF-16 code unit), int (code point)
• PHP, Ruby: bytes
• Python: ~UTF-16 or UTF-32 (depending on the build)
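The practical consequence of these representations is that "string length" depends on what the language counts. The sketch below assumes Python 3, where `len` on a `str` counts code points, and derives the other two counts by encoding; `lengths` is an illustrative helper, not an API from any of the languages listed.

```python
def lengths(text):
    """Count the same text in three different units."""
    return {
        # what a code-point-based string type reports
        "code_points": len(text),
        # what a UTF-16 based type (such as an array of Java chars) reports
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # what a byte-string type holding UTF-8 reports
        "utf8_bytes": len(text.encode("utf-8")),
    }


sample = "ș𝄞"   # one BMP character and one non-BMP character
counts = lengths(sample)
print(counts)
```

For this sample the three counts all differ, which is why indexing and slicing bugs tend to appear only once non-ASCII or non-BMP text shows up.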
HTML, XML
• Entities: [&amp;], [&#537;], [&#x219;]
• “<?xml version="1.0" encoding="UTF-8" ?>”
• “Content-Type: text/html;charset=utf-8”
• http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
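As a rough illustration of character references, Python's standard `html` module decodes named, decimal and hexadecimal references, and the `xmlcharrefreplace` error handler produces decimal references for characters that the target encoding cannot represent. The sample strings are made up for this sketch.

```python
import html

# Named, decimal and hexadecimal references all decode to plain characters.
decoded = html.unescape("&amp; &#537; &#x219;")
assert decoded == "& \u0219 \u0219"

# Going the other way: markup characters must be escaped...
escaped = html.escape("a & b <c>")
assert escaped == "a &amp; b &lt;c&gt;"

# ...while non-ASCII characters only need references if the output encoding
# cannot hold them (here: plain ASCII).
ascii_only = "ș".encode("ascii", "xmlcharrefreplace")
assert ascii_only == b"&#537;"

# With a UTF-8 declaration such as the ones above, the character can be
# written directly as bytes instead.
assert "ș".encode("utf-8") == b"\xc8\x99"
```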
Thank you. Questions?
http://grep.ro/quickpub/rtt-unicode