
TUGboat, Volume 41 (2020), No. 3 275

The Non-Latin scripts & typography

Kamal Mansour

1 Introduction

It is quite common in typographic terminology to divide the world’s scripts into Latin and non-Latins. At first glance that might seem to be a reasonable categorization until one looks a bit closer to find that non-Latin consists of a huge number of diverse scripts. Imagine if we were to call Latin script blue, while calling all others non-blue. We would be gathering the remaining colors of the spectrum into one group without differentiation. Calling a color non-blue doesn’t tell us anything about what it might be; is it orange, green, magenta, indigo?

However strange this terminology may be, there is good reason behind it. Gutenberg invented moveable type for Latin script; in particular, he focused on a certain Gothic style used at the time by German scribes. We know things didn’t stop there. The printed forms of Latin script moved away from the handwritten forms over time, evolving into a distinct craft and discipline. In the process, the letter forms underwent considerable simplification. By the time typographic letters began to develop for other scripts, Latin type had reached a maturity possible only through time. Over the centuries, the machinery and techniques had been refined for the needs of Latin type and any newcomers to the game needed to adapt to the current state of things.

In the realm of non-Latin scripts, what are the main groups and families? Moving eastward from the origins of Latin script, we find Greek script, and further east, its close relative, Cyrillic. Hovering over the Caucasus, Armenian and Georgian raise their heads. Reaching from North Africa eastward, Arabic script occupies a large swath of the landscape that reaches as far as India. Near the juncture of Western Asia and North Africa, Hebrew has its home. In East Africa, Ethiopic script — in ancient times known as Ge’ez — has a sizable presence. Over the large territory of China, as well as in Taiwan, Chinese is the dominant script. Hangul, a syllabic script with some Chinese visual features, dominates in Korea. Meanwhile, further south in Japan, a unique hybrid writing system rules, consisting of two syllabaries (katakana and hiragana) and Chinese characters, in addition to Latin. The Indian subcontinent is home to a large family of scripts descended from ancient Brahmi writing. Over the centuries, the descendants split into two groups, northern and southern scripts. In the northern group, Devanagari, Bengali, and Gujarati scripts stand out. The southern group includes Tamil, Telugu, Kannada, Malayalam, and Sinhala. In Southeast Asia, we find more distant descendants of Brahmi in Thai, Lao, Khmer, and Myanmar scripts.

As close relatives of Latin, the Greek and Cyrillic scripts had more time to develop typographically than other non-Latins. In its earliest typographic forms, Greek initially imitated cursive scribal form before eventually settling on printed forms based on its classic origins. Classical and Biblical studies further established the need for Greek typefaces in the late nineteenth and early twentieth centuries.

Cyrillic, originally derived from Greek, was extensively revised and “europeanized” under the hand of Peter the Great. Then early in the twentieth century, Cyrillic was further simplified by purging it of unnecessary vestiges from Greek orthography and redundant characters. In an effort to further unite the huge territory of the Soviet Union, the government extended Cyrillic script to support the many non-Slavic languages of the various republics. Outside the Soviet Union, a certain number of Cyrillic typefaces were always available through the mainstream type foundries to satisfy the needs of academics in Slavic Studies and of emigrant communities.

Because of their common shapes and behavior, Greek and Cyrillic have always been easily accommodated on the same typesetting equipment as Latin script. Of course, the compositor had to be familiar with the script he was setting, with Polytonic Greek being the most demanding because of the variety of diacritics.

When it comes to typography, every script has its intricate details, and consequently, its own story of transition from handwriting to type. It would be interesting to tell every story, but that would be enough to fill many books. Without going into every detail, we can gain some understanding by surveying the types of problems that arose when adapting typesetting technology developed for Latin script to non-Latin scripts. Because scripts have developed as families, they share many attributes with other members of the family. We can take advantage of the similarities to identify recurring obstacles in the typographic development of non-Latin scripts. By citing judiciously chosen examples from a few scripts, we can readily present the gamut of significant difficulties.

Before delving into the range of technical challenges, we must first recognize the most common hurdle shared by all who endeavored to bring a non-Latin script into the typographic age: the fear of the exotic. Typography developed in Europe and it was Europeans who first strove to take this craft to distant lands. Before the advent of sociology, anthropology, and linguistics, non-European cultures were seen as foreign, exotic, and difficult to fathom. Before understanding the written form of a language, one must first learn the spoken language and, to some extent, understand its culture.

To explore the gamut of typographic challenges, we will visit four scripts: Devanagari, Khmer, Arabic, and Chinese. In terms of visual impression and structure, each of these scripts differs greatly from the others.

2 Devanagari

We begin with Devanagari — the most commonly used script in contemporary India — that is used primarily for the Hindi and Marathi languages. Devanagari has a long history of development that began with the Sanskrit language centuries ago. The structure of Devanagari writing is based on the syllable which, in its simplest form, can be composed of a consonant with either an inherent or explicit vowel mark. As an example, let us consider the letter ka, which when written as क, includes the implied vowel a. If we want to write kī, we need to explicitly add the ī vowel ◌ी to ka क, resulting in की. Presented at a larger size, we can clearly show how the two components of the kī syllable overlap at the top:

क + ◌ी → की

Now, if we want to write ku instead of kī, the u vowel is placed below ka as follows:

क + ◌ु → कु

Marks other than vowels can also appear above and below letters; for instance, if we want to indicate that the vowel in ka sounds nasal, we would add a dot above (natively, anusvara) as in:

कं

Since Latin type was usually composed on a single horizontal line with gaps between letters, Devanagari presented quite a challenge because of its need for overlap between adjacent characters, in addition to upper and lower marks.
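In Unicode-encoded text, this syllabic structure maps directly onto plain character sequences: the text stores only consonant plus vowel sign, and the visual overlap is the font’s job. A minimal Python sketch, using only the standard unicodedata module, shows how the examples above are stored:

```python
import unicodedata

ka = "\u0915"        # DEVANAGARI LETTER KA (क)
ii = "\u0940"        # DEVANAGARI VOWEL SIGN II (◌ी)
anusvara = "\u0902"  # DEVANAGARI SIGN ANUSVARA (the nasalization dot)

kii = ka + ii        # stored as two characters; the font draws them overlapped
kam = ka + anusvara  # ka with the anusvara dot above

print(kii)   # की
print(kam)   # कं
print([unicodedata.name(c) for c in kii])
# ['DEVANAGARI LETTER KA', 'DEVANAGARI VOWEL SIGN II']
```

The vowel signs and anusvara are combining marks, so a bare ◌ी never stands alone in a line of text; it always attaches to the preceding consonant.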

3 Khmer

Next, we consider Khmer, or Cambodian, writing. As a remote descendant of the Brahmi script, the structure of Khmer is also broadly based on the syllable, including the presence of an implied or explicit vowel; however, the phonetic nature of the Khmer language has also resulted in many additional features that go beyond the ancestral Brahmi model. The orthographic norms of Khmer script represent the range of simple to complex syllables by allowing its components to stack both horizontally and vertically. Like Devanagari, the simplest syllable consists of a consonant with an implied or explicit vowel, but because of the tonal nature of Khmer, marks indicating tone are sometimes also required. Khmer has the same needs as Devanagari for upper and lower marks, which result in more height and depth for syllable clusters. While Devanagari allows multi-consonant syllables, it also allows them to stack horizontally, if need be. On the other hand, Khmer requires the components of a multi-consonant syllable cluster to stack vertically.

Following is an example of a relatively simple syllable cluster consisting of a single consonant no followed by a vowel mark i above (U+1793 + U+17B7):

ន + ◌ិ → និ

As an example where additional depth is required, the following syllable cluster consists of a single consonantal letter followed by a subscript consonant that sits to its right while partially overlapping it below [pha + subscript sa]. The use of a subscript form indicates that the preceding consonant is not followed by a vowel (U+1795 + U+17D2 + U+179F):

ផ + ◌្ + ស → ផ្ស

The character U+17D2 indicates that the following consonantal symbol should appear in its subscript form.
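Encoded Khmer clusters are again plain character sequences; U+17D2, the invisible coeng sign, is what requests the subscript form of the consonant that follows it. A small sketch with the standard unicodedata module:

```python
import unicodedata

no    = "\u1793"  # KHMER LETTER NO (ន)
i     = "\u17B7"  # KHMER VOWEL SIGN I (◌ិ)
pho   = "\u1795"  # the consonant transliterated as pha in the text
coeng = "\u17D2"  # KHMER SIGN COENG: subscripts the following consonant
sa    = "\u179F"  # KHMER LETTER SA (ស)

ni   = no + i            # និ
phsa = pho + coeng + sa  # ផ្ស — three characters, one visual cluster

print(unicodedata.name(coeng))  # KHMER SIGN COENG
print(len(phsa))                # 3; the coeng itself gets no glyph of its own
```

The stacking, stretching, and reordering described above are left entirely to the shaping engine and font; the stored text never changes.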

There are also deeper cases where a vowel symbol ie surrounds and subtends a consonant ko followed by a subscript consonant lo (U+1783 + U+17D2 + U+179B + U+17C0). Note that the right part of ie is stretched to embrace the stacked consonants (ko + subscript lo + ie):

ឃ + ◌្ + ល + េ◌ៀ → េឃ្លៀ

The above sequence could also carry an additional mark above it. Unlike Latin letters, Khmer characters have a predominantly vertical aspect ratio when arranged into syllable clusters. One can imagine the numerous challenges posed by Khmer script for those who wanted to implement it in metal type. How does one stack so many vertical elements along the baseline? If Latin and Khmer letters are put side by side, what sort of size ratio should be maintained between the two? Does one resort to using the largest leading needed throughout one document, or does one increase it only when required?


In digital fonts equipped with OpenType technology — or some other shaping software — the built-in shaping logic allows the user to enter text at the character level only, while watching as syllable clusters compose magically into their correct form. By moving from metal to digital type, we also leave the obstacles of metal behind while gaining new flexibility and fluidity of form.

4 Arabic

The cursive form of Arabic script is the one trait that differentiates it from most others. Arabic writing never developed typographic “block” letters, but has remained cursive from its inception to this day.

Spread over many centuries and a broad geographic terrain, many different styles of Arabic script have developed. Under Ottoman rule, when it came time to adapt one of the Arabic styles to type, the Naskh style prevailed. In due time, many other styles were implemented, but Naskh has remained the reference style for running text. Because of its rich variety of contextually variant forms, the Naskh style could only be typeset by a compositor well versed in the intricacies of the style. Showing a variety of styles available for manual setting, the following image is taken from an 1873 catalog of the Amiriyah Press in Cairo:

Each of the above styles required that the compositor thoroughly master its shaping rules.

Most Arabic text is set without vowel marks, so setting fully vocalized text (with optional marks above and below the letters) as in the above sample was (and still is) laborious, and therefore avoided because of the extra expense.

With the advent of mechanized typesetting in the early to mid-twentieth century, styles had to be simplified to fit the new size constraints of magazines and keyboards. This trend to simplify persisted, reaching its peak with the arrival of phototypesetters, where the limited number of character slots was dictated by Latin fonts.

In the present day, one can say with good confidence that Arabic script stands to benefit greatly from digital fonts equipped with OpenType technology in several ways. First of all, the oversimplification of the past century can be discarded because OpenType technology is capable of emulating the various calligraphic styles of Arabic script. Since the global adoption of the Unicode Standard in the 1990s, Arabic text is stored as a sequence of alphabetic characters that are independent of style. The embedded shaping logic of each font reflects its style, while text input remains invariant. By setting a string of text in a particular font, one can expect it to take the shape mandated by the rules of that style. For instance, in the following sample, each line consists of the same text set in three distinct fonts. Note that the second and third lines show words that are set on an incline with respect to the baseline — a calligraphic feature ably rendered in OpenType.

5 Chinese

We now take a look at the fourth script, Chinese, which is nowadays shared mainly by the Chinese and Japanese languages, albeit with some stylistic differences. Unlike the three previously examined scripts, Chinese characters require no additional marks nor any contextual shaping. Once designed, a character can simply be typeset as is. The primary obstacle for Chinese script lies in the thousands of characters needed to set everyday text. Chinese writing has a long history of exacting calligraphic styles that require great effort to master. The following sample shows Song, one of the most common styles.

Figure 1: Typesetting in a standard Latin font vs. Zapfino. (Both lines set the same text: “…inherent dignity and…inalienable rights of…the human family is the foundation of freedom”.)

Despite the graphic complexity of its characters, Chinese script is typically classified among the “simple” scripts because the typesetting of Chinese text has no complexities in terms of its character shapes.

To write Japanese, Chinese characters (kanji) are intermixed with two phonetic syllabaries called hiragana and katakana to show grammatical affixes, as well as foreign words and names. In typical Japanese text, about half the characters consist of kana (hiragana or katakana). The following text is taken from the first line of the Universal Declaration of Human Rights. The more fluid and curvier characters are hiragana.

In the twentieth century, ingenious methods and typesetting equipment were devised to handle the large character sets needed for Chinese and Japanese in their homelands. However, the keyboards and magazines of the mechanical typesetters of Linotype and Monotype were designed for limited character sets. In addition, the cost of developing an adequate set of Chinese characters was high enough to stop any Western companies from entering the field until the early era of digital typesetters in the late 1980s, when Monotype entered the marketplace.

Once digital methods prevailed in the Chinese and Japanese markets, more local enterprises were able to enter the typographic field and thrive. Digital text entry now depends on sophisticated software that exploits the systematic structure of Chinese characters in terms of their radicals, strokes, or phonetics to simplify the job for the user.

6 Degrees of complexity

Now that we have taken a quick glimpse at several scripts and their traits, we need to examine what inferences we can draw for the world of scripts.

Scripts vary greatly in their level of complexity. A typical Latin typeface is quite simple because of its relatively small character set and the near absence of shaping rules, while a calligraphic font such as Zapfino depends on a set of complex rules; see Figure 1.

This example clarifies how complexity can often be based in a particular style, and is not necessarily fundamental to the structure of the script itself. Some scripts are relatively complex because of their intrinsic shaping rules, while their input requirements are simple due to a small character set. The line composition of Chinese text is relatively simple because all characters fit within a square and require no additional marks. On the other hand, the labor needed to create all the characters is disproportionately high, and input methods are complex by necessity because of the large character set. An Arabic font whose style requires at most four shapes for each letter is simple in comparison to one whose style requires a rich set of contextual variants. Complexity in scripts is a matter of gradation. There is no need to simplistically put all scripts into two heaps, simple or complex.

In this brief overview, we have seen some salient features of each of the following scripts: Devanagari, Khmer, Arabic, and Chinese. Although there are no visual similarities among these scripts, they do share a common history: each of them came into the twentieth century with a typographic tradition of varying durations and strengths. While this is also true of other scripts, there are many others that came into the twenty-first century with either a typographic false start or none at all. There are many language communities who had their own traditional script but whose populations were too small to initiate and sustain a typographic practice. Other communities had established a traditional practice of metal typesetting during the nineteenth or twentieth centuries, but could not transition to newer technology such as phototypesetting as metal technology became obsolete.

Let us now roll back the calendar by about four decades to recount how we got to the current state of typography. As digital technology was maturing in the 1970s and 1980s, a silent software revolution was starting to overtake typography. Donald Knuth’s METAFONT and TeX started making digital typography directly accessible to academics. This transition enabled a wave of academic publications that had earlier been throttled by ever-rising costs of professional typesetters. In the mid-1980s, the wave of WYSIWYG word-processing began at Xerox, but was later made available to the masses through the Apple Macintosh. The concurrent publishing of the PostScript language, and its use as a standard font format, wrested digital typography from the hands of proprietary typesetting equipment and brought it into the public arena. High-resolution printing — well, 300 dpi at the time — suddenly appeared in big and small offices alike, dragging along with it the word font into everyday language.

The first stage of the typographic revolution enabled the definition of character shapes through software instead of hardware. As affordable character-drawing software became available in the early 1990s, more typographic aficionados were drawn to the field as new practitioners. The new software made it possible to produce consistent character sets, properly formatted as fonts destined to create high-resolution imprints on paper. Because of the notable difference in resolution, what appeared on the computer screen was only an approximation of the final image produced by a laser printer on paper. In the end, the paper image was the more important one. After all, until that time, typography’s aim was to leave an imprint on the relatively permanent medium of paper.

By making letter forms in software instead of metal, the wall of separation between Latin and non-Latin characters collapsed. After all, outlines defined in the form of polynomials have no physical barriers like molds and grooves, and are even free to overlap and intertwine. The domain of digital character shapes proved itself to be adaptable to any script.

As impressive as the first stage of digital typography was, it was only a first step forward. Even though it allowed us to produce typographic images without resorting to metal, it still had significant deficiencies which were not immediately apparent to all. Once stored on a computer, one could use a text document to produce numerous impressions of it over time. In a sense, we can consider the document a digital equivalent of the typographic galley for metal type, but unlike the galley, the early digital documents were somewhat unreliable. In the early 1990s, typical text documents used inherently ambiguous character codes; for instance, the code 192 (decimal) represented À in a Latin-1 code set, while it meant ω̄ in a Greek set. With the right software and fonts in place, such a document could produce a faithful image of the intended content, but it was not a dependable, unambiguous archive of the original text.

Around that time, a group of technologists involved with multilingual computing were discussing how to achieve the until-then elusive goal of an unambiguous, global character set. They wanted a digital environment where code 192 would represent only one character, and every character of every language would be assigned a unique code. Gone would be the world of ambiguous character codes, and all scripts, Latin and non-Latin, would be on equal footing. This group of visionaries laid down the foundation for the Unicode Standard, which eventually came to be recognized as the one global character set by all the major makers of systems and software.

Once Unicode was accepted by Apple and Microsoft, the slow transition away from ambiguous codes began. Before and during this transition, the transfer of text documents between different systems was laborious. Any characters outside the 7-bit ASCII range could be garbled in transit unless they had been deliberately filtered. For instance, code 0xE8 in Mac Roman represented Ë, while in Windows Code Page 1252 it stood for è. As the use of personal computers continued to penetrate every aspect of life, it became clear that such a contradictory situation was untenable in the long run. Change was on the horizon, but it was slow in coming.
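Both kinds of ambiguity are easy to reproduce today, since Python still ships the legacy codecs. In this sketch, ISO 8859-7 stands in for "a Greek set" (the article's Greek set mapped code 192 to ω̄; the standard ISO set maps it to a different Greek character, which is exactly the point), while mac_roman and cp1252 match the 0xE8 example directly:

```python
# One byte, several readings: the pre-Unicode world in miniature.
b192 = bytes([192])
print(b192.decode("latin-1"))    # À
print(b192.decode("iso8859-7"))  # ΐ — the same byte is a Greek character here

e8 = bytes([0xE8])
print(e8.decode("mac_roman"))    # Ë
print(e8.decode("cp1252"))       # è

# The classic transfer bug: text written on a Mac, read on Windows.
garbled = "Ë".encode("mac_roman").decode("cp1252")
print(garbled)                   # è
```

Nothing in the byte stream says which reading is intended; only out-of-band knowledge of the code set could disambiguate it.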

In the mid-1990s, as the presence of the Unicode Standard made itself felt across the globe, it met resistance in many different regions because it was perceived as foreign-born. To be accepted everywhere, advocates and practitioners of the Standard had to demonstrate its efficacy and suitability before many different national and linguistic communities. As this defense of the Standard was mounted, it soon became apparent that an aspect of Unicode had turned out to be an unintended obstacle for some non-Latin scripts.

“Characters, not glyphs” — one of Unicode’s primary design principles — was aimed at keeping plain Unicode-encoded text free of the entanglements of style. Characters represent abstract units such as letters of the alphabet, while glyphs depict the particular style and size of the character, as dictated by a specific font. Since Unicode represents only the characters of a string, the final shape and appearance of the display glyphs are relegated to the “text rendering” process.
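A concrete trace of this principle survives in Unicode itself: for round-trip compatibility with older encodings, the Standard includes legacy “presentation form” codes for Arabic positional glyphs, and compatibility normalization (NFKC) folds them back to the abstract character. A short sketch with the standard unicodedata module:

```python
import unicodedata

beh = "\u0628"          # ARABIC LETTER BEH: the abstract character
beh_initial = "\uFE91"  # ARABIC LETTER BEH INITIAL FORM: a glyph-level code
                        # retained only for legacy compatibility

print(unicodedata.name(beh_initial))  # ARABIC LETTER BEH INITIAL FORM
# Compatibility normalization recovers the character from the glyph:
print(unicodedata.normalize("NFKC", beh_initial) == beh)  # True
```

New text is expected to store only the abstract character and leave the choice of initial, medial, final, or isolated glyph to the font.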

In the mid-1990s, competing systems had different “rendering” processes that typically supported only a few scripts. The rendering component of systems turned out to be a bottleneck for “complex” scripts such as Devanagari and Arabic; though their texts could be readily encoded in Unicode, it was difficult to achieve good typographic results. In that period, Apple introduced a sophisticated typographic system called “Apple GX” which was capable of rendering many scripts and styles, but it ran only on Apple hardware. Soon thereafter, Microsoft countered with a competing system called “TrueType Open”.

Every practitioner in the publishing community was faced with a difficult choice to make. It wasn’t until 1996 that Adobe and Microsoft first agreed on OpenType, a font specification that integrated both PostScript Type 1 and TrueType font formats, in addition to a unified system supporting advanced typographic features. Finally, there was hope of producing cross-platform fonts in the near term, even for non-Latin fonts. As could be expected, this potential was achieved gradually over the following years.

Today, we can rightly claim that a large number of scripts can be correctly rendered and that the level of typographic sophistication is continually rising. Certainly, scripts such as Devanagari, Arabic, and Khmer can be correctly rendered on multiple digital platforms without any worry about portability. The use of Unicode for numerous scripts, and the languages they support, has been universally accepted. Each of the characters À, ω̄, Ë, and è is represented by the same unique code in any document, on any platform.

At this auspicious juncture, one must ask if there is more to do. Have we reached a stage where all the scripts in the world are on an even footing? In the Internet era, what does it take for a script to become “digitally enabled”? To prepare text documents in a particular script, all its characters must first be listed in the Unicode Standard, each of them assigned a unique code. Then, to make such documents humanly readable, there must be at least one functional font that renders the text in its commonly recognizable form. Once these two requirements are met, documents in a given script — and all the languages it supports — can be created, read, distributed, and archived.
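The first requirement — every character assigned in Unicode — can be checked mechanically. A minimal sketch (the function name is ours, and coverage depends on the Unicode database bundled with the running Python release):

```python
import unicodedata

def unassigned_chars(text):
    """Return the characters of `text` to which the installed Unicode
    database assigns no name (i.e., not yet encoded, or newer than
    this Python's Unicode data)."""
    return [c for c in text if unicodedata.name(c, None) is None]

# Devanagari, Khmer, and Arabic samples from this article: all encoded.
print(unassigned_chars("की ផ្ស ب"))  # []
```

The second requirement, a functional font, cannot be checked this way; it needs a shaping engine and an actual font file.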

Even though Version 13.0 of Unicode includes 154 scripts, more than 100 scripts are known to be under consideration for future inclusion. It might come as a surprise to some that so many scripts remain unencoded. Some scripts might no longer be in use and require considerable research to produce a definitive character set, while others might belong to a minority population that does not have the resources to prepare the documents needed to propose their inclusion in the Standard. Since it was established in 2002, the Script Encoding Initiative (SEI) has been preparing and presenting such proposals for scores of less privileged scripts (see linguistics.berkeley.edu/sei/scripts-encoded.html). Over the years, SEI has been instrumental in bringing at least 70 scripts into Unicode. Without such sponsorship, many scripts have little chance to enter the digital era.

About 10 years ago, Google launched the Noto Project (google.com/get/noto) to build OpenType fonts for the scripts in Unicode, and to make the fonts freely available to all under an open source license. Noto fonts were required not only to support the character set for each script, but also to make use of advanced typographic features that render the script in its expected native appearance. So far, at least 120 scripts are supported by Noto fonts. For scripts that were already widely covered, the Noto fonts simply broadened the variety of styles available, while for other scripts, they opened the door to the digital domain.

Not all scripts are on even footing yet — a great many have just begun their “digital lives”. As always, improvements will need to be made, and additional scripts will have to be encoded. All the same, I can say with confidence that this is a fruitful and auspicious moment for the whole spectrum of global scripts, from red to violet.

⋄ Kamal Mansour
  Kamal.Mansour (at) {monotype,me} dot com

Production notes

Karl Berry

This article was typeset with XeLaTeX, using the polyglossia package and its commands \texthindi and \textkhmer for the Devanagari and Khmer examples. The input text was UTF-8.

We used the Noto Serif Devanagari and Noto Serif Khmer fonts (regular weight). It was not feasible or desirable to install them as system fonts, so we specified them to polyglossia as filenames:

\setotherlanguages{hindi,khmer}
\newfontfamily\devanagarifont[Script=Devanagari]
  {NotoSerifDevanagari-Regular.ttf}
\newfontfamily\khmerfont[Script=Khmer]
  {NotoSerifKhmer-Regular.ttf}

With the addition of ,Renderer=HarfBuzz in the \newfontfamily calls, the results obtained with LuaLaTeX were identical. ⋄
