Languages of the World · 2019-12-13
TRANSCRIPT
Languages of the World
Kevin Duh
Goals of this lecture
1. Appreciate the diversity of languages
2. Discuss some important linguistic phenomena and classifications that may help you with your Natural Language Processing research
Outline
1. What is a language?
2. Language Change
3. World Tour
4. Language Universals
What is a language?
• A language is “a product of the collective mind of linguistic groups” -- Ferdinand de Saussure
• “A language is a dialect with an army and navy” -- Max Weinreich
– E.g. Chinese “dialects”, Scandinavian “languages”
From: http://en.wikipedia.org/wiki/File:Ferdinand_de_Saussure_by_Jullien.png
From: http://epyc.yivo.org/content/12_1.php
Definition of language in terms of “Mutual Intelligibility”
• Two caveats:
– Dialect continuum: in a string of dialects, neighboring dialects may be mutually intelligible, but intelligibility is not transitive
  • E.g. Dutch-German dialect continuum
– It’s a matter of degree; there is no clear-cut intelligibility test
• There’s no such thing as “languages”; “dialects” are all there is.
– One dialect is defined as the “standard” language
  • E.g. Tokyo dialect as “Japanese”
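The non-transitivity caveat for dialect continua can be made concrete with a toy model. The sketch below is only an illustration: the dialect names, their positions on a line, and the intelligibility threshold are all invented assumptions, not measurements of any real continuum.

```python
# Hypothetical dialects placed along a one-dimensional continuum.
# Positions and the threshold are invented purely for illustration.
POSITIONS = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
THRESHOLD = 1.5  # assumed maximum "distance" for mutual intelligibility


def intelligible(x: str, y: str) -> bool:
    """Two dialects count as mutually intelligible if they are close enough."""
    return abs(POSITIONS[x] - POSITIONS[y]) <= THRESHOLD


def is_transitive() -> bool:
    """Check whether intelligibility is transitive over all dialect triples."""
    names = list(POSITIONS)
    return all(
        intelligible(x, z)
        for x in names
        for y in names
        for z in names
        if intelligible(x, y) and intelligible(y, z)
    )


# Neighbors understand each other, but the chain breaks over longer distances.
print(intelligible("A", "B"), intelligible("B", "C"), intelligible("A", "C"))
print(is_transitive())
```

In this toy setup A and B, and B and C, are mutually intelligible while A and C are not, so no clean partition into "languages" falls out of the relation.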
Numbers to Know: How many languages in the world?
• Conservative estimate: 6000
– Peak of diversity: 10,000-15,000 (~15,000 BCE)
• Skewed distribution:
Population range          # of Languages    Percentage of world population
100,000,000+              8                 40%
10,000,000-99,999,999     80                39%
1,000,000-9,999,999       305               14%
100,000-999,999           93                4%
10,000-99,999             1,811             0.9%
1,000-9,999               1,978             0.1%
100-999                   1,062             0.007%
1-99                      475               0.0002%
Source: Ethnologue - http://www.ethnologue.com/statistics/status
Pause and think about this for a bit
What I say here can be expressed equivalently in 6000 other ways, using completely different words and grammar!
Numbers to know: Largest languages by # of speakers
Language      # of L1 Speakers (in millions)
Chinese       1,197
Spanish       414
English       335
Hindi         260
Arabic        237
Portuguese    203
Bengali       193
Russian       167
Japanese      122
Javanese      84
Source: Ethnologue - http://www.ethnologue.com/statistics/status
Numbers to know: When did language arise?
200,000 years ago: Anatomically modern humans
50,000 years ago: Behavioral Modernity
  Language enables cooperation & gossip → larger social groups
12,000 years ago: Agricultural Revolution
Disclaimer: Dates are inexact. I’m not an expert and there appears to be no definitive answer.
Language arose here? Or here?
And is there a Language Instinct?
Outline
1. What is a language?
2. Language Change
3. World Tour
4. Language Universals
Change is the cause of diversity
• Change by Natural Evolution
– Slight differences in speaking (usually due to laziness) lead to large differences after generations
– E.g. sound change, re-bracketing, semantic shift
• Change by Contact (Areal Effect)
– Borrowing of phonology, lexicon, and grammar from neighboring languages
  • E.g. Balkan Sprachbund: Albanian, Greek, Romanian, Bulgarian, Macedonian
    → verb-Not-verb, postposed article, genitive & dative merger
Sound change
• Principle of least effort, e.g.:
– “God be with you” → God b’wy → Goodbye
– Loss of case endings (as in Latin) → necessity of word order for grammatical function (as in English)
– Loss/merger of consonants in Old Chinese → necessity of tones
• General change, e.g.:
– Great Vowel Shift (1350-1700, England)
  • “bite”: bi:t → baIt; “beet”: be:t → bi:t
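The two Great Vowel Shift correspondences above form a small chain: e: moves into the slot that i: vacates. A minimal sketch of applying such a shift, assuming only those two correspondences (the historical shift affected more vowels) and the ASCII transcription used on the slide, is to rewrite all segments in a single pass so that the output of one change is not fed into another.

```python
import re

# Only the two correspondences given in the text; the real shift was larger.
SHIFT = {"i:": "aI", "e:": "i:"}
PATTERN = re.compile("|".join(re.escape(vowel) for vowel in SHIFT))


def great_vowel_shift(transcription: str) -> str:
    """Apply every correspondence simultaneously in one left-to-right pass.

    Because replaced text is never rescanned, the i: produced from e:
    is not shifted again to aI.
    """
    return PATTERN.sub(lambda match: SHIFT[match.group(0)], transcription)


for word, before in [("bite", "bi:t"), ("beet", "be:t")]:
    print(word, before, "->", great_vowel_shift(before))
```

Applying the rules one after another instead (first e: to i:, then i: to aI) would wrongly turn "beet" into the same form as "bite", which is why the simultaneous formulation is used here.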
Extension of Grammatical Patterns due to sound change
• Latin had multiple plural rules:
– soror → sorōrēs “sisters”
– fēmina → fēminae “women”
– dominus → domini “masters”
• In French, only one plural ending was left due to sound erosion, so -s was extended
Morphological Type Change
From: http://languagesoftheworld.info/historical-linguistics/more-on-word-order-morphological-types-and-historical-change.html
Loss of inflection, e.g. Old English → Modern English
Morphemes fuse
Words become grammaticalized as affixes
Outline
1. What is a language?
2. Language Change
3. World Tour
4. Language Universals
Our Itinerary
• I’ll introduce various language families while we tour the world
– Note: Don’t confuse geographical and genetic classification; e.g. Languages in Eurasia != Indo-European languages
• For each language family, I’ll point out some interesting phenomena or trivia
– Warning 1: These phenomena are by no means unique to the language under discussion; they may appear elsewhere.
– Warning 2: Due to time limitations, not all important phenomena will be discussed. Our tour is 走馬看花 style: “viewing the flowers while riding a fast horse”
Indo-European Language Family
From: http://en.wikipedia.org/wiki/Indo-European_languages
Indo-European
Germanic: English, German, Swedish, etc.
Italic: Italian, French, Spanish, Romanian, etc.
Balto-Slavic: Lithuanian, Russian, Polish, Czech, etc.
Hellenic: Greek
Albanian: Albanian
Celtic: Irish Gaelic, Scottish Gaelic
Armenian: Armenian
Indo-Iranian: Farsi, Hindi, Bengali, Marathi, etc.
Discovery of the Indo-European Family
1786: Sir William Jones noticed similarity between Sanskrit & Latin
Comparative Reconstruction:
- Cognates from basic vocabulary (body parts, kinship, nature)
- Identify patterns of sound change & correspondence
From: http://en.wikipedia.org/wiki/William_Jones_(philologist)
Language     1         2        3
IE:
Irish        aon       do       tri
Greek        hen       duo      treis
Latin        unus      duo      tres
Italian      uno       due      tre
French       un        deux     trois
German       eins      zwei     drei
Swedish      en        tva      tre
Russian      odin      dva      tri
Bengali      ek        dvi      tri
Persian      yak       do       se
Proto-IE?    Hoi-no?   duwo?    trei?
Not IE:
Turkish      bir       iki      üc
Hebrew       ‘exad     šnaim    šlosa
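As a rough illustration of why the numerals above look related, the words for "three" can be compared with a plain edit distance. This is a swapped-in surface-similarity heuristic, not the comparative method itself: real reconstruction relies on regular sound correspondences, and true cognates such as Persian "se" can look quite different on the surface. The function and variable names are assumptions made for this sketch; the word forms are taken from the table as transcribed there.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Words for "three" from the table, as transcribed there.
THREE = {
    "Irish": "tri", "Greek": "treis", "Italian": "tre", "French": "trois",
    "German": "drei", "Swedish": "tre", "Russian": "tri", "Bengali": "tri",
    "Persian": "se", "Turkish": "üc", "Hebrew": "šlosa",
}
LATIN_THREE = "tres"

# List languages from most to least similar to the Latin form.
for language, word in sorted(THREE.items(),
                             key=lambda item: edit_distance(LATIN_THREE, item[1])):
    print(f"{language:8} {word:6} {edit_distance(LATIN_THREE, word)}")
```

Raw distances like these are only a first hint; deciding what counts as a cognate still requires the systematic correspondences described above.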
Finno-Ugric Family: Finnish, Hungarian, Estonian, etc.
From: http://finno-ugric.com
Geographic discontinuity is interesting:
- Urals: probable homeland
- Finnic branch was larger but was encroached upon by Slavic
- Hungarian branch due to Magyar migration (800 CE)
Finno-Ugric: Agglutinative Morphology
14 cases in Estonian, 15 cases in Finnish, 21 cases in Hungarian:
Note: many of these are encoded by prepositions in Indo-European languages (which average 6 cases)
Case           Hungarian Word    Gloss
Nominative     hajó              ship [subject]
Accusative     hajó-t            ship [object]
Inessive       hajó-ban          in a ship
Elative        hajó-ból          out of a ship
Illative       hajó-ba           into a ship
Superessive    hajó-n            on a ship
Delative       hajó-ról          about a ship
Sublative      hajó-ra           onto a ship
Adessive       hajó-nál          by a ship
Ablative       hajó-tól          from a ship
…
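Agglutinative case marking of this kind amounts to attaching one clearly separable suffix per case. A minimal sketch follows, assuming only the ten forms listed in the table; Hungarian suffixes vary with vowel harmony, so these suffix shapes are only claimed for the stem "hajó", and the function name `inflect` is a hypothetical helper.

```python
# Suffixes exactly as they appear on "hajó" in the table.
# Assumption: other stems may need different suffix variants (vowel harmony).
CASE_SUFFIX = {
    "nominative": "",
    "accusative": "t",
    "inessive": "ban",
    "elative": "ból",
    "illative": "ba",
    "superessive": "n",
    "delative": "ról",
    "sublative": "ra",
    "adessive": "nál",
    "ablative": "tól",
}


def inflect(stem: str, case: str) -> str:
    """Attach the case suffix, keeping the morpheme boundary visible as '-'."""
    suffix = CASE_SUFFIX[case]
    return f"{stem}-{suffix}" if suffix else stem


for case in CASE_SUFFIX:
    print(f"{case:12} {inflect('hajó', case)}")
```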
Basque
• Unrelated to any other language?
• Ergative-absolutive case system
Nominative-accusative system:
Transitive Sentence           Intransitive Sentence
Agent         Patient         Subject
Nominative    Accusative      Nominative
Ergative-absolutive system:
Transitive Sentence           Intransitive Sentence
Agent         Patient         Subject
Ergative      Absolutive      Absolutive
From: http://en.wikipedia.org/wiki/Basque_language
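The difference between the two case systems comes down to which transitive argument is marked like the intransitive subject. The sketch below encodes the two tables as lookup maps; the role labels "A" (agent), "P" (patient), and "S" (intransitive subject) and the helper names are assumptions chosen for this example.

```python
# Case assigned to each argument role under the two alignment systems.
ALIGNMENTS = {
    "nominative-accusative": {"A": "nominative", "P": "accusative", "S": "nominative"},
    "ergative-absolutive": {"A": "ergative", "P": "absolutive", "S": "absolutive"},
}


def case_for(role: str, alignment: str) -> str:
    """Look up the case of a role (A, P, or S) in the given system."""
    return ALIGNMENTS[alignment][role]


def patterns_with_subject(alignment: str) -> str:
    """Return which transitive role shares its case with the intransitive subject."""
    cases = ALIGNMENTS[alignment]
    return next(role for role in ("A", "P") if cases[role] == cases["S"])


for name in ALIGNMENTS:
    print(name, "-> S is marked like", patterns_with_subject(name))
```

In the nominative-accusative system the intransitive subject patterns with the agent, while in the ergative-absolutive system it patterns with the patient.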
Dravidian Language Family
Distinct from Indo-European in northern India
Some Characteristics:
- Rigid SOV word order
- Noun gender: “rational” (refers to humans, deities) vs. “irrational” (refers to children, animals, objects)
From: http://en.wikipedia.org/wiki/Dravidian_languages
Languages of the Caucasus region
Many different language families in this small area!
Trivia: Chechen has 40-60 consonants, ~44 vowels
From: http://en.wikipedia.org/wiki/Languages_of_the_Caucasus
Altaic Language Family (?)
• Macro-family possibly consisting of Turkic, Mongolic, Tungusic
– Korean & Japanese?
– Similarities due to genetics or contact?
From: http://en.wikipedia.org/wiki/Altaic_languages
Vowel Harmony in Turkic
• Turkic: Turkish, Uzbek, Kazakh, Dolgan, etc.
• Vowel Harmony:
– long-distance assimilation where vowels become similar in some way across intervening consonants
– E.g. back/front & rounded/unrounded harmonization in Turkish:
  Türkiye’dir “it is Turkey”
  kapıdır “it is the door”
  gündür “it is the day”
  paltodur “it is the coat”
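The four Turkish examples above follow one rule: the suffix vowel copies the backness and rounding of the last vowel in the stem. A minimal sketch of that rule follows, under stated assumptions: the function names are hypothetical, the input is assumed to be lowercase apart from an initial capital, and the separate alternation of the suffix consonant (d/t) is ignored.

```python
# Last stem vowel -> harmonizing high vowel of the suffix.
# front unrounded -> i, back unrounded -> ı, front rounded -> ü, back rounded -> u
HARMONY = {
    "e": "i", "i": "i",
    "a": "ı", "ı": "ı",
    "ö": "ü", "ü": "ü",
    "o": "u", "u": "u",
}


def copula_suffix(stem: str) -> str:
    """Pick -dir/-dır/-dür/-dur based on the last vowel of the stem."""
    for char in reversed(stem):
        if char in HARMONY:
            return "d" + HARMONY[char] + "r"
    raise ValueError(f"no vowel found in {stem!r}")


def add_copula(stem: str, proper_noun: bool = False) -> str:
    """Attach the suffix; proper nouns take an apostrophe before it."""
    separator = "'" if proper_noun else ""
    return stem + separator + copula_suffix(stem)


print(add_copula("Türkiye", proper_noun=True))
for noun in ["kapı", "gün", "palto"]:
    print(add_copula(noun))
```

Only the final stem vowel is inspected, which is enough for these examples because harmony propagates from the nearest preceding vowel.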
Semitic Language Family: Hebrew, Arabic dialects, Aramaic, Amharic, etc.
From: http://en.wikipedia.org/wiki/Semitic_languages
Non-concatenative morphology in Semitic (e.g. Arabic)
• Root: 2-4 consonants; Template: vowels in between
• ktb "write" (as verb)
  ti-ktib "she writes" (prefix ti- means "she", present form is "- - i -")
  katab-it "she wrote" (suffix -it means "she", past form is "- a - a -")
  kaatib "writing" (present participle "- aa - i -")
  ma-ktuub "written" (past participle "- - uu -")
• ktb "book" (as noun)
  kitaab: (- i - aa -, singular)
  kutub: (- u - u -, plural)
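Root-and-template morphology can be pictured as slotting the root consonants into the gaps of a vowel pattern. The sketch below assumes a notation in which a capital "C" marks each consonant slot and affixes are written directly into the template; that notation and the function name `interdigitate` are choices made for this example, not something taken from the slides.

```python
def interdigitate(root: str, template: str) -> str:
    """Fill each 'C' slot of the template with the next root consonant."""
    if template.count("C") != len(root):
        raise ValueError("template must have one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in template)


# Templates reconstructed from the forms listed above (C = root consonant).
FORMS = [
    ("tiCCiC", "she writes"),
    ("CaCaCit", "she wrote"),
    ("CaaCiC", "writing"),
    ("maCCuuC", "written"),
    ("CiCaaC", "book"),
    ("CuCuC", "books"),
]

for template, gloss in FORMS:
    print(f"{interdigitate('ktb', template):9} {gloss}")
```

The same function works for any root with a matching number of consonants, which is the sense in which the morphology is non-concatenative: the root is never a contiguous substring of the word.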
Languages in Sub-Saharan Africa
• Nilo-Saharan
• Niger-Congo
• Khoisan
Characteristics:
- Many are tonal, have large sound inventories and “exotic” sounds, e.g. implosives, clicks
- Many noun classes (Shona: 20)
From: http://en.wikipedia.org/wiki/Languages_of_Africa
Sino-Tibetan Language Family
Sinitic branch: (map from http://en.wikipedia.org/wiki/Chinese_language)
Tibeto-Burman branch:
- e.g. Tibetan, Burmese
Characteristics:
- Tone
- Isolating morphology
- Noun classifiers: numeral-classifier-noun in Mandarin; noun-numeral-classifier in Burmese
Tai-Kadai Family
e.g. Thai: tone (5), isolating, noun classifier
Austro-Asiatic Family
e.g. Vietnamese: tone (6), isolating, noun classifier, 30% vocab via Chinese
e.g. Munda: no tone, agglutinative
Likely areal effects
From: http://en.wikipedia.org/wiki/Austroasiatic_languages
From: http://en.wikipedia.org/wiki/Tai–Kadai_languages
Austronesian Languages
• Formosan branch: ~20 languages in Taiwan (many endangered)
• Malayo-Polynesian branch:
– West: Javanese, Sundanese, Malay, Indonesian, Tagalog, Malagasy, etc.
– East: Hawaiian, Maori, Fijian, etc.
From: http://en.wikipedia.org/wiki/Austronesian_languages
Amazing Seafarers!!
Characteristics:
- Ergative-Absolutive
- Agglutinative morphology
- Small sound inventory (13 phonemes in Hawaiian)
- Some have VOS, VSO order
- Inclusive/Exclusive 1st person pronoun: does “we” include the hearer?
- Reduplication
Reduplication
Sound repetition within a word for semantic or grammatical purpose
e.g. Tagalog:
  sulat “write” → susulat “will write”
  hanap “seek” → hahanap “will seek”
  lakad “walk” → lalakad “will walk”
e.g. Indonesian:
  anak “child” → anak anak “all sorts of children”
  oraN “man” → oraN oraN “all sorts of men”
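Both reduplication patterns above are simple string operations: the Tagalog examples copy the first consonant-vowel sequence, and the Indonesian examples repeat the whole word. The sketch below covers just these examples; the function names and the five-letter vowel set are assumptions, and real Tagalog or Indonesian morphology has more cases than this handles.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for these romanized examples


def partial_reduplicate(word: str) -> str:
    """Prefix a copy of the word up to and including its first vowel."""
    for index, char in enumerate(word):
        if char in VOWELS:
            return word[: index + 1] + word
    raise ValueError(f"no vowel found in {word!r}")


def full_reduplicate(word: str) -> str:
    """Repeat the whole word, written as two words as in the examples."""
    return f"{word} {word}"


for verb in ["sulat", "hanap", "lakad"]:
    print(verb, "->", partial_reduplicate(verb))
for noun in ["anak", "oraN"]:
    print(noun, "->", full_reduplicate(noun))
```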
Languages in Papua New Guinea:
• 800+ languages! (1 language per 200-900 km²)
• Diversity due to mountains (natural barriers) and tribal society (cultural barriers)
• Tok Pisin (one of the official languages):
– A pidgin arose from contact between English & locals
– A pidgin becomes a creole when children learn it as L1
– Lexicon is mostly from English. Syntax is from where?
Languages in Australia:
• 270 languages, many near extinction
• Trivia - Noun classes in Dyirbal:
  I: masculine & animate; II: feminine, fire, fighting; III: all trees with edible fruit; IV: everything else
Languages of America (there are attempts to group them into macro-families, but these are controversial)
From: http://en.wikipedia.org/wiki/Indigenous_languages_of_the_Americas
Some Interesting Phenomena
• Multiple Argument Agreement in Mohawk:
– Verb agrees not only with the subject but also with the object
  • E.g. shako- prefix: agreement with 3rd person subject and 3rd person object; ra-: agreement with just 3rd person subject
– Noun incorporation: noun root becomes part of the verb, leaving one fewer argument to agree with:
  • 3 words: Wa’-k-hninu-’ (1sg-subj-BUY) ne (part) ka-nakt-a’ (prefix-BED-suffix) → 1 word: Wa’-ke-nakta-hninu-’.
• Three-way case marking in Nez Perce:
– Subjects of intransitives, subjects of transitives, objects of transitives → all get different case
• OVS word order in Carib
• Evidential marker in Makah
Outline
1. What is a language?
2. Language Change
3. World Tour
4. Language Universals
Linguistic Universals and Typology
• Typology: classifies languages and aims to describe their common properties and diversity
• E.g.: The following word orders are common.
– SOV: Japanese, Tamil, Turkish (565 languages in wals.info)
– SVO: Chinese, English, Fula (488 languages in wals.info)
– VSO: Arabic, Tongan, Welsh (95 languages in wals.info)
• Why so few VOS, OVS, OSV (total < 5%)?
– Hypothesis: Subjects tend to precede Objects
  • Why? Maybe: Agent before Patient = better info flow
– Note: some languages have V2 or no dominant order
Typological Generalizations
• SOV tendencies:
– have postpositions
– genitive-noun, etc.
• SVO tendencies:
– have prepositions
– noun-genitive, etc.
• Analytic morphology tendencies:
– mono-syllable words
– use of tones
– use of function words
– relatively fixed word order
• Synthetic morphology tendencies:
– poly-syllable words
– no use of tones
– fewer function words
– relatively free word order
Check out World Atlas of Language Structures (http://wals.info) for more!
VOWEL QUALITY INVENTORY
Ian Maddieson. 2013. Vowel Quality Inventories. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
NUMBER OF GENDERS
Greville G. Corbett. 2013. Number of Genders. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
DEFINITE ARTICLES
Matthew S. Dryer. 2013. Definite Articles. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Summary
1. What is a language?
2. Language Change
3. World Tour
4. Language Universals
Good References
• Bernard Comrie (Ed.) (2009) The World’s Major Languages, 2nd ed. New York, NY: Routledge
• Asya Pereltsvaig (2012) Languages of the World: An Introduction. Cambridge Univ. Press
• John McWhorter (2001) The Power of Babel. HarperCollins Press
• Matthew Dryer & Martin Haspelmath (Eds.) (2013) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online: http://wals.info)
• Bernard Comrie, Stephen Matthews, Maria Polinsky (Eds.) (1998) The Atlas of Languages. Bloomsbury Publishing