“Eve Eat Dust Mop”
Measuring Syntactic Development in Child Language with Natural Language Processing and Machine Learning
Shannon N. Lubetich (Pomona College ’15)
Institute for Creative Technologies, University of Southern California
Automatic IPSyn
Goal: Create a program that, given a transcript, automatically calculates the IPSyn score.
Results: Automatic IPSyn
Acknowledgements
I would like to thank Professor Kenji Sagae for introducing me to this project, and for providing his guidance, advice, support, and shared knowledge along the way. I thank Evan Suma for organizing my REU, and NSF for generously providing the funding for it. I would also like to thank everyone at ICT for treating me as part of the family.
Fully Data-Driven Measurement
Goal: Eliminate explicit IPSyn Items by training a machine on expert-level annotations and scores so that it can produce an IPSyn score based solely on features of a transcript.
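[Figure: example features (SUBJ; n:prop; n:prop v; n:prop SUBJ v) drawn from the parse of “Eve eat dust mop.” Part-of-speech tags: n:prop v n n. Grammatical relations: SUBJ, OBJ, MOD.]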
Data-Driven Approach to Age Prediction
Background: Index of Productive Syntax (IPSyn)
The higher the score, the more grammatically complex the language.
Example: “Are we going to the party?”
N1: Noun
N2: Pronoun
N4: 2-word Noun Phrase
N5: Article before a noun
N6: 2-word Noun Phrase after prep
V1: Verb
V2: Preposition
V3: Prepositional phrase
V6: Auxiliary
V7: Progressive suffix (-ing)
Q8: Y/N question with inverted aux
S1: 2-word combination
IPSyn measures grammatical complexity of language by evaluating 100 successive utterances and awarding points based on defined “IPSyn Items.” There are 60 items that can earn up to 2 points each, resulting in a total possible score of 120 points. This score can be broken down into subsections of Noun Phrases, Verb Phrases, Questions and Negations, and Sentence Structures.
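The arithmetic behind the score can be sketched in a few lines. This is a minimal illustration, not the poster's implementation: it assumes each credited occurrence of an item is worth one point up to the 2-point cap, and that the subsection can be read from the first letter of the item code (N, V, Q, S). The detailed crediting rules are not given on the poster, and the function name and sample occurrences are hypothetical.

```python
from collections import Counter

MAX_POINTS_PER_ITEM = 2  # each item can earn up to 2 points (per the poster)

# Assumption: the first letter of an item code identifies its subsection.
SUBSECTIONS = {
    "N": "Noun Phrases",
    "V": "Verb Phrases",
    "Q": "Questions and Negations",
    "S": "Sentence Structures",
}


def ipsyn_score(item_occurrences):
    """Return (total, per-subsection scores) for a list of item codes.

    item_occurrences holds one item code per credited occurrence,
    e.g. ["N1", "N1", "V1"]. Occurrences beyond the cap earn nothing.
    """
    counts = Counter(item_occurrences)
    subscores = {name: 0 for name in SUBSECTIONS.values()}
    for item, n in counts.items():
        subscores[SUBSECTIONS[item[0]]] += min(n, MAX_POINTS_PER_ITEM)
    return sum(subscores.values()), subscores


# Hypothetical occurrences found in a transcript (illustration only).
found = ["N1", "N1", "N1", "N4", "V1", "V1", "Q8", "S1"]
total, by_subsection = ipsyn_score(found)
print(total, by_subsection)
```

With 60 items capped at 2 points each, the highest total this function can return is 120, matching the maximum stated above.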
Pattern: N4
Description: 2-word NP
GR: DET MOD QUANT
Parent requirements:
  Index: forward-1
  POS: n
Examples:
  this unicorn (DET)
  fluffy unicorn (MOD)
  one unicorn (QUANT)
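A pattern like N4 can be checked mechanically against a dependency parse. The sketch below is one possible reading, not the poster's code: it assumes “forward-1” means the parent is the token immediately to the right, and that the parent's tag must equal `n` exactly. The `Token` class, the `match_pattern` function, and the part-of-speech tags on the sample words are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pos: str   # part-of-speech tag
    head: int  # index of the parent token, -1 for the root
    gr: str    # grammatical relation to the parent


# N4 as described above: a DET, MOD, or QUANT dependent whose parent
# is the next token (assumed meaning of "forward-1") and is tagged n.
N4 = {"gr": {"DET", "MOD", "QUANT"}, "parent_offset": 1, "parent_pos": "n"}


def match_pattern(tokens, pattern):
    """Return (dependent, parent) word pairs that satisfy the pattern."""
    matches = []
    for i, tok in enumerate(tokens):
        if tok.gr not in pattern["gr"]:
            continue
        # The parent must sit at the required offset and exist.
        if tok.head != i + pattern["parent_offset"] or tok.head >= len(tokens):
            continue
        parent = tokens[tok.head]
        if parent.pos == pattern["parent_pos"]:
            matches.append((tok.word, parent.word))
    return matches


# "this unicorn": the tag on "this" is a placeholder.
np_example = [Token("this", "det", 1, "DET"), Token("unicorn", "n", -1, "ROOT")]

# "Eve eat dust mop": "dust" modifies "mop", so N4 should fire once.
eve = [
    Token("Eve", "n:prop", 1, "SUBJ"),
    Token("eat", "v", -1, "ROOT"),
    Token("dust", "n", 3, "MOD"),
    Token("mop", "n", 1, "OBJ"),
]

print(match_pattern(np_example, N4))
print(match_pattern(eve, N4))
```

Keeping the pattern as plain data means further items could be added as new dictionaries without touching the matcher, which is one plausible motivation for a declarative format like the one shown above.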
Defining IPSyn Items
Preprocessing
Scarborough 1990: 75 transcripts. Hours of manual scoring.
Automatic IPSyn (this work): 593 transcripts. Minutes to compute.
When measuring child language development, researchers often face choices between easily computable metrics focused on superficial aspects of language and more expressive metrics that rely on grammatical structure but require substantial labor. To advance research in child language development, we present an automatic scoring system facilitating easy analysis of large numbers of transcripts. Additionally, we explore a machine learning approach to produce scores of grammatical complexity based on extracting morphological and syntactic features of child utterances. Both techniques achieve accuracy similar to that of language researchers and reveal trends in syntactic development, offering promising results for future research and application. We can further apply our data-driven approach to predicting age, which does not require a previously-defined, language-specific inventory of grammatical structures.
Average absolute difference
Manual: ~5 points
Automatic: 3.6 points

Child (corpus)      Mean Abs. Err.   Pearson (r)
Adam (Brown)        2.5              0.93
Ross (MacWhinney)   3.7              0.84
Naomi (Sachs)       3.1              0.91
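For readers unfamiliar with the two statistics in the table, the sketch below shows how a mean absolute error and a Pearson correlation could be computed for two sets of scores over the same transcripts. The score lists are made-up placeholders, not data from the poster, and the function names are hypothetical.

```python
import math


def mean_abs_error(xs, ys):
    """Average of |x - y| over paired scores."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up scores for five transcripts (illustration only).
reference = [40, 55, 62, 70, 81]
predicted = [42, 52, 65, 71, 78]

print(mean_abs_error(reference, predicted))
print(pearson_r(reference, predicted))
```

The two numbers answer different questions: the error says how many points apart the scores are on average, while the correlation says whether the two sets rank transcripts the same way.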
Child (corpus)      MLU r    IPSyn r   Regression r
Adam (Brown)        0.37†    0.53†     0.85†
Ross (MacWhinney)   0.19     0.34*     0.79†
Naomi (Sachs)       0.27     0.52      0.82†
† p < 0.0001
* p < 0.05
For children at least 3 years 4 months old
Future Work
- Expand to different languages (Spanish, Japanese, Hebrew)
- Improve patterns
- Train classifier on more data
MOR (MacWhinney 2000): Morphological analyzer. Part of speech.
POST (Parisse & Le Normand 2000): Part-of-speech tagger. Disambiguate.
MEGRASP (Sagae et al. 2005/2007): Dependency parser. Grammatical relations.
[Figure: parse of “Eve eat dust mop.” Part-of-speech tags: n:prop v n n. Grammatical relations: SUBJ, OBJ, MOD.]
Feature Extraction
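A rough sketch of how features might be read off the parse of “Eve eat dust mop.” The four feature templates used here (a part-of-speech tag, a grammatical relation, a dependent tag followed by its parent's tag, and the same pair with the relation in between) are an assumption based on the example labels on the poster. The actual feature set is not specified, and the tuple layout, the ROOT label, and the function name are hypothetical.

```python
from collections import Counter

# (word, POS tag, index of parent or -1 for the root, grammatical relation)
# The ROOT label for "eat" is an assumption; the poster shows only
# SUBJ, OBJ, and MOD.
PARSE = [
    ("Eve", "n:prop", 1, "SUBJ"),
    ("eat", "v", -1, "ROOT"),
    ("dust", "n", 3, "MOD"),
    ("mop", "n", 1, "OBJ"),
]


def extract_features(parse):
    """Count tag, relation, and tag-relation-tag features in one parse."""
    feats = Counter()
    for _word, pos, head, gr in parse:
        feats[pos] += 1                        # e.g. "n:prop"
        if head < 0:
            continue                           # the root has no parent
        parent_pos = parse[head][1]
        feats[gr] += 1                         # e.g. "SUBJ"
        feats[f"{pos} {parent_pos}"] += 1      # e.g. "n:prop v"
        feats[f"{pos} {gr} {parent_pos}"] += 1  # e.g. "n:prop SUBJ v"
    return feats


features = extract_features(PARSE)
for name, count in sorted(features.items()):
    print(name, count)
```

Summing such counters over all utterances in a transcript would give one feature vector per transcript, which is the kind of input a regression model needs.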