Computing meets Language Kevin Duh Dept. of Computer Science & Human Language Technology COE Johns Hopkins University

Post on 22-Mar-2022

What does a Computer Scientist do?

Computer Science is more than just programming & computers!

Image source: Almonroth, CC BY-SA via Wikimedia Commons https://commons.wikimedia.org/wiki/File:Typing_computer_screen_reflection.jpg

Computational Thinking

Thinking like a computer scientist means more than being able to program a computer.

It requires thinking at multiple levels of abstraction.

Jeannette Wing (Columbia University), Communications of the ACM, 2006 https://www.cs.cmu.edu/~15110-s13/Wing06-ct.pdf

Image source: World Economic Forum, CC BY-SA via Wikimedia Commons https://en.wikipedia.org/wiki/File:Jeannette_Wing,_Davos_2013.jpg

Examples of computational thinking at work

• Build a “model”
  • Abstracts the key properties of what you’re studying
  • Allows you to run simulations and predictions

• Examples:
  • Computational Biology
  • Computational Finance
  • Computational Linguistics

Image sources: (1) Probkos13, CC BY-SA via Wikimedia Commons https://commons.wikimedia.org/wiki/File:Punnett_Square.svg (2) Garwood, Sharma, Dunlop, Giribet, CC BY via Wikimedia Commons https://commons.wikimedia.org/wiki/File:Phylogenetic_Analyses_of_Opiliones_2014-A.png

Modeling language?

• You say you “know English”
  • What exactly is it that you know?
  • How would you write it down? In what notation?

• How do toddlers learn their first language?
• Can we program a computer to understand human language?
  • Exploit large amounts of data & build probabilistic models of language (e.g. via machine learning)

Example: which phrase is more likely according to data?
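
As a rough illustration of the question above, one could count how often each candidate phrase occurs in a collection of text and prefer the one seen more often. The following is only a minimal sketch in Python: the toy corpus, the candidate phrases, and the helper names `tokenize`, `phrase_count`, and `more_likely` are all invented for illustration and do not come from the talk.

```python
import re

# Hypothetical toy corpus (an assumption for illustration, not real data).
CORPUS = [
    "I drink strong tea every morning",
    "She likes strong tea with milk",
    "He made a pot of strong tea",
    "They sell powerful computers",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def phrase_count(phrase, corpus):
    """Count how many times the word sequence `phrase` occurs in `corpus`."""
    target = tokenize(phrase)
    n = len(target)
    total = 0
    for sentence in corpus:
        words = tokenize(sentence)
        # Slide a window of n words over the sentence and compare.
        total += sum(1 for i in range(len(words) - n + 1)
                     if words[i:i + n] == target)
    return total

def more_likely(phrase_a, phrase_b, corpus):
    """Return the phrase seen more often in the corpus (phrase_a on a tie)."""
    if phrase_count(phrase_a, corpus) >= phrase_count(phrase_b, corpus):
        return phrase_a
    return phrase_b

print(more_likely("powerful tea", "strong tea", CORPUS))
```

Raw counts like these are only a starting point; a practical model would estimate probabilities from a far larger body of text and would need some way to handle phrases it has never seen.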

Outline

1. Introduction: Computer Science → Computational Thinking
2. My field: Computational Linguistics
3. Example research topic: how Google Translate works
4. How to begin CS research as a high schooler

Computational Linguistics, a.k.a. Natural Language Processing

• We want to study:
  • How to model human language
  • How to program computers to interpret and process human language

• Interdisciplinary field → good if you like both STEM and humanities!
  • Computer Science & Engineering
  • Linguistics, Cognitive Science
  • Statistics, Machine Learning

My path into this field

Modeling language at multiple levels

• Sound • Word • Sentence

Jerry hit Tom.

Tom was hit by Jerry.

Jerry

Tom

This isn’t easy! Unlike programming languages, human language can be ambiguous.

Image source: http://walkinthewords.blogspot.com/2010/07/syntax-with-sherlock-sentence-ambiguity.html

Sherlock saw the man using binoculars

What applications are possible?

• Currently we don’t yet have a model that really understands language fully, but we have some usable ones

Strong AI vs. Weak AI

Applications: Analyzing online reviews

Applications: Extracting info from emails

Applications: Finding answers in long articles (i.e. helping you do homework)

Applications: Machine translation

世界には6000の言語があります。

There are 6000 languages in the world.

Outline

1. Introduction: Computer Science → Computational Thinking
2. My field: Computational Linguistics
3. Example research topic: how Google Translate works
4. How to begin CS research as a high schooler

When I look at an article in Russian, I say: “This is really written in English, but has been coded in some strange symbols.”

Warren Weaver, American scientist (1894-1978)

Image courtesy: Biographical Memoirs of National Academy of Science, vol. 57

1a) evas dlrow-eht

1b)

2a) dlrow-eht si detcennoc

2b)

3a) hcraeser si tnatropmi

3b)

4a) ew eb-ot-mia tseb ni dlrow-eht

4b)

Your mission: We found 4 sentence pairs from two ancient Martian languages. Figure out which “word” translates to which.

[Figure: frequency counts for the words “dlrow-eht” (3, 1) and “si” (2, 1)]
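
One plausible first step on the puzzle is to count how often each word appears, since a word that is frequent on one side of the sentence pairs is a reasonable candidate to match a word that is equally frequent on the other side. The Python sketch below counts words in the four "a" sentences only, because the "b" sentences are not reproduced in this transcript. The names `SENTENCES_A` and `word_frequencies` are chosen here for illustration and are not from the talk.

```python
from collections import Counter

# The four "a" sentences from the puzzle.
# The matching "b" sentences are missing from this transcript.
SENTENCES_A = [
    "evas dlrow-eht",
    "dlrow-eht si detcennoc",
    "hcraeser si tnatropmi",
    "ew eb-ot-mia tseb ni dlrow-eht",
]

def word_frequencies(sentences):
    """Count how many times each whitespace-separated word occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts

freq = word_frequencies(SENTENCES_A)

# Show the two most frequent words.
for word, count in freq.most_common(2):
    print(word, count)
# prints:
# dlrow-eht 3
# si 2
```

Given both sides of each pair, the same idea could be extended to counting which words tend to appear together in paired sentences. Frequency alone cannot separate words that occur equally often.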

A day in the life of a researcher

1. Think up a new model for language translation
2. Program it
3. Feed the model lots of data
4. Test it. Read other researchers’ papers to get more ideas.
5. Go back to (1) until satisfied, then publish

Outline

1. Introduction: Computer Science → Computational Thinking
2. My field: Computational Linguistics
3. Example research topic: how Google Translate works
4. How to begin CS research as a high schooler

Practical suggestions for gaining Computer Science (CS) research experience

• Reality:
  1. CS is not just about programming, but strong programming skills are a must!
  2. There are many research areas related to CS: enough to fit anyone’s interest, but also so many that you might not know what is out there

• Suggested plan:
  1. Improve your programming skills
  2. Contact professors for internship opportunities

Improving your programming skills

• Pick one programming language and become really good at it
  • e.g. Java, Python, C++, JavaScript

• How to be good?
  • Program a lot.
  • Read other people’s code. Work with a friend, or join others’ GitHub projects
  • Learn about data structures & algorithms. Take Computer Science classes (at school or Coursera, etc.)

• Create a portfolio on GitHub that you can show during applications

Contacting professors for opportunities

• Write a polite email
  • Be specific about what you are looking for
  • Add a link to your GitHub repo and explain your interest & experience

• Don’t expect a reply
  • Professors get so many emails like this every day from around the world…
  • Professors have changing commitments. A no-go this year doesn’t mean no chance for next year.

• If you’re lucky and get a project:
  • Be proactive in figuring out how you can contribute.
  • Be comfortable working on something when you don’t know all the details.
  • Be independent. Learn when to ask questions and when to self-study.

Additional comments

• Structured internship programs are also good ways to learn, e.g.
  • Johns Hopkins Applied Physics Lab (APL) ASPIRE program
  • More resources: https://cty.jhu.edu/resources/academic-opportunities/internships/math.html

• If interested in Machine Learning & AI subareas of CS, then math and probability/statistics are also important.

Summary

1. Introduction: Computer Science → Computational Thinking
2. My field: Computational Linguistics
3. Example research topic: how Google Translate works
4. How to begin CS research as a high schooler

Questions / Comments?