cs378: natural language processing lecture 1: introduc:on · course requirements ‣ cs 429 ‣...

26
CS378: Natural Language Processing Lecture 1: Introduc:on Greg Durre= intro video actual course

Upload: others

Post on 17-Oct-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

CS378:NaturalLanguageProcessingLecture1:Introduc:on

GregDurre=

introvideo

actualcourse

Page 2: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Administrivia

‣ Coursewebsite(includingsyllabus):h=p://www.cs.utexas.edu/~gdurre=/courses/fa2020/cs378.shtml

‣ Piazza:linkonthecoursewebsite

‣ Myofficehours:seecoursewebsite

‣ TA:TanyaGoyal;Proctor:ShivangSingh.SeewebsiteforOHs

‣ Lecture:TuesdaysandThursdays9:30am-10:45am

‣ Allofficehoursstartnextweek,butIwillstayarounda\erthisclassifyouhaveques:ons

Page 3: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

CourseRequirements

‣ CS429

‣ Recommended:CS331,familiaritywithprobabilityandlinearalgebra,programmingexperienceinPython

‣ Helpful:ExposuretoAIandmachinelearning(e.g.,CS342/343/363)

Page 4: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Enrollment

‣ Assignment0isoutnow(op:onal):

‣ Ifthisseemslikeit’llbechallengingforyou,comeandtalktome(thisissmaller-scalethantheotherassignments,whicharesmaller-scalethanthefinalproject)

‣ Ifyouarepast25onthewaitlist,youhavealowchanceofgedngintotheclass,butwehavetoseehowitprogresses

Page 5: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

FormatandAccessibility

‣ Requiredequipment:devicetomakeZoomcallswith,somewaytodohomework

‣ LabmachinesavailableviaSSH

‣ AGPUisnotrequiredtocompletetheassignments!HavingaGPUorGCPcreditscouldbehelpfulifyouwanttopursueanindependentproject

‣ Lectureswillbuildin:mefordiscussion,in-classexercises,andques:ons.Addi:onalmaterialisavailableasvideostowatcheitherbeforeora\erlectures

‣ We’lldoplentyofdiscussiongroupsinclass.Piazzaisalsoavailabletofindteammates

Page 6: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

What’sthegoalofNLP?‣ Beabletosolveproblemsthatrequiredeepunderstandingoftext

‣ Example:dialoguesystemsSiri,what’syourfavoritekindof

movie?

Ilikesuperheromovies!

What’scomeoutrecently?

TheAvengers

Page 7: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

MachineTransla:on

中共中央政治局7⽉30⽇召开会议,会议分析研究当前经济形势,部署下半年经济⼯作。

ThePoli:calBureauoftheCPCCentralCommi=eeheldamee:ngonJuly30toanalyzeandstudythecurrenteconomicsitua:onandplaneconomicworkinthesecondhalfoftheyear.

People’sDaily,August10,2020

ThePoli:calBureauoftheCPCCentralCommi=ee July30 holdamee:ng

Translate

Page 8: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Ques:onAnsweringWhenwasAbrahamLincolnborn?

February12,1809

Name BirthdayLincoln,Abraham 2/12/1809

Washington,George 2/22/1732Adams,John 10/30/1735

Theparkhasatotaloffivevisitorcenters

five

HowmanyvisitorscentersarethereinRockyMountainNa:onalPark?

maptoBirthdayfield

Page 9: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

NLPAnalysisPipeline

Syntac:cparses

Coreferenceresolu:on

En:tydisambigua:on

Discourseanalysis

Summarize

Extractinforma:on

Answerques:ons

Iden:fysen:ment

‣ NLPisaboutbuildingthesepieces!(largelyusingsta:s:calapproaches)

Translate

TextAnalysis Applica=onsText Annota=ons

Page 10: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Howdowerepresentlanguage?Labels

Sequences/tags

Trees

Text

themoviewasgood +Beyoncéhadoneofthebestvideosofall6me subjec=ve

TomCruisestarsinthenewMissionImpossiblefilmPERSON WORK_OF_ART

Ieatcakewithicing

PPNP

S

NPVP

VBZ NNflightstoMiami

λx.flight(x)∧dest(x)=Miami

Page 11: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Howdoweusetheserepresenta:ons?

Labels

Sequences

Trees

TextAnalysisText

‣Mainques:on:Whatrepresenta:onsdoweneedforlanguage?Whatdowewanttoknowaboutit?Whatambigui:esdoweneedtoresolve?

Applica=ons

Learntree-to-treemachinetransla:onmodels

end-to-endmodels

Page 12: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Whyislanguagehard?(andhowcanwehandlethat?)

Page 13: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

LanguageisAmbiguous!

‣ HectorLevesque(2011):“Winogradschemachallenge”(nameda\erTerryWinograd,thecreatorofSHRDLU)

Thecitycouncilrefusedthedemonstratorsapermitbecausethey______violence

‣ >5datasetsinthelasttwoyearsexaminingthisproblemandcommonsensereasoning

‣ Referen:alambiguity

Thecitycouncilrefusedthedemonstratorsapermitbecausetheyadvocatedviolence

Thecitycouncilrefusedthedemonstratorsapermitbecausetheyfearedviolence

Page 14: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

LanguageisAmbiguous!

examplecredit:DanKlein

‣ Syntac:candseman:cambigui:es:parsingneededtoresolvethese,butneedcontexttofigureoutwhichparseiscorrect

TeacherStrikesIdleKids

BanonNudeDancingonGovernor’sDesk

IraqiHeadSeeksArms

Page 15: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

LanguageisReallyAmbiguous!

‣ Therearen’tjustoneortwopossibili:eswhichareresolvedpragma:cally

‣ Combinatoriallymanypossibili:es,manyyouwon’tevenregisterasambigui:es,butsystemss:llhavetoresolvethem

Itisreallyniceout

ilfaitvraimentbeau It’sreallyniceTheweatherisbeau:fulItisreallybeau:fuloutsideHemakestrulybeau:fulItfactactuallyhandsome

Page 16: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Whattechniquesdoweuse?(tocombinedata,knowledge,linguis:cs,etc.)

Page 17: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

“AIwinter”rule-based,expertsystems

Unsup:topicmodels,grammarinduc:on

Collinsvs.Charniakparsers

Abriefhistoryof(modern)NLP

1980 1990 2000 2010 2020

earlieststatMTworkatIBM

Penntreebank

NP VP

S

Ratnaparkhitagger

NNP VBZ

Sup:SVMs,CRFs,NER,Sen:ment

Semi-sup,structuredpredic:on

Neural

Page 18: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Wherearewe?

‣ NLPconsistsof:analyzingandbuildingrepresenta:onsfortext,solvingproblemsinvolvingtext

‣ Theseproblemsarehardbecauselanguageisambiguous,requiresdrawingondata,knowledge,andlinguis:cstosolve

‣ Knowingwhichtechniquesuserequiresunderstandingdatasetsize,problemcomplexity,andalotoftricks!

‣ NLPencompassesallofthesethings

Page 19: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

NLPvs.Computa:onalLinguis:cs

‣ NLP:buildsystemsthatdealwithlanguagedata

‣ CL:usecomputa:onaltoolstostudylanguage

Hamiltonetal.(2016)

Page 20: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

OutlineoftheCourse‣ Classifica:on:linearandneural,wordrepresenta:ons(3.5weeks)‣ Textanalysis:taggingandparsing(3weeks)<=takesustothemidterm

‣ Genera:on,applica:ons:languagemodeling,machinetransla:on(3weeks)

‣ Ques:onanswering,pre-training(2weeks)

‣ CoverfundamentaltechniquesusedinNLP

‣ CovermodernNLPproblemsencounteredintheliterature:whataretheac:veresearchtopicsin2020?

‣ Understandhowtolookatlanguagedataandapproachlinguis:cphenomena

‣ Goals:

‣ Applica:onsandmiscellaneous(2.5weeks)

Page 21: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

OutlineoftheCourse‣ Throughoutthecourse:ethicsandfairness‣ BroadertopicinMLthanjustNLP

‣ Howcanwemakesureoursystemsbenefitsociety,andeveryoneinit?

‣ Partsoflecturesdevotedtotopicsinethics,comprehensivediscussiononthelastclassday

‣ Nov3:op:onallecture

‣ Balancealgorithms,linguis:cs,data,ethics

Page 22: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Coursework

‣ Fiveassignments,worth45%ofgrade(A1-4:10%,A5:5%)

‣ Mixofwri:ngandimplementa:on;

‣ Assignment0isoutnow,op:onaldiagnos:c

‣ ~2weeksperassignmentexceptforA5

‣ 5“slipdays”throughoutthesemestertoturninassignments24hourslate.Otherwise,youlose15%creditperdaytheassignmentislate

‣ SubmissiononGradescope

Theseassignmentsrequireunderstandingtheconcepts,wri:ngperformantcode,andthinkingabouthowtodebugcomplexsystems.Theyarechallenging;startearly!

Thecoursestaffarenotheretodebugyourcode!Wewillhelpyouunderstandtheconceptsfromlectureandcomeupwithdebuggingstrategies

Page 23: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Coursework

‣ Finalproject(25%ofgrade)‣ Groupsof1or2‣ Standardproject:neuralnetworkmodelsforques:onanswering

‣ Independentprojectsarepossible:thesemustbeproposedearlier(togetyouthinkingearly)andwillbeheldtoahighstandard!

‣ Midterm(20%ofgrade),take-homeOctober14-16

‣ Similartowri=enhomeworkproblems

‣ In-classproblems(10%ofthegrade)

‣ ThesewillbedoneviaUTInstapoll.Youdon’thavetocometoclasstodothem

‣ Dropthelowest5

Page 24: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

AcademicHonesty

‣ Youmayworkingroups,butyourfinalwriteupandcodemustbeyourown

‣ Don’tsharecodewithothers!

Page 25: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

A climate conducive to learning and creating knowledge is the right of every person in our community. Bias, harassment and discrimination of any sort have no place here. If you notice an incident that causes concern, please contact the Campus Climate Response Team: diversity.utexas.edu/ccrt

The College of Natural Sciences is steadfastly committed to enriching and transformative educational and research experiences for every member of our community. Find more resources to support a diverse, equitable and welcoming community within Texas Science and share your experiences at cns.utexas.edu/diversity

Conduct

Page 26: CS378: Natural Language Processing Lecture 1: Introduc:on · Course Requirements ‣ CS 429 ‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience

Survey