Post on 17-Oct-2020

CS378: Natural Language Processing Lecture 1: Introduction

Greg Durrett

intro video

actual course

Administrivia

‣ Course website (including syllabus): http://www.cs.utexas.edu/~gdurrett/courses/fa2020/cs378.shtml

‣ Piazza: link on the course website

‣ My office hours: see course website

‣ TA: Tanya Goyal; Proctor: Shivang Singh. See website for OHs

‣ Lecture: Tuesdays and Thursdays 9:30am-10:45am

‣ All office hours start next week, but I will stay around after this class if you have questions

Course Requirements

‣ CS 429

‣ Recommended: CS 331, familiarity with probability and linear algebra, programming experience in Python

‣ Helpful: Exposure to AI and machine learning (e.g., CS 342/343/363)

Enrollment

‣ Assignment 0 is out now (optional):

‣ If this seems like it’ll be challenging for you, come and talk to me (this is smaller-scale than the other assignments, which are smaller-scale than the final project)

‣ If you are past 25 on the waitlist, you have a low chance of getting into the class, but we have to see how it progresses

Format and Accessibility

‣ Required equipment: device to make Zoom calls with, some way to do homework

‣ Lab machines available via SSH

‣ A GPU is not required to complete the assignments! Having a GPU or GCP credits could be helpful if you want to pursue an independent project

‣ Lectures will build in time for discussion, in-class exercises, and questions. Additional material is available as videos to watch either before or after lectures

‣ We’ll do plenty of discussion groups in class. Piazza is also available to find teammates

What’s the goal of NLP?

‣ Be able to solve problems that require deep understanding of text

‣ Example: dialogue systems

Siri, what’s your favorite kind of movie?

I like superhero movies!

What’s come out recently?

The Avengers

Machine Translation

中共中央政治局7月30日召开会议,会议分析研究当前经济形势,部署下半年经济工作。

The Political Bureau of the CPC Central Committee held a meeting on July 30 to analyze and study the current economic situation and plan economic work in the second half of the year.

People’s Daily, August 10, 2020

The Political Bureau of the CPC Central Committee / July 30 / hold a meeting

Translate

Question Answering

When was Abraham Lincoln born?

map to Birthday field

Name                 Birthday
Lincoln, Abraham     2/12/1809
Washington, George   2/22/1732
Adams, John          10/30/1735

February 12, 1809

How many visitor centers are there in Rocky Mountain National Park?

The park has a total of five visitor centers

five
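The Lincoln example above can be sketched as a lookup: recognize the question pattern, map it to the Birthday field, and format the stored value. This is only a toy illustration under stated assumptions; the function name `answer_birthday_question`, the single hard-coded question pattern, and the dictionary layout are hypothetical and not from the lecture.

```python
from typing import Dict, Optional, Tuple

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# (month, day, year) values copied from the Birthday column of the table.
BIRTHDAYS: Dict[str, Tuple[int, int, int]] = {
    "Abraham Lincoln": (2, 12, 1809),
    "George Washington": (2, 22, 1732),
    "John Adams": (10, 30, 1735),
}


def answer_birthday_question(question: str) -> Optional[str]:
    """Answer 'When was X born?' by mapping it to X's Birthday field."""
    prefix, suffix = "When was ", " born?"
    # Hypothetical matcher: only this one question pattern is handled.
    if not (question.startswith(prefix) and question.endswith(suffix)):
        return None
    name = question[len(prefix):-len(suffix)]
    entry = BIRTHDAYS.get(name)
    if entry is None:
        return None
    month, day, year = entry
    return f"{MONTHS[month - 1]} {day}, {year}"


print(answer_birthday_question("When was Abraham Lincoln born?"))
```

A real system has to handle paraphrases ("What is Lincoln's date of birth?") and questions answered from free text, such as the visitor-center example, which is where the string matching used here breaks down.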

NLP Analysis Pipeline

Text → Text Analysis → Annotations → Applications

Annotations: Syntactic parses, Coreference resolution, Entity disambiguation, Discourse analysis

Applications: Summarize, Extract information, Answer questions, Identify sentiment, Translate

‣ NLP is about building these pieces! (largely using statistical approaches)

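How do we represent language?

Text → Labels, Sequences/tags, Trees

Labels:
the movie was good → +
Beyoncé had one of the best videos of all time → subjective

Sequences/tags:
Tom Cruise stars in the new Mission Impossible film (PERSON: Tom Cruise; WORK_OF_ART: Mission Impossible)

Trees:
I eat cake with icing (parse tree; node labels shown: S, NP, VP, PP, NP, VBZ, NN)
flights to Miami → λx. flight(x) ∧ dest(x) = Miami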
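As a rough sketch, the representations above (labels, tagged spans, trees, and a logical form) could be held in ordinary Python data structures. Everything here is illustrative: the `Node` class, the simplified node labels `V` and `P`, the particular attachment chosen for "with icing", and the dictionary encoding of a flight are assumptions, not the course's data formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Labels: one label for the whole text.
labeled_texts = [
    ("the movie was good", "+"),
    ("Beyoncé had one of the best videos of all time", "subjective"),
]

# Sequences/tags: (start, end, tag) spans over tokens; end is exclusive.
tokens = "Tom Cruise stars in the new Mission Impossible film".split()
spans = [(0, 2, "PERSON"), (6, 8, "WORK_OF_ART")]


# Trees: a minimal constituency-tree node (hypothetical class).
@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def leaves(self) -> List[str]:
        """Return the words under this node, left to right."""
        if not self.children:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


# One plausible parse: "with icing" attaches to "cake", not to "eat".
tree = Node("S", [
    Node("NP", word="I"),
    Node("VP", [
        Node("V", word="eat"),
        Node("NP", [
            Node("NP", word="cake"),
            Node("PP", [Node("P", word="with"), Node("NP", word="icing")]),
        ]),
    ]),
])


# Logical form for "flights to Miami": λx. flight(x) ∧ dest(x) = Miami
def flights_to_miami(x: Dict[str, str]) -> bool:
    return x.get("kind") == "flight" and x.get("dest") == "Miami"


print([" ".join(tokens[start:end]) for start, end, _ in spans])
print(tree.leaves())
```

The tree makes the attachment decision explicit: moving the `PP` node up to be a child of `VP` would encode the other reading, in which the eating is done with icing.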

How do we use these representations?

Text → Text Analysis → Labels / Sequences / Trees → Applications

‣ Main question: What representations do we need for language? What do we want to know about it? What ambiguities do we need to resolve?

Learn tree-to-tree machine translation models

end-to-end models

Why is language hard? (and how can we handle that?)

Language is Ambiguous!

‣ Hector Levesque (2011): “Winograd schema challenge” (named after Terry Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

‣ >5 datasets in the last two years examining this problem and commonsense reasoning

‣ Referential ambiguity

The city council refused the demonstrators a permit because they advocated violence

The city council refused the demonstrators a permit because they feared violence

Language is Ambiguous!

example credit: Dan Klein

‣ Syntactic and semantic ambiguities: parsing needed to resolve these, but need context to figure out which parse is correct

Teacher Strikes Idle Kids

Ban on Nude Dancing on Governor’s Desk

Iraqi Head Seeks Arms

Language is Really Ambiguous!

‣ There aren’t just one or two possibilities which are resolved pragmatically

‣ Combinatorially many possibilities, many you won’t even register as ambiguities, but systems still have to resolve them

il fait vraiment beau → It is really nice out

Candidate translations:
It’s really nice
The weather is beautiful
It is really beautiful outside
He makes truly beautiful
It fact actually handsome
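To get a feel for "combinatorially many", here is a minimal sketch that enumerates word-by-word translations of the French sentence. The per-word candidate lists are an assumption read off the candidate translations above, not a real bilingual lexicon, and a real translation system would not translate word by word.

```python
from itertools import product

# Hypothetical per-word candidates, inferred from the example translations.
candidates = {
    "il": ["it", "he"],
    "fait": ["is", "makes", "fact"],
    "vraiment": ["really", "truly", "actually"],
    "beau": ["nice", "beautiful", "handsome"],
}

source = "il fait vraiment beau".split()

# Every way of picking one candidate per source word.
hypotheses = [
    " ".join(words)
    for words in product(*(candidates[word] for word in source))
]

print(len(hypotheses))  # 2 * 3 * 3 * 3 = 54
```

Even with four words and at most three options each, there are 54 hypotheses, including "he makes truly beautiful" and "it fact actually handsome". The count multiplies with every added word, before reordering or insertions such as "out" are even considered.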

What techniques do we use? (to combine data, knowledge, linguistics, etc.)

A brief history of (modern) NLP

Timeline (axis: 1980, 1990, 2000, 2010, 2020):

‣ “AI winter”: rule-based, expert systems

‣ Earliest stat MT work at IBM

‣ Penn treebank

‣ Ratnaparkhi tagger

‣ Collins vs. Charniak parsers

‣ Unsup: topic models, grammar induction

‣ Sup: SVMs, CRFs, NER, Sentiment

‣ Semi-sup, structured prediction

‣ Neural

Where are we?

‣ NLP consists of: analyzing and building representations for text, solving problems involving text

‣ These problems are hard because language is ambiguous, requires drawing on data, knowledge, and linguistics to solve

‣ Knowing which techniques to use requires understanding dataset size, problem complexity, and a lot of tricks!

‣ NLP encompasses all of these things

NLP vs. Computational Linguistics

‣ NLP: build systems that deal with language data

‣ CL: use computational tools to study language

Hamilton et al. (2016)

Outline of the Course

‣ Classification: linear and neural, word representations (3.5 weeks)

‣ Text analysis: tagging and parsing (3 weeks) <= takes us to the midterm

‣ Generation, applications: language modeling, machine translation (3 weeks)

‣ Question answering, pre-training (2 weeks)

‣ Applications and miscellaneous (2.5 weeks)

‣ Goals:

‣ Cover fundamental techniques used in NLP

‣ Cover modern NLP problems encountered in the literature: what are the active research topics in 2020?

‣ Understand how to look at language data and approach linguistic phenomena

Outline of the Course

‣ Throughout the course: ethics and fairness

‣ Broader topic in ML than just NLP

‣ How can we make sure our systems benefit society, and everyone in it?

‣ Parts of lectures devoted to topics in ethics, comprehensive discussion on the last class day

‣ Nov 3: optional lecture

‣ Balance algorithms, linguistics, data, ethics

Coursework

‣ Five assignments, worth 45% of grade (A1-4: 10% each, A5: 5%)

‣ Mix of writing and implementation

‣ Assignment 0 is out now, optional diagnostic

‣ ~2 weeks per assignment except for A5

‣ 5 “slip days” throughout the semester to turn in assignments 24 hours late. Otherwise, you lose 15% credit per day the assignment is late

‣ Submission on Gradescope

These assignments require understanding the concepts, writing performant code, and thinking about how to debug complex systems. They are challenging; start early!

The course staff are not here to debug your code! We will help you understand the concepts from lecture and come up with debugging strategies

Coursework

‣ Final project (25% of grade)

‣ Groups of 1 or 2

‣ Standard project: neural network models for question answering

‣ Independent projects are possible: these must be proposed earlier (to get you thinking early) and will be held to a high standard!

‣ Midterm (20% of grade), take-home October 14-16

‣ Similar to written homework problems

‣ In-class problems (10% of the grade)

‣ These will be done via UT Instapoll. You don’t have to come to class to do them

‣ Drop the lowest 5
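As a quick check on the grade breakdown, the weights listed on the two Coursework slides sum to 100%. The sketch below combines hypothetical component scores (each on a 0-100 scale) into a weighted total. The function name and the example scores are made up for illustration, and slip days, late penalties, and dropped in-class problems are not modeled.

```python
# Weights in percent, as listed on the Coursework slides.
WEIGHTS = {
    "A1": 10, "A2": 10, "A3": 10, "A4": 10, "A5": 5,  # assignments: 45%
    "final_project": 25,
    "midterm": 20,
    "in_class": 10,
}
assert sum(WEIGHTS.values()) == 100


def course_grade(scores):
    """Weighted average of per-component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "A1": 90, "A2": 90, "A3": 90, "A4": 90, "A5": 90,
    "final_project": 80,
    "midterm": 70,
    "in_class": 100,
}
print(course_grade(example_scores))  # 84.5
```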

Academic Honesty

‣ You may work in groups, but your final writeup and code must be your own

‣ Don’t share code with others!

A climate conducive to learning and creating knowledge is the right of every person in our community. Bias, harassment and discrimination of any sort have no place here. If you notice an incident that causes concern, please contact the Campus Climate Response Team: diversity.utexas.edu/ccrt

The College of Natural Sciences is steadfastly committed to enriching and transformative educational and research experiences for every member of our community. Find more resources to support a diverse, equitable and welcoming community within Texas Science and share your experiences at cns.utexas.edu/diversity

Conduct

Survey
