Natural language processing - basics / non-technical
TRANSCRIPT
Welcome!
Why NLP?
- We have to adapt to how the computer wants its data, and we still adapt to the way the computer gives information back.
- NLP helps computers understand one of the most powerful human interfaces: language.
- Apple Siri and Google Now are cutting-edge examples of how NLP helps computers fit humans.
- More details: http://www.slideshare.net/yourfrienddhruv/apps-with-ears-and-eyes
Google Now vs. Siri vs. Cortana
https://www.stonetemple.com/great-knowledge-box-showdown/
Cutting edge NLP!
https://news.ycombinator.com/item?id=8426148
https://news.ycombinator.com/item?id=8428007
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/
Cutting edge NLP!
https://news.ycombinator.com/item?id=8428418
AI Websites That Design Themselves
thegrid.io
NLP in today's session
In this session we will focus more on how we
can deal with written language in software
products.
NLP for text analysis
- Knowledge is a fundamental requirement for any problem solving.
- An intelligent decision-making system needs three major things:
  - A) Lots of relevant knowledge
  - B) A way to represent that knowledge in terms of the current problem/question at hand
  - C) A way to represent the answer in human language
General Architecture of NLP systems
- Basic systems
  - Tokenization -> [lemmatization] -> tagging -> chunking -> domain mapping
  - NLP systems require pre-created, domain-specific corpora (dictionary + rule set handcrafted by humans)
  - Details: http://www.nltk.org/book/ch05.html
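The basic pipeline can be sketched in a few lines of plain Python. This is a toy illustration, not how NLTK or any other toolkit implements it: the lexicon, the tag names and the function names are all made up for the example, and the final domain-mapping step is left out.

```python
import re

# Hand-crafted toy lexicon (hypothetical): word -> (lemma, part-of-speech tag).
# This plays the role of the "dictionary + rule set handcrafted by humans".
LEXICON = {
    "the": ("the", "DT"),
    "a": ("a", "DT"),
    "dogs": ("dog", "NN"),
    "dog": ("dog", "NN"),
    "cat": ("cat", "NN"),
    "small": ("small", "JJ"),
    "chased": ("chase", "VB"),
    "barked": ("bark", "VB"),
}


def tokenize(text):
    """Tokenization: split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize_and_tag(tokens):
    """Lemmatization + tagging: look up each token; unknown words default to noun."""
    return [LEXICON.get(tok, (tok, "NN")) for tok in tokens]


def chunk_noun_phrases(tagged):
    """Chunking: group runs of determiner/adjective/noun tags into noun phrases."""
    chunks, current = [], []
    for lemma, tag in tagged:
        if tag in {"DT", "JJ", "NN"}:
            current.append(lemma)
        else:
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def run_pipeline(text):
    """Run tokenization -> lemmatization/tagging -> chunking on one sentence."""
    return chunk_noun_phrases(lemmatize_and_tag(tokenize(text)))


print(run_pipeline("The small dogs chased a cat"))
```

Real systems replace the tiny lexicon with trained taggers and large corpora, but the flow of data from raw text to structured chunks is the same.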
General Architecture of NLP systems
- Advanced systems
  - http://nlp.stanford.edu/software/patternslearning.shtml
Relationship to Machine Learning
- NLP: algorithms and tooling aim to convert text/data into values.
- ML: algorithms and tooling aim to consume values and produce meaningful values/vectors.
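A minimal sketch of that split, assuming a simple bag-of-words representation (one common choice among many): the NLP side turns text into count vectors, and the ML-style side consumes those vectors to produce a similarity value. All function names are illustrative.

```python
import math
from collections import Counter


def build_vocabulary(documents):
    """Collect the sorted set of words seen across all documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})


def vectorize(document, vocabulary):
    """NLP step: turn text into a vector of word counts (bag of words)."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """ML-style step: consume two vectors and produce a similarity value."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


documents = ["good phone", "good camera phone", "bad battery"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]
print(vocabulary)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```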
Few popular NLP toolkits
- Python
  - http://www.nltk.org
  - http://scikit-learn.org/
  - https://textblob.readthedocs.org
- Java
  - http://nlp.stanford.edu/software/index.shtml
  - https://gate.ac.uk/overview.html
  - https://opennlp.apache.org/
- R
  - http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
Interesting applications
- Covered in this session
  - 1) Information summarization
  - 2) Information extraction
  - 3) Sentiment analysis
  - 4) Dialog-based systems
1) Information summarization
- Creates a summary of a large text.
  - http://summly.com/
- You can create a highly personalized summary of the same content per user.
  - http://automatedinsights.com/wordsmith/
- The race is on between 'plagiarism detection' and 'automatic paraphrasing'.
  - http://copyscape.com/
  - https://oaps.eu/project/overview/
  - http://plagcontrol.com
- Handy code:
  - Python and related: https://github.com/miso-belica/sumy
  - Java/Scala: https://github.com/MojoJolo/textteaser
- Basics:
  - Have to pick most interesting sentences with highest
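A minimal sketch of sentence picking, under a loudly stated assumption: "most interesting" is taken here to mean the highest average frequency of content words. The slide does not spell out its criterion, and this is not how sumy or textteaser score sentences. The stop-word list and function names are illustrative.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it", "on"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words of the sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences whose content words are most frequent in the whole text."""
    sentences = split_sentences(text)
    frequencies = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Return the selected sentences in their original order.
    return [s for s in sentences if s in top]


text = "NLP helps computers read text. NLP systems turn text into values. Cats sleep."
print(summarize(text, max_sentences=2))
```

This is extractive summarization: it only selects existing sentences and never writes new ones.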
2) Information extraction
- Named Entity Recognition (NER)
  - Common entity types include ORGANIZATION, PERSON, LOCATION, DATE, TIME, MONEY, and GPE (geo-political entity).
- Relationship extraction
  - Mainly between named entities
  - http://www.cruxbot.com/
- Handy code:
  - http://www.nltk.org/book/ch07.html
- Basics:
  - Find interesting pairs of words, and note the adjoining words to learn the relationship between them.
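The "pair of words plus adjoining words" idea can be sketched as follows. This is a toy: the entities come from a small hypothetical lookup table (a gazetteer) rather than a trained recognizer, and the words between two neighbouring entities are simply taken as the relationship.

```python
import re

# Hypothetical gazetteer mapping known entity names to entity types.
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "Apple": "ORGANIZATION",
    "Cupertino": "LOCATION",
}


def find_entities(sentence):
    """Return (start, end, name, type) for every gazetteer hit, left to right."""
    hits = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
            hits.append((match.start(), match.end(), name, entity_type))
    return sorted(hits)


def extract_relations(sentence):
    """Pair neighbouring entities and keep the words between them as the relation."""
    entities = find_entities(sentence)
    relations = []
    for left, right in zip(entities, entities[1:]):
        between = sentence[left[1]:right[0]].strip(" ,")
        if between:
            relations.append((left[2], between, right[2]))
    return relations


print(extract_relations("Steve Jobs founded Apple in Cupertino."))
```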
2.1) Information Retrieval
- Large text needs to be searched by keywords.
- Traditional RDBMS indexing doesn't work.
- Use full-text search toolkits, which are good practical examples of NLP implementation.
- Handy code:
  - Solr: Java
  - PostgreSQL: DB
  - http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
- Basics:
  - While storing large text, remove words that add no value (stop words, e.g. articles and prepositions) and index only the roots of words.
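A minimal sketch of that indexing idea: drop stop words, reduce the remaining words to rough roots, and store them in an inverted index. The stop-word list and the suffix-stripping rule are deliberately naive assumptions made for the example; Solr and PostgreSQL use far more careful analyzers.

```python
import re
from collections import defaultdict

# Illustrative stop-word list; real engines ship much longer ones.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "for"}


def stem(word):
    """Very naive suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, drop stop words, and reduce words to rough roots."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Map each root term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = normalize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


documents = {
    1: "Searching the indexed documents",
    2: "The cat indexes a document",
    3: "Cooking for cats",
}
index = build_index(documents)
print(search(index, "index documents"))
print(search(index, "cats"))
```

Because the query goes through the same normalization as the stored text, "cats" finds documents that mention "cat", which a plain exact-match index on the raw text would miss.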
3) Sentiment Analysis
- To understand the overall meaning/tone of text.
  - e.g. Neutral vs. Polar, Positive vs. Negative.
- Demo
  - http://text-processing.com/demo/sentiment/
  - http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
- Use:
  - Finding whether a Twitter trend is positive or negative
  - Finding whether the overall review of a product is positive or negative
- Basics:
  - Have to pick the most interesting phrases and correlate their meaning.
  - Correlate/group things with similar meaning.
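A minimal lexicon-based sketch of the idea: pick out the words that carry polarity and combine their scores. The word scores and the one-word negation rule are assumptions made up for this example; the demos linked above use trained models instead.

```python
import re

# Tiny hand-made polarity lexicon (hypothetical scores).
POLARITY = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum word polarities, flipping the sign of the word right after a negation."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        value = POLARITY.get(token, 0)
        score += -value if negate else value
        negate = False
    return score


def classify(text):
    """Map the numeric score to positive / negative / neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["I love this great phone", "The battery is not good", "It arrived on Monday"]:
    print(review, "->", classify(review))
```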
4) Dialog based systems
- Understand input given in natural language.
  - Google search, Siri, Google Now
- Building interactive chat bots to handle customer support.
- Details: http://www.nltk.org/book/ch10.html
- Handy code:
  - We can convert a question to a SQL query!
- Basics:
  - Have English grammar mapped to another grammar for input parsing, and vice versa.
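A minimal sketch of mapping English questions to SQL, assuming a handful of fixed question patterns rather than a full grammar (the NLTK chapter linked above uses feature grammars instead). The table, its columns and the sample rows are hypothetical, and the population numbers are placeholders, not real statistics.

```python
import re
import sqlite3

# Each rule maps an English question pattern to a parameterized SQL template.
# Table and column names (cities, name, country, population) are hypothetical.
RULES = [
    (re.compile(r"^which cities are in (\w+)\??$", re.I),
     "SELECT name FROM cities WHERE country = ? ORDER BY name"),
    (re.compile(r"^what is the population of (\w+)\??$", re.I),
     "SELECT population FROM cities WHERE name = ?"),
]


def question_to_sql(question):
    """Return (sql, parameters) for the first matching rule, else None."""
    for pattern, sql in RULES:
        match = pattern.match(question.strip())
        if match:
            return sql, match.groups()
    return None


def answer(connection, question):
    """Translate the question, run the query, and return the first column."""
    translated = question_to_sql(question)
    if translated is None:
        return None
    sql, params = translated
    return [row[0] for row in connection.execute(sql, params).fetchall()]


# In-memory database with placeholder rows (numbers are illustrative only).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
connection.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [("Athens", "Greece", 3000), ("Patras", "Greece", 200), ("Lyon", "France", 500)],
)

print(answer(connection, "Which cities are in Greece?"))
print(answer(connection, "What is the population of Lyon?"))
```

The captured words are passed as query parameters rather than pasted into the SQL string, so user input cannot change the structure of the query.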
Development & Testing/Verifying of NLP systems
- 1) Understand Gold Set, Training Set, Test Set
- 2) Seen vs. Unseen Data
- 3) Accuracy: Precision & Recall
- 4) Confusion Matrices
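Precision, recall and the confusion matrix can be computed directly from gold labels and system predictions. The sketch below uses the standard definitions (precision = true positives / all predicted positives, recall = true positives / all gold positives); the label names and sample data are made up for illustration.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each (gold_label, predicted_label) pair occurs."""
    return Counter(zip(gold, predicted))


def precision_recall(gold, predicted, label):
    """Precision and recall for one label, treating it as the positive class."""
    matrix = confusion_matrix(gold, predicted)
    true_pos = matrix[(label, label)]
    false_pos = sum(n for (g, p), n in matrix.items() if p == label and g != label)
    false_neg = sum(n for (g, p), n in matrix.items() if g == label and p != label)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Gold set labels vs. what a hypothetical sentiment system predicted.
gold = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]

print(confusion_matrix(gold, predicted))
print(precision_recall(gold, predicted, "pos"))
```

Scores are only meaningful when measured on a test set the system has not seen during development.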
Session Summary
- 1) NLP + ML capabilities are the foundation for intelligent systems working with / on consumer data.
- 2) Domain knowledge is the key differentiator and a MAJOR cost factor.
- 3) NLP system development requires a different mindset, as it is not creation but evolution of a software system.
- 4) Lots and lots of academic / research reading is a must.
What Next? Q&A? Are you sure?
- I have an idea which might require NLP
  - Go reach out to more people: @nikunjness, @yourfrienddhruv
- I want to know how to develop such systems
- I think I want to research more possibilities!
  - Read this: http://www.nltk.org/book/ch01.html
  - Yes, it's Python.
- I think it's too complex.
  - You are not alone.