natural language processing verbatim text coding and data mining report generation josef s.w. leung...
TRANSCRIPT
![Page 1: Natural Language Processing Verbatim Text Coding and Data Mining Report Generation Josef S.W. Leung (j.leung@ieee.org) Ching-Long Yeh (chingyeh@cse.ttit.edu.tw)](https://reader035.vdocuments.site/reader035/viewer/2022062321/56649e155503460f94aff2ce/html5/thumbnails/1.jpg)
Natural Language Processing
Verbatim Text Coding and Data Mining Report Generation
Josef S.W. Leung (j.leung@ieee.org)
Ching-Long Yeh (chingyeh@cse.ttit.edu.tw)
NLP: one of the top-priority funding items in computer science research (National Natural Science Foundation, China)
Language
• Listen (Understand)
• Speak (Generate)
Natural Language Processing
• Analysis/Understanding: natural language to internal representations
• Generation: internal representations to natural language
Outline of Presentation
• NLP Introduction
– Natural Language Analysis/Understanding
– Natural Language Generation
• Case 1: Verbatim Text Coding
– May need NL analysis techniques
• Case 2: Data Mining Report Generation
– May need NL generation techniques
Modules of NL Understanding
• Input sentence
• Pre-processing → Tokens
• Parsing → Syntactic structure
• Semantic Interpretation → Semantic representation
• Contextual Interpretation → Knowledge representation
Parsing for Syntactic Analysis
Grammar Rules:
S → NP + VP
NP → ART + N
VP → V + NP
Lexicon:
N → dog
N → cat
V → chased
ART → the
Syntactic Structure
[S [NP [ART the] [N dog]] [VP [V chased] [NP [ART the] [N cat]]]]
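The grammar and lexicon above are small enough to trace by hand. As a minimal sketch (the function names and the tuple-based tree format are illustrative assumptions, not part of the original slides), a recursive-descent parser in Python can recover the same structure for "the dog chased the cat":

```python
# Toy lexicon from the slides; the parser itself is only an illustrative sketch.
LEXICON = {"the": "ART", "dog": "N", "cat": "N", "chased": "V"}


def parse_np(tokens, i):
    """NP -> ART N. Return (tree, next_index), or None if the rule does not apply."""
    if (i + 1 < len(tokens)
            and LEXICON.get(tokens[i]) == "ART"
            and LEXICON.get(tokens[i + 1]) == "N"):
        return ("NP", ("ART", tokens[i]), ("N", tokens[i + 1])), i + 2
    return None


def parse_vp(tokens, i):
    """VP -> V NP. Return (tree, next_index), or None if the rule does not apply."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "V":
        result = parse_np(tokens, i + 1)
        if result is not None:
            np_tree, j = result
            return ("VP", ("V", tokens[i]), np_tree), j
    return None


def parse_sentence(text):
    """S -> NP VP. Return the parse tree, or None if the sentence is rejected."""
    tokens = text.lower().split()
    result = parse_np(tokens, 0)
    if result is None:
        return None
    np_tree, i = result
    result = parse_vp(tokens, i)
    if result is None:
        return None
    vp_tree, j = result
    # Every token must be consumed for the sentence to be accepted.
    return ("S", np_tree, vp_tree) if j == len(tokens) else None


print(parse_sentence("the dog chased the cat"))
```

Because each rule has exactly one expansion, this grammar never needs backtracking; a realistic grammar would.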
Structural Ambiguity
• Time flies like an arrow.
– The passage of time is as quick as an arrow.
– A species of flies called ‘time flies’ enjoy an arrow.
Structural Ambiguity
• The man saw the girl with the telescope.
– The man saw the girl who possessed the telescope.
– The man saw the girl with the aid of the telescope.
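The two readings correspond to two different attachment sites for the prepositional phrase. The sketch below counts parses with a CKY-style chart; the grammar rules (including the two PP-attachment rules) and the lexicon are assumptions chosen for illustration and do not appear on the slides.

```python
from collections import defaultdict

# Hypothetical binary grammar: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("NP", "ART", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase ("the girl with the telescope")
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase ("saw ... with the telescope")
    ("PP", "P", "NP"),
]
LEXICON = {"the": "ART", "man": "N", "girl": "N", "telescope": "N",
           "saw": "V", "with": "P"}


def count_parses(sentence, root="S"):
    """Count the distinct parse trees for `sentence` under RULES."""
    words = sentence.lower().split()
    if any(w not in LEXICON for w in words):
        return 0
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, word in enumerate(words):
        chart[(i, i + 1, LEXICON[word])] = 1
    # Fill longer spans from shorter ones.
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, root)]


print(count_parses("the man saw the girl with the telescope"))
```

A parser can enumerate both structures, but choosing between them needs semantic or contextual knowledge.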
Natural Language Generation
• Input: User’s Goal
• Strategic Component: Text Planning
• Tactical Component: Linguistic Realization
• Knowledge sources: Domain KB, Planning Operators, User Model, Discourse Model, Linguistic Rules & Lexicon
• Output: Surface Sentences
Unification Grammar
the man sees a sheep
S [numb=X, tense=T] → NP [numb=X] VP [numb=X, tense=T]
VP [numb=N, tense=M] → V [numb=N, tense=M] NP
NP [numb=Y] → det [numb=Y] noun [numb=Y]
man : noun [numb = sing]
a : det [numb = sing]
the : det
sheep : noun
sees : [tense = pres, numb = sing]
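The shared variables in the rules above (X, Y, N, M) force the features of sibling constituents to agree. The following is a rough sketch of that idea using flat Python dictionaries as feature structures. The helper names are hypothetical, treating "sees" as a verb is an assumption (the slide gives it no category), and the plural entry "men" is added here only to show a clash.

```python
LEXICON = {
    "man":   {"cat": "noun", "numb": "sing"},
    "a":     {"cat": "det", "numb": "sing"},
    "the":   {"cat": "det"},                      # number left unspecified
    "sheep": {"cat": "noun"},                     # number left unspecified
    "sees":  {"cat": "verb", "tense": "pres", "numb": "sing"},  # category assumed
    "men":   {"cat": "noun", "numb": "plur"},     # not on the slide; added for the demo
}


def unify(a, b):
    """Merge two flat feature dicts; return None if any shared feature clashes."""
    merged = dict(a)
    for feature, value in b.items():
        if feature in merged and merged[feature] != value:
            return None
        merged[feature] = value
    return merged


def features(word):
    """Agreement features of a word, without its syntactic category."""
    return {k: v for k, v in LEXICON[word].items() if k != "cat"}


def noun_phrase(det, noun):
    """NP[numb=Y] -> det[numb=Y] noun[numb=Y]."""
    if LEXICON[det]["cat"] != "det" or LEXICON[noun]["cat"] != "noun":
        return None
    return unify(features(det), features(noun))


def sentence(subj_det, subj_noun, verb, obj_det, obj_noun):
    """S[numb=X, tense=T] -> NP[numb=X] VP[numb=X, tense=T], with VP -> V NP."""
    subject = noun_phrase(subj_det, subj_noun)
    obj = noun_phrase(obj_det, obj_noun)
    if subject is None or obj is None or LEXICON[verb]["cat"] != "verb":
        return None
    # The subject's number must unify with the verb's number.
    return unify(subject, features(verb))


print(sentence("the", "man", "sees", "a", "sheep"))
print(sentence("the", "men", "sees", "a", "sheep"))
```

A full unification grammar works on nested feature structures with true logic variables; flat dictionaries are enough to show agreement checking.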
Migraine abortive treatment is used to abort migraine.
((cat clause)
 (process ((lex "use") (type material)))
 (partic ((affected ((cat proper)
                     (lex "migraine abortive treatment")))
          (agent none)))
 (circum ((purpose ((cat clause)
                    (keep-in-order no) (keep-for no)
                    (position end)
                    (process ((lex "abort")
                              (effect-type creative)
                              (type material)))
                    (partic ((created ((lex "migraine")
                                       (countable no)
                                       (cat common))))))))))
Verbatim Text Coding
• A text content classification problem.
• Group semantically similar answer items.
• Develop a code list/tree to represent the answer item groups.
• Simple NL analysis techniques may help.
• Details will be given in the first example of NLP application.
Data Mining Report Generation
• Data mining results are usually in rule or tree formats with obscure notations.
• NL generation techniques may help translate the data mining results into plain natural language.
• Details will be given in the second example of NLP application.
Codia for Verbatim Text Coding
Answer Items, Code Tree
• Small screen/window/text
• Long list of answer items
• Difficult to browse/view
• Worse than paper form
Codia for Verbatim Text Coding
Key Terms
Ranking Answers by Similarity
Items with similar meaning
Text Similarity Measures
• String
• Semantics
• Coverage
Text Similarity Score
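The slides name three measures but do not define them or say how they are combined. The sketch below shows one possible reading, with every detail an assumption: string similarity from Python's `difflib`, semantic similarity as overlap of thesaurus categories, coverage as the share of query words found in the candidate, and an equally weighted sum as the final score. The tiny thesaurus is invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical mini-thesaurus: word -> semantic category (not from the slides).
THESAURUS = {
    "cheap": "PRICE", "inexpensive": "PRICE", "affordable": "PRICE",
    "tasty": "TASTE", "delicious": "TASTE",
}


def tokens(text):
    return text.lower().split()


def string_score(a, b):
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_score(a, b):
    """Jaccard overlap of the thesaurus categories found in each text."""
    cats_a = {THESAURUS[w] for w in tokens(a) if w in THESAURUS}
    cats_b = {THESAURUS[w] for w in tokens(b) if w in THESAURUS}
    if not cats_a or not cats_b:
        return 0.0
    return len(cats_a & cats_b) / len(cats_a | cats_b)


def coverage_score(a, b):
    """Fraction of the words in `a` that also occur in `b`."""
    words_a, words_b = tokens(a), set(tokens(b))
    if not words_a:
        return 0.0
    return sum(w in words_b for w in words_a) / len(words_a)


def similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three component scores (equal weights are an assumption)."""
    scores = (string_score(a, b), semantic_score(a, b), coverage_score(a, b))
    return sum(w * s for w, s in zip(weights, scores))


def rank_by_similarity(query, answers):
    """Order answer items from most to least similar to the query item."""
    return sorted(answers, key=lambda answer: similarity(query, answer), reverse=True)


answers = ["tastes delicious", "quite inexpensive", "very cheap"]
print(rank_by_similarity("very cheap", answers))
```

Here "quite inexpensive" ranks above "tastes delicious" only because the thesaurus places "cheap" and "inexpensive" in the same category, which illustrates why thesaurus coverage limits such a tool.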
Codia for Verbatim Text Coding
• A user interface for classifying answer items by drag-and-drop actions.
• NLP reduces time and effort in searching, browsing, and selecting multiple answer items for classification.
• There are still limitations, and the process is not fully automated.
Technical Issues of Codia
• Improve user interface.
• Use only simple NLP techniques.
• Ambiguity resolution by human.
• Limited by thesaurus.
• Still cannot handle negation (‘not’).
• Knowledge engineering is tedious.
Limitations and Future Improvements
Limitations:
• Thesaurus has only 60,000 terms classified into 3900 semantic categories.
• Manual operation (ambiguity resolution relies on human).
• Similarity measures are too mechanical.
Future improvements:
• Need to update and incorporate frequently used terms/categories.
• Towards automation by using more AI such as NLP, GA and NN.
• More adaptive by rule-based or case-based reasoning.
Data Mining and Knowledge Discovery
Knowledge Discovery: Data → (Data Mining) → Patterns → (Interpretation) → Knowledge
If q12 = 4 and
q31 = 6 and
q35 = 3
then q38 = 3
If h/h_income = 4 and
city = 6 and
car_owner = 3
then user = 3
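The two rule listings differ only in their variable labels: question codes such as q12 are replaced by descriptive names. A minimal sketch of that relabeling step follows; the mapping is read off the two rules as shown, while the rule data structure and function names are assumptions made for illustration.

```python
# Question code -> descriptive label, as implied by the two versions of the rule.
CODEBOOK = {"q12": "h/h_income", "q31": "city", "q35": "car_owner", "q38": "user"}

# A rule as (list of (variable, value) conditions, (variable, value) conclusion).
RULE = ([("q12", 4), ("q31", 6), ("q35", 3)], ("q38", 3))


def relabel(rule, codebook=CODEBOOK):
    """Replace question codes with descriptive names; unknown codes are kept as-is."""
    conditions, conclusion = rule

    def rename(pair):
        variable, value = pair
        return codebook.get(variable, variable), value

    return [rename(c) for c in conditions], rename(conclusion)


def format_rule(rule):
    """Render a rule in the 'If ... and ... then ...' notation used on the slides."""
    conditions, (target, value) = rule
    condition_text = " and ".join(f"{var} = {val}" for var, val in conditions)
    return f"If {condition_text} then {target} = {value}"


print(format_rule(RULE))
print(format_rule(relabel(RULE)))
```

Even after relabeling, the values (4, 6, 3) are still numeric codes, which is why a further generation step is needed to produce readable text.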
say(feature,[r1]).
The segment of respondents who are product X users is characterized by residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.
r1
say(feature, [r1]).
say(general,[r1]).
say(likely,[r1]).
say(reason,[r1]).
Basically, the respondents who are product X users have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.
r1
say(general, [r1]).
The respondents who are product X users because they have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income.
r1
say(reason, [r1]).
It is likely that the people who have residence in Shanghai, consumption of brand Y cigarettes, overseas travel in the past twelve months, ownership of imported cars, and high monthly household income are product X users.
r1
say(likely, [r1]).
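The four `say/2` queries render the same rule with different rhetorical framings. The original system appears to be written in a Prolog-style language and its internals are not shown, so the Python sketch below only imitates the visible behavior with fixed templates. The rule store, the template wording (modeled on the sample outputs, with the "reason" variant adjusted to form a complete sentence), and the helper names are all assumptions.

```python
# Hypothetical rule store: the phrases are copied from the sample output above.
RULES = {
    "r1": {
        "segment": "product X users",
        "features": [
            "residence in Shanghai",
            "consumption of brand Y cigarettes",
            "overseas travel in the past twelve months",
            "ownership of imported cars",
            "high monthly household income",
        ],
    }
}

# One sentence template per message type accepted by say().
TEMPLATES = {
    "feature": "The segment of respondents who are {segment} is characterized by {features}.",
    "general": "Basically, the respondents who are {segment} have {features}.",
    "reason": "The respondents are {segment} because they have {features}.",
    "likely": "It is likely that the people who have {features} are {segment}.",
}


def join_list(items):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def say(style, rule_ids):
    """Return one sentence per rule, mirroring queries such as say(feature, [r1])."""
    sentences = []
    for rule_id in rule_ids:
        rule = RULES[rule_id]
        sentences.append(
            TEMPLATES[style].format(
                segment=rule["segment"], features=join_list(rule["features"])
            )
        )
    return sentences


for style in ("feature", "general", "reason", "likely"):
    print(say(style, ["r1"])[0])
```

Templates like these produce exactly one sentence per rule and do no text planning across rules.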
Limitations and Future Improvements
Limitations:
• Pre-defined syntactic category of code labels.
• Single sentence for each rule.
• Lack visualization.
• Almost no text planning.
• English only.
• Lack knowledge of explanation.
Future improvements:
• Automatic recognition of the syntax.
• Describe rule relationship in multiple coherent sentences.
• Text + graphics or even multimedia generation.
• Implement text planning.
• Multilingual.
• Implement NL techniques for explanation.
Concluding Remarks
• NLP techniques are found useful in:
– Verbatim text coding and
– Data mining report generation.
• Group similar answer items.
• Write simple natural language text.
• A pricey technology because few tools are available.
Natural Language Processing
Josef Siu-Wai Leung (j.leung@ieee.org)
Ching-Long Yeh (chingyeh@cse.ttit.edu.tw)