Semantic Transforms Using Collaborative Knowledge Bases
Yegin Genc, Winter Mason, Jeffrey V. Nickerson
Stevens Institute of Technology
Presented at WIN2012
Overview
• Automatically understand online information
• Use network artifacts, such as Wikipedia, to help
Topic Models
Algorithms to understand and organize documents by uncovering the semantic structure of a document collection
• Discover hidden themes – patterns of word use
• Connect documents that exhibit similar patterns
Latent Dirichlet Allocation (LDA)
“In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.” 1
1 http://en.wikipedia.org/wiki/Genetic_algorithm
• Algorithms – 0.28, Optimization – 0.28, Algorithm – 0.14, Computer – 0.14, Techniques – 0.14, …
• Genetic – 0.18, Natural – 0.18, Evolution – 0.18, Evolutionary – 0.09, …
Topics from LDA
Five topics from a 50-topic LDA model fit to Science from 1980 – 2002 (Blei and Lafferty, 2009):
• computer, methods, number, two, principle, design
• chemistry, synthesis, oxidation, reaction, product, organic
• cortex, stimulus, fig, vision, neuron, recordings
• orbit, dust, jupiter, line, system, solar
• infection, immune, aids, infected, viral, cells
Ten randomly chosen topics from a 50-topic LDA model fit to abstracts from the Journal of the ACM (JACM) from the years 1987 to 2004 (Blei et al., 2010):
methods k of the for the the operations the
the the objects of the o and the of
a of to a linear we of functional a
of algorithm and to problem and to requires is
problems for the we problems a that and in
The interpretation problem
1. Labeling the topics is difficult (J. Chang et al., 2009)
2. The relationships between topics are not identified
3. The information in the topics is based solely on the input corpus
4. The external validity of the topics may be limited
Collaborative Knowledge Bases
1. Labeled topics
2. Connected to each other in a meaningful way
3. Contain rich, focused information on particular topics
4. Contain fresh, up-to-date information about practically everything
Wikipedia Pages as Topics
LDA topic: orbit, dust, jupiter, line, system, solar, gas, atmospheric, mars, field
Wikipedia Page: Solar System
“The Solar System consists of the Sun and the astronomical objects gravitationally bound in orbit around it, all of which formed from the collapse of a giant molecular cloud approximately 4.6 billion years ago…”
(http://en.wikipedia.org/wiki/Solar_System)
Wikipedia Pages as Topics
Topics are characterized as distributions over observed words in Wikipedia pages
βk : Per-topic word distribution
Word          Wikipedia Freq.   βk
orbit         34                0.12
dust          7                 0.02
jupiter       36                0.12
line          0                 0.00
system        76                0.26
solar         110               0.38
gas           11                0.04
atmospheric   1                 0.00
mars          8                 0.03
field         8                 0.03
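The proportions next to the word counts above are consistent with dividing each count by the total over the listed words. The Python sketch below illustrates that normalization under this assumption; `word_distribution` is a hypothetical helper name, and any smoothing or stop-word filtering the authors may have applied is not shown.

```python
def word_distribution(counts):
    """Normalize raw word counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize an empty or all-zero count table")
    return {word: count / total for word, count in counts.items()}


# Word counts for the "Solar System" page, copied from the slide.
solar_system_counts = {
    "orbit": 34, "dust": 7, "jupiter": 36, "line": 0, "system": 76,
    "solar": 110, "gas": 11, "atmospheric": 1, "mars": 8, "field": 8,
}

beta_k = word_distribution(solar_system_counts)
for word, prob in beta_k.items():
    print(f"{word:<12}{prob:.2f}")
```

Restricting the vocabulary to the ten words of the LDA topic is a choice made here for illustration; a full implementation would presumably normalize over the whole vocabulary.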
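[Figure: LDA and the Wikipedia-based method as matrix products]
D: Documents, K: Topics, W: Words
LDA: DOCUMENT – TOPIC Θ (D x K) * TOPIC – WORD β (K x W) = DOCUMENT – WORD W (D x W)
WIKI: DOCUMENT – WORD W (D x W) * TOPIC – WORD Wiki (W x K) = DOCUMENT – TOPIC Θ (D x K)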
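Read as a matrix product, the Wikipedia variant multiplies the document-word matrix W (D x W) by a word-topic matrix Wiki (W x K) to obtain document-topic scores Θ (D x K). The sketch below illustrates that reading in plain Python. The vocabulary, counts, and page weights are invented toy values, and `transform` and `best_topic` are hypothetical helper names, not the authors' code.

```python
def transform(doc_word, wiki):
    """Multiply a (D x W) document-word matrix by a (W x K) word-topic matrix."""
    n_topics = len(wiki[0])
    return [
        [sum(row[w] * wiki[w][k] for w in range(len(row))) for k in range(n_topics)]
        for row in doc_word
    ]


def best_topic(scores, topic_names):
    """Return the name of the highest-scoring topic for one document."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return topic_names[best_index]


# Toy data (assumed for illustration only).
vocabulary = ["orbit", "solar", "kernel", "process"]   # W = 4
topic_names = ["Solar System", "Operating System"]     # K = 2

# wiki[w][k]: weight of word w in Wikipedia page k.
wiki = [
    [0.4, 0.0],
    [0.6, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
]

# doc_word[d][w]: count of word w in document d (D = 2).
doc_word = [
    [2, 3, 0, 0],
    [0, 0, 1, 4],
]

theta = transform(doc_word, wiki)
labels = [best_topic(row, topic_names) for row in theta]
print(labels)
```

Picking the single highest-scoring page per document is one simple way to turn scores into a label; the slides do not say which decision rule was used.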
Experiment
Data
• 617 abstracts from the Journal of the ACM
• Classified into 80 categories by their authors
• 53 categories have corresponding Wikipedia pages
Abstracts
{Article Name: On the (Im)possibility of Obfuscating Programs, Category: D.4. Operating Systems, Add. Category: F.1 Computation by Abstract Devices … }
Category Mappings
Category : Wikipedia Page
D.4 Operating Systems : Operating System
F.1 Computation by Abstract Devices : Abstract Machine
Three variations of our method
- Inbound links are Wikipedia pages that link to the topic page
- Outbound links are Wikipedia pages linked to by the topic page
- The text-based method uses only the word distributions in topic pages
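The two link definitions can be made concrete with a toy link graph. This is a sketch only: the graph below is invented, `outbound_links` and `inbound_links` are hypothetical names, and the slides do not describe how link sets were turned into topic scores.

```python
def outbound_links(graph, page):
    """Pages that the given page links to."""
    return set(graph.get(page, ()))


def inbound_links(graph, page):
    """Pages that contain a link to the given page."""
    return {source for source, targets in graph.items() if page in targets}


# Toy link graph (assumed): each key links to the pages in its list.
link_graph = {
    "Operating System": ["Kernel", "Process"],
    "Kernel": ["Operating System"],
    "Process": ["Operating System", "Kernel"],
    "Abstract Machine": ["Operating System"],
}

print(sorted(inbound_links(link_graph, "Operating System")))
print(sorted(outbound_links(link_graph, "Operating System")))
```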
Results
Method Primary Primary or Additional
Text 182 (29.5%) 314 (50.8%)
Inbound links 131 (21.2%) 249 (40.0%)
Outbound links 79 (12.8%) 166 (26.9%)
The number (and percentage) of authors’ primary ACM topic labels, or authors’ primary + additional ACM topics successfully identified by each method.
LDA cannot be compared without an additional step mapping word distributions to ACM topics.
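The two scoring rules in the caption (a match on the primary label, or a match on the primary or any additional label) can be sketched as follows. The records and predictions are invented toy data, and `count_matches` is a hypothetical helper, not the authors' evaluation code.

```python
def count_matches(records, predictions):
    """Count predictions that hit the primary label, and those that hit
    either the primary label or any additional label."""
    primary_hits = 0
    any_hits = 0
    for record, predicted in zip(records, predictions):
        if predicted == record["primary"]:
            primary_hits += 1
        if predicted == record["primary"] or predicted in record["additional"]:
            any_hits += 1
    return primary_hits, any_hits


# Toy author-assigned labels and method predictions (assumed).
records = [
    {"primary": "D.4", "additional": ["F.1"]},
    {"primary": "F.2", "additional": []},
    {"primary": "G.2", "additional": ["F.2", "D.4"]},
]
predictions = ["F.1", "F.2", "H.2"]

primary_hits, any_hits = count_matches(records, predictions)
n = len(records)
print(f"primary: {primary_hits} ({primary_hits / n:.1%})")
print(f"primary or additional: {any_hits} ({any_hits / n:.1%})")
```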
Results (Qualitative)
Concluding Remarks
The Wiki categories often match the categories that were chosen by the authors. When they don’t match, they generally appear plausible.
Among the variations of our method, the text-based approach performed better than the link-based approaches.
Among the link-based approaches, inbound links performed better than outbound links.
Next Steps
Dependent topic structures
Combine heuristics with generative models:
• Wikipedia as a prior for the topic distribution
• Learn from the documents observed