ARCOMEM training: Topic Analysis Models (beginners)
DESCRIPTION
This presentation on Topic Analysis Models is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.
TRANSCRIPT
![Page 1: Arcomem training Topic Analysis Models beginners](https://reader036.vdocuments.site/reader036/viewer/2022062313/5582ee8ad8b42a32168b48c7/html5/thumbnails/1.jpg)
Topic Analysis in ARCOMEM
Yahoo Research Barcelona
![Page 2: Arcomem training Topic Analysis Models beginners](https://reader036.vdocuments.site/reader036/viewer/2022062313/5582ee8ad8b42a32168b48c7/html5/thumbnails/2.jpg)
What is Probabilistic Topic Modelling?
Exploring and retrieving meaningful information from large collections of textual documents is a challenging task.
Probabilistic topic models are a suite of algorithms (a framework) that aim to discover the themes in large archives of documents and annotate those documents with thematic information.
They do not require any prior annotations or labeling of the documents.
Topics emerge from the statistical analysis of the original texts.
![Page 3: Arcomem training Topic Analysis Models beginners](https://reader036.vdocuments.site/reader036/viewer/2022062313/5582ee8ad8b42a32168b48c7/html5/thumbnails/3.jpg)
Probabilistic Topic Model
Topic models are based upon the idea that documents are mixtures of topics, where a topic is a probability distribution over a fixed vocabulary.
A topic model is a generative model for documents: it specifies a simple probabilistic procedure by which documents can be generated.
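The generative procedure can be written down compactly. The notation below is an assumption for illustration and does not appear on the slides: $\theta_d$ is the topic mixture of document $d$, $\phi_k$ is the word distribution of topic $k$, and $K$ is the number of topics. This is a minimal sketch of the mixture idea, not the exact model used in ARCOMEM.

```latex
\begin{align*}
z_{d,i} &\sim \mathrm{Categorical}(\theta_d)
  && \text{choose a topic for word position } i \text{ of document } d \\
w_{d,i} &\sim \mathrm{Categorical}(\phi_{z_{d,i}})
  && \text{choose a word from that topic} \\
P(w \mid d) &= \sum_{k=1}^{K} \phi_{k,w}\,\theta_{d,k}
  && \text{resulting probability of word } w \text{ in document } d
\end{align*}
```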
The idea is to study the co-occurrence of words, assuming that words that tend to co-occur frequently express, or belong to, the same semantic concept.
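To make the co-occurrence idea concrete, here is a minimal sketch that counts how often two words appear in the same document. The toy corpus and the function name `cooccurrence_counts` are hypothetical, chosen only for illustration. A real topic model does not use raw pair counts directly, but it exploits the same signal.

```python
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical, for illustration only).
docs = [
    "dna genetic evolution",
    "genetic evolution species",
    "quantum energy particle",
    "energy particle dna",
]

def cooccurrence_counts(documents):
    """For each unordered word pair, count the documents that contain both words."""
    counts = Counter()
    for text in documents:
        # Use the set of words so a pair is counted once per document,
        # and sort so each pair has a single canonical ordering.
        words = sorted(set(text.split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(docs)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs that share documents more often, such as "evolution" and "genetic" in this toy corpus, are the ones a topic model would tend to place in the same topic.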
Example: A document (d) can be represented by the following mixture of topics:
Biology   Physics   Mathematics
0.6       0.3       0.1
In the topic “Biology”, words such as “DNA”, “genetic”, and “evolution” have high probability.
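The example mixture can be turned into a small simulation of how a topic model "generates" a document: for each word, pick a topic according to the document's mixture, then pick a word from that topic. The topic proportions come from the example above. The per-topic word probabilities and the function name `generate_document` are assumptions made up for this sketch.

```python
import random

# Topic proportions for document d, taken from the example above.
doc_topics = {"Biology": 0.6, "Physics": 0.3, "Mathematics": 0.1}

# Hypothetical word distributions per topic (illustrative values only).
topic_words = {
    "Biology": {"dna": 0.4, "genetic": 0.35, "evolution": 0.25},
    "Physics": {"energy": 0.5, "particle": 0.3, "quantum": 0.2},
    "Mathematics": {"theorem": 0.6, "proof": 0.4},
}

def generate_document(doc_topics, topic_words, length, rng):
    """Return a list of (topic, word) pairs sampled from the mixture."""
    topics = list(doc_topics)
    topic_weights = [doc_topics[t] for t in topics]
    pairs = []
    for _ in range(length):
        # Step 1: choose a topic according to the document's topic mixture.
        topic = rng.choices(topics, weights=topic_weights, k=1)[0]
        # Step 2: choose a word according to that topic's word distribution.
        vocab = list(topic_words[topic])
        word_weights = [topic_words[topic][w] for w in vocab]
        word = rng.choices(vocab, weights=word_weights, k=1)[0]
        pairs.append((topic, word))
    return pairs

rng = random.Random(0)  # fixed seed so runs are repeatable
for topic, word in generate_document(doc_topics, topic_words, 10, rng):
    print(f"{word:10s} (from topic {topic})")
```

Fitting a topic model runs this process in reverse: given only the words, it estimates the topic mixture of each document and the word distribution of each topic.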
![Page 4: Arcomem training Topic Analysis Models beginners](https://reader036.vdocuments.site/reader036/viewer/2022062313/5582ee8ad8b42a32168b48c7/html5/thumbnails/4.jpg)
Intuition behind topic modelling
Documents exhibit multiple topics
Each topic is individually interpretable, providing a probability distribution over words that picks out a coherent cluster of correlated terms
[Figure: example topics labelled Evolution Biology, Genetics, and Statistical Analysis]
![Page 5: Arcomem training Topic Analysis Models beginners](https://reader036.vdocuments.site/reader036/viewer/2022062313/5582ee8ad8b42a32168b48c7/html5/thumbnails/5.jpg)
The challenge is to identify, for each campaign, significant and important topics that are relevant to the two use cases: broadcasting and parliamentary libraries.
Topic analysis provides semantically useful categories that allow end users to search and browse content archives.
![Page 6: Arcomem training Topic Analysis Models beginners](https://reader036.vdocuments.site/reader036/viewer/2022062313/5582ee8ad8b42a32168b48c7/html5/thumbnails/6.jpg)
Try out on SARA: Trending topics
![Page 7: Arcomem training Topic Analysis Models beginners](https://reader036.vdocuments.site/reader036/viewer/2022062313/5582ee8ad8b42a32168b48c7/html5/thumbnails/7.jpg)
Try out on SARA: Statistical Topic Models