Monitoring Term Drift Based on Semantic Consistency in an Evolving Vector Field
GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]
“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no 601138”.
P. Wittek, S. Daranyi, E. Kontopoulos, T. Moysiadis, I. Kompatsiaris
Propose a field approach to lexical analysis
Use evolving fields for expressing time-dependent changes in a vector space model
◦ Random indexing
◦ Evolving self-organizing maps (ESOM)
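Random indexing, the first of the two techniques named above, can be illustrated with a minimal sketch. It assumes the document-based variant: every document receives a sparse random index vector, and a term's vector is the sum of the index vectors of the documents it occurs in. The function names, the dimensions and the toy documents are hypothetical, raw counts stand in for TFIDF weights, and this is not the SemanticVectors implementation.

```python
import random
from collections import defaultdict

def index_vector(dim, nonzero, rng):
    """Sparse ternary vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def random_indexing(docs, dim=64, nonzero=4, seed=0):
    """Add each document's index vector to the vector of every term it contains."""
    rng = random.Random(seed)
    term_vecs = defaultdict(lambda: [0] * dim)
    for doc in docs:
        doc_vec = index_vector(dim, nonzero, rng)
        for term in doc.split():
            tv = term_vecs[term]
            for i, x in enumerate(doc_vec):
                tv[i] += x
    return dict(term_vecs)

# Hypothetical toy corpus: "good" occurs in both documents.
vecs = random_indexing(["good book", "good plot"])
```

Because term vectors are built by accumulation, new time-stamped documents can be added without recomputing the whole space, which is one reason the method suits an evolving corpus.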
Semantic continuity hypothesis
◦ Actual & potential word content
◦ Observable locations & “lexical gaps”
Continuity modelled as evolving field
◦ Actual & potential word content constantly dislocated over time
Time stamped data
◦ Measure dislocations
◦ “Semantic drift”, an indicator of language change
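One way to picture "measuring dislocations" is to compare a term's vector in two consecutive periods. The sketch below assumes that both periods share one vector space and uses cosine distance as the dislocation measure. The slides do not specify the measure, so this choice, the function names and the toy vectors are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dislocation(period_a, period_b):
    """Cosine distance of every term present in both time-stamped periods."""
    shared = period_a.keys() & period_b.keys()
    return {t: 1.0 - cosine(period_a[t], period_b[t]) for t in shared}

# Hypothetical toy vectors for two periods.
early = {"tablet": [1.0, 0.0], "novel": [0.0, 1.0]}
late = {"tablet": [0.0, 1.0], "novel": [0.0, 2.0]}
drift = dislocation(early, late)  # "tablet" moved, "novel" kept its direction
```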
Semantic similarity
Semantic fields
Measuring semantic consistency
Semantic drifts
1. Evaluate semantic consistency within single time periods of an evolving data set
2. Can semantic drift be detected by analysing the change in semantic consistency?
Distributional Similarity & Random Indexing
◦ TFIDF vector space model of the corpus
◦ Random indexing
Semantic fields in ESOMs
◦ Embed vector space on a 2D surface using ESOMs
◦ Resulting network reflects local topology of the high-dimensional space
WN-based similarity metrics
◦ Path-based, content-based, feature-based, hybrid
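The embedding on a 2D surface can be illustrated with a single training step of a plain self-organizing map. This is a minimal sketch under stated assumptions: it is not the emergent variant and not Somoclu's API, and the grid size, learning rate and radius are hypothetical. The step finds the best matching unit (BMU) for an input vector and pulls that neuron and its grid neighbours toward the input, which is how nearby neurons come to reflect the local topology of the high-dimensional space.

```python
import math

def best_matching_unit(grid, vec):
    """Return (row, col) of the neuron whose weights are closest to vec."""
    return min(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
        key=lambda rc: math.dist(grid[rc[0]][rc[1]], vec),
    )

def som_update(grid, vec, rate=0.5, radius=1.0):
    """One training step: pull the BMU and its grid neighbours toward vec."""
    br, bc = best_matching_unit(grid, vec)
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
            # Gaussian neighbourhood: influence fades with distance on the grid.
            influence = rate * math.exp(-grid_dist2 / (2 * radius ** 2))
            row[c] = [w + influence * (x - w) for w, x in zip(weights, vec)]
    return br, bc

# Hypothetical 2x2 grid of 2-dimensional neurons.
grid = [[[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 1.0]]]
bmu = som_update(grid, [0.9, 0.1])
```

After training over all term vectors, each term is assigned to its BMU, so a single neuron may hold several terms.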
12.8M Amazon book reviews over 18 yrs *
Lucene, SemanticVectors, Somoclu
WordNet 3.0 & WS4J
Wu and Palmer’s semantic similarity method
All experiments are open source
* Stanford University’s SNAP project: http://snap.stanford.edu/index.html
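Wu and Palmer's measure, listed above, scores two concepts by the depth of their deepest common ancestor relative to their own depths: 2 × depth(lcs) / (depth(a) + depth(b)). The sketch below applies it to a hypothetical toy taxonomy with single inheritance and the root at depth 1. It is not WordNet and not the WS4J API, which work over synsets.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "publication": "entity",
    "book": "publication",
    "magazine": "publication",
    "person": "entity",
}

def path_to_root(node):
    """Nodes from `node` up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def wu_palmer(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b)), lcs = deepest common ancestor."""
    ancestors_a = set(path_to_root(a))
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

siblings = wu_palmer("book", "magazine")  # share "publication"
distant = wu_palmer("book", "person")     # share only the root
```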
Proximity → sem-sim?
Neurons >1 term
avg-sim between terms
5-term neurons
Normal distribution
N ≥ 3, 5, 10
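The consistency check outlined above can be sketched as follows: compute the average pairwise similarity of the terms in one neuron, then compare it with the same statistic for randomly drawn groups of equal size. The z-score against a sampled baseline is an assumption based on the "normal distribution" note. The slides do not give the exact test, and the function names, sample count and toy similarity are hypothetical.

```python
import random
import statistics
from itertools import combinations

def avg_pairwise(terms, sim):
    """Mean similarity over all unordered pairs of terms."""
    pairs = list(combinations(terms, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)

def z_score(neuron_terms, vocabulary, sim, samples=200, seed=0):
    """How far a neuron's average similarity lies above random groups of the same size."""
    rng = random.Random(seed)
    k = len(neuron_terms)
    baseline = [avg_pairwise(rng.sample(vocabulary, k), sim) for _ in range(samples)]
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return 0.0  # degenerate baseline: no spread to compare against
    return (avg_pairwise(neuron_terms, sim) - mu) / sigma

# Hypothetical toy similarity: terms sharing a first letter count as similar.
vocab = ["a1", "a2", "a3", "b1", "b2", "b3", "c1", "c2", "c3"]
toy_sim = lambda x, y: 1.0 if x[0] == y[0] else 0.0
z = z_score(["a1", "a2", "a3"], vocab, toy_sim)
```

In the setting of the slides, `sim` would be a WordNet-based measure such as Wu and Palmer's, and `neuron_terms` the terms mapped to one ESOM neuron.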
Terms within a neuron demonstrated significantly greater similarity in comparison to a randomly selected group of terms
For each of the 3 periods, the process was repeated
Percentages slightly decreased from period to period
Decrease was not statistically significant – no divergence
More periods needed
Models of evolving semantic content
◦ Dynamic vector field
◦ Semantic continuity in the vocabulary
◦ Experiment confirmed that similarity of terms within ESOM grid neurons was significantly higher than among randomly selected terms
Future work
◦ Increase number of periods & grid granularity
◦ Smooth the transition between periods
◦ Interpretations of vector field model
◦ Application in other domains