Introduction to Distributional Semantics
André Freitas Insight Centre for Data Analytics
Insight Workshop on Distributional Semantics
Galway, 2014
Based on the Great ESSLLI Tutorial from Evert & Lenci
Outline
- Contemporary Semantics
- Distributional Semantics
- Compositional-Distributional Semantics
- Take-away message
Contemporary Semantics
Shift in the Semantics Landscape
Corroboration: Praxis, Scientific / Formal, Philosophical
Semantics as a complex phenomenon
Semantics for a Complex World
• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
• If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.
Sahlgren, 2013
Formal World Real World
Baroni et al., 2012
What is Distributional Semantics?
Meaning
Word meaning is usually represented in terms of some formal, symbolic structure, either external or internal to the word
External structure
- Associations between different concepts
Internal structure
- Feature (property, attribute) lists
The semantic properties of a word are derived from the formal structure of its representation
- e.g. Inference algorithm, etc.
Semantics = Meaning representation model (data) + inference model
Formal Representation of Meaning
Modelling fine-grained lexical inferences
Formal Representation of Meaning (Problems)
Different meanings
- bat (animal), bat (artefact)
Meaning variation in context
- clever politician, clever tycoon
Meaning evolution
Ambiguity, vagueness, inconsistency
Word meaning acquisition
Lack of flexibility
Scalability
Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend to be semantically similar”
He filled the wampimuk with the substance, passed it around and we all drank some
We found a little, hairy wampimuk sleeping behind the tree
Weak and Strong DH (Lenci, 2008)
Weak DH:
- Word meaning is reflected in linguistic distributions
- By inspecting a sufficiently large number of distributional contexts we may have a useful surrogate representation of meaning.
Strong DH:
- A cognitive hypothesis about the form and origin of semantic representations
Contextual Representation
Abstract structure that accumulates encounters with the words in various (linguistic) contexts.
For our purposes …
- Context is equated with linguistic context
Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the leash since he barked.”
contexts = nouns and verbs in the same sentence
dog
- bark : 2
- park : 1
- leash : 1
- owner : 1
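The counting behind these numbers can be sketched in Python. This is a minimal sketch, not the tutorial's own code: it assumes the two sentences have already been reduced by hand to content lemmas (the lists below are an assumption and leave out the verb "put", to mirror the counts on the slide), and the function name cooccurrence_counts is hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: each sentence is already reduced to its content lemmas by hand.
# The verb "put" is left out to mirror the counts shown on the slide.
SENTENCES = [
    ["dog", "bark", "park"],
    ["owner", "dog", "leash", "bark"],
]

def cooccurrence_counts(sentences):
    """Count, for every target word, the other words in the same sentence."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for i, target in enumerate(sentence):
            for j, context in enumerate(sentence):
                if i != j:  # a word is not its own context
                    counts[target][context] += 1
    return counts

counts = cooccurrence_counts(SENTENCES)
print(dict(counts["dog"]))  # {'bark': 2, 'park': 1, 'owner': 1, 'leash': 1}
```

Each entry of counts is one row of the distributional matrix: the target word with its context counts.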
Distributional Semantic Models (DSMs)
distributional matrix = targets × contexts (rows: targets, columns: contexts)
Vector Space Model (VSM)
Semantic Similarity & Relatedness
[Figure: vector space with the labels car, dog, cat, bark, run and leash, and an angle θ between vectors]
Semantic Similarity & Relatedness
Semantic similarity - two words sharing a high number of salient features (attributes)
- synonymy (car/automobile)
- hyperonymy (car/vehicle)
- co-hyponymy (car/van/truck)
Semantic relatedness (Budanitsky & Hirst 2006) - two words semantically associated without being necessarily similar
- function (car/drive)
- meronymy (car/tyre)
- location (car/road)
- attribute (car/fast)
Distributional Semantic Models (DSMs)
Computational models that build contextual semantic representations from corpus data
Semantic context is represented by a vector
Vectors are obtained through the statistical analysis of the linguistic contexts of a word
Salience of contexts (cf. context weighting scheme)
Semantic similarity/relatedness as the core operation over the model
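The core operation can be illustrated with the cosine of the angle θ between two word vectors. This is a minimal sketch: cosine is used here as one common similarity measure, the vectors are made-up numbers over three assumed context dimensions (bark, run, leash), and the helper name neighbours is hypothetical.

```python
import math

# Hypothetical co-occurrence vectors over the context dimensions
# (bark, run, leash); the numbers are made up for illustration.
VECTORS = {
    "dog": [5.0, 3.0, 4.0],
    "cat": [0.0, 3.0, 1.0],
    "car": [0.0, 4.0, 0.0],
}

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def neighbours(word, vectors):
    """Rank all other words by decreasing cosine similarity to `word`."""
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(neighbours("dog", VECTORS))
```

With these toy numbers "cat" comes out closer to "dog" than "car" does, since it shares more of the weight on the same dimensions.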
DSMs as Commonsense Reasoning
Commonsense is here
[Figure: vector space with the labels car, dog, cat, bark, run and leash, and an angle θ between vectors]
[Figure: the same vector space (θ, car, dog, cat, bark, run, leash, ...)]
vs.
Semantic best-effort
Demonstration (EasyESA)
http://treo.deri.ie/easyesa/
Applications
- Semantic search
- Question answering
- Approximate semantic inference
- Word sense disambiguation
- Paraphrase detection
- Text entailment
- Semantic anomaly detection
- ...
Alternative Names for DSMs
- Corpus-based semantics
- Statistical semantics
- Geometrical models of meaning
- Vector semantics
- Word (semantic) space models
Definition of DSMs
Building a DSM
- Pre-process a corpus (target, context)
- Count the target-context co-occurrences
- Weight the contexts (optional)
- Build the distributional matrix
- Reduce the matrix dimensions (optional)
Parameters
- Corpus
- Context type
- Weighting scheme
- Similarity measure
- Number of dimensions
A parameter configuration determines the DSM (LSA, ESA, …)
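The building steps can be sketched end to end in a few lines of Python. This is a minimal sketch under stated assumptions: pre-processing is assumed already done (the toy corpus is a list of lemma lists), the context is the whole sentence, the weighting is log(1 + count), and the optional dimensionality reduction step is left out. The function name build_dsm is hypothetical.

```python
import math
from collections import Counter

def build_dsm(sentences, weight=math.log1p):
    """Build a targets x contexts matrix from tokenised sentences.

    Contexts are the other tokens of the same sentence; `weight` is applied
    to each raw count (log(1 + count) by default).
    """
    counts = Counter()
    for tokens in sentences:  # count the target-context co-occurrences
        for i, target in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    counts[(target, context)] += 1
    vocab = sorted({word for pair in counts for word in pair})
    # weight the contexts and build the matrix (rows: targets, columns: contexts)
    matrix = [[weight(counts[(t, c)]) for c in vocab] for t in vocab]
    return vocab, matrix

# Pre-processing is assumed done: lower-cased, lemmatised content words.
corpus = [["dog", "bark", "park"], ["owner", "dog", "leash", "bark"]]
vocab, matrix = build_dsm(corpus)
row = dict(zip(vocab, matrix[vocab.index("dog")]))
print(row)
```

Swapping the context definition, the weight function or the similarity measure applied to the rows gives a different parameter configuration, and hence a different DSM.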
Parameters
Corpus pre-processing
- Stemming/lemmatization
- POS tagging
- Syntactic dependencies
Context
- Document
- Paragraph
- Passage
- Word windows
- Words
- Linguistic features
- Linguistic patterns
- Verbs: contexts = nouns
- Verbs: contexts = adverbs
- etc.
- Size
- Shape
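For word-window contexts, size and shape can be made concrete with a small helper. This is a minimal sketch: the function window_contexts and its left/right parameters are hypothetical, and "shape" is read here in one possible sense only (symmetric versus one-sided windows).

```python
def window_contexts(tokens, index, left=2, right=2):
    """Return the tokens inside the window around tokens[index].

    `left` and `right` set the window size on each side; making them
    unequal changes the window shape (e.g. left=0 for right-only contexts).
    """
    start = max(0, index - left)
    before = tokens[start:index]
    after = tokens[index + 1:index + 1 + right]
    return before + after

tokens = "the owner of the dog put him on the leash".split()
i = tokens.index("dog")
print(window_contexts(tokens, i))                   # ['of', 'the', 'put', 'him']
print(window_contexts(tokens, i, left=0, right=3))  # ['put', 'him', 'on']
```

Small windows tend to pick up words that behave alike syntactically, while larger ones pick up looser topical association; which is preferable depends on the task.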
Context Engineering
Effect of Parameters
Context Weighting
Smoothing frequency differences: From raw counts to log-frequency.
Association measures (Evert 2005) give more weight to contexts that are more significantly associated with a target word
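Both kinds of weighting can be sketched in Python. This is a minimal sketch: positive pointwise mutual information (PPMI) is used as one common association measure (the slides do not say which measure is meant), the counts are made up, and the function names log_weight and ppmi are hypothetical.

```python
import math

def log_weight(count):
    """Dampen a raw co-occurrence count: log(1 + count)."""
    return math.log(1 + count)

def ppmi(counts):
    """Positive PMI for a dict {(target, context): raw count}."""
    total = sum(counts.values())
    target_totals, context_totals = {}, {}
    for (target, context), n in counts.items():
        target_totals[target] = target_totals.get(target, 0) + n
        context_totals[context] = context_totals.get(context, 0) + n
    weights = {}
    for (target, context), n in counts.items():
        if n == 0:  # unseen pair: PMI is undefined, use 0
            weights[(target, context)] = 0.0
            continue
        pmi = math.log2(n * total / (target_totals[target] * context_totals[context]))
        weights[(target, context)] = max(0.0, pmi)  # clip negative values
    return weights

# Made-up counts for illustration.
counts = {("dog", "bark"): 8, ("dog", "the"): 8,
          ("car", "bark"): 0, ("car", "the"): 16}
weights = ppmi(counts)
print(weights[("dog", "bark")], weights[("dog", "the")])
```

In this toy example "bark" gets a positive weight for "dog", while the frequent but uninformative context "the" is clipped to zero even though its raw count is the same.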
Context Weighting Measures
Kiela & Clark, 2014
Similarity Measures
Kiela & Clark, 2014
What is the best parameter configuration?
The best parameter configuration depends on the task.
Systematic exploration of the parameters
DSM Instances
- Latent Semantic Analysis (Landauer & Dumais 1996)
- Hyperspace Analogue to Language (Lund & Burgess 1996)
- Infomap NLP (Widdows 2004)
- Random Indexing (Karlgren & Sahlgren 2001)
- Dependency Vectors (Padó & Lapata 2007)
- Explicit Semantic Analysis (Gabrilovich & Markovitch, 2008)
- Distributional Memory (Baroni & Lenci 2009)
Compositional Semantics
Paraphrase Detection
I find it rather odd that people are already trying to tie the Commission's hands in relation to the proposal for a directive, while at the same time calling on it to present a Green Paper on the current situation with regard to optional and supplementary health insurance schemes.
I find it a little strange to now obliging the Commission to a motion for a resolution and to ask him at the same time to draw up a Green Paper on the current state of voluntary insurance and supplementary sickness insurance.
=?
Compositional Semantics
Can we extend DS to account for the meaning of phrases and sentences?
Compositionality: The meaning of a complex expression is a function of the meaning of its constituent parts.
Compositional Semantics
Words whose meaning is directly determined by their distributional behaviour (e.g., nouns).
Words that act as functions transforming the distributional profile of other words (e.g., verbs, adjectives, …).
[Figure: old dogs]
Compositional Semantics
Mixture
Function
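The two styles of composition can be contrasted on the phrase "old dogs". This is a minimal sketch with made-up values: mixture is read here as component-wise addition or multiplication of two word vectors, and function as a matrix for the adjective applied to the noun vector. All names and numbers are hypothetical.

```python
def add(u, v):
    """Mixture by addition: component-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]

def multiply(u, v):
    """Mixture by component-wise multiplication."""
    return [a * b for a, b in zip(u, v)]

def apply_matrix(matrix, v):
    """Function application: a matrix (e.g. an adjective) maps a noun
    vector to a new vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

# Made-up values for illustration only.
dogs = [2.0, 1.0, 0.0]
old_vec = [1.0, 0.0, 3.0]   # "old" as an ordinary vector (mixture)
OLD = [[1.0, 0.0, 0.0],     # "old" as a matrix (function)
       [0.0, 0.5, 0.0],
       [1.0, 0.0, 1.0]]

print(add(old_vec, dogs))        # [3.0, 1.0, 3.0]
print(multiply(old_vec, dogs))   # [2.0, 0.0, 0.0]
print(apply_matrix(OLD, dogs))   # [2.0, 0.5, 2.0]
```

The mixture operations are symmetric, so word order is lost; the function view keeps the asymmetry between the modifier and the word it modifies.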
Compositional Semantics
Take the syntactic structure to constitute the backbone guiding the assembly of the semantic representations of phrases.
(CHASE × cats) × dogs
- CHASE: 3rd order tensor
- cats: vector
- dogs: vector
- (CHASE × cats): the verb applied to its object, which is then applied to the subject
Baroni et al., 2012
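The two applications in (CHASE × cats) × dogs can be traced with nested lists. This is a minimal sketch with toy 2-dimensional values: which tensor index is contracted with which argument is an assumption made here for illustration, and the function names are hypothetical.

```python
def tensor_times_vector(tensor, v):
    """Contract the last index of a 3rd-order tensor with a vector -> matrix."""
    return [[sum(t * x for t, x in zip(cell, v)) for cell in row]
            for row in tensor]

def matrix_times_vector(matrix, v):
    """Contract the last index of a matrix with a vector -> vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

# Made-up toy values; CHASE[i][j][k] has shape 2 x 2 x 2.
CHASE = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 2.0], [1.0, 0.0]]]
cats = [1.0, 2.0]
dogs = [3.0, 1.0]

chase_cats = tensor_times_vector(CHASE, cats)     # verb + object: a matrix
sentence = matrix_times_vector(chase_cats, dogs)  # + subject: a vector
print(chase_cats)  # [[1.0, 2.0], [4.0, 1.0]]
print(sentence)    # [5.0, 13.0]
```

Each application lowers the order by one: tensor to matrix, then matrix to vector, so the whole sentence ends up as a vector.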
Formal Model
Distributional Semantics & Category Theory
Take-away message
Low acquisition effort
Simple way to build a commonsense KB
Semantic approximation as a built-in construct
Semantic best-effort
Simple to use
DSMs are evolving fast (compositional and formal grounding)
Distributional semantics offers a promising approach for building semantic models that work in the real world
Great Introductory References
Evert & Lenci, ESSLLI Tutorial on Distributional Semantics, 2009 (many slides were taken or adapted from this great tutorial).
Turney & Pantel, From Frequency to Meaning: Vector Space Models of Semantics, 2010.
Baroni et al., Frege in Space: A Program for Compositional Distributional Semantics, 2012.
Kiela & Clark, A Systematic Study of Semantic Vector Space Model Parameters, 2014.