vectors and semantics - the proteus project at...
TRANSCRIPT
Vectors and Semantics
Peter Turney
November 2008
Vision of the Future
• future of SKDOU: from text to knowledge
  • input: web
  • output: knowledge
• beyond search: QA with unconstrained questions and answers
• 24/7 continuous automatic learning from the web
• what will that knowledge look like? (default assumption)
  • a giant expert system
  • but generated automatically, no hand-coding
• what will that knowledge look like? (my opinion)
  • expert systems are missing something vital
  • expert systems are not a sufficient representation of knowledge
  • we need vectors
Outline
• symbolic versus spatial approaches to knowledge
  • logic versus geometry
• term-document matrix
  • latent semantic analysis; applications
• pair-pattern matrix
  • latent relational analysis; applications
• episodic versus semantic
  • some hypotheses about vectors and semantics
• conclusions
  • how to acquire knowledge; how to represent knowledge
Symbolic AI
• symbolic approach to knowledge
  • logic, propositional calculus, graph theory, set theory, ...
  • GOFAI: good old-fashioned AI
• benefits
  • good for deduction, reasoning about entailment, consistency
  • crisp, clean, binary-valued
  • good for yes/no questions
    • does A entail B?
• costs
  • not so good for induction, learning, theories from data
  • aliasing: noise due to analog-to-digital conversion
  • not good for questions about similarity
    • how similar is A to B?
Symbolic versus Spatial (1 of 3)
Spatial AI
• spatial approach to knowledge
  • vector spaces, linear algebra, geometry, ...
  • machine learning, statistics, feature space, information retrieval
• benefits
  • good for induction, learning, theories from data
  • fuzzy, analog, real-valued
  • good for questions about similarity
    • similarity(A, B) = cosine(A, B)
• costs
  • not so good for deduction, entailment, consistency
  • messy, lots of numbers
  • not convenient for communication
    • language is digital
Symbolic versus Spatial (2 of 3)
Symbolic vs Spatial
• need to combine symbolic and spatial approaches
  • symbolic for communication and entailment
  • spatial for similarity and learning
• reference
  • Peter Gärdenfors. (2000). Conceptual Spaces: The Geometry of Thought. MIT Press.
Symbolic versus Spatial (3 of 3)
Technicalities
• weighting the elements
  • give more weight when a term ti is surprisingly frequent in a document dj
  • tf-idf = term frequency times inverse document frequency
  • hundreds of variations of tf-idf
• smoothing the matrix
  • problem of sparsity, small corpus
  • Singular Value Decomposition (SVD), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), Nonnegative Matrix Factorization (NMF), ...
• comparing the vectors
  • many ways to compare two vectors
  • cosine, Jaccard, Euclidean, Dice, correlation, Hamming, ...
Term-Document Matrix (2 of 9)
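The weighting and comparison steps can be sketched on a toy corpus (the documents are hypothetical; raw tf-idf weighting and cosine comparison, with the smoothing step omitted):

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a bag of terms.
docs = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response"],
    ["graph", "minors", "survey"],
]

def tfidf_matrix(docs):
    """Term-document matrix with tf-idf weighting: a term gets more
    weight when it is frequent in a document but rare in the corpus."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    return {
        t: [d.count(t) * math.log(n / df[t]) for d in docs]
        for t in df
    }

def cosine(u, v):
    """One of the many ways to compare two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

m = tfidf_matrix(docs)
# "computer" and "survey" each occur in two of the three documents and
# share one of them, so their row vectors have cosine 0.5.
```

In a real system the matrix would then be smoothed (e.g. truncated SVD) before the cosines are taken.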
Information Retrieval
• how similar is document d1 to document d2?
  • cosine of angle between d1 and d2 column vectors in matrix
• how relevant is document d to query q?
  • make a pseudo-document vector to represent q
  • cosine of angle between d and q
• references
  • Gerard Salton and Michael J. McGill. (1983). Introduction to Modern Information Retrieval. McGraw-Hill.
  • Scott Deerwester, Susan T. Dumais, and Richard Harshman. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407.
Term-Document Matrix (3 of 9)
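A minimal sketch of the query step (the tiny matrix and its counts are hypothetical; raw frequencies stand in for tf-idf weights):

```python
import math

# Hypothetical term-document matrix: rows are terms, columns are documents.
matrix = {
    "vector":  [2, 0, 1],
    "space":   [1, 0, 1],
    "protein": [0, 3, 0],
}

def relevance(query_terms, j):
    """Relevance of document j to a query: build a pseudo-document
    vector for the query, then take the cosine with column j."""
    q = [query_terms.count(t) for t in matrix]  # pseudo-document vector
    d = [matrix[t][j] for t in matrix]          # document column vector
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0
```

Documents sharing the query's terms score high; documents with none of them score zero.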
Word Similarity
• how similar is term t1 to term t2?
  • cosine of angle between t1 and t2 row vectors in matrix
• evaluation on TOEFL multiple-choice synonym questions
  • 92.5% highest score of any pure (non-hybrid) algorithm
  • 64.5% for average human
• references
  • Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240.
  • Rapp, R. (2003). Word sense discovery based on sense descriptor dissimilarity. Proceedings of the Ninth Machine Translation Summit, pp. 315-322.
Term-Document Matrix (4 of 9)
Essay Grading
• grade student essays
  • latent semantic analysis
  • commercial product, Pearson's Knowledge Technologies
• references
  • Rehder, B., Schreiner, M.E., Wolfe, M.B., Laham, D., Landauer, T.K., and Kintsch, W. (1998). Using latent semantic analysis to assess knowledge: Some technical considerations. Discourse Processes, 25, 337-354.
  • Foltz, P.W., Laham, D., and Landauer, T.K. (1999). Automated essay scoring: Applications to educational technology. Proceedings of the ED-MEDIA '99 Conference, Association for the Advancement of Computing in Education, Charlottesville.
Term-Document Matrix (5 of 9)
Textual Cohesion
• measuring textual cohesion
  • latent semantic analysis
• reference
  • Foltz, P.W., Kintsch, W., and Landauer, T.K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285-307.
Term-Document Matrix (6 of 9)
Semantic Orientation
• measuring praise and criticism
  • latent semantic analysis
  • small set of positive and negative reference words
    • good, nice, excellent, positive, fortunate, correct, and superior
    • bad, nasty, poor, negative, unfortunate, wrong, and inferior
  • semantic orientation of a word X is the sum of similarities of X with the positive reference words minus the sum of similarities of X with the negative reference words
• reference
  • Turney, P.D., and Littman, M.L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21 (4), 315-346.
Term-Document Matrix (7 of 9)
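The semantic-orientation formula above can be sketched directly; the `similarity` argument is a placeholder for whatever vector similarity is used (LSA-based or otherwise):

```python
# Reference words from the slide.
POSITIVE = ["good", "nice", "excellent", "positive", "fortunate", "correct", "superior"]
NEGATIVE = ["bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"]

def semantic_orientation(word, similarity):
    """SO(X): sum of similarities of X with the positive reference
    words, minus the sum of similarities with the negative ones.
    A positive result suggests praise; a negative result, criticism."""
    return (sum(similarity(word, ref) for ref in POSITIVE)
            - sum(similarity(word, ref) for ref in NEGATIVE))
```

With a toy similarity table standing in for cosines over the term-document matrix, a word close to "good" and "excellent" comes out positive.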
Logic
• logical operations can be performed by linear algebra
  • t1 OR t2 = the vector space spanned by t1 and t2
  • t1 NOT t2 = the projection of t1 onto the subspace that is orthogonal to t2
  • bass NOT fisherman = bass in the sense of a musical instrument, not bass in the sense of a fish
• reference
  • Dominic Widdows. (2004). Geometry and Meaning. CSLI Publications.
Term-Document Matrix (8 of 9)
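The NOT operation can be sketched as an orthogonal projection (toy 2-d vectors for clarity; a real "bass NOT fisherman" query would project high-dimensional term vectors):

```python
def vector_not(t1, t2):
    """t1 NOT t2: project t1 onto the subspace orthogonal to t2,
    i.e. remove from t1 its component along t2."""
    dot = sum(a * b for a, b in zip(t1, t2))
    t2_sq = sum(b * b for b in t2)
    return [a - (dot / t2_sq) * b for a, b in zip(t1, t2)]

# Toy example: removing the direction of [1, 0] from [3, 4]
# leaves only the second coordinate.
result = vector_not([3.0, 4.0], [1.0, 0.0])  # [0.0, 4.0]
```

The result is always orthogonal to t2, so documents about the unwanted sense score near zero against it.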
Summary
• applications for a term-document (word-chunk) matrix
  • information retrieval
  • measuring word similarity
  • essay grading
  • textual cohesion
  • semantic orientation
  • logic
Term-Document Matrix (9 of 9)
Pair-Pattern Matrix
• pair-pattern matrix
  • rows correspond to pairs of words
    • X:Y = mason:stone
  • columns correspond to patterns
    • "X works with Y"
  • an element corresponds to the frequency of the given pattern in a corpus, when the variables in the pattern are instantiated with the words in the given pair
    • "mason works with stone"
  • a row vector gives the distribution of the patterns in which the given pair appears
    • a signature of the semantic relation between mason and stone
Pair-Pattern Matrix (1 of 8)
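A sketch of how such a matrix might be populated (the pairs, patterns, and corpus below are toy examples; a real system would mine patterns from a large corpus):

```python
from collections import Counter

pairs = [("mason", "stone"), ("carpenter", "wood")]
patterns = ["X works with Y", "X cuts Y"]

corpus = [
    "the mason works with stone every day",
    "a carpenter works with wood",
    "the carpenter cuts wood",
]

# matrix[(pair, pattern)] = frequency of the instantiated pattern in the corpus
matrix = Counter()
for (x, y) in pairs:
    for pattern in patterns:
        phrase = pattern.replace("X", x).replace("Y", y)
        for sentence in corpus:
            if phrase in sentence:
                matrix[((x, y), pattern)] += 1

# A row vector is the pair's pattern distribution: a signature of its relation.
row = [matrix[(("mason", "stone"), p)] for p in patterns]  # [1, 0]
```

Two pairs with similar row vectors tend to stand in a similar semantic relation.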
Technicalities
• exactly the same as with term-document matrices
  • weighting the elements
  • smoothing the matrix
  • comparing the vectors
• many lessons carry over from term-document matrices
  • good weighting approaches
  • good smoothing algorithms
  • good formulas for comparing
Pair-Pattern Matrix (2 of 8)
SAT Analogies
• relational similarity of two pairs is the cosine of their two row vectors
  • cosine(traffic:street, water:riverbed) = 0.692

  Stem pair: traffic:street
  Choices               Cosine
  (a) ship:gangplank    0.318
  (b) crop:harvest      0.572
  (c) car:garage        0.687
  (d) pedestrians:feet  0.497
  (e) water:riverbed    0.692

• reference
  • Turney, P.D., and Littman, M.L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning, 60 (1-3), 251-278.
Pair-Pattern Matrix (3 of 8)
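Given those cosines, answering the SAT question reduces to an argmax over the choices (numbers taken from the slide):

```python
# Cosine of each choice pair's row vector with the stem pair traffic:street.
choices = {
    "ship:gangplank":   0.318,
    "crop:harvest":     0.572,
    "car:garage":       0.687,
    "pedestrians:feet": 0.497,
    "water:riverbed":   0.692,
}

# The algorithm answers with the choice of highest relational similarity.
answer = max(choices, key=choices.get)  # "water:riverbed"
```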
Semantic Relations
• classify noun-modifier expressions according to their semantic relations
  • 600 noun-modifier expressions labeled with semantic relations
  • 30 classes or 5 classes
    • Causality: "cold virus", "onion tear"
    • Temporality: "morning frost", "summer travel"
    • Spatial: "aquatic mammal", "west coast", "home remedy"
    • Participant: "dream analysis", "mail sorter", "blood donor"
    • Quality: "copper coin", "rice paper", "picture book"
  • supervised nearest-neighbour algorithm using the cosine of row vectors
• reference
  • Turney, P.D. (2006). Similarity of semantic relations. Computational Linguistics, 32 (3), 379-416.
Pair-Pattern Matrix (4 of 8)
Synonyms vs Antonyms
• ESL synonym versus antonym questions
  • language test for students of English as a Second Language
  • 136 synonym versus antonym questions
    • dissimilarity - resemblance: syn or ant? (ant)
    • naive - callow: syn or ant? (syn)
    • commend - denounce: syn or ant? (ant)
    • expose - camouflage: syn or ant? (ant)
    • galling - irksome: syn or ant? (syn)
  • two-class supervised learning using row vectors
• reference
  • Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.
Pair-Pattern Matrix (5 of 8)
Similar vs Associated
• similar versus associated
  • 3 x 48 = 144 word pairs
  • 3 classes: similar, associated, both
    • Similar: table-bed, music-art, flea-ant
    • Associated: cradle-baby, mug-beer, mold-bread
    • Both: ale-beer, uncle-aunt, ball-bat
  • three-class supervised learning problem using row vectors
• reference
  • Turney, P.D. (2008). A uniform approach to analogies, synonyms, antonyms, and associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, pp. 905-912.
Pair-Pattern Matrix (6 of 8)
Systematic Analogies
• analogical mapping between sets of terms
  • mapping from solar system to atom
• reference
  • submitted but not yet published

  Source A       Mapping M   Target B
  solar system   →           atom
  sun            →           nucleus
  planet         →           electron
  mass           →           charge
  attracts       →           attracts
  revolves       →           revolves
  gravity        →           electromagnetism

Pair-Pattern Matrix (7 of 8)
Summary
• applications for a pair-pattern matrix
  • proportional analogies
  • semantic relations
  • synonyms versus antonyms
  • similar versus associated
  • systematic analogies
Pair-Pattern Matrix (8 of 8)
Episodic vs Semantic
• episodic memory is memory of a specific event in one's personal past
  • I remember when I first went hang gliding
  • I remember when I saw the Great Pyramid of Giza
• semantic memory is memory of basic facts and concepts, unrelated to any specific event in one's personal past
  • I remember that the speed of light in a vacuum is approximately 3 x 10^8 meters per second
  • I remember that a tesseract is a four-dimensional hypercube composed of eight three-dimensional cubes
• distinction from cognitive psychology
  • types of explicit or declarative memory, as opposed to implicit or procedural memory
Episodic vs Semantic (1 of 4)
Episodic vs Semantic
• ACE Local Relation Detection and Recognition (LRDR) task
  • "George Bush traveled to France on Thursday for a summit."
  • there is a Physical.Located relation between George Bush and France
  • extraction of episodic information from a sentence
• Noun-Modifier Classification task
  • acquisition of semantic knowledge from a corpus
    • Causality: "cold virus", "onion tear"
    • Temporality: "morning frost", "summer travel"
    • Spatial: "aquatic mammal", "west coast", "home remedy"
    • Participant: "dream analysis", "mail sorter", "blood donor"
    • Quality: "copper coin", "rice paper", "picture book"
Episodic vs Semantic (2 of 4)
Posterior vs Prior
• posterior probability versus prior probability
  • R(X,Y) = X and Y have relation R
  • S(X,Y) = X and Y occur in sentence S
  • prior probability = prob(R(X,Y)) = semantic
  • posterior probability = prob(R(X,Y) | S(X,Y)) = episodic
• ACE Local Relation Detection and Recognition (LRDR) task
  • R(X,Y) = there is a Physical.Located relation between George Bush and France
  • S(X,Y) = "George Bush traveled to France on Thursday for a summit."
• Noun-Modifier Classification task
  • R(X,Y) = there is a Spatial relation between aquatic and mammal
Episodic vs Semantic (3 of 4)
Knowledge Representation
• need a spatial representation
  • for measuring similarity
  • for estimating probabilities
• need a symbolic representation
  • for reasoning about entailment
  • for communication
    • input text and output text
    • language is symbolic
Conclusions (1 of 4)
Knowledge Acquisition
• the spatial approach is able to acquire knowledge from text
  • term-document matrix:
    • information retrieval, measuring word similarity, essay grading, textual cohesion, semantic orientation, logic
  • pair-pattern matrix:
    • proportional analogies, semantic relations, synonyms versus antonyms, similar versus associated, systematic analogies
Conclusions (2 of 4)
Knowledge Use
• symbolic representation
  • useful for input and output
  • compact storage, if aliasing is tolerable
  • useful for logical reasoning, entailment
• spatial representation
  • useful for calculating similarity
  • useful for calculating probability
    • case-based reasoning, analogical reasoning
    • learning
Conclusions (3 of 4)
Conclusion
• Information Extraction has focused on episodic information
  • IE: NER, MUC, ACE, etc.
  • episodic: posterior
  • representation is symbolic
• Vector Space Models have focused on semantic information
  • VSM: IR, LSA, LRA, cosine, etc.
  • semantic: prior
  • representation is spatial
• need to combine the two
  • IE can use prior information from VSM
  • VSM can use posterior information from IE
Conclusions (4 of 4)