Chapter 6. Hidden Markov and Maximum Entropy Models
Presented by Jian-Shiun Tzeng 4/9/2009
Chapter 6. Hidden Markov and Maximum Entropy Models
Daniel Jurafsky and James H. Martin, 2008
Introduction
• Maximum Entropy (MaxEnt)
  – More widely known as multinomial logistic regression
• Begin from a non-sequential classifier
  – A probabilistic classifier
  – An exponential or log-linear classifier
  – Text classification
  – Sentiment analysis
    • Positive or negative opinion
  – Sentence boundary detection
Linear Regression
• x(j): a particular training instance
• y(j)_obs: the observed label of x(j) in the training set
• y(j)_pred: the value predicted by the linear regression model
• Cost: the sum-squared error, cost(W) = Σ_j (y(j)_pred − y(j)_obs)²
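The slide's sum-squared-error criterion can be sketched in a few lines. This is a minimal illustration, not the chapter's code; the toy data values are invented, and the least-squares solver stands in for whatever fitting method the slide assumed.

```python
import numpy as np

# Toy training set: each row is an instance x(j) with an intercept feature;
# y_obs holds the observed labels. All values are made up for illustration.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_obs = np.array([1.1, 1.9, 3.2, 3.9])

# Least-squares fit: choose weights W minimizing the sum-squared error.
W, _, _, _ = np.linalg.lstsq(X, y_obs, rcond=None)

y_pred = X @ W                                # y(j)_pred for every instance
sse = float(np.sum((y_pred - y_obs) ** 2))    # cost(W) = sum of squared residuals
print(round(sse, 3))
```

The fitted weights minimize exactly the quantity the slide names: the sum over training instances of the squared difference between predicted and observed values.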
Logistic Regression – simplest case of binary classification
• Consider whether x is in class (1, true) or not (0, false)
• The linear score w·f ∈ (−∞, ∞)
• The odds p/(1−p) ∈ [0, ∞)
• The log odds (logit) ln(p/(1−p)) ∈ (−∞, ∞), which can be equated with w·f
• The probability p ∈ [0, 1]
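The range mismatch above is resolved by the logistic (sigmoid) function, which squashes the unbounded score w·f into [0, 1]. A minimal sketch; the weight and feature values below are invented for illustration:

```python
import math

def sigmoid(z):
    # Maps a score in (-inf, inf) to a probability in [0, 1].
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weight vector w and binary feature vector f (values invented).
w = [1.5, -0.5, 2.0]
f = [1.0, 1.0, 0.0]

z = sum(wi * fi for wi, fi in zip(w, f))   # w . f, unbounded
p = sigmoid(z)                             # P(y = 1 | x), in [0, 1]

print(p > 0.5)  # classify as class 1 when P(y=1|x) > 0.5, i.e. when w.f > 0
```

Note the decision rule in the last line: thresholding the probability at 0.5 is equivalent to checking the sign of w·f.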
Logistic Regression – Classification
Advanced: Learning in logistic regression
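Logistic regression weights are typically learned by maximizing the conditional log-likelihood of the training data, for example with gradient ascent. The sketch below (toy data, learning rate, and iteration count all invented) uses the standard gradient of the log-likelihood, Σ_j (y(j)_obs − p(j)) f(j):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training data: (feature vector, label); separable on the first feature.
data = [([1.0, 1.0], 1), ([1.0, 0.0], 1), ([-1.0, 1.0], 0), ([-1.0, 0.0], 0)]

w = [0.0, 0.0]
eta = 0.5  # learning rate (arbitrary choice)

for _ in range(100):  # batch gradient ascent on the conditional log-likelihood
    grad = [0.0, 0.0]
    for f, y in data:
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
        for i in range(len(w)):
            grad[i] += (y - p) * f[i]  # d(log-likelihood)/d(w_i)
    w = [wi + eta * gi for wi, gi in zip(w, grad)]

# After training, the model separates the two classes.
p_pos = sigmoid(sum(wi * fi for wi, fi in zip(w, [1.0, 0.0])))
p_neg = sigmoid(sum(wi * fi for wi, fi in zip(w, [-1.0, 0.0])))
print(p_pos > 0.5, p_neg < 0.5)
```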
Maximum Entropy Modeling
• Input: x (a word to be tagged, or a document to be classified)
  – Features, e.g.:
    • Ends in -ing
    • Previous word is “the”
  – Each feature f_i has an associated weight w_i
  – A particular class c
  – Z is a normalizing factor, used to make the probabilities sum to 1
Maximum Entropy Modeling
C = {c1, c2, …, cC}
Normalization
fi: a feature that takes only the values 0 and 1 is also called an indicator function
In MaxEnt, instead of the notation fi, we will often write fi(c, x), meaning feature i for a particular class c and a given observation x
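With the normalizing factor Z written out over the class set C, the model takes the standard MaxEnt form (reconstructed here from the definitions on this slide; the original equation image did not survive extraction):

```latex
P(c \mid x)
  = \frac{1}{Z}\exp\Bigl(\sum_{i} w_i\, f_i(c,x)\Bigr)
  = \frac{\exp\bigl(\sum_{i} w_i\, f_i(c,x)\bigr)}
         {\sum_{c' \in C} \exp\bigl(\sum_{i} w_i\, f_i(c',x)\bigr)}
```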
Maximum Entropy Modeling
Assume C = {NN, VB}
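A worked sketch of the two-class case, using the indicator features from the earlier slide. The weight values here are hypothetical, chosen only to make the computation concrete; the original slide's numbers did not survive extraction.

```python
import math

CLASSES = ["NN", "VB"]

def features(c, x):
    # f_i(c, x): indicator features, each tied to one class.
    return [
        1.0 if c == "VB" and x["word"].endswith("ing") else 0.0,
        1.0 if c == "NN" and x["prev_word"] == "the" else 0.0,
    ]

WEIGHTS = [0.8, 1.2]  # hypothetical weight w_i per feature

def maxent_prob(c, x):
    # exp(sum_i w_i f_i(c', x)) for every class, then normalize by Z.
    scores = {ci: math.exp(sum(w * f for w, f in zip(WEIGHTS, features(ci, x))))
              for ci in CLASSES}
    z = sum(scores.values())   # normalizing factor Z
    return scores[c] / z

x = {"word": "running", "prev_word": "the"}
p_nn = maxent_prob("NN", x)
p_vb = maxent_prob("VB", x)
print(round(p_nn + p_vb, 6))  # the class probabilities sum to 1
```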
Learning Maximum Entropy Model
HMM vs. MEMM
• An MEMM can condition on any useful feature of the input observation; in an HMM this isn’t possible
[Figure: HMM and MEMM graphical models over the word and class sequences]
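The contrast can be written as decoding equations (standard forms from the chapter, with q_i for the class/state sequence and o_i for the observed words):

```latex
\text{HMM:}\quad
\hat{Q} = \operatorname*{argmax}_{Q} P(Q \mid O)
        = \operatorname*{argmax}_{Q} \prod_{i} P(o_i \mid q_i)\prod_{i} P(q_i \mid q_{i-1})
\qquad
\text{MEMM:}\quad
\hat{Q} = \operatorname*{argmax}_{Q} \prod_{i} P(q_i \mid q_{i-1}, o_i)
```

The HMM uses Bayes' rule and models the likelihood of the observation; the MEMM models the posterior directly, which is what lets it condition on arbitrary features of o_i.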
Conditional Random Fields (CRFs)
• CRFs (Lafferty, McCallum, et al., 2001) constitute another conditional model based on maximum entropy
• Like MEMMs, CRFs can accommodate many possibly correlated features of the observation
• However, CRFs are better able to trade off decisions at different sequence positions
• MEMMs were found to suffer from the label bias problem
Label Bias
• The problem appears when the MEMM contains states with different out-degrees (numbers of outgoing transitions)
• Because the probabilities of the transitions out of any given state must sum to 1, transitions from low-degree states receive higher probabilities than transitions from high-degree states
• In the extreme case, a transition from a state with out-degree 1 always gets probability 1, effectively ignoring the observation
• CRFs do not have this problem because they define a single maximum-entropy distribution over the whole label sequence
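The extreme case above can be made concrete with a toy numeric sketch. All states, observations, and scores here are invented; the point is only that local (per-state) normalization makes a degree-1 state ignore the observation entirely:

```python
# Toy illustration of label bias: in an MEMM, each state's outgoing
# transition probabilities are normalized locally, so a state with a
# single outgoing transition assigns it probability 1 no matter what
# the observation is.

def memm_transition_probs(state, observation, raw_scores):
    # raw_scores: unnormalized score per successor state, possibly
    # depending on the observation; normalize locally over successors.
    scores = raw_scores(state, observation)
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}

def raw_scores(state, observation):
    if state == "A":                      # out-degree-1 state
        return {"B": 2.0 if observation == "x" else 0.1}
    return {"B": 1.0, "C": 3.0}           # out-degree-2 state

# From the degree-1 state the observation is effectively ignored:
p1 = memm_transition_probs("A", "x", raw_scores)["B"]
p2 = memm_transition_probs("A", "y", raw_scores)["B"]
print(p1, p2)  # both 1.0, despite very different raw scores
```

A CRF avoids this by normalizing once, globally, over whole label sequences, so a strongly observation-dependent score at one position can still influence the overall decision.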