Genre and Style: EE517/CSE574 Reading Group Discussions
Brian Hutchinson, Jan 11, 2011
The Form is the Substance: Classification of Genres in Text
Nigel Dewdney, Carol VanEss-Dykema and Richard MacMillan
Proc. Workshop on Human Language Technology and Knowledge Management, 2001
What is genre?
Here, genre describes how the “information is presented”:
- Formatting
- Style of language
Examples of genres:
- Newswire: “PHOENIX, Arizona (Reuters) - A troubled 22-year-old college dropout made his first court appearance...”
- Classified advertisements: “SEASONED FIREWOOD $200/cord 2 cords for $340. 253-709-****”
- Television news talk show transcript: “Thanks so much Larry and good evening all of you and thanks so much for joining us tonight...”
Topics can vary within a genre.
Why is genre useful?
Genre can be used to filter documents (e.g. for web search).
Example: searching for houses for sale
- Only 2% of docs returned using keyword “house” are relevant
- Filtering by genre (ads) improves that to 43%
- Both of these find about 80% of the relevant docs in the corpus
- Using “house” and (“sale” | “rent”) is not as good (19%)
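The figures above are precision (share of returned docs that are relevant) and recall (share of relevant docs that are found). A minimal sketch of how both are computed, assuming documents are identified by integer ids; the ids and set sizes below are toy values chosen to echo the slide, not the paper's data.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned document ids."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Toy corpus (hypothetical ids): five relevant "house for sale" ads.
relevant = {1, 2, 3, 4, 5}

# Keyword-only search: finds 4 of the 5 ads, buried among 196 other docs.
keyword_hits = {1, 2, 3, 4} | set(range(100, 296))

# Same query restricted to the ads genre: same 4 ads, far fewer other docs.
genre_filtered = {1, 2, 3, 4, 100, 101, 102, 103, 104}

print(precision_recall(keyword_hits, relevant))
print(precision_recall(genre_filtered, relevant))
```

In this toy setup the genre filter leaves recall unchanged and raises precision, which is the pattern the slide reports.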
Document features
Extract two sets of features per document for genre classification:
- Word features
- Presentation features
Word features
“Traditional” bag-of-words word features
- The i-th entry contains a weighted count of word type i
Feature selection using the information gain criterion
- Warning: their feature selection looks at test data
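Information gain scores a word by how much knowing whether it occurs reduces uncertainty about the genre. A rough sketch, assuming each document is reduced to a set of word types (binary presence) even though the slide describes weighted counts; the paper's exact formulation is not given here, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, word):
    """Entropy reduction from splitting docs on presence of `word`."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_word, without) if part)
    return entropy(labels) - conditional


# Hypothetical toy documents, each a set of word types.
docs = [{"sale", "house"}, {"sale", "car"},
        {"reuters", "court"}, {"reuters", "house"}]
labels = ["ad", "ad", "news", "news"]

print(information_gain(docs, labels, "sale"))   # separates the genres
print(information_gain(docs, labels, "house"))  # occurs in both genres
```

To avoid the issue flagged in the warning, the scores would be computed from training documents only.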
Presentation features
89 new features to aid genre classification, including:
- Tense features: transition counts between tenses
- Frequencies of keyword sets (e.g. days of the week)
- Mean and variance of sentence length, complexity metrics
- Punctuation, emoticons, case
- Whitespace, indentation, line-spacing
Features are normalized into the range [0, 1]
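The slide does not say how features are mapped into [0, 1]. One common choice is min-max scaling per feature, sketched below as an assumption rather than the authors' method; the feature values are made up.

```python
def min_max_normalize(rows):
    """Scale each column of `rows` (list of equal-length lists) into [0, 1].

    Assumption: min-max scaling. Constant columns are mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical presentation features: [mean sentence length, emoticon count]
rows = [[10.0, 0.0], [20.0, 5.0], [30.0, 5.0]]
print(min_max_normalize(rows))
```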
Classifiers
Three classifiers were compared for this task:
1. Naive Bayes
   - Choose most probable genre using Bayes rule
   - Assume independent features
2. C4.5 decision tree
   - Build, prune decision tree: leaves are class labels
3. Support vector machine
   - Maximum margin hyperplane classifier, use kernel trick
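To make the first classifier concrete, here is a small sketch of Naive Bayes over word tokens. It assumes a multinomial model with add-one smoothing, which the slide does not specify; the function names and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, smoothing=1.0):
    """Collect per-genre document and word counts from tokenized docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, smoothing


def predict(model, tokens):
    """Pick the genre maximizing log P(genre) + sum of log P(word | genre)."""
    class_counts, word_counts, vocab, smoothing = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + smoothing * len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + smoothing) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [["firewood", "cord", "sale"], ["house", "sale", "rent"],
        ["court", "appearance", "reuters"], ["reuters", "court", "arizona"]]
labels = ["ads", "ads", "news", "news"]
model = train_naive_bayes(docs, labels)
print(predict(model, ["sale", "cord"]))
print(predict(model, ["court", "today"]))
```

The sum of per-word log probabilities is where the independence assumption enters: each word contributes to the score separately.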
Experiments
Task: classify documents into one of seven genres
- Use 10-fold cross validation
- Select 323 word features using information gain
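A sketch of how 10-fold cross validation partitions the data, assuming documents are addressed by index and folds are assigned round-robin (the paper's fold assignment is not described on the slide). The helper name is hypothetical.

```python
def k_fold_splits(n_docs, k=10):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_docs))
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out
                 for i in fold]
        yield train, test


for train, test in k_fold_splits(25, k=10):
    # Feature selection and classifier training would use `train` only;
    # `test` is touched once, for scoring.
    print(len(train), len(test))
```

Running feature selection inside the loop, on the training indices only, is one way to avoid letting test data influence the selected word features.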
Results
- Presentation features are useful
- In all cases, combined feature sets do best
- Naive Bayes performs poorly with presentation features alone
  - Due to feature dependence, among other things
Take Aways
It can be useful to detect the genre of a document
- E.g. to improve retrieval performance
For genre detection, how information is presented is just as important as the lexical content itself
Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation
Dan Jurafsky, Rajesh Ranganath and Dan McFarland
Proc. NAACL/HLT, 2009
What is Interactional Style?
Interactional style describes how the speaker interacts
Is the speaker being...
- Friendly
- Flirtatious
- Awkward
- Funny
- Assertive
This paper focuses on friendly, flirtatious, and awkward styles
Why is interactional style useful?
Useful for...
- High level analysis of conversations
  - E.g. detecting interactional problems
- Making more natural dialogue agents
  - Respond in an appropriate conversational style
The SpeedDate Corpus
This research uses the SpeedDate Corpus.
- Graduate students at a private American university
- Collected in 2005
- 1,100 4-minute dates
- Both parties wear microphones on shoulder sash
- Pre- and post-date surveys collected
  - Including date perceptions
- Professionally transcribed
  - Turn-segmented, disfluencies marked
Features
What are the cues for conversational style?
Four sets of features considered:
1. Prosodic
2. Lexical
3. Dialogue act and adjacency pair
4. Disfluency
Prosodic features
Pitch features (F0):
- F0 (min|max|mean) (standard deviation)
- Pitch range
Amplitude features (root mean square):
- RMS (min|max|mean) (standard deviation)
Duration features:
- Average turn duration
- Total speaking time
- Rate of speech (words per second)
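The duration features are the simplest to compute from a turn-segmented transcript. A minimal sketch, assuming each turn for one speaker is a hypothetical `(start_sec, end_sec, text)` tuple; the corpus's actual transcript format is not shown on the slides.

```python
def duration_features(turns):
    """Compute duration features from one speaker's turns.

    `turns` is a list of (start_sec, end_sec, text) tuples (assumed format).
    """
    durations = [end - start for start, end, _ in turns]
    total_time = sum(durations)
    total_words = sum(len(text.split()) for _, _, text in turns)
    return {
        "avg_turn_duration": total_time / len(turns),
        "total_speaking_time": total_time,
        "rate_of_speech": total_words / total_time,  # words per second
    }


# Made-up turns for illustration.
turns = [
    (0.0, 2.0, "so where are you from"),
    (5.0, 8.0, "oh I love the driving range too"),
]
print(duration_features(turns))
```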
Lexical features
Lexical features are counts of words in relevant word classes.
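A sketch of word-class counting. The class names and tiny lexicons below are hypothetical stand-ins for illustration; they are not the lexicons used in the paper.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
WORD_CLASSES = {
    "you": {"you", "your", "yours"},
    "i": {"i", "me", "my", "mine"},
    "negative_emotion": {"bad", "hate", "awful"},
}


def lexical_features(text, word_classes=WORD_CLASSES):
    """Count how many tokens of `text` fall into each word class."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for name, lexicon in word_classes.items():
            if tok in lexicon:
                counts[name] += 1
    return {name: counts[name] for name in word_classes}


print(lexical_features("I hate my commute, do you?"))
```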
Dialog Act and Adjacency Pair features
Example collaborative completion:
  FEMALE: The driving range.
  MALE: And the tennis court, too.
Disfluency features
Disfluency features include...
- Total number of filled pauses (e.g. “uh”, “um”)
- Total number of disfluent restarts
  - E.g. “Uh, I - there’s a group of us that came in ...”
- Number of turns that contain overlapping speech
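The first two disfluency counts can be approximated directly from transcript text. A rough sketch, assuming restarts are transcribed as a spaced hyphen (as in the example above) and that filled pauses are limited to “uh” and “um”; the corpus's real annotation conventions may differ.

```python
import re

FILLED_PAUSES = {"uh", "um"}  # assumed inventory, taken from the examples


def disfluency_features(turns):
    """Count filled pauses and hyphen-marked restarts over a list of turns."""
    filled = 0
    restarts = 0
    for text in turns:
        tokens = re.findall(r"[a-z']+", text.lower())
        filled += sum(tok in FILLED_PAUSES for tok in tokens)
        # Assumption: "word - word" marks a disfluent restart.
        restarts += len(re.findall(r"\w\s+-\s+\w", text))
    return {"filled_pauses": filled, "restarts": restarts}


turns = [
    "Uh, I - there's a group of us that came in",
    "Um, yeah, uh, that sounds fun",
]
print(disfluency_features(turns))
```

Counting overlapping turns would need timing information for both speakers, so it is left out of this sketch.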
Classifier
For each interactional style a binary classifier is trained.
Logistic regression (x = features, θ = parameters):

\[ P(y \mid x; \theta) = \frac{1}{1 + e^{-\theta^{T} x}} \]

Parameters learned via

\[ \theta^{*} = \arg\max_{\theta} \sum_{i} \log P(y^{i} \mid x^{i}; \theta) - \alpha \lVert \theta \rVert_{1} \]

- Regularization parameter α fit on tuning set
- ℓ1 regularization encourages sparsity in the weights
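A small sketch of the penalized objective and one way to maximize it. Two assumptions are swapped in here: labels are coded as y in {0, 1}, and the optimizer is proximal gradient ascent with soft-thresholding, since the slide does not say how the parameters were actually fit. The toy data is made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))


def objective(theta, xs, ys, alpha):
    """Log-likelihood minus alpha * L1 norm, for labels y in {0, 1}."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(dot(theta, x))
        ll += math.log(p if y == 1 else 1.0 - p)
    return ll - alpha * sum(abs(t) for t in theta)


def soft_threshold(value, amount):
    """Shrink `value` toward zero by `amount`; small values become 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit(xs, ys, alpha, step=0.1, iters=500):
    """Proximal gradient ascent (an assumed optimizer, not the paper's)."""
    theta = [0.0] * len(xs[0])
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = y - sigmoid(dot(theta, x))
            for j, xj in enumerate(x):
                grad[j] += err * xj
        theta = [soft_threshold(t + step * g, step * alpha)
                 for t, g in zip(theta, grad)]
    return theta


# Toy data: feature 0 tracks the label, feature 1 is uninformative.
xs = [[1.0, 0.1], [2.0, -0.1], [-1.0, 0.1], [-2.0, -0.1]]
ys = [1, 1, 0, 0]
theta = fit(xs, ys, alpha=0.5)
print(theta)
```

The soft-thresholding step is what sets the weight of the uninformative feature to exactly zero, illustrating the sparsity effect of the ℓ1 penalty.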
Experiments
Reference labels are obtained by
1. Mean-normalizing the (human labeled) interactional style ratings
2. Marking the top 10% for a style as positive and the bottom 10% as negative
Features are obtained by
1. Extracting features using one conversational side
2. Mean- and variance-normalizing the features
3. Removing features with correlation greater than 0.7
Training/testing using 5-fold cross validation
- 3/5 train, 1/5 tuning, 1/5 test
- Randomized these splits, repeated 25 times
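The labeling and correlation-filtering steps can be sketched as follows. Several details are assumptions: ratings are taken as already normalized, correlation is Pearson's r compared by absolute value, and correlated features are dropped greedily in input order. The slides do not specify any of these, and the feature names are hypothetical.

```python
import math


def label_extremes(scores, fraction=0.10):
    """Map index -> 1 for the top `fraction` of scores, 0 for the bottom."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = int(len(scores) * fraction)
    labels = {i: 0 for i in order[:k]}
    labels.update({i: 1 for i in order[len(order) - k:]})
    return labels  # the middle 80% is left unlabeled


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.7):
    """Greedily keep features whose |r| with every kept feature <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


ratings = list(range(20))  # made-up style ratings for 20 speakers
columns = {  # hypothetical feature columns
    "f0_mean": [1, 2, 3, 4],
    "f0_max": [2, 4, 6, 8.1],
    "laugh": [1, -1, 1, -1],
}
print(label_extremes(ratings))
print(drop_correlated(columns))
```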
Results
“Speaker” uses features only from the speaker’s conversational side. “+other” adds features from the other speaker, too. Chance is 50%.
- Men are easier to predict
- Easiest to detect friendliness
Now let’s take a look at which features are informative...
Feature analysis: men
Friendly men use “you”, do collaborative completions and laugh, but don’t backchannel or use appreciations. They have shorter turns and are quieter.
Flirty men ask questions, use “you”, laugh and use more sexual and negative emotion words. They don’t backchannel or use appreciations. They speak quietly, with higher / more variable pitch.
Feature analysis: women
Friendly women have more collaborative completions, repair questions, laughter and appreciations. They use more words, particularly “I”, are talkative but less likely to swear.
Flirty women speak faster, louder, and with higher and more variable pitch. They use more words, swear more, and don’t ask as many questions.
Feature analysis: awkward men
Awkward men are more disfluent, speak less, overlap less, use fewer collaborative completions, and use fewer instances of past-tense words and “you.”
Take Aways
Interactional style is an informative aspect of speech
There are various complementary cues to interactional style
- E.g. prosodic, lexical, dialog act / adjacency pair, and disfluency
Even cues that are easy to automatically extract can yield good performance on interactional style detection
Thank You
Discussion...