Genre and Style
EE517/CSE574 Reading Group Discussions

Brian Hutchinson
Jan 11, 2011

Slides: ssli.ee.washington.edu/courses/ee517/discussTalks/genre.pdf

The Form is the Substance: Classification of Genres in Text

Nigel Dewdney, Carol VanEss-Dykema and Richard MacMillan

Proc. Workshop on Human Language Technology and Knowledge Management, 2001

What is genre?

Here, genre describes how the “information is presented”:
- Formatting
- Style of language

Examples of genres:
- Newswire: “PHOENIX, Arizona (Reuters) - A troubled 22-year-old college dropout made his first court appearance...”
- Classified advertisements: “SEASONED FIREWOOD $200/cord 2 cords for $340. 253-709-****”
- Television news talk show transcript: “Thanks so much Larry and good evening all of you and thanks so much for joining us tonight...”

Topics can vary within a genre

Why is genre useful?

Genre can be used to filter documents (e.g. for web search).

Example: searching for houses for sale
- Only 2% of the docs returned for the keyword “house” are relevant
- Filtering those results to the advertisement genre improves that to 43%
- Both of these find about 80% of the relevant docs in the corpus
- Querying for “house” and (“sale” | “rent”) is not as good (19%)
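
The percentages above are precision figures (the share of returned docs that are relevant), and the 80% is recall (the share of relevant docs that are returned). A minimal sketch of both measures follows, using a made-up six-document corpus. The corpus, the relevance judgments, and the helper name `precision_recall` are illustrative assumptions, not data from the paper.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned doc IDs."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy corpus (hypothetical): doc ID -> (genre, text)
corpus = {
    1: ("ad", "house for sale, 3 bed"),
    2: ("news", "house votes on budget"),
    3: ("news", "fire destroys house"),
    4: ("ad", "house to rent near campus"),
    5: ("ad", "apartment for sale"),
    6: ("news", "white house statement"),
}
relevant = {1, 4}  # assumed ground truth: actual house listings

# Keyword query alone, then the same query restricted to the ad genre.
keyword_hits = {d for d, (_, text) in corpus.items() if "house" in text.split()}
genre_hits = {d for d in keyword_hits if corpus[d][0] == "ad"}

print(precision_recall(keyword_hits, relevant))
print(precision_recall(genre_hits, relevant))
```

In this toy case the genre filter removes the news stories that merely mention a house, so precision rises while recall is unchanged, which is the pattern the slide reports.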

Document features

Extract two sets of features per document for genre classification:
- Word features
- Presentation features

Word features

- “Traditional” bag-of-words word features
  - The i-th entry contains a weighted count of word type i
- Feature selection using the information gain criterion
  - Warning: their feature selection looks at test data
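
Information gain scores a word by how much knowing whether it is present reduces the entropy of the genre label. The sketch below ranks a vocabulary that way using training documents only, which is one way to avoid the test-data leak noted above. The function names, the presence/absence split, and the toy documents are assumptions made for illustration; the paper's exact weighting is not given on the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(docs, labels, word):
    """Entropy reduction from splitting docs on presence of `word`."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part) for part in (with_word, without) if part)
    return entropy(labels) - cond

def select_features(train_docs, train_labels, k):
    """Rank the vocabulary by information gain using TRAINING docs only."""
    vocab = sorted(set().union(*train_docs))
    ranked = sorted(vocab,
                    key=lambda w: information_gain(train_docs, train_labels, w),
                    reverse=True)
    return ranked[:k]

# Hypothetical training fold: each doc is the set of word types it contains.
train_docs = [{"sale", "house", "call"}, {"sale", "cord", "call"},
              {"court", "reuters", "house"}, {"reuters", "police"}]
train_labels = ["ad", "ad", "news", "news"]
top = select_features(train_docs, train_labels, 3)
print(top)
```

Here “house” appears in both genres and so carries no information gain, while words found in only one genre rank highest.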

Presentation features

89 new features to aid genre classification, including:
- Tense features: transition counts between tenses
- Frequencies of keyword sets (e.g. days of the week)
- Mean and variance of sentence length, complexity metrics
- Punctuation, emoticons, case
- Whitespace, indentation, line-spacing

Features are normalized into the range [0, 1]
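
To make the idea concrete, here is a rough sketch that computes a handful of presentation cues and rescales them to [0, 1]. The four features are a hypothetical subset chosen for brevity, not the paper's 89, and min-max scaling is an assumed normalization since the slide does not say which method was used.

```python
import re
import statistics

def presentation_features(text):
    """A few illustrative presentation cues (hypothetical subset)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    letters = [c for c in text if c.isalpha()]
    return {
        "mean_sentence_len": statistics.mean(lengths) if lengths else 0.0,
        "var_sentence_len": statistics.pvariance(lengths) if lengths else 0.0,
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "punct_count": sum(c in ".,;:!?$/" for c in text),
    }

def min_max_normalize(rows):
    """Scale each feature to [0, 1] across a list of feature dicts."""
    keys = list(rows[0].keys())
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [{k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
             for k in keys}
            for r in rows]

docs = ["SEASONED FIREWOOD $200/cord. 2 cords for $340.",
        "A troubled college dropout made his first court appearance. He said little."]
normalized = min_max_normalize([presentation_features(d) for d in docs])
print(normalized)
```

Even on two documents, the uppercase ratio separates the classified ad from the newswire sentence, with no reference to topic words.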

Classifiers

Three classifiers were compared for this task:
1. Naive Bayes
   - Choose most probable genre using Bayes rule
   - Assume independent features
2. C4.5 decision tree
   - Build, prune decision tree: leaves are class labels
3. Support vector machine
   - Maximum margin hyperplane classifier, use kernel trick
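
As a worked form of the Naive Bayes bullets, the decision rule can be written as below. The notation is assumed for illustration: g ranges over genres, x is a document's feature vector, and x_j is its j-th feature. The product arises because features are taken to be independent given the genre.

$$\hat{g} = \arg\max_{g} P(g \mid x) = \arg\max_{g} P(g) \prod_{j} P(x_j \mid g)$$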

Experiments

Task: classify documents into one of seven genres
- Use 10-fold cross validation
- Select 323 word features using information gain
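
A minimal sketch of 10-fold cross validation follows: documents are shuffled once, split into ten folds, and each fold serves as the test set exactly once. The function name `k_fold_indices`, the fixed seed, and the corpus size of 25 are assumptions for illustration.

```python
import random

def k_fold_indices(n_docs, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each doc is tested exactly once."""
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    # Any data-dependent step (such as word feature selection) would be
    # rerun here on `train` only, so the held-out fold stays unseen.
    pass
print(len(splits))
```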

Results

- Presentation features are useful
- In all cases, combined feature sets do best
- Naive Bayes performs poorly with presentation features alone
  - Feature dependence, among other things

Take Aways

- It can be useful to detect the genre of a document
  - E.g. to improve retrieval performance
- For genre detection, how information is presented is just as important as the lexical content itself

Extracting Social Meaning: Identifying Interactional Style in Spoken Conversation

Dan Jurafsky, Rajesh Ranganath and Dan McFarland

Proc. NAACL/HLT, 2009

What is Interactional Style?

Interactional style describes how the speaker interacts. Is the speaker being...
- Friendly
- Flirtatious
- Awkward
- Funny
- Assertive

This paper focuses on friendly, flirtatious, and awkward styles

Why is interactional style useful?

Useful for...
- High level analysis of conversations
  - E.g. detecting interactional problems
- Making more natural dialogue agents
  - Respond in an appropriate conversational style

The SpeedDate Corpus

This research uses the SpeedDate Corpus.
- Graduate students at a private American university
- Collected in 2005
- 1,100 4-minute dates
- Both parties wear microphones on shoulder sash
- Pre- and post-date surveys collected
  - Including date perceptions
- Professionally transcribed
  - Turn-segmented, disfluencies marked

Features

What are the cues for conversational style?

Four sets of features considered:
1. Prosodic
2. Lexical
3. Dialogue act and adjacency pair
4. Disfluency

Prosodic features

Pitch features (F0):
- F0 (min|max|mean) (standard deviation)
- Pitch range

Amplitude features (root mean square):
- RMS (min|max|mean) (standard deviation)

Duration features:
- Average turn duration
- Total speaking time
- Rate of speech (words per second)
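
The prosodic features above can be sketched as simple summaries over a speaker's turns. This is only one plausible reading: it assumes each statistic is computed per turn and then averaged across turns, with the standard deviation taken across turns as well, and that pitch range is the gap between the averaged maximum and minimum. The input layout (keys `f0`, `rms`, `duration`, `words`) and the sample values are hypothetical.

```python
import statistics

def prosodic_features(turns):
    """Summarize per-turn measurements for one speaker.

    `turns` is a list of dicts with hypothetical keys:
    f0 (Hz samples), rms (amplitude samples), duration (s), words (count).
    """
    feats = {}
    for name in ("f0", "rms"):
        for stat, fn in (("min", min), ("max", max), ("mean", statistics.mean)):
            per_turn = [fn(t[name]) for t in turns]
            feats[f"{name}_{stat}"] = statistics.mean(per_turn)
            feats[f"{name}_{stat}_sd"] = statistics.pstdev(per_turn)
    # Assumed definition: averaged per-turn max minus averaged per-turn min.
    feats["pitch_range"] = feats["f0_max"] - feats["f0_min"]
    total_time = sum(t["duration"] for t in turns)
    feats["avg_turn_duration"] = total_time / len(turns)
    feats["total_speaking_time"] = total_time
    feats["rate_of_speech"] = sum(t["words"] for t in turns) / total_time
    return feats

turns = [
    {"f0": [100.0, 120.0, 140.0], "rms": [0.2, 0.4], "duration": 2.0, "words": 6},
    {"f0": [110.0, 130.0], "rms": [0.1, 0.3], "duration": 4.0, "words": 9},
]
feats = prosodic_features(turns)
print(feats["pitch_range"], feats["rate_of_speech"])
```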

Lexical features

Lexical features are counts of words in relevant word classes.
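
A minimal sketch of word-class counting is shown here. The slide's table of word classes did not survive extraction, so the three classes and their member words below are hypothetical stand-ins, as is the tokenizer.

```python
import re

# Hypothetical word classes; the slide's actual class table is not available.
WORD_CLASSES = {
    "you": {"you", "your", "yours"},
    "i": {"i", "me", "my", "mine"},
    "negemo": {"hate", "bad", "awful"},
}

def lexical_features(utterances):
    """Count tokens falling into each word class for one speaker."""
    counts = {name: 0 for name in WORD_CLASSES}
    for utt in utterances:
        for token in re.findall(r"[a-z']+", utt.lower()):
            for name, words in WORD_CLASSES.items():
                if token in words:
                    counts[name] += 1
    return counts

feats = lexical_features(["So where are you from?",
                          "I hate my commute, how is yours?"])
print(feats)
```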

Dialog Act and Adjacency Pair features

Example collaborative completion:

FEMALE: The driving range.
MALE: And the tennis court, too.

Disfluency features

Disfluency features include...
- Total number of filled pauses (e.g. “uh”, “um”)
- Total number of disfluent restarts
  - E.g. “Uh, I - there’s a group of us that came in ...”
- Number of turns that contain overlapping speech
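
These three counts can be sketched directly from a turn-segmented transcript. The sketch assumes that a restart is transcribed as a spaced dash (" - "), as in the example above, and that each turn comes with a boolean overlap flag. Both conventions are assumptions about the annotation rather than the corpus's documented format.

```python
import re

FILLED_PAUSES = {"uh", "um"}  # the two examples named on the slide

def disfluency_features(turns):
    """Count disfluency cues over one speaker's turns.

    Each turn is a (text, overlaps) pair. Marking a restart as " - " and
    passing overlap as a boolean are assumed conventions for this sketch.
    """
    filled = restarts = overlapping = 0
    for text, overlaps in turns:
        tokens = re.findall(r"[a-z']+", text.lower())
        filled += sum(tok in FILLED_PAUSES for tok in tokens)
        restarts += text.count(" - ")
        overlapping += bool(overlaps)
    return {"filled_pauses": filled, "restarts": restarts, "overlap_turns": overlapping}

feats = disfluency_features([
    ("Uh, I - there's a group of us that came in ...", False),
    ("Um, yeah, uh, same here.", True),
])
print(feats)
```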

Classifier

For each interactional style a binary classifier is trained.

Logistic regression (x = features, θ = parameters):

$$P(y \mid x; \theta) = \frac{1}{1 + e^{-\theta^T x}}$$

Parameters learned via

$$\theta^* = \arg\max_{\theta} \sum_i \log p(y^i \mid x^i; \theta) - \alpha \|\theta\|_1$$

- Regularization parameter α fit on tuning set
- ℓ1 regularization encourages sparsity in the weights
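
The sparsity effect can be seen in a small sketch that maximizes the same penalized log-likelihood. The optimizer here is proximal gradient ascent with soft-thresholding, chosen for brevity; the slide does not say which optimizer the authors used. The toy data, step size, epoch count, and the fixed α are likewise assumptions (in the paper α is fit on a tuning set).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_l1_logreg(X, y, alpha, lr=0.1, epochs=500):
    """Maximize sum_i log p(y_i | x_i; theta) - alpha * ||theta||_1."""
    theta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j, v in enumerate(xi):
                grad[j] += err * v
        # Gradient ascent step on the log-likelihood, then L1 shrinkage.
        theta = [soft_threshold(t + lr * g, lr * alpha) for t, g in zip(theta, grad)]
    return theta

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 0.1], [0.8, -0.1], [-1.0, 0.1], [-0.9, -0.1]]
y = [1, 1, 0, 0]
theta = fit_l1_logreg(X, y, alpha=0.5)
print(theta)
```

The weight on the noise feature is driven to exactly zero while the informative feature keeps a positive weight, which is what makes the learned weights easy to inspect afterwards.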

Experiments

Reference labels are obtained by
1. Mean normalizing the (human labeled) interactional style ratings
2. Marking the top 10% for a style as positive and the bottom 10% as negative

Features are obtained by
1. Extracting features using one conversational side
2. Mean and variance normalizing the features
3. Removing features with correlation greater than 0.7

Training/testing using 5-fold cross validation
- 3/5 train, 1/5 tuning, 1/5 test
- Randomized these splits, repeated 25 times
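
Two of these preparation steps are sketched below: picking the extreme 10% of ratings as labels, and dropping correlated features. The sketch starts from ratings that are already mean normalized. Comparing the absolute correlation against 0.7 and keeping features greedily in their given order are assumptions, since the slide does not specify either detail, and all names and values are illustrative.

```python
import statistics

def extreme_labels(ratings, frac=0.10):
    """Map index -> 1 for the top `frac` of ratings, 0 for the bottom `frac`."""
    order = sorted(range(len(ratings)), key=lambda i: ratings[i])
    k = max(1, int(len(ratings) * frac))
    labels = {i: 0 for i in order[:k]}
    labels.update({i: 1 for i in order[-k:]})
    return labels

def zscore(column):
    """Mean and variance normalize one feature column."""
    mu, sd = statistics.mean(column), statistics.pstdev(column)
    return [(v - mu) / sd if sd else 0.0 for v in column]

def correlation(a, b):
    """Pearson correlation as the mean product of z-scores."""
    return sum(p * q for p, q in zip(zscore(a), zscore(b))) / len(a)

def drop_correlated(columns, threshold=0.7):
    """Greedily keep features whose |correlation| with each kept one is <= threshold."""
    kept = []
    for name, col in columns.items():
        if all(abs(correlation(col, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

ratings = [float(r) for r in range(1, 21)]  # hypothetical normalized ratings
labels = extreme_labels(ratings)
columns = {"f0_mean": [1.0, 2.0, 3.0, 4.0],
           "f0_max": [2.0, 4.0, 6.0, 8.5],
           "rate": [1.0, -1.0, -1.0, 1.0]}
kept = drop_correlated(columns)
print(labels, kept)
```

Conversations in the middle 80% of ratings receive no label and are left out, so each binary classifier only ever sees clear positive and negative examples.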

Results

“Speaker” contains features only from the speaker’s conversational side. “+other” adds features from the other speaker, too. Chance is 50%.
- Men are easier to predict
- Easiest to detect friendliness

Now let’s take a look at which features are informative...

Feature analysis: men

Friendly men use “you”, do collaborative completions and laugh, but don’t backchannel or use appreciations. They have shorter turns and are quieter.

Flirty men ask questions, use “you”, laugh and use more sexual and negative emotion words. They don’t backchannel or use appreciations. They speak quietly, with higher / more variable pitch.

Feature analysis: women

Friendly women have more collaborative completions, repair questions, laughter and appreciations. They use more words, particularly “I”, are talkative but less likely to swear.

Flirty women speak faster, louder, and with higher and more variable pitch. They use more words, swear more, and don’t ask as many questions.

Feature analysis: awkward men

Awkward men are more disfluent, speak less, overlap less, use fewer collaborative completions, and use fewer instances of past-tense words and “you.”

Take Aways

- Interactional style is an informative aspect of speech
- There are various complementary cues to interactional style
  - E.g. prosodic, lexical, dialog act / adjacency pair, and disfluency
- Even cues that are easy to automatically extract can yield good performance on interactional style detection

Thank You

Discussion...
