141028 parlor slides

Image: © flickr/srqpix CC BY 2.0 GENDER/GENRE: GENDER DIFFERENCES IN PROFESSIONAL WRITING Brian N. Larson 29 October 2014 Current Research in Writing Studies

Upload: brian-larson

Post on 18-Jul-2016


DESCRIPTION

Partial report of results of empirical study "Gender/Genre: Gender differences in disciplinary language." Study used methods from statistics and natural language processing to examine lexical and quasi-syntactic features of writing in a professional genre.

TRANSCRIPT

Image: © flickr/srqpix CC BY 2.0

GENDER/GENRE: GENDER DIFFERENCES IN PROFESSIONAL WRITING

Brian N. Larson 29 October 2014

Current Research in Writing Studies

www.Rhetoricked.com @Rhetoricked

Housekeeping

•  www.Rhetoricked.com (these slides + some additional)

•  Communicate with me:
   –  @Rhetoricked
   –  [email protected]

•  Research supported by:
   –  Graduate Research Partnership Program fellowship (U of M CLA), 2012
   –  James I. Brown Summer Research Fellowship, 2014

Gender, sex, and research constructs

•  When I talk about my own data, I’ll refer to
   –  Gender F authors/writers
   –  Gender M authors/writers

•  These categories may or may not correspond to other researchers’
   –  {woman, female, feminine}
   –  {man, male, masculine}

•  That’s the subject of another talk (or for Q&A)

Many researchers have asked

•  Do men and women communicate differently?

•  Much work inspired by Robin Lakoff (1975)

•  Scholarly and popular works by Deborah Tannen (e.g. 1990 [2001]) and others

•  Much of this research in oral/face-to-face communication

Writing: Process and product

•  In writing studies, we can (roughly) divide process and product
   –  Do men and women produce writing using different processes?
   –  Is the writing they produce distinguishable based on author gender?

Previous studies: Process research

•  Focus on interpersonal communications in mixed-gender contexts
   –  Lay, 1989 (Schuster); Rehling, 1996; Raign & Sims, 1993; Tong & Klecun, 2004; Wolfe & Alexander, 2005; Brown & Burnett, 2006; Wolfe & Powell, 2006, 2009.

Previous studies: Product research

•  In technical and professional communication
   –  Sterkel, 1988 (20 stylistic characteristics)
   –  Smeltzer & Werbel, 1986 (16 stylistic and evaluative measures)
   –  Tebeaux, 1990 (quality of responses)
   –  Allen, 1994 (markers of authoritativeness)

•  Manual methods, small samples

Enter computational methods

•  Natural language processing (NLP)

•  Allows processing of large quantities of text data

•  Studies that attracted my attention
   –  Koppel, Argamon & Shimoni, 2002 (machine-learning algorithms)
   –  Argamon et al., 2003 (statistical analysis)
   –  I’ll focus on Argamon et al. in this talk

Argamon et al. 2003

•  Used 500 published texts from the British National Corpus (BNC)

•  Mean 34,000 words (‘tokens’) per text

•  Statistical analysis showed correspondence to Biber’s (1995) “informational/involved” dimension

Gender in computer-mediated communication (CMC)

•  CMC popular for NLP studies
   –  Data are readily available
   –  Data are voluminous

•  Examples
   –  Herring & Paolillo, 2006 (blog posts, stat analysis)
   –  Yan & Yan, 2006 (blog posts, MLA analysis)
   –  Argamon et al., 2007 (blog posts, MLA analysis)
   –  Rao et al., 2010 (Twitter, MLA analysis)
   –  Burger et al., 2011 (Twitter, MLA analysis)

Rationale: Why is the question important?

•  Lend support to one or more theories of gender
   –  ‘Two cultures’ (Maltz & Borker, 1982)
   –  ‘Standpoint’ (Barker & Zifcak, 1999)
   –  ‘Performative’ (Butler 1993, 1999, 2004)
   –  Others

•  Sorting out methodological problems, particularly use of gender as a variable

Study design goals

•  Research questions
   –  Did Gender F and Gender M writers in a disciplinary genre in which they are being trained use lexical and quasi-syntactic stylistic features with relative frequencies that varied with their genders?
   –  If so, did the differences appear in interpretable patterns?

•  Examine a corpus of texts
   –  All of the same genre
   –  Where we can be confident of single authorship
   –  Where author gender is self-identified

Data collection

•  Major writing project at end of first year of law school
   –  Students address hypothetical problem (writing in same ‘genre’)
   –  Students not allowed to collaborate
   –  Plagiarism difficult (but still possible)

•  Students self-identified gender*

•  193 texts (mean word tokens = 3764)

*This study IRB-approved (UMN Study #1202E10685)

Text genre: Memorandum regarding motion to dismiss

•  Written to hypothetical court

•  Supporting or opposing a motion before the court

•  High-level organization is formulaic


Memorandum Sections

•  Caption**

•  Introduction/summary*

•  Facts

•  Legal standard of review*

•  Argument

•  Conclusion

•  Signature block**

* Not always present.
** I did not analyze (content is highly formulaic)

Feature (“variable”) selection

•  For now, those of Argamon et al. 2003

•  Relative frequencies of
   –  429 “function words” (Argamon used 405)
   –  45 parts of speech from the Penn Treebank tagset (Argamon used 76 BNC POS tags)
   –  100 common part-of-speech bigrams
   –  500 common POS trigrams

‘Part-of-speech’ tags? ‘Bigrams & trigrams’?

•  First, ‘tokenize’ each sentence (automated):
   –  ‘My aunt’s pen is on the table.’
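The tokenizing step can be sketched in a few lines of Python. This is only an illustration under assumptions: the regular expression and the function name `tokenize` are hypothetical, clitics such as 's and n't are split off in the Penn Treebank manner, and the slides do not say which tokenizer the study actually used.

```python
import re

# Hypothetical tokenizer sketch: splits clitics Penn Treebank-style and
# treats every punctuation mark as its own token.
TOKEN_RE = re.compile(
    r"\w+(?=n['’]t\b)"              # stem before n't, e.g. "do" in "don't"
    r"|n['’]t\b"                    # the n't clitic itself
    r"|['’](?:s|re|ve|ll|d|m)\b"    # other clitics: 's, 're, 've, 'll, 'd, 'm
    r"|\w+"                         # ordinary words and numbers
    r"|[^\w\s]",                    # any single punctuation mark
    re.IGNORECASE,
)


def tokenize(sentence):
    """Return the list of tokens found in one sentence."""
    return TOKEN_RE.findall(sentence)


print(tokenize("My aunt's pen is on the table."))
```

On the example sentence this yields nine tokens, with the possessive 's and the final period each counted separately.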

POS tags

•  Purple words are function words

•  Tag the parts of speech (automated)

•  Then calculate relative frequency of function words and POS tags (automated)
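A rough sketch of the relative-frequency calculation for function words and POS tags follows. Everything here is illustrative: the Penn Treebank tags are assigned by hand rather than by a tagger, the four-word `FUNCTION_WORDS` set is a hypothetical stand-in for the 429-word list, and counting punctuation toward the token total is an assumption.

```python
from collections import Counter

# Hand-assigned Penn Treebank tags for the example sentence (not tagger output).
TAGGED = [
    ("My", "PRP$"), ("aunt", "NN"), ("'s", "POS"), ("pen", "NN"),
    ("is", "VBZ"), ("on", "IN"), ("the", "DT"), ("table", "NN"), (".", "."),
]

# Hypothetical mini list; the study used 429 function words.
FUNCTION_WORDS = {"my", "is", "on", "the"}


def relative_frequencies(tagged, function_words):
    """Return (function-word, POS-tag) relative frequencies over all tokens."""
    total = len(tagged)
    word_counts = Counter(
        word.lower() for word, _ in tagged if word.lower() in function_words
    )
    tag_counts = Counter(tag for _, tag in tagged)
    word_rf = {word: n / total for word, n in word_counts.items()}
    tag_rf = {tag: n / total for tag, n in tag_counts.items()}
    return word_rf, tag_rf


word_rf, tag_rf = relative_frequencies(TAGGED, FUNCTION_WORDS)
print(word_rf["the"])  # share of tokens that are "the"
print(tag_rf["NN"])    # share of tokens tagged as singular common nouns
```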

POS bigrams and trigrams

•  A bigram or trigram is a 2- or 3-token ‘window’ on the sentence.
   –  Automated calculation
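The sliding window can be sketched as follows, using the hand-assigned Penn Treebank tags for ‘My aunt’s pen is on the table.’ The helper name `ngrams` is hypothetical, and the tag sequence is an assumption about how a tagger would label the sentence.

```python
from collections import Counter


def ngrams(sequence, n):
    """Slide an n-item window across the sequence, one step at a time."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


# Hand-assigned tags for: My aunt 's pen is on the table .
tags = ["PRP$", "NN", "POS", "NN", "VBZ", "IN", "DT", "NN", "."]

bigrams = ngrams(tags, 2)
trigrams = ngrams(tags, 3)

# Relative frequency of one POS bigram = its count / total bigrams in the text.
bigram_counts = Counter(bigrams)
det_noun_rf = bigram_counts[("DT", "NN")] / len(bigrams)

print(bigrams[:3])
print(det_noun_rf)
```

A sequence of nine tags gives eight bigrams and seven trigrams, so the single determiner+noun bigram here accounts for one eighth of the bigrams.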

Feature (“variable”) selection

•  First-person pronouns (total)
   –  Singular: I, me, my, mine, myself.
   –  Plural: We, us, our, ours, ourselves.

•  Second-person pronouns: You, your, yours, yourself.

•  Third-person pronouns (total)
   –  Singular (total)
      •  Feminine: She, her, hers, herself.
      •  Masculine: He, him, his, himself.
   –  Plural: They, them, their, theirs, themselves.

•  Contractions: Including all instances of n’t, ’ld, ’ve, etc.

•  All relative frequencies calculated (automated)
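The pronoun categories above, including the rolled-up totals, might be counted along these lines. This is a sketch under assumptions: the names `PRONOUNS` and `pronoun_frequencies` are hypothetical, the denominator is taken to be total tokens (the slides do not state it for pronouns), and contractions are left out because separating a contracted ’s from a possessive ’s would need the POS tags.

```python
# Word lists copied from the slide; category names are hypothetical labels.
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself"},
    "third_feminine": {"she", "her", "hers", "herself"},
    "third_masculine": {"he", "him", "his", "himself"},
    "third_plural": {"they", "them", "their", "theirs", "themselves"},
}


def pronoun_frequencies(tokens):
    """Return each pronoun category's share of all tokens, plus totals."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = {category: 0 for category in PRONOUNS}
    for token in tokens:
        lowered = token.lower()
        for category, words in PRONOUNS.items():
            if lowered in words:
                counts[category] += 1
    # Roll-ups matching the "(total)" rows on the slide.
    counts["first_total"] = counts["first_singular"] + counts["first_plural"]
    counts["third_singular_total"] = (
        counts["third_feminine"] + counts["third_masculine"]
    )
    counts["third_total"] = counts["third_singular_total"] + counts["third_plural"]
    return {category: n / len(tokens) for category, n in counts.items()}


sample = ["She", "gave", "me", "her", "pen", "and", "we", "thanked", "them", "."]
print(pronoun_frequencies(sample))
```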

Each student’s text is represented by variables

•  A series of numerical values expressing each feature (variable), i.e., the relative frequency of:
   –  Function words / total tokens
   –  POS tags / total tokens
   –  Bigrams / total bigrams*
   –  Trigrams / total trigrams*
   –  Pronouns
   –  Automated calculation

*Multiplied by a factor.


Example 1

•  Tokens of the function word-type “all” in paper 1007 account for less than 7/100 of 1% of all tokens in that paper.


Example 2

•  Bigrams made up of a plural common noun (NNS) followed by a coordinating conjunction (CC) accounted for 1/10 of 1% of bigrams in paper 1009.


Mean relative frequencies calculated

•  For each feature
   –  Mean frequency (SD) for Gender F authors
   –  Mean frequency (SD) for Gender M authors
   –  Statistical significance assessed with Mann-Whitney U test (expressed as p-value)

•  A priori threshold for significance: 0.05
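For readers unfamiliar with the test, here is a minimal pure-Python sketch of a two-sided Mann-Whitney U comparison for a single feature. It is not the study's code: the function name and the sample values are made up, and the p-value uses a plain normal approximation with no tie or continuity correction. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would be the safer choice.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided p) via pairwise counting and a normal approximation."""
    n1, n2 = len(x), len(y)
    # U for x: pairs where the x value exceeds the y value; ties count one half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Made-up relative frequencies of one feature, for illustration only.
gender_f = [0.0010, 0.0012, 0.0015, 0.0011, 0.0013]
gender_m = [0.0020, 0.0022, 0.0019, 0.0025, 0.0021]

ALPHA = 0.05  # a priori threshold from the slide
u, p = mann_whitney_u(gender_f, gender_m)
print(u, p, p < ALPHA)
```

With 193 texts the normal approximation is reasonable, but relative frequencies often tie at zero, which is where a tie-corrected library implementation matters.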

What Argamon et al. 2003 found: Men

•  Males used significantly more
   –  Determiners: a, the, these
   –  Determiner+noun bigrams: the books, a dog, these Tories
   –  Attributive-adjective+noun bigrams: great leaders, old form
   –  Prepositions: at, from, for, of, behind
   –  Its

What Argamon et al. 2003 found: Women

•  Females used significantly more
   –  Pronouns (all)
      •  1st person sing.: I, my, mine
      •  2nd person: you, yours
      •  3rd person: they, them, theirs
   –  Present tense verbs: walks, eradicates
   –  Contractions
   –  Negation with “not”

Informational/involved

•  Biber (1995) labeled this a dimension of register variation after doing cluster analyses on frequencies to identify co-varying features as “dimensions”

•  Consistent with popular conceptions and works such as Tannen (1990 [2001]) that characterize women as “affiliative” and men as “informative”


What I found: Nouns & determiners

•  Nouns
   –  Some categories showed non-significant Gender F preference (weakly contradicting Argamon)

•  Determiners and determiner+noun
   –  Only significant: DET-NNP (proper noun)
   –  But all showed non-significant Gender M preference
   –  (Overall, weakly supporting Argamon)

What I found: Adjectives & prepositions

•  Attributive-adjective+noun
   –  Non-significant Gender M preference (weakly supporting Argamon)

•  Prepositions
   –  Non-significant Gender M preference (weakly supporting Argamon)

What I found: Pronouns (i.e., a mess)

•  All pronouns: Non-significant Gender M preference (weakly contradicting Argamon)

•  1st p sing., 2nd p., 3rd p. overall, 3rd s. fem: Non-significant Gender F preference (weakly supporting Argamon)

•  3rd p. plural: Significant Gender M preference (contradicting Argamon)

•  Its: Non-significant Gender F preference (weakly contradicting Argamon)


What I found: Verbs, contractions, “not”

•  Present-tense verbs
   –  Significant Gender M preference for 3rd p. singular (contradicting Argamon)
   –  Non-significant Gender M preference for the rest (weakly contradicting Argamon)

•  Contractions: Non-significant Gender F preference (weakly supporting Argamon)

•  Negation with “not”: (weakly supporting Argamon)

The take-away?

•  Statistics: The non-significant differences should probably be regarded as non-significant
   –  In that case, M-informational/F-involved is not confirmed in this study

•  If the non-significant differences are real, evidence for M-informational/F-involved is still mixed, especially in pronouns and present-tense verbs

Explaining the findings with relevance theory

•  Relevance theory (Sperber & Wilson 1995) recognizes the effects of habituation

•  If boys and girls are acculturated to writing in certain genres and certain topics in their youths . . .

•  . . . they may unconsciously habituate to certain (appropriate) word choices

•  . . . and may not be completely free to vary their word choices consciously later.


Situating the findings within gender & language theories

•  Findings weakly support or contradict
   –  Two sociolinguistic cultures view (Maltz & Borker 1982; Tannen 1990 [2001])
   –  Intersectionality/performativity views (Barker & Zifcak 1999; Butler; many others)

•  Some gendered linguistic habits appeared to resist retraining and conscious efforts to conform to register conventions . . .

•  . . . others were apparently overcome.

I’m left with more questions than answers . . .

•  But you are entitled to ask some questions now . . .


THANK YOU!

•  www.Rhetoricked.com (these slides + some additional)

•  Communicate with me:
   –  @Rhetoricked
   –  [email protected]

•  Research supported by:
   –  Graduate Research Partnership Program fellowship (U of M CLA), 2012
   –  James I. Brown Summer Research Fellowship, 2014

Works cited

Allen, J. (1994). Women and authority in business/technical communication scholarship: An analysis of writing... Technical Communication Quarterly, 3(3), 271.

Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. (2003). Gender, genre, and writing style in formal written texts. Text, 23(3), 321–346.

Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2007). Mining the Blogosphere: Age, gender and the varieties of self-expression. First Monday, 12(9). Retrieved from http://firstmonday.org/issues/issue12_9/argamon/index.html

Armstrong, C. L., & McAdams, M. J. (2009). Blogs of information: How gender cues and individual motivations influence perceptions of credibility. Journal of Computer-Mediated Communication, 14(3), 435–456.

Barker, R. T., & Zifcak, L. (1999). Communication and gender in workplace 2000: Creating a contextually-based integrated paradigm. Journal of Technical Writing & Communication, 29(4), 335.

Biber, D. (1995). Dimensions of register variation: A cross-linguistic comparison. Cambridge; New York: Cambridge University Press.

Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python (1st ed.). O’Reilly Media.

Brown, S. M., & Burnett, R. E. (2006). Women hardly talk. Really! Communication practices of women in undergraduate engineering classes (pp. T3F1–T3F9). Presented at the 9th International Conference on Engineering Education, San Juan, Puerto Rico: International Network for Engineering Education & Research. Retrieved from http://ineer.org/Events/ICEE2006/papers/3219.pdf

Burger, J., Henderson, J., Kim, G., & Zarrella, G. (2011). Discriminating gender on Twitter. Bedford, MA: MITRE Corporation. Retrieved from http://www.mitre.org/work/tech_papers/2011/11_0170/

Butler, J. (1993). Bodies that matter: On the discursive limits of “sex.” New York: Routledge.

Butler, J. (1999). Gender trouble. New York: Routledge.

Butler, J. (2004). Undoing gender. New York: Routledge.

Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V., Aswani, N., Roberts, I., … Peters, W. (2012, December 28). Developing Language Processing Components with GATE Version 7 (a User Guide). GATE: General Architecture for Text Engineering. Retrieved January 1, 2013, from http://gate.ac.uk/sale/tao/split.html

Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting More Out of Biomedical Documents with GATE’s Full Lifecycle Open Source Text Analytics. PLoS Computational Biology, 9(2), e1002854.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), 10–18.

Herring, S. C., & Paolillo, J. C. (2006). Gender and genre variation in weblogs. Journal of Sociolinguistics, 10(4), 439–459.

Koppel, M., Argamon, S., & Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401–412.

Lakoff, R. T. (1975/2004). Language and Woman’s Place: Text and Commentaries. (M. Bucholtz, Ed.) (Revised and expanded ed.). New York: Oxford University Press.

Lay, M. M. (1989). Interpersonal conflict in collaborative writing: What we can learn from gender studies. Journal of Business and Technical Communication, 3(2), 5–28.

Maltz, D. N., & Borker, R. (1982). A cultural approach to male-female miscommunication. In J. J. Gumperz (Ed.), Language and social identity (pp. 196–216). Cambridge U.K.: Cambridge University Press.

Pakhomov, S. V., Hanson, P. L., Bjornsen, S. S., & Smith, S. A. (2008). Automatic classification of foot examination findings using clinical notes and machine learning. Journal of the American Medical Informatics Association, 15, 198–202.

Raign, K. R., & Sims, B. R. (1993). Gender, persuasion techniques, and collaboration. Technical Communication Quarterly, 2(1), 89–104.

Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010). Classifying latent user attributes in Twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (pp. 37–44). Toronto, ON, Canada: ACM.

Rehling, L. (1996). Writing together: Gender’s effect on collaboration. Journal of Technical Writing and Communication, 26(2), 163–176.

Smeltzer, L. R., & Werbel, J. D. (1986). Gender differences in managerial communication: Fact or folk-linguistics? Journal of Business Communication, 23(2), 41–50.

Sperber, D., & Wilson, D. (1995). Relevance: Communication and Cognition (2nd ed.). Wiley-Blackwell.

Sterkel, K. S. (1988). The relationship between gender and writing style in business communications. Journal of Business Communication, 25(4), 17–38.

Tannen, D. (2001). You Just Don’t Understand: Women and Men in Conversation. William Morrow Paperbacks.

Tebeaux, E. (1990). Toward an understanding of gender differences in written business communications: A suggested perspective for future research. Journal of Business and Technical Communication, 4(1), 25–43.

Tong, A., & Klecun, E. (2004). Toward accommodating gender differences in multimedia communication. Professional Communication, IEEE Transactions on, 47(2), 118–129.

Wolfe, J., & Alexander, K. P. (2005). The computer expert in mixed-gendered collaborative writing groups. Journal of Business and Technical Communication, 19(2), 135–170.

Wolfe, J., & Powell, B. (2006). Gender and expressions of dissatisfaction: A study of complaining in mixed-gendered student work groups. Women & Language, 29(2), 13–20.

Wolfe, J., & Powell, E. (2009). Biases in interpersonal communication: How engineering students perceive gender typical speech acts in teamwork. Journal of Engineering Education, 98(1), 5–16.

Yan, X., & Yan, L. (2006). Gender classification of weblog authors. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs (pp. 228–230).