multilinguals and wikipedia editing

Multilinguals and Wikipedia Editing. Scott A. Hale (@computermacgyve), Oxford Internet Institute. 25 June 2014. http://www.scotthale.net/pubs/?websci2014

Upload: scott-a-hale

Post on 26-Jun-2015


DESCRIPTION

http://www.scotthale.net/pubs/?websci2014 This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller-sized editions with fewer users have a higher percentage of multilingual users than larger-sized editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present.

TRANSCRIPT

Page 1: Multilinguals and Wikipedia Editing

Multilinguals and Wikipedia Editing

Scott A. Hale

Oxford Internet Institute

http://www.scotthale.net/pubs/?websci2014

25 June 2014

Scott A. Hale, @computermacgyve Multilinguals and Wikipedia Editing

Page 2: Multilinguals and Wikipedia Editing

Background, Motivations

Wikipedia is a global platform covering hundreds of languages, despite evidence of balkanization (Taneja & Wu, in press)

Past studies generally concentrate on one edition (usually English)

Important variations across languages

Content is diverse across languages (Hecht & Gergle, 2010)

Each edition of Wikipedia shows a self-focus bias with more articles about regions where the language is spoken (Hecht & Gergle, 2009)

Multilingual users may act as unconscious translators bridging language divides (Herring et al., 2007; Eleta & Golbeck, 2012)


Page 3: Multilinguals and Wikipedia Editing

Related work

Why edit Wikipedia in a foreign language?

Increased audience size (Crystal, 2003; Zuckerman, 2013)

In a survey in Uzbekistan, Internet users reported accessing content in foreign languages even while reporting poor foreign-language skills (Wei & Kolko, 2005)

Editors of many editions of Wikipedia come from a wide variety of time zones, suggesting that bilingual editors are present (Yasseri, Sumi, & Kertesz, 2012)

In a survey of editors, half of all editors reported editing in multiple languages and 72% reported reading more than one language edition of Wikipedia.†

†https://meta.wikimedia.org/w/index.php?title=Editor_Survey_2011/Location_%26_Language&oldid=8409990



Page 5: Multilinguals and Wikipedia Editing

Hypotheses

1 Most editors will edit only one language edition

2 Multilingual users will edit different articles than monolingual users

3 When a user edits an article in another language that same user will usually also edit the corresponding article in his native language

4 Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions

5 Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions


Page 6: Multilinguals and Wikipedia Editing

Data

All edits to any of the top 46 language editions (all editions with at least 100,000 articles)

Recorded via the IRC stream (code at http://www.scotthale.net/pubs/?websci2014)

32 days (8 July to 9 August 2013)

Edit meta-data: datetime, edition, article title, username, size of edit, flags (minor, bot, etc.)


Page 7: Multilinguals and Wikipedia Editing

Data cleaning

Non-minor edits by registered, human users to articles

Only edits to main (article) namespace

Removed articles flagged as being created by ‘bots’

Removed anonymous users

Removed undeclared bots and users with only one edit session in the month

Required at least four edits and at least two edits to one edition

Matching users and articles across languages

Look for common usernames across language editions

Check usernames are indeed linked global accounts

WikiData dump to match articles across languages

55,568 users with a total of 3,518,955 edits (excluding the Simple English edition).
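The cleaning rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's code (which is linked from the Data slide): the `Edit` record and its field names are hypothetical, the threshold is read as "at least four edits overall and at least two in some single edition", and the edit-session and undeclared-bot checks are left out because the slides do not describe how they were done.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record; field names are illustrative only."""
    user: str
    edition: str      # language code, e.g. "en"
    title: str
    namespace: int    # 0 = main (article) namespace
    minor: bool
    bot: bool
    anonymous: bool


def clean_edits(edits):
    """Keep non-minor, main-namespace edits by registered, non-bot users,
    then keep only users with >= 4 kept edits overall and >= 2 kept edits
    in at least one edition (one reading of the slide's threshold)."""
    kept = [e for e in edits
            if e.namespace == 0 and not (e.minor or e.bot or e.anonymous)]

    # Count kept edits per user and edition.
    per_user = defaultdict(Counter)
    for e in kept:
        per_user[e.user][e.edition] += 1

    active = {u for u, c in per_user.items()
              if sum(c.values()) >= 4 and max(c.values()) >= 2}
    return [e for e in kept if e.user in active]
```

Filtering edits first and counting afterwards means that minor or bot-flagged edits never help a user reach the activity threshold; whether the original pipeline applied the steps in this order is not stated.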


Page 8: Multilinguals and Wikipedia Editing

Data summary

Language    Edits      Articles  Users   NP users  NP edits
English     1,389,647  518,405   27,476  18%       3%
German      256,495    125,647   5,967   18%       2%
French      250,828    106,027   4,549   25%       3%
Spanish     191,934    66,848    4,338   24%       3%
Russian     239,267    92,326    3,961   16%       1%
Japanese    106,848    56,406    3,551   11%       2%
Italian     160,191    69,534    2,919   25%       2%
Chinese     112,888    42,937    2,309   14%       1%
Portuguese  67,505     32,753    1,730   29%       4%
Dutch       80,535     39,463    1,500   33%       3%
Polish      67,038     37,393    1,454   30%       3%

Top language editions: The Users column includes all users who edited the edition during the data collection period. A percentage of these users (NP users) are non-primary users who edited a different language edition more frequently.
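As a rough illustration of the NP users column, the sketch below assigns each user a primary edition (the one they edited most) and computes the share of an edition's users whose primary edition lies elsewhere. The function names and the `(user, edition)` input format are assumptions made for this example, and the alphabetical tie-break is arbitrary because the slides do not say how ties were resolved.

```python
from collections import Counter, defaultdict


def primary_editions(edits):
    """Map each user to the edition they edited most often.

    `edits` is an iterable of (user, edition) pairs, one pair per edit.
    """
    counts = defaultdict(Counter)
    for user, edition in edits:
        counts[user][edition] += 1
    # Assumption: ties are broken alphabetically by edition code.
    return {u: min(c, key=lambda ed: (-c[ed], ed)) for u, c in counts.items()}


def non_primary_share(edits, edition):
    """Fraction of an edition's users whose primary edition is a different one."""
    edits = list(edits)
    primary = primary_editions(edits)
    users = {u for u, ed in edits if ed == edition}
    if not users:
        return 0.0
    return sum(primary[u] != edition for u in users) / len(users)
```

For example, with edits `[("a", "en"), ("a", "en"), ("a", "de"), ("b", "de"), ("b", "de")]`, user `a` is primarily an English editor, so half of the German edition's two users are non-primary.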


Page 9: Multilinguals and Wikipedia Editing

Multilinguals vs Monolinguals

15.4% of users (8,544) edited multiple language editions.

Figure: Density plot comparing the number of edits made by monolingual and multilingual Wikipedia users.
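The 15.4% figure follows directly from the two counts reported in the slides (8,544 multilingual users out of 55,568 users in total). A minimal sketch, assuming edits are available as hypothetical `(user, edition)` pairs, of how users might be classified and the share checked:

```python
from collections import defaultdict


def multilingual_users(edits):
    """Return the set of users who edited more than one language edition.

    `edits` is an iterable of (user, edition) pairs (an assumed format).
    """
    editions = defaultdict(set)
    for user, edition in edits:
        editions[user].add(edition)
    return {u for u, eds in editions.items() if len(eds) > 1}


# Arithmetic check using the counts reported in the slides.
reported_share = 8544 / 55568
print(f"{reported_share:.1%}")  # 15.4%
```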


Page 10: Multilinguals and Wikipedia Editing

Hypotheses

X Most editors will edit only one language edition

2 Multilingual users will edit different articles than monolingual users

3 When a user edits an article in another language that same user will usually also edit the corresponding article in his native language

4 Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions

5 Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions


Page 11: Multilinguals and Wikipedia Editing

What do multilinguals edit?

Only 2.6% of edits are from users writing in their non-primary languages.

44% of the articles edited by multilingual users in their non-primary languages were not edited by any monolingual user

Figure: 2D density plot of the number of multilingual users editing articles in a non-primary language against the number of monolingual users editing the articles.


Page 12: Multilinguals and Wikipedia Editing

What do multilinguals edit?

Histogram showing the distribution with which multilingual users edited articles in other languages that they also edited in their primary languages. The distribution is bimodal. A large number of users did not edit any of the same articles in their primary languages, but a large number of users always edited the same articles in their primary languages.


Page 13: Multilinguals and Wikipedia Editing

What do multilinguals edit?

Histogram showing the distribution with which multilingual users edited articles in other languages that they also edited in their primary languages after removing edits to articles that do not exist in users’ primary languages.
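The quantity plotted in these histograms can be read as a per-user overlap fraction. The sketch below is one plausible way to compute it, not a description of the author's code: articles are assumed to be represented by language-independent identifiers (for instance Wikidata item IDs, since the slides mention a Wikidata dump for matching), and the function name and the `restrict_to` parameter are hypothetical.

```python
def overlap_fraction(primary_items, non_primary_items, restrict_to=None):
    """Share of the articles a user edited in non-primary languages that the
    same user also edited in their primary language.

    Articles are language-independent IDs (assumed: Wikidata item IDs).
    If `restrict_to` is given, only articles in that set are considered,
    e.g. the set of articles that exist in the user's primary edition.
    Returns None when no non-primary articles remain.
    """
    items = set(non_primary_items)
    if restrict_to is not None:
        items &= set(restrict_to)
    if not items:
        return None
    return len(items & set(primary_items)) / len(items)
```

Calling it without `restrict_to` corresponds to the first histogram; passing the set of articles that exist in the primary edition corresponds to the filtered version. For a user who edited `{"Q1", "Q2"}` in the primary language and `{"Q2", "Q3"}` elsewhere, the unfiltered fraction is 0.5, and it rises to 1.0 if `Q3` has no article in the primary edition.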


Page 14: Multilinguals and Wikipedia Editing

Hypotheses

X Most editors will edit only one language edition

X Multilingual users will edit different articles than monolingual users

Ö When a user edits an article in another language that same user will usually also edit the corresponding article in his native language

4 Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions

5 Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions


Page 15: Multilinguals and Wikipedia Editing

Variations by language

Scatter plot of language size (number of unique users) and percentage of users who are multilingual (edit more than one language edition). The three editions with fewer than 10 users in the sample are omitted (Uzbek, Cebuano, and Waray-Waray).


Page 16: Multilinguals and Wikipedia Editing

Language crossings

Figure: Co-editing network graph (node labels: ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh)

Nodes represent language editions

Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition

Only edges with weights over 1.96 standard deviations above the mean are shown

Colors indicate communities found by the Infomap community detection algorithm
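The edge construction described for this graph can be sketched as follows. Several details are assumptions, since the slides do not specify them: the natural logarithm, the population standard deviation, statistics taken over observed edges only, and an alphabetical tie-break for the primary edition. The function name and input format are hypothetical, and community detection (Infomap) is not reproduced here.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def crossing_edges(user_edits, z=1.96):
    """Build directed edges primary -> other edition, weighted by the log of
    the number of users making that crossing, and keep only edges whose
    weight exceeds mean + z * standard deviation of all edge weights.

    `user_edits` maps user -> {edition: edit_count} (an assumed format).
    """
    counts = Counter()
    for per_edition in user_edits.values():
        # Primary edition = most-edited; ties broken alphabetically (assumption).
        primary = min(per_edition, key=lambda ed: (-per_edition[ed], ed))
        for edition in per_edition:
            if edition != primary:
                counts[(primary, edition)] += 1

    weights = {edge: math.log(n) for edge, n in counts.items()}
    if len(weights) < 2:
        return weights  # too few edges for a meaningful threshold
    cutoff = mean(weights.values()) + z * pstdev(weights.values())
    return {edge: w for edge, w in weights.items() if w > cutoff}
```

With a threshold this strict, only a handful of unusually heavy crossings survive, which is consistent with the sparse graphs shown on these slides.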


Page 17: Multilinguals and Wikipedia Editing

Language crossings (English removed)

Figure: Co-editing network graph with English removed (node labels: ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh)

Nodes represent language editions

Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition

Only edges with weights over 1.96 standard deviations above the mean are shown

Colors indicate communities found by the Infomap community detection algorithm


Page 18: Multilinguals and Wikipedia Editing

Hypotheses

X Most editors will edit only one language edition

X Multilingual users will edit different articles than monolingual users

Ö When a user edits an article in another language that same user will usually also edit the corresponding article in his native language

X Users writing primarily in smaller-sized language editions will be more likely to cross language boundaries than users writing primarily in larger-sized language editions

X Larger-sized language editions, English chief among them, will be more likely to have contributions from editors of different languages than smaller-sized language editions


Page 19: Multilinguals and Wikipedia Editing

Simple English

No big changes if Simple English edition is considered

Largest editor overlap with English edition

Dedicated group of editors: 45% of the editors who edit Simple English most frequently do not edit any other edition (similar to Esperanto)


Page 20: Multilinguals and Wikipedia Editing

Comparison with Twitter

Similar percentage of users are multilingual (11% on Twitter)

Similar correlation between activity level and multilingualism

Language size not correlated with multilingualism on Twitter; some language consistencies (Japanese, English) and some variations

Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network. http://www.scotthale.net/pubs/?chi2014


Page 21: Multilinguals and Wikipedia Editing

Implications and future directions

Implications

Multilingual users found in all editions; correlation with activity

Design for multilingual users (universal language selector and global accounts are already progress in this direction)

Important per-language variations

Inverse correlation between multilingual users and self-focus bias as measured by Hecht & Gergle (2009)

Further work

Move from edit meta-data to edit content itself

What type of edits are users making in non-primary languages?

Variations by topic/theme?

Correlations with link/image overlap?

Viewing vs. editing behavior (survey results show much higher percentage of users read multiple editions)


Page 22: Multilinguals and Wikipedia Editing


I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall as well as the anonymous reviewers who provided helpful comments on previous versions of this research.

Page 23: Multilinguals and Wikipedia Editing

Crystal, D. (2003). English as a Global Language (2nd ed.). Cambridge: Cambridge University Press.

Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks: How Multilingual Users of Twitter Connect Language Communities. Proceedings of the American Society for Information Science and Technology, 49(1), 1–4. Available from http://dx.doi.org/10.1002/meet.14504901327

Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 833–842). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2556288.2557203

Hecht, B., & Gergle, D. (2009). Measuring self-focus bias in community-maintained knowledge repositories. In Proceedings of the Fourth International Conference on Communities and Technologies (pp. 11–20). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1556460.1556463


Page 24: Multilinguals and Wikipedia Editing

Hecht, B., & Gergle, D. (2010). The Tower of Babel meets Web 2.0: User-generated content and its applications in a multilingual context. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (pp. 291–300). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1753326.1753370

Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E., Stoerger, S., et al. (2007). Language Networks on LiveJournal. In Proceedings of the 40th Annual Hawaii International Conference on System Sciences. Washington, DC, USA: IEEE Computer Society. Available from http://dx.doi.org/10.1109/HICSS.2007.320

Wei, C. Y., & Kolko, B. E. (2005). Resistance to globalization: Language and Internet diffusion patterns in Uzbekistan. New Review of Hypermedia and Multimedia, 11(2), 205–220.

Yasseri, T., Sumi, R., & Kertesz, J. (2012). Circadian Patterns of Wikipedia Editorial Activity: A Demographic Analysis. PLoS ONE, 7(1), e30091. Available from http://dx.doi.org/10.1371%2Fjournal.pone.0030091


Page 25: Multilinguals and Wikipedia Editing

Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of Connection. London: W. W. Norton & Company.
