Data Geeks Paris - Cherchez la Femme

Upload: elian-carsenat

Post on 16-Apr-2017

TRANSCRIPT

Page 1: Data Geeks Paris - Cherchez la Femme

WHAT’S IN A NAME?

- IDENTITY & CULTURE

- GENDER (SEX)

NamSor Applied Onomastics


2014-06-26

Page 2

Where Are the Women?

Paris DataGeek Gender Gap

Male 86%

Female 14%

Page 3

Where Are the Women? E.g. the MOVIES: mining 5M names to assess GENDER*

IMDb file: THE CINEMATOGRAPHERS LIST

Name origin  Male  Female  Unknown
France        82%     16%       2%
Tunisia       77%     16%       8%
Morocco       80%     15%       5%
Algeria       86%     11%       3%
Ireland       89%     10%       1%

*Using NamSor GendRE API v0.0.13
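The table gives, for each name origin, the share of cinematographer names classified as male, female or unknown. The sketch below shows how such a breakdown could be tallied once every name carries an origin and a gender label. The classifier output is assumed, the function name `gender_share_by_origin` is hypothetical, and the sample records are invented, not taken from the IMDB file.

```python
from collections import Counter, defaultdict


def gender_share_by_origin(records):
    """Tally (origin, gender) pairs into whole-number percentages per origin.

    `records` is any iterable of (origin, gender) pairs, where gender is
    "Male", "Female" or "Unknown" as returned by some gender classifier
    (assumed here; no particular API is implied).
    """
    counts = defaultdict(Counter)
    for origin, gender in records:
        counts[origin][gender] += 1

    shares = {}
    for origin, tally in counts.items():
        total = sum(tally.values())
        # Each share is rounded on its own, so a row may sum to 99% or 101%
        shares[origin] = {
            g: round(100 * tally[g] / total) for g in ("Male", "Female", "Unknown")
        }
    return shares


# Invented toy records, not the IMDB data
records = [("France", "Male")] * 8 + [("France", "Female")] * 2
print(gender_share_by_origin(records))
# {'France': {'Male': 80, 'Female': 20, 'Unknown': 0}}
```

Rounding each share separately is probably also why the Tunisia row adds up to 101%.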

Page 4

What’s in a name? What’s a name?

Elena Rossini

@_Elena (Twitter)

Elian Carsenat

@ElianCarsenat (Twitter)

[email protected]

[email protected]

tioulpanov (Skype)

NamSor.com

+ Social networks (LinkedIn, Twitter, FB …): more names

Onomastics = the science of proper names

Page 5

NamSor sociolinguistic algorithm

FN        LN
Mette     Andersen
Lene      Andersson
Eva       Arndt-Riise
Heidi     Astrup
Mie       Augustesen
Margot    Bærentzen
Louise    Bager Nørgaard
Marie     Bagger Rasmussen
Yutta     Barding
Ulla      Barding-Poulsen

FN        LN
Xian      Dongmei
Zheng     Dongmei
Jin       Dongxiang
Xu        Dongxiang
Li        Dongxiao
Qin       Dongya
Li        Dongying
Han       Duan
Li        Duihong
Jiang     Fan

Training set: Athletes

Step 1 – Learn stereotypes

Data set: Actors

Step 2 – Classify

[Figure: sample names being classified: bitao gong, biwang jiang, birgitta agerberth, birgitte l. eriksen, bitten thorengaard]
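The slide describes a two-step process: learn name stereotypes from a labelled training set (athletes), then classify a new data set (actors). NamSor's own algorithm is not shown on the slide. As a stand-in, the sketch below uses a different, well-known technique: a multinomial naive Bayes classifier over character trigrams of a first name only. The class name `NgramNaiveBayes` and the tiny training list are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(name, n=3):
    # Pad with ^ and $ so prefixes and endings (e.g. "-nna") become features
    s = f"^{name.lower()}$"
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character trigrams (illustrative only)."""

    def fit(self, examples):
        # Step 1: learn "stereotypes" = trigram counts per gender label
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for name, label in examples:
            self.class_counts[label] += 1
            for g in ngrams(name):
                self.feature_counts[label][g] += 1
                self.vocab.add(g)
        return self

    def predict(self, name):
        # Step 2: classify by the highest log-probability label
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for g in ngrams(name):
                # Add-one (Laplace) smoothing for trigrams never seen with this label
                score += math.log((self.feature_counts[label][g] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


training = [("anna", "F"), ("hanna", "F"), ("jens", "M"), ("hans", "M")]
model = NgramNaiveBayes().fit(training)
print(model.predict("johanna"), model.predict("mons"))  # F M
```

On this toy data the model associates the `nna$` ending with F and `ns$` with M, which is the kind of sub-word stereotype a name classifier relies on; a real system would also use the last name and far more training data.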

Page 6

Decrypting IDENTITY

Source: Commonwealth WWI Casualties

Page 7

Mining 3M Geo-Tweets to map FLOWS

Source          Target  Type      Id  Onoma         Weight
United Kingdom  France  Directed  16  Great Britain     37
Spain           France  Directed  55  Spain             14
United States   France  Directed  75  Great Britain     12
Turkey          France  Directed  79  Turkey            11
Brazil          France  Directed  87  Portugal          10
United Kingdom  France  Directed 112  Ireland            9
Italy           France  Directed 152  Italy              7
Switzerland     France  Directed 226  France             5
Belgium         France  Directed 247  France             5
United Kingdom  France  Directed 258  France             5
Mexico          France  Directed 287  Spain              4
Ireland         France  Directed 317  Great Britain      4
United Kingdom  France  Directed 333  Italy              4
United States   France  Directed 375  France             4

Source: Twitter
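The table has the shape of a weighted, directed edge list of the kind graph tools such as Gephi import. Assuming each geo-tweet yields one (Source, Target, Onoma) observation, where Onoma is the onomastic class of the author's name, the edges could be aggregated as below. The column meanings, the id scheme and the function names are assumptions, and the observations are invented.

```python
import csv
import io
from collections import Counter


def build_edge_table(observations):
    """Aggregate (source, target, onoma) observations into weighted directed edges.

    Rows follow the slide's column layout (Source, Target, Type, Id, Onoma,
    Weight), heaviest edge first. Ids are simply assigned in output order,
    which is an assumption, not the original scheme.
    """
    weights = Counter(observations)
    ranked = sorted(weights.items(), key=lambda kv: -kv[1])
    return [
        [source, target, "Directed", i, onoma, weight]
        for i, ((source, target, onoma), weight) in enumerate(ranked)
    ]


def to_csv(rows):
    # Write a header plus one line per edge, as a spreadsheet-style edge table
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Source", "Target", "Type", "Id", "Onoma", "Weight"])
    writer.writerows(rows)
    return buf.getvalue()


observations = [("Spain", "France", "Spain")] * 2 + [("Italy", "France", "Italy")] * 3
rows = build_edge_table(observations)
print(to_csv(rows))
```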

Page 8

Isn’t predicting gender SIMPLE?

Can you tell the gender? Andrea/Rossini vs. Andrea/Parker

O./Sokolova

Kjell/Bergqvist

声涛/周

נתניהו/בנימין

المرعبي/معين

Our target, globally, for all countries, languages and cultures:

99% precision and 99% recall for both Male and Female
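Precision and recall are computed per class. For Female, precision is the share of names predicted Female that really are Female, and recall is the share of actual Female names that were predicted Female. A minimal sketch, with invented labels where "U" stands for an Unknown answer:

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall of one class label (0.0 when undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)
    actual = sum(1 for t in y_true if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


y_true = ["M", "M", "F", "F", "F"]
y_pred = ["M", "F", "F", "F", "U"]  # "U" = classifier abstained (Unknown)
for label in ("M", "F"):
    p, r = precision_recall(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
# M: precision=1.00 recall=0.50
# F: precision=0.67 recall=0.67
```

An Unknown answer never lowers precision but does lower recall, which is why reaching 99% on both at once, for both classes, is demanding.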

Page 9

We’re getting there by combining classic baby-name statistics with our unique algorithm

100% of objectives reached for 10 countries

75% of objectives reached for 28 countries

Currently, each version brings a ~30% improvement!
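The slide does not describe how the baby-name statistics and the algorithm are combined. One simple combination, sketched here purely as an assumption, is to trust birth statistics only when a first name is both frequent and strongly gendered, and to defer to a model otherwise. The thresholds, the `(male_births, female_births)` table format, the function name `combined_gender` and the counts in the example are all invented for illustration.

```python
from typing import Callable, Dict, Tuple


def combined_gender(
    first_name: str,
    baby_names: Dict[str, Tuple[int, int]],
    fallback: Callable[[str], str],
    min_count: int = 100,
    min_share: float = 0.95,
) -> str:
    """Use baby-name statistics when conclusive, otherwise defer to a model.

    `baby_names` maps a lower-cased first name to (male_births, female_births).
    """
    male, female = baby_names.get(first_name.lower(), (0, 0))
    total = male + female
    if total >= min_count:
        if female / total >= min_share:
            return "Female"
        if male / total >= min_share:
            return "Male"
    # Rare or ambiguous names (e.g. "Andrea") are left to the model
    return fallback(first_name)


stats = {"mette": (0, 5000), "andrea": (3000, 2500)}  # invented counts


def model(name: str) -> str:
    return "Unknown"  # placeholder for a real name classifier


print(combined_gender("Mette", stats, model))   # Female
print(combined_gender("Andrea", stats, model))  # Unknown
```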

Page 10

Want to play?

- Android gadget

- RapidMiner extension

Page 11

Improve your targeting and increase your open and click rates by saying "Hello Sir" or "Hello Madam" without mistakes in your e-mailing.
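A minimal sketch of the e-mailing use case, assuming a classifier that returns a gender label together with a confidence score. The score, the 0.9 threshold and the function name `salutation` are hypothetical. When the prediction is not confident enough, a neutral greeting avoids addressing a woman as "Sir" or a man as "Madam".

```python
def salutation(gender: str, confidence: float, threshold: float = 0.9) -> str:
    """Pick an e-mail greeting from a predicted gender and its confidence."""
    if confidence >= threshold:
        if gender == "Male":
            return "Hello Sir"
        if gender == "Female":
            return "Hello Madam"
    # Low confidence or Unknown gender: fall back to a neutral greeting
    return "Hello"


print(salutation("Female", 0.97))  # Hello Madam
print(salutation("Male", 0.60))    # Hello
```

Raising the threshold trades fewer personalised greetings for fewer embarrassing mistakes.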

Page 12

Conclusions

We recognize names in any language, any place and any database; we can classify them and we can sort them.

An onomastic class is not a ‘hard fact’ like a place of birth, a nationality, etc., but it is accurate and fine-grained.

Our sociolinguistic approach surpasses the traditional geo-demographic or ‘dictionary’ approaches used in the US/UK.

Our unique capability to decrypt identity and gender in high-growth and emerging countries (Russia, Africa, India, Indonesia…) can be put to work in a wide range of applications.

Page 13

Elian Carsenat

http://fdimagnet.com/

http://namsor.com/


July 2013, Embassy of Lithuania in Paris

[email protected]

+33 6 52 77 99 07

Twitter @NamSor_com

Page 14

APPENDICES

Page 15

NamSor sorts names: functions and use cases

Functions:

1. Name Linguistic Classification

2. Name Transliteration & Matching

3. Named Entity Extraction & Parsing

Use cases:

- Multilingual Text Mining

- Control Watch Lists

- Social Network Analytics

- Geodemographics

- Migration Studies

- Gender Studies
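For the Name Transliteration & Matching function, a common first step is to normalise spellings before comparing names. The sketch below is not NamSor's method: it only handles Latin-script variants (case, diacritics, Danish letters such as ø and æ, hyphens) and scores similarity with Python's standard `difflib`. Real transliteration between scripts (Cyrillic, Chinese, Hebrew, Arabic) needs far more than this. The `SPECIAL` fallback table and the function names are assumptions.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical ASCII fallbacks for letters that NFKD does not decompose
SPECIAL = {"æ": "ae", "ø": "oe", "å": "aa", "ł": "l", "đ": "d"}


def normalize(name: str) -> str:
    s = name.casefold()
    s = "".join(SPECIAL.get(ch, ch) for ch in s)
    # Decompose accented letters, then drop the combining marks (é -> e)
    s = "".join(
        ch for ch in unicodedata.normalize("NFKD", s) if not unicodedata.combining(ch)
    )
    # Treat hyphens as spaces and collapse runs of whitespace
    return " ".join(s.replace("-", " ").split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


print(normalize("Louise Bager Nørgaard"))          # louise bager noergaard
print(name_similarity("Nørgaard", "Noergaard"))    # 1.0
```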