![Page 1: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/1.jpg)
Spatio-temporal linkage of real and virtual identity
Muhammad Adnan (and Paul Longley)
University College London
![Page 2: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/2.jpg)
Geodemographics
• “Analysis of people by where they live [places]” (Sleight, 1993:3)
• Social similarity, not locational proximity
[Diagram labels: Home, Address, Person, Area]
![Page 3: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/3.jpg)
![Page 4: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/4.jpg)
Identity of individuals in the real world
• Name (Forename & Surname)
• Surnames have geographic concentrations
• Prospects for linkage with socio-economic data
• E.g. Analysing the socio-economic circumstances of different ethnic groups
![Page 5: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/5.jpg)
An example – gbnames.publicprofiler.org
Longley Cheshire
![Page 6: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/6.jpg)
An example – Output Area Classification
Kingston upon Hull Hereford
![Page 7: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/7.jpg)
A socio-economic and ethnic classification
![Page 8: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/8.jpg)
A socio-economic and ethnic classification
![Page 9: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/9.jpg)
![Page 10: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/10.jpg)
Wu
![Page 11: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/11.jpg)
Source: Cheshire and Longley (2011)
![Page 12: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/12.jpg)
Courtesy: James Cheshire
![Page 13: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/13.jpg)
Wordle.net
![Page 14: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/14.jpg)
The European scale
16 countries.
400 million people.
5.95 million unique surnames
Courtesy: James Cheshire
![Page 15: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/15.jpg)
Onomap classification
Pablo Mateos
Forename-Surname clustering (based on Hanks and Tucker, 2000)
Source data: UK Electoral Roll
Surnames: Garcia, Pérez, Sánchez, Rodríguez, ...
Forenames: Juan, Rosa, Marta, ...
– Several iterations until self-contained cluster is exhausted
– Cluster assigned a cultural, ethnic & linguistic Onomap type
– Probability of ethnicity assigned to each name
Mateos et al (2007) CASA Working Paper 116
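The iterative forename-surname clustering on this slide can be illustrated with a minimal sketch. It is not the Onomap implementation: it assumes a plain list of (forename, surname) pairs and grows a cluster purely by shared names, with no frequency thresholds or probabilities. The function name `name_cluster` and the sample data are hypothetical.

```python
from collections import defaultdict

def name_cluster(people, seed_surname):
    """Grow a cluster of surnames and forenames from one seed surname."""
    by_surname = defaultdict(set)   # surname -> forenames seen with it
    by_forename = defaultdict(set)  # forename -> surnames seen with it
    for forename, surname in people:
        by_surname[surname].add(forename)
        by_forename[forename].add(surname)

    surnames, forenames = {seed_surname}, set()
    while True:
        # Forenames borne by anyone whose surname is already in the cluster.
        new_forenames = set().union(*(by_surname[s] for s in surnames)) - forenames
        forenames |= new_forenames
        # Surnames borne by anyone whose forename is already in the cluster.
        new_surnames = set().union(*(by_forename[f] for f in forenames)) - surnames
        surnames |= new_surnames
        # Stop once an iteration adds nothing: the cluster is exhausted.
        if not new_forenames and not new_surnames:
            return surnames, forenames

# Hypothetical sample data, loosely echoing the names on the slide.
people = [
    ("Juan", "Garcia"), ("Rosa", "Garcia"), ("Juan", "Pérez"),
    ("Marta", "Pérez"), ("Marta", "Sánchez"),
    ("John", "Smith"), ("Mary", "Smith"),
]
surnames, forenames = name_cluster(people, "Garcia")
```

On real electoral-roll data a pure connectivity rule like this would merge almost every name into one cluster, so a practical version would presumably need a minimum-overlap threshold before linking two names.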
![Page 16: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/16.jpg)
WorldNames CEL clusters
Source: Mateos et al (2011)
![Page 17: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/17.jpg)
![Page 18: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/18.jpg)
![Page 19: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/19.jpg)
Uncertainty and virtual identity
• Identity increasingly shaped by online activities
– => value may be leveraged from the fusion of physical and virtual data sources
• Data fusion and generalisation to relate physical and virtual properties
• Use of residence alongside activity patterns and social network information
![Page 20: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/20.jpg)
Most of us have virtual identities
• Email address; social media accounts
• People use different procedures and providers to establish virtual identities
• Harvesting these data has interesting potential applications
  • Cyber crime
  • Cyber geodemographics (Facebook has already started this)
![Page 21: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/21.jpg)
Most of us have virtual identities
• Facebook data mining engine
  • Analyses the words you use and tailors advertisements accordingly
![Page 22: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/22.jpg)
Starting Point
http://worldnames.publicprofiler.org
• Worldnames holds data on approximately 1 billion people across 28 countries
• Approximately 1.6 million unique users have visited the website since 2008
![Page 23: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/23.jpg)
Starting Point
http://worldnames.publicprofiler.org
• Worldnames has been archiving ‘Surname search’, ‘Email Address’, ‘Gender’, and ‘IP Address’ for searches over the past 6 months
  • c. 175,000 records: email validation
  • 150,000 usable ‘IP Address’ entries
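The slides do not say how email validation was carried out. As a rough sketch only, a syntactic plausibility check could look like the following; the pattern and the name `is_plausible_email` are assumptions, and the check does not confirm that a mailbox exists.

```python
import re

# Deliberately simple pattern: a local part, '@', and a domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")

def is_plausible_email(address):
    """Return True if the string looks like an email address."""
    return EMAIL_RE.fullmatch(address.strip()) is not None
```

Records failing such a check would be dropped before any domain-level analysis.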
![Page 24: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/24.jpg)
IP Address to Latitude/Longitude conversion
http://quova.com
An API to convert “IP addresses” to their corresponding latitude / longitude values
![Page 25: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/25.jpg)
IP Address to Latitude/Longitude conversion
http://quova.com
A search for an IP Address in UCL (128.40.214.196)
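The Quova API itself is not reproduced here. To show the idea of mapping an IP address to coordinates, the sketch below uses Python's standard `ipaddress` module with a made-up range table; the network range, the coordinates, and the name `locate` are illustrative assumptions, not values returned by the service.

```python
import ipaddress

# Hypothetical lookup table: (network, latitude, longitude).
# The range and the central-London coordinates are illustrative only.
RANGES = [
    (ipaddress.ip_network("128.40.0.0/16"), 51.52, -0.13),
]

def locate(ip):
    """Return (lat, lon) for the first range containing the address, else None."""
    addr = ipaddress.ip_address(ip)
    for network, lat, lon in RANGES:
        if addr in network:
            return lat, lon
    return None
```

A commercial geolocation service would hold many such ranges and typically reports a confidence level alongside the coordinates.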
![Page 26: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/26.jpg)
Top Countries
Website was searched from 155 countries over the past 6 months
[Bar chart: top 20 countries by count]
UNITED STATES 76708
UNITED KINGDOM 21892
CANADA 8154
GERMANY 7158
ITALY 4058
AUSTRALIA 2978
BRAZIL 2440
FRANCE 2028
ARGENTINA 1958
SPAIN 1830
NEW ZEALAND 1236
NETHERLANDS 1074
GREECE 1040
SWITZERLAND 992
BELGIUM 940
POLAND 880
AUSTRIA 874
MEXICO 834
IRELAND 710
SWEDEN 630
![Page 27: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/27.jpg)
UK and Ireland
![Page 28: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/28.jpg)
Europe
![Page 29: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/29.jpg)
North America
![Page 30: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/30.jpg)
South America
![Page 31: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/31.jpg)
India, China, Japan, Singapore
![Page 32: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/32.jpg)
Popular Surname Searches
[Bar chart: top 20 searched surnames by count]
SMITH 708
JONES 306
JOHNSON 258
ANDERSON 224
WILLIAMS 222
MILLER 218
MARTIN 202
WILSON 194
BROWN 194
MOORE 188
THOMAS 178
TAYLOR 170
CLARK 164
LEE 160
ROBERTS 156
DAVIS 152
CAMPBELL 144
LEWIS 138
HARRIS 138
MITCHELL 136
![Page 33: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/33.jpg)
Popular Email Domains
[Bar chart: top 20 email domains by count]
GMAIL.COM 31842
HOTMAIL.COM 22098
YAHOO.COM 15542
AOL.COM 5550
COMCAST.NET 2696
HOTMAIL.CO.UK 1948
MSN.COM 1624
WEB.DE 1522
YAHOO.CO.UK 1290
GMX.DE 1260
SBCGLOBAL.NET 1246
BTINTERNET.COM 860
HOTMAIL.IT 844
VERIZON.NET 798
GOOGLEMAIL.COM 742
LIVE.COM 742
COX.NET 708
ATT.NET 632
MAILINATOR.COM 616
LIBERO.IT 616
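A domain tally like the one on this slide can be produced with a few lines of Python. This is a sketch under the assumption that the input is simply a list of address strings; `domain_counts` is a hypothetical helper, not the code used for the study.

```python
from collections import Counter

def domain_counts(addresses):
    """Count addresses per domain, ignoring case and malformed entries."""
    counts = Counter()
    for address in addresses:
        local, sep, domain = address.strip().rpartition("@")
        # Skip strings with no '@', an empty local part, or an empty domain.
        if sep and local and domain:
            counts[domain.upper()] += 1
    return counts
```

Calling `domain_counts(addresses).most_common(20)` would then give a ranked top 20.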
![Page 34: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/34.jpg)
Popular Email Domains by Surnames
Smith (English): GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, MAILINATOR.COM
Jones (Welsh): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, COMCAST.NET, GOOGLEMAIL.COM
Johnson (English): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, MSN.COM, VERIZON.NET
Perez (Spanish): GMAIL.COM, HOTMAIL.COM, YAHOO.ES, CHARTER.NET, GRANDECOM.NET
Gupta (Indian): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, GOOGLEMAIL.COM, INDIATIMES.COM
Meyer (German): GMAIL.COM, HOTMAIL.COM, YAHOO.COM, AOL.COM, GMX.DE
![Page 35: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/35.jpg)
Popular Email Domains by Country
UK: GMAIL.COM, HOTMAIL.COM, HOTMAIL.CO.UK, YAHOO.CO.UK, YAHOO.COM
USA: GMAIL.COM, YAHOO.COM, HOTMAIL.COM, AOL.COM, COMCAST.NET
France: HOTMAIL.FR, GMAIL.COM, HOTMAIL.COM, YAHOO.FR, LAPOSTE.NET
Germany: WEB.DE, GMX.DE, T-ONLINE.DE, YAHOO.DE, GMAIL.COM
Brazil: HOTMAIL.COM, GMAIL.COM, YAHOO.COM.BR, IG.COM.BR, BOL.COM.BR
Japan: YAHOO.COM, YAHOO.CO.JP, GMAIL.COM, HOTMAIL.COM, MSN.COM
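Breakdowns of this kind, by surname or by country, amount to a grouped ranking. The sketch below assumes records arrive as (group, address) pairs, where the group is a country or a surname; `top_domains_by_group` and the sample addresses are hypothetical.

```python
from collections import Counter, defaultdict

def top_domains_by_group(records, n=5):
    """Return the n most common email domains for each group."""
    per_group = defaultdict(Counter)
    for group, address in records:
        # Everything after the last '@' is treated as the domain.
        domain = address.rpartition("@")[2].upper()
        per_group[group][domain] += 1
    return {
        group: [domain for domain, _ in counts.most_common(n)]
        for group, counts in per_group.items()
    }

# Hypothetical sample records.
records = [
    ("UK", "a@gmail.com"), ("UK", "b@gmail.com"), ("UK", "c@hotmail.co.uk"),
    ("Germany", "d@web.de"), ("Germany", "e@web.de"), ("Germany", "f@gmx.de"),
]
ranking = top_domains_by_group(records)
```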
![Page 36: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/36.jpg)
Top GoogleMail.com users
Top Surnames: BINDER, WATKINS, WHITE, WOODS, ROBINSON, SLEEMAN, BENNETT, RITCHIE, SHARP, ROLLINGS
![Page 37: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/37.jpg)
GoogleMail.com users
• Surname ‘Binder’
Germany Switzerland
![Page 38: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/38.jpg)
GoogleMail.com users
• Surname ‘Binder’
Germany Switzerland
![Page 39: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/39.jpg)
GoogleMail.com users
• Surname ‘Blackbourn’
New Zealand
![Page 40: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/40.jpg)
Who uses their surname as part of their email address?
• Approximately 40% of the users have their surname as part of their email address
  • [email protected] (Surname: Harper)
  • [email protected] (Surname: Kempe)
• Top Countries
[Bar chart, axis 0 to 50: SOUTH AFRICA, SLOVENIA, UNITED KINGDOM, IRELAND, INDIA, MALAYSIA, PORTUGAL, GERMANY, COSTA RICA, AUSTRIA, LUXEMBOURG, BELGIUM, CANADA, NEW ZEALAND, AUSTRALIA, CHINA, TURKEY, CROATIA, SWITZERLAND, UNITED STATES]
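One simple way to estimate the share of users whose surname appears in their email address is a case-insensitive substring test on the part before the ‘@’. This is a sketch, not the method used in the study; the function names and the `example.com` addresses are made up for illustration.

```python
def surname_in_address(surname, address):
    """True if the surname occurs in the local part (text before the last '@')."""
    local = address.rpartition("@")[0]
    return surname.lower() in local.lower()

def share_with_surname(records):
    """Fraction of (surname, address) records where the surname is embedded."""
    if not records:
        return 0.0
    hits = sum(surname_in_address(s, a) for s, a in records)
    return hits / len(records)

# Hypothetical sample records.
sample = [("Harper", "j.harper72@example.com"), ("Kempe", "sunshine@example.com")]
```

A substring test will over-count short surnames (for example ‘Lee’ inside ‘sleepy’), so a stricter version might only match at separators such as ‘.’ or ‘_’.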
![Page 41: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/41.jpg)
Who uses long email addresses?
• Grand mean average email length of 8 characters
  • Number of characters on the left side of ‘@’
  • United Kingdom, USA, Canada, and European countries
• People from South American countries and India have long email addresses (Average length: 13 characters)
• South Indians have longer email addresses than North Indians
BRAZIL [email protected] (14 characters)
CHILE [email protected] (25 characters)
URUGUAY [email protected] (17 characters)
INDIA [email protected] (18 characters)
ARGENTINA [email protected] (13 characters)
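The length measure used here, the number of characters to the left of ‘@’, can be averaged per country as in the sketch below. The input format of (country, address) pairs, the helper name `mean_local_length`, and the sample addresses are assumptions.

```python
from collections import defaultdict
from statistics import mean

def mean_local_length(records):
    """Average number of characters before '@', grouped by country."""
    lengths = defaultdict(list)
    for country, address in records:
        local = address.rpartition("@")[0]
        lengths[country].append(len(local))
    return {country: mean(values) for country, values in lengths.items()}

# Hypothetical sample records.
sample = [
    ("UK", "jsmith@example.com"),
    ("UK", "ann@example.com"),
    ("CHILE", "juan.carlos.perez@example.com"),
]
averages = mean_local_length(sample)
```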
![Page 42: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/42.jpg)
What else we can infer from email addresses
• Internet service provider
  • A.GOODEVE@AOL.COM
  • [email protected]
  • [email protected] (Person lives in a rural area of northeast Oregon)
• Country of origin
  • [email protected]
  • [email protected]
• Probable temporal aspects
  • [email protected]
  • [email protected]
  • [email protected]
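As a rough illustration of these inferences, the sketch below pulls three clues out of an address: the provider domain, a two-letter country-code suffix, and a four-digit year in the local part. The rules and the name `address_clues` are assumptions; a year in an address may be a birth year, a registration year, or neither, and some two-letter suffixes are used generically rather than to signal a country.

```python
import re

# A four-digit number starting 19 or 20, not embedded in a longer digit run.
YEAR_RE = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")

def address_clues(address):
    """Extract provider, country-code suffix, and candidate year from an address."""
    local, _, domain = address.lower().rpartition("@")
    suffix = domain.rsplit(".", 1)[-1]
    year = YEAR_RE.search(local)
    return {
        "provider": domain,
        # Two-letter top-level domains are country codes.
        "country_code": suffix if len(suffix) == 2 else None,
        "year": int(year.group()) if year else None,
    }
```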
![Page 43: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/43.jpg)
What else we can infer from email addresses
• Probable forename of a person
  • [email protected]
  • [email protected]
  • [email protected]
• How up to date someone is with technology
  • [email protected]
  • [email protected]
• Professional Affiliations
  • [email protected]
![Page 44: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/44.jpg)
What else we can infer from email addresses
• Work Locations
  • [email protected]
  • [email protected]
  • [email protected]
• Studying
  • [email protected]
  • [email protected]
  • [email protected]
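Whether an address suggests that someone is studying can be approximated from the domain suffix. The sketch below assumes only two academic suffixes, `.ac.uk` and `.edu`; a fuller list would be needed in practice, and an academic domain may equally indicate a staff member. The name `looks_academic` is hypothetical.

```python
# Assumed, incomplete list of academic domain suffixes.
ACADEMIC_SUFFIXES = (".ac.uk", ".edu")

def looks_academic(address):
    """True if the address's domain ends with a known academic suffix."""
    domain = address.lower().rpartition("@")[2]
    return domain.endswith(ACADEMIC_SUFFIXES)
```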
![Page 45: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/45.jpg)
Conclusion and future work
• There are some interesting patterns found in the study of email addresses
  • some problems (accuracy of geocoding techniques)
• Prospect of data linkage of data coded to unit postcode level
  • cluster analysis and data mining techniques
• Future work may involve the data mining of Facebook and Twitter data
  • issues of generalisation
• Visualisation of the data
![Page 46: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/46.jpg)
Any Questions?
Thanks for Listening
![Page 47: Spatio-temporal linkage of real and virtual identity](https://reader033.vdocuments.site/reader033/viewer/2022061118/545b0229af79594e128b5841/html5/thumbnails/47.jpg)
A research agenda
1 Acquire relevant real and virtual data sources and devise DBMS
2 Devise GB-wide classification of NICT usage at neighbourhood scale
3 Devise GB-wide classification of social network traffic
4 Develop enhanced worldnames site to harvest real and virtual user data
5 Undertake text analysis of worldnames user data and use to link classifications (2) and (3)
6 Devise, implement and analyse social networking application and cybergeodemographic classification