Mapping the Blogosphere in America
CS406 Assignment - Group Presentation
Brian McGee, Craig Murray, Piers Thorogood, Emlyn Whittick
Agenda
• Summary of the paper
• Paper’s key focuses
  – Geolocation of blogs
  – Indexing blogs to city units
• Related work
  – Geolocation in general
  – Alternative mapping of the blogosphere
• Conclusion
• Questions
Summary of the Paper
“Mapping the Blogosphere in America”
• Presented at the WWW2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics
• Dr. Alexander Halavais & Dr. Jia Lin
• University at Buffalo School of Informatics, NY
Summary of the Paper
• Initial phase of a long-term project
• Long-term goals:
  – Examination of American urban culture, based on information found in personal blogs
  – Observation of localised political agendas and opinion
• Short-term goals:
  – Extracting geographic information from blogs
  – Indexing blogs to ‘city units’
Geolocation of Blogs
• No single method can determine the location of a blog
• Self-hosted blogs (dedicated domain name):
  – Registrant’s address found in the domain registry
• Blogs on a blog-hosting service:
  – Location may be included in the user profile
  – Blog may be registered with a regional blog-hosting service, e.g. ‘NYCblogger.com’
Geolocation of Blogs
• What if there is no explicit location information?
• Answer: data mining
  – Links to a CV or biography containing location information
  – Location inferred from links to local weather, schools, churches or other communities
Geolocation of Blogs
• Manual pilot run on 1500 US blogs
  – 60% successful identification for self-hosted blogs
  – 30% for blogs on blog-hosting sites
• An automatic algorithm is in development
• Current approach, in order of priority:
  1. GeoURL metadata, if available
  2. Whois query for unrecognised domains
  3. Profile information, if available
  4. Blogchalking, if available
  5. Text on the index page (bio / résumé / regionalised links)
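The current approach is a priority list: try each source in turn and stop at the first one that yields a location. A minimal Python sketch of that fallback chain follows, assuming a blog is a plain dictionary with hypothetical keys. Only the GeoURL step parses anything real (the `ICBM` meta tag holding "latitude, longitude"); the Whois, profile and blogchalking steps assume their data was fetched and parsed elsewhere, and the index-page step is a crude "I live in <Place>" pattern standing in for real text mining. It is an illustration, not the authors' algorithm.

```python
import re
from typing import Callable, Optional

# Hypothetical blog record: a dict with optional keys
#   "html"                      raw HTML of the blog's index page
#   "whois_registrant_address"  registrant address, assumed already fetched and parsed
#   "profile_location"          location field from a hosting-service profile
#   "blogchalk_location"        location from blogchalking tags, assumed pre-parsed
Blog = dict

# GeoURL-style tag: <meta name="ICBM" content="lat, lon">; assumes name comes before content.
ICBM_RE = re.compile(
    r"""<meta\s+name=["']ICBM["']\s+content=["']\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)""",
    re.IGNORECASE,
)
# Crude stand-in for mining index-page text: "I live in <Capitalised Place>".
LIVE_IN_RE = re.compile(r"\bI live in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")


def from_geourl(blog: Blog) -> Optional[str]:
    match = ICBM_RE.search(blog.get("html", ""))
    return f"{float(match.group(1)):.4f},{float(match.group(2)):.4f}" if match else None


def from_whois(blog: Blog) -> Optional[str]:
    return blog.get("whois_registrant_address")


def from_profile(blog: Blog) -> Optional[str]:
    return blog.get("profile_location")


def from_blogchalk(blog: Blog) -> Optional[str]:
    return blog.get("blogchalk_location")


def from_index_text(blog: Blog) -> Optional[str]:
    match = LIVE_IN_RE.search(blog.get("html", ""))
    return match.group(1) if match else None


# Same priority order as the slide: the first step that returns a location wins.
STEPS: list[Callable[[Blog], Optional[str]]] = [
    from_geourl, from_whois, from_profile, from_blogchalk, from_index_text,
]


def locate(blog: Blog) -> tuple[Optional[str], Optional[str]]:
    """Return (name of the step that succeeded, location), or (None, None)."""
    for step in STEPS:
        location = step(blog)
        if location:
            return step.__name__, location
    return None, None
```

Keeping each source as a separate function makes it easy to reorder the chain and to record which source succeeded, which is what a success-rate comparison like the 60% / 30% figures needs.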
Indexing Blogs to City Units
• How do we standardise geolocation data?
• Varying levels of detail:
  – Self-hosted blog: precise
    • Street address
    • 9-digit zip code
  – Blog-hosting site: can be vague
    • City, state, or even nation
    • Local links can provide telephone area codes
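On a hosted blog, a phone number in a link to a local business may be the only geographic clue. A minimal sketch, assuming North American numbers written in common formats such as (716) 555-0123, pulls out and counts the three-digit area codes. Turning an area code into a region would need a separate lookup table, which is not shown.

```python
import re
from collections import Counter

# North American numbers such as (716) 555-0123, 716-555-0123 or 716.555.0123.
# Area code and exchange both start with 2-9; the lookarounds stop matches
# from starting or ending inside a longer run of digits (e.g. a ZIP+4 code).
PHONE_RE = re.compile(r"(?<!\d)\(?([2-9]\d{2})\)?[\s.-]?[2-9]\d{2}[\s.-]\d{4}(?!\d)")


def area_codes(text: str) -> Counter:
    """Count the area codes of phone numbers found in a page's text."""
    return Counter(PHONE_RE.findall(text))
```

An area code is a coarse unit (one code can span several towns), so it is best treated as supporting evidence alongside other clues.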
Indexing Blogs to City Units
• How can this be converted to a standard unit?
• Labelling by city is vague:
  – Expansion of city limits
  – Emergence of ‘second cities’ between big cities
• Requirement for an urban unit:
  – “Geographic clusters consisting of certain sizes of population sharing physical proximity” [1]
Indexing Blogs to City Units
• The 3-digit zip code
  – Widely used in marketing and political strategies
  – Represents 4 different types of area:
    • Metropolitan city
    • Cluster of suburban cities and towns
    • Cluster of cities not immediately adjacent to a metropolitan area
    • Metropolitan cities plus embedded cities and towns
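The 3-digit unit is the leading three digits of a zip code (the prefix that identifies a USPS sectional center facility), so precise 9-digit codes and plain 5-digit codes reduce to the same unit. A small normaliser, assuming input strings such as "14260" or "14260-1234":

```python
import re
from typing import Optional

# 5-digit zip, optionally followed by the 4-digit ZIP+4 extension (with or without a hyphen).
ZIP_RE = re.compile(r"^(\d{3})\d{2}(?:-?\d{4})?$")


def zip3(zip_code: str) -> Optional[str]:
    """Return the 3-digit prefix of a 5-digit or ZIP+4 code, or None if malformed."""
    match = ZIP_RE.match(zip_code.strip())
    return match.group(1) if match else None
```

Returning None for malformed input, rather than guessing, keeps vague locations (city or state only) out of the unit counts until they can be resolved another way.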
Indexing Blogs to City Units
• Preliminary examination of blog distribution in the US
  – Users taken from LiveJournal and Diaryland
  – Both services include location in the user profile
• 797 different 3-digit zip codes found
• Overall distribution consistent with population distribution and concentrations of high socio-economic status
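Counting blogs per unit is then a tally over the zip codes found in user profiles. The sketch below is run on made-up sample values, not data from the paper; comparing the counts with population would additionally need census figures, which are not shown.

```python
from collections import Counter


def zip3_distribution(profile_zips: list[str]) -> Counter:
    """Count blogs per 3-digit zip prefix; entries that are not 5 or 9 digits are skipped."""
    counts: Counter = Counter()
    for raw in profile_zips:
        digits = raw.strip().replace("-", "")
        # isascii() guards against non-ASCII digit characters that isdigit() accepts.
        if digits.isascii() and digits.isdigit() and len(digits) in (5, 9):
            counts[digits[:3]] += 1
    return counts
```

`len()` of the result gives the number of distinct units observed, the quantity behind a figure such as "797 different 3-digit zip codes".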
Indexing Blogs to City Units
Figure 1. Distribution of blogs in sample [4]
Limitations of the Paper
• The authenticity of self-reported geographic information is questionable
• 3-digit zip codes:
  – Overstate the number of bloggers in metropolitan cities
  – Can group many small cities into one unit, despite no evidence of common traits or social cohesion
  – The paper suggests dividing units by socio-economic profile
Related Work
• Geolocation in general
• Non-geographical mapping of the blogosphere:
  – Hyperlink maps
  – Kohonen self-organising maps
Geolocation
• Geographic Information Systems (GIS)
  – Geoparsing
  – Geocoding
• Methods
  – “Whois” records
  – Blogging sites requiring registration
  – Postal addresses and telephone numbers
  – Geographic feature names
  – Hyperlinks
  – Metadata
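Geoparsing (finding place names in text) and geocoding (turning them into coordinates) can be illustrated with a toy gazetteer. The three entries and their rounded coordinates below are a hand-made assumption for illustration; real systems rely on large gazetteers.

```python
import re

# Toy gazetteer (hypothetical, hand-made): place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Buffalo": (42.89, -78.88),
    "Chicago": (41.88, -87.63),
    "New York": (40.71, -74.01),
}

# Longest names first, so a multi-word name wins over any shorter name it contains.
PLACE_RE = re.compile(
    r"\b(" + "|".join(re.escape(n) for n in sorted(GAZETTEER, key=len, reverse=True)) + r")\b"
)


def geoparse(text: str) -> list[str]:
    """Geoparsing: find gazetteer place names mentioned in the text, in order."""
    return PLACE_RE.findall(text)


def geocode(names: list[str]) -> list[tuple[float, float]]:
    """Geocoding: turn recognised place names into coordinates."""
    return [GAZETTEER[name] for name in names]
```

Plain name matching cannot tell the city in "Buffalo wings" from a real location, nor one Springfield from another; disambiguation is the hard part of geoparsing.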
Geolocation
• Uses
  – Information retrieval based on geographic criteria
  – Tailoring of advertising
  – Sociological and political trends: mapping the ‘buzz’ of a topic shows which areas are most interested in it
• Problems
  – Increasing number of mobile devices, so a blogger’s location may change from post to post
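Mapping the ‘buzz’ of a topic can be sketched as the share of each region’s posts that mention it; normalising by post count stops regions with many bloggers from dominating through volume alone. Posts are assumed to arrive as (region, text) pairs with regions given as 3-digit zip prefixes, and the substring match is a deliberately naive stand-in for real topic detection.

```python
from collections import defaultdict


def topic_buzz(posts: list[tuple[str, str]], topic: str) -> dict[str, float]:
    """Share of each region's posts that mention the topic (case-insensitive substring match)."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    needle = topic.lower()
    for region, text in posts:
        totals[region] += 1
        if needle in text.lower():
            hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}
```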
Geolocation
• Trends
  – Blogging hotspots
  – More widespread blogging in the eastern US
Figure 2. Blogging Hotspots [6]
Geolocation
• Trends
  – Analysing blogs by geography shows where interest lies
  – A correlation can be seen between where blogs mention a restaurant chain and where its restaurants are located
Figure 3. Steak n Shake Restaurants [6]
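One simple way to quantify a correlation like this is Pearson’s coefficient over regions, pairing a per-region count of blog mentions with a per-region count of outlets. This is an illustrative choice, not necessarily the method used in [6], and the counts in the usage check are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def regional_correlation(mentions: dict[str, int], outlets: dict[str, int]) -> float:
    """Correlate blog mentions with outlet counts over the regions present in both maps."""
    regions = sorted(mentions.keys() & outlets.keys())
    return pearson([mentions[r] for r in regions], [outlets[r] for r in regions])
```

Raw mention counts also track how many bloggers a region has, so in practice the mentions would first be normalised per blogger, as with the buzz measure.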
Mapping the Blogosphere
• Other methods of mapping the blogosphere:
  – Mapping hyperlinks
  – Self-organising maps
  – Mapping communities
Mapping Hyperlinks
Figure 5. Outbound links [1]
Figure 6. Inbound links [1]
• Cybermap showing outbound links from, and inbound links to, www.littlegreenfootballs.com in 3D hyperbolic space
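The data behind such a map is a set of directed links between sites. Given crawled (source, target) pairs, the outbound and inbound neighbourhoods of one site are two set comprehensions; the hostnames in the usage check are placeholders, and the 3D hyperbolic layout itself is not attempted here.

```python
def link_neighbourhood(edges: list[tuple[str, str]], site: str) -> tuple[set[str], set[str]]:
    """Return (outbound, inbound): sites that `site` links to, and sites that link to it."""
    # Self-links are ignored; duplicate links collapse because sets are used.
    outbound = {dst for src, dst in edges if src == site and dst != site}
    inbound = {src for src, dst in edges if dst == site and src != site}
    return outbound, inbound
```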
Self-Organising Maps
• Neural networks such as Kohonen SOMs can be used to map the blogosphere
• Advantages
  – Perform clustering of the input data
  – Map the clusters onto a 2D surface for easy visualisation
Figure 7. Kohonen Map of Blogs [7]
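A Kohonen map can be sketched in pure Python. Each blog is assumed to be already reduced to a numeric feature vector (for example, normalised word frequencies); that representation, the grid size, the linear decay schedules and the Gaussian neighbourhood are illustrative choices, not details taken from [7].

```python
import math
import random


def best_unit(weights, x):
    """Grid position (row, col) of the node whose weight vector is nearest to x."""
    best, best_dist = (0, 0), math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors and return its weight grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, with a floor
        for x in rng.sample(data, len(data)):    # visit samples in random order
            br, bc = best_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood on the grid around the best-matching unit.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
    return weights
```

Blogs whose vectors land on the same or neighbouring units are similar, so placing each blog at its best-matching unit gives the 2D clustering the slide describes.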
Mapping Communities
• Location, friendships and communities are all interrelated
Figure 4. The importance of location, interest and age in forming blogging communities [5]
Conclusion
• Summary of the paper
• Geolocation
  – Methods
  – Uses
  – Trends
• Alternative mapping methods
  – Hyperlink mapping
  – Self-organising maps
References
[1] R. Ackland, “Mapping the U.S. Political Blogosphere: Are Conservative Bloggers More Prominent?”, 2005.
[2] O. Buyukokkten et al., “Exploiting geographical location information of Web pages,” In Proceedings of WebDB-99, the 1999 ACM SIGMOD Workshop on the Web and Databases, 1999.
[3] B. Gueye et al., “Constraint-Based Geolocation of Internet Hosts,” In Proceedings of IMC ’04, Sicily, 2004.
[4] A. Halavais and J. Lin, “Mapping the Blogosphere in America,” In Proceedings of the Thirteenth International World Wide Web Conference (WWW2004), New York, 2004.
[5] R. Kumar et al., “Structure and Evolution of Blogspace,” Communications of the ACM, vol. 47, no. 12, pp. 35-39, 2004.
[6] L. Lloyd, P. Kaulgud, and S. Skiena, “Newspapers vs. blogs: Who gets the scoop?” In AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs (AAAI-CAAW), California, 2006.
[7] J. Merelo-Guervos et al., “Mapping weblog communities,” Depto. Arquitectura y Tecnología de Computadores, Universidad de Granada, 2006.