What is web link mining?

Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK

Virtual Knowledge Studio (VKS)
Information Studies

Page 2

1. Definition and scope

Link analysis is:
- mapping and measuring hyperlink networks for collections of web pages or sites
- a flexible toolkit of methods and software rather than a field or single technique

A new source of information about:
- relationships between people, organisations and information, via the web
- the impact of information and ideas

Used in: media studies, information science, politics, marketing, sociology

Page 3

Link Analysis: Motivation

- Individual hyperlinks reflect concrete creation reasons, such as connections between web page contents or creators
- Counts of large numbers of hyperlinks may reflect wider underlying social processes
- Links may reflect phenomena that have previously been difficult to study, e.g.:
  - informal scholarly communication
  - informal news discussions
  - friendship patterns
  - “amateur” politics

Page 4

But link patterns vary by context…

- Commercial web sites tend not to link much
- Academic and government web sites link more
- Disciplinary differences: e.g., History Web use is very low, Chemistry is very high
- Individual projects/resources can have an enormous impact upon web sites
  - E.g., Arts web sites are often for specific exhibitions or for digital media projects
- Links are often not frequent enough to reliably reveal underlying patterns

Page 5

Link Type Definitions

- Inlink – a hyperlink to a web page from anywhere
- Site inlink – a hyperlink to a web page from a different web site
- Outlink – a hyperlink from a web page to any other
- Site outlink – a hyperlink from a web page to a page in a different site

[Diagram: pages A and B]
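
The four definitions above can be sketched in Python. This is a minimal illustration, not part of the original slides: it assumes links are stored as (source URL, target URL) pairs and, as a simplification, treats the hostname as the "web site". The URLs and function names are hypothetical.

```python
from urllib.parse import urlparse


def site_of(url):
    # Assumption: the hostname stands in for the "web site".
    # Real studies may group hostnames by domain instead.
    return (urlparse(url).hostname or "").lower()


def inlinks(page, links):
    # Every link pointing to the page, from anywhere.
    return [(s, t) for s, t in links if t == page]


def site_inlinks(page, links):
    # Only links to the page that come from a different site.
    return [(s, t) for s, t in inlinks(page, links) if site_of(s) != site_of(t)]


def outlinks(page, links):
    # Every link leaving the page, to anywhere.
    return [(s, t) for s, t in links if s == page]


def site_outlinks(page, links):
    # Only links from the page that point to a different site.
    return [(s, t) for s, t in outlinks(page, links) if site_of(s) != site_of(t)]


# Hypothetical example data (source page, target page).
links = [
    ("http://a.example/index.html", "http://a.example/about.html"),
    ("http://b.example/page.html", "http://a.example/about.html"),
    ("http://a.example/about.html", "http://c.example/"),
]
about = "http://a.example/about.html"
print(len(inlinks(about, links)), len(site_inlinks(about, links)))
print(len(outlinks(about, links)), len(site_outlinks(about, links)))
```

In this toy data the page has two inlinks but only one site inlink, because one of the links comes from a page on the same host.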

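Page 6

Indirect link types - colinks

- Useful when direct links are rare
- Indirect connection
- Co-inlinks: B and C are co-inlinked (both are linked to by the same page)
- Co-outlinks: D and E are co-outlinked (both link to the same page)

[Diagram: pages A, B, C, D, E and F, illustrating co-inlinks and co-outlinks]

Lennart Björneborn’s terminology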
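
As a rough sketch of how colinks can be counted from a list of direct links (not taken from the slides): two pages are co-inlinked once for every page that links to both, and co-outlinked once for every page that both link to. The edge list below is an assumed reading of the diagram's labels, and `colink_counts` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations


def colink_counts(links):
    """Return (co_inlinks, co_outlinks) as dicts mapping a sorted
    pair of pages to the number of pages that connect them."""
    targets_of = defaultdict(set)  # page -> pages it links to
    sources_of = defaultdict(set)  # page -> pages linking to it
    for source, target in links:
        targets_of[source].add(target)
        sources_of[target].add(source)

    co_in = defaultdict(int)
    for targets in targets_of.values():
        # Every pair of pages linked from the same source is co-inlinked.
        for pair in combinations(sorted(targets), 2):
            co_in[pair] += 1

    co_out = defaultdict(int)
    for sources in sources_of.values():
        # Every pair of pages linking to the same target is co-outlinked.
        for pair in combinations(sorted(sources), 2):
            co_out[pair] += 1

    return dict(co_in), dict(co_out)


# Assumed arrangement: A links to B and C; D and E both link to F.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]
co_in, co_out = colink_counts(links)
print(co_in)
print(co_out)
```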

Page 7

What to count?

- Links between individual pages
- Links between entire web sites
  - Site A links to site B if any page in site A links to any page in site B

[Diagram: site A linking to site B]
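
The site-level rule above (site A links to site B if any page in A links to any page in B) can be sketched as a simple aggregation. This is an illustrative sketch only: it assumes page links are (source URL, target URL) pairs and uses the hostname as the site, and the URLs are hypothetical.

```python
from urllib.parse import urlparse


def site_of(url):
    # Assumption: the hostname stands in for the "web site".
    return (urlparse(url).hostname or "").lower()


def site_links(page_links):
    """Collapse page-to-page links into a set of site-to-site links.
    Links within a single site are ignored."""
    result = set()
    for source, target in page_links:
        a, b = site_of(source), site_of(target)
        if a != b:
            result.add((a, b))
    return result


# Hypothetical page-level links.
page_links = [
    ("http://a.example/p1.html", "http://b.example/x.html"),
    ("http://a.example/p2.html", "http://b.example/y.html"),
    ("http://a.example/p1.html", "http://a.example/p2.html"),
]
print(site_links(page_links))
```

Counting each pair of sites once, however many pages interlink, keeps a single heavily linked page or template from dominating the counts.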

Page 8

2. Link Networks – Methods

- Draw a network diagram
  - LexiURL Searcher, Issue Crawler, SocSciBot (web networks)
  - Pajek, UCINET, NetMiner (generic networks)
  - About 10-50 sites/pages is recommended
  - Diagrams should reveal patterns in the data
- Social Network Analysis statistics
  - E.g., density, degree centrality
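
For illustration, the two statistics named above can be computed by hand for a small directed link network. This sketch uses the standard definitions (density as the share of possible directed links that are present, in-degree centrality as inlinks divided by n - 1); the node names and links are hypothetical, and the tools listed above report these figures directly.

```python
def density(nodes, edges):
    """Share of the n*(n-1) possible directed links that exist.
    `edges` is a set of distinct (source, target) pairs without self-links."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))


def indegree_centrality(nodes, edges):
    """Inlink count of each node, normalised by the maximum possible (n - 1)."""
    n = len(nodes)
    counts = {node: 0 for node in nodes}
    for _source, target in edges:
        counts[target] += 1
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: count / (n - 1) for node, count in counts.items()}


# Hypothetical network of four sites.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"), ("C", "B"), ("D", "B"), ("B", "A")}
print(density(nodes, edges))
print(indegree_centrality(nodes, edges))
```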

Page 9

Direct link networks

- Start with a list of web sites (or pages)
- Build from many linkdomain:A site:B Yahoo searches
  - Powerful and free way to scan the entire web for links!
  - Returns pages in web site B that link to web site A
  - Can be automated with LexiURL Searcher
- Or use SocSciBot to crawl web sites and get links
- e.g., linkdomain:ox.ac.uk site:pku.edu.cn
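
A minimal sketch of the "many searches" step: one query string per ordered pair of sites, using the query syntax shown on the slide. The code only builds the strings and does not submit them anywhere; `direct_link_queries` is a hypothetical helper, and `example.edu` is a placeholder domain added for illustration.

```python
from itertools import permutations


def direct_link_queries(domains):
    """Map each ordered pair (a, b) to the query that finds pages
    in site b linking to site a, i.e. links running from b to a."""
    return {
        (a, b): f"linkdomain:{a} site:{b}"
        for a, b in permutations(domains, 2)
    }


# ox.ac.uk and pku.edu.cn come from the slide; example.edu is a placeholder.
domains = ["ox.ac.uk", "pku.edu.cn", "example.edu"]
queries = direct_link_queries(domains)
for pair, query in queries.items():
    print(pair, "->", query)
```

A list of n sites needs n × (n - 1) queries, which is one reason small networks are easier to build this way.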

Page 10

Top ASEAN universities network

Direct links example (with Han Woo Park)

- Arrows represent > 100 links
- Unconnected universities removed

Page 11

Co-inlink networks

- Start with a list of web sites or pages
- Build from many linkdomain:A linkdomain:B -site:A -site:B Yahoo searches
  - Can be automated in LexiURL Searcher
- Suitable for commercial or competitive web sites that do not interlink
  - Normally better than direct link diagrams
- A web environment (co-inlink) network for a single web site:
  - finds web sites that link to it
  - picks the top 50 web sites linked to by these web sites
  - draws a co-inlink diagram of these web sites
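
The two procedures above can be sketched as follows. This is an assumption-laden illustration rather than how LexiURL Searcher works internally: the first function builds one co-inlink query per unordered pair of sites using the slide's syntax, and the second picks the most frequently linked-to sites among those linked from sites that link to a focus site. All names and domains are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_inlink_queries(domains):
    """One query per unordered pair: pages linking to both sites,
    excluding pages inside either site."""
    return {
        (a, b): f"linkdomain:{a} linkdomain:{b} -site:{a} -site:{b}"
        for a, b in combinations(sorted(domains), 2)
    }


def top_linked_sites(outlinks_by_linker, focus, top_n=50):
    """Among sites that link to `focus`, count which sites they link to
    and return the `top_n` most frequently linked-to sites."""
    counts = Counter()
    for linker, targets in outlinks_by_linker.items():
        if focus in targets:
            counts.update(t for t in targets if t != linker)
    return [site for site, _count in counts.most_common(top_n)]


# Hypothetical data: linking site -> set of sites it links to.
outlinks_by_linker = {
    "x.example": {"focus.example", "p.example"},
    "y.example": {"focus.example", "p.example", "q.example"},
    "z.example": {"q.example"},  # does not link to the focus site, so ignored
}
top = top_linked_sites(outlinks_by_linker, "focus.example", top_n=2)
pair_queries = co_inlink_queries(top)
print(top)
print(pair_queries)
```

The selected sites would then be passed pair by pair to the co-inlink queries, and the resulting counts drawn as a network.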

Page 12

Indirect links example

The web environment of ZigZagMag

Page 13

Another example – no patterns but interesting

Page 14

3. Link Impact - Methods

- Inlink counts are often used as an impact/visibility indicator
  - Impact = “The effect or impression of one thing on another”, “to have an effect” *
- Compare links to web sites to assess which site/organisation has the most online impact

* http://www.thefreedictionary.com/impact, definition 3

Page 15

Link Impact Reports

- Standardised comparative analysis of the link impact of web sites
- Example audit: http://cybermetrics.wlv.ac.uk/audit/101/
- Similar reports can be created for non-link impact (citation impact): http://cybermetrics.wlv.ac.uk/audit/books/

Page 16

Total impact example

Page 17

Impact spread example

Page 18

4. Tools

E.g., …

Page 19

5. Statistical analyses…

[Graph: links to UK universities against their research productivity]

The reason for the strong correlation is the quantity of Web publication, not its quality
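
A correlation of this kind can be checked with a few lines of Python. The slide does not say which coefficient was used; the sketch below assumes Spearman's rank correlation, a common choice for skewed counts such as links, and the numbers are invented purely for illustration.

```python
from math import sqrt


def ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented figures: inlink counts and a research productivity score
# for five hypothetical universities.
inlink_counts = [120, 450, 80, 900, 300]
productivity = [10, 35, 12, 60, 20]
print(round(spearman(inlink_counts, productivity), 3))
```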

Page 20

More statistical analyses…

Universities tend to link to neighbours

Page 21

6. Content analysis

- Content analysis of a random sample of links is recommended to get context
- Example of usefulness of content analysis results:
  - 90% of links between UK university sites relate to scholarly activity
  - But less than 1% are equivalent to citations
- Link counts do not measure research but are a natural by-product of scholarly activity
- Use link counts to track (an aspect of) communication
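
Drawing the random sample and summarising the coded results might look like the sketch below. It assumes the links are already collected as (source, target) pairs and that a human coder assigns each sampled link a category; the URLs and category labels are hypothetical.

```python
import random
from collections import Counter


def sample_links(links, k, seed=0):
    """Draw up to k links at random without replacement.
    A fixed seed makes the sample reproducible."""
    rng = random.Random(seed)
    links = list(links)
    return rng.sample(links, min(k, len(links)))


def category_shares(labels):
    """Proportion of coded links falling into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical link list.
all_links = [(f"http://uni{i}.example/", "http://other.example/") for i in range(20)]
sample = sample_links(all_links, 5, seed=1)

# Hypothetical codes assigned by a human coder to four sampled links.
shares = category_shares(["scholarly", "scholarly", "scholarly", "other"])
print(len(sample), shares)
```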

Page 22

7. Summary

- Link networks
  - To investigate relationship patterns within collections of web sites
- Link impact
  - Compare impact of web sites using inlinks
- Methods
  - Toolkit of visual and statistical methods
  - Specialist software like LexiURL Searcher & Issue Crawler
- Use to investigate web phenomena or offline phenomena reflected online in web sites

Page 23

Books

- Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
- Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
- Thelwall, M. (2004). Link analysis: An information science approach. San Diego: Academic Press.

http://lexiurl.wlv.ac.uk
http://webometrics.wlv.ac.uk
http://www.issuecrawler.net