What is web link mining?
Mike Thelwall, Statistical Cybermetrics Research Group, University of Wolverhampton, UK
Virtual Knowledge Studio (VKS), Information Studies

1. Definition and scope
Link analysis is:
- Mapping and measuring hyperlink networks for collections of web pages or sites
- A flexible toolkit of methods and software rather than a field or single technique
- A new source of information about:
  - relationships between people, organisations and information, via the web
  - the impact of information and ideas
- Used in: media studies, information science, politics, marketing, sociology

Link Analysis: Motivation
- Individual hyperlinks reflect concrete creation reasons, such as connections between web page contents or creators
- Counts of large numbers of hyperlinks may reflect wider underlying social processes
- Links may reflect phenomena that have previously been difficult to study, e.g.:
  - informal scholarly communication
  - informal news discussions
  - friendship patterns
  - “amateur” politics

But link patterns vary by context…
- Commercial web sites tend not to link much
- Academic and government web sites link more
- Disciplinary differences: e.g., History Web use is very low, Chemistry is very high
- Individual projects/resources can have an enormous impact upon web sites
  - E.g., Arts web sites are often for specific exhibitions or for digital media projects
- Links are often not frequent enough to reliably reveal underlying patterns

Link Type Definitions
- Inlink: a hyperlink to a web page from anywhere
- Site inlink: a hyperlink to a web page from a different web site
- Outlink: a hyperlink from a web page to any other
- Site outlink: a hyperlink from a web page to a page in a different site
[Diagram: nodes A and B]
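
The four definitions above can be illustrated with a minimal sketch. It assumes that a "web site" can be approximated by the host name of a URL, which is a simplification not stated in the slides; the function names and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Return the host name of a URL, used here as a crude stand-in for 'web site'."""
    return urlparse(url).netloc.lower()


def classify_link(source_url: str, target_url: str) -> dict:
    """Label one hyperlink using the four link type definitions."""
    crosses_sites = site_of(source_url) != site_of(target_url)
    return {
        "inlink_of": target_url,        # every link is an inlink of its target page
        "outlink_of": source_url,       # and an outlink of its source page
        "site_inlink": crosses_sites,   # True only if source and target sites differ
        "site_outlink": crosses_sites,  # same condition, seen from the source side
    }


# Hypothetical example: a page on site A links to a page on site B.
link = classify_link("http://a.example/page1.html", "http://b.example/index.html")
```

In practice a site may span several host names, so a real study would probably need a more careful grouping rule than the host name alone.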

Indirect link types - colinks
- Useful when direct links are rare
- Indirect connection
- Co-inlinks: B and C are co-inlinked
- Co-outlinks: D and E are co-outlinked
[Diagram: nodes A, B, C, D, E and F illustrating co-inlinks and co-outlinks]
Lennart Björneborn’s terminology
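
As a rough illustration of colink counting, the sketch below counts co-inlinks and co-outlinks from a list of (source, target) pairs. The link list is an assumption made up for the example (A links to B and C; D and E both link to F); the original diagram may have been drawn differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical links as (source, target) pairs, loosely mirroring the diagram labels.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]


def co_inlink_counts(links):
    """Count, for each pair of targets, how many sources link to both of them."""
    targets_by_source = {}
    for source, target in links:
        targets_by_source.setdefault(source, set()).add(target)
    counts = Counter()
    for targets in targets_by_source.values():
        for pair in combinations(sorted(targets), 2):
            counts[pair] += 1
    return counts


def co_outlink_counts(links):
    """Count, for each pair of sources, how many targets both of them link to."""
    reversed_links = [(target, source) for source, target in links]
    return co_inlink_counts(reversed_links)


co_in = co_inlink_counts(links)    # B and C are co-inlinked (both linked from A)
co_out = co_outlink_counts(links)  # D and E are co-outlinked (both link to F)
```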

What to count?
- Links between individual pages
- Links between entire web sites
  - Site A links to site B if any page in site A links to any page in site B
[Diagram: site A linking to site B]
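
The site-level counting rule above can be sketched as follows: many page-to-page links between the same two sites collapse into a single site-to-site link. As before, treating the host name as the site is an assumption, and the URLs are hypothetical.

```python
from urllib.parse import urlparse


def site_links(page_links):
    """Collapse page-to-page links into a set of distinct site-to-site links."""
    result = set()
    for source_url, target_url in page_links:
        source_site = urlparse(source_url).netloc.lower()
        target_site = urlparse(target_url).netloc.lower()
        if source_site != target_site:  # ignore links within a single site
            result.add((source_site, target_site))
    return result


page_links = [
    ("http://a.example/1.html", "http://b.example/x.html"),
    ("http://a.example/2.html", "http://b.example/y.html"),
    ("http://a.example/1.html", "http://a.example/2.html"),
]
print(site_links(page_links))  # {('a.example', 'b.example')}
```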

2. Link Networks - Methods
- Draw a network diagram
  - LexiURL Searcher, Issue Crawler, SocSciBot (web networks)
  - Pajek, UCINET, NetMiner (generic networks)
  - About 10-50 sites/pages is recommended
  - Diagrams should reveal patterns in the data
- Social Network Analysis statistics
  - E.g., density, degree centrality
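
To make the two statistics named above concrete, here is a small sketch using the standard definitions for a directed network: density as links present divided by links possible, and degree centrality as in-degree and out-degree divided by n - 1. The tools listed above compute these for you; the node and edge data here are hypothetical.

```python
def density(nodes, edges):
    """Directed network density: links present divided by links possible."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(nodes, edges):
    """Return (in-degree, out-degree) per node, each normalised by n - 1."""
    n = len(nodes)
    indeg = {node: 0 for node in nodes}
    outdeg = {node: 0 for node in nodes}
    for source, target in edges:
        outdeg[source] += 1
        indeg[target] += 1
    scale = n - 1 if n > 1 else 1
    return {node: (indeg[node] / scale, outdeg[node] / scale) for node in nodes}


# Hypothetical three-site network: A links to B and C, B links to C.
nodes = ["A", "B", "C"]
edges = {("A", "B"), ("A", "C"), ("B", "C")}

network_density = density(nodes, edges)
centrality = degree_centrality(nodes, edges)
```

Here 3 of the 6 possible directed links are present, so the density is 0.5, and C has the highest normalised in-degree.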

Direct link networks
- Start with a list of web sites (or pages)
- Build from many linkdomain:A site:B Yahoo searches
  - Powerful and free way to scan the entire web for links!
  - Returns pages in web site B that link to web site A
  - Can be automated with LexiURL Searcher
  - Or use SocSciBot to crawl web sites and get links
- E.g., linkdomain:ox.ac.uk site:pku.edu.cn
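
Building a direct link network needs one query per ordered pair of sites. The sketch below only generates the query strings in the form quoted above; it does not submit them to any search engine, and the function name is hypothetical.

```python
from itertools import permutations


def direct_link_queries(domains):
    """One 'linkdomain:A site:B' query per ordered pair of distinct domains."""
    return [f"linkdomain:{a} site:{b}" for a, b in permutations(domains, 2)]


queries = direct_link_queries(["ox.ac.uk", "pku.edu.cn"])
# ['linkdomain:ox.ac.uk site:pku.edu.cn', 'linkdomain:pku.edu.cn site:ox.ac.uk']
```

A list of n sites yields n × (n - 1) queries, which is one reason automation is suggested above.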

Direct links example: Top ASEAN universities network (with Han Woo Park)
- Arrows represent > 100 links
- Unconnected universities removed

Co-inlink networks
- Start with a list of web sites or pages
- Build from many linkdomain:A linkdomain:B -site:A -site:B Yahoo searches
  - can be automated in LexiURL Searcher
- Suitable for commercial or competitive web sites that do not interlink
  - normally better than direct link diagrams
- A web environment (co-inlink) network for a single web site:
  - finds web sites that link to it
  - picks the top 50 web sites linked to by these web sites
  - draws a co-inlink diagram of these web sites
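
The three web environment steps above can be sketched on link data that has already been collected. This is an illustration under assumptions, not the procedure used by LexiURL Searcher: the `outlinks` mapping, the function name and the choice to keep the seed site in the result are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def web_environment(seed, outlinks, top_n=50):
    """outlinks maps each site to the set of sites it links to (hypothetical input)."""
    # Step 1: find the sites that link to the seed site.
    linkers = [s for s, targets in outlinks.items() if seed in targets and s != seed]

    # Step 2: pick the sites most often linked to by those linking sites.
    popularity = Counter()
    for site in linkers:
        for target in outlinks[site]:
            if target != site:
                popularity[target] += 1
    top_sites = [site for site, _ in popularity.most_common(top_n)]

    # Step 3: count co-inlinks between the top sites; these are the diagram's edges.
    co_inlinks = Counter()
    for site in linkers:
        shared = sorted(outlinks[site] & set(top_sites))
        for pair in combinations(shared, 2):
            co_inlinks[pair] += 1
    return top_sites, co_inlinks


# Hypothetical data: p and q link to the seed; r does not.
outlinks = {"p": {"seed", "x", "y"}, "q": {"seed", "x"}, "r": {"z"}}
top_sites, co_inlinks = web_environment("seed", outlinks, top_n=2)
```

The co-inlink counts can then be passed to a network drawing tool, with heavier edges for pairs that are co-inlinked more often.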

Indirect links example: the web environment of ZigZagMag
Another example: no patterns, but interesting

3. Link Impact - Methods
- Inlink counts are often used as an impact/visibility indicator
  - Impact = “The effect or impression of one thing on another”, “to have an effect” *
- Compare links to web sites to assess which site/organisation has the most online impact
* http://www.thefreedictionary.com/impact, definition 3
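
A comparison of this kind can be sketched as a simple ranking. The version below counts distinct linking sites rather than linking pages, which is one possible counting choice and an assumption here; the function names and data are hypothetical.

```python
from collections import defaultdict


def site_inlink_counts(site_links):
    """Count distinct linking sites per target, from (source_site, target_site) pairs."""
    linkers = defaultdict(set)
    for source, target in site_links:
        if source != target:  # a site's links to itself are not site inlinks
            linkers[target].add(source)
    return {site: len(sources) for site, sources in linkers.items()}


def rank_by_impact(site_links):
    """Sort sites by inlink count, highest first, breaking ties by name."""
    counts = site_inlink_counts(site_links)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical links: x is linked from two sites, y from one.
example_links = [("a", "x"), ("b", "x"), ("a", "y"), ("a", "x"), ("x", "x")]
ranking = rank_by_impact(example_links)
```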

Link Impact Reports
- Standardised comparative analysis of the link impact of web sites
- Example audit: http://cybermetrics.wlv.ac.uk/audit/101/
- Similar reports can be created for non-link impact (citation impact): http://cybermetrics.wlv.ac.uk/audit/books/

Total impact example
Impact spread example

4. Tools
E.g., …
[Figure: Links to UK universities against their research productivity]
The reason for the strong correlation is the quantity of Web publication, not its quality
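
A correlation like the one described above can be checked with a rank correlation. The slides do not say which coefficient was used; Spearman's rho is chosen here as an assumption because link counts are usually highly skewed. The numbers in the example are made up for illustration and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up inlink counts and productivity scores for four institutions.
inlinks = [10, 200, 3000, 40000]
productivity = [1.2, 2.5, 3.1, 4.8]
rho = spearman(inlinks, productivity)
```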
5. Statistical analyses…
More statistical analyses…
Universities tend to link to neighbours
6. Content analysis
- Content analysis of a random sample of links is recommended to get context
- Example of usefulness of content analysis results:
  - 90% of links between UK university sites relate to scholarly activity
  - But less than 1% are equivalent to citations
- Link counts do not measure research but are a natural by-product of scholarly activity
- Use link counts to track (an aspect of) communication
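
Drawing the random sample for content analysis can be sketched as below. The fixed seed is an assumption added so that the same sample can be drawn again later; the function name, sample size and link list are hypothetical.

```python
import random


def sample_links(links, sample_size, seed=0):
    """Draw a simple random sample of links, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    links = list(links)
    return rng.sample(links, min(sample_size, len(links)))


# Hypothetical list of 1,000 collected links, identified by number.
all_links = [f"link-{i}" for i in range(1000)]
to_classify = sample_links(all_links, 100)
```

Each sampled link would then be visited and classified by hand, for example by the reason it was created.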

7. Summary
- Link networks
  - To investigate relationship patterns within collections of web sites
- Link impact
  - Compare impact of web sites using inlinks
- Methods
  - Toolkit of visual and statistical methods
  - Specialist software like LexiURL Searcher & Issue Crawler
- Use to investigate web phenomena or offline phenomena reflected online in web sites

Books
Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
Thelwall, M. (2004). Link analysis: An information science approach. San Diego: Academic Press.
http://lexiurl.wlv.ac.uk
http://webometrics.wlv.ac.uk
http://www.issuecrawler.net