Post on 21-Dec-2015
![Page 1: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/1.jpg)
Using Search Engines and Web Crawlers in Social Science Research
Mike Thelwall
Head, Statistical Cybermetrics Research Group
University of Wolverhampton, UK
http://linkanalysis.wlv.ac.uk
RC33, August 2004
![Page 2: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/2.jpg)
Link Analysis in Social Science Research

- Use to study web phenomena
  - E.g. NGO web site interlinking
  - E.g. university web site interlinking
- Use to study offline phenomena with web aspects
  - E.g. scholarly communication
  - E.g. the perception of news events
- The web is a free, accessible massive data source for information about many aspects of life
![Page 3: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/3.jpg)
What use is hyperlink data to qualitative researchers?

- Part of a mixed methodology
- Numbers to back up theories
- To obtain samples of types of Web pages for qualitative analyses
- Background information on how the Web is used
![Page 4: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/4.jpg)
Quick example 1:
UK university interlinking with geographic clusters indicated
![Page 5: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/5.jpg)
Quick example 2:
Asia-Pacific university interlinking.
{Research with Alastair Smith, VUW, NZ}
![Page 6: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/6.jpg)
Quick example 3:
Geographic interlinking trends for UK universities.
![Page 7: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/7.jpg)
Talk overview

- A social science approach for link analysis
- Data collection with commercial search engines
- Data collection and analysis with SocSciBot
![Page 8: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/8.jpg)
A social science approach for link analysis 1: Preliminary steps

1. Formulate an appropriate research question, taking into account existing knowledge of web structure
2. Conduct a pilot study
3. Identify web pages or sites that are appropriate to address a research question
4. Collect link data from a commercial search engine or a personal crawler, taking appropriate safeguards to ensure that the results obtained are accurate
![Page 9: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/9.jpg)
A social science approach for link analysis 2: Validation
5. Partially validate the link count results through correlation tests
6. Partially validate the interpretation of the results through a link classification exercise or web author interviews
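
The correlation test in step 5 can be sketched as follows. The slide does not say which correlation measure is used, so this example assumes a Spearman rank correlation, a common choice for skewed count data. The function names, the inlink counts, and the external research scores are all invented for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = shared
        pos = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: inlink counts for five sites and an external
# measure (e.g. a research score) for the same five sites.
inlink_counts = [120, 45, 300, 80, 15]
research_scores = [3.1, 2.9, 4.5, 2.8, 1.2]

rho = spearman(inlink_counts, research_scores)
print(f"Spearman rho = {rho:.2f}")
```

A high coefficient only partially validates the link counts, which is why step 6 adds a link classification exercise or author interviews.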
![Page 10: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/10.jpg)
A social science approach for link analysis 3: Reporting

8. Report results with an interpretation consistent with link classification exercise
   - include either a detailed description of the classification or exemplars to illustrate the categories
9. Report the limitations of the study and parameters used in data collection and processing
![Page 11: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/11.jpg)
Link data from commercial search engines

- Commercial search engines can give information about the existence of links in the web
- Can be used for data collection
- Advanced interfaces are usually needed, or special commands
![Page 12: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/12.jpg)
Google

- Can find all links to a given web page with the link: command
  - E.g. link:http://www.siswo.uva.nl/rc33/
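
As a small sketch, the query text for the link: command can be generated for any page URL. The helper name `google_link_query` is hypothetical, and the code only builds and URL-encodes the query string. It assumes nothing about how the query is submitted to the search engine.

```python
from urllib.parse import quote_plus


def google_link_query(page_url):
    """Build the query text for the link: command shown on the slide."""
    return f"link:{page_url}"


query = google_link_query("http://www.siswo.uva.nl/rc33/")
print(query)

# URL-encoded form, suitable for use as a query-string value.
print(quote_plus(query))
```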
![Page 13: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/13.jpg)
Yahoo! site-specific searches

- Yahoo! allows searching for links between pairs of web sites/web spaces
  - E.g. linkdomain:db.dk +site:ac.uk returns web pages in the ac.uk domain that link to the db.dk site
…ac.uk/… …db.dk/…
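
When links between several sites are of interest, one query is needed for each ordered pair of sites. The sketch below generates these queries using the linkdomain: and site: syntax from the slide. The function name and the choice of a dictionary keyed by (source, target) are assumptions made for illustration, and the code does not submit the queries.

```python
from itertools import permutations


def yahoo_interlink_query(source_domain, target_domain):
    """Query for pages in source_domain that link to target_domain."""
    return f"linkdomain:{target_domain} +site:{source_domain}"


# The two web spaces from the slide; any list of domains could be used.
domains = ["ac.uk", "db.dk"]

# One query per ordered (source, target) pair, since links are directional.
queries = {
    (source, target): yahoo_interlink_query(source, target)
    for source, target in permutations(domains, 2)
}

for (source, target), query in queries.items():
    print(f"{source} -> {target}: {query}")
```

For n sites this produces n × (n − 1) queries, so the number of searches grows quickly with the size of the site set.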
![Page 14: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/14.jpg)
SocSciBot

- Personal crawler for link research
- Available free at socscibot.wlv.ac.uk
- Crawls sets of web sites and analyses the links between them, producing:
  - Link lists
  - Link counts
  - Network diagrams
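
The kind of output listed above can be illustrated with a minimal sketch that extracts links from already-downloaded pages and counts the links between sites. This is not SocSciBot's implementation. The class and function names, the in-memory `pages` dictionary, and the `example` domains are all hypothetical, and a site is assumed to be identified by its hostname.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def site_of(url):
    """Assumption: a 'site' is identified by the URL's hostname."""
    return urlparse(url).hostname


def count_intersite_links(pages, sites):
    """Count links between different sites in the chosen set."""
    counts = Counter()
    for page_url, html in pages.items():
        source = site_of(page_url)
        parser = LinkExtractor(page_url)
        parser.feed(html)
        for link in parser.links:
            target = site_of(link)
            if source in sites and target in sites and source != target:
                counts[(source, target)] += 1
    return counts


# Hypothetical crawl result: page URL -> HTML source.
pages = {
    "http://a.example/index.html":
        '<a href="http://b.example/">B</a> <a href="/about.html">About</a>',
    "http://b.example/":
        '<a href="http://a.example/index.html">A</a> '
        '<a href="http://a.example/x">A2</a> '
        '<a href="http://c.example/">C</a>',
}
sites = {"a.example", "b.example"}

counts = count_intersite_links(pages, sites)
for (source, target), n in sorted(counts.items()):
    print(f"{source} -> {target}: {n}")
```

Links within a site and links to sites outside the chosen set are ignored here. The resulting (source, target) counts are the edge weights a network diagram would be drawn from.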
![Page 15: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/15.jpg)
![Page 16: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/16.jpg)
![Page 17: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/17.jpg)
![Page 18: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/18.jpg)
![Page 19: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/19.jpg)
![Page 20: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/20.jpg)
![Page 21: Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical Cybermetrics Research Group University of Wolverhampton,](https://reader030.vdocuments.site/reader030/viewer/2022032521/56649d585503460f94a38412/html5/thumbnails/21.jpg)
Reprise: Link Analysis in Social Science Research

- Use to study web phenomena
  - E.g. NGO web site interlinking
  - E.g. university web site interlinking
- Use to study offline phenomena with web aspects
  - E.g. scholarly communication
  - E.g. the perception of news events
- The web is a free, accessible massive data source for information about many aspects of life
- But don’t forget the need for validation!