The Ethics of Large-Scale Web Data Analysis (Webmetrics)
Mike Thelwall, Statistical Cybermetrics Research Group, University of Wolverhampton, UK
Rob Ackland, Australian Demographic and Social Research Institute, Australian National University
Virtual Knowledge Studio (VKS) Information Studies
Contents
1. What is webmetrics?
2. Context: Online access to personal information
3. Researchers’ use of personal information
4. Confidentiality and anonymity
5. Resource issues

What ethical considerations apply to collecting and analysing web data on a large scale from unaware web “publishers”?
1. What is webmetrics?
- Large-scale analysis of web-based data
- Collecting and quantitatively analysing online information
- Objective is not to find information about individuals but to identify trends
- Data gathered with VOSON, SocSciBot, Issue Crawler, LexiURL, …
Example
VOSON hyperlink network of political parties from 6 countries (Ackland and Gibson, 2006). Node size proportional to outdegree. 76 nodes.
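To make "node size proportional to outdegree" concrete, here is a minimal sketch. It assumes the network is held as a list of (source, target) hyperlink pairs. The site names, the `scale` factor and both function names are hypothetical. This is not the VOSON implementation.

```python
from collections import Counter

# Hypothetical hyperlink edge list: (source site, target site).
# These names are illustrative only, not data from the study.
links = [
    ("party-a.example", "party-b.example"),
    ("party-a.example", "party-c.example"),
    ("party-b.example", "party-c.example"),
    ("party-c.example", "party-a.example"),
    ("party-a.example", "party-b.example"),  # repeated link, counted once
]

def outdegree(edges):
    """Count the distinct sites each source site links to."""
    unique_edges = set(edges)
    counts = Counter(source for source, _ in unique_edges)
    # Sites that only receive links get an outdegree of 0.
    for _, target in unique_edges:
        counts.setdefault(target, 0)
    return dict(counts)

def node_sizes(degrees, scale=10.0):
    """Drawing size strictly proportional to outdegree (scale is arbitrary)."""
    return {node: scale * degree for node, degree in degrees.items()}

degrees = outdegree(links)
sizes = node_sizes(degrees)
```

In this toy network `party-a.example` links to two distinct sites, so it is drawn twice as large as the other two nodes.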
Example: Links between EU universities
[Figure: AltaVista link searches; normalised linking, smallest countries removed. Cluster label: “Geopolitical connected”. Countries shown: Sweden, Finland, Norway, UK, Germany, Austria, Switzerland, Poland, Italy, Belgium, Spain, France, NL.]
Link associations between social network sites
Example: Blog searching
2. Context: Online access to personal information
Blogs, social network sites and personal web sites contain information that is:
- Private and protected (invisible to researchers)
- Intentionally public
- Publicly private [1] (intended for friends but allowed to be public)
- Unintentionally public (public but believed by owner to be private)

[1] Lange (2007)
Accessing “public” information
- Commercial search engines
- Web crawlers
- Internet Archive (includes deleted info)
Who is using Dataveillance?
- Dataveillance [1]: Downloading or otherwise gathering data on internet users in order to influence their behaviour
- Google: can use email, searching, blogging and social network activities to target advertising (and may report to the US government)
- Amazon: can use past activities to target adverts or improve its web site

[1] Zimmer (2008)
3. Researchers’ use of personal information
- Key issue: for large-scale research, data from or about the unaware is used without their approval, and possibly for purposes that they might disagree with
- Which ethical safeguards should be taken for this kind of research?
Issue 1: People vs. Documents
- Traditionally, documents can be researched without approval, but people can’t
  - Even harsh criticism is fair practice (e.g., book review/analysis)
- Since web pages are documents, researching them without permission is normally OK
Issue 2: Invasion of privacy? Natural vs. normative
- A situation is naturally private [1] if a reasonable person would expect privacy
- A situation is normatively private [1] if a reasonable person would expect others to protect their privacy
- Non-secure web pages/data are typically naturally private
  - Accessing them is not normally invading privacy, even if undesired by page owners and with negative consequences

[1] Moor (2004)
4. Confidentiality and anonymity
- When should anonymity be granted to research “subjects” (page owners)?
  - When a possibly undesired label is attached (e.g., hate group, terrorist)
  - When undesired groups might benefit? (e.g., league table of hate groups)
  - When publicly private individuals are singled out (e.g., detailed analysis of an “average” blogger)
- Should data be anonymised, as Census data used for research are?
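One possible way to hide page owners in a research dataset is to replace each URL with a keyed hash before analysis. This is a sketch only: the function name `pseudonymise`, the example records and the key are hypothetical. Strictly speaking this is pseudonymisation rather than full anonymisation, since quoted text or link patterns may still identify an owner.

```python
import hashlib
import hmac

def pseudonymise(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "owner_" + digest[:length]

# Hypothetical records: (page owner URL, label assigned by the researcher).
records = [
    ("http://blog-one.example/", "category A"),
    ("http://blog-two.example/", "category B"),
    ("http://blog-one.example/", "category A"),
]

# Assumption: the key is stored separately from the published dataset.
key = b"replace-with-a-secret-kept-off-the-dataset"
anonymised = [(pseudonymise(url, key), label) for url, label in records]
```

The same owner always maps to the same pseudonym, so counts and trends survive while the URLs themselves do not appear in the analysed data.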
5. Resource issues
- Accessing a web page uses the owner’s server time/bandwidth
- Crawling a web site can use a lot of the owner’s server time/bandwidth
  - May incur charges or loss of service quality
Robots.txt protocol
- This file lists pages/folders in a web site that may not be crawled
- It does not restrict crawling speed
- It should be obeyed in research
- Most individual users are probably unaware of this and so don’t use its protection
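A research crawler can check robots.txt before each request. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the site name and the user-agent string `ExampleResearchBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

USER_AGENT = "ExampleResearchBot"  # assumed crawler name

def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)

parser = build_parser(ROBOTS_TXT)
public_ok = allowed(parser, USER_AGENT, "http://site.example/index.html")
private_ok = allowed(parser, USER_AGENT, "http://site.example/private/diary.html")
```

Against a live site, the parser's `set_url()` and `read()` methods would download the file instead of taking it from a string.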
Crawling speed
- Web crawlers should not run so fast that they cause service issues
- Full speed is probably OK on a UK university web site but not on a Burkina Faso library web site
- Use judgement to decide how quickly to crawl, i.e. how long to pause between requests
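The pause between requests can be enforced with a small helper, sketched below. The class name `PoliteThrottle` and the 2-second pause are assumptions, and the right value is a judgement call for each site. A simulated clock stands in for real time so the example runs instantly.

```python
import time

class PoliteThrottle:
    """Enforce a minimum pause between successive requests to one site."""

    def __init__(self, min_pause_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_pause = min_pause_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Sleep just long enough to respect the pause; return seconds slept."""
        slept = 0.0
        if self._last_request is not None:
            remaining = self.min_pause - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last_request = self._clock()
        return slept

class SimulatedTime:
    """Stand-in clock for the demonstration (illustration only)."""

    def __init__(self):
        self.now = 0.0

    def clock(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds

sim = SimulatedTime()
throttle = PoliteThrottle(2.0, clock=sim.clock, sleep=sim.sleep)
first = throttle.wait()   # no earlier request, so no pause
sim.now += 0.5            # pretend the download took half a second
second = throttle.wait()  # pauses for the remaining 1.5 seconds
```

In a real crawler the defaults (`time.monotonic` and `time.sleep`) would be used, with `wait()` called before every page request.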
How many pages to crawl?
- Crawling too many pages puts unnecessary strain on the crawled server
- Use judgement to decide the minimum number of pages/crawl depth that is enough
- Use search engine queries as a substitute, if possible
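Page and depth limits can be built into the crawl loop itself. The sketch below runs a breadth-first crawl over an in-memory link table instead of a real web site. The `SITE_LINKS` data, the `bounded_crawl` name and the limit values are all hypothetical.

```python
from collections import deque

# Hypothetical site structure: page -> pages it links to.
SITE_LINKS = {
    "/": ["/about", "/news"],
    "/about": ["/staff"],
    "/news": ["/news/1", "/news/2"],
    "/staff": [],
    "/news/1": ["/news/1/comments"],
    "/news/2": [],
    "/news/1/comments": [],
}

def bounded_crawl(start, fetch_links, max_depth, max_pages):
    """Breadth-first crawl that fetches at most max_pages pages
    and never follows links from pages at max_depth."""
    fetched = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(fetched) < max_pages:
        page, depth = queue.popleft()
        links = fetch_links(page)  # stands in for one request to the server
        fetched.append(page)
        if depth == max_depth:
            continue  # do not go deeper than the limit
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched

def lookup(page):
    """Return the outgoing links recorded for a page."""
    return SITE_LINKS.get(page, [])

pages = bounded_crawl("/", lookup, max_depth=2, max_pages=5)
shallow = bounded_crawl("/", lookup, max_depth=1, max_pages=100)
```

Here the page limit stops the first crawl after five requests, and the depth limit stops the second after the home page and the two pages it links to.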
Automatic search engine searches
- Research can piggyback off the crawling of commercial search engines
- No resource implications for site owners
- Uses search engine “Applications Programming Interfaces”
- Search engines specify the maximum number of searches per day
- Results limited to the imperfect web crawling/coverage of search engine crawlers
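A daily search limit can be respected by splitting a long query list into per-day batches before anything is submitted. The sketch below only does the planning step and does not call any search engine API. The limit of 3 and the query wording are placeholders, since real limits and query syntax are set by each search engine.

```python
def daily_batches(queries, max_per_day):
    """Split queries into consecutive batches no larger than the daily limit."""
    if max_per_day < 1:
        raise ValueError("max_per_day must be at least 1")
    return [queries[i:i + max_per_day] for i in range(0, len(queries), max_per_day)]

# Hypothetical link-search queries; the wording is illustrative only.
queries = [f"links to university-{n}.example" for n in range(1, 8)]

# Assumed limit of 3 searches per day, chosen to keep the example small.
batches = daily_batches(queries, max_per_day=3)
```

Seven queries under a limit of three per day need three days: two full batches and one final batch with a single query.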
Summary
- Researchers need to be aware of potential issues when doing large-scale data analysis research
- Judgement is called for in all issues
- Research does not normally need participant permission
- Be sensitive to the impact of findings and any need for anonymity
References
Lange, P. G. (2007). Publicly private and privately public: Social networking on YouTube. Journal of Computer-Mediated Communication, 13(1). Retrieved May 8, 2008, from http://jcmc.indiana.edu/vol2013/issue2001/lange.html

Zimmer, M. (2008). The gaze of the perfect search engine: Google as an infrastructure of dataveillance. In A. Spink & M. Zimmer (Eds.), Web search: Multidisciplinary perspectives (pp. 77-99). Berlin: Springer.

Moor, J. H. (2004). Towards a theory of privacy for the information age. In R. A. Spinello & H. T. Tavani (Eds.), Readings in CyberEthics (2nd ed., pp. 407-417). Sudbury, MA: Jones and Bartlett.