dmi12 workshops - crawling and scraping
DESCRIPTION
The workshop serves as an introduction to two classic digital methods techniques for issue mapping and analysis. A discussion of the Issue Crawler and the Lippmannian device is followed by a short exercise in which we'll study the presence of skeptics among top sources of information related to climate change.
TRANSCRIPT
![Page 1: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/1.jpg)
Crawling and Scraping
The Issuecrawler and the Lippmannian device.
Erik Borra
Michael Stevenson
![Page 2: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/2.jpg)
“Reworking method for Internet research”
![Page 3: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/3.jpg)
Issuecrawler.
![Page 4: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/4.jpg)
CRAWL STARTING POINTS: sites A, B, C
![Page 5: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/5.jpg)
CRAWL STARTING POINTS: sites A, B, C
CRAWL DEPTH ONE: follow all starting points' outlinks (sites A, B, C, D)
![Page 6: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/6.jpg)
CRAWL STARTING POINTS: sites A, B, C
CRAWL DEPTH ONE: follow all starting points' outlinks (sites A, B, C, D)
CRAWL DEPTH TWO: follow all outlinks from the pages found in the previous depth (sites A, B, C, D, E, F, G, H)
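The depth-by-depth crawl described on this slide can be sketched roughly as follows. This is a minimal illustration on an in-memory toy graph, not the Issuecrawler's own code: the `LINKS` dictionary and the `crawl` function are hypothetical names, and the link structure is an assumption chosen so that the result matches the sites shown on the slides.

```python
# Hypothetical toy link graph: each site maps to the sites it links to.
# The structure is assumed for illustration only.
LINKS = {
    "A": ["B", "D"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": ["E", "F", "G", "H"],
    "E": [],
    "F": [],
    "G": [],
    "H": [],
}


def crawl(starting_points, links, max_depth):
    """Follow outlinks level by level, up to max_depth steps from the starting points.

    Returns a dict mapping each discovered site to the depth at which it
    was first seen (starting points are at depth 0).
    """
    depth_of = {site: 0 for site in starting_points}
    frontier = list(starting_points)
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for site in frontier:
            for target in links.get(site, []):
                if target not in depth_of:
                    depth_of[target] = depth
                    next_frontier.append(target)
        # Only sites found at this depth are followed at the next one.
        frontier = next_frontier
    return depth_of


print(sorted(crawl(["A", "B", "C"], LINKS, max_depth=1)))  # ['A', 'B', 'C', 'D']
print(sorted(crawl(["A", "B", "C"], LINKS, max_depth=2)))
```

A real crawl would fetch pages and extract their outlinks instead of reading them from a dictionary; the depth bookkeeping stays the same.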
![Page 7: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/7.jpg)
ANALYSIS SNOWBALL: retain all links and sites discovered during the crawl (sites A, B, C, D, E, F, G, H)
![Page 8: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/8.jpg)
ANALYSIS INTER-ACTOR: retain only links between the starting points (sites A, B, C)
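As a rough sketch of inter-actor analysis, the filter below keeps only the links whose source and target are both starting points. The edge list and the function name `inter_actor` are hypothetical, chosen for illustration; the Issuecrawler itself may differ in its details.

```python
def inter_actor(edges, starting_points):
    """Keep only links that run between two different starting points."""
    starts = set(starting_points)
    return [
        (source, target)
        for source, target in edges
        if source in starts and target in starts and source != target
    ]


# Hypothetical crawl result as (source, target) link pairs.
edges = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "A"), ("D", "E")]

print(inter_actor(edges, ["A", "B", "C"]))
# [('A', 'B'), ('B', 'C'), ('C', 'A')]
```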
![Page 9: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/9.jpg)
ANALYSIS CO-LINK: retain sites that receive links from at least two other sites (sites B, D)
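The co-link rule can be illustrated with a small sketch: count, for every site, how many distinct other sites link to it, and keep those with at least two. The edge list is a hypothetical example arranged so that B and D are retained, as on the slide; `co_link` and `min_inlinks` are illustrative names, not Issuecrawler settings.

```python
from collections import defaultdict


def co_link(edges, min_inlinks=2):
    """Return sites that receive links from at least min_inlinks distinct other sites."""
    inlinkers = defaultdict(set)
    for source, target in edges:
        if source != target:  # self-links do not count
            inlinkers[target].add(source)
    return sorted(
        site for site, sources in inlinkers.items() if len(sources) >= min_inlinks
    )


# Hypothetical crawl result as (source, target) link pairs.
edges = [("A", "B"), ("C", "B"), ("A", "D"), ("B", "D"), ("C", "A"), ("D", "E")]

print(co_link(edges))  # ['B', 'D']
```

Counting distinct linking sites, rather than raw links, prevents one site that links many times from pushing a target over the threshold on its own.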
![Page 10: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/10.jpg)
Issuecrawler. Modes of analysis
![Page 11: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/11.jpg)
Issuecrawler. Micro-politics of association
Pharmaceutical multinational and environmental NGO link to (inter)governmental organizations, but these do not link back.
Pharmaceutical multinational links to environmental NGO, but NGO does not link back.
(Govcom.org, 1999)
![Page 12: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/12.jpg)
Issuecrawler. Micro-politics of association
Clusters of Armenian and international organizations; the latter do not link back.
(Audrey Selian, 2004)
![Page 13: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/13.jpg)
Issuecrawler. Macro-politics of association
Democratic Presidential Primary Web Campaigns (Betsy Sinclair 2007; 2008)
![Page 14: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/14.jpg)
Issuecrawler. Macro-politics of association
![Page 15: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/15.jpg)
Issuecrawler. Macro-politics of association
![Page 16: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/16.jpg)
Issuecrawler. Network composition over time
![Page 17: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/17.jpg)
Issuecrawler.
Micro-politics of association
Macro-politics of association
Network composition over time
However... “Doesn’t do content analysis”
![Page 18: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/18.jpg)
Lippmannian device. Modes of analysis
![Page 19: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/19.jpg)
Walter Lippmann (1889-1974).
“A Test of the News,” 1920
Public Opinion, 1922
The Phantom Public, 1927
‘The problem is to locate by clear and coarse objective tests the actor in a controversy who is most worthy of public support.’ (p120)
-The Phantom Public
![Page 20: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/20.jpg)
Lippmannian device.
Source cloud: showing the partisanship of an actor.
Issue cloud: showing the issue agenda of an organization.
Partisanship or commitment. Which sources mention the expert’s name?
Issue agenda. Which issues are on the agenda of an organization or movement?
![Page 21: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/21.jpg)
Lippmannian device. “Source cloud”
Showing the partisanship or commitment of sources to one name
Craig Venter's presence in the Synthetic Biology issue space, March 2008. Top sources on "synthetic biology" according to a Google query, with number of mentions of Venter per source, ordered.
![Page 22: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/22.jpg)
Lippmannian device. “Source cloud”
Method for showing the partisanship or commitment of sources to names
1. Gather source list (e.g. through IssueCrawler)
2. Query source list for one or more experts
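The two steps above can be sketched as a simple count of name mentions per source, ordered from most to fewest. This is an assumption-laden illustration: the tool described in these slides queries a search engine, whereas the sketch counts mentions in locally stored text. The `pages` data, the example domains and the function name `source_cloud` are all hypothetical.

```python
import re


def source_cloud(pages, name):
    """Count case-insensitive mentions of a name per source, ordered by count."""
    pattern = re.compile(re.escape(name), re.IGNORECASE)
    counts = {source: len(pattern.findall(text)) for source, text in pages.items()}
    # Highest count first; ties are broken alphabetically by source.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical source list with made-up page text.
pages = {
    "example-lab.org": "Craig Venter announced a genome; later Craig Venter spoke.",
    "example-news.com": "A report on synthetic biology.",
    "example-blog.net": "An interview with craig venter.",
}

for source, mentions in source_cloud(pages, "Craig Venter"):
    print(f"{source} ({mentions})")
```

Sources with zero mentions are kept in the output on purpose: who does not mention the name is part of the finding.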
![Page 23: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/23.jpg)
Lippmannian device. “Source cloud”
Showing the partisanship or commitment of sources to names
Climate Change Skeptics: Who recognizes them?
(Digital Methods Initiative, 2007)
https://wiki.digitalmethods.net/Dmi/ClimateChangeSkeptics
![Page 24: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/24.jpg)
Lippmannian device. “Making an Issue cloud”
An organization’s issue agenda (or commitment)
Public Knowledge, a digital rights NGO, has issues. Which are they most committed to?
![Page 25: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/25.jpg)
![Page 26: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/26.jpg)
![Page 27: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/27.jpg)
![Page 28: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/28.jpg)
Lippmannian device. “Issue cloud”
Showing the issue commitments of the NGO, Public Knowledge
Public Knowledge's issue commitment. Lower six issues on Public Knowledge's issue list, ranked according to number of mentions of issues on publicknowledge.org, 2 October 2009.
![Page 29: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/29.jpg)
![Page 30: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/30.jpg)
Lippmannian device. “Making an Issue cloud”
Greenpeace issues, http://www.greenpeace.org/international/campaigns.
Stop climate change
Protect ancient forests
Defending our Oceans
Say no to genetic engineering
Eliminate toxic chemicals
Demand Peace and Disarmament
End the nuclear age
Encourage sustainable trade
Keep most significant issue language.
"climate change"
"ancient forests"
oceans
"genetic engineering"
"toxic chemicals"
disarmament
"nuclear power"
"sustainable trade"
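One plausible way to turn the reduced issue language into per-site queries is sketched below. It assumes the counting is done with site-restricted search queries using the `site:` operator; the slides do not spell out the exact query format, and `build_queries` is a hypothetical helper. Multi-word terms are quoted and single words are left bare, following the list above.

```python
def build_queries(issues, domain):
    """Build one site-restricted query string per issue term."""
    queries = []
    for issue in issues:
        # Quote multi-word terms so they are matched as exact phrases.
        term = f'"{issue}"' if " " in issue else issue
        queries.append(f"{term} site:{domain}")
    return queries


issues = ["climate change", "ancient forests", "oceans", "disarmament"]

for query in build_queries(issues, "greenpeace.org"):
    print(query)
```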
![Page 31: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/31.jpg)
![Page 32: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/32.jpg)
Lippmannian device. “Issue cloud”
Greenpeace’s issue agenda (distribution of commitment)
Greenpeace's issue commitment. Greenpeace's campaign issue list, ranked according to number of mentions of issues on greenpeace.org, 11 October 2009.
![Page 33: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/33.jpg)
Lippmannian device. “Making an Issue cloud”
Multiple sources, multiple issues
What is the agenda of the global human rights network?
Which issues are at the top and
at the bottom of the agenda?
What is the current level of commitment to a particular issue?
![Page 34: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/34.jpg)
Lippmannian device. “Making an Issue cloud”
Multiple sources, multiple issues
This is more complicated, but still doable
(Govcom.org, University of Pittsburgh, UMass Amherst, ongoing)
![Page 35: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/35.jpg)
Lippmannian device. “Making an Issue cloud”
Take three good lists of human rights organizations (global south, global north, UN’s)
![Page 36: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/36.jpg)
Lippmannian device. “Making an Issue cloud”
Make a list of all issues listed on all Websites
![Page 37: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/37.jpg)
![Page 38: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/38.jpg)
![Page 39: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/39.jpg)
Lippmannian device. “Issue cloud”
Showing the issue commitments of global human rights network
Global human rights issue agenda. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
![Page 40: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/40.jpg)
Lippmannian device. “Issue cloud”
Showing the issue commitments of global human rights network
Global human rights issue agenda, bottom. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
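For the multiple-sources, multiple-issues case, the ranking can be sketched as summing per-source mention counts for each issue and then reading off the top and bottom of the list. The organizations, issues and numbers below are invented for illustration and are not results from the study; `issue_agenda` is a hypothetical function name.

```python
from collections import Counter


def issue_agenda(counts_per_source):
    """Sum mention counts per issue across all sources, ranked from most to fewest."""
    total = Counter()
    for counts in counts_per_source.values():
        total.update(counts)  # adds each source's counts to the running totals
    return total.most_common()


# Hypothetical per-source mention counts (made-up numbers).
counts_per_source = {
    "org-one.example": {"torture": 5, "water": 1},
    "org-two.example": {"torture": 3, "women's rights": 4},
}

ranked = issue_agenda(counts_per_source)
print("top:", ranked[:1])
print("bottom:", ranked[-1:])
```

Because the underlying figures are search-engine estimates, the ranking is better read as a coarse ordering than as exact counts.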
![Page 41: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/41.jpg)
Lippmannian device.
Partisanship check. Which side of the controversy is an actor on?
Use the source cloud
![Page 42: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/42.jpg)
Lippmannian device.
1. Check an organization’s issue agenda. What are its current commitments?
2. Check a national or global movement’s issue agenda. What are its current commitments?
Use the issue cloud
![Page 43: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/43.jpg)
Questions.
![Page 44: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/44.jpg)
Exercise: Sourcing Climate Change Skeptics.
![Page 45: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/45.jpg)
Climate Change Sceptics on the Web (Frederick Seitz)
Research Question_To what extent are climate change 'skeptics' present in the climate change spaces on the Web?
Findings_There is distance between the skeptics and the top of the search engine returns.
Source_google.com
Query_“Frederick Seitz”
Method_Search for query “Frederick Seitz” in top 100. Organized in order.
Tools_Google Scraper and Tag Cloud Generator
Date_30 July 2007
Product_of the Digital Methods Initiative, dmi.mediastudies.nl.
Analysis_by Bram Nijhof, Richard Rogers and Laura van der Vlies.
Design_Anne Helmond.
CC_BY:NC:SA
campaigncc.org (1)
climateark.org (4)
marshall.org (8)
realclimate.org (35)
sourcewatch.org (21)
abc.net.au (0)
acfonline.org.au (0)
bbc.co.uk (0)
bom.gov.au (0)
cbc.ca (0)
ciel.org (0)
climatechallenge.gov.uk (0)
climatechange.ca.gov (0)
climatechange.com.au (0)
climatechangecentral.com (0)
climatechangecollege.org (0)
climatecrisis.net (0)
climatescience.gov (0)
dar.csiro.au (0)
davidsuzuki.org (0)
defra.gov.uk (0)
dfat.gov.au (0)
ec.gc.ca (0)
ecn.ac.uk (0)
ecokids.ca (0)
ecy.wa.gov (0)
eea.europa.eu (0)
eldis.org (0)
energy.gov (0)
envirolink.org (0)
epa.gov (0)
exploratorium.edu (0)
faqs.org (0)
foe.co.uk (0)
ft.com (0)
g8.gov.uk (0)
gcrio.org (0)
greenpeace.org (0)
grida.no (0)
guardian.co.uk (0)
iea.org (0)
iisd.org (0)
ipcc.ch (0)
iucn.org (0)
ltscotland.org.uk (0)
metoffice.gov.uk (0)
mfe.govt.nz (0)
mofa.go.jp (0)
nature.com (0)
nature.org (0)
ncdc.noaa.gov (0)
open2.net (0)
panda.org (0)
pewclimate.org (0)
royalsoc.ac.uk (0)
scidev.net (0)
scienceagogo.com (0)
state.gov (0)
theglobeandmail.com (0)
ucar.edu (0)
un.org (0)
unep.org (0)
who.int (0)
whoi.edu (0)
worldwildlife.org (0)
CLIMATE CHANGE SCEPTICS
![Page 46: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/46.jpg)
Research Question: Which climate change issue actors mention the skeptics, and what kinds of actors are more likely to mention them?
Method: Comparative Query: skeptics in three source sets (‘top’ sources, climate change blogs and climate change science network), outputting source cloud for each.
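The comparative query can be sketched as producing one source cloud per source set, written in the "domain (count)" style used on the earlier slide. The source sets, domains and counts below are placeholders rather than exercise results, and `format_cloud` and `compare_source_sets` are hypothetical helper names.

```python
def format_cloud(counts):
    """Render counts as 'domain (n)' entries, highest count first."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return " ".join(f"{domain} ({n})" for domain, n in ordered)


def compare_source_sets(source_sets, mention_counts):
    """Build one source cloud string per source set.

    Domains without a recorded count are shown with 0 mentions.
    """
    return {
        label: format_cloud({d: mention_counts.get(d, 0) for d in domains})
        for label, domains in source_sets.items()
    }


# Placeholder source sets and skeptic mention counts (made-up values).
source_sets = {
    "top sources": ["news.example", "gov.example"],
    "blogs": ["blog-one.example", "blog-two.example"],
}
mention_counts = {"blog-one.example": 7, "news.example": 1}

for label, cloud in compare_source_sets(source_sets, mention_counts).items():
    print(f"{label}: {cloud}")
```

Placing the clouds side by side makes it easier to see which kind of source set gives the skeptics the most attention.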
![Page 47: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/47.jpg)
Source Sets:
(1) Top ten Google returns for “climate change” (mix of media as well as governmental organizations)
![Page 48: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/48.jpg)
Source Sets:
(2) Climate change blogs network (IssueCrawler results - mix of blogs, social media, traditional media and governmental and non-governmental organizations)
![Page 49: Dmi12 workshops - crawling and scraping](https://reader034.vdocuments.site/reader034/viewer/2022052621/5580bb65d8b42ac6088b5001/html5/thumbnails/49.jpg)
Source Sets:
(3) Climate change science network (IssueCrawler results - governmental, non-governmental, educational and media organizations)