Steffen Staab Web Science 1
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
Introduction to Web Science
Steffen Staab
University of Southampton & Universität Koblenz-Landau
Steffen Staab Web Science 2
Welcome to Koblenz!
Steffen Staab Web Science 3
Program Overview
Day | Topic | 9.00 am | 10.30 am | 3.00 pm
Thu | Introduction to Web Science | Noshir Contractor | Tutorial | Poster session
Fri | Internet and Law | Nikolaus Forgó | Tutorial | Project work
Sat | Web and Politics | Stéphane Bazan | Tutorial | Project work
Mon | Social Machines | Jim Hendler | Tutorial | Project work
Tue | Web Entrepreneurship | Simon Köhl | Tutorial | Project work
Wed | Computational Social Science | Nuria Oliver | Tutorial | Project presentation
Details at http://wwsss16.webscience.org/schedule
Steffen Staab Web Science 4
Project work
• Work independently and self-organised
• Get feedback from your advisor
• Choose your working space:
  – D 238, D 239, A 308, B 005 on weekdays
  – E 412, E 413, E 414, A 308, B 005 on Saturday
  – or anywhere you like
• Final presentation on Wednesday afternoon (10 minutes per team)
Details at http://wwsss16.webscience.org/schedule
Steffen Staab Web Science 5
Social Events
Thu | Cocktails at city beach | 6.00 pm | Meet at main campus entrance
Fri | SommerUni party | 8.00 pm | Campus
Sat | Free time | | Old town, KaleidosKOp festival, fortress, beer gardens, ...
Sun | Excursion (few seats available!) | 10.45 am | Meet at main campus entrance
Sun | Quarter final France vs. Iceland | 9.00 pm | Campus
Mon | Free time | | Old town, fortress, beer gardens, ...
Tue | Barbecue | 7 pm | Campus
Tue | SommerUni party | 9 pm | Campus
Wed | SommerUni barbecue | 6 pm | Campus
Steffen Staab Web Science 6
Thursday
9.00 am: Noshir Contractor: Leveraging Web/Internet/Network Sciences (WINS) to address Grand Societal Challenges
10.30 am: Steffen Staab and Jérôme Kunegis: Introduction to Web Science
3.00 pm: Poster session
6.00 pm: Cocktails at city beach (meeting point: main campus entrance)
Breaks: 10 am, 12 noon, 2.30 pm
Steffen Staab Web Science 7
Friday
9.00 am: Nikolaus Forgó: Privacy Law and the Web: A Story of Love and Hate?
10.30 am: Rüdiger Grimm: Internet and Law
3.00 pm: Supervised project work
8.00 pm: SommerUni party (at the campus, impossible to miss!)
Breaks: 10 am, 12 noon, 2.30 pm
Steffen Staab Web Science 8
Thanks to our sponsors!
Steffen Staab Web Science 9
Let's get started!
Steffen Staab Web Science 10
Web Science
[Figure: diagram relating Produce/Consume and Cognition, Emotion, Behavior, Socialisation, Knowledge (observable micro-interactions in the Web) to the WWW with Apps, Protocols, Data & Information, Governance (observable macro-effects in the Web)]
Steffen Staab Web Science 11
What is Web Science?
Tertium datur, or where Dijkstra was wrong
• Computer Science: science about computers
• Astronomy: telescope science
• Web Science: science about the Web
Steffen Staab Web Science 12
Web science is an emerging interdisciplinary field concerned with the study of large-scale socio-technical systems, such as the World Wide Web. It considers the relationship between people and technology, the ways that society and technology co-constitute one another and the impact of this co-constitution on broader society.
Wikipedia, 2016-06-29
Definition of Web Science
Steffen Staab Web Science 13
Agenda
• What is Web Science?
• What is the Web?
  – Aspects of the Web at large
• How to investigate the Web?
  – Observing the Web
  – An example using the architecture: bias in the Web
  – How to model aspects of the Web
• What is the past and the future of the Web?
Steffen Staab Web Science 14
A Note
I try to
• classify, describe commonalities, find generalizations
I will not
• be 100% correct, 100% complete
BUT
• Please ask, suggest, complement, ... WHENEVER you feel like it! Don't hesitate to ask!
Steffen Staab Web Science 15
What is the Web?
Steffen Staab Web Science 16
What is the Web: Aspects of the Web at Large
Architecture methodology:
• Draw pictures in different dimensions
• Pictures do not compete with, but complement each other
Not only technical pictures!
Steffen Staab Web Science 17
The Web as a Device
Software
• Browsers
  – IE, Firefox, Chrome, ...
• Web servers
  – Apache, Tomcat, ...
• Content management and data delivery
  – WordPress, Drupal, databases, ...
• Search engines
• ...
Standards
• Uniform Resource Locator (URL)
• HyperText Transfer Protocol (HTTP)
• HyperText Markup Language (HTML)
• Domain Name System (DNS)
• ... + many more
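To make the interplay of these standards concrete, the following minimal sketch splits a URL into the parts that the other standards act on: the scheme names the protocol (HTTP), the host name is what DNS resolves, and the path identifies the resource (typically an HTML document). The helper name `split_url` is an illustrative assumption; the URL is the schedule address given earlier in these slides.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Return the scheme, host and path of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # protocol to speak, e.g. http
        "host": parts.hostname,   # name that DNS resolves to an address
        "path": parts.path,       # resource requested from the server
    }


if __name__ == "__main__":
    print(split_url("http://wwsss16.webscience.org/schedule"))
```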
Steffen Staab Web Science 18
The Web as Content
For human consumption (primarily)
• Text, hypertext
• Images
• Video
• Audio
• Multimedia
• Interaction (games, ...)
• Braille
• Mathematics
For machine consumption (primarily)
• Metadata
• Data
• Ontologies
Steffen Staab Web Science 19
The Web as Content
Free
• Informational
• Advertisements
• Goods & services
• Web 2.0
  – Wikipedia
  – Facebook
Paid
• Individual payments
• Subscription
• Micropayment
Freemium
Steffen Staab Web Science 20
The Web and its Stakeholders
People
• Citizens
• Customers
• Leisure seekers
• Workers
• Software developers
Internet providers
• Landline
• Mobile
• Nested providers (internet cafe, ...)
• Website hosts
• Peer-to-peer networks (BitTorrent, ...)
Platform operators
• Shops
• News
• Web 2.0
• Payment
• Advertisement networks
• Trust centers
Government
• Police
• Military
• Secret service
• Law
• Citizen services
• Administration
• Politics
Steffen Staab Web Science 21
The Web as a Process
Governing
• Standards processes
  – W3C
  – Internet Engineering Task Force (IETF)
  – RFC
• Domain name registration
• Internet routing
  – E.g. "Great Firewall of China"
Regulation
• Legal
  – Copyright
    • where enforced
  – Hate speech
  – ...
• Private
  – E.g. Facebook, Instagram, ...: pictures, nipple double standard
Steffen Staab Web Science 22
Observing the Web as a Medium and Mirror of (anti-)social Practices
• (Self-)Expression
• Dark Web
  – Crime
  – Gold farming
  – Violence
  – Pornography
  – Identity theft
• Sex lives
  – Fetishes
  – Prostitution
• Relationships
  – Breakups
  – Bullying
  – Stalking
  – Advising
  – Counseling
  – Democracy
Steffen Staab Web Science 23
The Web as Medium & Mirror of NEW (?) (anti-)Social Practices
Automation
• Twitter bots
• Online dating bots
• Smart contracts ("Code is law")
  – E.g. DRM
• Quantified everything
  – Physique
  – Psyche
    • Infer your Big 5 personality traits from your Facebook profile
• Threats to Internet security & safety
  – Hacking
  – Social engineering
Steffen Staab Web Science 24
Web Science: Discipline or Transdisciplinary Endeavour?
[Figure: diagram relating Produce/Consume and Cognition, Emotion, Behavior, Socialisation, Knowledge (observable micro-interactions in the Web) to the WWW with Apps, Protocols, Data & Information, Governance (observable macro-effects in the Web)]
Steffen Staab Web Science 25
How to investigate the Web?
Steffen Staab Web Science 26
Web Observatories Konect
Steffen Staab Web Science 27
Why observe?
• Understanding
  – Collecting
  – Describing
  – Analyzing
  – Modeling
  – Predicting
  – Repeating!
Steffen Staab Web Science 28
Challenges – Data Collection Issues
Legal and/or ethical
• Crawling
  – May be disallowed by provider
• Usage logging
  – Privacy of individuals
• Even if it is allowed ...
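One technical signal of whether a provider disallows crawling is its robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to check a URL against robots.txt rules supplied as text. The rules, the user agent name `research-bot` and the example URLs are assumptions made for illustration. Passing this check does not settle the legal or ethical questions raised on the slide, and terms of service may impose further limits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without any network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.org/public/page", "http://example.org/private/data"):
        print(url, allowed_by_robots(EXAMPLE_ROBOTS, "research-bot", url))
```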
Steffen Staab Web Science 29
Challenges – Data Collection Issues
• Crawling
  – What does it mean to crawl a heavily interactive site?
  – Incomplete data
    • Unreachability
    • Timeouts
Steffen Staab Web Science 30
Challenges – Data Collection Issues
• Crawling
  – What does it mean to crawl a heavily interactive site?
  – Incomplete data
  – Where to start?
    • We cannot observe everything!
      – If only because of the data size!
      – Which starting points appear most fruitful?
Steffen Staab Web Science 31
Challenges – Data Collection Issues
• Crawling
  – What does it mean to crawl a heavily interactive site?
  – Incomplete data
  – Where to start?
  – Where to stop?
    • Each crawl is a view
      – Twitter
        » Tweet
          • URL
          • Web page
          • Subweb
        » Followers
          • Followers' followers
          • ...
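The point that each crawl is a view can be illustrated with a depth-limited breadth-first traversal: the same seed yields a different data set depending on where the crawl stops. This is a minimal sketch on a hypothetical in-memory link structure (`TOY_LINKS`); a real crawler would fetch pages or call an API instead.

```python
from collections import deque

# Hypothetical link structure: a tweet points to a URL and to a follower, and so on.
TOY_LINKS = {
    "tweet": ["url", "follower_a"],
    "url": ["web_page"],
    "web_page": ["subweb_page"],
    "follower_a": ["follower_of_a"],
}


def crawl_view(links, seed, max_depth):
    """Breadth-first crawl from seed; returns each visited node with its depth."""
    depth_of = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # stopping criterion: do not expand beyond max_depth
        for target in links.get(node, []):
            if target not in depth_of:
                depth_of[target] = depth_of[node] + 1
                queue.append(target)
    return depth_of


if __name__ == "__main__":
    print(crawl_view(TOY_LINKS, "tweet", 1))
    print(crawl_view(TOY_LINKS, "tweet", 2))
```

Changing `max_depth` or the seed changes the resulting view, which is why the stopping criterion should be reported together with the data.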
Steffen Staab Web Science 32
Challenges – Data Collection Issues
• Crawling
  – What does it mean to crawl a heavily interactive site?
  – Incomplete data
  – Where to start?
  – Where to stop?
  – Synchronous vs asynchronous
    • Strictly speaking: only asynchronous crawling possible
      – But in [Dellschaft & Staab] we targeted the construction of models for streams of tags
Steffen Staab Web Science 33
Challenges – Data Publishing Issues
Legal and/or ethical example issues
• AOL query log
• Netflix challenge
• Delicious
  – http://www.tagora-project.eu/data/
• Twitter
  – Collecting, but no sharing
• SocialSensor project
Steffen Staab Web Science 34
Challenges – Data Publishing Issues
Technical/modelling issues
• Generic format, e.g. RDF
• Format ready for digestion by a certain software, e.g. for Matlab processing
• Openness to other data
  – E.g. references to DBpedia/Wikipedia
• Accuracy of publishing
  – http://me.org showed "..."
  – http://me.org showed "..." @2013-05-01:0900CEST
  – http://me.org showed "..." @2013-05-01:0900CEST called from IP 193.99.144.85 using browser...version...history...
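The three levels of publishing accuracy above can be sketched as one record type whose optional fields add context step by step. The class name `Observation`, its field names and the output format are assumptions made for this illustration, not a published schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed UTC+2 offset standing in for CEST in this example.
CEST = timezone(timedelta(hours=2), "CEST")


@dataclass(frozen=True)
class Observation:
    """What a URL showed, optionally with when and from where it was observed."""
    url: str
    content: str
    observed_at: Optional[datetime] = None
    client_ip: Optional[str] = None
    user_agent: Optional[str] = None

    def describe(self) -> str:
        text = f'{self.url} showed "{self.content}"'
        if self.observed_at is not None:
            text += f" @ {self.observed_at.isoformat()}"
        if self.client_ip is not None:
            text += f" called from IP {self.client_ip}"
        if self.user_agent is not None:
            text += f" using {self.user_agent}"
        return text


if __name__ == "__main__":
    when = datetime(2013, 5, 1, 9, 0, tzinfo=CEST)
    print(Observation("http://me.org", "...").describe())
    print(Observation("http://me.org", "...", when).describe())
    print(Observation("http://me.org", "...", when, "193.99.144.85").describe())
```

The more context a record carries, the easier it is to interpret later, but the more likely it is to contain personal data, which ties back to the legal and ethical issues of publishing.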
Steffen Staab Web Science 35
Sharing Software
• Software
  – For crawling or usage logging
  – Rather than sharing the data, share the code for observing
    • Example: code for crawling Twitter in a certain way
• Issues
  – Limited repeatability
  – Disturbance liability ("Störerhaftung"), at least in Germany
    • If you provide source code for crawling, e.g., Facebook, Facebook can sue you even if you do not crawl it yourself
Steffen Staab Web Science 36
More on this later from Jérôme
Steffen Staab Web Science 37
Model the Web
What is in the content?
What is in the algorithm?
What is in the Social machine?
Web Observatory
Steffen Staab Web Science 38
Example Topic: Bias
Steffen Staab Web Science 39
Bias in the Device
Accessibility
• Impaired eyesight
• Impaired use of mouse and keyboard
• ...
HTML5 semantics help, but are not used much
Steffen Staab Web Science 40
Bias in the Device
Accessibility
• Impaired eyesight
• Impaired use of mouse and keyboard
• ...
HTML5 semantics help, but are not used much
Haris Aslanidis
Steffen Staab Web Science 41
Bias in the Device
Accessibility
• Impaired eyesight
• Impaired use of mouse and keyboard
• ...
HTML5 semantics help, but are not used much
http://west.uni-koblenz.de/en/research/projects/mamem
http://www.mamem.eu/mamem-makes-available-gazetheweb-browse-application/
Steffen Staab Web Science 42
Bias in the Software
Search engines
• Categorizing people and animals
  – White vs black
  – http://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html?_r=0
• Job advertisements
  – Well-paid jobs not offered to women
Steffen Staab Web Science 43
Bias in Content/Data
[Table: columns Credit, Hire, Sex, Ethnic, Zip, Height, ...; rows of +/- outcomes (+ +, + -, - +, + +, - -); attributes marked as correlated]
Data protection laws suggest not processing sensitive attributes such as "sex" or "ethnicity".
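One reading of the "correlated" annotation, stated here as an assumption, is that dropping a sensitive attribute does not remove bias when a remaining attribute acts as a proxy for it. The sketch below uses invented records in which the zip code is correlated with sex and a decision rule looks only at the zip code. The outcome rates still differ by sex. All data and names are hypothetical.

```python
from collections import defaultdict

# Invented records: the decision rule "credit = (zip == 'B')" never looks at sex,
# but zip 'A' is mostly female and zip 'B' mostly male in this toy data.
RECORDS = (
    [{"sex": "f", "zip": "A", "credit": False}] * 3
    + [{"sex": "m", "zip": "A", "credit": False}] * 1
    + [{"sex": "f", "zip": "B", "credit": True}] * 1
    + [{"sex": "m", "zip": "B", "credit": True}] * 3
)


def positive_rate(records, attribute, outcome="credit"):
    """Share of positive outcomes for each value of the given attribute."""
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for record in records:
        counts[record[attribute]][0] += int(record[outcome])
        counts[record[attribute]][1] += 1
    return {value: pos / total for value, (pos, total) in counts.items()}


if __name__ == "__main__":
    print("by zip:", positive_rate(RECORDS, "zip"))
    print("by sex:", positive_rate(RECORDS, "sex"))
```

In this toy data the rule is "blind" to sex, yet women receive credit in 25% of cases and men in 75%, because the zip code carries the information.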
Steffen Staab Web Science 44
Example:
Notable women described by "has husband"
Notable men not described by "has wife"
Steffen Staab Web Science 45
Bias in Content: Social Networks (Lerman et al. 15)
Steffen Staab Web Science 46
[Figure: tagged photos with geo-coordinates from Flickr; tag sets shown on the map include fish, rice, seafood, shrimp, lobster, salmon, wine, coffee, pizza and pasta]
Steffen Staab Web Science 47
Bias in the Algorithm: Shape of Clusters
[Figure: the tagged Flickr photos grouped into two clusters; cluster 1: seafood, fish, lobster, shrimp, crab, wine, salmon; cluster 2: wine, pizza, coffee, italian, pasta]
Steffen Staab Web Science 48
Evaluation: Anecdotal, Perplexity, Gaming
Gaming study: intrusion detection
Model | Precision, 8 topics (avg / median)
LGTA | 0.60 / 0.58
Basic model | 0.64 / 0.58
MGTM | 0.78 / 0.75
Steffen Staab Web Science 49
Bias and Social Practices
• The Web reflects current and past discrimination!
• Example:
  – Predict who will leave the company based on email features (ICWSM-16)
    • People with outsiders' vocabulary are more prone to fail
    • Unresolved:
      – Do they fail because they are less adaptable, or
      – do they fail because the environment is hostile to outsiders?
Steffen Staab Web Science 50
Bias and Processes
Wikipedia
• Efforts to counter bias
Law
• E.g. UK Equality Act
• Protected characteristics:
  – Age, disability, gender, marriage, religion, ...
Steffen Staab Web Science 51
Biases in the Social Machine: The Case of Liquid Feedback
Steffen Staab Web Science 52
...
Steffen Staab Web Science 53
Online Delegative Democracy
CC-BY-SA Ilmari Karonen
Steffen Staab Web Science 54
Delegative Democracy
• Between direct and representative democracy
CC-BY-SA Ilmari Karonen
Steffen Staab Web Science 55
Delegative Democracy
• Between direct and representative democracy
• Voters can delegate their vote to other voters
CC-BY-SA Ilmari Karonen
Steffen Staab Web Science 56
CC-BY-SA Ilmari Karonen
Steffen Staab Web Science 57
CC-BY-SA Ilmari Karonen
Steffen Staab Web Science 58
CC-BY-SA Ilmari Karonen
Delegative Democracy
• Between direct and representative democracy
• Voters can delegate their vote to other voters
• Delegations can be revoked at any time
Steffen Staab Web Science 59
CC-BY-SA Ilmari Karonen
Delegative Democracy
• Between direct and representative democracy
• Voters can delegate their vote to other voters
• Delegations can be revoked at any time
• Votes are public!
Steffen Staab Web Science 60
Dataset: LiquidFeedback (German Pirate Party)
Steffen Staab Web Science 61
LiquidFeedback – Pirate Party
• Observation: 08/2010 – 11/2013
• 13,836 members
• 14,964 delegations
• 499,009 votes
Steffen Staab Web Science 62
LiquidFeedback – German Pirate Party
• Users create initiatives, which are grouped by issues and belong to areas
Steffen Staab Web Science 63
LiquidFeedback – German Pirate Party
• Users create initiatives, which are grouped by issues and belong to areas
  – Area: Environmental issues
  – Issue: CO2 output has to be reduced.
  – Initiative: Subsidise wind turbines!
Steffen Staab Web Science 64
LiquidFeedback – German Pirate Party
• Users create initiatives, which are grouped by issues and belong to areas
  – Area: Environmental issues
  – Issue: CO2 output has to be reduced.
  – Initiative: Subsidise wind turbines!
• Areas: 22
• Issues: 3,565
• Initiatives: 6,517
Steffen Staab Web Science 65
LiquidFeedback – German Pirate Party
• Users create initiatives, which are grouped by issues and belong to areas
• Delegations on global, initiative, issue and area level
  → "Back-delegations" possible
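How delegated votes accumulate can be sketched with a simplified model: every voter has at most one delegate, delegations are followed transitively, and the person at the end of the chain casts all votes collected along it. The sketch considers a single delegation level only and treats votes caught in a delegation cycle as not cast. Both simplifications are assumptions of this illustration and not a description of how LiquidFeedback resolves such cases.

```python
def voting_weights(voters, delegations):
    """Count how many votes each voter casts after following delegations.

    voters: iterable of voter ids.
    delegations: dict mapping a voter to the voter they delegate to
                 (targets are assumed to be contained in voters).
    """
    weights = {voter: 0 for voter in voters}
    for voter in weights:
        current = voter
        visited = {voter}
        while current in delegations:
            current = delegations[current]
            if current in visited:
                current = None  # delegation cycle: vote is not cast in this sketch
                break
            visited.add(current)
        if current is not None:
            weights[current] += 1
    return weights


if __name__ == "__main__":
    voters = ["ann", "bob", "cat", "dan"]
    print(voting_weights(voters, {"ann": "bob", "bob": "cat"}))
```

In the usage example, "cat" votes with weight 3 (her own vote plus those of "ann" and "bob"), which is the mechanism behind the concentration of voting weight examined in the following slides.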
Steffen Staab Web Science 66
Dataset – First Impressions
Steffen Staab Web Science 67
Dataset – First Impressions
[Figure: voting weight]
Steffen Staab Web Science 68
Dataset – Bias ?
• 3,658 members > 10 votes
• 1,156 members > 100 votes
• 54 members > 1,000 votes
• Median all: 8 votes
• Median delegating: 42 votes
• Median delegates: 64 votes
Steffen Staab Web Science 69
Delegation Network
• Temporal analysis
Steffen Staab Web Science 70
Delegation Network
• Temporal analysis
Steffen Staab Web Science 71
Delegation Network
• Temporal analysis
Steffen Staab Web Science 72
Delegation Network
• Temporal analysis
Steffen Staab Web Science 73
Summary: Bias
Investigation
• What are the social processes?
• How are data collected?
• Which algorithms are used?
• How is the algorithm used in system processes?
• What are the system effects?
Purposes of investigation
• Social sciences: Observe bias in action!
• Intervention: Change the overall system!
• System sciences: Understand system limits!
• Engineering: Build a "proper" technical system!
Steffen Staab Web Science 74
Web Models
Steffen Staab Web Science 75
Web Models
• Descriptive
  – Qualitative
  – Statistical
• Predictive
  – Modeling deterministic regularities
• Generative
  – Modeling non-deterministic principles
    • Liking a song
    • Creating a link
Steffen Staab Web Science 76
Descriptive Models
Example: Bow-Tie Structure of the Web
Steffen Staab Web Science 77
Bow-tie structure of the Web
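The bow-tie picture is commonly read as a strongly connected core, an IN part of pages that can reach the core, and an OUT part of pages reachable from the core. As a minimal sketch under that reading, the function below computes these three parts relative to the strongly connected component that contains a chosen node. The toy graph and the function names are assumptions for illustration, and other parts of the bow tie (such as tendrils or disconnected pages) are not classified.

```python
def reachable(graph, start):
    """Set of nodes reachable from start by following directed links."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for target in graph.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def bow_tie(graph, core_node):
    """Split nodes into core, IN and OUT relative to core_node's component."""
    reverse = {}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    forward = reachable(graph, core_node)     # nodes the core can reach
    backward = reachable(reverse, core_node)  # nodes that can reach the core
    core = forward & backward
    return {"core": core, "in": backward - core, "out": forward - core}


if __name__ == "__main__":
    toy_web = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": [], "e": []}
    print(bow_tie(toy_web, "b"))
```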
Steffen Staab Web Science 78
Predictive Models
Example: Link Prediction by Triangle Closing
Steffen Staab Web Science 79
Social Network
Person Friendship
Steffen Staab Web Science 80
Recommender Systems
Predict who I will add as friend next
Standard algorithm: find friends-of-friends
me
Steffen Staab Web Science 81
Friend of a Friend
[Figure: example friendship graph with nodes numbered 1 to 6]
Count the number of ways a person can be found as the friend of a friend.
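The counting rule above can be sketched as follows: for every friend of mine, look at their friends, skip myself and the people I already know, and count how often each remaining person appears. The friendship graph and the names in it are hypothetical and do not reproduce the figure on the slide.

```python
from collections import Counter

# Hypothetical undirected friendship graph as adjacency sets.
FRIENDS = {
    "me": {"a", "b"},
    "a": {"me", "b", "c"},
    "b": {"me", "a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}


def friend_of_friend_counts(friends, me):
    """Count, per candidate, the number of two-step paths me -> friend -> candidate."""
    counts = Counter()
    for friend in friends[me]:
        for candidate in friends[friend]:
            if candidate != me and candidate not in friends[me]:
                counts[candidate] += 1
    return counts


if __name__ == "__main__":
    # Candidates with the highest count are recommended first.
    print(friend_of_friend_counts(FRIENDS, "me").most_common())
```

Each counted path corresponds to a triangle that would be closed by the new friendship, which is why this heuristic is also called triangle closing.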
Steffen Staab Web Science 82
Generative Models
Example: Link Creation by Barabási-Albert
Steffen Staab Web Science 83
General considerations
• Many large networks are scale free
  – Matthew effect: the rich get richer
    • Vote delegation!
• The degree distribution has power-law behavior for large k (far from a Poisson distribution)
• Random graph theory and the Watts-Strogatz model cannot reproduce this feature
Steffen Staab Web Science 84
Scale free
Same shape of hull, no matter which resolution
Steffen Staab Web Science 85
Scale-free distribution in the Web
Same shape of distribution: no matter which k
Steffen Staab Web Science 86
Barabási-Albert model (1999)
Two generic mechanisms common in many real networks:
• Growth (WWW, research literature, ...)
• Preferential attachment: attractiveness of popularity
Both are necessary.
Barabási & Albert, Science 286, 509 (1999)
Steffen Staab Web Science 87
Growth
• t = 0: m0 nodes
• At each time step we add a new node with m (≤ m0) edges that link the new node to m different nodes already present in the system
Steffen Staab Web Science 88
Preferential attachment
• When choosing the nodes to which the new node connects, the probability $\Pi$ that the new node will be connected to node $i$ depends on the degree $k_i$ of node $i$:

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$$

• Linear attachment in $k_i$ (more general models exist); the sum runs over all existing nodes $j$
Steffen Staab Web Science 89
Numerical simulations
• Power law: $P(k) \sim k^{-\gamma}$ with $\gamma_{SF} = 3$
• The exponent does not depend on m (the only parameter of the model)
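The growth and preferential attachment steps can be simulated in a few lines. This is a minimal sketch under stated assumptions: the network starts with m0 = m isolated nodes, the first new node links to all of them (with all degrees at zero the attachment probability is otherwise undefined), and targets are drawn with probability proportional to degree by sampling from a list in which node i appears k_i times. The function name and parameters are illustrative.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Generate the edge list of a Barabási-Albert style network with n nodes."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    endpoints = []            # node i appears here once per incident edge (k_i times)
    targets = list(range(m))  # assumption: first new node links to all m initial nodes
    for new_node in range(m, n):
        edges.extend((new_node, target) for target in targets)
        endpoints.extend(targets)
        endpoints.extend([new_node] * m)
        # Preferential attachment: drawing uniformly from endpoints picks node i
        # with probability k_i / sum_j k_j; repeat until m distinct nodes are found.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges


if __name__ == "__main__":
    edges = barabasi_albert(n=10_000, m=3, seed=42)
    degree = Counter(node for edge in edges for node in edge)
    histogram = Counter(degree.values())  # degree k -> number of nodes with degree k
    for k in sorted(histogram)[:10]:
        print(k, histogram[k])
```

Plotting the histogram on log-log axes is one way to inspect whether the tail looks like the power law stated above. The sketch itself makes no claim about the fitted exponent.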
Steffen Staab Web Science 90
The Past and the Future Web
Steffen Staab Web Science 91
Pre-Web
• 1945 Vannevar Bush, "As We May Think", Memex
• 1962 Ted Nelson, Hypertext
• 1965 Wide area network
• 1968 Doug Engelbart, The mother of all demos
• 1972 Public Arpanet, Email
• 1974-82 Internet protocol TCP/IP
• 1978 Consumer information services & Email
• 1983 AOL, online service for games, communities, ...
• 1984 Domain name service
Steffen Staab Web Science 92
The World Wide Web
• 1989 Concept drafted by Tim Berners-Lee
• 1993 National Center for Supercomputing Applications launched Mosaic X
• 1994 First WWW conference
• 1994 W3C started at MIT
• Commercial websites began their proliferation
• Followed by local school/club/family sites
• The web exploded
  – 1994: 3.2 million hosts and 3,000 websites
  – 1995: 6.4 million hosts and 25,000 websites
  – 1997: 19.5 million hosts and 1.2 million websites
  – January 2001: 110 million hosts and 30 million websites
Steffen Staab Web Science 93
The World Wide Web
  – 1994/1995 Amazon
  – 1994/1995 Wiki
  – 1995 AltaVista search engine
  – 1995 Internet Explorer
  – 1997-2001 Browser wars
  – 1996-1998 XML recommendation
  – 1998 Google
  – 1999 First W3C recommendation on RDF (Semantic Web)
  – 2001 Dot-com bubble bursts
  – 2001 Wikipedia
  – 2003/2004 Facebook
  – 2004 Flickr
  – 2005 YouTube
Steffen Staab Web Science 94
Concepts | Example Applications | Year
Web of People | Physical transport service (Uber (2009), Lyft), accommodation service (Airbnb, Couchsurfing), online dating service | 2009
Web of Things | Smart city, ambient intelligence, personal and public health information, personal and public transport information | 2007
Web of Services | Cloud services, digital transformation, Programmable Web (2005) | 2005
Web of Data | User-generated content applications (Facebook, Wikipedia (2001) and Wikidata, ...), linked open government data | 2001
Web of Documents | HTTP, HTML, XML, browser (Mosaic 1993) | 1993
Computer Networks/Internet | Document delivery (Internet 1982), VoIP, streaming | 1982
Steffen Staab Web Science 95
Internet of Things vs Web of Things
IoT
• Internet
• P2P networks
• Sensors
Web of Things
• How to link?
• What is "linking WoT"?
• How can I use it?
• How to discover?
• How to search?
Steffen Staab Web Science 96
Web of People
Steffen Staab Web Science 97
http://static.tapastic.com/cartoons/19/f5/11/a0/a70ea5d7dbdd477d9e3bc2e0a2bfa286.gif
Steffen Staab Web Science 98
Aaron Swartz †
Co-author of RSS 1.0 at the age of 14
Steffen Staab Web Science 99
http://static.tapastic.com/cartoons/19/f5/11/a0/a70ea5d7dbdd477d9e3bc2e0a2bfa286.gif
Steffen Staab Web Science 100
Web of People
• What is it?
  – Identification
    • ORCID
    • Facebook ID
    • OAuth 2.0?
  – Trust
    • Uber score
    • Airbnb score
    • eBay score
    • Tinder score
    • Group score
  – Chinese social score
  – Credit rating score
  – ?
Steffen Staab Web Science 101
Concepts | Delivered technical capabilities
Web of People | Identification and rating by/of people
Web of Things | Identification, linking, aggregation, monitoring and controlling of things
Web of Services | Identification, composition and calling of services
Web of Data | Identification, linking and retrieval of data
Web of Documents | Identification, linking and retrieval of documents
Computer Networks/Internet | Identification of and communication between computers
Steffen Staab Web Science 102
Concepts | Standards
Web of People | No mature standards yet
Web of Things | No mature standards yet
Web of Services | REST, JSON, JSON-LD
Web of Data | RDF, SPARQL
Web of Documents | HTTP, HTML, XML, AJAX
Computer Networks/Internet | Internet, TCP/IP, optical fibre, 5G
Steffen Staab Web Science 103
Your thoughts?
Steffen Staab Web Science 104
Conclusions
Steffen Staab Web Science 105
What is in the data?
What is in the algorithm?
What is in the Social machine?
Web Observatory
Steffen Staab Web Science 106
Telling the Story with Web Science
What is in the Data?
What is in the Algorithm?
What is in the Social Machine?
Storytelling
Understanding
Modelling
Steffen Staab Web Science 107
Web
Accomplishments
• Web of Services
• Web of Data
• Web of Documents
• Computer networks/Internet
Future in the making
• Web of People
• Web of Things
How to identify and use?
How to observe?
How to resolve issues?
Steffen Staab Web Science 108
Thank you!