wwsss intro2016-final

Post on 13-Apr-2017


TRANSCRIPT

Steffen Staab Web Science 1
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK

Introduction to Web Science

Steffen Staab
University of Southampton & Universität Koblenz-Landau

Steffen Staab Web Science 2

Welcome to Koblenz!

Steffen Staab Web Science 3

Program Overview

Day · Topic · 9.00 am · 10.30 am · 3.00 pm
Thu · Introduction to Web Science · Noshir Contractor · Tutorial · Poster session
Fri · Internet and Law · Nikolaus Forgó · Tutorial · Project work
Sat · Web and Politics · Stéphane Bazan · Tutorial · Project work
Mon · Social Machines · Jim Hendler · Tutorial · Project work
Tue · Web Entrepreneurship · Simon Köhl · Tutorial · Project work
Wed · Computational Social Science · Nuria Oliver · Tutorial · Project presentation

Details at http://wwsss16.webscience.org/schedule

Steffen Staab Web Science 4

Project work

• Work independently and self-organised
• Get feedback from your advisor
• Choose your working space:
– D 238, D 239, A 308, B 005 on weekdays
– E 412, E 413, E 414, A 308, B 005 on Saturday
– or anywhere you like
• Final presentation on Wednesday afternoon (10 minutes per team)

Details at http://wwsss16.webscience.org/schedule

Steffen Staab Web Science 5

Social Events

Thu · Cocktails at city beach · 6.00 pm · Meet at main campus entry
Fri · SommerUni party · 8.00 pm · Campus
Sat · Free time · Old town, KaleidosKOp festival, fortress, beer gardens, ...
Sun · Excursion (few seats available!) · 10.45 am · Meet at main campus entry
Sun · Quarter final France vs. Iceland · 9.00 pm · Campus
Mon · Free time · Old town, fortress, beer gardens, ...
Tue · Barbecue · 7 pm · Campus
Tue · SommerUni party · 9 pm · Campus
Wed · SommerUni barbecue · 6 pm · Campus

Steffen Staab Web Science 6

Thursday

9.00 am · Noshir Contractor: Leveraging Web/Internet/Network Sciences (WINS) to address Grand Societal Challenges
10.30 am · Steffen Staab and Jérôme Kunegis: Introduction to Web Science
3.00 pm · Poster session
6.00 pm · Cocktails at city beach (meeting point: main campus entrance)

Breaks:
• 10 am
• 12 noon
• 2.30 pm

Steffen Staab Web Science 7

Friday

9.00 am · Nikolaus Forgó: Privacy Law and the Web: A Story of Love and Hate?
10.30 am · Rüdiger Grimm: Internet and Law
3.00 pm · Supervised project work
8.00 pm · SommerUni party (at the campus, impossible to miss!)

Breaks:
• 10 am
• 12 noon
• 2.30 pm

Steffen Staab Web Science 8

Thanks to our sponsors!

Steffen Staab Web Science 9

Let's get started!

Steffen Staab Web Science 10

Web Science

[Figure: people produce and consume (cognition, emotion, behavior, socialisation, knowledge) through observable micro-interactions in the Web; the WWW (apps, protocols, data & information, governance) shows observable macro-effects in the Web.]

Steffen Staab Web Science 11

What is Web Science?

Tertium Datur or Where Dijkstra was wrong
• Computer Science: science about computers
• Astronomy: telescope science
• Web Science: science about the Web

Steffen Staab Web Science 12

Web science is an emerging interdisciplinary field concerned with the study of large-scale socio-technical systems, such as the World Wide Web. It considers the relationship between people and technology, the ways that society and technology co-constitute one another and the impact of this co-constitution on broader society.

Wikipedia, 2016-06-29

Definition of Web Science

Steffen Staab Web Science 13

Agenda
• What is Web Science?
• What is the Web?
– Aspects of the Web at Large
• How to investigate the Web?
– Observing the Web
– An example using the architecture: bias in the Web
– How to model aspects of the Web
• What is the past and the future of the Web?

Steffen Staab Web Science 14

A Note

I try to
• classify, describe commonalities, find generalizations

I will not
• be 100% correct, 100% complete

BUT
• Please ask, suggest, complement, ... WHENEVER you feel like it! DON'T HESITATE TO ASK!

Steffen Staab Web Science 15

What is the Web?

Steffen Staab Web Science 16

What is the Web: Aspects of the Web at Large

Architecture methodology:
• Draw pictures in different dimensions
• Pictures do not compete with, but complement each other

Not only technical pictures!

Steffen Staab Web Science 17

The Web as a Device

Software
• Browsers
– IE, Firefox, Chrome, ...
• Web Servers
– Apache, Tomcat, ...
• Content Management and Data Delivery
– WordPress, Drupal, databases, ...
• Search Engines
• ...

Standards
• Uniform Resource Locator (URL)
• HyperText Transfer Protocol (HTTP)
• HyperText Markup Language (HTML)
• Domain Name System (DNS)
• ... + many more

Steffen Staab Web Science 18

The Web as Content

For human consumption (primarily)
• Text, Hypertext
• Images
• Video
• Audio
• Multimedia
• Interaction (Games, ...)
• Braille
• Mathematics

For machine consumption (primarily)
• Metadata
• Data
• Ontologies

Steffen Staab Web Science 19

The Web as Content

Free
• Informational
• Advertisements
• Goods & services
• Web 2.0
– Wikipedia
– Facebook

Paid
• Individual payments
• Subscription
• Micropayment

Freemium

Steffen Staab Web Science 20

The Web and its Stakeholders

People
• Citizens
• Customers
• Leisure seekers
• Workers
• Software developers

Internet providers
• Landline
• Mobile
• Nested providers (internet cafe, ...)
• Website hosts
• Peer-to-peer networks (BitTorrent, ...)

Platform operators
• Shops
• News
• Web 2.0
• Payment
• Advertisement networks
• Trust centers

Government
• Police
• Military
• Secret service
• Law
• Citizen services
• Administration
• Politics

Steffen Staab Web Science 21

The Web as a Process

Governing
• Standards processes
– W3C
– Internet Engineering Task Force (IETF)
– RFC
• Domain name registration
• Internet routing
– E.g. "Great Firewall of China"

Regulation
• Legal
– copyright
• where enforced
– hate speech
– ...
• Private
– E.g. Facebook, Instagram, ...: pictures, nipple double standard

Steffen Staab Web Science 22

Observing the Web as a Medium and Mirror of (anti-)social Practices

• (Self-)Expression
• Dark Web
– Crime
– Gold farming
– Violence
– Pornography
– Identity theft
• Sex lives
– Fetishes
– Prostitution
• Relationships
– Breakups
– Bullying
– Stalking
– Advising
– Counseling
– Democracy

Steffen Staab Web Science 23

The Web as Medium & Mirror of NEW (?) (anti-)Social Practices

Automation
• Twitter bots
• Online dating bots
• Smart contracts ("Code is law")
– E.g. DRM
• Quantified everything
– Physique
– Psyche
• Infer your Big 5 personality traits from your Facebook profile
• Threats to Internet Security & Safety
– Hacking
– Social engineering

Steffen Staab Web Science 24

Web Science: Discipline or Transdisciplinary Endeavour?

[Figure: people produce and consume (cognition, emotion, behavior, socialisation, knowledge) through observable micro-interactions in the Web; the WWW (apps, protocols, data & information, governance) shows observable macro-effects in the Web.]

Steffen Staab Web Science 25

How to investigate the Web?

Steffen Staab Web Science 26

Web Observatories Konect

Steffen Staab Web Science 27

Why observe?
• Understanding
– Collecting
– Describing
– Analyzing
– Modeling
– Predicting
– Repeating!

Steffen Staab Web Science 28

Challenges – Data Collection Issues

Legal and/or Ethical
• Crawling
– May be disallowed by provider
• Usage logging
– Privacy of individuals
• Even if it is allowed....
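One technical signal for "may be disallowed by provider" is a site's robots.txt file. The following is a minimal sketch, assuming a made-up robots.txt and a hypothetical crawler name `wwsss-bot`; it only checks the robots.txt convention and says nothing about terms of service, law, or ethics, which the slide lists as separate concerns.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used only for illustration.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_to_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/public/page.html", "/private/data.html"):
        url = "http://example.org" + path
        print(url, allowed_to_crawl(EXAMPLE_ROBOTS_TXT, "wwsss-bot", url))
```

In a real crawl the robots.txt text would be fetched from the site itself; here it is passed in as a string so the sketch stays self-contained.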

Steffen Staab Web Science 29

Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
• Unreachability
• Time outs

Steffen Staab Web Science 30

Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
– Where to start?
• We cannot observe everything!
– Even just for data size!
– What appear to be the most fruitful starting points?

Steffen Staab Web Science 31

Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
– Where to start?
– Where to stop?
• Each crawl is a view
– Twitter
» Tweet
• URL
• Web Page
• Subweb
» Followers
• Followers' Followers
• ...

Steffen Staab Web Science 32

Challenges – Data Collection Issues
• Crawling
– What does it mean to crawl a heavily interactive site?
– Incomplete data
– Where to start?
– Where to stop?
– Synchronous vs asynchronous
• Strictly speaking: only asynchronous crawling possible
– But in [Dellschaft&Staab] we targeted the construction of models for streams of tags

Steffen Staab Web Science 33

Challenges – Data Publishing Issues

Legal and/or Ethical Example Issues
• AOL query log
• Netflix challenge
• Delicious
– http://www.tagora-project.eu/data/
• Twitter
– Collecting, but no sharing
• SocialSensor project

Steffen Staab Web Science 34

Challenges – Data Publishing Issues

Technical/Modelling issues
• Generic format, e.g. RDF
• Format ready for digestion by a certain software, e.g. for Matlab processing
• Openness to other data
– E.g. references to DBpedia/Wikipedia
• Accuracy of publishing
– http://me.org showed "..."
– http://me.org showed "..." @2013-05-01:0900CEST
– http://me.org showed "..." @2013-05-01:0900CEST called from IP 193.99.144.85 using browser...version...history...

Steffen Staab Web Science 35

Sharing Software
• Software
– For crawling or usage logging
– Rather than sharing the data, share the code for observing
• Example: code for crawling Twitter in a certain way
• Issues
– Limited repeatability
– Disturbance liability ("Störerhaftung"), at least in DE
• If you provide source code for crawling, e.g., Facebook, even if you do not crawl FB, FB can sue you

Steffen Staab Web Science 36

More on this later from Jérôme

Steffen Staab Web Science 37

Model the Web

What is in the content?

What is in the algorithm?

What is in the Social machine?

Web Observatory

Steffen Staab Web Science 38

Example Topic: Bias

Steffen Staab Web Science 39

Bias in the Device

Accessibility
• Impaired eyesight
• Impaired use of mouse and keyboard
• ...

HTML5 semantics helps – but is not used much

Steffen Staab Web Science 40


Haris Aslanidis

Steffen Staab Web Science 42

Bias in the Software

Search engines
• Categorizing people and animals
– White vs black
– http://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html?_r=0
• Job advertisements
– Well-paid jobs not offered to women

Steffen Staab Web Science 43

Bias in Content/Data

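[Table: example records with the columns Credit, Hire, Sex, Ethnic, Zip, Height, ...; the rows show +/- outcomes (+ +, + -, - +, + +, - -), and some attributes are marked as correlated.]

Data protection laws suggest not processing sensitive data attributes like "sex" or "ethnic"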
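The "correlated" annotation can be read as a warning that leaving out a sensitive attribute does not help if another attribute acts as a proxy for it. The sketch below illustrates that reading on made-up toy records (not real data); the zip codes, group labels, and the function name `proxy_accuracy` are all assumptions for illustration. It measures how often the sensitive value can be guessed from the proxy alone.

```python
from collections import Counter, defaultdict

# Made-up toy records, NOT real data: (zip_code, ethnic_group).
RECORDS = [
    ("56068", "A"), ("56068", "A"), ("56068", "A"), ("56068", "B"),
    ("56070", "B"), ("56070", "B"), ("56070", "B"), ("56070", "A"),
]


def proxy_accuracy(records):
    """Fraction of records whose sensitive value is guessed correctly
    by predicting the most common sensitive value for each proxy value."""
    by_proxy = defaultdict(Counter)
    for proxy_value, sensitive_value in records:
        by_proxy[proxy_value][sensitive_value] += 1
    # For each proxy value, the majority guess is right for its most common group.
    correct = sum(counter.most_common(1)[0][1] for counter in by_proxy.values())
    return correct / len(records)


if __name__ == "__main__":
    print(proxy_accuracy(RECORDS))
```

In this toy data both groups are equally frequent, so a blind guess is right half the time, while guessing from the zip code is right for 6 of 8 records. A model trained without the sensitive column could therefore still pick up much of the same signal.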

Steffen Staab Web Science 44

Example:

Notable women described by "has husband"

Notable men not described by "has wife"

Steffen Staab Web Science 45

Bias in Content: Social Networks (Lerman et al. 15)

Steffen Staab Web Science 46

[Figure: tagged photos with geo-coordinates from Flickr; the tag sets shown on the map combine fish, rice, seafood, shrimp, lobster, salmon, wine, coffee, pizza and pasta.]

Steffen Staab Web Science 47

Bias in the Algorithm: Shape of Clusters

[Figure: the tagged Flickr photos grouped into two clusters, one with the tags seafood, fish, lobster, shrimp, crab, wine, salmon and one with the tags wine, pizza, coffee, italian, pasta.]

Steffen Staab Web Science 48

Evaluation: Anecdotal, Perplexity, Gaming

Gaming study: intrusion detection

Precision, 8 topics (avg / median)
• LGTA: 0.60 / 0.58
• Basic model: 0.64 / 0.58
• MGTM: 0.78 / 0.75

Steffen Staab Web Science 49

Bias and Social Practices

• The Web reflects current and past discrimination!
• Example:
– Predict who will leave the company based on email features (ICWSM-16)
• People with outsiders' vocabulary more prone to fail
• Unresolved:
– Do they fail because they are less adaptable, or
– Do they fail because the environment is hostile to outsiders?

Steffen Staab Web Science 50

Bias and Processes

Wikipedia
• Efforts to counter bias

Law
• E.g. UK Equality Act
• Protected characteristics:
– Age, disability, gender, marriage, religion, ...

Steffen Staab Web Science 51

Biases in the Social Machine: The Case of Liquid Feedback

Steffen Staab Web Science 52

...

Steffen Staab Web Science 53

Online Delegative Democracy

CC-BY-SA Ilmari Karonen

Steffen Staab Web Science 54

Delegative Democracy

• Between direct and representative democracy

CC-BY-SA Ilmari Karonen

Steffen Staab Web Science 55

Delegative Democracy

• Between direct and representative democracy

• Voters can delegate their vote to other voters

CC-BY-SA Ilmari Karonen

Steffen Staab Web Science 56

CC-BY-SA Ilmari Karonen

Steffen Staab Web Science 57

CC-BY-SA Ilmari Karonen

Steffen Staab Web Science 58

CC-BY-SA Ilmari Karonen

Delegative Democracy

• Between direct and representative democracy

• Voters can delegate their vote to other voters

• Delegations can be revoked at any time

Steffen Staab Web Science 59

CC-BY-SA Ilmari Karonen

Delegative Democracy

• Between direct and representative democracy

• Voters can delegate their vote to other voters

• Delegations can be revoked at any time

• Votes are public!
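The delegation rules above can be sketched as a small computation of voting weight: every vote travels along the delegation chain until it reaches someone who has not delegated. This is a minimal illustration, not LiquidFeedback's actual implementation; the function name, the data layout, and the choice to drop votes caught in a delegation cycle are assumptions.

```python
from collections import Counter


def voting_weights(voters, delegations):
    """Return a Counter mapping each final voter to the number of votes cast.

    voters: iterable of voter ids.
    delegations: dict mapping a voter id to the id they delegate to.
    Assumption: votes trapped in a delegation cycle are dropped.
    """
    weights = Counter()
    for voter in voters:
        seen = set()
        current = voter
        # Follow the chain until someone votes directly or a cycle is found.
        while current in delegations and current not in seen:
            seen.add(current)
            current = delegations[current]
        if current in delegations:
            continue  # the chain ran into a cycle, so nobody casts this vote
        weights[current] += 1
    return weights


if __name__ == "__main__":
    # Hypothetical members: cem delegates to bob, bob delegates to ann.
    print(voting_weights(["ann", "bob", "cem", "dana"], {"bob": "ann", "cem": "bob"}))
```

Revoking a delegation corresponds to deleting an entry from `delegations` and recomputing, which is why weights can shift at any time.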

Steffen Staab Web Science 60

Dataset: LiquidFeedback (German Pirate Party)

Steffen Staab Web Science 61

LiquidFeedback – Pirate Party
• Observation: 08/2010 – 11/2013
• 13,836 Members
• 14,964 Delegations
• 499,009 Votes

Steffen Staab Web Science 62

LiquidFeedback – German Pirate Party

Users create initiatives, which are grouped by issues and belong to areas

Steffen Staab Web Science 63

LiquidFeedback – German Pirate Party

Users create initiatives, which are grouped by issues and belong to areas

Area: Environmental issues
Issue: CO2 output has to be reduced.
Initiative: Subsidise wind turbines!

Steffen Staab Web Science 64

LiquidFeedback – German Pirate Party

Users create initiatives, which are grouped by issues and belong to areas

Area: Environmental issues
Issue: CO2 output has to be reduced.
Initiative: Subsidise wind turbines!

Areas: 22
Issues: 3,565
Initiatives: 6,517

Steffen Staab Web Science 65

LiquidFeedback – German Pirate Party

• Users create initiatives, which are grouped by issues and belong to areas

Delegations on global, initiative, issue and area level

→ “Back-delegations” possible

Steffen Staab Web Science 66

Dataset – First Impressions

Steffen Staab Web Science 67

Dataset – First Impressions

Voting Weight

Steffen Staab Web Science 68

Dataset – Bias ?

• 3,658 members > 10 votes
• 1,156 members > 100 votes
• 54 members > 1,000 votes
• Median all: 8 votes
• Median delegating: 42 votes
• Median delegates: 64 votes

Steffen Staab Web Science 69

Delegation Network
• Temporal analysis


Steffen Staab Web Science 73

Summary: Bias

Investigation
• What are the social processes?
• How are data collected?
• Which algorithms are used?
• How is the algorithm used in system processes?
• What are the system effects?

Purposes of investigation
• Social sciences: Observe bias in action!
• Intervention: Change the overall system!
• System sciences: Understand system limits!
• Engineering: Build "proper" technical system!

Steffen Staab Web Science 74

Web Models

Steffen Staab Web Science 75

Web Models

• Descriptive
– Qualitative
– Statistical
• Predictive
– Modeling deterministic regularities
• Generative
– Modeling non-deterministic principles
• Liking a song
• Creating a link

Steffen Staab Web Science 76

Descriptive Models
Example: Bow Tie Structure of the Web

Steffen Staab Web Science 77

Bow-tie structure of the Web

Steffen Staab Web Science 78

Predictive Models
Example: Link Prediction by Triangle Closing

Steffen Staab Web Science 79

Social Network

Person Friendship

Steffen Staab Web Science 80

Recommender Systems

Predict who I will add as friend next

Standard algorithm: find friends-of-friends

me

Steffen Staab Web Science 81

Friend of a Friend

[Figure: example friendship graph with nodes 1 to 6]

Count the number of ways a person can be found as the friend of a friend.
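The counting rule can be sketched in a few lines. The friendship graph below is a made-up example (the slide's own graph is not recoverable from the transcript), and the function name `friend_of_friend_scores` is a hypothetical choice.

```python
from collections import Counter


def friend_of_friend_scores(friends, me):
    """Count, for each non-friend, the number of two-step paths from `me`.

    friends: dict mapping each person to the set of their friends
             (friendship is assumed to be symmetric).
    """
    scores = Counter()
    for friend in friends[me]:
        for candidate in friends[friend]:
            # Skip myself and people I am already friends with.
            if candidate != me and candidate not in friends[me]:
                scores[candidate] += 1
    return scores


# Made-up friendship graph, for illustration only.
FRIENDS = {
    1: {2, 3},
    2: {1, 4, 5},
    3: {1, 4},
    4: {2, 3},
    5: {2},
}

if __name__ == "__main__":
    # Person 4 is reachable via 2 and via 3, person 5 only via 2.
    print(friend_of_friend_scores(FRIENDS, 1).most_common())
```

Recommending the highest-scoring candidate closes the largest number of open triangles, which is the idea behind link prediction by triangle closing.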

Steffen Staab Web Science 82

Generative Models
Example: Link Creation by Barabasi-Albert

Steffen Staab Web Science 83

General considerations

• Many large networks are scale free
– Matthew effect: rich get richer
• Vote delegation!
• The degree distribution has a power-law behavior for large k (far from a Poisson distribution)
• Random graph theory and the Watts-Strogatz model cannot reproduce this feature

Steffen Staab Web Science 84

Scale free

Same shape of hull, no matter which resolution

Steffen Staab Web Science 85

Scale-free of Web distribution

Same shape of distribution: no matter which k
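A short worked equation shows why a power law has the same shape at every scale. The symbols $C$ (a normalising constant), $\gamma$ (the exponent) and $a$ (a rescaling factor) are introduced here for illustration and do not appear on the slide.

```latex
P(k) = C\,k^{-\gamma}
\quad\Longrightarrow\quad
P(a k) = C\,(a k)^{-\gamma} = a^{-\gamma}\,C\,k^{-\gamma} = a^{-\gamma}\,P(k)
```

Rescaling the degree $k$ by any factor $a$ only multiplies the distribution by the constant $a^{-\gamma}$, so on a log-log plot the curve is a straight line with slope $-\gamma$ whatever range of $k$ is inspected.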

Steffen Staab Web Science 86

Barabasi-Albert model (1999)

Two generic mechanisms common in many real networks:
• Growth (www, research literature, ...)
• Preferential attachment: attractiveness of popularity

The two are necessary

Barabási & Albert, Science 286, 509 (1999)

Steffen Staab Web Science 87

Growth

• t = 0: m0 nodes
• Each time step we add a new node with m (≤ m0) edges that link the new node to m different nodes already present in the system

Steffen Staab Web Science 88

Preferential attachment

• When choosing the nodes to which the new node connects, the probability $\Pi$ that the new node will be connected to node $i$ depends on the degree $k_i$ of node $i$:

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$$

Linear attachment (more general models exist); the sum runs over all existing nodes.

Steffen Staab Web Science 89

Numerical simulations

• Power-law $P(k) \sim k^{-\gamma}$ with $\gamma_{SF} = 3$
• The exponent does not depend on m (the only parameter of the model)
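The growth and preferential-attachment steps can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the setup used for the numerical results on the slide: it takes m0 = m, lets the first new node link to all initial nodes (their degrees are zero, so degree-proportional choice is undefined there), and draws the m distinct targets by repeated sampling, which only approximates the exact attachment probability when m > 1. The function names are hypothetical.

```python
import random
from collections import Counter


def barabasi_albert_edges(n, m, seed=0):
    """Grow a network to n nodes; each new node brings m edges."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    # Node i appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks node i with probability k_i / sum_j k_j.
    endpoints = []
    targets = list(range(m))  # assumption: first new node links to all m0 = m nodes
    for new_node in range(m, n):
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
        # Choose m distinct targets for the next node, proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges


def degree_histogram(edges):
    """Return a Counter mapping degree k to the number of nodes with that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


if __name__ == "__main__":
    histogram = degree_histogram(barabasi_albert_edges(n=10_000, m=3, seed=1))
    for k in sorted(histogram)[:10]:
        print(k, histogram[k])
```

Plotting the histogram on log-log axes for different values of `m` is one way to look at the power-law tail and at the claim that the exponent does not depend on m.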

Steffen Staab Web Science 90

The Past and the Future Web

Steffen Staab Web Science 91

Pre-Web

• 1945 Vannevar Bush, "As We May Think", Memex
• 1962 Ted Nelson, Hypertext
• 1965 Wide area network
• 1968 Doug Engelbart, The mother of all demos
• 1972 Public Arpanet, Email
• 1974-82 Internet protocol TCP/IP
• 1978 Consumer information services & Email
• 1983 AOL, online service for games, communities...
• 1984 Domain Name System (DNS)

Steffen Staab Web Science 92

The World Wide Web
• 1989 Concept drafted by Tim Berners-Lee
• 1993 National Center for Supercomputing Applications launched Mosaic X
• 1994 First WWW conference
• 1994 W3C started at MIT
• Commercial websites began their proliferation
• Followed by local school/club/family sites
• The web exploded
– 1994: 3.2 million hosts and 3,000 websites
– 1995: 6.4 million hosts and 25,000 websites
– 1997: 19.5 million hosts and 1.2 million websites
– January 2001: 110 million hosts and 30 million websites

Steffen Staab Web Science 93

The World Wide Web
– 1994/1995 Amazon
– 1994/1995 Wiki
– 1995 AltaVista Search Engine
– 1995 Internet Explorer
– 1997-2001 Browser wars
– 1996-1998 XML recommendation
– 1998 Google
– 1999 First W3C recommendation on RDF (Semantic Web)
– 2001 Dot-com bubble bursts
– 2001 Wikipedia
– 2003/2004 Facebook
– 2004 Flickr
– 2005 YouTube

Steffen Staab Web Science 94

Concepts · Example Applications · Year
Web of People · Physical transport service (Uber (2009), Lyft), accommodation service (AirBnB, Couchsurfing), online dating service · 2009
Web of Things · Smart city, ambient intelligence, personal and public health information, personal and public transport information · 2007
Web of Services · Cloud services, Digital transformation, Programmable Web (2005) · 2005
Web of Data · User generated content applications (Facebook, Wikipedia (2001) and Wikidata, …), Linked open gov data · 2001
Web of Documents · HTTP, HTML, XML, Browser (Mosaic 1993) · 1993
Computer Networks/Internet · Document delivery (internet 1982), VOIP, Streaming · 1982

Steffen Staab Web Science 95

Internet of Things vs Web of Things

IoT
• Internet
• P2P networks
• Sensors

Web of Things
• How to link?
• What is "linking WoT"?
• How to use by me?
• How to discover?
• How to search?

Steffen Staab Web Science 96

Web of People

Steffen Staab Web Science 97

http://static.tapastic.com/cartoons/19/f5/11/a0/a70ea5d7dbdd477d9e3bc2e0a2bfa286.gif

Steffen Staab Web Science 98

Aaron Swartz †

Co-author of RSS1.0 at the age of 14

Steffen Staab Web Science 99


Steffen Staab Web Science 100

Web of People

• What is it?
– Identification
• ORCID
• Facebook ID
• OAuth 2.0?
– Trust
• Uber score
• Airbnb score
• eBay score
• Tinder score
• Group score
– Chinese social score
– Credit rating score

?

Steffen Staab Web Science 101

Concepts · Delivered technical capabilities
Web of People · Identification and rating by/of people
Web of Things · Identification, linking, aggregation, monitoring and controlling of things
Web of Services · Identification, composition and calling of services
Web of Data · Identification, linking and retrieval of data
Web of Documents · Identification, linking and retrieval of documents
Computer Networks/Internet · Identification of and communication between computers

Steffen Staab Web Science 102

Concepts · Standards
Web of People · No mature standards yet
Web of Things · No mature standards yet
Web of Services · REST, JSON, JSON-LD
Web of Data · RDF, SPARQL
Web of Documents · HTTP, HTML, XML, AJAX
Computer networks/Internet · Internet, TCP/IP, Optical fibre, 5G

Steffen Staab Web Science 103

Your thoughts?

Steffen Staab Web Science 104

Conclusions

Steffen Staab Web Science 105

What is in the data?

What is in the algorithm?

What is in the Social machine?

Web Observatory

Steffen Staab Web Science 106

Telling the Story with Web Science

What is in the Data?

What is in the Algorithm?

What is in the Social Machine?

Storytelling

Understanding

Modelling

Steffen Staab Web Science 107

Web

Accomplishments
• Web of Services
• Web of Data
• Web of Documents
• Computer networks/Internet

Future in the making
• Web of People
• Web of Things

How to identify and use?
How to observe?
How to resolve issues?

Steffen Staab Web Science 108

Thank you!
