the mediabase

Lehrstuhl Informatik 5 (Informationssysteme) Prof. Dr. M. Jarke I5-KL-111010-1 TeLLNet GALA The MediaBase Ralf Klamma Informatik 5 (DBIS), RWTH Aachen University Webinar December 16, 2010

DESCRIPTION

The MediaBase A Webinar for the TELMAP project December 16, 2010 Ralf Klamma RWTH Aachen University Information Systems & Database Technology

TRANSCRIPT

Page 1: The MediaBase

Lehrstuhl Informatik 5 (Informationssysteme)

Prof. Dr. M. Jarke

I5-KL-111010-1

TeLLNet

GALA

The MediaBase

Ralf Klamma

Informatik 5 (DBIS), RWTH Aachen University

Webinar, December 16, 2010

Page 2: The MediaBase

The Overall Approach

Page 3: The MediaBase

What is unique about the MediaBase?

Interdisciplinary multidimensional model of digital networks
– Social network analysis (SNA) defines measures for social relations
– Actor network theory (ANT) connects human and media agents
– The i* framework defines strategic goals and dependencies
– The theory of media transcriptions studies cross-media knowledge

[Diagram labels: Community; Communities of practice; Media Networks; social software (Wiki, Blog, Podcast, IM, Chat, Email, Newsgroup, …); i*-Dependencies (Structural, Cross-media); Members (Social Network Analysis: Centrality, Efficiency); network of artifacts (Microcontent, Blog entry, Message, Burst, Thread, Comment, Conversation, Feedback (Rating)); network of members]

Page 4: The MediaBase

Modeling Dependencies Using the i* Framework

Eric S. K. Yu, Towards Modeling and Reasoning Support for Early-Phase Requirements Engineering, RE 1997

[Diagram labels: Network; Coordinator; Gatekeeper; Hub; Member; Itinerant Broker; URL; Coordination; Artifact; Communication; four isA links. Legend: Agent, Goal, Resource, Task]

Page 5: The MediaBase

What can you do with the Mediabase?

Community interface (Firefox plugin)
– Adding media for crawling, searching & viewing
– Observing social networks over time
– Retrieving structural patterns of media
– Applying Web 2.0 operations (tagging, etc.) on media
Writing your own crawlers
Applying all kinds of social network measures
– Centrality measures – Finding influential & powerful persons
– Network statistics – Understanding networks at large
Advanced queries in RDF Store on concepts and relations
– Who is the owner of company x?
– Structured input for conceptual mapping tools

Page 6: The MediaBase

What is the MediaBase?

Collection of Social Software artifacts: Mailing lists (>200 k), Blogs (>300 k), Websites, Newsletters, Wikipedias, RSS Feeds, Forums, …

The MediaBase
• IBM DB2 data store
• 24/7 Perl crawlers for media artifacts
• Community oriented Commander Interface
• Social network analysis & visualization tools
• PALADIN: A pattern language for automatic behavior detection
• Automatic extraction of concepts and relations in RDF

Page 7: The MediaBase

The Data Model

Page 8: The MediaBase

MediaBase Model

A Mediabase is a six-tuple graph $M = (A, R, \mu, \nu, \eta, L)$ with

$R \subseteq A \times A$

$\mu : A \to L$

$\nu : R \to L$

$\eta : R \to \{0, 1\}$
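
The six-tuple can be mirrored in a small data structure. The following is a minimal Python sketch, not code from the MediaBase itself. The slide does not explain the symbols, so reading µ and ν as labeling functions for actors and relations and η as a binary flag on relations is an assumption, as are all class, field, and example names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Actor = str
Relation = Tuple[Actor, Actor]
Label = str


@dataclass(frozen=True)
class Mediabase:
    """Sketch of the six-tuple M = (A, R, mu, nu, eta, L)."""
    actors: FrozenSet[Actor]            # A
    relations: FrozenSet[Relation]      # R, a subset of A x A
    mu: Callable[[Actor], Label]        # mu: A -> L (assumed: actor labels)
    nu: Callable[[Relation], Label]     # nu: R -> L (assumed: relation labels)
    eta: Callable[[Relation], int]      # eta: R -> {0, 1}
    labels: FrozenSet[Label]            # L

    def is_well_formed(self) -> bool:
        """Check the domain and codomain constraints of the tuple."""
        relations_ok = all(a in self.actors and b in self.actors
                           for a, b in self.relations)
        mu_ok = all(self.mu(a) in self.labels for a in self.actors)
        nu_ok = all(self.nu(r) in self.labels for r in self.relations)
        eta_ok = all(self.eta(r) in (0, 1) for r in self.relations)
        return relations_ok and mu_ok and nu_ok and eta_ok


# Hypothetical toy instance: one agent posting to one mailing list.
actor_labels = {"alice": "Agent", "list-1": "Medium"}
relation_labels = {("alice", "list-1"): "posts to"}

toy = Mediabase(
    actors=frozenset(actor_labels),
    relations=frozenset(relation_labels),
    mu=actor_labels.__getitem__,
    nu=relation_labels.__getitem__,
    eta=lambda relation: 1,
    labels=frozenset({"Agent", "Medium", "posts to"}),
)
print(toy.is_well_formed())
```

Keeping the three mappings as plain callables lets the same structure wrap dictionaries, as here, or database lookups.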

Page 9: The MediaBase

Simplified Meta Model

[Diagram labels: Actor; Agent; Community; Process; Medium; Artifact; Attribute; relations has, stores, creates, is affected by, belongs to, represents, consumes, performs, ranks; processes …, Localize, Transcribe, Browse, Address; two isA links]

Latour: On Recalling ANT, 1999

Page 10: The MediaBase

Actors in the Mediabase

$A \subseteq \{\text{Medium}, \text{Artefact}, \text{Process}, \text{Agent}, \text{Network}\}$

$\text{Medium} \subseteq \{$Mailing lists, Newsletter, Newsgroup, Feed, Web-site, Blog, Podcast, Chat room, Wiki, Forum, Social bookmarking site, Folksonomy$\}$

$\text{Artifact} \subseteq \{$Message, E-mail, Index, Comment, RSS Entry, Transaction, Host, Feedback, Conversation, Burst, Blog entry, Thread, Executions, Tag, Trackback, Review, URL, Rating, Multimedia, Ranking, Reference$\}$

$\text{Process} \subseteq \{$Acquisition, Search, Monitoring, Retrieval, Transcription, Addressing$\}$

$\text{Agent} \subseteq \{$Administrator, Member, Lurker, Reviewer, Dead, Answering person, Questioner, Troll, Spammer, Conversationalist, Expert$\}$

Page 11: The MediaBase

Medium – Artifact Compatibility

Artifact      Email  Mailing List  Blog  Transaction-based Website  Wiki  Chat Room  URL  Forum
Message       +      +             -     -                          -     -          -    +
Thread        -      +             -     -                          +     -          -    +
Burst         +      +             +     +                          +     -          -    +
Conversation  -      -             -     -                          -     +          -    +
Blog Entry    -      -             +     -                          -     -          -    -
Comment       -      -             +     +                          +     -          -    +
Web Page      -      -             -     -                          +     -          +    -
Transaction   -      -             -     +                          -     -          -    -
Feedback      -      -             -     +                          -     -          -    +
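
The compatibility table can be queried directly once it is encoded as data. The sketch below assumes the eight columns are Email, Mailing List, Blog, Transaction-based Website, Wiki, Chat Room, URL, and Forum, in that order. The function names are hypothetical and not part of the MediaBase.

```python
MEDIA = ["Email", "Mailing List", "Blog", "Transaction-based Website",
         "Wiki", "Chat Room", "URL", "Forum"]

# One string per artifact row; position i refers to MEDIA[i].
ROWS = {
    "Message":      "++-----+",
    "Thread":       "-+--+--+",
    "Burst":        "+++++--+",
    "Conversation": "-----+-+",
    "Blog Entry":   "--+-----",
    "Comment":      "--+++--+",
    "Web Page":     "----+-+-",
    "Transaction":  "---+----",
    "Feedback":     "---+---+",
}


def media_for(artifact):
    """Media marked '+' for the given artifact type."""
    return [m for m, flag in zip(MEDIA, ROWS[artifact]) if flag == "+"]


def artifacts_for(medium):
    """Artifact types marked '+' for the given medium."""
    column = MEDIA.index(medium)
    return [a for a, flags in ROWS.items() if flags[column] == "+"]


print(media_for("Thread"))    # ['Mailing List', 'Wiki', 'Forum']
print(artifacts_for("Blog"))  # ['Burst', 'Blog Entry', 'Comment']
```

A lookup like `media_for("Thread")` answers which media can carry thread artifacts, which is the kind of precondition a thread-based pattern needs.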

Page 12: The MediaBase

The Crawlers

Page 13: The MediaBase

Crawling Technologies

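$W = \text{Media} \cup \text{Artifact}$

$MW = \text{Mailing list} \cup \text{Message} \cup \text{Thread} \cup \text{Index}$

$I = \text{Media} \cup \text{Artifact} \cup \text{Process} \cup \text{Agent}$

$G = \text{Media} \cup \text{Artifact} \cup \text{Process} \cup \text{Agent} \cup \text{Network}$

Mix of dumps (Wikis) and special purpose crawlers:

$BW = \text{Blog} \cup \text{Blogroll} \cup \text{Blogentry} \cup \text{Index} \cup \text{Comment}$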

Page 14: The MediaBase

Crawler Overview

Page 15: The MediaBase

Website Crawler

Page 16: The MediaBase

Feed Crawler

Page 17: The MediaBase

Mailinglist Crawler

Page 18: The MediaBase

News Crawler

Page 19: The MediaBase

Podcast Crawler

Page 20: The MediaBase

The MediaBase Commander

Page 21: The MediaBase

Media Base Web 2.0 Commander

Personalization (user annotates resources with tags and has a personal page)
Community-awareness (resources and annotations of others are open)
User-friendly interface (Firefox plug-in, easy insertion of resources, tags, tracking of recent changes)

Page 22: The MediaBase

Application Programmer Interfaces

Under Development
– GraphService – Visualization and PALADIN
– http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc//atlas_las_services_graph-service/HEAD/javadoc/index.html
– TargETLy Service – RDF Data Generator
– http://dbis.rwth-aachen.de/~atlas/module_build/JavaDoc/atlas_theses_da_krenge_TargETLy2/HEAD/javadoc/index.html

Page 23: The MediaBase

GraphService

AbstractDigitalNetwork – Representation of MetaModel
Classes for Networks – Blogs, Mailinglists, etc.
Classes for Basic SNA
Classes for Pattern Analysis
Classes for GraphLayout

Page 24: The MediaBase

TargETLy Service

Connection to RDF Store
OpenCalais Service – RDF Generator
Pattern Analysis
IntentAnalysis
Collection of predefined RDF Queries
– e.g. companyCompetitor, companyEmployeeNumber
– e.g. patentFiling, patentIssuance
– e.g. personEmailAddress, creditRating

Page 25: The MediaBase

PALADIN – Pattern Analysis

Page 26: The MediaBase

PALADIN: Disturbances in Cross-media Social Networks

What is a disturbance?
– Sensing an incompatibility between espoused theories and theories-in-use
Disturbances are starting points of learning processes
– Disturbances disturb, prevent … but they create reflection
Disturbances are hard to detect or to forecast

Page 27: The MediaBase

Pattern Language for PALADIN: Example Troll

Troll Pattern: This pattern tries to discover the cases when a troll exists in a digital social network. A troll in the network is considered a disturbance.

Disturbance:
(EXISTS [medium | medium.affordance = threadArtefact]) &
(EXISTS [troll |
    (EXISTS [thread | (thread.author = troll) &
        (COUNT [message | (message.author = troll) & (message.posted = thread)]) > minPosts]) &
    (~EXISTS [thread1, message1 | (thread1.author != troll) &
        (message1.author = troll) & (message1.posted = thread1)])])

Forces: medium; troll; network; member; thread; message; url
Force Relations: neighbour(troll, member); own thread(troll, thread)
Solution: No attention must be paid to the discussions started by the troll.
Rationale: The troll needs attention to continue its activities. If no attention is paid, he/she will stop participating in the discussions.
Pattern Relations: Associates Spammer pattern.
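
The disturbance expression of the Troll pattern can be read as an executable predicate. The sketch below is a plain Python reading of that expression, not the PALADIN implementation. The input layout (a thread-to-author mapping and a list of author/thread pairs), the function name, and the toy data are assumptions. It also assumes the medium affords threads, so the first EXISTS clause is taken as already satisfied.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def find_trolls(thread_author: Dict[str, str],
                messages: List[Tuple[str, str]],
                min_posts: int) -> Set[str]:
    """Return authors matching the Troll disturbance.

    thread_author maps a thread id to the author who started it;
    messages is a list of (author, thread id) pairs. Every thread id
    in messages is assumed to be a key of thread_author.
    """
    # COUNT [message | message.author = a & message.posted = t]
    posts = Counter(messages)
    trolls = set()
    for candidate in set(thread_author.values()):
        # EXISTS thread started by the candidate with more than
        # min_posts messages written by the candidate.
        floods_own_thread = any(
            author == candidate and posts[(candidate, thread)] > min_posts
            for thread, author in thread_author.items()
        )
        # ~EXISTS message by the candidate in a thread started by
        # somebody else.
        posts_elsewhere = any(
            author == candidate and thread_author[thread] != candidate
            for author, thread in messages
        )
        if floods_own_thread and not posts_elsewhere:
            trolls.add(candidate)
    return trolls


# Hypothetical toy network.
threads = {"t1": "trent", "t2": "alice"}
msgs = [("trent", "t1")] * 4 + [("bob", "t1"), ("alice", "t2"),
                                ("bob", "t2"), ("alice", "t1")]
print(find_trolls(threads, msgs, min_posts=3))
```

Here `min_posts` plays the role of the pattern parameter minPosts. Raising it makes the pattern stricter, lowering it yields more candidate disturbances to evaluate.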

Page 28: The MediaBase

Pattern Discovery Process

[Diagram labels: Pattern; Pattern Template; Pattern Template Instance; Pattern Instance; Disturbance Instances; Digital Social Network; components Description, Disturbance, Variables, Pattern Parameters, Forces, Force Relations, Solution, Rationale, Dependencies, Pattern Relations]

1. Set pattern parameters

2. Instantiate disturbances

3. Evaluate disturbances

4a. Change Pattern Parameters

4b. Apply Pattern Solution

Page 29: The MediaBase

PALADIN Case Study

10 patterns of disturbance over 119 social network instances, 17,359 individuals, 215,345 mails

Pattern                  Occurrences  Remarks
Burst                    22           The pattern finds topics that were very important for a certain period of time. Scalability is necessary.
No Conversationalist     76           The existence implies little communication in the network.
No Questioner            67           The existence implies that the network is not popular.
No Answering Person      61           Occurs in small networks. The effects of the lack of an answering person must be further checked with content analysis.
Troll                    2            Troll occurs very rarely in cultural communities. True negatives exist.
Spammer                  86           Spammers can be found often in discussion groups. False positives exist.
Leader                   37           The pattern occurs in the network centered around a member.
No Leader                40           Occurs in big networks where the members are distributed in different clusters.
Structural Hole          67           Occurs for members having neighbors with only one contact.
Independent Discussions  13           Occurs in large networks where disconnected subnetworks exist. Scalability is necessary.

Page 30: The MediaBase

Visualization & Analysis

Page 31: The MediaBase

Social Network Analysis of Open Source Communities

Eclipse components network based on analysis of source code repository (Software Architecture)

Eclipse components network based on analysis of mailing list communication (Social Structure)

Page 32: The MediaBase

Community Reflection about Development Process

Social platform: Eclipse forum eclipsezone
Forum: Eclipse communication framework (ECF)
Measure: degree centrality
Statistics: 225 nodes, 283 edges
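
Degree centrality, the measure named on this slide, is simple to compute from an edge list. The sketch below uses the common normalized form (degree divided by n − 1) on a hypothetical four-member network. It assumes an undirected graph without self-loops, which the slide does not state. Under the same assumption, a network with 225 nodes and 283 edges has a mean degree of 2 · 283 / 225 ≈ 2.5.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected simple graph.

    Only nodes that appear in at least one edge are reported.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical reply network between four forum members.
replies = [("ann", "bob"), ("ann", "cem"), ("ann", "dee"), ("bob", "cem")]
for member, score in sorted(degree_centrality(replies).items()):
    print(member, round(score, 2))
```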

Page 33: The MediaBase

Conversationalist Pattern

Social platform: Eclipse mailing list
Forum: Device debugging developer discussion

Page 34: The MediaBase

Questioner Pattern

Social platform: Eclipse mailing list
Forum: Device debugging developer discussion

Page 35: The MediaBase

Identification of End-Users and Developers in OSS Communities

Community Clustering

Page 36: The MediaBase

Textual Analysis of Postings from Community Experts

Postings from experts of one of the identified communities

Page 37: The MediaBase

Computer Science Knowledge Network: the Visualization

Page 38: The MediaBase

Computer Science Knowledge Network: Clustering

Page 39: The MediaBase

Interdisciplinary Venues: Top Betweenness Centrality

Page 40: The MediaBase

High Prestige Series: Top PageRank

Page 41: The MediaBase

Data Sets

DBLP (http://www.informatik.uni-trier.de/~ley/db/)
- 788,259 author names
- 1,226,412 publications
- 3,490 venues (conferences, workshops, journals)

CiteSeerX (http://citeseerx.ist.psu.edu/)
- 7,385,652 publications (including publications in reference lists)
- 22,735,240 citations
- Over 4 million author names

Combination
- Canopy clustering [McCallum 2000]
- Result: 864,097 matched pairs
- On average: venues cite 2306 and are cited 2037 times

Page 42: The MediaBase

WikiWatcher – System Design

Stage 1: SAX-based Parser in PERL

Stage 2: Dynamic Analysis and Visualization

[Diagram labels: Wiki Network Data; Metadata; RDB; Article pages, URLs, Revisions; Authors (Tim, Liz, Joe, 123.45.67.89); wiki links [[Article]], [[Article2]], [[requested]], [[never exists]], [http://…]; steps Generating XML dump/export files, Parsing wiki data/database transfer, Generating Networks, Network Analysis, Measurement, Visualization]
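
The link notations shown in the diagram ([[Article]] for internal links, [http://…] for external ones) are what turn wiki text into network edges. The sketch below shows one way to extract them with regular expressions in Python. It is not the SAX-based Perl parser of WikiWatcher, it skips the XML dump handling entirely, and the function names and sample pages are hypothetical.

```python
import re

# [[Target]] or [[Target|label]] or [[Target#section]]
INTERNAL_LINK = re.compile(r"\[\[([^\]|#]+)")
# [http://host/path optional label], but not the start of [[...]]
EXTERNAL_LINK = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)")


def extract_links(wikitext):
    """Return (internal targets, external URLs) found in wiki text."""
    internal = [target.strip() for target in INTERNAL_LINK.findall(wikitext)]
    external = EXTERNAL_LINK.findall(wikitext)
    return internal, external


def article_edges(pages):
    """Directed (source, target) edges for an article network."""
    return [(title, target)
            for title, text in pages.items()
            for target in extract_links(text)[0]]


# Hypothetical pages; 'requested' has no page of its own yet.
pages = {
    "Article": "See [[Article2]] and [[requested|a missing page]] "
               "or [http://example.org docs].",
    "Article2": "Back to [[Article]].",
}
print(article_edges(pages))
```

Targets that never appear as a key of `pages` correspond to requested or non-existent articles. They can be kept as nodes or filtered out before analysis.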

Page 43: The MediaBase

Network Heterogeneity

Author Networks
– Author nodes (anonymous/registered users)
– Edges represent collaboration between authors during a period t
Article Networks
– Article nodes (incl. wiki namespaces)
– Directed edges (links) between articles
As expected, both kinds of networks stay heterogeneous

Page 44: The MediaBase

Importance of Network Actors

Articles: High betweenness centrality controls the flow of information within a Wiki
Betweenness values grow or stay nearly constant during the evolution process
Determines
– Important actors
– Important articles
– Vandalism
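
Betweenness centrality counts how often a node lies on shortest paths between other nodes. The sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python. The slide does not say how the values were computed, so the choice of algorithm, the unnormalized output, and the toy graph are assumptions.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness via Brandes' algorithm.

    adjacency maps each node to an iterable of neighbors; the graph is
    assumed undirected, so every edge appears in both directions.
    """
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        order = []                            # nodes by increasing distance
        preds = {v: [] for v in adjacency}    # shortest-path predecessors
        sigma = {v: 0 for v in adjacency}     # number of shortest paths
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):             # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical star: one hub article linked with three others.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(betweenness(star))
```

In the star, every shortest path between two leaves passes through the hub, so the hub collects all three leaf pairs while the leaves score zero.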

Page 45: The MediaBase

Evolution of Shortest Paths

Densification Power Law: Complex networks may become denser during their growth
Generally this could not be verified for wiki author networks!
The average distances stagnate at nearly 2 for all considered author networks
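
The average distance mentioned here is the mean shortest-path length between pairs of nodes. A minimal Python sketch using breadth-first search is given below. Averaging over connected pairs only is an assumption, since the slide does not say how disconnected pairs were treated, and the sample graph is hypothetical.

```python
from collections import deque


def average_distance(adjacency):
    """Mean shortest-path length over all connected ordered pairs.

    adjacency maps each node to its neighbors; every neighbor is
    assumed to be a key of adjacency as well.
    """
    total = 0
    pairs = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:                       # breadth-first search
            node = queue.popleft()
            for nxt in adjacency[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1             # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical author network: one central author and three others.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(average_distance(star))  # 1.5
```

Computing this value for successive snapshots of a network gives the kind of time series the slide describes.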

Page 46: The MediaBase

Evolution of Author Networks

Strongly connected components merged by collaboration of two wiki authors

Author Network of German Wikia in July 2007
Author Network of German Wikia in August 2007

Page 47: The MediaBase

Visualization & Analysis

Page 48: The MediaBase

What you cannot do with the Mediabase (at the moment)

Creating a new Mediabase in a new environment
– Maintenance with databases, scripts and interfaces is tedious
– Interfaces integrated into Zope/Plone
Not all media are equally supported
– Very good support for mailing lists, forums, web sites and blogs
– Less support for wikis, podcasts, social bookmarks
Lacking support for
– Conceptual navigation interface (Conzilla!)
– Discourse management tools
– Weak signal analysis tools
– Topic & sentiment & opinion mining tools
– Automatic generation of recommendations

Page 49: The MediaBase

The Future of the Mediabase: CommunityBase

Self-modeling phase contributes to self-reflection phase and vice versa

[Diagram labels: Self-modeling; Self-reflection; Activity Theory [Enge87]; Actor Network Theory [Lato05]; Community of Practice [Weng98]; disturbance (three times); +/-, -, +; Community experience repository; [PeKl08]]