Pula, 5 June 2007

Complex networks in tagging systems

Andrea Capocci

Dipartimento di Informatica e Sistemistica, Università di Roma “Sapienza”

Tag networks

www.citeulike.org

Users save scientific publications and label them with tags (keywords).

Other examples:

Flickr.com (photos)

del.icio.us (bookmarks)

Connotea.org, BibSonomy (papers)

Tagging systems as tripartite networks

Tag assignment

A tagging system is a set of tag assignments. A tag assignment is a triplet

(user, resource, tag)

CiteULike

550k tag assignments

48k distinct tags

180k distinct papers

6k distinct users
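A minimal sketch of how a set of tag assignments can be held in memory and summarized with the same four counts reported for CiteULike. The class name `TagAssignment`, the function `summarize` and the toy data are illustrative assumptions, not part of the talk or of any CiteULike export format.

```python
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One (user, resource, tag) triplet."""
    user: str
    resource: str
    tag: str


# Hypothetical toy data, not taken from CiteULike.
assignments = [
    TagAssignment("alice", "paper-1", "networks"),
    TagAssignment("alice", "paper-1", "scale-free"),
    TagAssignment("bob", "paper-1", "networks"),
    TagAssignment("bob", "paper-2", "www"),
]


def summarize(assignments):
    """Count the assignments and the distinct users, resources and tags."""
    return {
        "assignments": len(assignments),
        "users": len({a.user for a in assignments}),
        "resources": len({a.resource for a in assignments}),
        "tags": len({a.tag for a in assignments}),
    }


summary = summarize(assignments)
```

Each triplet links one node from each of the three sets (users, resources, tags), which is what makes the system a tripartite network.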

Text analysis of tagging

The stream of tags can be read as a text written continuously and collaboratively by the users.

Zipf laws, preferential attachment and Yule processes in tag streams?

del.icio.us > Cattuto et al.

Sub-linear vocabulary growth

[Figure: number of distinct tags (# of tags) versus internal time. For del.icio.us the curve grows approximately as x^0.8.]
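Vocabulary growth can be measured directly from a tag stream by counting how many distinct tags have appeared after each assignment. The sketch below assumes that internal time simply counts tag assignments in the order they were made; the function name `vocabulary_growth` and the toy stream are illustrative.

```python
def vocabulary_growth(tag_stream):
    """Return N(t): the number of distinct tags seen after the first t tags."""
    seen = set()
    growth = []
    for tag in tag_stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth


# Hypothetical toy stream, ordered by internal time.
stream = ["networks", "www", "networks", "physics", "www", "graphs"]
growth = vocabulary_growth(stream)
```

On real data, a sub-linear law would show up as a slope below 1 when N(t) is plotted against t on log-log axes.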

Tag frequency distribution

Preferential attachment

Few tags per resource

Where is semantics?

Such properties can be modeled by Yule-Simon processes with memory (see Cattuto et al.)

But such analysis does not capture the semantics of tags, for example the hierarchical relations among them.
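For reference, a plain Yule-Simon process can be simulated in a few lines. This sketch deliberately omits the memory kernel used by Cattuto et al.: an old tag is copied uniformly from the whole stream so far, which is equivalent to choosing a tag with probability proportional to its current frequency. The function name `yule_simon_stream`, the parameter `p_new` and the values used are assumptions made for illustration.

```python
import random
from collections import Counter


def yule_simon_stream(length, p_new, seed=0):
    """Generate a tag stream with a memoryless Yule-Simon process.

    With probability p_new a brand-new tag is appended; otherwise an
    entry of the existing stream is copied uniformly at random.
    """
    rng = random.Random(seed)
    stream = [0]          # start with a single tag, labelled 0
    next_tag = 1
    while len(stream) < length:
        if rng.random() < p_new:
            stream.append(next_tag)
            next_tag += 1
        else:
            stream.append(rng.choice(stream))
    return stream


# Arbitrary illustrative parameters.
stream = yule_simon_stream(10_000, p_new=0.05)
frequencies = Counter(stream).most_common()  # (tag, count), most frequent first
```

Plotting the counts in `frequencies` against their rank on log-log axes gives the rank-frequency curve to compare with a Zipf law.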

Why does semantics matter?

Detection of tag categories.

Understanding users' strategies, in order to improve the system and propose new services.

Spam detection.

Tag co-occurrence network

Tags are nodes.

If two tags are assigned to the same resource, one puts an edge between the two tags.

Edges are weighted: each co-assignment of two tags increases the edge weight by one.

Strength instead of degree.
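The construction above can be sketched as follows. One point is an assumption here: a co-assignment is taken to mean one user applying both tags to the same resource, so tags are grouped by (user, resource) pair. The function names and the toy data are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_weights(assignments):
    """Map each unordered tag pair to its number of co-assignments."""
    posts = defaultdict(set)
    for user, resource, tag in assignments:
        posts[(user, resource)].add(tag)
    weights = defaultdict(int)
    for tags in posts.values():
        # sorted() gives every pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return dict(weights)


def strengths(weights):
    """Strength of a tag: sum of the weights of its edges."""
    s = defaultdict(int)
    for (a, b), w in weights.items():
        s[a] += w
        s[b] += w
    return dict(s)


# Hypothetical toy data.
toy = [
    ("alice", "p1", "networks"), ("alice", "p1", "scale-free"),
    ("bob", "p1", "networks"), ("bob", "p1", "scale-free"), ("bob", "p1", "www"),
]
weights = cooccurrence_weights(toy)
strength = strengths(weights)
```

In this toy network every tag has degree 2, yet the strengths differ, which illustrates why strength carries more information than degree on weighted edges.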

Distribution of strength

Nontrivial clustering & spam detection

Clustering coefficient C(k): average density of triangles around nodes with degree k.
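A minimal sketch of C(k) on an unweighted edge list, assuming the standard local clustering coefficient (the fraction of pairs of neighbours that are themselves connected) averaged over all nodes with the same degree. Nodes with degree below 2 are skipped because the coefficient is undefined for them. The function name and the toy graph are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_degree(edges):
    """Return {k: C(k)}, the mean local clustering of nodes with degree k."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:                      # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    per_degree = defaultdict(list)
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue                    # clustering undefined for k < 2
        # Count connected pairs among the neighbours (triangles through node).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        per_degree[k].append(2 * links / (k * (k - 1)))
    return {k: sum(cs) / len(cs) for k, cs in per_degree.items()}


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
c_of_k = clustering_by_degree(edges)
```

A degree value whose C(k) sits far above the trend of its neighbours in k would be a natural candidate for closer inspection.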

k = 502

Looking for a k = 502 page...

spam: k = 502

Co-occurrence networks and semantics

Co-occurrence networks are scale-free.

The significance of this statistical property is ambiguous.

Clustering encodes semantics (?)

Clustering can be used to detect spam.

Users' strategies

Do users tag resources according to a conceptual hierarchy of tags?

For example

“Emergence of scaling in random networks” by A.-L. Barabási and R. Albert

Semantics and hierarchy

For example

Tags “scale-free networks” and “networks”: HIERARCHICAL

Tags “scale-free networks” and “WWW”: NON HIERARCHICAL

Model based on hierarchy

Conjectures

1. Tags have an underlying hierarchy.

2. With high probability, users add tags hierarchically.

Can we reproduce the co-occurrence network structure based on tag hierarchy?

The underlying hierarchy is a random tree.

At each time step, we add a new resource, with two tags.

New tags are introduced with probability Pnt.

With probability Psb, the second tag is a “generalization” of the first tag; otherwise it is chosen randomly.
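The model leaves several details open, so the following is only one possible reading, with the assumptions stated explicitly: a new tag is attached under a uniformly random existing tag (which grows a random tree); the first tag of a resource is new with probability `p_nt`, otherwise uniformly random; a “generalization” is the parent of the first tag in the tree; and a resource whose two tags coincide adds no edge. The function name `simulate` and the parameter values are illustrative.

```python
import random
from collections import defaultdict


def simulate(steps, p_nt, p_sb, seed=0):
    """Grow a tag hierarchy and the co-occurrence weights it induces.

    Returns (parent, weights): parent[tag] is the tag's parent in the
    random tree (None for the root), and weights maps (a, b) with a < b
    to the number of resources carrying both tags.
    """
    rng = random.Random(seed)
    parent = {0: None}                  # tag 0 is the root of the hierarchy
    weights = defaultdict(int)
    for _ in range(steps):
        # First tag: a new one with probability p_nt (assumed to attach
        # under a random existing tag), otherwise a random existing tag.
        if rng.random() < p_nt:
            first = len(parent)
            parent[first] = rng.randrange(first)
        else:
            first = rng.randrange(len(parent))
        # Second tag: the parent of the first with probability p_sb
        # (assumed meaning of "generalization"), otherwise random.
        if rng.random() < p_sb and parent[first] is not None:
            second = parent[first]
        else:
            second = rng.randrange(len(parent))
        if second != first:             # assumed: identical tags add no edge
            weights[(min(first, second), max(first, second))] += 1
    return parent, dict(weights)


# Arbitrary illustrative parameters.
parent, weights = simulate(steps=1000, p_nt=0.1, p_sb=0.8)
```

The resulting `weights` can then be compared with the empirical network through the same measures used above, namely the strength distribution and the clustering.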

Results: strength distribution

Results: clustering

Conclusions

Tagging systems display nontrivial statistical properties, such as Zipf laws.

Co-occurrence networks are a way of discovering semantic relationships between tags (?)

Clustering in co-occurrence networks encodes semantics (?) and detects spam.

Simple models based on hierarchy partially explain such properties.

Thank you, and thanks to...

Guido Caldarelli

The TAGORA group (Cattuto et al.)
