Pula, 5 June 2007
Complex networks in tagging systems
Andrea Capocci
Dipartimento di Informatica e Sistemistica, Università di Roma "Sapienza"
Tag networks
www.citeulike.org
Users save scientific publications and label them with tags (keywords).
Other examples:
Flickr.com (photos)
del.icio.us (bookmarks)
Connotea.org, BibSonomy (papers)
TAGS
Tagging systems as tripartite networks
Tag assignment
A tagging system is a set of tag assignments. A tag assignment is a triplet
(user, resource, tag)
CiteULike
550k tag assignments
48k distinct tags
180k distinct papers
6k distinct users
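The triplet definition can be sketched in a few lines of Python. This is only an illustration: the `TagAssignment` type, the `summarize` helper, and the toy users, papers, and tags are hypothetical and are not taken from the CiteULike data. It shows how the counts quoted for CiteULike (assignments, distinct tags, distinct resources, distinct users) follow from a plain list of triplets.

```python
from collections import namedtuple

# A tag assignment is a (user, resource, tag) triplet.
TagAssignment = namedtuple("TagAssignment", ["user", "resource", "tag"])

# Toy data: hypothetical users, papers, and tags, for illustration only.
assignments = [
    TagAssignment("alice", "paper1", "networks"),
    TagAssignment("alice", "paper1", "scale-free"),
    TagAssignment("bob", "paper1", "networks"),
    TagAssignment("bob", "paper2", "tagging"),
]


def summarize(assignments):
    """Count assignments and the distinct nodes in each of the three layers."""
    return {
        "tag_assignments": len(assignments),
        "distinct_users": len({a.user for a in assignments}),
        "distinct_resources": len({a.resource for a in assignments}),
        "distinct_tags": len({a.tag for a in assignments}),
    }


summary = summarize(assignments)
```

Each of the three sets (users, resources, tags) is one layer of the tripartite network, and each triplet links one node from every layer.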
Text analysis of tagging
The stream of tags can be interpreted as a text continuously written by collaborative users.
Zipf laws, preferential attachment and Yule processes in tag streams?
del.icio.us > Cattuto et al.
Sub-linear vocabulary growth
[Plot: number of distinct tags vs. internal time. del.icio.us: growth ~ x^0.8]
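A vocabulary growth curve of this kind can be computed from a tag stream as sketched below. The sketch assumes that "internal time" means the running count of tag assignments, which is an interpretation and not something the slides state. The function name `vocabulary_growth` and the toy stream are hypothetical.

```python
def vocabulary_growth(tag_stream):
    """Return the number of distinct tags seen after each tag assignment."""
    seen = set()
    growth = []
    for tag in tag_stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth


# Toy stream of tags in the order they were assigned (illustrative only).
stream = ["web", "design", "web", "css", "web", "design", "blog"]
growth = vocabulary_growth(stream)
```

Plotting `growth` against its index on log-log axes would show whether the vocabulary grows sub-linearly, that is, with a slope below one.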
Tag frequency distribution
Preferential attachment
Few tags per resource
Where is semantics?
Such properties can be modeled by Yule-Simon processes with memory (see Cattuto et al.)
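As a rough illustration of such a process, the sketch below generates a synthetic tag stream. With `tau=None` it is a plain Yule-Simon process: a new tag appears with probability `p_new`, otherwise a tag is copied from a uniformly chosen past position, which makes frequent tags more likely to be reused. With a numeric `tau`, past positions are weighted by 1/(x + tau), where x is the distance back in the stream. That kernel is an assumption about what "memory" means here and should be checked against Cattuto et al. All names and parameter values are hypothetical.

```python
import random
from collections import Counter


def yule_simon_stream(n_steps, p_new, tau=None, seed=0):
    """Generate a tag stream of length n_steps (tags are integer labels)."""
    rng = random.Random(seed)
    stream = [0]  # start with a single tag, labelled 0
    next_tag = 1
    for _ in range(n_steps - 1):
        if rng.random() < p_new:
            # Invent a brand-new tag.
            stream.append(next_tag)
            next_tag += 1
        elif tau is None:
            # Plain Yule-Simon: copy from a uniformly chosen past position,
            # so a tag is reused in proportion to its current frequency.
            stream.append(rng.choice(stream))
        else:
            # Assumed memory kernel: position i lies x = t - i steps back
            # and is weighted by 1 / (x + tau), favouring recent tags.
            t = len(stream)
            weights = [1.0 / ((t - i) + tau) for i in range(t)]
            stream.append(rng.choices(stream, weights=weights, k=1)[0])
    return stream


stream = yule_simon_stream(n_steps=2000, p_new=0.1)
frequencies = sorted(Counter(stream).values(), reverse=True)
```

Plotting `frequencies` against rank on log-log axes is one way to compare the synthetic stream with the observed tag frequency distribution.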
But such analysis does not capture the semantics of tags: hierarchical relations etc.
Why does semantics matter?
Detection of tag categories.
Understanding users' strategies to improve the system and propose new services.
Spam detection.
Tag co-occurrence network
Tags are nodes.
If two tags are assigned to the same resource, one puts an edge between the two tags.
Edges are weighted: each co-assignment of two tags increases the edge weight by one.
Strength (the sum of a node's edge weights) is used instead of degree.
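The construction can be sketched as follows. The sketch assumes that a "co-assignment" is counted once per resource on which both tags appear, regardless of which user assigned them. The slides do not settle this, and counting once per user and resource would be an equally plausible reading. The function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(assignments):
    """Build weighted tag-tag edges from (user, resource, tag) triplets."""
    tags_by_resource = defaultdict(set)
    for _user, resource, tag in assignments:
        tags_by_resource[resource].add(tag)
    weights = defaultdict(int)
    for tags in tags_by_resource.values():
        # Every pair of tags sharing this resource gains one unit of weight.
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return dict(weights)


def strengths_and_degrees(weights):
    """Strength = sum of edge weights at a node; degree = number of edges."""
    strength = defaultdict(int)
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        for node in (a, b):
            strength[node] += w
            degree[node] += 1
    return dict(strength), dict(degree)


# Toy data (illustrative only).
assignments = [
    ("u1", "r1", "networks"),
    ("u1", "r1", "physics"),
    ("u2", "r2", "networks"),
    ("u2", "r2", "physics"),
    ("u2", "r2", "web"),
]
weights = cooccurrence_network(assignments)
strength, degree = strengths_and_degrees(weights)
```

In the toy data, "networks" and "physics" share two resources, so their edge has weight 2, and the strength of "networks" (3) exceeds its degree (2).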
Distribution of strength
?
Nontrivial clustering & spam detection
Clustering coefficient C(k): average density of triangles around nodes with degree k.
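A minimal sketch of this quantity on an unweighted edge list follows. It uses the usual local clustering coefficient, 2T / (k(k - 1)) for a node with k neighbours and T links among them, and averages it over nodes of equal degree. Nodes with degree below 2 are skipped, which is a choice made for this sketch. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def clustering_by_degree(edges):
    """Return {k: average local clustering of nodes with degree k}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    local_by_degree = defaultdict(list)
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue  # clustering is undefined for fewer than two neighbours
        # Count links among the neighbours, i.e. triangles through this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        local_by_degree[k].append(2.0 * links / (k * (k - 1)))
    return {k: sum(vals) / len(vals) for k, vals in local_by_degree.items()}


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
c_of_k = clustering_by_degree(edges)
```

A degree value whose C(k) stands far from the overall trend is the kind of anomaly the following slides associate with spam.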
k = 502
Looking for a k = 502 page...
SPAM
Co-occurrence networks and semantics
Co-occurrence networks are scale-free.
The significance of this statistical property is ambiguous.
Clustering encodes semantics (?)
Clustering can be used to detect spam.
Users' strategies
Do users tag resources according to a conceptual hierarchy of tags?
Semantics and hierarchy
For example, "Emergence of scaling in random networks" by A.-L. Barabási and R. Albert could be tagged:
scale-free networks, networks: HIERARCHICAL
scale-free networks, WWW: NON HIERARCHICAL
Model based on hierarchy
Conjectures
1. Tags have an underlying hierarchy.
2. With high probability, users add tags hierarchically.
Can we reproduce the co-occurrence network structure based on tag hierarchy?
Model based on hierarchy
The underlying hierarchy is a random tree.
At each time step, we add a new resource, with two tags.
New tags are introduced with probability Pnt.
With probability Psb, the second tag is a "generalization" of the first tag; otherwise it is chosen randomly.
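The model rules above leave several details open, so the following simulation is only one possible reading. It assumes that (a) the random tree grows by attaching each new tag to a uniformly chosen existing tag as its parent, (b) a new tag, when introduced, is used as the first tag of the resource, (c) "generalization" means the direct parent in the tree, and (d) a resource whose two tags coincide adds no edge. None of these choices is confirmed by the slides. `p_nt` and `p_sb` stand for Pnt and Psb; every other name is hypothetical.

```python
import random
from collections import defaultdict


def simulate_hierarchy_model(n_resources, p_nt, p_sb, seed=0):
    """Simulate tagging on a random tree; return (parent map, edge weights)."""
    rng = random.Random(seed)
    parent = {0: None}  # tag 0 is the root of the hierarchy
    weights = defaultdict(int)
    for _ in range(n_resources):
        if rng.random() < p_nt:
            # Assumption (a), (b): create a tag under a random existing tag
            # and use it as the first tag of the new resource.
            first = len(parent)
            parent[first] = rng.randrange(first)
        else:
            first = rng.randrange(len(parent))
        if rng.random() < p_sb and parent[first] is not None:
            # Assumption (c): the generalization is the direct parent.
            second = parent[first]
        else:
            second = rng.randrange(len(parent))
        if second != first:  # assumption (d)
            weights[(min(first, second), max(first, second))] += 1
    return parent, dict(weights)


parent, weights = simulate_hierarchy_model(n_resources=1000, p_nt=0.3, p_sb=0.8)
```

The strength distribution and clustering of `weights` can then be compared with those measured on the real co-occurrence network.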
Results: strength distribution
Results: clustering
Conclusions
Tagging systems display nontrivial statistical properties, such as Zipf laws.
Co-occurrence networks are a way of discovering semantic relationships between tags (?)
Clustering in co-occurrence networks encodes semantics (?) and detects spam.
Simple models based on hierarchy partially explain such properties.
Thank you, and thanks to...
Guido Caldarelli
The TAGORA group (Cattuto et al.)