Self-Organized Web Usage Regularities

Upload: camron-willis

Post on 19-Jan-2016

Problems of foraging information on WWW

Slow access.

Difficulty in finding useful information, which is related to the balkanization of the Web structure.

Difficulty in solving this fragmentation problem by designing an effective classification scheme.

Solution: to seek Web regularities in user behavior.

Issues

How to characterize the strong regularities in Web surfing in terms of user navigation strategies.

How to present an information foraging agent-based approach to describing user behavior.

Issues on Web self-organization.

Related research works

Web mining for pattern-oriented adaptation: to identify the inter-relationships among different websites, based either on the analysis of the contents of Web pages or on the discovery of access patterns from Web log files.

Web data mining: computing association rules, detecting sequential patterns, and discovering classification rules and data clusters.

Web usage mining: analysis of Web usage patterns, such as statistical properties of user access, association rules and sequential patterns in user sessions, and user classification and Web page clusters based on user behavior.

Information Foraging Agent Model (IFAM)

Objectives: to find the inter-relationship between the statistical observations on Web navigation regularities and foraging behavior patterns of individual agents.

IFAM – Artificial Web Space

Artificial Web Space: a collection of websites connected by hyperlinks.

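$$D(c_i, c_j) = \left( \sum_{k=1}^{M} \left( cw_i^k - cw_j^k \right)^2 \right)^{1/2}$$

where $D(c_i, c_j)$ denotes the Euclidean distance between the content vectors of nodes $i$ and $j$, and $cw_i^k$ is the weight of node $i$ on topic $k$ ($k = 1 \dots M$).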
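
As a rough illustration of the distance above, the following Python sketch computes D for two content vectors stored as plain lists. The function name `content_distance` and the sample vectors are assumptions made for this example, not part of the slides.

```python
import math

def content_distance(c_i, c_j):
    """Euclidean distance D(c_i, c_j) between two content vectors of length M."""
    if len(c_i) != len(c_j):
        raise ValueError("content vectors must cover the same M topics")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_i, c_j)))

# Two hypothetical nodes described over M = 3 topics.
node_a = [3.0, 0.0, 1.0]
node_b = [0.0, 4.0, 1.0]
print(content_distance(node_a, node_b))  # 5.0
```

Nodes with similar topic weights end up close together, which is what the link-construction step relies on.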

IFAM – Artificial Web Space

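Content Distribution Models

Normal distribution:

$$cw_n^i = \begin{cases} T + |X_c|, & \text{if } i = j \\ |X_c|, & \text{otherwise} \end{cases}$$

$$f_{X_c} \sim \mathrm{normal}(0, \sigma)$$

$$T \sim \mathrm{normal}(\mu_t, \sigma_t)$$

where

$f_{X_c}$: probability distribution of weight $X_c$

$\mathrm{normal}(0, \sigma)$: normal distribution with mean 0 and variance $\sigma$

$T$: content (increment) offset on a topic

$\mu_t$: mean of the normally distributed offset $T$

$\sigma_t$: variance of the normally distributed offset $T$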

IFAM – Artificial Web Space

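Power-law distribution:

$$cw_n^i = \begin{cases} T + |X_c|, & \text{if } i = j \\ |X_c|, & \text{otherwise} \end{cases}$$

$$f_{X_c} \sim \alpha (X_c + 1)^{-(\alpha + 1)}, \quad X_c > 0, \ \alpha > 0$$

where

$f_{X_c}$: probability distribution of weight $X_c$

$\alpha$: shape parameter of a power-law distribution

$T$: content (increment) offset on a topic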
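
The two content models can be sketched in Python with the standard `random` module. This is only a sketch under stated assumptions: the slides do not say which topic `j` denotes, so the code assumes it is the topic the node belongs to (`own_topic`); it also assumes the offset T is normally distributed in both models. The function name and all default parameter values are hypothetical.

```python
import random

def sample_content_vector(num_topics, own_topic, model="power-law",
                          alpha=2.0, sigma=1.0, mu_t=5.0, sigma_t=1.0,
                          rng=random):
    """Draw one content vector cw_n for a node assumed to belong to own_topic."""
    def draw_weight():
        if model == "normal":
            # random.gauss expects a standard deviation, so take the
            # square root of the variance sigma; abs() gives |X_c|.
            return abs(rng.gauss(0.0, sigma ** 0.5))
        # paretovariate(alpha) is >= 1 with density alpha * x^-(alpha+1);
        # subtracting 1 gives density alpha * (x+1)^-(alpha+1) on x >= 0.
        return rng.paretovariate(alpha) - 1.0

    # Assumption: T ~ normal(mu_t, sigma_t) in both models.
    offset = rng.gauss(mu_t, sigma_t ** 0.5)
    return [draw_weight() + (offset if i == own_topic else 0.0)
            for i in range(num_topics)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
print(sample_content_vector(4, own_topic=1, model="normal", rng=rng))
print(sample_content_vector(4, own_topic=1, model="power-law", rng=rng))
```

In both cases the weight on the node's own topic is raised by T, so each node is dominated by one topic while still carrying some weight on the others.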

IFAM – Artificial Web Space

Constructing an Artificial Web

1. For each topic i
2.     Create node content vectors
   End
3. For each node i
4.     Initialize the link list of node i
5.     For each node j
6.         If D(ci,cj) < r
7.             Add node j to the link list of node i
8.             Add D(ci,cj) to the link list of node i
           End
       End
   End
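
Steps 3 to 8 of the construction procedure can be sketched as follows. The function name `build_links` and the sample vectors are assumptions for illustration; the sketch also skips the case i = j so that a page does not link to itself, which the slide does not state either way.

```python
import math

def build_links(content_vectors, r):
    """Return {node: [(neighbour, distance), ...]} for node pairs closer than r."""
    links = {i: [] for i in range(len(content_vectors))}
    for i, c_i in enumerate(content_vectors):
        for j, c_j in enumerate(content_vectors):
            if i == j:
                continue  # assumption: no self-links
            d = math.dist(c_i, c_j)  # Euclidean distance D(c_i, c_j)
            if d < r:
                links[i].append((j, d))
    return links

# Three hypothetical nodes over two topics: nodes 0 and 1 are similar.
vectors = [[5.0, 0.0], [4.0, 1.0], [0.0, 5.0]]
for node, neighbours in build_links(vectors, r=2.0).items():
    print(node, neighbours)
```

The threshold r controls how densely the artificial Web is linked: a larger r connects pages whose contents are less alike.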

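IFAM – Foraging Agents

Interest Profiles

$$P_m = [pw_m^1, pw_m^2, \dots, pw_m^i, \dots, pw_m^M]$$

$$p_m^i = \frac{pw_m^i}{\sum_{j=1}^{M} pw_m^j}$$

$$H_m = -\sum_{i=1}^{M} p_m^i \log(p_m^i)$$

where

$P_m$: preference vector of agent $m$

$pw_m^i$: weight of preference on topic $i$

$p_m^i$: normalized preference weight on topic $i$

$H_m$: interest entropy of agent $m$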
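
A small Python sketch of the interest entropy may help. It assumes the natural logarithm (the slide does not give the base) and treats 0 log 0 as 0; the function name `interest_entropy` is hypothetical.

```python
import math

def interest_entropy(preference_weights):
    """Entropy H_m of an agent's normalized preference weights."""
    total = sum(preference_weights)
    if total <= 0:
        raise ValueError("at least one preference weight must be positive")
    probs = [w / total for w in preference_weights]
    # Terms with p = 0 are skipped (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in probs if p > 0)

broad = interest_entropy([1.0, 1.0, 1.0, 1.0])    # equal interest in 4 topics
focused = interest_entropy([9.0, 0.5, 0.5, 0.0])  # one dominant topic
print(broad, focused)
```

An agent with evenly spread interests has the highest entropy (log M), while an agent focused on a single topic has entropy close to zero.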

IFAM – Foraging Agents

Interest Distribution Models

1. Normal distribution:

$$pw_m^i = X_p, \qquad f_{X_p} \sim \mathrm{normal}(0, \sigma_u)$$

where $\mathrm{normal}(0, \sigma_u)$ denotes the normal distribution with mean 0 and variance $\sigma_u$.

2. Power-law distribution:

$$pw_m^i = X_p, \qquad f_{X_p} \sim \alpha_u (X_p + 1)^{-(\alpha_u + 1)}, \quad X_p > 0, \ \alpha_u > 0$$

where $\alpha_u$ denotes the shape parameter of a power-law distribution.

IFAM – Foraging in an Artificial Web Space

Random agents: have no strong interest in any specific topic.

Rational agents: have specific topics of interest in mind and forage in order to locate the pages that contain information on those topics.

Recurrent agents: are familiar with the Web structure and know the whereabouts of interesting contents.

IFAM – Foraging in an Artificial Web Space

Agent Preference Updating: depends on how much information on interesting topics the agent has found and how much of that information the agent has absorbed.

$$P_m(\tau) = P_m(\tau - 1) - \lambda c_n$$

$$pw_m^i(\tau) = 0, \quad \text{for } pw_m^i(\tau) < 0, \ i = 1 \dots M$$

where $\lambda$ denotes an absorbing factor in [0,1] that implies how much information is accepted by agents on average, and $c_n$ is the content vector of page $n$. $P_m(\tau)$ and $P_m(\tau - 1)$ denote an agent's preference vector after and before accessing information in page $n$, respectively.
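
The update rule can be sketched in Python as a subtraction followed by clamping at zero. The function name `update_preferences`, the parameter name `absorb` (standing for the absorbing factor) and the sample numbers are assumptions for this example.

```python
def update_preferences(preferences, content, absorb):
    """Preference vector after visiting a page with the given content vector.

    Each weight is reduced by absorb * content weight, and any weight
    that would become negative is set to 0.
    """
    if not 0.0 <= absorb <= 1.0:
        raise ValueError("the absorbing factor must lie in [0, 1]")
    return [max(0.0, pw - absorb * cw) for pw, cw in zip(preferences, content)]

before = [4.0, 1.0, 0.5]   # hypothetical preference weights before the visit
page = [2.0, 0.0, 3.0]     # hypothetical content vector of the visited page
print(update_preferences(before, page, absorb=0.5))  # [3.0, 1.0, 0.0]
```

The third weight would drop to -1.0 and is therefore clamped to 0: the agent's interest in that topic is fully satisfied.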

IFAM – Foraging in an Artificial Web Space

Motivation Functions

$$f_{\log(mt_v)} \sim \mathrm{normal}(\mu_m, \sigma_m)$$

where $\mu_m$ and $\sigma_m$ denote the mean and variance of the log-normal distribution of $mt_v$, respectively.

$$mt_v = \alpha_m e^{\beta_m \cdot step}$$

where $\alpha_m$ and $\beta_m$ denote the coefficient and rate of an exponential function, and $step$ denotes the number of pages/nodes that an agent has continuously visited.

IFAM – Foraging in an Artificial Web Space

Rewarding Function

$$R_t = \sum_{i=1}^{M} \left( pw_m^i(\tau - 1) - pw_m^i(\tau) \right)$$
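
In other words, the reward is the total amount by which the agent's preference weights dropped during one visit. A minimal Python sketch, with a hypothetical function name and made-up numbers:

```python
def reward(prefs_before, prefs_after):
    """Sum over all topics of the decrease in preference weight."""
    return sum(b - a for b, a in zip(prefs_before, prefs_after))

# Hypothetical preference vectors before and after visiting one page.
print(reward([4.0, 1.0, 0.5], [3.0, 1.0, 0.0]))  # 1.5
```

A page that matches none of the agent's interests leaves the weights unchanged and so yields a reward of 0.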

IFAM – Foraging in an Artificial Web Space

Foraging

1. Initialize the nodes and links in an artificial Web space
2. Initialize information foraging agents and their interest profiles
3. For each agent m
4.     While the support for the agent S < max_support_m and S > min_support_m
5.         Find the hyperlinks inside node n that the agent is presently in
6.         Select, based on p_k, the hyperlink that connects to the next-level page
7.         Forage to the selected page
8.         Update the preference weights in the agent's interest profile
9.         Update the support function of the agent
       End
10.    If the support for the agent S > max_support_m
11.        Agent m is satisfied with the contents and leaves the Web space
       Else
12.        Agent m is dissatisfied and leaves the Web space
       End
   End
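
The loop for a single agent (steps 4 to 12) might look like the Python sketch below. Several parts are assumptions, because the slides do not specify them: the selection probability p_k is taken to be proportional to the overlap between the agent's preferences and the target page's content, the support update is taken to be "reward minus a fixed cost per step", and the agent counts as satisfied when its support reaches `max_support`. Here `links` maps each node to a plain list of neighbouring node ids. All names and default values are hypothetical.

```python
import random

def forage(links, contents, preferences, start=0, absorb=0.5,
           min_support=-3.0, max_support=3.0, step_cost=1.0, rng=random):
    """Walk one agent through the Web space; return (satisfied, visited nodes)."""
    prefs = list(preferences)
    node, support, visited = start, 0.0, [start]
    while min_support < support < max_support:
        candidates = links.get(node, [])
        if not candidates:
            break  # dead end: no hyperlinks to follow
        # Assumed p_k: proportional to preference/content overlap
        # (a tiny floor keeps every weight strictly positive).
        scores = [1e-9 + sum(p * c for p, c in zip(prefs, contents[k]))
                  for k in candidates]
        node = rng.choices(candidates, weights=scores, k=1)[0]
        visited.append(node)
        # Step 8: absorb content, clamping negative weights to zero.
        new_prefs = [max(0.0, p - absorb * c)
                     for p, c in zip(prefs, contents[node])]
        # Step 9 (assumed form): reward gained minus a fixed cost per step.
        support += sum(p - q for p, q in zip(prefs, new_prefs)) - step_cost
        prefs = new_prefs
    return support >= max_support, visited

# Two hypothetical pages on the same topic that link to each other.
web = {0: [1], 1: [0]}
pages = [[2.0, 0.0], [2.0, 0.0]]
print(forage(web, pages, preferences=[10.0, 0.0], step_cost=0.5))
print(forage(web, pages, preferences=[10.0, 0.0], step_cost=1.0))
```

With the lower step cost the agent gains support on every visit and leaves satisfied; with the higher cost the reward only offsets the cost until the agent's interest is used up, after which support falls and the agent leaves dissatisfied.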

Result

The experiment applies a weighted linear-regression method in which the higher the occurrence rate of a depth or a link-click frequency is, the higher its weight will be.

Self-Organized Agent

To support adaptive organization among agents, adding a modeling technique allows agents to model their interactions with the environment and to recognize and manipulate new environmental scenarios in order to achieve organizational goals.