Self-Organized Web Usage Regularities
Problems of foraging information on WWW
Slow access.
Difficulty in finding useful information, which is related to the balkanization of the Web structure.
It is difficult to solve this fragmentation problem by designing an effective classification scheme.
Solution:
To seek Web regularities in user behavior.
Issues
How to characterize the strong regularities in Web surfing in terms of user navigation strategies.
How to present an information foraging agent-based approach to describing user behavior.
Issues on Web self-organization.
Related research works
Web mining for pattern-oriented adaptation: to identify the inter-relationships among different websites, based either on the analysis of the contents of Web pages or on the discovery of access patterns from Web log files.
Web data mining: computing association rules, detecting sequential patterns, and discovering classification rules and data clusters.
Web usage mining: analysis of Web usage patterns, such as the statistical properties of user access, association rules and sequential patterns in user sessions, user classification, and Web page clusters based on user behavior.
Information Foraging Agent Model (IFAM)
Objectives: to find the inter-relationship between the statistical observations on Web navigation regularities and foraging behavior patterns of individual agents.
IFAM – Artificial Web Space
Artificial Web Space: a collection of websites connected by hyperlinks.
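$D(c_i, c_j) = \left( \sum_{k=1}^{M} (cw_i^k - cw_j^k)^2 \right)^{1/2}$
where $D(c_i, c_j)$ denotes the Euclidean distance between the content vectors of nodes $i$ and $j$, and $cw_i^k$ is the content weight of node $i$ on topic $k$.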
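The distance above can be sketched in a few lines of Python. This is a minimal illustration, assuming content vectors are plain lists of topic weights; the function name `content_distance` is hypothetical and not taken from the slides.

```python
import math

def content_distance(c_i, c_j):
    """Euclidean distance between two content vectors of equal length."""
    if len(c_i) != len(c_j):
        raise ValueError("content vectors must cover the same topics")
    # Sum the squared per-topic weight differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_i, c_j)))
```

For example, `content_distance([0, 0], [3, 4])` evaluates to `5.0`.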
IFAM – Artificial Web Space
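Content Distribution Models
Normal distribution:
$cw_n^i = \begin{cases} T + |X_c|, & \text{if } i = j \\ |X_c|, & \text{otherwise} \end{cases}$
$f_{X_c} \sim \mathrm{normal}(0, \sigma)$
$T \sim \mathrm{normal}(\mu_t, \sigma_t)$
where
$f_{X_c}$: probability distribution of weight $X_c$
$\mathrm{normal}(0, \sigma)$: normal distribution with mean 0 and variance $\sigma$
$T$: content (increment) offset on a topic
$\mu_t$: mean of the normally distributed offset $T$
$\sigma_t$: variance of the normally distributed offset $T$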
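As a rough sketch of the normal content model, the snippet below draws one node's content vector. It assumes that the node belongs to a single topic (the case i = j), that a fresh weight is drawn per topic, and that the second parameter of each normal distribution is a variance, so it is converted to a standard deviation before sampling. The function and parameter names are hypothetical.

```python
import math
import random

def normal_content_vector(topic, num_topics, var_c, mean_t, var_t, rng):
    """Draw content weights for a node that belongs to `topic`."""
    weights = []
    for i in range(num_topics):
        # |X_c| with X_c ~ normal(0, var_c); gauss() expects a standard deviation.
        x_c = abs(rng.gauss(0.0, math.sqrt(var_c)))
        if i == topic:
            # Add the offset T ~ normal(mean_t, var_t) on the node's own topic.
            weights.append(rng.gauss(mean_t, math.sqrt(var_t)) + x_c)
        else:
            weights.append(x_c)
    return weights

rng = random.Random(0)
vector = normal_content_vector(topic=2, num_topics=5, var_c=0.25,
                               mean_t=3.0, var_t=0.01, rng=rng)
```

With a positive mean offset, the weight on the node's own topic stands out from the background weights on the other topics.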
IFAM – Artificial Web Space
Power-law distribution:
$cw_n^i = \begin{cases} T + |X_c|, & \text{if } i = j \\ |X_c|, & \text{otherwise} \end{cases}$
$f_{X_c} \sim \alpha (X_c + 1)^{-(\alpha + 1)}, \quad X_c > 0, \ \alpha > 0$
where
$f_{X_c}$: probability distribution of weight $X_c$
$\alpha$: shape parameter of the power-law distribution
$T$: content (increment) offset on a topic
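One way to draw weights from a power-law density of the form shape * (x + 1)^-(shape + 1) is inverse-transform sampling, since its cumulative distribution is 1 - (x + 1)^-shape. The sketch below assumes that exact density form; the function name `sample_power_law` is hypothetical.

```python
import random

def sample_power_law(shape, rng):
    """Draw x >= 0 with density shape * (x + 1) ** -(shape + 1)."""
    if shape <= 0:
        raise ValueError("shape parameter must be positive")
    u = rng.random()  # uniform in [0, 1), so 1 - u lies in (0, 1]
    # Invert the CDF F(x) = 1 - (x + 1) ** -shape.
    return (1.0 - u) ** (-1.0 / shape) - 1.0

rng = random.Random(1)
samples = sorted(sample_power_law(2.0, rng) for _ in range(2000))
```

The standard library call `random.paretovariate(shape) - 1.0` should give an equivalent draw, because it samples a Pareto variable with minimum value 1.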
IFAM – Artificial Web Space
Constructing an Artificial Web
1. For each topic i
2.   Create node content vectors
   End
3. For each node i
4.   Initialize the link list of node i
5.   For each node j
6.     If D(c_i, c_j) < r
7.       Add node j to the link list of node i
8.       Add D(c_i, c_j) to the link list of node i
       End
     End
   End
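The link-construction steps can be sketched as follows. The sketch assumes content vectors are lists of equal length and that a node does not link to itself; the pseudocode does not say whether self-links are excluded, so that choice is an assumption. The name `build_links` is hypothetical.

```python
import math

def build_links(content_vectors, r):
    """Link every pair of nodes whose content distance is below r."""
    links = {i: [] for i in range(len(content_vectors))}
    for i, c_i in enumerate(content_vectors):
        for j, c_j in enumerate(content_vectors):
            if i == j:
                continue  # assumption: no self-links
            d = math.dist(c_i, c_j)  # Euclidean distance (Python 3.8+)
            if d < r:
                # Store the target node together with its distance.
                links[i].append((j, d))
    return links

web = build_links([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]], r=2.0)
```

Here the first two nodes link to each other, while the third node, whose content is far from both, ends up with an empty link list.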
IFAM – Foraging Agents
Interest Profiles
$P_m = [pw_m^1, pw_m^2, \ldots, pw_m^i, \ldots, pw_m^M]$
$p_m^i = \frac{pw_m^i}{\sum_{j=1}^{M} pw_m^j}$
$H_m = -\sum_{i=1}^{M} p_m^i \log(p_m^i)$
where
$P_m$: preference vector of agent $m$
$pw_m^i$: weight of preference on topic $i$
$H_m$: interest entropy of user $m$
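The interest entropy can be computed directly from the preference weights. This is a minimal sketch, assuming non-negative weights, the natural logarithm, and the usual convention that a zero probability contributes nothing to the sum; the function names are hypothetical.

```python
import math

def normalize_preferences(weights):
    """Turn preference weights into probabilities that sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one preference weight must be positive")
    return [w / total for w in weights]

def interest_entropy(weights):
    """Entropy of the normalized preference weights."""
    probs = normalize_preferences(weights)
    # Skip zero probabilities, treating 0 * log(0) as 0.
    return -sum(p * math.log(p) for p in probs if p > 0)
```

An agent with equal interest in four topics has entropy log(4), while an agent interested in a single topic has entropy 0, so lower entropy indicates a more focused interest profile.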
IFAM – Foraging Agents
Interest Distribution Models
1. Normal distribution:
$pw_m^i = X_p$
$f_{X_p} \sim \mathrm{normal}(0, \sigma_u)$
where $\mathrm{normal}(0, \sigma_u)$ denotes the normal distribution with mean 0 and variance $\sigma_u$
2. Power-law distribution:
$pw_m^i = X_p$
$f_{X_p} \sim \alpha_u (X_p + 1)^{-(\alpha_u + 1)}, \quad X_p > 0, \ \alpha_u > 0$
where $\alpha_u$ denotes the shape parameter of the power-law distribution
IFAM – Foraging in an Artificial Web Space
Random agents: have no strong interest in any specific topic.
Rational agents: have specific topics of interest in mind and forage in order to locate the pages that contain information on those topics.
Recurrent agents: are familiar with the Web structure and know the whereabouts of interesting contents.
IFAM – Foraging in an Artificial Web Space
Agent Preference Updating: depends on how much information on interesting topics the agent has found and how much of that information the agent has absorbed.
$P_m(\tau) = P_m(\tau - 1) - \theta \cdot c_n$
$pw_m^i(\tau) = 0$, for $pw_m^i(\tau) < 0$, $i = 1 \ldots M$
where $\theta$ denotes an absorbing factor in [0,1] that implies how much information is accepted by agents on average. $P_m(\tau)$ and $P_m(\tau - 1)$ denote an agent's preference vector after and before accessing information in page $n$, respectively.
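The update rule can be sketched as a single pass over the preference vector. The sketch assumes the preference vector and the page's content vector are lists of the same length; the names `update_preferences` and `absorb` are hypothetical.

```python
def update_preferences(prefs, content, absorb):
    """Subtract the absorbed page content and clip negative weights to 0."""
    if not 0.0 <= absorb <= 1.0:
        raise ValueError("absorbing factor must lie in [0, 1]")
    # Each topic weight drops by the absorbed share of the page's content.
    return [max(0.0, p - absorb * c) for p, c in zip(prefs, content)]
```

For instance, `update_preferences([1.0, 0.2], [1.0, 1.0], 0.5)` gives `[0.5, 0.0]`: the second weight would become negative and is reset to zero.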
IFAM – Foraging in an Artificial Web Space
Motivation Functions
$f_{\log(mt_v)} \sim \mathrm{normal}(\mu_m, \sigma_m)$
where $\mu_m$ and $\sigma_m$ denote the mean and variance of the log-normal distribution of $mt_v$, respectively
$mt_v = \alpha_m e^{\beta_m \cdot step}$
where $\alpha_m$ and $\beta_m$ denote the coefficient and rate of an exponential function. $step$ denotes the number of pages/nodes that an agent has continuously visited.
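The two motivation functions can be sketched as below. Several points are assumptions: the log-normal parameter is treated as a variance and converted to a standard deviation, and the sign of the exponential rate is left to the caller because the slide does not show it clearly. The function names are hypothetical.

```python
import math
import random

def lognormal_motivation(mean_log, var_log, rng):
    """Draw a motivation value whose logarithm is normally distributed."""
    # lognormvariate() expects the standard deviation of the underlying normal.
    return rng.lognormvariate(mean_log, math.sqrt(var_log))

def exponential_motivation(coeff, rate, step):
    """Motivation as an exponential function of the pages visited so far."""
    # Pass a negative rate to model motivation that decays with each step.
    return coeff * math.exp(rate * step)
```

A log-normal draw is always positive, and with a rate of zero the exponential form reduces to the constant coefficient.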
IFAM – Foraging in an Artificial Web Space
Rewarding Function
$R_t = \sum_{i=1}^{M} \left( pw_m^i(\tau - 1) - pw_m^i(\tau) \right)$
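The reward is the total drop in preference weights caused by visiting a page. A minimal sketch, assuming the preference vectors before and after the visit are lists of equal length (the function name is hypothetical):

```python
def reward(prefs_before, prefs_after):
    """Sum of the per-topic decreases in preference weight."""
    return sum(b - a for b, a in zip(prefs_before, prefs_after))
```

If a visit lowers the weights from `[1.0, 0.2]` to `[0.5, 0.0]`, the reward is about 0.7: the more interesting content a page supplies, the larger the reward.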
IFAM – Foraging in an Artificial Web Space
Foraging
1. Initialize the nodes and links in an artificial Web space
2. Initialize information foraging agents and their interest profiles
3. For each agent m
4.   While the support for the agent S < max_support_m and S > min_support_m
5.     Find the hyperlinks inside node n that the agent is presently in
6.     Select, based on p_k, the hyperlink that connects to the next-level page
7.     Forage to the selected page
8.     Update the preference weights in the agent's interest profile
9.     Update the support function of the agent
     End
10.  If the support for the agent S > max_support_m
11.    Agent m is satisfied with the contents and leaves the Web space
     Else
12.    Agent m is dissatisfied and leaves the Web space
     End
   End
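The foraging loop for a single agent can be sketched as follows. Much of this is assumed rather than taken from the slides: the agent is a random agent that picks links uniformly (standing in for the selection probability p_k), support starts at 0, and support grows by the reward minus a fixed `step_cost` per visit. The slides do not define the support update, so that rule and all names here are hypothetical.

```python
import random

def forage(links, contents, prefs, start, absorb,
           min_support, max_support, step_cost, rng, max_steps=1000):
    """Walk the Web space until support leaves (min_support, max_support)."""
    node, support, path = start, 0.0, [start]
    prefs = list(prefs)
    for _ in range(max_steps):
        if not (min_support < support < max_support):
            break
        neighbours = links[node]
        if not neighbours:
            break  # dead end: no hyperlinks to follow
        node = rng.choice(neighbours)  # assumption: uniform link choice
        path.append(node)
        before = prefs
        # Preference update: absorb content, clip negative weights to 0.
        prefs = [max(0.0, p - absorb * c)
                 for p, c in zip(before, contents[node])]
        gained = sum(b - a for b, a in zip(before, prefs))
        support += gained - step_cost  # hypothetical support update
    satisfied = support > max_support
    return path, support, satisfied

links = {0: [1], 1: [0]}
contents = [[1.0, 0.0], [1.0, 0.0]]
path, support, satisfied = forage(links, contents, [1.0, 0.0], 0, 0.5,
                                  -0.5, 0.6, 0.1, random.Random(0))
```

In this two-node example the agent finds content matching its interest on every visit, so its support rises above the upper threshold after two steps and it leaves satisfied. An agent with no matching interest would only pay the step cost and eventually leave dissatisfied.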
Result
The experiments apply a weighted linear-regression method in which the higher the occurrence rate of a depth or a link-click frequency, the higher its weight.
Self-Organized Agent
To support adaptive organization among agents, adding a modeling technique allows agents to model their interactions with the environment and to recognize and manipulate new environmental scenarios in order to achieve organizational goals.