Technology for E-commerce
Helena Ahonen-Myka
In this part...
search tools
metadata
personalization
collaborative filtering
data mining
Search tools
the site has to be accessible
site architecture and navigation structure is important
… but some users prefer search
keep users on the site
usage can be monitored: useful knowledge about the users’ needs
Users’ preferences
search: 50%
navigation: 20%
mixed: the rest...
Search tools
Indexer: gathers the words from documents (HTML pages, local files, database records) and puts them into an index file
Search engine: accepts queries, locates the relevant pages in the index, and formats the results in an HTML page
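The division of labour between indexer and search engine can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search tool works: the page contents, the function names (build_index, search, render_results) and the rule that every query word must match are assumptions made for the example.

```python
import html
import re
from collections import defaultdict


def build_index(documents):
    """Indexer: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Search engine: return the ids of documents containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def render_results(doc_ids):
    """Format the matches as a small HTML list."""
    items = "".join(
        f'<li><a href="{html.escape(d)}">{html.escape(d)}</a></li>'
        for d in sorted(doc_ids)
    )
    return f"<ul>{items}</ul>"


# Hypothetical pages, for illustration only.
pages = {
    "/books.html": "Books about cats and dogs",
    "/music.html": "Music CDs and cat videos",
}
index = build_index(pages)
print(render_results(search(index, "cats dogs")))
```

The index is built once, ahead of time; each query then only needs set lookups, which is why indexing cost and index size matter more than query cost.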
Remote vs local search
search tool can reside on a different server, even in a remote location
indexing may take a lot of processing time, and the resulting index may need a lot of space
local software may be faster
Indexer
local: scans directories
web spider: an indexing robot begins at a given page, then follows the links and stores the words of the pages
’robots.txt’ file: states which robots are allowed
HTML meta elements:
<meta name="robots" content="noindex, follow">
<meta name="robots" content="index,nofollow">
<meta name="robots" content="noindex,nofollow">
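A spider can check both kinds of instruction before indexing a page. The sketch below uses Python's standard urllib.robotparser for the robots.txt rules; the robots.txt content, the agent name "example-spider" and the helper parse_robots_meta are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real spider would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url, agent="example-spider"):
    """Return True if robots.txt allows this agent to request the URL."""
    return parser.can_fetch(agent, url)


def parse_robots_meta(content):
    """Interpret the content attribute of a robots meta element."""
    directives = {d.strip().lower() for d in content.split(",")}
    return {
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
    }


print(may_fetch("http://www.example.com/index.html"))
print(may_fetch("http://www.example.com/private/report.html"))
print(parse_robots_meta("noindex, follow"))
```

robots.txt controls which pages a robot may request at all, while the meta element is read after the page has been fetched and decides whether its words are stored and its links followed.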
Indexer
link structure should reach all the pages that should be indexed
non-text links (imagemaps etc.): robots may not be able to follow them -> also provide text links
frames: provide some navigational links to give a context, if the page is retrieved by a query
Search page
search forms are the user interface of the search engine
simple form: just a text field and a button
or a(n advanced) search page: boolean search, date ranges, subscopes...
Search results
the occurrences of the query terms are located from the index
the results are sorted according to their (assumed) relevance to the query
results page should have the same look-and-feel as the other pages on the site
Why do searches fail?
empty searches: people just press the search button without giving any words
wrong scope: people think they are searching the entire web
vocabulary mismatch: terms are too specific, too general, just not used
spelling mistakes
query requirements not met
Why do searches fail?
problems with query syntax: spaces, parentheses, etc.
capitalization and special characters: exact matches required
stopwords: some common words are not indexed
short words: short words are not indexed
numbers are not indexed
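The last three causes are easy to reproduce. The sketch below shows an indexer-side filter of the kind described; the stopword list and the minimum word length are hypothetical values, and real tools choose their own.

```python
import re

STOPWORDS = {"the", "and", "of", "a", "to"}  # hypothetical stopword list
MIN_LENGTH = 3  # hypothetical minimum length of an indexed word


def indexable_terms(text):
    """Return the terms that survive stopword, length and number filtering."""
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token in STOPWORDS or len(token) < MIN_LENGTH or token.isdigit():
            continue
        terms.append(token)
    return terms


print(indexable_terms("The history of Windows 95"))  # ['history', 'windows']
print(indexable_terms("To be or not to be"))         # ['not']
```

A query for "Windows 95" against such an index can only match on "windows", and a query made up entirely of filtered words matches nothing, which is worth explaining on the no-matches page.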
No-matches pages
pages shown to the user if the search does not return any matches
should have the same look-and-feel as the other pages + navigation aids + search again field
explanations why the search might have failed and what to do next
Some usability issues
web design: strong sense of structure and navigation support
some people do not like to search
people who search end up in some page: they should know where they are
people need to move around in the neighborhood
search should be available on every page
Some usability issues
scoped search: difficult for the users to understand what the scope is -> scope should be stated clearly, and a search of the entire site has to be offered easily
boolean search is difficult: ’cats and dogs’ vs ’cats or dogs’ -> ’or’ could be used to select the matches, ’and’ to order them (pages containing all the terms come first)
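One way to read the suggestion about ’or’ and ’and’ is sketched below: any page containing at least one query term is returned, and pages containing more of the terms are listed first. The function name, the sample pages and the tie-breaking by page id are assumptions made for this example.

```python
def or_search_and_ranked(documents, query):
    """Select pages matching ANY term; order them by how many terms they match."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = sum(1 for term in terms if term in words)
        if matched > 0:
            scored.append((matched, doc_id))
    # Most matched terms first; ties broken alphabetically by page id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical pages, for illustration only.
docs = {
    "a": "cats only",
    "b": "cats and dogs together",
    "c": "dogs only",
    "d": "birds",
}
print(or_search_and_ranked(docs, "cats dogs"))  # ['b', 'a', 'c']
```

The user never has to choose an operator: the page about both cats and dogs is at the top, and the partial matches are still available further down.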
Metadata
often a search results in a long list of matches; many of them may be irrelevant
metadata can make the queries more powerful
HTML meta elements
<head profile="http://www.acme.com/profiles/core">
  <title>How to complete memo cover sheets</title>
  <meta name="author" content="John Doe">
  <meta name="copyright" content="© 2000 Acme">
  <meta name="keywords" content="corporate, guidelines, cataloging">
  <meta name="date" content="2000-10-17">
</head>
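An indexer can collect such meta elements into fields and let queries be restricted to one of them. The sketch below uses Python's standard html.parser; the class name MetaCollector and the helper field_matches are hypothetical, and the page is a shortened copy of the example above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from the meta elements of a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content:
                self.meta[name.lower()] = content


PAGE = """<head>
<title>How to complete memo cover sheets</title>
<meta name="author" content="John Doe">
<meta name="keywords" content="corporate, guidelines, cataloging">
<meta name="date" content="2000-10-17">
</head>"""

collector = MetaCollector()
collector.feed(PAGE)


def field_matches(meta, field, value):
    """Return True if the given metadata field contains the value."""
    return value.lower() in meta.get(field, "").lower()


print(collector.meta["author"])
print(field_matches(collector.meta, "keywords", "guidelines"))
```

A query such as "author is John Doe" can then be answered from the metadata alone, instead of matching every page that happens to mention the name.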
Metadata
RDF (Resource Description Framework):
– gives means to define metadata for XML and HTML documents
– gives means to interchange it between different applications on the Web
Example: Dublin Core metadata
– contains 15 elements (title, creator, date…)
Dublin Core
Dublin Core Metadata Elements:
Content: Title, Subject, Description, Language, Relation, Coverage
Intellectual Property: Creator, Publisher, Contributor, Rights
Instance: Date, Type, Format, Identifier
Dublin Core in RDF
<RDF:RDF>
  <RDF:Description RDF:HREF="URI">
    <DC:Relation>
      <RDF:Description>
        <DC:Relation.Type>isPartOf</DC:Relation.Type>
        <RDF:Value RDF:HREF="URI2"/>
      </RDF:Description>
    </DC:Relation>
  </RDF:Description>
</RDF:RDF>
Dublin Core represented in RDF
Searching XML documents
structure of XML documents can be used to make more precise queries, e.g. find Albert Einstein in Author element only
problem: how the user specifies the structure
Searching XML documents
1) The user specifies the hierarchy in the query: Einstein in Author
2) The user makes a simple query, but the search engine presents the alternative contexts: Einstein can be in Author or in Street or in School
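Both alternatives can be sketched with Python's standard xml.etree.ElementTree. The sample document, its element names other than Author and Street, and the two helper functions are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document collection, for illustration only.
XML = """<library>
  <book><Author>Albert Einstein</Author><Title>Relativity</Title></book>
  <person><Name>Anna Smith</Name><Street>Einstein Road 5</Street></person>
</library>"""
root = ET.fromstring(XML)


def find_in_element(root, term, tag):
    """Alternative 1: the user names the element, e.g. Einstein in Author."""
    return [
        el.text
        for el in root.iter(tag)
        if el.text and term.lower() in el.text.lower()
    ]


def contexts_of(root, term):
    """Alternative 2: list the elements in which a plain query term occurs."""
    return sorted(
        {el.tag for el in root.iter() if el.text and term.lower() in el.text.lower()}
    )


print(find_in_element(root, "Einstein", "Author"))  # ['Albert Einstein']
print(contexts_of(root, "Einstein"))                # ['Author', 'Street']
```

In the second alternative the search engine would show the list of contexts to the user, who then picks the one that was meant.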
Using links
good site: many links into the site, particularly from other good sites
text surrounding the link describes (probably) what the target of the link is about
the knowledge above + the contents of the page itself are taken into account
e.g. Google (www.google.com)
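The three kinds of evidence can be combined in a toy scoring function. This is only an illustration of the idea: the link data, the page texts and the equal weighting of the three signals are invented for the example, and it is not a description of how Google actually ranks pages.

```python
from collections import defaultdict

# Hypothetical link data: (source page, target page, text of the link).
LINKS = [
    ("siteA/home", "shop/books", "cheap science books"),
    ("siteB/blog", "shop/books", "good book store"),
    ("siteA/home", "shop/cds", "music"),
]
# Hypothetical page contents.
PAGES = {
    "shop/books": "Welcome to our store",
    "shop/cds": "Books and music CDs",
}


def link_score(query):
    """Rank pages by incoming links, link text and page content (toy weights)."""
    term = query.lower()
    inlinks = defaultdict(int)
    anchor_hits = defaultdict(int)
    for _source, target, anchor in LINKS:
        inlinks[target] += 1
        if term in anchor.lower().split():
            anchor_hits[target] += 1
    scores = {}
    for page, text in PAGES.items():
        content_hit = 1 if term in text.lower().split() else 0
        if content_hit or anchor_hits[page]:
            scores[page] = content_hit + anchor_hits[page] + inlinks[page]
    return sorted(scores, key=lambda page: (-scores[page], page))


print(link_score("books"))  # ['shop/books', 'shop/cds']
```

Note that shop/books is found for "books" even though the word does not appear on the page itself: the text of a link pointing to it supplies the match.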
Natural language queries
E.g. Ask Jeeves
questions and answers prepared by human editors
user’s query is mapped to the prepared queries
Personalization
goal: the right people receive the right information at the right time
but: people do not like to state complex queries, or initialize a service (like answering a questionnaire)
user profiles have to be generated and stored, preferably automatically
User profiles
may contain data like: interests, geographical area, age
could be collected once, and shared with many services
trust of the user: the profile should only be used to offer better service, and only if the user wants to let a service use it
Recommendations
users who bought this book also bought these books / liked these CDs etc.
rating movies, TV programs, wines…
recommending paths on a site
Recommendations
based on the user’s former behavior and profile data
based on social (collaborative) filtering: what similar users liked
User’s former behavior
if used as the only source: the user never sees anything new
particularly a new user hardly gets any recommendations
Collaborative filtering
draws on the experiences of a population or community of users
the profile information of the target user is compared to the profiles of nearest-neighbor users
look for correlation between users in terms of their ratings: recommend items that are included in the neighbors’ profiles but not in the target user’s profile
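The steps above can be sketched as follows. The ratings are invented, Pearson correlation is one common choice of similarity measure (the slides only say "correlation"), and the names pearson and recommend as well as the neighborhood size k are assumptions of this example.

```python
from math import sqrt

# Hypothetical ratings (user -> item -> rating on a 1-5 scale).
RATINGS = {
    "target": {"A": 5, "B": 3, "C": 4},
    "u1": {"A": 5, "B": 3, "C": 4, "D": 5},
    "u2": {"A": 1, "B": 5, "C": 2, "E": 5},
}


def pearson(r1, r2):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(r1) & set(r2))
    if len(common) < 2:
        return 0.0
    m1 = sum(r1[i] for i in common) / len(common)
    m2 = sum(r2[i] for i in common) / len(common)
    num = sum((r1[i] - m1) * (r2[i] - m2) for i in common)
    den = sqrt(sum((r1[i] - m1) ** 2 for i in common)) * sqrt(
        sum((r2[i] - m2) ** 2 for i in common)
    )
    return num / den if den else 0.0


def recommend(target, ratings, k=1):
    """Recommend items rated by the k most similar users but not by the target."""
    others = [
        (pearson(ratings[target], r), user)
        for user, r in ratings.items()
        if user != target
    ]
    neighbors = [user for score, user in sorted(others, reverse=True)[:k] if score > 0]
    seen = set(ratings[target])
    items = []
    for user in neighbors:
        for item in sorted(ratings[user]):
            if item not in seen and item not in items:
                items.append(item)
    return items


print(recommend("target", RATINGS))  # ['D']
```

Here u1 rates the shared items exactly like the target user and becomes the nearest neighbor, so item D is recommended; u2 correlates negatively, so item E is not.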
Collaborative filtering
Problems:
cannot recommend new items (some users have to rate an item before it can be recommended)
unusual user may not get (good) recommendations: no neighbors that are close enough
Matching engines
Apply one set of complex characteristics to another
e.g., recruiting sites: match a job seeker and a job
Data mining for e-commerce
users’ behavior on the web site provides a lot of information:
Which pages do the users view?
Which paths do the users navigate?
How long do the users spend on the site?
What is the rate of viewing a product and purchasing it?
Data mining process
Gathering the data
Cleaning/preprocessing the data
Transforming the data
Analysis / finding general models
Interpreting the results
Using the knowledge
Data collection
clickstream logging: web server logs or packet sniffers
business event logging
Clickstream logging
web log: page requested, time of request, client IP address, etc.
a lot of requests for images -> have to be filtered out
users and user sessions difficult to identify
requests for a page: the same page, but different dynamic content
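A first cleaning step over a web server log might look like the sketch below. It assumes log lines in the Common Log Format; the sample lines, the list of image suffixes and the function name page_requests are made up for the example.

```python
import re

# Pattern for the Common Log Format (an assumption about the log layout).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")

# Hypothetical log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [17/Oct/2000:10:00:00 +0300] "GET /index.html HTTP/1.0" 200 1043',
    '192.0.2.1 - - [17/Oct/2000:10:00:01 +0300] "GET /logo.gif HTTP/1.0" 200 2326',
    '192.0.2.7 - - [17/Oct/2000:10:00:05 +0300] "GET /product?id=42 HTTP/1.0" 200 512',
]


def page_requests(lines):
    """Parse log lines and drop malformed lines and requests for images."""
    requests = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        path_without_query = match.group("path").split("?")[0]
        if path_without_query.lower().endswith(IMAGE_SUFFIXES):
            continue
        requests.append(
            (match.group("host"), match.group("time"), match.group("path"))
        )
    return requests


for request in page_requests(SAMPLE_LOG):
    print(request)
```

What remains is still only a list of addresses and paths: the log does not say which requests belong to the same user or session, nor which product a dynamic page such as /product?id=42 actually showed.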
Clickstream logging
more efficient at the application server layer
instead of just pages, knowledge on products
user and session tracking possible
can also track information absent in web server logs: pages that were aborted while being downloaded
Business event logging
looking at subsets of requests as one logical event or episode:
add/remove item to/from shopping cart
initiate/finish checkout
search (log keywords and nr of results)
register
From order data to customers
collected data is order-oriented
data for each customer is spread into many records
information on customers is the real target
information for each customer has to be aggregated
From order data to customers
What percentage of each customer’s orders used a VISA credit card?
How much money does each customer spend on books?
What is the frequency of each customer’s purchases?
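Answering such questions means turning many order records into one record per customer. The sketch below does this for invented order data; the field names and the function customer_profiles are assumptions, and the number of orders stands in for purchase frequency, which would additionally need order dates.

```python
from collections import defaultdict

# Hypothetical order records, one per order.
ORDERS = [
    {"customer": "c1", "payment": "VISA", "category": "books", "amount": 20.0},
    {"customer": "c1", "payment": "cash", "category": "music", "amount": 15.0},
    {"customer": "c1", "payment": "VISA", "category": "books", "amount": 10.0},
    {"customer": "c1", "payment": "cash", "category": "books", "amount": 5.0},
    {"customer": "c2", "payment": "VISA", "category": "books", "amount": 30.0},
]


def customer_profiles(orders):
    """Aggregate order-oriented records into one profile per customer."""
    grouped = defaultdict(list)
    for order in orders:
        grouped[order["customer"]].append(order)
    profiles = {}
    for customer, rows in grouped.items():
        visa_orders = sum(1 for row in rows if row["payment"] == "VISA")
        profiles[customer] = {
            "orders": len(rows),
            "visa_percent": 100.0 * visa_orders / len(rows),
            "book_spend": sum(
                row["amount"] for row in rows if row["category"] == "books"
            ),
        }
    return profiles


profiles = customer_profiles(ORDERS)
print(profiles["c1"])  # {'orders': 4, 'visa_percent': 50.0, 'book_spend': 35.0}
```

The resulting per-customer records are the input for model generation: each row now describes a customer rather than a single order.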
Model generation
Answer questions like:
What characterizes heavy spenders?
What characterizes customers that prefer promotion X over Y?
What characterizes customers that buy quickly?
What characterizes visitors that do not buy?
Data mining tools
e.g., classification rules
IF Income > $80,000 AND Age <= 30 AND Average Session Duration is between 10 AND 20 minutes
THEN Heavy spender
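Once a rule like this has been found, applying it to a customer is a simple test. The sketch below encodes the rule as a Python function; the function and parameter names are hypothetical, and treating "between 10 and 20 minutes" as including both end points is an assumption.

```python
def is_heavy_spender(income, age, avg_session_minutes):
    """Apply the classification rule to one customer.

    Assumes the session duration bounds of 10 and 20 minutes are inclusive.
    """
    return income > 80_000 and age <= 30 and 10 <= avg_session_minutes <= 20


print(is_heavy_spender(income=95_000, age=28, avg_session_minutes=15))  # True
print(is_heavy_spender(income=95_000, age=45, avg_session_minutes=15))  # False
```

A data mining tool may produce thousands of such rules, each equally easy to apply but hard to survey as a whole.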
Understanding the results
result of a data mining process may be difficult for a business user to understand: e.g. thousands of rules
visualization is important
tailored for a specific domain
Using the results
site structure can be updated
procedures like registering or checking out can be simplified
metadata can be added to make search more efficient
personalization rules, recommendation systems