Technology for E-commerce

Helena Ahonen-Myka

In this part...

search tools

metadata

personalization

collaborative filtering

data mining

Search tools

the site has to be accessible

site architecture and navigation

structure is important … but some users prefer search

keep users on the site

usage can be monitored: useful knowledge about the users’ needs

Users’ preferences

search: 50%

navigation: 20%

mixed: the rest...

Search tools

Indexer: gathers the words from documents (HTML pages, local files, database records) and puts them into an index file

Search engine: accepts queries, locates the relevant pages in the index, and formats the results in an HTML page
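
The division of labor between the indexer and the search engine can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the page names and texts are invented, the index is held in memory instead of an index file, and a query matches only pages that contain every query word.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Indexer: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Search engine: return the ids of documents containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []  # an empty query matches nothing
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

# Hypothetical pages, invented for this example.
pages = {
    "/books.html": "Books about cats and dogs",
    "/music.html": "Music for cats",
    "/about.html": "About this shop",
}
index = build_index(pages)
print(search(index, "cats"))
print(search(index, "cats dogs"))
```

A real search engine would also format the hits as an HTML results page, which is left out here.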

Remote vs local search

the search tool can reside on a different server, possibly at a remote location

indexing may take a lot of processing time, and the resulting index may need a lot of space

local software may be faster

Indexer

local: scans directories

web spider: an indexing robot begins at a given page, then follows the links and stores the words of the pages

robots.txt file: states which robots are allowed

HTML meta elements:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="index, nofollow">
<meta name="robots" content="noindex, nofollow">
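
How a well-behaved spider might honor these two mechanisms can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the robot name `ExampleSpider`, the host `shop.example`, and the helper `parse_robots_meta` are all assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /admin/
"""

def make_parser(robots_txt):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def parse_robots_meta(content):
    """Interpret the content attribute of a <meta name="robots"> element."""
    directives = {part.strip().lower() for part in content.split(",")}
    return {
        "index": "noindex" not in directives,    # may the page be indexed?
        "follow": "nofollow" not in directives,  # may its links be followed?
    }

parser = make_parser(ROBOTS_TXT)
print(parser.can_fetch("ExampleSpider", "http://shop.example/products/1.html"))
print(parser.can_fetch("ExampleSpider", "http://shop.example/checkout/cart"))
print(parse_robots_meta("noindex, follow"))
```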

Indexer

link structure should reach all the pages that should be indexed

non-text links (imagemaps etc.): robots may not be able to follow them -> also provide text links

frames: provide some navigational links to give a context, if the page is retrieved by a query

Search page

search forms are the user interface of the search engine

simple form: just a text field and a button

or an advanced search page: Boolean search, date ranges, subscopes...

Search results

the occurrences of the query terms are located from the index

the results are sorted according to their (assumed) relevance to the query

the results page should have the same look-and-feel as the other pages on the site

Why do searches fail?

empty searches: people just press the search button without entering any words

wrong scope: people think they are searching the entire web

vocabulary mismatch: terms are too specific, too general, or simply not used on the site

spelling mistakes

query requirements not met

Why do searches fail?

problems with query syntax: spaces, parentheses, etc.

capitalization and special characters: exact matches required

stopwords: some common words are not indexed

short words: short words are not indexed

numbers are not indexed
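
The effect of these indexing rules can be sketched as a term filter. The stopword list and the minimum word length below are assumptions chosen for the example, not the settings of any particular search tool. Note that this sketch lowercases everything; an indexer that does not do so would additionally require exact capitalization.

```python
import re

# Assumed settings, invented for this example.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}
MIN_LENGTH = 3

def index_terms(text):
    """Return the words of a text that would survive typical indexing rules."""
    terms = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in STOPWORDS:
            continue  # common words are not indexed
        if len(token) < MIN_LENGTH:
            continue  # short words are not indexed
        if token.isdigit():
            continue  # numbers are not indexed
        terms.append(token)
    return terms

print(index_terms("Vitamin C and the year 2000"))
print(index_terms("to be or not to be"))
```

With these rules a query such as "Vitamin C 2000" is effectively reduced to "vitamin", which may surprise the user.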

No-matches pages

pages shown to the user when the search does not return any matches

should have the same look-and-feel as the other pages + navigation aids + search again field

explanations why the search might have failed and what to do next

Some usability issues

web design: strong sense of structure and navigation support

some people do not like to search

people who search end up on some page: they should know where they are

people need to move around in the neighborhood

search should be available on every page

Some usability issues

scoped search: it is difficult for the users to understand what the scope is -> the scope should be stated clearly, and a search of the entire site must be easy to reach

Boolean search is difficult: 'cats and dogs' vs 'cats or dogs' -> 'or' could be used in the query, 'and' in the ordering
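
The idea of using 'or' for matching and 'and' for ordering can be sketched as follows: every page that contains at least one query word is returned, and pages that contain more of the query words are listed first. The pages and the function name `or_search_and_rank` are invented for this illustration.

```python
import re

def words(text):
    """Split a text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def or_search_and_rank(documents, query):
    """Match with OR, then order so that pages matching all words come first."""
    query_words = words(query)
    scored = []
    for doc_id, text in documents.items():
        matched = len(query_words & words(text))
        if matched > 0:  # OR: one matching word is enough
            scored.append((doc_id, matched))
    # AND in the ordering: more matched words first, ties broken by id.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in scored]

# Hypothetical pages, invented for this example.
pages = {
    "a.html": "All about cats",
    "b.html": "Cats and dogs living together",
    "c.html": "Dogs need food",
    "d.html": "Goldfish care",
}
print(or_search_and_rank(pages, "cats dogs"))
```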

Metadata

often a search results in a long list of matches; many of them may be irrelevant

metadata can make the queries more powerful

HTML meta elements

<head profile="http://www.acme.com/profiles/core">
  <title>How to complete memo cover sheets</title>
  <meta name="author" content="John Doe">
  <meta name="copyright" content="© 2000 Acme">
  <meta name="keywords" content="corporate, guidelines, cataloging">
  <meta name="date" content="2000-10-17">
</head>
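
An indexer could collect such meta elements with Python's standard `html.parser` module, roughly as sketched below. The class name `MetaCollector` and the helper `extract_meta` are made up for this example, and the sample page is a shortened copy of the head section above.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

def extract_meta(html):
    """Return a dict of the meta names and contents found in an HTML text."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta

PAGE = """<head profile="http://www.acme.com/profiles/core">
<title>How to complete memo cover sheets</title>
<meta name="author" content="John Doe">
<meta name="keywords" content="corporate, guidelines, cataloging">
<meta name="date" content="2000-10-17">
</head>"""

meta = extract_meta(PAGE)
print(meta["author"])
print([keyword.strip() for keyword in meta["keywords"].split(",")])
```

With the metadata in a dictionary, a query can be restricted to a field, for example to pages whose author is John Doe.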

Metadata

RDF (Resource Description Framework):
– Gives means to define metadata for XML and HTML documents
– Gives means to interchange it between different applications on the Web

Example: Dublin Core metadata
– Contains 15 elements (title, creator, date…)

Dublin Core

Dublin Core Metadata Elements:

Content: Title, Subject, Description, Language, Relation, Coverage

Intellectual Property: Creator, Publisher, Contributor, Rights

Instance: Date, Type, Format, Identifier

Dublin Core in RDF

<RDF:RDF>
  <RDF:Description RDF:HREF="URI">
    <DC:Relation>
      <RDF:Description>
        <DC:Relation.Type>isPartOf</DC:Relation.Type>
        <RDF:Value RDF:HREF="URI2"/>
      </RDF:Description>
    </DC:Relation>
  </RDF:Description>
</RDF:RDF>

Dublin Core represented in RDF

Searching XML documents

structure of XML documents can be used to make more precise queries, e.g. find Albert Einstein in Author element only

problem: how does the user specify the structure?

Searching XML documents

1) The user specifies the hierarchy in the query: Einstein in Author

2) The user makes a simple query, but the search engine presents the alternative contexts: Einstein can be in Author or in Street or in School
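
Both alternatives can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document and the function names `find_in_element` and `contexts_of` are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document, invented for this example.
XML = """<catalog>
  <Book>
    <Author>Albert Einstein</Author>
    <Title>Relativity</Title>
  </Book>
  <Address>
    <Street>Einstein Street 5</Street>
    <School>Einstein High School</School>
  </Address>
</catalog>"""

def find_in_element(root, term, tag):
    """Alternative 1: the user names the element, e.g. Einstein in Author."""
    return [element.text for element in root.iter(tag)
            if element.text and term.lower() in element.text.lower()]

def contexts_of(root, term):
    """Alternative 2: list the elements in which a plain query term occurs."""
    return sorted({element.tag for element in root.iter()
                   if element.text and term.lower() in element.text.lower()})

root = ET.fromstring(XML)
print(find_in_element(root, "Einstein", "Author"))
print(contexts_of(root, "Einstein"))
```

The list returned by `contexts_of` is what the search engine would present to the user as alternative contexts to choose from.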

Using links

good site: many links into the site, particularly from other good sites

text surrounding the link describes (probably) what the target of the link is about

the knowledge above + the contents of the page itself are taken into account

e.g. Google (www.google.com)
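
The intuition that a page is good if good pages link to it can be illustrated with a small iterative computation. This is a simplified link-analysis sketch in the spirit of that idea, not a description of how Google actually ranks pages; the damping value, the iteration count, and the tiny link graph are assumptions for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to its targets.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_scores[target] += damping * scores[page] / n
        scores = new_scores
    return scores

# Hypothetical link graph: page -> pages it links to.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_scores(links)
print(max(scores, key=scores.get))
```

In this graph page c collects links from three pages and ends up with the highest score; page a scores above b and d because its single incoming link comes from the well-linked page c.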

Natural language queries

e.g. Ask Jeeves

questions and answers prepared by human editors

user’s query is mapped to the prepared queries

Personalization

goal: the right people receive the right information at the right time

but: people do not like to state complex queries, or to initialize a service (like answering a questionnaire)

user profiles have to be generated and stored, preferably automatically

User profiles

may contain data like: interests, geographical area, age

could be collected once, and shared with many services

trust of the user: the profile should only be used to offer better service, and only if the user allows the service to use it

Recommendations

users who bought this book also bought these books / liked these CDs etc.

rating movies, TV programs, wines…

recommending paths on a site

Recommendations

based on the user’s former behavior and profile data

based on social (collaborative) filtering: what similar users liked

User’s former behavior

if used as the only source: the user never sees anything new

in particular, a new user gets hardly any recommendations

Collaborative filtering

draws on the experiences of a population or community of users

the profile information of the target user is compared to the profiles of nearest-neighbor users

look for correlation between users in terms of their ratings: recommend items that are included in the neighbors’ profiles but not in the target user’s profile
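
A minimal nearest-neighbor sketch of this scheme is given below. It assumes that a profile is simply a dictionary of item ratings, uses the Pearson correlation over commonly rated items as the similarity measure, and weights neighbors' ratings by similarity; the users, items, ratings, and the neighborhood size `k` are invented for the example.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def recommend(target, profiles, k=2):
    """Recommend items rated by the k nearest neighbors but not by the target."""
    target_ratings = profiles[target]
    neighbors = sorted(
        ((pearson(target_ratings, ratings), user)
         for user, ratings in profiles.items() if user != target),
        reverse=True,
    )[:k]
    scores = {}
    for similarity, user in neighbors:
        if similarity <= 0:
            continue  # ignore users whose taste does not correlate
        for item, rating in profiles[user].items():
            if item not in target_ratings:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical rating profiles (1 = disliked, 5 = liked).
profiles = {
    "ann": {"book1": 5, "book2": 1, "book3": 4},
    "bob": {"book1": 5, "book2": 2, "book3": 5, "book4": 5},
    "cid": {"book1": 1, "book2": 5, "book3": 2, "book5": 4},
    "dee": {"book1": 4, "book2": 1, "book3": 5, "book6": 3},
}
print(recommend("ann", profiles))
```

Here bob and dee rate like ann, so their extra items are recommended, while cid's opposite taste keeps book5 out of the list.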

Collaborative filtering

Problems:

cannot recommend new items (some users have to rate an item before it can be recommended)

an unusual user may not get (good) recommendations: no neighbors that are close enough

Matching engines

Apply one set of complex characteristics to another

e.g., recruiting sites: match a job seeker and a job

Data mining for e-commerce

users’ behavior on the web site provides a lot of information:

Which pages do the users view?

Which paths do the users navigate?

How long do the users spend on the site?

How often does viewing a product lead to purchasing it?

Data mining process

Gathering the data

Cleaning/preprocessing the data

Transforming the data

Analysis / finding general models

Interpreting the results

Using the knowledge

Data collection

clickstream logging: web server logs or packet sniffers

business event logging

Clickstream logging

web log: page requested, time of request, client IP address, etc.

a lot of requests are for images -> these have to be filtered out

users and user sessions are difficult to identify

requests for a page: the same page, but different dynamic content
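
The preprocessing of a web server log can be sketched as follows. The example assumes log lines in the widely used Common Log Format; the sample lines, the list of image suffixes, and the function name `page_requests` are invented for the illustration.

```python
import re

# Assumes Common Log Format: client, identity, user, [time], "request", status, size.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")

def page_requests(lines):
    """Parse log lines and drop requests for images."""
    requests = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        path = match.group("path")
        if path.split("?", 1)[0].lower().endswith(IMAGE_SUFFIXES):
            continue  # image requests say little about user behavior
        requests.append((match.group("client"), match.group("time"), path))
    return requests

# Hypothetical log lines, invented for this example.
LOG = [
    '192.0.2.1 - - [17/Oct/2000:13:55:36 +0200] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [17/Oct/2000:13:55:37 +0200] "GET /img/logo.gif HTTP/1.0" 200 512',
    '192.0.2.7 - - [17/Oct/2000:13:56:02 +0200] "GET /product?id=42 HTTP/1.0" 200 1840',
]
for client, time, path in page_requests(LOG):
    print(client, time, path)
```

The sketch does not attempt to identify users or sessions, which, as noted above, is the hard part when only the server log is available.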

Clickstream logging

more efficient at the application server layer

knowledge about products, not just about pages

user and session tracking is possible

information absent from web server logs can also be tracked: e.g. pages that were aborted while being downloaded

Business event logging

looking at subsets of requests as one logical event or episode:

add/remove item to/from shopping cart

initiate/finish checkout

search (log keywords and number of results)

register

From order data to customers

collected data is order-oriented

data for each customer is spread into many records

information on customers is the real target

information for each customer has to be aggregated

From order data to customers

What percentage of each customer’s orders used a VISA credit card?

How much money does each customer spend on books?

What is the frequency of each customer’s purchases?
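
Turning order-oriented records into one row per customer can be sketched as a simple aggregation. The record fields (`customer`, `category`, `amount`, `payment`) and the sample orders are assumptions made up for this example; purchase frequency is approximated here by the number of orders, and a real analysis would relate that count to a time period.

```python
from collections import defaultdict

# Hypothetical order records, invented for this example.
ORDERS = [
    {"customer": "c1", "category": "books", "amount": 20.0, "payment": "VISA"},
    {"customer": "c1", "category": "music", "amount": 15.0, "payment": "MasterCard"},
    {"customer": "c1", "category": "books", "amount": 30.0, "payment": "VISA"},
    {"customer": "c1", "category": "books", "amount": 10.0, "payment": "MasterCard"},
    {"customer": "c2", "category": "books", "amount": 12.5, "payment": "MasterCard"},
]

def customer_summary(orders):
    """Aggregate order records into one summary per customer."""
    grouped = defaultdict(list)
    for order in orders:
        grouped[order["customer"]].append(order)
    summary = {}
    for customer, rows in grouped.items():
        visa_orders = sum(1 for row in rows if row["payment"] == "VISA")
        summary[customer] = {
            "orders": len(rows),
            "visa_share": 100.0 * visa_orders / len(rows),  # percent of orders
            "book_spend": sum(row["amount"] for row in rows
                              if row["category"] == "books"),
        }
    return summary

summary = customer_summary(ORDERS)
print(summary["c1"])
print(summary["c2"])
```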

Model generation

Answer questions like:

What characterizes heavy spenders?

What characterizes customers that prefer promotion X over Y?

What characterizes customers that buy quickly?

What characterizes visitors that do not buy?

Data mining tools

e.g., classification rules

IF Income > $80,000 AND Age <= 30 AND Average Session Duration is between 10 AND 20 minutes THEN Heavy spender
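
Once such a rule has been found, applying it to a visitor is straightforward. The sketch below encodes the rule above as a Python function; the function and parameter names are made up, and "between 10 and 20 minutes" is assumed to include both endpoints.

```python
def is_heavy_spender(income, age, avg_session_minutes):
    """Apply the example classification rule to one customer."""
    return (
        income > 80_000
        and age <= 30
        and 10 <= avg_session_minutes <= 20  # assumed to be inclusive
    )

print(is_heavy_spender(income=95_000, age=28, avg_session_minutes=12))
print(is_heavy_spender(income=95_000, age=45, avg_session_minutes=12))
```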

Understanding the results

result of a data mining process may be difficult for a business user to understand: e.g. thousands of rules

visualization is important

tailored for a specific domain

Using the results

site structure can be updated

procedures like registering or checking out can be simplified

metadata can be added to make search more efficient

personalization rules, recommendation systems
