Post on 02-Jan-2016

Web Content Development

Dr. Komlodi

Classes 20-21: Search systems

Web Searching

• Search within your site:
  – Full site or subsites
  – www.jhu.edu, www.umbc.edu

• Web search:
  – Search indexes of web pages
  – www.google.com

• Metasearch:
  – Searching across multiple search engines
  – clusty.com, www.dogpile.com, www.myriadsearch.com

• Web Search Engine Watch: http://searchenginewatch.com/

Does your site need a search?

• Pp. 145-148:
  1. Sufficient content
  2. Sufficient resources
  3. Time and know-how to optimize the system
  4. Better alternatives?
  5. Will users bother with it?
  6. Too much information to browse
  7. Fragmented site
  8. Learning tool
  9. User expectations
  10. Dynamism

• Post bullets on Blackboard discussion board

Why should an IA worry about search?

• You know the users

• Many decisions should be user-centered and not technology-centered

• Search has a user interface

How does search work?

[Image: Google's PigeonRank page, ©2004 Google. Source: http://www.google.com/technology/pigeonrank.html]

How does search work?

Searchers → Search Interface → Queries

Documents → Indexing (manual or automatic) → Indexes

Search Engine: Matching (queries against indexes) → Results

How does web search work?

Searchers → Search Interface → Queries

Documents = web sites & pages

Indexing = automatic: spiders & robots crawl websites and index pages according to their own rules. As a result, they build large databases containing the indexes.

Indexes

Search Engine: Matching = queries are matched against the search engine's indexes

Results
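To make the crawling step concrete, here is a minimal Python sketch of a spider. It assumes a hypothetical in-memory set of pages (the `WEB` dictionary) instead of live HTTP fetching. It follows links breadth-first from the home page and hands each page it reaches to the indexer, so a page that nothing links to is never found.

```python
from collections import deque

# Hypothetical in-memory "web" (an assumption for this sketch): each URL maps to
# (page text, outgoing links). A real spider would fetch pages over HTTP and parse links.
WEB = {
    "/": ("Welcome to the campus site", ["/admissions", "/library"]),
    "/admissions": ("Apply for admission", ["/", "/tuition"]),
    "/library": ("Library hours and catalog search", ["/", "/missing"]),
    "/tuition": ("Tuition and fees", []),
    "/orphan": ("No other page links here", []),
}

def crawl(start, web, max_pages=100):
    """Breadth-first crawl from `start`; return {url: text} for every page reached."""
    seen = {start}
    queue = deque([start])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url not in web:  # broken link: nothing to fetch or index
            continue
        text, links = web[url]
        pages[url] = text  # this is where the page would be handed to the indexer
        for link in links:
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return pages

pages = crawl("/", WEB)
print(sorted(pages))  # ['/', '/admissions', '/library', '/tuition']
```

The `/orphan` page is never indexed because no crawled page links to it; the broken `/missing` link is simply skipped.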

What to search on…

• All the content?
• Determining search zones
• Site search:
  – Subsite
  – Type of document

• Web search:
  – Multimedia and heterogeneous

• Full-text or metadata
• Types of indexes

Indexing by

• Navigation vs. destination pages

• Audience or reading level

• Topic

• Date of update

• Author

• Title

• User task
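A minimal sketch of indexing by metadata fields, assuming hypothetical documents that already carry author, topic, and audience values: each (field, value) pair points to the set of documents that have it, and a fielded query intersects those sets.

```python
from collections import defaultdict

# Hypothetical documents described by metadata fields (an assumption for illustration).
DOCS = {
    1: {"title": "Course catalog", "author": "Registrar", "topic": "courses", "audience": "students"},
    2: {"title": "Faculty handbook", "author": "Provost", "topic": "policy", "audience": "faculty"},
    3: {"title": "Grading policy", "author": "Registrar", "topic": "policy", "audience": "students"},
}

def build_field_index(docs, fields):
    """Map each (field, lowercased value) pair to the set of document ids carrying it."""
    index = defaultdict(set)
    for doc_id, meta in docs.items():
        for field in fields:
            index[(field, meta[field].lower())].add(doc_id)
    return index

def fielded_search(index, **criteria):
    """Return ids of documents matching every field=value criterion."""
    result = None
    for field, value in criteria.items():
        ids = index.get((field, value.lower()), set())
        # Copy on first use so the caller can never mutate the index itself.
        result = set(ids) if result is None else result & ids
    return result or set()

index = build_field_index(DOCS, ["author", "topic", "audience"])
print(fielded_search(index, topic="policy", audience="students"))  # {3}
```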

What would the index look like?

Full Text Indexing

• Remove frequent words (stopwords) from the documents

• List the remaining words from each document

• Optionally record how often each word occurs

• Search the lists of words
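The four steps above can be sketched in Python as an inverted index. The stopword list and the sample documents are illustrative assumptions, not a standard list.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption); real systems use longer, language-specific lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each non-stopword to {doc_id: number of times it occurs in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Steps 1-3: drop stopwords, list the remaining words, count them.
        counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
        for word, freq in counts.items():
            index[word][doc_id] = freq
    return dict(index)

def search(index, word):
    """Step 4: look one query word up; return doc ids, most frequent first."""
    postings = index.get(word.lower(), {})
    return sorted(postings, key=postings.get, reverse=True)

docs = {
    "d1": "The history of the library and the library catalog",
    "d2": "Search the catalog for course reserves",
}
index = build_inverted_index(docs)
print(index["library"])          # {'d1': 2}
print(search(index, "catalog"))  # ['d1', 'd2']
```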

What would the index look like?

Indexing Languages

• An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.

• An Indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.
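As a small illustration of an indexing language, the sketch below assumes a hypothetical controlled vocabulary that maps entry terms (the words people actually write) to one preferred term. Documents described in different words then end up under the same index entry. The medical terms are chosen only for illustration.

```python
# Hypothetical controlled vocabulary (an assumption for illustration): entry terms
# that users or authors might write, each mapped to one preferred indexing term.
PREFERRED = {
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "high blood pressure": "hypertension",
    "hypertension": "hypertension",
}

def normalize_term(term):
    """Return the preferred term for `term`, or None if the vocabulary lacks it."""
    return PREFERRED.get(term.strip().lower())

def index_with_vocabulary(docs):
    """Index each document under the preferred form of each of its subject terms."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            preferred = normalize_term(term)
            if preferred is not None:  # terms outside the vocabulary are not indexed
                index.setdefault(preferred, set()).add(doc_id)
    return index

docs = {"d1": ["Heart attack"], "d2": ["myocardial infarction"], "d3": ["high blood pressure"]}
index = index_with_vocabulary(docs)
print(sorted(index["myocardial infarction"]))  # ['d1', 'd2']
```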

Web Search Engine Indexes

• The larger a web search engine’s index is, the more web pages it can return and the more types of queries it can accommodate

• However, quantity is just one measure of performance

• How to compare: http://www.google.com/help/indexsize.html

• Try this!

Search Interface

• Shneiderman, Byrd, and Croft, "Clarifying Search," D-Lib Magazine, 1997

• Formulation:
  – Sources
  – Fields
  – What to search for
  – Variants

• Action
• Review of results
• Refinement
• Let’s see Google’s advanced search

Query Format

• Boolean:
  – Good for advanced users
  – Precise, and it is clear why you got the results back
  – Need to understand the syntax

• “Natural Language”:
  – Good for difficult questions when you can’t think of terms
  – Or for novice users
  – Difficult to know why certain results come back
  – Black box

• Relevance Feedback:
  – User selects relevant items from the results
  – Search engine considers these in reformulating the query

• Similarity Retrieval:
  – Similar to relevance feedback
  – “I want more like this”
  – Both are good if you don’t know exactly what you are looking for
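Relevance feedback can be implemented in several ways. One classic technique, not named on the slide, is Rocchio's formula: it moves the query vector toward the documents the user marked relevant and away from those marked non-relevant. A minimal sketch, assuming queries and documents are already term-weight dictionaries and using commonly cited default weights for alpha, beta, and gamma:

```python
from collections import Counter

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reformulate a query vector (term -> weight) from the user's relevance judgments."""
    new_query = Counter()
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant:  # move toward the centroid of relevant documents
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant)
    for doc in nonrelevant:  # move away from the centroid of non-relevant documents
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(nonrelevant)
    # Terms whose weight is no longer positive are dropped from the new query.
    return {term: w for term, w in new_query.items() if w > 0}

# The user searched "jaguar", marked a car page relevant and an animal page non-relevant.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "car": 1.0}]
nonrelevant = [{"jaguar": 1.0, "cat": 1.0}]
print(rocchio(query, relevant, nonrelevant))  # jaguar ~1.6, car 0.75; "cat" is dropped
```

The reformulated query now contains "car", which the user never typed; this is how feedback helps when you don't know exactly which terms to use.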

[Screenshot examples: Boolean; Natural Language; Relevance Feedback (source: http://nayana.ece.ucsb.edu/imsearch/imsearch.html, accessed January 2007); Similarity Retrieval]

Other Query Building Tools

• Citation networks:
  – Which pages/papers does this page/paper cite or link to?
  – Which pages/papers cite or link to this page/paper?
  – What other papers/pages cite/link to the same papers/pages?
  – http://portal.acm.org/dl.cfm

• Spell checkers in queries
• Phonetic tools
• Stemming tools
• Controlled vocabularies
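A query spell checker often suggests vocabulary words within a small edit distance of what was typed. The sketch below uses Levenshtein distance against a hypothetical site vocabulary; production spell checkers also weigh how common each candidate is.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def suggest(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance edits of `word`, closest first."""
    scored = sorted((edit_distance(word.lower(), v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance]

# Hypothetical vocabulary built from the site's own index terms.
VOCAB = ["library", "literary", "liberty", "catalog"]
print(suggest("libary", VOCAB))  # ['library', 'liberty']
```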

Matching

• Boolean – AND, OR, NOT

• Probabilistic

• Vector model (calculate weights of words)

• Natural Language – process the query as well, match lists
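A compact sketch of two of these matching models on three hypothetical documents: Boolean matching as set operations over posting lists, and the vector model using tf-idf weights (one common weighting choice, assumed here) with cosine similarity for ranking.

```python
import math
import re
from collections import Counter

# Three hypothetical documents.
DOCS = {
    "d1": "library catalog search",
    "d2": "library hours and library events",
    "d3": "course catalog and course schedule",
}

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

# Posting lists: term -> set of documents that contain it.
postings = {}
for doc_id, text in DOCS.items():
    for term in set(tokens(text)):
        postings.setdefault(term, set()).add(doc_id)

def docs_with(term):
    return postings.get(term, set())

# Boolean matching is set algebra: AND = intersection, OR = union, NOT = difference.
print(sorted(docs_with("library") & docs_with("catalog")))  # ['d1']  library AND catalog
print(sorted(docs_with("hours") | docs_with("schedule")))   # ['d2', 'd3']  hours OR schedule
print(sorted(docs_with("catalog") - docs_with("library")))  # ['d3']  catalog AND NOT library

# Vector model: weight = term frequency * log(N / document frequency).
N = len(DOCS)

def tfidf(words):
    counts = Counter(words)
    return {t: c * math.log(N / len(postings[t])) for t, c in counts.items() if t in postings}

doc_vectors = {d: tfidf(tokens(text)) for d, text in DOCS.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0

def rank(query):
    """Return ids of documents with a positive cosine score, best match first."""
    q = tfidf(tokens(query))
    scores = {d: cosine(q, vec) for d, vec in doc_vectors.items()}
    return sorted((d for d in scores if scores[d] > 0), key=scores.get, reverse=True)

print(rank("library catalog"))  # ['d1', 'd2', 'd3']
```

Boolean matching only says whether a document matches; the vector model also orders partial matches, which is why d2 and d3 still appear below d1.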

Results Presentation

• How many?

• How much information about each item?

• What can users do with each item?

• Presenting results by categories
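The "how many?" and "by categories" questions can be sketched as paging plus grouping. The page size of 10 and the `category` field on each result are assumptions for illustration.

```python
def paginate(results, page, per_page=10):
    """Return the slice of results shown on a 1-based page number."""
    start = (page - 1) * per_page
    return results[start:start + per_page]

def results_header(total, page, per_page=10):
    """Tell users how many results there are and which ones they are looking at."""
    if total == 0:
        return "No results"
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total)
    return f"Results {first}-{last} of {total}"

def group_by_category(results):
    """Group result items by their (assumed) 'category' field, keeping rank order."""
    groups = {}
    for item in results:
        groups.setdefault(item["category"], []).append(item)
    return groups

# 25 hypothetical ranked results, alternating between two categories.
results = [{"title": f"Result {i}", "category": "News" if i % 2 else "Events"}
           for i in range(1, 26)]
print(results_header(len(results), page=3))  # Results 21-25 of 25
print({c: len(items) for c, items in group_by_category(results).items()})  # {'News': 13, 'Events': 12}
```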

Evaluation of Search Engines

Your book is wrong on page 159!!!

$$\text{Recall} = \frac{\text{relevant retrieved documents}}{\text{all relevant documents in the collection}}$$

$$\text{Precision} = \frac{\text{relevant retrieved documents}}{\text{all retrieved documents}}$$

Copyright Dr. David Grossman, Source: http://ir.iit.edu/~dagr/cs529/files/handouts/01Introduction-6per.PDF
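A worked example of the two formulas, with hypothetical document ids: the engine retrieves d1 to d8, and the relevant documents in the collection are d3 to d12, so 6 of the 8 retrieved documents are relevant and 6 of the 10 relevant documents were found.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant retrieved documents
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = [f"d{i}" for i in range(1, 9)]  # d1..d8 returned by the engine
relevant = [f"d{i}" for i in range(3, 13)]  # d3..d12 relevant in the collection
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.75 0.6
```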

Within-Site Search Bloopers 1

1. Baffling search controls. Search options require knowledge of computer or industry-insider concepts.

2. Dueling search controls. Competing search boxes on page, with no guidance.

3. Hits look alike. List of found items cannot be easily distinguished by scanning.

4. Duplicate hits. List of found items contains duplicates.

5. Search myopia: Missing relevant items. Items that should be found are not.

http://www.web-bloopers.com/

Within-Site Search Bloopers 2

6. Needle in a haystack: Piles of irrelevant hits. Many items don’t match search criteria.

7. Hits sorted uselessly. Sort-order of found items doesn’t support user tasks.

8. Crazy search behavior. Modifying search criteria yields unexpected results.

9. Search-terms not shown. Not showing what search terms produced these results.

10. Number of hits not revealed. Not showing how many items were found.

http://www.web-bloopers.com/

Search User Interface Design Recommendations 1

• Put a simple, reasonably long search field on every page of the site. (Nielsen: min. 27 characters long)

• Use simple words to explain the process: remove all jargon and technical terms, and make sure that any icons have labels.

• Avoid inventing a new interface, which will confuse users: take the best of the formats of the large public search engines

• Make the search forms and results pages fit into the overall design of the web site: they should use the same colors, fonts and so on.

http://www.searchtools.com/info/user-interface.html

Search User Interface Design Recommendations 2

• Include site names and navigation links into results pages, so users can see the context and structure of the site.

• Set up a special page to be displayed when the search does not find any matches in the index

• Avoid surprises: clarify all automated search features, such as stemming, phonetic matching, thesaurus lookups and stopwords

http://www.searchtools.com/info/user-interface.html

How Search Should Work (PWU Ch5)

• Follow the standards of the large search engines:
  – Search box (min. 27 characters) and a button in the top right corner of the page
  – Search box on every page
  – Linear results in order of relevance

• Users expect search to be a keyword search and not other types of searches (by types of clothing, size, season, etc.)

• Advanced search should be a secondary option or omitted

• Scoped search is useful if your site has distinct sections
• Do not default the search to a scope

Search Engine Results Pages (PWU Ch5)

• Copy the design of major search engines
• List results in relevance order, but there is no need to show the measure of relevance
• If appropriate, allow users to re-sort results
• Each result should start with a clickable headline
• Follow the headline with a 2-3-line summary
• Include a search box with the user’s query in it to make query reformulation easier
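A rough sketch of the headline-plus-summary layout. It assumes the summary starts at the first sentence that contains a query term and is wrapped to at most three lines; this is a simple heuristic for illustration, not how any particular engine builds its summaries. The URL in the usage line is hypothetical.

```python
import re
import textwrap

def make_snippet(text, query_terms, width=60, max_lines=3):
    """Summary lines starting at the first sentence that mentions a query term."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    terms = {t.lower() for t in query_terms}
    start = next(
        (i for i, s in enumerate(sentences) if terms & set(re.findall(r"[a-z0-9]+", s.lower()))),
        0,  # no sentence mentions a term: fall back to the beginning of the page
    )
    summary = " ".join(sentences[start:])
    return textwrap.wrap(summary, width=width, max_lines=max_lines, placeholder=" ...")

def format_result(title, url, text, query_terms):
    """Headline (title and link target) followed by a short summary."""
    return "\n".join([f"{title} <{url}>"] + make_snippet(text, query_terms))

page_text = ("The library opens at 8am. Reserves for every course are listed in the catalog. "
             "Ask at the front desk if an item is missing.")
print(format_result("Course reserves", "http://www.example.edu/reserves", page_text, ["catalog"]))
```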

Design of No-Matches Pages

• Site Context and Navigation
  – Instead of a bare page saying that the search failed, show the standard site layout, including background colors, logos, text and link colors, and navigation links.
  – If you have a site map or Yahoo-style directory for your site, include it in the no-matches page; otherwise you may want a statement of the site scope. That provides a positive way to help people understand what is available, and to browse if they choose.

• Search Again Field
  – Make sure there is a Search field, so people can try a different search. Don't make them click a link or otherwise take an extra step to search again.

• Suggested Wording
  – Include some text that explains why the search might have failed, and what people can do next. Word it to be positive and helpful, rather than blaming the user for the search failure. For example: “Your search returned no results. Try broadening your search (from heart attack to heart disease) or adding additional terms (from high blood pressure to high blood pressure or hypertension).”

http://www.searchtools.com/guide/nomatches.html

Search UI Design Exercise

• Work in pairs

• Select an imaginary website

• Design on paper:
  – A homepage with a search box
  – A search results page
  – A no-hit page

Search Engine Optimization

How do search engines find you?

• Search engine optimization:
  – Changing your site to improve the site’s ranking in search results

• Search engine submission:
  – Submitting your site to search engines to make sure the engines know about it

• Search engine marketing/promotion: the combination of submission (free or paid) and search engine optimization

• http://blog.searchenginewatch.com/090402-110851 (From 2:12)

Search Engine Submission

• Yahoo’s human-compiled directory listings (http://help.yahoo.com/l/us/yahoo/ysm/ds/index.html):
  – Crawlers look at those pages
  – Free for normal review
  – $299 for expedited review and commercial listing (no guarantee of listing)

• Google:
  – Free but not guaranteed (http://www.google.com/addurl/?continue=/addurl)
  – Or use AdWords for payment (http://www.google.com/ads/)

• Yahoo ads submission:
  – Yahoo sponsored search (http://searchmarketing.yahoo.com/arp/sponsoredsearch_ss.php?o=US1806&cmp=SYC&ctv=&s=Y&s2=S&b=25)
  – Pay by the number of clicks

Search Engine Optimization

• Linguistic SEO:
  – Research what words users use for your content:
    • Search engine logs, user testing, support calls, discussion forums
  – Use those words to describe your content on your pages and in the metadata

• Architectural SEO:
  – Make sure your important content is text
  – Make sure your linking structure leads search engine indexing crawlers to important content

• Reputation SEO:
  – Make sure other sites link to you
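For linguistic SEO, a first pass over a site-search log can simply count the words people type. The log format (one query per line) and the tiny stopword set are assumptions for this sketch.

```python
from collections import Counter

# Hypothetical site-search log: one raw query per line.
LOG = """\
tuition
Tuition fees
financial aid
tuition payment
parking permit
financial aid deadline
"""

STOP = {"and", "the", "of", "for"}  # illustrative stopwords only

def top_terms(log_text, n=3):
    """Count lowercased query words across the log; return the n most common."""
    counts = Counter(
        word
        for line in log_text.splitlines()
        for word in line.lower().split()
        if word not in STOP
    )
    return counts.most_common(n)

print(top_terms(LOG))  # [('tuition', 3), ('financial', 2), ('aid', 2)]
```

Words that top the list are candidates for page titles, headings, and metadata.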

Search Engine Optimization

• Study your guideline
• Create a few bullet points to describe your guideline and post them on the discussion board
• Sources:
  – http://searchenginewatch.com/webmasters/article.php/2168021
  – http://searchenginewatch.com/webmasters/article.php/2167931