Post on 02-Jan-2016
Web Content Development
Dr. Komlodi
Classes 20-21: Search systems
Web Searching
• Search within your site:
  – Full site or subsites
  – www.jhu.edu, www.umbc.edu
• Web search:
  – Search indexes of web pages
  – www.google.com
• Metasearch:
  – Searching across multiple search engines
  – clusty.com, www.dogpile.com, www.myriadsearch.com
• Web Search Engine Watch: http://searchenginewatch.com/
Does your site need a search?
• Pp. 145-148
  1. Sufficient content
  2. Sufficient resources
  3. Time and know-how to optimize system
  4. Better alternatives?
  5. Will users bother with it?
  6. Too much information to browse
  7. Fragmented site
  8. Learning tool
  9. User expectations
  10. Dynamism
• Post bullets on Blackboard discussion board
Why should an IA worry about search?
• You know the users
• Many decisions should be user-centered and not technology-centered
• It has an interface
How does search work?
©2004 Google Source: http://www.google.com/technology/pigeonrank.html
How does search work?
Searchers
Search Interface
Queries
Documents
Indexing (manual or automatic)
Indexes
Search Engine
Matching
Results
How does web search work?
Searchers
Search Interface
Queries
Documents = Web sites & pages
Indexing = Automatic: spiders and robots crawl websites and index pages according to their own rules. As a result, they build large databases containing the indexes.
Indexes
Search Engine
Matching = queries to search engine indexes
Results
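The crawling step described above can be sketched in a few lines. This is a minimal illustration, not how any particular search engine works: the `SITE` dictionary is a hypothetical in-memory website (so the example runs without network access), and `LinkParser` and `crawl` are names invented for this sketch. A real spider would also fetch pages over HTTP and respect each site's crawling rules.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect href values from <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


# Hypothetical site held in memory; keys are URLs, values are page HTML.
SITE = {
    "/": '<a href="/about">About</a> <a href="/courses">Courses</a> Welcome',
    "/about": '<a href="/">Home</a> About the department',
    "/courses": '<a href="/about">About</a> Web content development',
    "/orphan": "No page links here, so the crawler never finds it",
}


def crawl(start):
    """Breadth-first crawl from start; return {url: page text}."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(SITE[url])
        # Collapse runs of whitespace so the stored text is tidy.
        pages[url] = " ".join(" ".join(parser.text).split())
        for link in parser.links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


pages = crawl("/")
print(sorted(pages))
```

Because the crawler only follows links, the `/orphan` page is never collected and so could never appear in the index.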
What to search on…
• All the content?
• Determining search zones
• Site search:
  – Subsite
  – Type of document
• Web search:
  – Multimedia and heterogeneous
• Full-text or metadata
• Types of indexes
Indexing by
• Navigation vs. destination pages
• Audience or Reading level
• Topic
• Date of update
• Author
• Title
• User task
What would the index look like?
Full Text Indexing
• Take out frequent words from documents
• List the rest of the words from each document
• May add frequency numbers to each word
• Search the lists of words
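The four steps above can be sketched as a small full-text index. This is an illustration under stated assumptions: the `STOPWORDS` list is a tiny assumed sample (real lists are much longer), and `index_document`, `build_index`, and `search` are names made up for this example.

```python
import re
from collections import Counter

# Assumed list of frequent words to take out.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def index_document(text):
    """Return {word: frequency} for one document, frequent words removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def build_index(docs):
    """Map each word to {doc_id: frequency} across all documents."""
    index = {}
    for doc_id, text in docs.items():
        for word, freq in index_document(text).items():
            index.setdefault(word, {})[doc_id] = freq
    return index


def search(index, word):
    """Look up one word; return the ids of documents that contain it."""
    return sorted(index.get(word.lower(), {}))


docs = {
    "page1": "The history of the university library",
    "page2": "Library hours and library locations",
}
index = build_index(docs)
print(index["library"])        # frequency of "library" in each page
print(search(index, "hours"))  # pages containing "hours"
```

Searching then never touches the original documents: the query word is looked up in the word lists, and the stored frequencies are available for ranking.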
What would the index look like?
Indexing Languages
• An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.
• An Indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.
Web Search Engine Indexes
• The larger a web search engine’s index is, the more web pages it can return and the more types of queries it can accommodate
• However, quantity is just one measure of performance
• How to compare:
• http://www.google.com/help/indexsize.html
• Try this!
Search Interface
• Shneiderman, Byrd, and Croft, "Clarifying Search," D-Lib Magazine, 1997
• Formulation:
  – Sources
  – Fields
  – What to search for
  – Variants
• Action
• Review of results
• Refinement
• Let’s see Google’s advanced search
Query Format
• Boolean:
  – Good for advanced users
  – Precise and clear why you got results back
  – Need to understand syntax
• “Natural Language”
  – Good for difficult questions when you can’t think of terms
  – Or novice users
  – Difficult to know why certain results come back
  – Black box
• Relevance Feedback
  – User selects relevant items from results
  – Search engine considers these in reformulating the query
• Similarity Retrieval
  – Similar to relevance feedback
  – “I want more like this”
  – Both are good if you don’t know exactly what you are looking for
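One simple way to picture relevance feedback is query expansion: terms that occur often in the items the user marked as relevant are added to the original query. The sketch below is only one possible approach, not how any specific engine does it; `expand_query`, the stopword list, and the sample documents are all invented for illustration.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "for", "in", "to"}  # assumed list


def expand_query(query_terms, relevant_docs, extra_terms=2):
    """Add the most frequent new terms from user-selected documents."""
    counts = Counter()
    for text in relevant_docs:
        for word in text.lower().split():
            if word not in STOPWORDS and word not in query_terms:
                counts[word] += 1
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return list(query_terms) + ranked[:extra_terms]


# The user searched for "jaguar" and marked these two results as relevant.
selected = [
    "jaguar habitat in the rainforest",
    "rainforest cats and jaguar habitat loss",
]
print(expand_query(["jaguar"], selected))
```

The reformulated query leans toward the animal sense of "jaguar" without the user having to think of the extra terms.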
Examples (screenshots not reproduced): Boolean, Natural Language, Relevance Feedback, Similarity Retrieval
Source: http://nayana.ece.ucsb.edu/imsearch/imsearch.html Accessed January 2007.
Other Query Building Tools
• Citation networks:
  – This page/paper is citing/linking to?
  – This page/paper is cited by/linked to?
  – What other papers/pages cite/link to the same papers/pages?
  – http://portal.acm.org/dl.cfm
• Spell checkers in queries
• Phonetic tools
• Stemming tools
• Controlled vocabularies
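Two of the tools listed above, spell checking and stemming, can be illustrated with a short sketch. It uses `difflib.get_close_matches` from the Python standard library for spelling suggestions; the `VOCABULARY` list is an assumed sample of indexed terms, and `naive_stem` is a deliberately crude suffix stripper written for this example, not a real stemming algorithm.

```python
from difflib import get_close_matches

# Assumed vocabulary: in practice this would be the terms in the site index.
VOCABULARY = ["library", "liberty", "laboratory", "search", "index"]


def suggest(word):
    """Return the closest vocabulary term, or None if nothing is similar."""
    matches = get_close_matches(word.lower(), VOCABULARY, n=1)
    return matches[0] if matches else None


def naive_stem(word):
    """Strip a few common English suffixes (crude, for illustration only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


print(suggest("libary"))        # "Did you mean ...?" candidate
print(naive_stem("searching"))  # query and index terms reduced to one form
print(naive_stem("searches"))
```

Both tools change what the user typed before matching, which is why search interfaces should tell users when they are applied.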
Matching
• Boolean – AND, OR, NOT
• Probabilistic
• Vector model (calculate weights of words)
• Natural Language – process the query as well, match lists
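Boolean and vector matching can be contrasted in a small sketch. The three documents are made up, and the vector part uses raw term counts as weights, which is an assumption chosen for brevity; production systems typically use more refined weighting. Boolean matching returns an unordered set of documents, while the vector model produces a ranking.

```python
import math
from collections import Counter

docs = {
    "d1": "web search engine index",
    "d2": "site search interface design",
    "d3": "web site navigation design",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}


def containing(term):
    """Set of document ids whose text contains the term."""
    return {doc_id for doc_id, words in tokens.items() if term in words}


# Boolean matching maps directly onto set operations.
print(sorted(containing("web") & containing("search")))  # web AND search
print(sorted(containing("web") | containing("search")))  # web OR search
print(sorted(containing("site") - containing("web")))    # site NOT web


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query):
    """Order all documents by similarity to the query, best first."""
    q = Counter(query.split())
    scores = {d: cosine(q, Counter(words)) for d, words in tokens.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


print(rank("web search"))
```

Here `d1` ranks first for "web search" because it contains both query words, while the Boolean AND would have returned `d1` alone and given the user nothing else to look at.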
Results Presentation
• How many?
• How much information about each item?
• What can users do with each item?
• Presenting results by categories
Evaluation of Search Engines
Your book is wrong on page 159!!!
Recall = Relevant retrieved documents / All relevant documents in collection
Precision = Relevant retrieved documents / All retrieved documents
Copyright Dr. David Grossman, Source: http://ir.iit.edu/~dagr/cs529/files/handouts/01Introduction-6per.PDF
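As a worked example of the two measures, the sketch below computes both from a list of retrieved documents and a list of relevant ones. The function name and the document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant retrieved documents
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# The engine returned 4 documents; the collection holds 5 relevant ones.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d7", "d8", "d9"],
)
print(p, r)
```

Two of the four retrieved documents are relevant, so precision is 2/4 = 0.5; two of the five relevant documents were found, so recall is 2/5 = 0.4.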
Within-Site Search Bloopers 1
1. Baffling search controls. Search options require knowledge of computer or industry-insider concepts.
2. Dueling search controls. Competing search boxes on page, with no guidance.
3. Hits look alike. List of found items cannot be easily distinguished by scanning.
4. Duplicate hits. List of found items contains duplicates.
5. Search myopia: Missing relevant items. Items that should be found are not.
http://www.web-bloopers.com/
Within-Site Search Bloopers 2
6. Needle in a haystack: Piles of irrelevant hits. Many items don’t match search criteria.
7. Hits sorted uselessly. Sort-order of found items doesn’t support user tasks.
8. Crazy search behavior. Modifying search criteria yields unexpected results.
9. Search-terms not shown. Not showing what search terms produced these results.
10. Number of hits not revealed. Not showing how many items were found.
http://www.web-bloopers.com/
Search User Interface Design Recommendations 1
• Put a simple, reasonably long search field on every page of the site. (Nielsen: min. 27 characters long)
• Use simple words to explain the process: remove all jargon and technical terms, and make sure that any icons have labels.
• Avoid inventing a new interface, which will confuse users: take the best of the formats of the large public search engines
• Make the search forms and results pages fit into the overall design of the web site: they should use the same colors, fonts and so on.
http://www.searchtools.com/info/user-interface.html
Search User Interface Design Recommendations 2
• Include site names and navigation links into results pages, so users can see the context and structure of the site.
• Set up a special page to be displayed when the search does not find any matches in the index
• Avoid surprises: clarify all automated search features, such as stemming, phonetic matching, thesaurus lookups and stopwords
http://www.searchtools.com/info/user-interface.html
How Search Should Work (PWU Ch5)
• Follow the standards of the large search engines:
  – Search box (min. 27 characters) and a button in the top right corner of the page
  – Search box on every page
  – Linear results in order of relevance
• Users expect search to be a keyword search and not other types of searches (by types of clothing, size, season, etc.)
• Advanced search should be a secondary option or omitted
• Scoped search is useful if your site has distinct sections
• Do not default the search to a scope
Search Engine Results Pages (PWU Ch5)
• Copy the design of major search engines
• List results in relevance order but no need to show measure of relevance
• If appropriate, allow users to re-sort results
• Each result should start with a clickable headline
• Follow headline by 2-3-line summary
• Include a search box with the user’s query in it to make query reformulation easier
Design of No-Matches Pages
• Site Context and Navigation
  – Instead of a bare page saying that the search failed, show the standard site layout, including background colors, logos, text and link colors, and navigation links.
  – If you have a site map or Yahoo-style directory for your site, include it in the no-matches page; otherwise you may want a statement of the site scope. That provides a positive way to help people understand what is available, and browse if they choose.
• Search Again Field
  – Make sure there is a Search field, so people can try a different search. Don't make them click a link or otherwise take an extra step to search again.
• Suggested Wording
  – Include some text that explains why the search might have failed, and what people can do next. Word it to be positive and helpful, rather than blaming the user for the search failure. For example: Your search returned no results. Try broadening your search (from heart attack to heart disease) or adding additional terms (from high blood pressure to high blood pressure or hypertension).
http://www.searchtools.com/guide/nomatches.html
Search UI Design Exercise
• Work in pairs
• Select an imaginary website
• Design on paper:
  – A homepage with a search box
  – A search results page
  – A no-hit page
Search Engine Optimization
How do search engines find you?
• Search engine optimization:
  – Changing your site to improve the site’s ranking in search results
• Search engine submission:
  – Submitting your site to search engines to make sure the engines know about it
• Search engine marketing/promotion: the combined process of submission (free or paid) and search engine optimization
• http://blog.searchenginewatch.com/090402-110851 (From 2:12)
Search Engine Submission
• Yahoo’s human-compiled directory listings (http://help.yahoo.com/l/us/yahoo/ysm/ds/index.html):
  – Crawlers look at those pages
  – Free for normal review
  – $299 for expedited review and commercial listing (no guarantee of listing)
• Google:
  – Free but not guaranteed (http://www.google.com/addurl/?continue=/addurl)
  – Or use AdWords for payment (http://www.google.com/ads/)
• Yahoo ads submission:
  – Yahoo sponsored search (http://searchmarketing.yahoo.com/arp/sponsoredsearch_ss.php?o=US1806&cmp=SYC&ctv=&s=Y&s2=S&b=25)
  – Pay by the number of clicks
Search Engine Optimization
• Linguistic SEO:
  – Research what words users use for your content:
    • Search engine logs, user testing, support calls, discussion forums
  – Use those words to describe your content on your pages and in the metadata
• Architectural SEO:
  – Make sure your important content is text
  – Make sure your linking structure leads search engine indexing crawlers to important content
• Reputation SEO:
  – Make sure other sites link to you
Search Engine Optimization
• Study your guideline
• Create a few bullet points to describe your guideline and post them on the discussion board
• Sources:
• http://searchenginewatch.com/webmasters/article.php/2168021
• http://searchenginewatch.com/webmasters/article.php/2167931