site-wide search

Site-wide Search Upgrade and new features Jon Warbrick University of Cambridge Computing Service [email protected]

Post on 05-Feb-2016

TRANSCRIPT

Page 1: Site-wide Search

Site-wide Search

Upgrade and new features

Jon Warbrick

University of Cambridge Computing Service

[email protected]

Page 2: Site-wide Search

Site-wide search

● web-search.cam.ac.uk

Page 3: Site-wide Search

Site-wide search

● web-search.cam.ac.uk

● Ultraseek, from Infoseek

Page 4: Site-wide Search

Site-wide search

● web-search.cam.ac.uk

● Ultraseek, from Infoseek -> Inktomi

Page 5: Site-wide Search

Site-wide search

● web-search.cam.ac.uk

● Ultraseek, from Infoseek -> Inktomi -> Verity

Page 6: Site-wide Search

Site-wide search

● web-search.cam.ac.uk

● Ultraseek, from Infoseek -> Inktomi -> Verity -> Autonomy

Page 7: Site-wide Search

Site-wide search

● web-search.cam.ac.uk

● Ultraseek, from Infoseek -> Inktomi -> Verity -> Autonomy

● Currently indexing

– ~600 servers

– ~1.2 million documents

– ~2.5 million URLs

Page 8: Site-wide Search

Site-wide search

● Indexes 'more-or-less official' servers

Page 9: Site-wide Search

Site-wide search

● Indexes 'more-or-less official' servers

● Maintains two indexes

– 'internal' and 'external'

– automatically routes queries
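
The slides do not say how queries are routed between the two indexes. One plausible approach, sketched here purely as an assumption, is to choose the index from the client's address. The network `192.0.2.0/24` is a documentation placeholder standing in for "inside the University", and `choose_index` is a hypothetical name, not part of Ultraseek.

```python
import ipaddress

# Placeholder network standing in for "inside the University" (assumption).
INTERNAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def choose_index(client_ip: str) -> str:
    """Return the name of the index a query from client_ip would be sent to."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in INTERNAL_NETWORKS):
        return "internal"
    return "external"


if __name__ == "__main__":
    print(choose_index("192.0.2.10"))    # address inside the placeholder network
    print(choose_index("198.51.100.7"))  # address outside it
```

Under this assumption, users outside the placeholder network never see results that exist only in the 'internal' index.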

Page 10: Site-wide Search

Site-wide search

● Indexes 'more-or-less official' servers

● Maintains two indexes

– 'internal' and 'external'

– automatically routes queries

● Services for University Webmasters

– Add/delete/re-index

– Packaged searches

Page 11: Site-wide Search

2006 Upgrade

● Improved resilience

Page 12: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

Page 13: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

● Quick Links

Page 14: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

● Quick Links

Page 15: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

● Quick Links

● Passage-based summaries

Page 16: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

● Quick Links

● Passage-based summaries

Page 17: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

● Quick Links

● Passage-based summaries

● Grouping by location

Page 18: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-insensitive matching

● Quick Links

● Passage-based summaries

● Grouping by location

Page 19: Site-wide Search

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

● Quick Links

● Passage-based summaries

● Grouping by location

● [ All terms matching ]

Page 20: Site-wide Search

2006 Upgrade

● More indexing (dynamic pages + https + JavaScript)

Page 21: Site-wide Search

2006 Upgrade

● More indexing (dynamic pages + https + JavaScript)

Page 22: Site-wide Search

2006 Upgrade

● More indexing (dynamic pages + https + JavaScript)

Page 23: Site-wide Search

2006 Upgrade

● More indexing (dynamic pages + https + JavaScript)

● Sources of indexing requests

– s1.web-search.cam.ac.uk to s6.web-search.cam.ac.uk

– an address in the range 192.153.213.0-255
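
A webmaster who wants to recognise indexing requests in a script or log filter could test the remote address against the range given above. This is a minimal sketch: the range and host names come from the slide, while `is_search_crawler` is a hypothetical helper name.

```python
import ipaddress

# Range quoted on the slide: 192.153.213.0-255, i.e. a /24 network.
CRAWLER_NETWORK = ipaddress.ip_network("192.153.213.0/24")

# Host names quoted on the slide: s1 to s6.
CRAWLER_HOSTS = [f"s{n}.web-search.cam.ac.uk" for n in range(1, 7)]


def is_search_crawler(remote_addr: str) -> bool:
    """True if remote_addr lies in the indexing range; False for bad input."""
    try:
        return ipaddress.ip_address(remote_addr) in CRAWLER_NETWORK
    except ValueError:
        # Not a syntactically valid IP address.
        return False


if __name__ == "__main__":
    for addr in ("192.153.213.17", "192.153.214.1", "not-an-ip"):
        print(addr, is_search_crawler(addr))
```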

Page 24: Site-wide Search

2006 Upgrade

● More indexing (dynamic pages + https + JavaScript)

● Sources of indexing requests

– s1.web-search.cam.ac.uk to s6.web-search.cam.ac.uk

– an address in the range 192.153.213.0-255

● Backup search engines

– Add URL, Revisit Site, etc.

Page 25: Site-wide Search

Problems with dynamic content

Page 26: Site-wide Search

Problems with dynamic content

● Randomly permuted query arguments

● Gratuitously-varying detail

● Variant pages

● Calendars linking to other pages

● Cache-busting headers

● Frames hiding real URL

● Junk path info

● 'Success' error pages

● Lack of Last Modification time stamp

● Inconsistent URLs
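
Two of the problems above, permuted query arguments and missing modification times, can be avoided on the publishing side. The sketch below is illustrative only and is not taken from the talk: `canonical_url` and `last_modified_header` are hypothetical helpers, and `www.example.org` is a placeholder host.

```python
from email.utils import formatdate
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Sort query arguments so the same page always gets the same URL."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    # Lower-case the host and drop any fragment; keep the path unchanged.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(pairs), "")
    )


def last_modified_header(mtime: float) -> tuple[str, str]:
    """Build a Last-Modified header from a POSIX timestamp (HTTP date format)."""
    return ("Last-Modified", formatdate(mtime, usegmt=True))


if __name__ == "__main__":
    print(canonical_url("http://www.example.org/page?b=2&a=1"))
    print(last_modified_header(0))
```

Emitting links in one canonical form means a crawler sees one URL per page rather than one per argument ordering, and a Last-Modified header lets it tell whether a page has changed since the previous visit.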

Page 27: Site-wide Search

Further information

● Notes for webmasters: http://www.cam.ac.uk/cs/web-search/

● Details of recent changes: http://www.cam.ac.uk/cs/web-search/changes-200608.html

● Help and advice:

[email protected]

Page 28: Site-wide Search

If you have been, thanks for listening

Page 29: Site-wide Search

I wonder if anyone will ask...

Page 30: Site-wide Search

I wonder if anyone will ask...

“Why don't you use Google?”