Google Session

Page 1: Google Session

Goooogle at MIT

IT Partners, April 2005

Suzana Lisanti, Hubert Pham

[email protected]

Page 2: Google Session

Google Session

1. About MIT’s Google Search Appliance (GSA)

2. Adding Google search to your web site

3. Customizing search results

4. Tips on improving a site’s rankings

5. Q&A – actually, ask questions anytime!

Page 3: Google Session

MIT's Google Configuration

• MIT's license covers 3M documents

• Two collections of 1.5M documents each

• MIT has over 1M web pages on 1,000 web servers

• Google starts at the MIT home page and follows links from there

• web.mit.edu is crawled three times a week

• Other MIT web servers are crawled twice a week
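Following links from the MIT home page amounts to a traversal of the link graph. The sketch below illustrates that idea as a breadth-first crawl over a small, made-up in-memory link graph, keeping only hosts under mit.edu. It is not the GSA's crawler, whose internals are not described here; the URLs, `LINKS`, `in_scope`, and `crawl_order` are all hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph: page URL -> URLs linked from that page.
LINKS = {
    "http://web.mit.edu/": ["http://web.mit.edu/newsoffice/", "http://libraries.mit.edu/"],
    "http://web.mit.edu/newsoffice/": ["http://web.mit.edu/", "http://www.example.com/"],
    "http://libraries.mit.edu/": ["http://libraries.mit.edu/about/"],
    "http://libraries.mit.edu/about/": [],
}


def in_scope(url: str) -> bool:
    """Keep only URLs whose host is mit.edu or a subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == "mit.edu" or host.endswith(".mit.edu")


def crawl_order(start: str, links: dict[str, list[str]]) -> list[str]:
    """Return pages in the order a breadth-first crawl from `start` visits them."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for nxt in links.get(url, []):
            # Skip off-campus links and pages already queued.
            if in_scope(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(crawl_order("http://web.mit.edu/", LINKS))
```

A page that no in-scope link reaches is never visited, which is why content unlinked from the home page can be missing from the index.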

Page 4: Google Session

What MIT Google does

• Performs twice as well as Inktomi in a “blind test”

• Indexes 220 different file formats

• Gives us control over our own crawling schedule

• Allows user customization of the search results format

• Can index certificate-restricted content (not implemented yet)

Page 5: Google Session

MIT Google does NOT

• Cache old pages

• Index image files (our decision)

• Index image ALT tags (Google’s decision)

• Allow us to adjust the relevance-ranking algorithm

• Tell you “who’s linking to my page”, because the GSA does not share that information across collections

When your pages move, we recommend using a 301 (permanent) redirect.
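A 301 (Moved Permanently) response tells crawlers that the old URL has been replaced by the one in its `Location` header. As one way to produce it, here is a minimal sketch using Python's standard WSGI interface. The paths and URLs are hypothetical, and in practice this kind of redirect is often configured in the web server itself rather than in application code.

```python
# Hypothetical table of moved pages: old path -> new absolute URL.
MOVED = {
    "/old-dept/index.html": "http://web.mit.edu/new-dept/index.html",
}


def app(environ, start_response):
    """Minimal WSGI app that answers moved paths with a 301 redirect."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        # 301 = Moved Permanently; crawlers should use the Location URL from now on.
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Type", "text/plain")])
        return [b"Moved permanently to " + target.encode("ascii")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def call(path):
    """Invoke the app directly (no server) and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body


print(call("/old-dept/index.html")[:2])
```

To try it in a browser locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the same app.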

Page 6: Google Session

MIT Google does NOT index

• Java, Perl, and Python documentation

• Debian and GNU/Linux mirrors

• URLs containing any of these strings:

– sipb.mit.edu

– dev.mit.edu

– net.mit.edu

– lees.mit.edu

– ops.mit.edu

– classics.mit.edu

– hypermail

– pipermail

• Certificate-protected pages

• Sites that exclude robots, and pages marked noindex

• Dynamically generated pages (URLs containing ‘?’), except by request

• URLs containing cgi-bin

• URLs containing /afs/
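One way to read the URL-based part of the exclusion list above is as substring tests applied to each candidate URL. The sketch below encodes the listed strings that way. `is_excluded` and its `allow_dynamic` flag (standing in for "except by request") are hypothetical, and the appliance's real crawl-pattern syntax is not shown here. The sketch also ignores the documentation mirrors, certificate-protected pages, and robots rules.

```python
# Strings copied from the exclusion list above. A plain substring test is a
# simplification of the appliance's real crawl patterns.
EXCLUDED_SUBSTRINGS = (
    "sipb.mit.edu", "dev.mit.edu", "net.mit.edu", "lees.mit.edu",
    "ops.mit.edu", "classics.mit.edu", "hypermail", "pipermail",
    "cgi-bin", "/afs/",
)


def is_excluded(url: str, allow_dynamic: bool = False) -> bool:
    """Return True if `url` matches one of the URL-based exclusion rules."""
    if any(s in url for s in EXCLUDED_SUBSTRINGS):
        return True
    # Dynamic pages (URLs containing '?') are skipped unless indexing was requested.
    if "?" in url and not allow_dynamic:
        return True
    return False


for u in ("http://web.mit.edu/newsoffice/", "http://web.mit.edu/cgi-bin/form"):
    print(u, is_excluded(u))
```

Because the test is "contains", a string such as dev.mit.edu also matches any longer host name that ends in it, which is the literal meaning of "URLs containing these strings".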

Page 7: Google Session

Telling Google not to index

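• Server level: exclude robots from the whole server (robots.txt)

• Locker/directory level: exclude robots from one locker or directory

• Page level: exclude robots in the HTML file itself (robots meta tag)

• “noindex, follow”: keep the page out of the index but let Google follow its links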
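The server-wide option is the standard robots.txt file, where a `Disallow` rule can also cover a single directory. The page-level options use the robots `<meta>` tag, and `noindex, follow` keeps a page out of the index while still letting the crawler follow its links. The sketch below uses Python's standard `urllib.robotparser` and `html.parser` to show how a crawler might interpret both. The robots.txt rules, the URLs, the crawler name `gsa-crawler`, and the `RobotsMeta` helper are hypothetical, and the GSA's own parser may differ in details.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt at the server root that keeps all robots out of one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())


class RobotsMeta(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for part in (attrs.get("content") or "").split(","):
                self.directives.add(part.strip().lower())


def meta_directives(html: str) -> set:
    parser = RobotsMeta()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(rp.can_fetch("gsa-crawler", "http://web.mit.edu/private/notes.html"))  # False
print(rp.can_fetch("gsa-crawler", "http://web.mit.edu/public/index.html"))   # True
print(sorted(meta_directives(page)))  # ['follow', 'noindex']
```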

Page 8: Google Session

Avg. daily views, January 2005

[Chart: average views by hour of day, 5:00 through 3:00; vertical axis 0 to 30,000]

Total queries, Jan 1–26: 340,656

Page 9: Google Session

Gooogle search forms

Page 10: Google Session

Simple search form

Page 11: Google Session

Sample search code

```html
<form method='get' action='http://gb-server.mit.edu/search'>
  <!-- The user's search terms -->
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <!-- Collection to search and front end (client) to use -->
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <!-- XSLT stylesheet that turns the XML results into HTML -->
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
  <!-- Include (i) only results from the as_sitesearch directory -->
  <input type='hidden' name='as_dt' value='i'/>
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>
</form>
```
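Submitting the sample form above sends a GET request whose query string carries every field. The sketch below rebuilds that URL with `urllib.parse.urlencode`, which makes it easy to try a query by hand or to link to a canned search. The server name and field values are copied from the form. The `search_url` helper and the query text are hypothetical, and a browser's exact encoding may differ slightly.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Action URL and hidden fields copied from the sample form above.
FORM_ACTION = "http://gb-server.mit.edu/search"
HIDDEN_FIELDS = {
    "site": "mit",
    "client": "mit",
    "proxystylesheet": "http://web.mit.edu/xsl/google-mit.xsl",
    "output": "xml_no_dtd",
    "as_dt": "i",
    "as_sitesearch": "web.mit.edu/newsoffice",
}


def search_url(query: str) -> str:
    """Build the GET URL that submitting the form with `query` produces."""
    params = {"q": query, "btnG": "Search", **HIDDEN_FIELDS}
    return FORM_ACTION + "?" + urlencode(params)


url = search_url("commencement 2005")
print(url)
# Decoding the query string recovers the original field values.
print(parse_qs(urlsplit(url).query)["q"])
```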

Page 12: Google Session

Restrict to one directory tree

• name='as_sitesearch' value='<yoururl>'

– Use the full host name: web.mit.edu/newsoffice, not web/newsoffice

• The trailing slash / matters:

– web.mit.edu/newsoffice includes sub-directories

– web.mit.edu/newsoffice/ excludes sub-directories

• as_sitesearch lets you specify one directory (and all its sub-directories) as the domain to be searched; you cannot specify multiple disparate directories with this option

• If you want the search feature on your site to search the entire MIT web site, delete this parameter.
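To make the trailing-slash rule on this slide concrete, here is a small model of the matching behavior as the slide describes it. It is an illustration only. `in_sitesearch` is a hypothetical helper, and treating `newsoffice` as a whole path segment (so it does not match `newsoffice2`) is an assumption rather than documented GSA behavior.

```python
def in_sitesearch(url: str, sitesearch: str) -> bool:
    """Model of the as_sitesearch rule described above (not the GSA's code).

    Without a trailing slash, the value matches the directory and everything
    below it. With a trailing slash, only pages directly in that directory match.
    """
    # as_sitesearch values omit the scheme, so drop "http://" before comparing.
    path = url.split("://", 1)[-1]
    if sitesearch.endswith("/"):
        if not path.startswith(sitesearch):
            return False
        rest = path[len(sitesearch):]
        return "/" not in rest  # reject anything in a deeper sub-directory
    # Assumption: the match must end at a path boundary.
    return path == sitesearch or path.startswith(sitesearch + "/")


story = "http://web.mit.edu/newsoffice/2005/story.html"
print(in_sitesearch(story, "web.mit.edu/newsoffice"))   # True
print(in_sitesearch(story, "web.mit.edu/newsoffice/"))  # False
```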

Page 13: Google Session

Restrict to multiple directories or servers

• Contact [email protected] and we will create a subcollection for you.

• A subcollection is a list of URL patterns that can be referred to by a single name, such as "Library".
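A subcollection can be pictured as a named list of URL patterns, so one name can cover several unrelated servers or directories, which as_sitesearch cannot do. The sketch below treats each pattern as a simple URL prefix. The actual pattern syntax is whatever the MIT Google team configures on the appliance. The patterns shown for "Library" and the `subcollections_for` helper are made up for illustration.

```python
# Hypothetical subcollections: a name mapped to URL patterns (simple prefixes here).
SUBCOLLECTIONS = {
    "Library": ["libraries.mit.edu/", "web.mit.edu/library/"],
}


def subcollections_for(url: str, subcollections: dict) -> list:
    """Return the names of all subcollections whose patterns match `url`."""
    path = url.split("://", 1)[-1]  # patterns omit the scheme
    return [
        name
        for name, patterns in subcollections.items()
        if any(path.startswith(p) for p in patterns)
    ]


print(subcollections_for("http://libraries.mit.edu/barton/", SUBCOLLECTIONS))  # ['Library']
print(subcollections_for("http://web.mit.edu/newsoffice/", SUBCOLLECTIONS))    # []
```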

Page 14: Google Session

Advanced search example

Page 15: Google Session

Gooogle Custom Results

You can customize the look and feel of Google’s search results by providing a stylesheet.

Page 16: Google Session

Site-wide MIT template

Page 17: Google Session

IS&T custom results

Page 18: Google Session

IS&T Search

Page 19: Google Session

IS&T Custom Results

Page 20: Google Session

Customizing results

• You provide the header and footer (HTML) wrapper, and any desired content formatting

• Google provides the raw data (XML)

[Diagram: Google results data wrapped in your HTML header/footer]

Page 21: Google Session

Results content “title” only

Page 22: Google Session

How customization works

• The form points to an XSLT stylesheet

• Google returns the query results as XML

• The XSLT stylesheet translates the XML into your custom HTML

[Diagram: search query → MIT-Google index → XML search results + XSLT stylesheet = HTML results]
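The pipeline above (XML results in, your HTML out) is normally implemented in XSLT. Purely to illustrate the data flow, the sketch below performs a comparable transformation in Python with the standard `xml.etree.ElementTree` module. Like the "title only" example, it shows result titles only, wrapped in your own header and footer. The XML is a simplified, made-up excerpt modeled on the GSA results format (`GSP`, `RES`, `R`, with `U` for URL and `T` for title). Consult the official XML specification listed under Documentation for the real element list. `HEADER`, `FOOTER`, and `render` are placeholders.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, made-up results excerpt modeled on the GSA XML output format.
SAMPLE_XML = """<GSP VER="3.2">
  <Q>commencement</Q>
  <RES SN="1" EN="2">
    <M>2</M>
    <R N="1"><U>http://web.mit.edu/commencement/</U><T>Commencement</T><S>Ceremony details</S></R>
    <R N="2"><U>http://web.mit.edu/newsoffice/</U><T>News Office</T><S>Latest news</S></R>
  </RES>
</GSP>"""

HEADER = "<html><body><h1>My Department Search</h1>"  # your HTML header
FOOTER = "</body></html>"                              # your HTML footer


def render(xml_text: str) -> str:
    """Turn results XML into HTML: header, one linked title per result, footer."""
    root = ET.fromstring(xml_text)
    items = []
    for r in root.iter("R"):
        url = r.findtext("U", default="")
        title = r.findtext("T", default=url)
        # Escape both values so result text cannot break the surrounding HTML.
        items.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
    return HEADER + "<ul>" + "".join(items) + "</ul>" + FOOTER


print(render(SAMPLE_XML))
```

In the real service the stylesheet does this work on the MIT-Google server, so no code runs on your own site. The Python version only makes the XML-to-HTML step visible.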

Page 23: Google Session

Notes

• It is not necessary to customize the results.

– You can place a search form on your site, and Google will use the site-wide MIT XSLT stylesheet.

• Updates to the Google service may require you to make changes in your stylesheet.

– Subscribe to [email protected]

• WCS will provide fee-based production services for custom search results.

Page 24: Google Session

How to customize the results

• Plan how you want the results to look

• Copy the MIT Google XSLT stylesheet: http://web.mit.edu/xsl/google-mit.xsl

• Save it to web-readable space, naming it google-mysite.xsl

Page 25: Google Session

Point to your XSL

```html
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <!-- Point the MIT-Google server at your own copy of the stylesheet -->
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/my_dept/google-mydept.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
</form>
```

• Update your search form to point the MIT-Google server to your custom XSLT stylesheet.

Page 26: Google Session

Step-by-step customization

See http://web.mit.edu/ist/google/stylesheets.html

Page 27: Google Session

Documentation

• http://web.mit.edu/ist/google/

(Includes the “official” Google documentation, including their XML specification; also XSLT tips.)

• Search Engine Submission Tips: http://searchenginewatch.com/webmasters/

• Using SS for an Effective SEO Campaign: http://www.alistapart.com/articles/seo/

Page 28: Google Session

Support

• The MIT Google team will support your creating a Google search form and answer queries sent to [email protected]

• WCS offers fee-based production services for custom search results

Page 29: Google Session

Q&A