training project report on search engines

Upload: shivam-saxena

Post on 22-Nov-2014

Category:

Engineering


DESCRIPTION

This is a Summer Training Project Report prepared by me for submission to my college... This report consists of a Tiny WebSQL Search Engine made by me during the training period...

TRANSCRIPT

Page 1: Training Project Report on Search Engines

SHIVAM SAXENA
B.Tech CSE
1131910024

INDUSTRIAL TRAINING PROJECT REPORT

A Project Report Submitted to the Faculty of

RAKSHPAL BAHADUR COLLEGE OF ENGG. AND TECH.

in Partial Fulfilment of the Requirement for the Degree of

Bachelor of Technology in Computer Science and Engineering

Page 2: Training Project Report on Search Engines

ON

“A Customized Web Search Engine Using a Tiny WebSQL Query Language”

Using PHP

INDUSTRIAL TRAINING PROJECT REPORT

SHIVAM SAXENA
B.Tech CSE
1131910024

Page 3: Training Project Report on Search Engines

ABSTRACT

This project proposes:

1) the Tiny WebSQL search engine;

2) the Tiny WebSQL query as a set of restrictions that a spider follows when collecting Web information;

3) a customized Web spider (the Tiny Spider);

4) Web-database connectivity.

Page 4: Training Project Report on Search Engines

INTRODUCTION

The following list gives the objectives of this project:

1. Study how search engines work.
2. Examine simple and advanced services from some popular search engines.
3. Define an advanced search method by using an SQL-like language, the Tiny WebSQL Query language.
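The slides do not show the Tiny WebSQL grammar, so the query syntax below is entirely hypothetical. It is invented here only to illustrate how an SQL-like query could act as a restriction on what a spider collects. The sketch is in Python rather than the project's PHP, and the names `parse_query` and `allowed` are assumptions, not part of the project.

```python
import re
from urllib.parse import urlparse

# HYPOTHETICAL syntax, not the report's actual grammar:
#   SELECT url FROM "<start URL>" WHERE domain = "<host>" AND depth <= <n>
QUERY_RE = re.compile(
    r'^\s*SELECT\s+url\s+FROM\s+"(?P<start>[^"]+)"\s+'
    r'WHERE\s+domain\s*=\s*"(?P<domain>[^"]+)"\s+'
    r'AND\s+depth\s*<=\s*(?P<depth>\d+)\s*$',
    re.IGNORECASE,
)

def parse_query(query):
    """Turn the hypothetical query into a dict of crawl restrictions."""
    match = QUERY_RE.match(query)
    if match is None:
        raise ValueError("query does not match the illustrative grammar")
    return {
        "start": match.group("start"),
        "domain": match.group("domain").lower(),
        "max_depth": int(match.group("depth")),
    }

def allowed(url, depth, restrictions):
    """Return True if the spider may fetch `url` found at link depth `depth`."""
    host = (urlparse(url).hostname or "").lower()
    return host == restrictions["domain"] and depth <= restrictions["max_depth"]
```

A spider would call `allowed` on each discovered link before fetching it, so the query bounds the crawl rather than filtering results afterwards.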

Page 5: Training Project Report on Search Engines

How much information is on the web?

35 GB? 300 GB? 3 TB? More?
Mid-1999 estimate: 800 million pages
Mid-2000 estimate: 3 billion pages
Mid-2003 estimate: 15 billion pages + “Deep Web”
Google now indexes (only?) well over 4 billion
Early 2001 “Deep Web” estimate: 500 billion
How do you even estimate?
How can you find what you are looking for?
Doesn’t this remind you of going to the library???

Page 6: Training Project Report on Search Engines

How much information is on the web?

Page 7: Training Project Report on Search Engines

Search Engines

Page 8: Training Project Report on Search Engines

E-Commerce

Prof. Sheizaf Rafaeli 8

Not all search engines are American, or even English-language. Here, for example, are several Hebrew engines:

Walla: http://www.walla.co.il
Achla: http://www.achla.co.il
Tapuz: http://www.tapuz.co.il
Nana: http://www.nana.co.il
Sababa: http://www.sababa.co.il
Haaretz and IOL, Nadav Harel and iguide

Page 9: Training Project Report on Search Engines

How far do people look for results?

Page 10: Training Project Report on Search Engines

What do Search Engines search?

They do NOT search the Web! That is, they do not search the Web at the very moment you ask for something. Rather, they search their own databases or indexes.

Search engines store the contents of millions of websites in an index or database, and your query is matched against that.
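Matching a query against a stored index, rather than against the live Web, can be sketched with a small inverted index. This is a minimal Python illustration, not the project's PHP implementation. The page contents, URLs, and the function names `build_index` and `search` are made up for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs containing every word of the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Made-up pages standing in for previously crawled content.
PAGES = {
    "http://example.com/a": "tiny search engine in php",
    "http://example.com/b": "web spider and search index",
}
INDEX = build_index(PAGES)
```

Calling `search(INDEX, "search spider")` only consults `INDEX`; no page is fetched at query time, which is the point the slide makes.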

Page 11: Training Project Report on Search Engines

What do Search Engines search?

They don’t even catalog the entire contents of the WWW! Nowhere near, in fact... you only get what they have!

For the most part, they don’t have the contents of the websites they show you, only links to these sites.

Page 12: Training Project Report on Search Engines

How do they find it?

They use:

Spiders, webbots and bots
Crawlers, worms, and harvesters
Wanderers, indexers, and sitesuckers

What are they? Self-directed browsers which go from link to link, retrieving all or part of the contents of any given site for inclusion in the search engine's database.
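Going "from link to link" starts with pulling the links out of a fetched page. The sketch below does this with Python's standard `html.parser` module. It is an illustration only (the Tiny Spider itself is written in PHP), and the class name `LinkExtractor` and helper `extract_links` are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))

def extract_links(base_url, html):
    """Return the links found in `html`, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

A spider would feed each returned URL back into its fetch queue, which is what makes it self-directed.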

Page 13: Training Project Report on Search Engines

Crawling the Web

Mode of crawl: BFS (breadth-first search)…

Frequency of crawl: important

robots.txt gives explicit directions on what not to crawl…

Parallel machines crawl all the time…
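A breadth-first crawl that honours robots.txt can be sketched as follows. To keep the example runnable offline, the link graph is an in-memory dictionary standing in for real HTTP fetches, and the robots.txt rules are parsed from a literal list. The agent name "TinySpider", the URLs, and the function `bfs_crawl` are assumptions for illustration; the sketch is in Python, not the project's PHP.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

def bfs_crawl(start, links, robots, agent="TinySpider"):
    """Visit pages breadth-first, skipping URLs that robots.txt disallows.

    `links` maps a URL to the list of URLs it links to; it replaces
    real page fetching in this sketch.
    """
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed pages are neither visited nor expanded
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited

# Illustrative robots.txt rules and link graph.
ROBOTS = RobotFileParser()
ROBOTS.parse(["User-agent: *", "Disallow: /private/"])

LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
}
```

The queue (first in, first out) is what makes the traversal breadth-first: every page at link distance 1 from the start is visited before any page at distance 2.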

Page 14: Training Project Report on Search Engines

Architecture of My Tiny Search Engine

[Figure: architecture diagram. Components: The Web, Tiny Web spider, Indexer, Indexes, Ad indexes, Search, User. The diagram includes a sample results page for the query “miele”, showing Web results and Sponsored Links.]

Page 15: Training Project Report on Search Engines

User Interface Of My Tiny Web Spider

Page 16: Training Project Report on Search Engines

User Interface Of My Tiny Search Software

Page 17: Training Project Report on Search Engines

Know the Keywords

Full-text indexing: an indexing method in which every word in the web page is put into the database, with the exception of prepositions, conjunctions, and the like.

Controlled-language indexing: how directories are implemented.

Both of these are done for you by the search engine.
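The full-text rule above (keep every word except prepositions, conjunctions, and the like) amounts to tokenizing a page and dropping stop words. The stop-word list below is a short, illustrative assumption; real engines use much longer lists. The function name `index_terms` is also hypothetical, and the sketch is in Python rather than the project's PHP.

```python
import re

# A small illustrative stop-word list (prepositions, conjunctions, articles).
STOP_WORDS = {
    "a", "an", "and", "but", "by", "for", "in",
    "of", "on", "or", "the", "to", "with",
}

def index_terms(text):
    """Return the words of `text` that a full-text indexer would keep."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]
```

For example, `index_terms("The spider and the index of the Web")` keeps only `spider`, `index`, and `web`.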

Page 18: Training Project Report on Search Engines

Know the Keywords

Stemming: a type of search that uses the common root of a word to include all possible occurrences of that word.

Example: "child*" would yield results that include childhood, childless, children, etc.
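The "child*" example behaves like a prefix (truncation) match against the indexed vocabulary, which is simpler than an algorithmic stemmer. A minimal Python sketch of that prefix match follows. The vocabulary and the function name `expand_prefix` are made up for illustration.

```python
def expand_prefix(pattern, vocabulary):
    """Return the sorted vocabulary words matched by a trailing-* pattern.

    "child*" matches every word starting with "child"; a pattern without
    a trailing * must match a word exactly.
    """
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return sorted(w for w in vocabulary if w.lower().startswith(prefix))
    return sorted(w for w in vocabulary if w.lower() == pattern.lower())

# Illustrative vocabulary standing in for the words in an index.
VOCABULARY = ["child", "childhood", "childless", "children", "chill", "adult"]
```

Here `expand_prefix("child*", VOCABULARY)` selects child, childhood, childless, and children, but not chill or adult.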

Page 19: Training Project Report on Search Engines

Problems with search engines

Coverage

Page 20: Training Project Report on Search Engines

Problems with search engines

Invalid Links

Page 21: Training Project Report on Search Engines

Problems with search engines

Page 22: Training Project Report on Search Engines

Search Engines Refer Only A Small Percentage Of Traffic To Web Sites Worldwide

Are Search Engines truly so important?

Page 23: Training Project Report on Search Engines

The “Deep Web”

Page 24: Training Project Report on Search Engines

The “Deep Web”

500 times larger than the surface Web
95% of it is public and free
Content in the deep Web is of 1000+ times better quality
7,500 terabytes (TB) of information
45,000 search engines in the “surface Web”

Page 25: Training Project Report on Search Engines

Meta-Search Engines

Use multiple search engines in parallel to provide an answer to a single query.

They are front-ends to other search engines and their collections, and typically do not contain their own databases.

Examples: Surfwax, Vivisimo, Ask Jeeves, Metacrawler, The Mining Company
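The meta-search idea (send one query to several engines in parallel, then merge the answers) can be sketched as below. The two engines are stub functions returning made-up URLs, standing in for real back-end search services. The names `engine_a`, `engine_b`, and `meta_search` are assumptions for illustration, written in Python.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_a(query):
    """Stub back-end engine; a real one would call a remote service."""
    return ["http://example.com/1", "http://example.com/2"]

def engine_b(query):
    """Second stub back-end engine with partly overlapping results."""
    return ["http://example.com/2", "http://example.com/3"]

def meta_search(query, engines):
    """Query every engine in parallel and merge results, dropping duplicates."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map returns the result lists in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Note that `meta_search` keeps no index of its own: it holds only the merged list for the current query, matching the slide's point that such front-ends typically do not contain their own databases.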

Page 26: Training Project Report on Search Engines

The Best Search Engine is…

Whichever one you can actually find things with.

Sometimes their indexing is a little more “natural” to you.

Some people prefer search engines that use directories (Yahoo! and others) and some prefer simple indexing (AltaVista and others).

Some people prefer the “human touch” (“bibliographies”, “about” The Mining Company).

Page 27: Training Project Report on Search Engines

Resources

www.MetaSpy.com

Page 28: Training Project Report on Search Engines

THANK YOU

SHIVAM SAXENA
B.Tech CSE
1131910024