Training Project Report on Search Engines
This is a summer training project report prepared for submission to my college. It describes a Tiny WebSQL search engine that I built during the training period.
SHIVAM SAXENA
B.Tech CSE
1131910024
INDUSTRIAL TRAINING PROJECT REPORT
A Project Report Submitted to the Faculty of
RAKSHPAL BAHADUR COLLEGE OF ENGG. AND TECH.
in Partial Fulfilment of the Requirement for the Degree of
Bachelor of Technology in Computer Science and Engineering
on “A Customized Web Search Engine Using a Tiny WebSQL Query Language”
Using PHP
ABSTRACT
This project proposes:
1) the Tiny WebSQL search engine;
2) the Tiny WebSQL query as the restrictions for a spider to collect Web information;
3) a customized Web spider (the Tiny Spider);
4) Web-database connectivity.
INTRODUCTION
The following list gives the objectives of this project:
1. Study how search engines work.
2. Examine simple and advanced services from some popular search engines.
3. Define an advanced search method by using an SQL-like language, the Tiny WebSQL Query language.
How much information is on the web?
35 GB? 300 GB? 3 TB? More?
Mid 1999 estimate: 800 million pages
Mid 2000 estimate: 3 billion pages
Mid 2003 estimate: 15 billion pages + “Deep Web”
Google now indexes (only?) well over 4 billion
Early 2001 “Deep Web” estimate: 500 billion
How do you even estimate?
How can you find what you are looking for?
Doesn’t this remind you of going to the library?
Search Engines
E-Commerce
Prof. Sheizaf Rafaeli
Not all are American or even English. Here, e.g., are several Hebrew engines:
Walla (וואלה): http://www.walla.co.il
Achla (אחלה): http://www.achla.co.il
Tapuz (תפוז): http://www.tapuz.co.il
Nana (נענע): http://www.nana.co.il
Sababa (סבבה): http://www.sababa.co.il
Haaretz (הארץ) and IOL
Nadav Harel (נדב הראל) and iguide
How far do people look for results?
What do Search Engines search?
They do NOT search the Web! That is, they do not search the web the very moment you ask for something. Rather, they search their own databases or indexes.
Search engines store the contents of millions of websites in an index or database, and your query is matched against that.
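The idea of matching a query against a stored index rather than the live web can be sketched as a small inverted index. This is only an illustration under stated assumptions: the sample pages, the URLs, and the function names `build_index` and `search` are hypothetical and are not taken from the Tiny WebSQL engine itself.

```python
from collections import defaultdict

# Hypothetical sample pages standing in for crawled content (not from the report).
PAGES = {
    "http://example.com/a": "tiny search engine in php",
    "http://example.com/b": "web spider collects pages",
    "http://example.com/c": "search the web with a spider",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query word, using only the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "search spider")))
```

The pages are read once when the index is built; each query afterwards touches only the index, which is why a result can be out of date or missing even though the page exists on the web.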
What do Search Engines search?
They don’t even catalog the entire contents of the WWW! Nowhere near, in fact... you only get what they have!
For the most part, they don’t have the contents of the websites they show you, only links to these sites.
How do they find it?
They use:
Spiders, webbots, and bots
Crawlers, worms, and harvesters
Wanderers, indexers, and sitesuckers
What are they? Self-directed browsers that go from link to link, retrieving all or part of the contents of any given site for inclusion in the search engine's database.
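Going "from link to link" starts with pulling the links out of a fetched page. A minimal sketch using Python's standard `html.parser` follows; the class name `LinkExtractor`, the helper `extract_links`, and the sample HTML are assumptions made for illustration, and the Tiny Spider itself (written in PHP) may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the links found in one page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page content, used instead of a real network fetch.
SAMPLE_HTML = '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>'
print(extract_links("http://example.com/index.html", SAMPLE_HTML))
```

The returned list is what a spider would add to its queue of pages still to visit.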
Crawling the Web
Mode of crawl: breadth-first search (BFS)…
Frequency of crawl: important
robots.txt gives explicit directions on what not to crawl…
Parallel machines crawl all the time…
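A breadth-first crawl that respects robots.txt can be sketched as below. To keep the example self-contained it walks a hypothetical in-memory link graph instead of fetching real pages; the graph, the robots.txt text, the agent name "TinySpider", and the function `bfs_crawl` are all assumptions for illustration. Only `urllib.robotparser` is a real standard-library module.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical link graph: each URL maps to the links found on that page.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/private/x"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/"],
    "http://example.com/b": [],
    "http://example.com/private/x": ["http://example.com/secret"],
}

# Hypothetical robots.txt for the site above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def bfs_crawl(start, links, robots, agent="TinySpider"):
    """Visit pages level by level, skipping URLs that robots.txt disallows."""
    queue = deque([start])
    seen = {start}
    order = []
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed pages are neither fetched nor expanded
        order.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())
CRAWL_ORDER = bfs_crawl("http://example.com/", LINKS, robots)
print(CRAWL_ORDER)
```

The queue gives the breadth-first order: all pages one link away from the start are visited before any page two links away. The `seen` set stops the crawler from looping when pages link back to each other.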
Architecture of My Tiny Search Engine
[Figure: architecture diagram. Components shown: The Web, Tiny Web spider, Indexer, Indexes, Ad indexes, Search, User. The diagram includes a sample results page for the query “miele” with web results and sponsored links.]
User Interface Of My Tiny Web Spider
User Interface Of My Tiny Search Software
Know the Keywords
Full-text indexing: an indexing method in which every word in the web page is put into the database, with the exception of prepositions, conjunctions, and the like.
Controlled-language indexing: how directories are implemented.
Both of these are done for you by the search engine.
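Full-text indexing with the common words left out can be sketched as a tokenizer plus a stop-word filter. The stop-word list below is an assumed, deliberately short sample, and the function name `index_terms` is hypothetical; a real engine would use a longer list.

```python
import re

# Assumed short stop-word list (prepositions, conjunctions, articles).
STOP_WORDS = {"a", "an", "and", "but", "in", "of", "on", "or", "the", "to", "with"}


def index_terms(text):
    """Return the words of a page that would go into the index, in order."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


print(index_terms("The spider and the indexer work on pages of the Web"))
```

Every remaining word becomes an index entry, so the index grows with the full text of each page rather than with a hand-picked set of keywords.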
Know the Keywords
Stemming: a type of search that uses the common root of a word to include all possible occurrences of that word.
Example: "child*" would yield results that include childhood, childless, children, etc.
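The "child*" example can be sketched as prefix matching against the words in the index. Note that this is wildcard truncation, not a linguistic stemmer: it matches on the leading characters only. The vocabulary and the function name `expand_wildcard` are hypothetical.

```python
# Hypothetical vocabulary standing in for the words stored in an index.
VOCABULARY = ["child", "childhood", "childless", "children", "chill", "school"]


def expand_wildcard(pattern, vocabulary):
    """Expand a pattern ending in * to all words sharing that prefix.

    A pattern without a trailing * is treated as an exact match.
    """
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        return [word for word in vocabulary if word.lower().startswith(prefix)]
    return [word for word in vocabulary if word.lower() == pattern.lower()]


print(expand_wildcard("child*", VOCABULARY))
```

The expanded word list can then be looked up in the index one word at a time and the results combined.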
Problems with search engines
Coverage
Problems with search engines
Invalid links
Search Engines Refer Only A Small Percentage Of Traffic To Web Sites Worldwide
Are Search Engines truly so important?
The “Deep Web”
500 times larger than the surface web
95% of it is public and free
Content in the deep web is of 1000+ times better quality
7,500 terabytes (TB) of information
45,000 search engines in the “surface web”
Meta-Search Engines
They use multiple search engines in parallel to provide an answer to a single query.
They are front ends to other search engines and their collections, and typically do not contain their own databases.
Examples: Surfwax, Vivisimo, Ask Jeeves, Metacrawler, The Mining Company
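Sending one query to several engines in parallel and merging the answers can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed lists; they do not call any real search service, and `meta_search` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical engines: each takes a query and returns a ranked list of URLs.
def engine_one(query):
    return [f"http://one.example/{query}", "http://shared.example/page"]


def engine_two(query):
    return ["http://shared.example/page", f"http://two.example/{query}"]


def meta_search(query, engines):
    """Query every engine concurrently and merge results without duplicates."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps the result lists in the same order as the engines.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


print(meta_search("php", [engine_one, engine_two]))
```

The meta-search front end keeps no index of its own; all it holds is the merged list for the current query.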
The Best Search Engine is…
Whichever one you can actually find things with.
Sometimes their indexing is a little more “natural” to you.
Some people prefer search engines that use directories (Yahoo! and others) and some prefer simple indexing (AltaVista and others).
Some people prefer the “human touch” (“bibliographies”, “about” The Mining Company).
THANK YOU