
Restricted Search Engine

Laurent Balat

Christophe Decis

Thomas Forey

Sebastien Leclercq

ESSI2 Project

Supervisor: Johny BOND

June 2002

Introduction (1)

• What is a search engine?

• Three types:
  – disciplinary
  – global
  – thematic

• Internet users spend more than 50% of their time searching!

Introduction (2)

• Many pages cannot be reached by search engines.

[Figure: nested sets: the WEB, the indexable WEB, and the part of it indexed by Google]

How does it work?

• The search engine is composed of two parts: the spider and the user part.

First part: the Web site spider

[Figure: spider architecture: WEB, Spider, processing units (PDF unit, DOC unit, HTML processing unit), indexing, DATABASE, all governed by the Constraints]

How does it work?

• User part architecture

[Figure: user part architecture: User, Query Interface, Query engine, DATABASE]

Constraints

• Domain restriction.

• Search depth.

• Theme: accepted and rejected words.

• Document type.

• Time delay.
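The constraints listed above can be gathered into one structure that the spider consults at each step. The sketch below shows one way to do that. It is a minimal sketch, not the project's code, and several points are assumptions:

- the field and function names are hypothetical;
- the domain test accepts the domain and its subdomains;
- the theme rule means "at least one accepted word, no rejected word";
- "time delay" is read as a crawl time limit in seconds.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical representation of the constraints listed on the slide.
struct Constraints {
    std::string domain;              // domain restriction; empty = no restriction
    int max_depth = 3;               // search depth: links followed from the start page
    std::set<std::string> accepted;  // theme: at least one must appear (empty = any)
    std::set<std::string> rejected;  // theme: none may appear
    std::set<std::string> types;     // accepted document types (empty = any)
    double time_budget_s = 60.0;     // time delay, ASSUMED here to be a crawl time limit
};

bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Host part of an absolute "http://" URL, or "" otherwise (ports are not handled).
std::string host_of(const std::string& url) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) return "";
    auto end = url.find('/', scheme.size());
    if (end == std::string::npos) return url.substr(scheme.size());
    return url.substr(scheme.size(), end - scheme.size());
}

// Domain restriction: the host must be the domain itself or one of its subdomains.
bool in_domain(const Constraints& c, const std::string& url) {
    if (c.domain.empty()) return true;
    std::string host = host_of(url);
    return host == c.domain || ends_with(host, "." + c.domain);
}

// Checked before a link is queued: depth limit and domain restriction.
bool may_follow(const Constraints& c, const std::string& url, int depth) {
    return depth <= c.max_depth && in_domain(c, url);
}

// Checked on the Content-Type reported for a document.
bool type_allowed(const Constraints& c, const std::string& content_type) {
    return c.types.empty() || c.types.count(content_type) > 0;
}

// Checked on the words extracted from a document.
bool matches_theme(const Constraints& c, const std::vector<std::string>& words) {
    bool has_accepted = c.accepted.empty();
    for (const auto& w : words) {
        if (c.rejected.count(w)) return false;
        if (c.accepted.count(w)) has_accepted = true;
    }
    return has_accepted;
}

// Checked between downloads, given the time elapsed since the crawl started.
bool within_time(const Constraints& c, double elapsed_s) {
    return elapsed_s <= c.time_budget_s;
}
```

Keeping every constraint in one object lets the spider and the indexer share a single configuration.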

The Spider Part

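[Figure: spider loop: take a link from the priority queue; check if the link was already visited; send an HTTP HEAD request for the link; check the data type against the constraints; download the page (download errors are handled); push the page data onto a stack]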
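The spider loop in the diagram can be sketched as follows, under three stated assumptions:

- The web is replaced by a hypothetical in-memory map from URL to page so the sketch runs without a network. The HTTP HEAD step is simulated by looking up the page's content type, and an unknown URL stands for a download error.
- The queue priority is assumed to be the link depth, so shallower links are visited first. The slides do not give the project's actual priority rule.
- Only the document-type constraint is checked, to keep the loop short.

```cpp
#include <map>
#include <queue>
#include <set>
#include <stack>
#include <string>
#include <vector>

// Hypothetical stand-in for a web resource.
struct Page {
    std::string url;
    std::string content_type;        // what an HTTP HEAD request would report
    std::vector<std::string> links;  // links found in the page
};

// Hypothetical in-memory "web": URL -> page. A missing URL behaves like a download error.
using Web = std::map<std::string, Page>;

struct Link {
    std::string url;
    int depth;  // number of links followed from the start page
};

// ASSUMED priority rule: shallower links leave the queue first.
struct ShallowerFirst {
    bool operator()(const Link& a, const Link& b) const { return a.depth > b.depth; }
};

// Crawl from `start` and return the stack of pages waiting for document processing.
std::stack<Page> crawl(const Web& web, const std::string& start, int max_depth,
                       const std::set<std::string>& allowed_types) {
    std::priority_queue<Link, std::vector<Link>, ShallowerFirst> queue;
    std::set<std::string> visited;
    std::stack<Page> to_process;
    queue.push(Link{start, 0});
    while (!queue.empty()) {
        Link link = queue.top();
        queue.pop();
        // Check if the link was already visited.
        if (!visited.insert(link.url).second) continue;
        // Simulated HTTP HEAD: an unknown URL counts as a download error and is skipped.
        auto it = web.find(link.url);
        if (it == web.end()) continue;
        const Page& page = it->second;
        // Check the data type against the constraints before "downloading".
        if (allowed_types.count(page.content_type) == 0) continue;
        // Download and push the page data onto the stack.
        to_process.push(page);
        // Queue the page's links unless the depth limit is reached.
        if (link.depth < max_depth)
            for (const auto& next : page.links) queue.push(Link{next, link.depth + 1});
    }
    return to_process;
}
```

A priority queue, rather than a plain FIFO, lets the spider choose which link to visit next. With depth as the priority, the crawl proceeds breadth-first.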

Document Processing

• Analyse the document type.

• Send the document to the appropriate unit.

• Extract words and links.

• Try to resolve bad links.
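The processing steps above can be sketched for the HTML unit. This is a deliberately naive sketch, and several choices are assumptions:

- the mapping from Content-Type to unit is hypothetical;
- tags are stripped character by character instead of being parsed;
- links are read only from `href="..."` attributes;
- "resolving bad links" is interpreted as turning relative links into absolute URLs. The resolution rule is simplified and does not implement the full RFC 3986 algorithm.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical processing units, chosen from the document's Content-Type.
enum class Unit { Html, Pdf, Doc, Unsupported };

Unit unit_for(const std::string& content_type) {
    if (content_type == "text/html") return Unit::Html;
    if (content_type == "application/pdf") return Unit::Pdf;
    if (content_type == "application/msword") return Unit::Doc;
    return Unit::Unsupported;
}

// Lower-case words of the text outside tags; any non-alphanumeric byte separates words.
std::vector<std::string> extract_words(const std::string& html) {
    std::vector<std::string> words;
    std::string word;
    bool in_tag = false;
    for (char ch : html) {
        unsigned char u = static_cast<unsigned char>(ch);
        if (ch == '<') in_tag = true;
        if (!in_tag && std::isalnum(u)) {
            word += static_cast<char>(std::tolower(u));
        } else if (!word.empty()) {
            words.push_back(word);
            word.clear();
        }
        if (ch == '>') in_tag = false;
    }
    if (!word.empty()) words.push_back(word);
    return words;
}

// Targets of href="..." attributes, in document order.
std::vector<std::string> extract_links(const std::string& html) {
    std::vector<std::string> links;
    const std::string key = "href=\"";
    std::string::size_type pos = 0;
    while ((pos = html.find(key, pos)) != std::string::npos) {
        pos += key.size();
        auto end = html.find('"', pos);
        if (end == std::string::npos) break;
        links.push_back(html.substr(pos, end - pos));
        pos = end + 1;
    }
    return links;
}

// Simplified resolution of `link` found on `page_url` (which must start with "http://").
std::string resolve(const std::string& page_url, const std::string& link) {
    if (link.compare(0, 7, "http://") == 0) return link;       // already absolute
    auto host_end = page_url.find('/', 7);                      // end of "http://host"
    std::string root = page_url.substr(0, host_end);           // "http://host"
    if (!link.empty() && link[0] == '/') return root + link;    // host-relative
    std::string dir = host_end == std::string::npos
                          ? root + "/"
                          : page_url.substr(0, page_url.rfind('/') + 1);
    return dir + link;                                          // directory-relative
}
```

Further units (PDF, DOC) would plug in at `unit_for`, each returning words and links the same way.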

Indexation

• Binary search tree:
  – quick to build
  – efficient to use

• Check constraints:
  – start list and stop list.
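The indexing structure can be sketched with the STL's `std::map`. It is a sorted associative container that implementations typically build as a balanced binary search tree (a red-black tree), so insertion and lookup stay logarithmic in the number of keywords. The class name is hypothetical, and so are the list semantics, which the slides do not spell out: stop-list words are never indexed, and a non-empty start list restricts indexing to its own words.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical inverted index: keyword -> URLs of the pages containing it.
class Index {
public:
    Index(std::set<std::string> start_list, std::set<std::string> stop_list)
        : start_(std::move(start_list)), stop_(std::move(stop_list)) {}

    // Index the words of one page, skipping words rejected by the lists.
    void add_page(const std::string& url, const std::vector<std::string>& words) {
        for (const auto& w : words)
            if (accepts(w)) entries_[w].insert(url);
    }

    // URLs containing `word` (empty set if the word is not indexed).
    std::set<std::string> lookup(const std::string& word) const {
        auto it = entries_.find(word);
        return it == entries_.end() ? std::set<std::string>{} : it->second;
    }

    std::size_t keyword_count() const { return entries_.size(); }

private:
    // ASSUMED semantics: stop-list words are never indexed; a non-empty
    // start list means only its words are indexed.
    bool accepts(const std::string& w) const {
        if (stop_.count(w)) return false;
        return start_.empty() || start_.count(w) > 0;
    }

    std::set<std::string> start_, stop_;
    std::map<std::string, std::set<std::string>> entries_;  // sorted tree of keywords
};
```

Storing a set of URLs per keyword means a page that repeats a word is recorded only once for it.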

Database

• MySQL database.

• General structure:
  – Keywords
  – Web links
  – Correspondence between keywords and links
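The three-part structure is a many-to-many relation. There is one table of keywords and one of web links, plus a correspondence table of (keyword id, link id) pairs. In the project this lives in MySQL. The sketch below models the same three tables in memory with C++ containers so it runs on its own. All table, column and member names are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// In-memory model of the three tables (hypothetical names).
struct Database {
    std::map<int, std::string> keywords;           // keywords(id, word)
    std::map<int, std::string> links;              // links(id, url)
    std::set<std::pair<int, int>> correspondence;  // correspondence(keyword_id, link_id)

    // Id of `value` in `table`, inserting a new row if absent.
    // Linear scan for brevity; a real table would use a unique index.
    static int intern(std::map<int, std::string>& table, const std::string& value) {
        for (const auto& row : table)
            if (row.second == value) return row.first;
        int id = table.empty() ? 1 : table.rbegin()->first + 1;
        table[id] = value;
        return id;
    }

    // Record that `word` appears in the page at `url` (duplicates are ignored).
    void record(const std::string& word, const std::string& url) {
        correspondence.insert({intern(keywords, word), intern(links, url)});
    }

    // Same result as the SQL join (hypothetical schema):
    //   SELECT l.url FROM keywords k
    //   JOIN correspondence c ON c.keyword_id = k.id
    //   JOIN links l ON l.id = c.link_id WHERE k.word = ?
    std::vector<std::string> urls_for(const std::string& word) const {
        std::vector<std::string> result;
        for (const auto& kw : keywords) {
            if (kw.second != word) continue;
            for (const auto& pair : correspondence)
                if (pair.first == kw.first) result.push_back(links.at(pair.second));
        }
        return result;
    }
};
```

Keeping the correspondence in its own table means each keyword and each URL is stored once, however many pages share them.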

User interface and query engine

• The web page is generated by a CGI script.

• The query engine queries the database.

• The results are formatted into a web page.
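The user part can be sketched as a CGI program. The web server passes the form data in the `QUERY_STRING` environment variable. The query engine looks up each word and keeps the links that contain all of them. The results are then written to standard output as an HTML list, after a `Content-Type` header and a blank line. This is a sketch under assumptions:

- the form field name `q` is hypothetical;
- multi-word queries use AND semantics;
- an in-memory keyword map stands in for the MySQL database;
- URL decoding covers only `+` and `%XX`.

```cpp
#include <cctype>
#include <cstdlib>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the database: keyword -> URLs of the pages containing it.
using KeywordIndex = std::map<std::string, std::set<std::string>>;

// Decode '+' and %XX escapes in a form-encoded value.
std::string url_decode(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '+') {
            out += ' ';
        } else if (s[i] == '%' && i + 2 < s.size() &&
                   std::isxdigit(static_cast<unsigned char>(s[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
            out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += s[i];
        }
    }
    return out;
}

// Value of the field `name` in a query string such as "q=search+engine&page=1".
std::string get_param(const std::string& query, const std::string& name) {
    std::istringstream in(query);
    std::string field;
    while (std::getline(in, field, '&')) {
        auto eq = field.find('=');
        if (eq != std::string::npos && field.substr(0, eq) == name)
            return url_decode(field.substr(eq + 1));
    }
    return "";
}

// Links containing every word of the query (ASSUMED AND semantics).
std::vector<std::string> run_query(const KeywordIndex& index, const std::string& text) {
    std::istringstream in(text);
    std::string word;
    std::set<std::string> result;
    bool first = true;
    while (in >> word) {
        for (auto& ch : word) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
        auto it = index.find(word);
        if (it == index.end()) return {};  // an unknown word means no page matches
        if (first) {
            result = it->second;
            first = false;
        } else {
            std::set<std::string> kept;
            for (const auto& url : result)
                if (it->second.count(url)) kept.insert(url);
            result.swap(kept);
        }
    }
    return std::vector<std::string>(result.begin(), result.end());
}

std::string html_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// CGI response: header, blank line, then the results as an HTML list.
std::string format_results(const std::vector<std::string>& urls) {
    std::string page = "Content-Type: text/html\r\n\r\n<html><body>\n";
    if (urls.empty()) {
        page += "<p>No results.</p>\n";
    } else {
        page += "<ul>\n";
        for (const auto& u : urls) {
            std::string e = html_escape(u);
            page += "<li><a href=\"" + e + "\">" + e + "</a></li>\n";
        }
        page += "</ul>\n";
    }
    return page + "</body></html>\n";
}

// Body of the CGI program: read the query, search, print the page.
void respond(const KeywordIndex& index) {
    const char* qs = std::getenv("QUERY_STRING");
    std::cout << format_results(run_query(index, get_param(qs ? qs : "", "q")));
}
```

A CGI `main` would load the index (or open the database connection) and call `respond`. The web server then sends the printed output back to the browser.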

Demonstration (1)

• Filling the database

Demonstration (2)

• How to search for pages

Conclusion

• Results and perspectives:
  – An original search engine.
  – Easy to improve by adding units that process other file formats (ps, doc, xls, …).

• Teamwork and division of tasks.

• This project showed us how to use the different tools seen this year.

