search engine

(Punjab College of Technical Education) SEARCH ENGINE AND META SEARCH ENGINE

Upload: alisha-korpal

Post on 09-May-2015


Page 1: Search engine

(Punjab College of Technical Education)

SEARCH ENGINE AND META SEARCH ENGINE

SUBMITTED TO: MISS SHRUTI JAIN

SUBMITTED BY: KIRANDEEP KAUR, NANCY JAIN, SHEENA

Search engine

A web search engine is designed to search for information on the World Wide Web and FTP servers. The search results are generally presented in a list and are often called hits. The information may consist of web pages, images, information and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained by human editors, search engines operate algorithmically or through a mixture of algorithmic and human input.

SCOPE

Search Engine Optimization (SEO) has become highly important. To gain higher visibility for a website on Google, DMOZ, Yahoo, AltaVista, Dogpile and other search engines, it is necessary to apply sound SEO techniques. Google is currently regarded as the leading search engine; hence the most effective SEO efforts and techniques should be directed at achieving a high PageRank on google.com.

In today's online world, the role of SEO is becoming increasingly noteworthy across the globe, especially in the USA, the UK, Europe, Australia and France. Search engines are rapidly becoming the basic way to get accurate results when searching the Internet. Research has shown that about 80% of internet traffic is generated through search engines. Approximately 75% of users stay on the first page of the search results, and only about 20% go on to the second page.

After the strong returns achieved by offshore outsourcing of web development in India, the scope of the Internet marketing business is growing well. At the start of the 2000s the future of SEO and SEM did not look bright: few jobs were being created, and few people in India knew about Search Engine Marketing (SEM). Today SEO and SEM have great scope in India, and many jobs for SEO experts are being created. Search Engine Marketing has become a major department in every sector of business, because everyone now wants to promote their own business on all the major search engines and across the Internet. Many SEO, SEM and Internet marketing job opportunities are being created in India and around the globe.

Search Engine Optimization has great scope and a promising future in India. SEO companies in India typically have a dedicated team of experienced, professional search engine optimization experts, link-building experts, website promotion specialists and search-engine-friendly web developers who help you get the most out of an interactive Internet marketing campaign. These experts have extensive experience in the SEO industry and are well equipped to handle any kind of interactive marketing project.

As search engine optimization is a strategy for improving a company's revenues, some companies outsource these operations. There are quite a few professional SEO outfits that keep up with the continually changing trends. They also know the golden rule of Google: what has worked in the past will not necessarily deliver as well in the future. Finally, they can devote their entire time to enhancing your SEO initiatives.


Types of search

Although the text box and search button are fairly commonplace, the type of search (often described in terms of the scope of content the search engine has indexed) is not always evident.

Internal search

An internal search can only be used to find content on a single website (or intranet or extranet). For example, the Motive search, at the top-right of each page, can only be used to find pages on the Motive website.

External or public search

A public search can be used to find content on any website, anywhere on the web. For example, Google (see also the details on search engine registration below).

Meta search engine

A meta search engine uses the indexes of other search engines to find content anywhere on the web. For example, Dogpile.

Search engine registration

To add a website to its search index, a search engine must first be told where to 'find it'. Notifying a search engine of a new website is referred to as search engine registration.

The registration process involves submitting an entry-level webpage address (URL) to a search engine. This entry-level page is typically the address of the homepage or sitemap.

In addition to a webpage address, a search engine may also require basic information about your site, such as a short description of the website, topics covered, and owner.

Most public search engines have an 'Add URL', 'Submit URL' or 'Suggest a site' link that leads to information on how to register a website. This link is typically found in the list of links at the bottom of the search engine homepage.

Once the website has been registered, the search engine will access the website using an indexing program (spider). The indexing program follows all the links on the submitted webpage to other webpages under the same domain. It then follows the links it finds on those webpages, 'crawling' the entire website, to build an index of all the website content.
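The crawl described in this section (start from the submitted page, follow links, stay under the same domain) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` dictionary is a made-up stand-in for real pages, and `crawl` and `fetch_links` are hypothetical names, not part of any search engine's actual spider.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web": page URL -> links found on that page.
PAGES = {
    "http://example.com/": ["/about", "/products", "http://other.example.org/"],
    "http://example.com/about": ["/", "/contact"],
    "http://example.com/products": ["/products/widget", "/about"],
    "http://example.com/products/widget": [],
    "http://example.com/contact": ["mailto:info@example.com"],
}


def crawl(start_url, fetch_links):
    """Breadth-first crawl that never leaves the host of start_url."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real spider would index the page here
        for href in fetch_links(url):
            absolute = urljoin(url, href)  # resolve relative links
            parts = urlparse(absolute)
            # Skip non-web links and links that point to another domain.
            if parts.scheme not in ("http", "https") or parts.netloc != host:
                continue
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


crawled = crawl("http://example.com/", lambda url: PAGES.get(url, []))
for page in crawled:
    print(page)
```

The `seen` set is what stops the spider from looping when pages link back to each other, as the homepage and the about page do here.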

ADD URL PAGES

Quick links to the website registration pages for the top search engines and directories:

- Google
- Yahoo! (requires a Yahoo! account/registration)
- Bing (formerly MSN/Live Search)
- Open Directory Project

Search engine results

A search engine results page (SERP) lists webpages in order of their relevance to the query entered. The webpage listed at the top of the results page has been selected by the search engine as the most likely to provide the content the user is seeking.

Each search result listing usually features the destination webpage's meta title (as the link text), followed by a description and/or an excerpt showing the query highlighted in the context of the webpage content (concordance).

Search engine ranking algorithms

Each search engine has its own method for calculating relevance, usually based on an analysis of the content of the destination webpage, including:

- Meta title (visible at the top of the web browser window);
- Metadata: number of incoming links (commonly referred to as the page's 'popularity'). Popularity-based ranking assumes that the more incoming links a webpage has, the more likely it is to be a subject 'authority';
- Incoming link text: a search engine may make assumptions about the content of a website based on how other people have described it through the text they have used to link to the site;
- Use of appropriate semantic markup, for example, use of heading elements; and
- Page text.

Each of these aspects of the webpage is scored and then weighted. For example, a search engine may assign a greater weighting to meta title text than to other aspects of the webpage. In this case, a webpage that includes the query in its meta title text may be ranked higher than a webpage whose meta title does not include the query.

The scores for each aspect of the webpage are combined to determine the overall relevance of the webpage.

The calculation (algorithm) each search engine uses to rank webpage relevance is often a closely guarded (and patented) secret. This is both to prevent websites from artificially inflating their rankings and because the quality of the search results translates directly into user loyalty, traffic and revenue-generating opportunities.
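As a rough illustration of "score each aspect, weight it, then combine", the sketch below scores a page field by the fraction of query words it contains and sums the weighted field scores. The weights, field names and scoring rule are all assumptions made for the example; real engines keep theirs secret, as noted above.

```python
# Hypothetical weights: the text only says the meta title may weigh more.
WEIGHTS = {"title": 3.0, "headings": 2.0, "link_text": 2.0, "body": 1.0}


def field_score(query, text):
    """Fraction of the query words that occur in one field (0.0 to 1.0)."""
    words = query.lower().split()
    if not words:
        return 0.0
    field_words = set(text.lower().split())
    return sum(word in field_words for word in words) / len(words)


def relevance(query, page):
    """Weighted sum of the per-field scores for one page."""
    return sum(
        weight * field_score(query, page.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


page_a = {"title": "Cheap flights to Delhi", "body": "Book tickets online"}
page_b = {"title": "Travel deals", "body": "Find cheap flights to Delhi and more"}

query = "cheap flights"
ranked = sorted([page_a, page_b], key=lambda p: relevance(query, p), reverse=True)
print([p["title"] for p in ranked])
```

Both pages contain the query, but the page with the query in its title scores 3.0 against 1.0, which is the title-weighting effect the text describes.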

How web search engines work

[Figure: high-level architecture of a standard Web crawler]

A search engine operates in the following order:

- Web crawling
- Indexing
- Searching

Web search engines work by storing information about many web pages, which they retrieve from the HTML of the pages themselves. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser which follows every link on the site. Exclusions can be made by the use of robots.txt. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries. A query can be a single word. The purpose of an index is to allow information to be found as quickly as possible.

Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find. The cached page always holds the actual search text, since it is the one that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it. This problem might be considered a mild form of linkrot, and Google's handling of it increases usability by satisfying the user's expectation that the search terms will be on the returned webpage (the principle of least astonishment). Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that is no longer available elsewhere.

When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of the best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. The index is built from the information stored with the data and the method by which the information is indexed. Support for searching documents by date is limited, although some engines, Google among them, offer a date-range filter. Most search engines support the Boolean operators AND, OR and NOT to further specify the search query. Boolean operators are for literal searches that allow the user to refine and extend the terms of the search; the engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords. There is also concept-based searching, which uses statistical analysis of pages containing the words or phrases you search for. In addition, natural language queries allow the user to type a question in the same form one would ask it of a human; Ask.com is an example of such a site.

The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve.

Two main types of search engine have evolved. One is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing the texts it locates. This second form relies much more heavily on the computer itself to do the bulk of the work.
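A minimal sketch of an inverted index, and of how the Boolean operators AND, OR and NOT map onto it, may make the idea concrete. The three documents and the helper names (`build_index`, `lookup`) are invented for the example; a production index would also handle stemming, punctuation and ranking, which are left out here.

```python
from collections import defaultdict

# Hypothetical mini-collection: document id -> text.
DOCS = {
    1: "web search engines crawl and index web pages",
    2: "a meta search engine queries other search engines",
    3: "web directories are maintained by human editors",
}


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def lookup(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return INDEX.get(word.lower(), set())


# Boolean operators become set operations on the posting sets.
both = lookup("web") & lookup("search")          # web AND search
either = lookup("crawl") | lookup("editors")     # crawl OR editors
without = lookup("search") - lookup("meta")      # search NOT meta

print(sorted(both), sorted(either), sorted(without))
```

Because the index is keyed by word, answering a query means looking up a few sets rather than rereading every page, which is why the index "allows information to be found as quickly as possible".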

Different techniques of Searching on Google

Google is one of the leading search engines, and if you use it only to search for words and phrases you are missing much of what it can do. The service is loaded with advanced tricks that you can use from that unassuming search box.

Find the current time elsewhere: Don't bother trying to convert the time from your local setting to a distant city. Just type time followed by the city, as in time Delhi, to see the current time in that location.

Search for a file type: You can look up results that match a specific file type. This trick is great for special searches, such as tracking down a product manual or video file. Type the search term followed by filetype: and the three-letter type, with no space after the colon. For example, Zoom H2 manual filetype:pdf finds the manual for that Zoom recording device.

Get the weather: To see the weather for many U.S. and worldwide cities, type "weather" followed by the city and state, U.S. zip code, or city and country. For example: weather Delhi

Calculate and convert: To use Google's built-in calculator function, simply enter the calculation you'd like done into the search box. Try typing math problems, such as 89*22/(16), or conversions, like 100 yards in meters. Google will do the rest.


Track stocks: To see current market data for a given company or fund, type the ticker symbol into the search box. On the results page, you can click the link to see more data from Google Finance. You can enter a stock’s trading abbreviation, such as GOOG, and the first result will show the stock’s latest price, a graph of the day, and other financial details.

Get movie times: On the Web you have a myriad of choices for looking up show times, but Google's simplicity is tough to beat. To find reviews and show times for the movies playing near you, type "movies" or the name of a current film into the Google search box. If you've already saved your location on a previous search, the top search result will display show times at nearby theaters for the movie you've chosen. Click the More movies link to get more specific listings.

Track packages: Have a FedEx, UPS, or USPS tracking number? Just enter it in the Google search box for the latest package status.

Sports Scores: To see scores and schedules for sports teams type the team name or league name into the search box. This is enabled for many leagues including the National Basketball Association, National Football League, National Hockey League, and Major League Baseball.

Music: Want the details of a song? Use music: followed by the song name for a music-specific search on Google.

Area Code Lookup: Type a U.S. area code into Google to find out which region it covers.

Format-specific search: Sometimes finding what you want in Google can be difficult, but Google offers a range of format-specific search sites. Google News, Blog Search and Video are a few of the Google sites you can use to find what you're looking for.

Phrase Search: I use this trick regularly. If you’re looking for the exact phrase, not the words entered, do your search like this “I did but see her passing by”

Wildcard: Old DOS users will remember doing directory searches using an asterisk (*) as a wildcard, and Google supports wildcard entries as well. Example: blogging *.com.au

Not: Adding a minus sign (-) allows you to narrow your search. For example, to search for New York but not City, enter New York -City

Either/or: Google normally looks for the combination of terms you type in, but you can tell it to look for any one of several words by putting OR (in capitals) between them, for example Olympic OR Gold. The shortcut is |, so Olympic | Gold works as well.

Book Search: If you're looking for results from Google Book Search, you can enter the name of the author or the book title into the search box, and Google will return any book content it has as part of your normal web results. You can click through on the record to view more detailed information about that author or title.

Earthquakes: To see information about recent earthquakes in a specific area type “earthquake” followed by the city and state or U.S. zip code. For recent earthquake activity around the world simply type “earthquake” in the search box.

Unit Conversion: You can use Google to convert between many different units of measurement, including height, weight and volume among many others. Just enter your desired conversion into the search box and Google will do the rest.

Synonym Search: If you want to search not only for your search term but also for its synonyms, place the tilde sign (~) immediately in front of your search term.

Dictionary Definitions: To see a definition for a word or phrase, simply type the word “define” then a space, then the word(s) you want defined. To see a list of different definitions from various online sources, you can type “define:” followed by a word or phrase. Note that the results will define the entire phrase.

Spell Checker: Google’s spell checking software automatically checks whether your query uses the most common spelling of a given word. If it thinks you’re likely to generate better results with an alternative spelling, it will ask “Did you mean: (more common spelling)?”. Click the suggested spelling to launch a Google search for that term.

Airline Travel Info: To see flight status for arriving and departing U.S. flights, type the name of the airline and the flight number into the search box. You can also see delays at a specific airport by typing the name of the city or the three-letter airport code followed by the word "airport". For example: American Airlines 18, Houston airport

Currency Conversion: To use the built-in currency converter, simply enter the conversion you'd like done into the Google search box and the answer will appear directly on the results page. For example: 150 GBP in USD

Phone Listing: Let's say someone calls you on your mobile number and you don't know who it is. If all you have is a phone number, you can look it up on Google using the phonebook feature. For example: phonebook: 617-555-1212 (note: the number given here does not work; you'll have to use a real number to get any results).
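Several of the operators listed above (exact phrase in quotes, minus to exclude, OR, filetype:) are plain text conventions, so a query can be assembled programmatically. The helper below is a hypothetical sketch: `build_query` and `search_url` are names invented here, and the `https://www.google.com/search?q=` address is assumed to be the usual results URL rather than a documented API.

```python
from urllib.parse import urlencode


def build_query(terms="", phrase=None, exclude=(), any_of=(), filetype=None):
    """Assemble a query string from the operators described in the list."""
    parts = []
    if terms:
        parts.append(terms)
    if phrase:
        parts.append(f'"{phrase}"')            # exact phrase search
    if any_of:
        parts.append(" OR ".join(any_of))      # either/or, OR in capitals
    parts.extend(f"-{word}" for word in exclude)  # exclude words
    if filetype:
        parts.append(f"filetype:{filetype}")   # no space after the colon
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for use in a browser address (assumed format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


manual = build_query("Zoom H2 manual", filetype="pdf")
print(manual)
print(build_query("New York", exclude=["City"]))
print(build_query(any_of=["Olympic", "Gold"]))
print(search_url(manual))
```

Note that the URL-encoding step turns spaces into `+` and the colon into `%3A`; the operators themselves are unchanged.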

Working of a Search Engine

Most Web search engines are commercial ventures supported by advertising revenue and, as a result, some allow advertisers to pay to have their listings ranked higher in search results. Those search engines which do not accept money for their search results make money by running search-related ads alongside the regular results. These search engines earn money every time someone clicks on one of the ads.


Meta search engine

Meta search engines are search engines that search other search engines. Put simply, a meta search engine submits your query to several other search engines and returns a summary of the results. The search results you receive are therefore an aggregate of multiple searches.

While this strategy gives your search a broader scope than searching a single search engine, the results are not always better. This is because the meta search engine must use its own algorithm to choose the best results from multiple search engines. Often, the results returned by a meta search engine are not as relevant as those returned by a standard search engine.

A metasearch engine is a search tool that sends user requests to several other search engines and/or databases and aggregates the results into a single list or displays them according to their source. Metasearch engines enable users to enter search criteria once and access several search engines simultaneously. They operate on the premise that the Web is too large for any one search engine to index it all, and that more comprehensive search results can be obtained by combining the results from several search engines. This may also save the user from having to use multiple search engines separately.

The term "metasearch" is frequently used to classify a set of commercial search engines (see the list of search engines), but it is also used to describe the paradigm of searching multiple data sources in real time. The National Information Standards Organization (NISO) uses the terms Federated Search and Metasearch interchangeably to describe this web search paradigm.

Meta search engines do not have the budgets of the major engines and are often overlooked. Another reason they are overlooked is that, in compiling the results of multiple search engines, they can return unrelated results. They also dilute the strengths of the bigger search engines.

Dogpile is perhaps the best-known meta search engine. It compiles results from ten different sources and returns the relevant information with duplicate data eliminated. Meta search engines save the searcher time by cutting down the number of search operations. They are programs that send a request to multiple search engines, combine the results and display them. Since they have no database of their own, they search the major search engines and show those results.

How the best meta search engines work:

- The user types the search terms into the search box, and the meta search engine forwards the request to many primary search engines.
- Since each primary engine has its own rules for how requests must be written, the meta engine rewrites the request for each engine before sending it.
- The requests are sent simultaneously and processed in parallel, which saves time.
- Depending on its capacity, the meta engine retrieves one or more result pages from each primary search engine. A few meta search engines are very helpful for an in-depth search.
- Once all the results have been received, duplicate results are removed and the rest are displayed. Results are usually ordered by the rank each primary engine assigned them. Some meta search engines sort results according to user preferences.

Advantages:

- Because meta search engines query several search engines, results missed by one primary engine are still shown.
- Parallel searching saves time.
- Duplicate results are eliminated.
- The separate result sets are combined into one.
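The workflow above (send the query to several engines in parallel, tolerate failures, drop duplicates, merge by rank) can be sketched as follows. Everything here is an assumption for illustration: the three `engine_*` functions are fake stand-ins that return fixed URLs, and the merge rule (summing 1/rank across engines) is one simple choice, not the method any real meta search engine is known to use.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical primary engines: each returns result URLs in rank order.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x", "http://a.example/2"]


def engine_b(query):
    return ["http://shared.example/x", "http://b.example/1"]


def engine_c(query):
    raise TimeoutError("engine did not answer in time")


def metasearch(query, engines, per_engine=10):
    """Query all engines in parallel, then merge and de-duplicate."""

    def ask(engine):
        try:
            return engine(query)[:per_engine]  # keep only the top results
        except Exception:
            return []  # a failed or timed-out engine contributes nothing

    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(ask, engines))

    # A URL returned by several engines is stored once; its score adds up.
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank

    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


merged = metasearch("meta search", [engine_a, engine_b, engine_c])
for url in merged:
    print(url)
```

The URL that two engines agree on rises to the top, and the engine that times out simply drops out of the merge instead of failing the whole search.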

Disadvantages:

- Timeouts occur while searching across different search engines.
- Most meta search engines retrieve only ten to fifty results per primary engine.
- Advanced features and techniques are not available in meta engines.
- Meta search engines may exclude one or more major search engines such as Google, Yahoo and MSN.

Primary search engines generally do not view meta search engines as competition. Before meta search engines can perform their search operations, they need the consent of the primary search engines.

Most people use Google, Yahoo and MSN, the big three search engines, for their searches. But not all search engines are equal: depending on the type of information the user is looking for, different search engines are appropriate. The main purpose of meta search engines is to reduce search time and to make searching more efficient. Meta search engines can locate results that you might miss in a primary engine. However, merging results from different search engines into a single list sometimes raises security issues. A proper understanding of these trade-offs makes it easier to choose the best meta search engine.

Operations

Metasearch engines create what is known as a virtual database. They do not compile a physical database or catalogue of the web. Instead, they take a user's request, pass it to several other heterogeneous databases and then compile the results in a homogeneous manner based on a specific algorithm.

No two metasearch engines are alike. Some search only the most popular search engines, while others also search lesser-known engines, newsgroups, and other databases. They also differ in how the results are presented and in the number of engines that are used. Some will list results according to search engine or database. Others return results according to relevance, often concealing which search engine returned which results. This benefits the user by eliminating duplicate hits and grouping the most relevant ones at the top of the list.

Search engines frequently differ in how they expect requests to be submitted. For example, some search engines allow the use of the word "AND", while others require "+" and others require only a space to combine words. The better metasearch engines try to synthesize requests appropriately when submitting them.

Architecture of a metasearch engine

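The request synthesis mentioned under Operations, where one engine expects "AND", another "+" and another just a space, amounts to a small per-engine translation table. The sketch below is hypothetical: the engine names are placeholders rather than real services, and real engines have more syntax differences than this one operator.

```python
# Hypothetical syntax table: how each placeholder engine combines words.
SYNTAX = {
    "engine_and": lambda words: " AND ".join(words),
    "engine_plus": lambda words: " ".join("+" + word for word in words),
    "engine_space": lambda words: " ".join(words),
}


def translate(words, engine):
    """Rewrite an 'all of these words' request in the engine's own syntax."""
    return SYNTAX[engine](words)


request = ["meta", "search"]
for name in SYNTAX:
    print(name, "->", translate(request, name))
```

Keeping the syntax in a table means supporting another engine only requires adding one entry, which suits a tool that has to talk to many heterogeneous sources.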

[Figure: architecture diagram]

Observations About Search Engine Operation

- The criteria and algorithms used vary from search engine to search engine.
- The criteria and algorithms are complex and are not published.
- The criteria and algorithms change over time, as often as every 2 weeks.
- There is consolidation among search engines, and new ones are being added continually.
- Search engines reward simple pages with a high concentration of keywords and key phrases.
- Search engines reward repetition of keywords and key phrases, but penalize spamming.
- Search engines penalize old pages.
- 50% of search engines (including AltaVista and Google) preferentially index pages with many links from the outside.
- Some search engines (DirectHits.com) reward how often a page is selected and how much time is spent on it.
- Websites are re-indexed only infrequently, approximately every 8 weeks to 6 months.
- No search engine indexes more than 16% of the world's 800 million URLs (as of Feb 1999). See chart.
- As of mid-1999, Excite could index only 50 million URLs and would drop pages at random.
- Search engines typically:
  - index only pages in the top 2 or 3 directories;
  - index only the top few hundred words of each page;
  - index a maximum of 300 to 400 pages of a large website (only 25 for Excite);
  - do not index pages with a ? or & in the URL;
  - do not index dynamically generated pages.
- Search engines are increasingly falling behind actual web growth.

Search Engine Operations

The basic operations of a search engine are as follows.

Crawling

Search engines use a set of automated programs, known as bots, agents or spiders, to crawl the contents of web pages and documents by following the hyperlink structure. There are billions of web pages on the Internet, but not all of them are crawled by the search engines.

Indexed documents

After crawling, the crawled content is kept in a repository. Search engines maintain a huge repository of documents, called the "index", to store the content in an organized way. It needs to be tightly managed so that a user query can be answered by traversing billions of documents.

Processing queries

Internet users search for millions of words and phrases each day. When a user submits a query, the search engine matches it against the indexed documents and returns the relevant search results.

Ranking results

A query will usually match a large number of documents, so a decision has to be made about the order in which to display the results. Search engines use complex algorithms, based on hundreds of undisclosed factors, to rank the results and find the most relevant ones for the query.

The main objective of search engines is to provide relevant, high-quality results for users' queries. To do so, search engines rely on a number of complex signals; in effect they have developed a language for talking to websites, forums and blogs, one that a website can only speak and understand if it adopts SEO techniques.


Reference: Wikipedia