Infobrokering and Searching the Deep Web

Posted on 15-Jun-2015

DESCRIPTION

Infobrokering and Searching the Deep Web: the New Role of the Employee of the Department of Medical Scientific Information. Presentation from the EAHIL Workshop, Kraków 2007.

TRANSCRIPT

Infobrokering and Searching the Deep Web

The New Role of the Employee of the Department of Medical Scientific Information.

Witold Kozakiewicz, Barbara Grala

Main Library, Medical University of Łódź, Poland

"The Librarian", a painting by Giuseppe Arcimboldo (c. 1566)

Information should be meaningful, valuable, adequate, complete, up to date and reliable.

Google is like a box of chocolates...

Deep Web

The deep Web (or Deepnet, invisible Web or hidden Web) refers to World Wide Web content that is not part of the surface Web indexed by search engines.

Deep Web Pages

• Dynamic content - dynamic pages which are returned in response to a submitted query or accessed only through a form (especially if open-domain input elements, e.g. text fields, are used; such fields are hard to navigate without domain knowledge).

• Unlinked content - pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).

• Private Web - sites that require registration and login (password-protected resources).

• Contextual Web - pages with content varying for different access contexts (e.g. ranges of client IP addresses or previous navigation sequence).

• Limited access content - sites that limit access to their pages in a technical way (e.g. using the Robots Exclusion Standard, CAPTCHAs or Pragma: no-cache / Cache-Control: no-cache HTTP headers), prohibiting search engines from browsing them and creating cached copies.

• Scripted content - pages that are only accessible through links produced by JavaScript as well as content dynamically downloaded from Web servers via Flash or AJAX solutions.

• Non-HTML/text content - textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.

Source: Wikipedia
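The "Limited access content" case can be made concrete. A crawler that honours the Robots Exclusion Standard reads a site's robots.txt and skips the paths it disallows, so those pages never enter the search engine's index. The sketch below uses Python's standard urllib.robotparser module; the robots.txt rules, the example.org URLs and the crawler name ExampleBot are all hypothetical, and real crawlers apply many more checks than this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site that keeps its search
# results and members' pages out of search engine crawls.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /members/
"""


def crawler_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical robots.txt above lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.org/about",
                "https://example.org/search?q=asthma",
                "https://example.org/members/articles/42"):
        print(url, "->", "crawlable" if crawler_may_fetch(url) else "blocked")
```

Only the /about page is reported as crawlable; the other two URLs would stay out of a compliant search engine's index and therefore belong to the deep Web.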

Deep Web databases

Source: Bin He, Mitesh Patel, Zhen Zhang, Kevin Chen-Chuan Chang. Accessing the Deep Web. Communications of the ACM, 50(5), 2007. http://doi.acm.org/10.1145/1230819.1241670

How to improve the search process?

• make full use of search engines: build more complex queries with Boolean operators and keywords, and use the advanced search options or search engine suggestions;

• try specialized services such as Google Scholar, Google Books, MS Live Search Academic or Yahoo Search Subscriptions;

• if you are looking for specific file types, try dedicated search engines such as Picsearch or Yahoo Podcast Search;

• try metasearch engines such as friskr.com, dogpile.com, clusty.com, mamma.com or turbo10.com;

• use specialized web services and database search engines such as PubMed, Medic8, WebMD or MammaHealth;

• use subject gateways, online services that provide links to numerous other sites or documents on the Internet (Intute, Scout Archives, BUBL, Infomine);

• search open access journals or repositories such as DOAJ or OAIster;

• try to find the specific database you need using CompletePlanet or Geniusfind.

too many!

too complicated!
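The first tip, combining keywords with Boolean operators, is often applied by building the query from concept blocks: synonyms for one concept are joined with OR, the concepts are joined with AND, and unwanted topics are excluded with NOT. A minimal sketch in Python, assuming PubMed-style syntax (uppercase AND, OR, NOT and quoted phrases); the helper names or_block and boolean_query and the example topic are hypothetical.

```python
from typing import Iterable, Sequence


def or_block(terms: Sequence[str]) -> str:
    """Hypothetical helper: join synonyms for one concept with OR, quoting phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    if len(quoted) == 1:
        return quoted[0]
    return "(" + " OR ".join(quoted) + ")"


def boolean_query(concepts: Sequence[Sequence[str]],
                  exclude: Iterable[str] = ()) -> str:
    """Hypothetical helper: AND one OR-block per concept, then NOT out unwanted terms."""
    query = " AND ".join(or_block(c) for c in concepts)
    for term in exclude:
        query += " NOT " + or_block([term])
    return query


if __name__ == "__main__":
    # Hypothetical topic: aspirin and heart attack, excluding paediatric papers.
    print(boolean_query(
        [["myocardial infarction", "heart attack"], ["aspirin"]],
        exclude=["children"],
    ))
```

Run as a script, it prints ("myocardial infarction" OR "heart attack") AND aspirin NOT children. Other engines differ: Google, for instance, treats AND as implicit and excludes terms with a leading minus sign, so the operators would need adapting for each tool.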

The role of the librarian

• Help users find the information they need.

• Choose proper search tools.

• Prepare the toolbox.

• Teach users how to use it.

Thank You.