Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall
DESCRIPTION
The Google Search Appliance is an on-premise hardware and software solution that brings Google search into the enterprise, so users can find content quickly and securely. In this session, learn how partners today are plugging enterprise data sources into the GSA through Connectors and displaying results using OneBox.
Watch a video at http://www.bestechvideos.com/2009/06/09/google-i-o-2009-extending-the-google-search-appliance-to-crawl-valuable-data-behind-the-firewall
TRANSCRIPT
Google Search Appliance
Extending the Google Search Appliance to Crawl Valuable Data Behind the Firewall
Nitin Mangtani
May 27, 2009
Search is the starting point to the world’s information
Google Enterprise Search
More than 20,000 enterprise search customers
Dedicated team of enterprise engineers focused on solving enterprise search problems.
Backed by Google’s core research and development
Bringing Google.com search experience to businesses
Our Search Products
Universal Search
Employee Directory
Content Management
Wikis
Intranet
File share
SharePoint
Google’s Search Philosophy
User: Intuitive, unified results; highly relevant; user-friendly innovation
Reach: All information; ‘real-time’ data; customizable and extendable
Security: Highly secure architecture; standards-based; leverage existing security
Scale: Large corpus search; cross-enterprise management; flexible infrastructure
Personalized Search Experience
Marketing
Engineering
Advanced Biasing Controls
Administrators can create multiple biasing policies.
Source biasing
Date biasing
Metadata biasing New!
Front-end biasing New!
Simple setup - No complex coding or scripts.
Metadata Biasing New!
Determine influence of metadata parameter on specific metadata name, content
Biasing based on metadata attribute and value
“Boost all documents that have author as Larry Page”
Administrators control influence (positive or negative) on metadata attribute/value pairs
Embedding Search Box in your application
<form method="GET" action="http://search.mycompany.com/search">
  <input type="text" name="q" size="32" maxlength="256" value="query string">
  <input type="submit" name="btnG" value="Google Search">
  <input type="hidden" name="site" value="default_collection">
  <input type="hidden" name="client" value="default_frontend">
  <input type="hidden" name="output" value="xml_no_dtd">
  <input type="hidden" name="proxystylesheet" value="default_frontend">
</form>
Such forms are the most recognizable methods for generating GET requests, but there are numerous other ways.
A web application may make an HTTP GET request directly:
GET /search?q=query+string&site=default_collection&client=default_frontend&output=xml_no_dtd&proxystylesheet=default_frontend HTTP/1.0
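The GET request above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the host and parameter values are the ones shown in the form, while the function name build_search_url is hypothetical and introduced only for illustration.

```python
from urllib.parse import urlencode

# Host and parameter values are copied from the form shown in the slide.
SEARCH_ENDPOINT = "http://search.mycompany.com/search"


def build_search_url(query, endpoint=SEARCH_ENDPOINT):
    """Return the GET URL that the HTML form would submit for `query`.

    Hypothetical helper for illustration only.
    """
    params = {
        "q": query,
        "site": "default_collection",
        "client": "default_frontend",
        "output": "xml_no_dtd",
        "proxystylesheet": "default_frontend",
    }
    # urlencode escapes each value and encodes spaces as '+',
    # matching the q=query+string form of the request line.
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("query string"))
```

Building the query string with urlencode rather than string concatenation keeps user input correctly escaped.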
Leverage users’ input
Do-It-Yourself KeyMatch
Search-as-you-Type
Google Search Appliance
Fileshares
Intranets
Databases
Enterprise applications
Content Management
Universal Search: Powered by Google Search Appliance
Documentum
SharePoint
FileNet
Livelink
Any other system
Over 200 file formats
MS Office, PDF, HTML, etc.
Web servers
Portals
Oracle
SQL Server
MySQL
DB2
Sybase
ERP systems
Business intelligence systems
Architecture
Secure, real-time access to business information
Real-Time Access to Business Applications
“The Google Search Appliance with OneBox is our command line interface to our world … adding more content and additional OneBox interfaces will only increase the value to our organization” – Danny Perri, BOC Gases
Access to real-time business data with OneBox
[Chart: quarterly axis, Q1 2007 to Q4 2008]
[Diagram: OneBox request flow, steps ① to ⑤, showing a request to https://provider… and an XML response from the provider server]
Google OneBox for Enterprise
1. User enters a query.
2. OneBox “trigger” determines if the query is relevant to a OneBox module.
3. The appliance makes a secure REST call (https GET request) to the predefined OneBox provider, passing security credentials and other parameters.
4. The provider uses the information to determine appropriate, user-specific, secure results to the query, and passes those results back to the appliance in XML.
5. The XML is transformed into HTML based on the XSL template provided in the OneBox module and presented to the user inline with their search results.
Google OneBox for Enterprise
Real-time, secure access to information from the search box
Triggers - Configurable to show OneBox results:
Always On: the module is invoked for every query
Keyword(s): the module is invoked in response to specific keywords
Regular Expression: invoked when query matches a regular expression
Providers
Internal: Specialized search content in a separate appliance collection
External: Modules from OneBox module gallery
External: API enables you to create your own modules
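The three trigger types listed above can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than the appliance's actual matching logic: the slides do not say how keywords are compared, so whole-word, case-insensitive matching is assumed here, and the function name trigger_matches and the trigger labels are hypothetical.

```python
import re


def trigger_matches(trigger_type, query, pattern=None):
    """Decide whether a OneBox module should be invoked for `query`.

    Hypothetical helper; the matching rules are assumptions, not taken
    from the appliance documentation.
    """
    if trigger_type == "always":
        # Always On: the module is invoked for every query.
        return True
    if trigger_type == "keyword":
        # Keyword(s): assume a case-insensitive whole-word match
        # against any configured keyword.
        keywords = {keyword.lower() for keyword in pattern}
        return any(word in keywords for word in query.lower().split())
    if trigger_type == "regex":
        # Regular Expression: invoked when the query matches the pattern.
        return re.search(pattern, query) is not None
    raise ValueError("unknown trigger type: %r" % trigger_type)


if __name__ == "__main__":
    print(trigger_matches("keyword", "phone larry", ["phone", "directory"]))
    print(trigger_matches("regex", "order 12345", r"^order \d+$"))
```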
OneBox Results Schema
<OneBoxResults>
  <resultCode>result_code</resultCode>
  <Diagnostics>failure_reason</Diagnostics>
  <provider>provider_name</provider>
  <searchTerm>query_escape</searchTerm>
  <totalResults>total_results_escape</totalResults>
  <title>
    <urlText>results_title</urlText>
    <urlLink>results_uri</urlLink>
  </title>
  <IMAGE_SOURCE>image_uri</IMAGE_SOURCE>
  <MODULE_RESULT>
    <U>uri</U>
    <Title>title</Title>
    <Field name="name1">value1</Field>
    <Field name="name2">value2</Field>
    <Field name="nameN">valueN</Field>
  </MODULE_RESULT>
</OneBoxResults>
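A provider has to return XML in the shape of the schema above. The sketch below shows one way a provider might assemble such a response in Python with the standard library. The element names come from the schema; the function name build_onebox_results, the input dictionary layout, and the "success" result code are assumptions made for illustration. The Diagnostics, title, and IMAGE_SOURCE elements are left out for brevity; the slide does not state which elements are required.

```python
import xml.etree.ElementTree as ET


def build_onebox_results(provider, query, results, result_code="success"):
    """Serialize provider results using the element names from the schema.

    Hypothetical helper: `results` is assumed to be a list of dicts with
    "url", "title", and an optional "fields" mapping.
    """
    root = ET.Element("OneBoxResults")
    ET.SubElement(root, "resultCode").text = result_code
    ET.SubElement(root, "provider").text = provider
    ET.SubElement(root, "searchTerm").text = query
    ET.SubElement(root, "totalResults").text = str(len(results))
    for item in results:
        module_result = ET.SubElement(root, "MODULE_RESULT")
        ET.SubElement(module_result, "U").text = item["url"]
        ET.SubElement(module_result, "Title").text = item["title"]
        # Each extra attribute becomes a <Field name="...">value</Field>.
        for name, value in item.get("fields", {}).items():
            ET.SubElement(module_result, "Field", name=name).text = value
    # ElementTree escapes special characters in text and attributes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_onebox_results(
        "employee_directory",
        "larry",
        [{"url": "http://example.com/people/larry",
          "title": "Larry",
          "fields": {"phone": "555-0100"}}],
    ))
```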
Common Security Protocols
HTTP-Basic
NTLM (v1, v2)
LDAP
Advanced Security
Kerberos New!
SSO - Oracle (Oblix), CA/SiteMinder
X509 Certificates
Custom Authentication & Authorization Support for SAML SPI
Document Level Security
Provide the right users with access to the right documents
Security
“Zero” Sign-on
Access Control (NTLM, HTTP Basic, SSO, etc.)
1. User executes search for public and secure content (access=a)
2. User is prompted for credentials (if NTLM/Basic Auth & SSO, user is prompted for both sets of credentials)
3. User’s credentials are sent securely to the search appliance
4. Google Search Appliance queries index for all possible results
5. Search appliance makes ‘authorization’ requests of the host content servers with user’s credential set
6. Host servers respond with success or failure
7. Secure results that the user is not authorized to see are filtered out of the search results
8. Final search results (filtered) are presented to the user
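Steps 4 through 8 amount to filtering candidate results by a per-document authorization check. The following Python sketch illustrates that idea under stated assumptions: the authorize callable stands in for the request the appliance makes to the host content server with the user's credentials, an HTTP 200 status is treated as success, and all names here are hypothetical.

```python
def filter_results(candidates, authorize):
    """Keep public results plus secure results the host server authorizes.

    `candidates` is a list of dicts with "url" and "secure" keys.
    `authorize` is a callable taking a URL and returning an HTTP status
    code; it is a stand-in for the real authorization request.
    """
    visible = []
    for result in candidates:
        if not result["secure"]:
            # Public content needs no authorization check.
            visible.append(result)
        elif authorize(result["url"]) == 200:
            # The host server accepted the user's credentials.
            visible.append(result)
        # Any other status (for example 401) drops the result.
    return visible


if __name__ == "__main__":
    # Simulated host responses for illustration only.
    statuses = {"http://example.com/a": 401, "http://example.com/b": 200}
    candidates = [
        {"url": "http://example.com/a", "secure": True},
        {"url": "http://example.com/b", "secure": True},
        {"url": "http://example.com/c", "secure": False},
    ]
    for result in filter_results(candidates, statuses.get):
        print(result["url"])
```

Passing the check in as a callable keeps the filtering logic independent of the authentication mechanism (NTLM, HTTP Basic, SSO) used by each host server.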
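[Diagram: index table and authorization responses]
Index (columns: #, URL, Secure)
1, http://corp…/preso.ppt, ntlm
2, http://corp…/policyhtml, http basic
http://corp…/welcome/…, none
n, http://int…/customer.jsp, sso
Host server responses: 401, 200, 200
Results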
Database
File shares
Content Mgmt.
Traditional search technology for millions of docs
+ Disaster Recovery Server
+ Patch Deployment Management Server
+ Volume License Management Server
Google Architecture: 10M documents in a box
Health Vine Simplicity
Patients
Immediate Family
Community
Where’s your GSA??
The State of Missouri’s use of Google GSA
Where was Missouri?
16 Executive Agencies
No common web search
No unified way for citizens or businesses to find information about State Government
Where is Missouri??
Centrally Managed Google GSA
Front Ends and Collections provided to all State Government entities
Common search across all State Government web content
Reliable information now easily found by citizens and businesses