Enhanced Site Search with Cognitive APIs - Glynn Bird
TRANSCRIPT
Enhanced Site Search with Cognitive APIs
Glynn Bird
Developer Advocate @ IBM Cloud Data [email protected]
@glynn_bird
Agenda
● What is search?
● Simple Search
● Adding some "cognitive"
Primary search
In-site search
Elasticsearch
• Stores JSON Documents
• Search based on Apache Lucene
• Provides HTTP search API
• Pay per-GB on compose.com
Cloudant
• Stores JSON Documents
• Based on Apache CouchDB
• Search based on Apache Lucene
• Provides HTTP search API
• PAYG/Dedicated-as-a-service or Local
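To make the "HTTP search API" bullet concrete, here is a minimal sketch of how a Cloudant search request URL could be assembled. Cloudant search queries go to a path of the form /{db}/_design/{ddoc}/_search/{index} with the Lucene query in the q parameter. The host, database, design document and index names below are hypothetical placeholders, not values from the talk.

```javascript
// Build a Cloudant search URL.
// All names passed in (host, db, designDoc, indexName) are hypothetical.
function buildSearchUrl(host, db, designDoc, indexName, query, limit) {
  // URLSearchParams takes care of escaping the Lucene query string
  const params = new URLSearchParams({ q: query, limit: String(limit) });
  return 'https://' + host +
    '/' + encodeURIComponent(db) +
    '/_design/' + encodeURIComponent(designDoc) +
    '/_search/' + encodeURIComponent(indexName) +
    '?' + params.toString();
}

const searchUrl = buildSearchUrl(
  'example.cloudant.com', 'news', 'search', 'articles', 'time warner', 10
);
console.log(searchUrl);
```

The resulting URL can be fetched with any HTTP client; authentication is left out of this sketch.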
Get started - Simple Search Service
https://developer.ibm.com/clouddataservices/simple-search-service/
Game of Thrones search demo
http://sss-got-theme.mybluemix.net/
Structured vs Unstructured Data
Structured Data
● known schema
● predictable
● indexable
Unstructured Data
● unknown schema
● difficult to parse and index
Example data
{
  "url": "http://www.bbc.co.uk/news/business-37742991",
  "title": "AT&T announces it will buy Time Warner",
  "description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner",
  "date": "2016-10-22T23:44:03.000Z",
  "image_url": "http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png"
}
Structured data
{
  "url": "http://www.bbc.co.uk/news/business-37742991",
  "title": "AT&T announces it will buy Time Warner",
  "description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner",
  "date": "2016-10-22T23:44:03.000Z",
  "image_url": "http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png"
}
Unstructured data
{
  "url": "http://www.bbc.co.uk/news/business-37742991",
  "title": "AT&T announces it will buy Time Warner",
  "description": "US telecoms giant AT&T announces it will buy entertainment group Time Warner",
  "date": "2016-10-22T23:44:03.000Z",
  "image_url": "http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png"
}
Let's build a news website
● take RSS feeds
● put the data into a database
● index it
  ○ newest articles first
  ○ keyword search
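As a rough sketch of the first two steps, the function below turns one parsed RSS item into a document shaped like the example data. The input field names (link, title, description, pubDate, imageUrl) are assumptions about what an RSS parser might return; they are not taken from the talk.

```javascript
// Convert one parsed RSS item into the JSON document shape of the example data.
// The input field names are assumed; adjust them to whatever your parser emits.
function rssItemToDoc(item) {
  return {
    url: item.link,
    title: item.title,
    description: item.description,
    // store the date as an ISO 8601 string so that it sorts chronologically
    date: new Date(item.pubDate).toISOString(),
    image_url: item.imageUrl
  };
}

const rssDoc = rssItemToDoc({
  link: 'http://www.bbc.co.uk/news/business-37742991',
  title: 'AT&T announces it will buy Time Warner',
  description: 'US telecoms giant AT&T announces it will buy entertainment group Time Warner',
  pubDate: 'Sat, 22 Oct 2016 23:44:03 GMT',
  imageUrl: 'http://c.files.bbci.co.uk/_91950162_breaking_image_large-3-1.png'
});
console.log(JSON.stringify(rssDoc, null, 2));
```

Each resulting document could then be written to the database as-is.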
Node-RED
● visual programming tool
● https://nodered.org/
Indexing data in Cloudant - MapReduce
● Build an index to sort articles by date
● Create custom 'map' function
function(doc) {
  // key: article date (ISO 8601 string), value: article title
  emit(doc.date, doc.title);
}
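To illustrate what the map function above produces, this sketch runs it over a few in-memory documents and sorts the emitted rows by key. It is a local simulation only, not how Cloudant builds views internally. Against a real view, the same "newest first" ordering would typically be requested with the descending=true and limit query parameters. The two extra sample documents are made up for the example.

```javascript
// Simulate a view: run the map function over docs and keep rows sorted by key.
function buildView(docs) {
  const rows = [];
  const emit = function (key, value) { rows.push({ key: key, value: value }); };
  // the map function from the slide
  const map = function (doc) { emit(doc.date, doc.title); };
  docs.forEach(map);
  // ISO 8601 date strings sort chronologically when compared as plain strings
  rows.sort(function (a, b) { return a.key < b.key ? -1 : a.key > b.key ? 1 : 0; });
  return rows;
}

// Roughly what querying the view with descending=true&limit=n would return.
function newestFirst(docs, limit) {
  return buildView(docs).reverse().slice(0, limit);
}

const sampleDocs = [
  { date: '2016-10-22T23:44:03.000Z', title: 'AT&T announces it will buy Time Warner' },
  { date: '2016-10-21T09:00:00.000Z', title: 'Older story (made up)' },
  { date: '2016-10-23T08:15:00.000Z', title: 'Newer story (made up)' }
];
console.log(newestFirst(sampleDocs, 2));
```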
Front end
Indexing data in Cloudant - Search
● Build full-text index
● Create custom 'map' function
function(doc) {
  // add both text fields to the default full-text index
  index('default', doc.title);
  index('default', doc.description);
}
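For context, a Cloudant search index function is stored as a string inside a design document, under the "indexes" key. The sketch below assembles such a design document; the design document name "search", the index name "articles" and the choice of the "standard" analyzer are illustrative assumptions.

```javascript
// The index function from the slide. It is never called here, only serialized,
// so the missing global index() does not matter in this sketch.
const indexFn = function (doc) {
  index('default', doc.title);
  index('default', doc.description);
};

// Design document holding the search index.
// '_design/search' and 'articles' are hypothetical names.
const designDoc = {
  _id: '_design/search',
  indexes: {
    articles: {
      analyzer: 'standard',
      index: indexFn.toString()
    }
  }
};
console.log(JSON.stringify(designDoc, null, 2));
```

Saving this document to the database is what triggers the index build.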
Cloudant Search
● Punctuation removal
● Word splitting/stemming
● Stop-word removal
● Full-text indexing using Apache Lucene
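The text-processing steps listed above can be sketched as a toy analyzer. This is an illustration of the idea only: it is not Lucene's implementation, the stop-word list is a small hand-picked sample, and the "stemmer" just strips a trailing "s". Which steps actually run in Cloudant depends on the analyzer configured for the index.

```javascript
// Small illustrative stop-word list (not Lucene's full list).
const STOP_WORDS = new Set(['a', 'an', 'and', 'at', 'in', 'it', 'of', 'the', 'to', 'will']);

// Extremely naive stemming: drop one trailing "s" from longer words.
function stem(word) {
  if (word.length > 3 && word.endsWith('s') && !word.endsWith('ss')) {
    return word.slice(0, -1);
  }
  return word;
}

function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)                 // punctuation removal + word splitting
    .filter(function (w) { return w.length > 0; })
    .filter(function (w) { return !STOP_WORDS.has(w); })  // stop-word removal
    .map(stem);                          // stemming
}

console.log(analyze('AT&T announces it will buy Time Warner'));
```

Note how splitting on punctuation breaks "AT&T" apart in this toy version; analyzers differ in how they treat such tokens.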
Front end
Summary so far...
But can we do better?
Watson Alchemy Language API
● Feed it text or a URL
● Returns:
  ○ entities - people/places/companies
  ○ taxonomy
Watson Alchemy Language API
Entities
Country: US
Company: AT&T
Company: Time Warner
JobTitle: Telecoms
Taxonomy
/art and entertainment
/technology and computing/internet technology/isps
/business and industrial/company/merger and acquisition
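The taxonomy labels above are slash-separated paths. One possible way to prepare them for indexing, shown here as an assumption rather than as the approach used in the talk, is to split each label into an array of category names.

```javascript
// Split a slash-separated taxonomy label into its category levels.
function taxonomyLabelToPath(label) {
  return label
    .split('/')
    .map(function (part) { return part.trim(); })
    .filter(function (part) { return part.length > 0; });
}

console.log(taxonomyLabelToPath('/business and industrial/company/merger and acquisition'));
```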
How can we use Alchemy in our workflow?
More indexing
● Index the Alchemy entities
  ○ e.g. Country:US
● Index the Alchemy taxonomy
  ○ e.g. ["Finance","Investing"]
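A sketch of how the extra indexing might look as a Cloudant search index function. It assumes each document has been enriched with an "entities" array of {type, text} objects and a "taxonomy" array of strings; that document layout is an assumption, not something shown in the slides. Passing {facet: true} as the third argument to index() asks Cloudant to make the field available for faceted counts. A stand-in index() is defined so the function can be run locally.

```javascript
// Local stand-in for Cloudant's index(), recording each call for inspection.
const indexCalls = [];
function index(name, value, options) {
  indexCalls.push({ name: name, value: value, options: options || {} });
}

// Index function body; the doc.entities / doc.taxonomy layout is assumed.
function indexArticle(doc) {
  index('default', doc.title);
  index('default', doc.description);
  (doc.entities || []).forEach(function (entity) {
    // e.g. index('Country', 'US', { facet: true })
    index(entity.type, entity.text, { facet: true });
  });
  (doc.taxonomy || []).forEach(function (category) {
    index('taxonomy', category, { facet: true });
  });
}

indexArticle({
  title: 'AT&T announces it will buy Time Warner',
  description: 'US telecoms giant AT&T announces it will buy entertainment group Time Warner',
  entities: [
    { type: 'Country', text: 'US' },
    { type: 'Company', text: 'AT&T' },
    { type: 'Company', text: 'Time Warner' }
  ],
  taxonomy: ['business and industrial', 'company', 'merger and acquisition']
});
console.log(indexCalls);
```

With fields indexed this way, a query such as Country:US can be combined with free-text terms.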
Front end
Demo
https://glynnbird.github.io/alchemy-news/
It's not just language...
Watson saw...
Just one more
Watson saw...
Summary
● Node-RED
● Cloudant
● Alchemy Language API
Bluemix: https://www.ibm.com/cloud-computing/bluemix/
Simple Search Service: https://developer.ibm.com/clouddataservices/simple-search-service/
News Demo: https://glynnbird.github.io/alchemy-news/