
Semantic Web In Industry

R. Guha


Two Levels of the Semantic Web

• Deep Semantic Web:
  – Intelligent agents performing inference
  – Semantic Web as distributed AI
  – Small problem … the AI problem is not yet solved

• Shallow Semantic Web: using SW/Knowledge Representation techniques for
  – Data integration
  – Search
  – Is starting to see traction in industry


Integration: The new buzzword in business

• Huge explosion in the number of new databases, applications, documents, … in the 90s
  – Lots of redundancy, duplication … => high inefficiency

• Economic pressures forcing consolidation and efforts to reduce inefficiency

• Two aspects to integration: Process & Data
  – Process integration depends on data integration


Data Integration for Science

• Many experimental fields will generate more data in the next 2 years than exists today

• Large part of research consists of writing programs to analyze data, e.g., NASA

• Tools to normalize, share, integrate data stuck in the 80s (ftp, perl, …)

• Semantic Web could create a “web of data” that changes all this.

• Example of the Internet Observatory


Varieties of Data Integration: Data Transformation

• Data Transformation Example
  – Contact Information in SAP, Siebel, PeopleSoft, …
  – We want to reflect updates in one data source into another

[Diagram: App. Server; XSLT, etc.; Siebel, Clarify, PeopleSoft]
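
The contact-information example can be sketched in Python as a translation through a shared intermediate record. The field names for each application are hypothetical, not the real Siebel or PeopleSoft schemas; this only illustrates the shape of the approach, with a dictionary standing in for the XSLT layer.

```python
# Hypothetical field layouts; the real application schemas are not given in the slides.
SIEBEL_TO_COMMON = {"CONTACT_NM": "name", "PH_NUM": "phone", "EMAIL_ADDR": "email"}
PEOPLESOFT_TO_COMMON = {"NAME": "name", "PHONE": "phone", "EMAILID": "email"}


def to_common(record, field_map):
    """Translate a source record into the shared vocabulary."""
    return {field_map[k]: v for k, v in record.items() if k in field_map}


def from_common(common, field_map):
    """Translate a shared-vocabulary record into a target application's fields."""
    inverse = {common_name: local for local, common_name in field_map.items()}
    return {inverse[k]: v for k, v in common.items() if k in inverse}


def propagate_update(update, source_map, target_map):
    """Re-express an update made in one application in another's terms."""
    return from_common(to_common(update, source_map), target_map)


siebel_update = {"CONTACT_NM": "Ada Lovelace", "PH_NUM": "555-0100"}
peoplesoft_update = propagate_update(siebel_update, SIEBEL_TO_COMMON, PEOPLESOFT_TO_COMMON)
print(peoplesoft_update)
```

Routing through a shared vocabulary means each application needs one field map rather than one per pair of applications.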


Varieties of Data Integration: Data Aggregation

• Data Aggregation Example
  – Clinical trial data at Stanford, UCSF, Mayo …
  – We want to give a Meta-analyst a uniform view of data from these different clinical trials
  – Example of how this would have helped recent meta studies such as the estrogen study

[Diagram: DBMS; Relational Views; Meta-Analyst; UCSF, Stanford, Mayo]
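
A minimal sketch of the relational-view idea, using Python's built-in sqlite3 module. The table names, columns, units, and rows are all invented for illustration; the slides do not describe the actual trial schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two sites record the same facts under different (hypothetical) schemas.
    CREATE TABLE stanford_trial (patient_id TEXT, weight_kg REAL, outcome TEXT);
    CREATE TABLE ucsf_trial (subject TEXT, weight_lb REAL, result TEXT);

    INSERT INTO stanford_trial VALUES ('s1', 70.0, 'improved');
    INSERT INTO ucsf_trial VALUES ('u1', 220.0, 'no change');

    -- The relational view gives the meta-analyst one uniform schema.
    CREATE VIEW all_trials AS
        SELECT 'Stanford' AS site, patient_id, weight_kg, outcome
        FROM stanford_trial
        UNION ALL
        SELECT 'UCSF', subject, weight_lb * 0.45359237, result
        FROM ucsf_trial;
""")

rows = conn.execute(
    "SELECT site, patient_id, ROUND(weight_kg, 1), outcome FROM all_trials ORDER BY site"
).fetchall()
for row in rows:
    print(row)
```

The view hides both the column renaming and the unit conversion, so the analyst queries a single table. This only works when one person understands every source schema well enough to write the view.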


Data Integration Layers

• Coping with software from different vendors
  – Oracle vs. DB2 vs. SQL Server … this is a solved problem

• Coping with different formats
  – Relational vs. XML vs. ISAM … this too is a solved problem

• Coping with different schemas
  – Solved for the small case where one person understands all the schemas
  – No products for the case where it is truly distributed

• We know how to do it in theory, but lots of practical problems

• Coping with data from unknown sources
  – Wide open … lots of unsolved problems


Typical Data Integration Methodology

• Use a common namespace of terms for the concepts in the domain of the data sources being integrated, e.g., Employee, Customer, Patient, weight, height, bodyTemperature, …

• Mappings relate data items in data sources to terms in namespace

• Transformation algorithms map queries in terms of common namespace into corresponding queries in terms of data source vocabularies

• Background knowledge about terms is essential for transformations … e.g., Employee subClassOf Person, or 2 people with the same last name, first name and street address are likely to be the same; i.e., the common namespace is really an Ontology

• Mappings and common namespace are the workhorse
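
The methodology above can be sketched as a small query rewriter: a query phrased in common-namespace terms is turned into one SQL query per data source. The source names (`db1`, `db2`) and their table and column names are hypothetical, and real transformation algorithms handle far more than column renaming.

```python
# Common-namespace terms mapped to each source's own vocabulary.
# All source, table, and column names here are hypothetical.
MAPPINGS = {
    "db1": {"Patient": "patient", "weight": "wt", "height": "ht"},
    "db2": {"Patient": "subjects", "weight": "body_weight", "height": "stature"},
}


def rewrite(concept, properties, source):
    """Rewrite a query over common terms into SQL over one source's vocabulary."""
    mapping = MAPPINGS[source]
    columns = ", ".join(f"{mapping[p]} AS {p}" for p in properties)
    return f"SELECT {columns} FROM {mapping[concept]}"


queries = {source: rewrite("Patient", ["weight", "height"], source) for source in MAPPINGS}
for source, sql in queries.items():
    print(source, "->", sql)
```

Aliasing every column back to its common term means results from different sources come back in the same shape and can be merged directly.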


Role of Semantic Web in Data Integration

• The XML stack (XML, XSD, XPath, XQuery, …) does not have the concepts (objects, classes, properties, …) required for representing ontologies

• RDF/S does …

• Neither of them has a language for expressing mappings
  – But RDF/S, being closer to logic, has more of the machinery that is required
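
To illustrate what classes and subClassOf add, here is a minimal sketch of subclass inference over triples held as plain Python tuples. The `ex:` names, including `ex:Manager` and `ex:alice`, are invented for the example; a real system would use an RDF library rather than this hand-rolled closure.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Each triple is (subject, predicate, object); the ex: terms are hypothetical.
triples = {
    ("ex:Employee", SUBCLASS_OF, "ex:Person"),
    ("ex:Manager", SUBCLASS_OF, "ex:Employee"),
    ("ex:alice", TYPE, "ex:Manager"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls by following subClassOf edges."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(resource, graph):
    """Asserted types plus those implied by the subclass hierarchy."""
    asserted = {o for s, p, o in graph if s == resource and p == TYPE}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, graph)
    return asserted | inferred


alice_types = types_of("ex:alice", triples)
print(sorted(alice_types))
```

A query for every Person can now return alice even though no source ever stated that fact directly.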


Kinds of Mappings

• Simple structural
  – DB1.patient.weight corresponds to Patient’s weight

• Conditional structural
  – If DB1.patient.type equals Outpatient then DB1.patient.foo corresponds to Patient’s visits duration …

• Term mappings
  – CA in DB1 corresponds to California in domain namespace
  – Object with ssn 7687667 in database 1 corresponds to object with id “aksdks” in database 2
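
The three kinds of mappings can be sketched in one translation function for rows of DB1.patient. The `weight`, `type`, and `foo` columns come from the examples above; the `state` column and the output property names are assumptions added for illustration.

```python
# Term mapping: source codes to terms in the domain namespace.
STATE_TERMS = {"CA": "California"}


def map_patient(row):
    """Translate one DB1.patient row (a dict) into common-namespace properties."""
    out = {}
    # Simple structural mapping: DB1.patient.weight -> Patient's weight.
    out["weight"] = row["weight"]
    # Conditional structural mapping: 'foo' means visit duration only for outpatients.
    if row["type"] == "Outpatient":
        out["visitDuration"] = row["foo"]
    # Term mapping: 'CA' -> 'California'; unknown codes pass through unchanged.
    out["state"] = STATE_TERMS.get(row["state"], row["state"])
    return out


outpatient = map_patient({"weight": 70, "type": "Outpatient", "foo": 45, "state": "CA"})
inpatient = map_patient({"weight": 82, "type": "Inpatient", "foo": 3, "state": "NV"})
print(outpatient)
print(inpatient)
```

Object-level term mappings, such as the ssn-to-id example, would need a lookup table that is maintained as the underlying databases change.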


Challenges and non-challenges in data integration

• Non-challenge: algorithms for doing the transformations (ISI, MCC, SU & AT&T)

• Engineering Challenges
  – Creating large, useful ontologies that are shared by many
  – Creating mappings

• Research Challenges
  – Semantic Drift
  – Fuzzy terms, probabilistic mappings
  – Trust


Engineering Challenges

• Creating large, detailed ontologies is complex and expensive
  – But it is happening … CrossWorlds for business concepts, MAGE, etc. for medicine
  – Danger: some of them might turn out to be proprietary

• Creating mappings is tedious and time consuming

• Object mappings pose special challenges
  – Mappings need to be dynamic and constantly updated


Research Challenges with mappings

• Semantic Drift
  – The meaning of terms, as interpreted by different members of a community, could drift over time
  – Cyc experience shows that Description Logic mechanisms are not adequate for either detecting or fixing these

• Fuzzy mappings
  – E.g., Walmart’s concept of chair is similar to but not the same as MoMA’s concept of chair

• Probabilistic mappings
  – There is an 82% likelihood that Michael Jordan in database 1 is the same as Michael Jordan in database 2
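
One way such a likelihood could be computed is a simple Bayesian record-linkage score, sketched below. It assumes fields agree or disagree independently given whether two records match; the per-field probabilities and the prior are invented for illustration and would need to be estimated from real data. This is not the method behind the 82% figure above.

```python
# Illustrative parameters only: m = P(field agrees | same entity),
# u = P(field agrees | different entities). Real values would be estimated from data.
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "street": (0.80, 0.001),
}
PRIOR_MATCH = 0.001  # assumed prior that two arbitrary records are the same entity


def match_probability(a, b):
    """Posterior probability that records a and b denote the same entity,
    assuming fields agree or disagree independently given match status."""
    p_match, p_nonmatch = PRIOR_MATCH, 1.0 - PRIOR_MATCH
    for field, (m, u) in FIELD_PARAMS.items():
        if a.get(field) == b.get(field):
            p_match *= m
            p_nonmatch *= u
        else:
            p_match *= 1.0 - m
            p_nonmatch *= 1.0 - u
    return p_match / (p_match + p_nonmatch)


db1 = {"last_name": "Jordan", "first_name": "Michael", "street": "1 Main St"}
db2_same_address = {"last_name": "Jordan", "first_name": "Michael", "street": "1 Main St"}
db2_other_address = {"last_name": "Jordan", "first_name": "Michael", "street": "9 Elm Ave"}

p_same = match_probability(db1, db2_same_address)
p_other = match_probability(db1, db2_other_address)
print(f"{p_same:.3f} {p_other:.3f}")
```

With these made-up parameters, agreement on the street address moves the estimate from unlikely to near certain, which mirrors the background-knowledge rule about name and address.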


Other data web related challenges

• Trust: How should the program know whether to trust some new data source?
  – Without this, we will only have closed systems
  – Options: centralized approaches like UDDI or decentralized approaches like WOTs

• Inverse trust: how can I trust you not to indiscriminately distribute my data? A big issue in fresh scientific data

• Systems challenges
  – Caching
  – Preventing accidental DoS attacks
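
The two systems challenges can be sketched together as a client that caches responses and spaces out requests to each host. The `PoliteClient` class and `fake_fetch` function are hypothetical names, and the network call is replaced by a stand-in so the sketch runs offline.

```python
import time


class PoliteClient:
    """Caches responses and enforces a minimum delay between requests to one host."""

    def __init__(self, fetch, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch              # callable(host, path) -> data; supplied by caller
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_request = {}          # host -> time of the most recent request

    def get(self, host, path):
        key = (host, path)
        if key in self._cache:           # cache hit: no load on the remote source
            return self._cache[key]
        last = self._last_request.get(host)
        if last is not None:
            wait = self._min_interval - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)        # throttle instead of hammering the host
        self._last_request[host] = self._clock()
        self._cache[key] = self._fetch(host, path)
        return self._cache[key]


calls = []


def fake_fetch(host, path):
    """Stand-in for a real network call, so the sketch runs offline."""
    calls.append((host, path))
    return f"data from {host}{path}"


client = PoliteClient(fake_fetch, min_interval=0.01)
first = client.get("example.org", "/trials")
second = client.get("example.org", "/trials")   # served from cache
third = client.get("example.org", "/patients")  # new request, throttled if too soon
print(len(calls))
```

A production version would also need cache expiry, since integrated data changes over time.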


Forecast for SW and Data Integration

• We already have a number of data integration tools on the market

• We are seeing the first generation of ontology based data integration tools from small companies

• At least some of the big players will probably have some offerings for doing data integration based on Semantic Web concepts in the near future
  – Whether they use Semantic Web formats and acronyms is an open question …

• These common vocabularies will exhibit very strong network effects


Semantic Web for Search: Going beyond search as Location Bar

• Keywords → a particular page
  – Typically a home page or well known hub page
  – United airlines → www.united.com
  – Unix → gnu.org, linux.org, freebsd.org

• Search as a smarter location bar

• PageRank is ideally suited for this
  – This is largely a solved problem


Varieties of Search: Research searches

• User is searching for info about something

• Could be directed – user is looking for a particular property
  – Price of something, location of some event, …

• Or undirected – user is looking for some general class of properties
  – Reviews/feedback on product, info on person or country

• If there is no hub page on the thing, existing search engines perform very poorly

• New focus is on this class of searches


Semantic Web for Search

• Keyword based approaches haven’t made significant advances since PageRank

• Improvements may be gained by adding a modicum of understanding about the *object* denoted by the search query

• Improvements not just in search itself but also in the relevance of search related advertising


Basic Issues

• Need database of potential objects user may be referring to, along with some properties of the object … e.g., its type

• Too many objects to manually construct DB
  – At least 300 million distinct object references on Web

• If the search engine does know something more about the search term’s denotation (e.g., it denotes a musician), how can it do better?


Building the Web KB

• Many different automated approaches
  – Simple natural language processing (Riloff, TAP, …)
  – Scrapers
  – Machine Learning

• Most commercial efforts lead to proprietary KBs

• Huge opportunity for wider SW community
  – Collaborate to actually create the KB


Using the KB

• Word Sense Disambiguation, e.g., MSN Search, Teoma

• Incorporating data feeds into search results. E.g., MSN with popular musicians

• Incorporating object type specific actions. E.g., Google with addresses and stock symbols

• Coming soon … KB construction driven by ads
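
A minimal sketch of object-type specific actions: the query is looked up in a knowledge base, and the type of the object it denotes selects an extra result. The KB entries, type names, and action functions are all invented for illustration and do not describe how any of the search engines named above work.

```python
# A toy knowledge base; entries and types are illustrative only.
KB = {
    "yo-yo ma": {"type": "Musician"},
    "goog": {"type": "StockSymbol"},
}


def show_albums(query):
    return f"albums and tour dates for {query}"


def show_quote(query):
    return f"stock quote for {query.upper()}"


# Object-type specific actions, keyed by the KB type of the query's denotation.
ACTIONS = {"Musician": show_albums, "StockSymbol": show_quote}


def enrich(query):
    """Return an extra, type-specific result for the query, or None if the KB has no entry."""
    entry = KB.get(query.strip().lower())
    if entry is None:
        return None  # fall back to plain keyword search
    return ACTIONS[entry["type"]](query.strip())


print(enrich("Yo-Yo Ma"))
print(enrich("goog"))
print(enrich("some unknown phrase"))
```

The exact-match lookup sidesteps the hard part, which is deciding which of several KB objects an ambiguous query refers to.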


Conclusions

• Please help Eric Miller