Natural Language Processing & Semantic Models in an Imperfect World

Confidential Presenter: Marc Hadfield [email protected] www.alitora.com Natural Language Processing & Semantic Models in an Imperfect World Copyright Alitora Systems, Inc. 2009

Upload: vitalai

Post on 05-Dec-2014


TRANSCRIPT

Page 1: Natural Language Processing & Semantic Models in an Imperfect World

Confidential

Presenter: Marc Hadfield

[email protected]

Natural Language Processing

& Semantic Models in an Imperfect World

Copyright Alitora Systems, Inc. 2009

Page 2: Natural Language Processing & Semantic Models in an Imperfect World

Marc Hadfield

• CTO of Alitora Systems
• Computer Science Research in Bioinformatics
• NLP
• Big (Fuzzy) Networks
• Generalized Semantic Data Platform

Page 3: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Systems

System Approach

…Talk about Systems & Apps more than Modules.

Page 4: Natural Language Processing & Semantic Models in an Imperfect World

Discussion Today

• Storing Data – Semantic Repository
• Generating Data – NLP
• Modeling Data – Semantic Models
• Analyzing Data – Methodology
• Using Data – Application

Page 5: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Systems Architecture

Page 6: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Systems API (ASAPI)

• User Interfaces
• ASAPI
• Collaboration
• kHarmony™ Semantic DB
• Alitora Foundry
• Text-Mining
• UMIS
• Secure Distributed URIs
• URI to Named Graphs

Page 7: Natural Language Processing & Semantic Models in an Imperfect World

ASAPI Cloud

Multi-Billion Triples

Page 8: Natural Language Processing & Semantic Models in an Imperfect World

kHarmony™ Semantic DB

• Semantic / Graph DB
• Cloud Deployable
• Distribute Data over Servers
• Layers of Cache
• Data Analytics / Clustering
• Determine High-Value Knowledge
• Knowledge Relevancy
• Embedded Scripting
• Data Entitlements
• Users, Teams, Organizations, Colleagues
• Base Ontology

Page 9: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Foundry

• Manages NLP processes
• Annotators, which add metadata to text
• Includes external services like OpenCalais as annotators
• Workflows to link annotators together
• Common data representation across components
• RDF in, RDF out
• Ontology includes representation of certainty, error
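The annotator and workflow idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Annotation` and `Document` classes, both annotators, and the tiny gene list are hypothetical, and plain Python objects stand in for the RDF representation that Foundry actually passes between components.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    start: int        # character offset where the annotated text begins
    end: int          # character offset just past the annotated text
    label: str        # entity type assigned by the annotator
    certainty: float  # annotator's confidence, 0.0 to 1.0
    source: str       # name of the annotator that produced this record


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# An annotator takes a document and returns it with more metadata attached.
Annotator = Callable[[Document], Document]


def dictionary_annotator(doc: Document) -> Document:
    """Rule-based step: exact lookup against a tiny, hypothetical gene list."""
    for name in ("Bim", "Gadd45a"):
        for match in re.finditer(re.escape(name), doc.text):
            doc.annotations.append(
                Annotation(match.start(), match.end(), "gene_or_protein", 0.9, "dictionary"))
    return doc


def suffix_annotator(doc: Document) -> Document:
    """Heuristic step: words ending in 'osis' are guessed to be processes."""
    for match in re.finditer(r"\b\w+osis\b", doc.text):
        doc.annotations.append(
            Annotation(match.start(), match.end(), "process", 0.6, "suffix-heuristic"))
    return doc


def run_workflow(doc: Document, steps: List[Annotator]) -> Document:
    """Apply each annotator in order; later steps can see earlier annotations."""
    for step in steps:
        doc = step(doc)
    return doc


sentence = "Suppression of endogenous Bim greatly inhibits Gadd45a induction of apoptosis."
result = run_workflow(Document(sentence), [dictionary_annotator, suffix_annotator])
for a in result.annotations:
    print(sentence[a.start:a.end], a.label, a.certainty, a.source)
```

Because every step reads and writes the same structure, an external service could be wrapped as one more `Annotator` without changing the workflow runner.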

Page 10: Natural Language Processing & Semantic Models in an Imperfect World

Foundry Workflow

Independent Workflows based on type of text

Combine ML & Rule-based systems

Page 11: Natural Language Processing & Semantic Models in an Imperfect World

Foundry Data Model

• Two-dimensional representation of tokens
  • Labels/Spans to tag token ranges (features in machine learning)
• Allows multiple interpretations of tokens
  • Chemical names tokenized differently than personal names
• Sequence Recognition and Categorization (with scoring/likelihood)
  • Entities, Entity Types, Normalized (Disambiguated) Entities (ER vs. ER)
• Shared across workflow steps
• Direct RDF representation

“Span”
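A minimal sketch of spans over more than one tokenization, assuming a simple layout in which each span names the token layer it indexes into. The `Span` class, the two tokenizers, and the layer names are hypothetical; the slides do not show Foundry's real structures.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    layer: str    # which tokenization the indices refer to
    start: int    # index of the first token covered (inclusive)
    end: int      # index just past the last token covered (exclusive)
    label: str    # tag for the token range, usable as an ML feature
    score: float  # likelihood attached to this interpretation


def tokenize_words(text: str) -> List[str]:
    """General-purpose tokens: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tokenize_chemical(text: str) -> List[str]:
    """Whitespace-only split, so a name like '2,4-dinitrophenol' stays whole."""
    return text.split()


def span_text(layers: Dict[str, List[str]], span: Span) -> str:
    """Recover the text a span covers from the layer it was defined on."""
    return " ".join(layers[span.layer][span.start:span.end])


text = "2,4-dinitrophenol inhibits growth"
layers = {"words": tokenize_words(text), "chemical": tokenize_chemical(text)}

spans = [
    Span("chemical", 0, 1, "chemical_name", 0.8),  # one token in the chemical layer
    Span("words", 5, 6, "verb", 0.95),             # the sixth token in the word layer
]

for span in spans:
    print(span.label, span.score, repr(span_text(layers, span)))
```

Keeping the layer name on the span lets both interpretations of the same text coexist, which is the point of the second dimension.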

Page 12: Natural Language Processing & Semantic Models in an Imperfect World

NLP In Action


Page 13: Natural Language Processing & Semantic Models in an Imperfect World

Sentence

“Suppression of endogenous Bim greatly inhibits Gadd45a induction of apoptosis.”

Parse

[action, inhibit,
  [action, suppress, [unknown], [gp, endogenous Bim]],
  [action, induce, [gp, Gadd45a], [process, apoptosis]]
]


Foundry Relationship Extraction
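The nested parse can be handled as an ordinary tree. The sketch below assumes each action node has the shape [action, verb, agent, target], which is an interpretation of the slide rather than a documented Foundry format; `describe` and `verbs` are hypothetical helper names.

```python
# The parse from the slide, written as nested Python lists.
parse = [
    "action", "inhibit",
    ["action", "suppress", ["unknown"], ["gp", "endogenous Bim"]],
    ["action", "induce", ["gp", "Gadd45a"], ["process", "apoptosis"]],
]


def describe(node):
    """Render a parse node as a nested, readable expression."""
    if node[0] == "action":
        _, verb, agent, target = node
        return f"{verb}({describe(agent)}, {describe(target)})"
    if len(node) == 1:
        # A bare type such as [unknown] carries no value.
        return f"<{node[0]}>"
    return f"{node[0]}:{node[1]}"


def verbs(node):
    """Collect action verbs in pre-order, outermost relation first."""
    if node[0] != "action":
        return []
    _, verb, agent, target = node
    return [verb] + verbs(agent) + verbs(target)


print(describe(parse))
print(verbs(parse))
```

The outermost verb, "inhibit", takes two other relations as its arguments, so the extracted fact is a relation between relations and not only between entities.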

Page 14: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Knowledge Ontology

Data Representation:

Each Object is a Named Graph with a Unique URI.

“chunks” of RDF

OWL2

“Core” Model

Page 15: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Knowledge Ontology

Named Graphs:

•URI

•“Reified”

•Provenance

• Hash/Signature

• Creation, Modification, Expiration Dates

•Certainty/Error
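One way to picture a named graph carrying this metadata is a small record type. This is a sketch under assumptions: the `NamedGraph` class, its field names, and the example.org URIs are hypothetical, and the signature is a plain SHA-256 over sorted triples, not a standard RDF canonicalization.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class NamedGraph:
    uri: str                    # unique identifier of this chunk of RDF
    triples: FrozenSet[Triple]  # the statements inside the graph
    source: str                 # provenance: where the statements came from
    created: datetime
    certainty: float            # 0.0 to 1.0
    expires: Optional[datetime] = None

    def signature(self) -> str:
        """Order-independent hash of the statements (simplified, see lead-in)."""
        canonical = "\n".join(" ".join(t) for t in sorted(self.triples))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_expired(self, now: datetime) -> bool:
        return self.expires is not None and now >= self.expires


statements = [
    ("ex:Bim", "rdf:type", "ex:GeneOrProtein"),
    ("ex:Bim", "rdfs:label", "Bim"),
]
created = datetime(2009, 1, 1, tzinfo=timezone.utc)

graph = NamedGraph(
    uri="http://example.org/graph/1",
    triples=frozenset(statements),
    source="http://example.org/article/42",
    created=created,
    certainty=0.85,
    expires=datetime(2010, 1, 1, tzinfo=timezone.utc),
)
print(graph.uri, graph.signature()[:12], graph.certainty)
```

Attaching certainty and provenance to the graph, not to individual triples, keeps the metadata in one place per object.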

Page 16: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Knowledge Ontology

Lesson:

“Reification” at the model level.

Expose the topology of the knowledge.

Page 17: Natural Language Processing & Semantic Models in an Imperfect World

Semantic Knowledge Statements

Domain Ontology + Instance Statements

Alitora Knowledge Ontology

Page 18: Natural Language Processing & Semantic Models in an Imperfect World

Semantic Collaborative Statements

Alitora Knowledge Ontology

Page 19: Natural Language Processing & Semantic Models in an Imperfect World

Alitora Knowledge Ontology

Fact Representation

• This example has 9 Named Graphs
• The “Relation” is the head
• Any number of Relation-Parts
• Relation-Parts are chained

“Company Merger”
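A relation head with chained parts behaves like a linked list. The sketch below is an assumption about the shape only: `Relation`, `RelationPart`, the role names, and the placeholder companies are hypothetical, and it does not reproduce the nine named graphs of the slide's example.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class RelationPart:
    uri: str
    role: str         # what this part contributes to the fact
    value: str
    certainty: float  # each part can carry its own certainty
    next_part: Optional["RelationPart"] = None


@dataclass
class Relation:
    uri: str
    relation_type: str
    first_part: Optional[RelationPart] = None

    def parts(self) -> Iterator[RelationPart]:
        """Follow the chain from the head to the last part."""
        part = self.first_part
        while part is not None:
            yield part
            part = part.next_part


def build_relation(uri: str, relation_type: str,
                   parts: List[Tuple[str, str, float]]) -> Relation:
    """Chain (role, value, certainty) tuples behind a relation head."""
    head = Relation(uri, relation_type)
    previous = None
    for index, (role, value, certainty) in enumerate(parts):
        part = RelationPart(f"{uri}/part/{index}", role, value, certainty)
        if previous is None:
            head.first_part = part
        else:
            previous.next_part = part
        previous = part
    return head


merger = build_relation(
    "http://example.org/relation/merger-1",
    "CompanyMerger",
    [("acquirer", "Company A", 0.9), ("target", "Company B", 0.85), ("date", "2009", 0.6)],
)
for part in merger.parts():
    print(part.role, part.value, part.certainty)
```

Since every part has its own URI, each could be stored as its own named graph and be trusted or rejected independently of the rest of the fact.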

Page 20: Natural Language Processing & Semantic Models in an Imperfect World

•OWL

•“Reified”

•Knowledge Representation

•Certainty, Error, Provenance, …

•Graph + Semantic

•Topology Interpretation

•Logical Interpretation

Alitora Knowledge Ontology

Page 21: Natural Language Processing & Semantic Models in an Imperfect World

MemomicsBio Ontology (Domain)

• Extends Alitora Knowledge Ontology
  • Inherits knowledge representation structures
• OWL
• Domain Specific
  • Defines types of “facts” specific to biomedical domain
• A general AKO fact can be mapped/asserted into a Memomics BioOntology fact
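Mapping a general fact into a domain fact can be pictured as a lookup plus a decision. Everything named here is hypothetical: the fact dictionaries, the `DOMAIN_FACT_TYPES` table, the domain type names, and the 0.8 threshold are illustrative and are not taken from the Memomics ontology.

```python
from typing import Dict, Optional, Tuple

# (general verb, agent type, target type) -> domain fact type (hypothetical names).
DOMAIN_FACT_TYPES: Dict[Tuple[str, str, str], str] = {
    ("induce", "gp", "process"): "ProcessInduction",
    ("inhibit", "gp", "process"): "ProcessInhibition",
}


def to_domain_fact(general_fact: dict, min_certainty: float = 0.8) -> Optional[dict]:
    """Return a domain fact, or None when the general fact should stay unasserted."""
    key = (general_fact["verb"], general_fact["agent"][0], general_fact["target"][0])
    domain_type = DOMAIN_FACT_TYPES.get(key)
    if domain_type is None or general_fact["certainty"] < min_certainty:
        return None
    return {
        "type": domain_type,
        "agent": general_fact["agent"][1],
        "target": general_fact["target"][1],
        "derived_from": general_fact["uri"],  # keep a link back to the general fact
    }


general = {
    "uri": "http://example.org/relation/7",
    "verb": "induce",
    "agent": ("gp", "Gadd45a"),
    "target": ("process", "apoptosis"),
    "certainty": 0.9,
}
print(to_domain_fact(general))
```

The general fact stays in the repository either way; only the decision to promote it into the domain model depends on the mapping and the threshold.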

Page 22: Natural Language Processing & Semantic Models in an Imperfect World

Where are we?

• Store Data
• Generate data with NLP
• Represent data in a general knowledge model
• Have a domain specific ontology
  • Where the “action” happens
• Need some analysis to push facts into the domain ontology
• Query, Inference using the domain ontology

Page 23: Natural Language Processing & Semantic Models in an Imperfect World

Relevancy

The shape or “topology” of the graph helps to identify relevant knowledge.

The “paths” connecting a User to knowledge, based on search usage, factor into Relevancy

“Knowledge Rank”

“Best” facts

Relevancy based on Graph Topology
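The slides do not say how “Knowledge Rank” is computed. As an illustration of topology-based relevancy only, the sketch below uses a personalized PageRank-style iteration that starts from a user node; the function name, the damping value, and the toy graph are assumptions.

```python
from typing import Dict, List, Tuple


def personalized_rank(edges: List[Tuple[str, str]], start: str,
                      damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Score nodes by how much random-walk mass from `start` reaches them."""
    nodes = sorted({node for edge in edges for node in edge})
    outgoing: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        outgoing[src].append(dst)

    rank = {node: (1.0 if node == start else 0.0) for node in nodes}
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node in nodes:
            # Dead ends hand their mass back to the user node.
            targets = outgoing[node] or [start]
            share = damping * rank[node] / len(targets)
            for target in targets:
                nxt[target] += share
        # The remaining mass restarts at the user, which personalizes the ranking.
        nxt[start] += (1.0 - damping) * sum(rank.values())
        rank = nxt
    return rank


# Toy graph: a user, two past searches, and the facts those searches led to.
edges = [
    ("user", "search1"), ("user", "search2"),
    ("search1", "factA"),
    ("search2", "factA"), ("search2", "factB"),
]
rank = personalized_rank(edges, "user")
for node, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

Here factA is reachable through both searches and factB through only one, so factA ends up with the higher score: more paths from the user mean more relevancy.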

Page 24: Natural Language Processing & Semantic Models in an Imperfect World

Scripting, Analysis, Inference

• Submitted Scripts applied over Graph Walk
  • Groovy Scripts (Java Interface)
  • Can calculate “scores”
• Offline Clustering and Analysis Algorithms
  • Grid/Cloud based
• Inference process utilizes knowledge
  • Asserting statements (Relation Statement)
  • Prolog, HiLog, F-Logic
  • Use all features in inferencing (such as certainty)
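Applying a submitted script over a graph walk can be sketched as a breadth-first traversal that calls a scoring function on each node it visits. The platform described here uses Groovy scripts; the Python below only illustrates the pattern, and `walk_and_score`, the toy graph, and the depth-decay script are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List


def walk_and_score(graph: Dict[str, List[str]], start: str,
                   script: Callable[[str, int], float],
                   max_depth: int = 2) -> Dict[str, float]:
    """Breadth-first walk from `start`, applying `script` to each visited node."""
    scores: Dict[str, float] = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        scores[node] = script(node, depth)
        if depth < max_depth:
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return scores


# Toy adjacency list; the edges are illustrative, not extracted facts.
graph = {
    "Bim": ["apoptosis", "Gadd45a"],
    "Gadd45a": ["apoptosis", "p53"],
    "p53": ["MDM2"],
}


def depth_decay(node: str, depth: int) -> float:
    """Example script: nodes closer to the start score higher."""
    return 1.0 / (1 + depth)


scores = walk_and_score(graph, "Bim", depth_decay)
print(scores)
```

The walk and the script are separate, so a different script can be submitted without touching the traversal.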

Page 25: Natural Language Processing & Semantic Models in an Imperfect World

Certainty

• How accurate (F-score) are your NLP extractions?
• How accurate is the source material?
• How dynamic is your domain?
• Can facts be independently verified?
  • Do multiple sources reinforce a “fact”?
• Can your community of users curate or validate information?
• How sensitive are you to error?
  • Will users tolerate error (such as in search), or are you trying to inference over absolute “truth”?

Page 26: Natural Language Processing & Semantic Models in an Imperfect World

Certainty

Choose to assert facts (or not) based on certainty assessments
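Two of these certainty questions have simple numeric forms. The sketch below computes an F-score from extraction counts and combines several sources with a noisy-OR, which assumes the sources are independent; that combination rule, the function names, and the 0.9 threshold are assumptions for illustration and not the method used in the system.

```python
from typing import Iterable


def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def combine_independent(certainties: Iterable[float]) -> float:
    """Noisy-OR: the fact holds unless every independent source is wrong."""
    all_wrong = 1.0
    for certainty in certainties:
        all_wrong *= 1.0 - certainty
    return 1.0 - all_wrong


def should_assert(certainties: Iterable[float], threshold: float = 0.9) -> bool:
    """Assert the fact only when the combined certainty clears the threshold."""
    return combine_independent(certainties) >= threshold


print(f_score(tp=8, fp=2, fn=2))        # precision 0.8, recall 0.8
print(combine_independent([0.6, 0.7]))  # two sources reinforcing one fact
print(should_assert([0.6, 0.7]), should_assert([0.6, 0.7, 0.5]))
```

With these numbers, two moderately certain sources stay just under the threshold and a third source pushes the fact over it, which is the "multiple sources reinforce a fact" case.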

Page 27: Natural Language Processing & Semantic Models in an Imperfect World


Guided Inference

Inference is guided by ranked knowledge

Analysis can be performed offline

Page 28: Natural Language Processing & Semantic Models in an Imperfect World

Guided Inference

• Dynamic Inference / Rules
• A question/query is posed to initiate the inference
• Knowledge base is queried to collect relevant data
  • Certainty Thresholds can be used
  • Relevancy Thresholds can be used
• AKO Relations are asserted as “facts” to extend the inference
• Process is repeated to add assertions
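The loop described above can be sketched as threshold-filtered forward chaining. This is a simplified illustration: the system names Prolog, HiLog, and F-Logic for inference, whereas this sketch uses a single hypothetical Python rule, multiplies certainties along a derivation (an assumption), and uses placeholder entities such as geneA.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Dict[Triple, float]], Iterable[Tuple[Triple, float]]]


def inhibit_induce_rule(facts: Dict[Triple, float]) -> Iterable[Tuple[Triple, float]]:
    """If A inhibits B and B induces C, propose 'A reduces C' (hypothetical rule)."""
    for (a, pred1, b), c1 in facts.items():
        if pred1 != "inhibits":
            continue
        for (b2, pred2, c), c2 in facts.items():
            if pred2 == "induces" and b2 == b:
                yield (a, "reduces", c), c1 * c2


def guided_inference(candidates: Dict[Triple, Tuple[float, float]], rules: List[Rule],
                     min_certainty: float = 0.7, min_relevancy: float = 0.5,
                     max_rounds: int = 10) -> Dict[Triple, float]:
    # Collect relevant data: keep candidates that pass both thresholds.
    asserted = {
        triple: certainty
        for triple, (certainty, relevancy) in candidates.items()
        if certainty >= min_certainty and relevancy >= min_relevancy
    }
    # Repeat: apply rules, assert what clears the threshold, stop when nothing is new.
    for _ in range(max_rounds):
        new: Dict[Triple, float] = {}
        for rule in rules:
            for triple, certainty in rule(asserted):
                if certainty >= min_certainty and triple not in asserted:
                    new[triple] = max(certainty, new.get(triple, 0.0))
        if not new:
            break
        asserted.update(new)
    return asserted


# Each candidate maps to (certainty, relevancy); all values are made up.
candidates = {
    ("geneA", "inhibits", "geneB"): (0.9, 0.8),
    ("geneB", "induces", "processC"): (0.95, 0.9),
    ("geneB", "induces", "processD"): (0.4, 0.9),  # too uncertain to use
}
facts = guided_inference(candidates, [inhibit_induce_rule])
for triple, certainty in facts.items():
    print(triple, round(certainty, 3))
```

The low-certainty candidate never enters the working set, so no conclusion is built on it; raising or lowering the thresholds changes how far the inference is allowed to reach.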

Page 29: Natural Language Processing & Semantic Models in an Imperfect World

Demonstrations

• Alitora News Tracker
• Sage Commons, Biomedical Domain
• Match Engine, Consumer Application

Page 30: Natural Language Processing & Semantic Models in an Imperfect World

Alitora News Tracker

Track highly relevant news in domain niche

Use NLP to extract entities and relations of interest

Use certainty assessments as thresholds to consider entities/relations

Use a score (an embedded script) to assign a relevancy to news articles

Heuristic including entity types in articles, relationship types, et cetera
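A relevancy heuristic of this kind might look like the sketch below. The weights, type names, and threshold are invented for illustration; the real embedded script and its heuristic are not shown in the slides.

```python
from typing import Iterable, Tuple

# Hypothetical weights: how much each type matters to this domain niche.
ENTITY_WEIGHTS = {"gene": 2.0, "disease": 1.5, "company": 0.5}
RELATION_WEIGHTS = {"inhibits": 3.0, "merger": 1.0}

# One extraction is (kind, type name, certainty), with kind "entity" or "relation".
Extraction = Tuple[str, str, float]


def article_relevancy(extractions: Iterable[Extraction], min_certainty: float = 0.6) -> float:
    """Sum type weights scaled by certainty, ignoring low-certainty extractions."""
    score = 0.0
    for kind, type_name, certainty in extractions:
        if certainty < min_certainty:
            continue  # the certainty threshold filters out noise before scoring
        weights = ENTITY_WEIGHTS if kind == "entity" else RELATION_WEIGHTS
        score += weights.get(type_name, 0.0) * certainty
    return score


article = [
    ("entity", "gene", 0.9),
    ("relation", "inhibits", 0.8),
    ("entity", "company", 0.5),  # below the threshold, so it is skipped
]
print(article_relevancy(article))
```

Articles can then be sorted by this score, with the threshold and weights tuned per domain niche.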

Page 31: Natural Language Processing & Semantic Models in an Imperfect World

Application: News Tracker

Page 32: Natural Language Processing & Semantic Models in an Imperfect World

Application: Sage Commons

• Share networks of biomedical data across the community of researchers
  • Million-node networks, billions of triples
• Extended AKO with Sage Ontology
• Use for structured data and unstructured data
  • Allow combination of structured data with NLP-derived data
• Use certainty thresholds to cut down on noise
• Use relevancy for efficient queries
• Expose data for guided inferencing

Page 33: Natural Language Processing & Semantic Models in an Imperfect World
Page 34: Natural Language Processing & Semantic Models in an Imperfect World
Page 35: Natural Language Processing & Semantic Models in an Imperfect World
Page 36: Natural Language Processing & Semantic Models in an Imperfect World

Application: Match Engine

• Match Engine
• Extended AKO with Match Ontology
• Foundry for extracting music event entities
  • Performer, Venue, Price, Genre
• Certainty for reducing noise
• Match Engine uses inference with multiple sources of “evidence” to match users with events
• Demo Application: Bandalay Facebook App

Page 37: Natural Language Processing & Semantic Models in an Imperfect World
Page 38: Natural Language Processing & Semantic Models in an Imperfect World
Page 39: Natural Language Processing & Semantic Models in an Imperfect World
Page 40: Natural Language Processing & Semantic Models in an Imperfect World

NLP and (Un)Certainty

• Capture Error / Uncertainty in Model from NLP
• “Reify” relationships so metadata will “fit”
• Use multiple types of analysis
  • Rules, Machine Learning, Topology, Curation, User Feedback
• Separate general model and domain model
  • Allows asserting a fact in the domain model or not (don’t “decide” everything at once)
• Use semantics to make decisions about data
• Inference can use thresholds to decide to assert facts (or not)
• Guided Inference can make informed choices about facts to add/remove from model

Page 41: Natural Language Processing & Semantic Models in an Imperfect World

Contact Information

750 Menlo Ave, Suite 340
Menlo Park, CA 94025
(415) 310-4406

155 Water Street
Brooklyn, NY 11201
(917) 463-4776

[email protected]

[email protected]
