Introduction to Anzo Unstructured

©2014 Cambridge Semantics Inc. All rights reserved. Introduction to Anzo Unstructured June 29, 2016 Richard Mallah Director of Unstructured and Advanced Analytics [email protected]

Upload: cambridge-semantics

Post on 11-Apr-2017

TRANSCRIPT

Page 1: Introduction to Anzo Unstructured

Introduction to Anzo Unstructured

June 29, 2016

Richard Mallah
Director of Unstructured and Advanced Analytics
[email protected]

Page 2: Introduction to Anzo Unstructured

Agenda

• Anzo Unstructured and the Anzo Smart Data Platform
• Core Capabilities of Anzo Unstructured
• Configuration, Operations, and Output
• Example Use Cases in Pharma and Finance
• Exploring Document-Derived Analytics
• Visualizing Additional Annotators and Capabilities

Page 3: Introduction to Anzo Unstructured

Introduction to Cambridge Semantics (CSI)

The Anzo Smart Data Platform is used to create data analytics and management solutions with diverse data from varied sources

Company:
• Founded in 2007 by senior team from IBM’s Advanced Internet Technology Group
• Privately funded
• Select customers:

Software:
• Market-leading Anzo software suite is built on open Semantic Web standards
• Currently 3rd generation of the product in production use

Page 4: Introduction to Anzo Unstructured

The Anzo Smart Data Platform

[Architecture diagram. Layers: Applications, Middleware, Enterprise Data Fabric.]

Components shown:
• Anzo.js Client Library
• Anzo.Net Client Library
• Anzo .java/.Net Client Library
• Anzo Enterprise Server (SOA; OSGI, RDF & OWL over JMS)
• Anzo Relational Replicator
• Reasoning & Rules, Workflow, Semantic Services
• Anzo Connect
• Enterprise Directory Connect
• Anzo Unstructured
• Anzo for Excel, Anzo on the Web, Applications and BI Tools, Rich Client Apps
• Anzo Graph Database, Anzo Content Repository
• RDBMS, Data Mart/Warehouse, Enterprise Applications, Directory (LDAP, AD)

Also shown: Full/Incremental ETL, Web Services, Federated SPARQL, NLP Text Analytics, Semantic Analysis

Also shown: 3rd Party Databases & Applications, External Data Sources, Unstructured Content, RDBMS, Teradata, Hadoop, SalesForce

• Virtualize data using W3C semantic standards
• Operationalize industry standards, e.g., FIBO, LEI
• Real-time data events
• Granular security and access control
• Ontology, Mapping, Visualization & Service registries

Page 5: Introduction to Anzo Unstructured

Anzo Smart Data Lake

Anzo Smart Data Lake Server

Anzo Enterprise Server

• Self-service analytics, visualization and data discovery

• Data curation, annotation and application workflow

• MPP graph query engine for interactive analytics at scale

• ODATA Integration for 3rd party analytics tools

• Metadata, ontology and mapping catalog

• Model-driven data provisioning and loading

• Text analytics

• Canonical entity linking and transformation

• Scalable Graph and Document Storage

Anzo Graph Query Engine

Anzo Ingestion Servers

Anzo Unstructured

Page 6: Introduction to Anzo Unstructured

Anzo Ontology Editor

Page 7: Introduction to Anzo Unstructured

What Solutions Benefit From Anzo?

• For aggregation of data from multiple, diverse data sources
• For integration of internal data with external data across the Web or firewalls
• For solutions involving data sources, business rules, analytics and actions that are not evident in advance
• For solutions that change often
• For analyzing diverse data sources with a diverse variety of access control requirements with a need for full provenance and traceability
• For evolving solutions benefiting from ongoing involvement from domain experts to update data models, data sources, and analytics as needed
• For formal and informal day-to-day business activities that require workflow, alerts, and automation
• For collecting & analyzing data that doesn’t currently have any system of record (e.g. “shadow IT” systems)

Page 8: Introduction to Anzo Unstructured

Anzo Unstructured Capabilities Overview

• Intake Sources
  – Social Media
  – Local Directories
  – Enterprise CMSs
  – Structured Databases
  – Web Sites & Boards
  – Spreadsheets
  – Google Search Appliance
  – Mail Servers
  – + dozens more

• File Formats
  – Office Documents
  – PDFs
  – Web Pages
  – Email Messages
  – + dozens more

• Multilingual
  – European, Asian, and Middle Eastern Languages
  – Native-Language Annotation
  – Document Translation
  – Annotation Translation
  – Phonetic Name Normalization/Indexing
  – Cross-Lingual Concepts Automapped

• Extraction Categories
  – Entities
  – Relationships
  – Granular Sentiment
  – Topic Classification
  – Patterns and Concepts
  – and more

• Concept Types Extracted
  – MedicalHistoryAilment
  – LegalStatuteSection
  – BiomarkerForDisease
  – AnalystEarningsEstimate
  – JobTitle
  – SentimentTopic
  – + thousands more
  – + easily user-extended/customized

• Semantic Analysis
  – Concept-Based Relationships
  – Relationship Compounding
  – Annotation Harmonization
  – Multi-NLP Weighting/Voting
  – Ontology Growing
  – Ontology Alignment

• Semantic Search
  – Concept-Based Full-Text Search
  – Facet On Concept or Type
  – Mix Structured & Unstructured Filters
  – Visualize Annotations In Context
  – External Index Federation
  – Multi-Stage Searching/Filtering/Clustering

• Structured/Unstructured Integration
  – Find/link structured resources in text
  – Analyze text within structured columns
  – Populate new structured resources from text
  – Auto-enrich entities found in unstructured
  – Auto-extend schemas from unstructured properties

Page 9: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins Overview

• Anzo Unstructured is both a pluggable framework supporting a large number of ready-made third-party NLP integrations and a product with significant NLP capabilities bundled in
  – Plugins on the following pages are a small number of our many supported NLP capabilities from a variety of sources

• Annotators included out of the box:
  – Autotagger and Classifier Annotator (Statistical, can fall back to rule-based)
  – Autotagger and Classifier Annotator (Rule-Based, can fall back to statistical)
  – Standard Entity Extractors (People, companies, locations, job titles, dates, etc.)
  – Custom Knowledgebase Annotator (Leverage your taxonomies, thesauri, databases)
  – Fuzzy Rule Network Annotator (Find concepts by related, surrounding, contextual concepts)
  – Significant Phrase Annotator (Automatically extracts the important concepts)
  – Document Section Annotator (Autogenerate table of contents and contextualize more)
  – Pattern Annotators (Find part no., id no., statute section, or any custom pattern)
  – Custom Relationship Annotator (Find events or relationships spanning different extractions)
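
As an illustration of what a pattern annotator does in general, here is a minimal Python sketch that finds spans matching custom patterns and returns them as labeled annotations. It is not the Anzo Unstructured implementation or API; the `Annotation` type, the `annotate_patterns` function, and both regular expressions are hypothetical and chosen only to mirror the "part no." and "statute section" examples above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A labeled span of text (hypothetical structure for this sketch)."""
    label: str
    start: int
    end: int
    text: str


# Hypothetical patterns for illustration; real part-number and
# statute-section formats vary by organization and jurisdiction.
PATTERNS = {
    "PartNumber": re.compile(r"\b[A-Z]{2}-\d{4}\b"),
    "StatuteSection": re.compile(r"§\s?\d+(?:\.\d+)*"),
}


def annotate_patterns(text, patterns=PATTERNS):
    """Return every pattern match in the text, ordered by start offset."""
    found = []
    for label, regex in patterns.items():
        for match in regex.finditer(text):
            found.append(Annotation(label, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    for ann in annotate_patterns("Replace part AB-1234 per § 12.3 of the code."):
        print(ann)
```

Keeping the character offsets with each match is what lets later stages compare or combine annotations from different annotators over the same text.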

Page 10: Introduction to Anzo Unstructured

Optional NLP Plugin Technology Partners

Page 11: Introduction to Anzo Unstructured

Semantic Post-Processing of NLP

• Harmonization
  – Normalized formats for knowledge integration
• Cooperation
  – Multiple annotators strengthen, correct, and increase the network effect of relationships
• Probabilistic Reasoning
  – Semantic knowledge integration includes both deduction and inference
• Filtering
  – The set of concepts, overlaps, affects, and relationships can be automatically filtered down to reduce noise
• Enrichment
  – Web services, semantic services, internal and external databases and knowledgebases, and pluggable computations can be used to add more context and data to your new domain object
• Machine Learning and Predictive Analytics
  – Train on some gold standard and do some supervised classification
  – Incrementally build a conceptual cluster space for predictive analytics
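
The harmonization, cooperation, and filtering steps can be pictured with a small Python sketch. It assumes each annotator emits `(annotator, label, start, end)` tuples, maps their labels onto one shared vocabulary, sums a per-annotator weight for each agreed span, and drops spans below a threshold. The label map, annotator names, weights, and threshold are all invented for illustration; the slides do not describe how Anzo Unstructured actually weights or votes.

```python
from collections import defaultdict

# Hypothetical mapping from annotator-specific labels to a shared vocabulary.
LABEL_MAP = {
    "ORG": "Company",
    "Organization": "Company",
    "PERSON": "Person",
    "Person": "Person",
}

# Hypothetical trust weight per annotator.
WEIGHTS = {"annotator_a": 0.6, "annotator_b": 0.3, "annotator_c": 0.5}


def harmonize(raw):
    """Map labels onto the shared vocabulary, dropping labels with no mapping."""
    return [
        (source, LABEL_MAP[label], start, end)
        for source, label, start, end in raw
        if label in LABEL_MAP
    ]


def vote(annotations, threshold=0.5):
    """Sum annotator weights per (label, start, end) and keep spans at or above threshold."""
    scores = defaultdict(float)
    for source, label, start, end in annotations:
        scores[(label, start, end)] += WEIGHTS.get(source, 0.0)
    return {key: round(score, 2) for key, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    raw = [
        ("annotator_a", "ORG", 0, 9),
        ("annotator_b", "Organization", 0, 9),
        ("annotator_b", "PERSON", 20, 30),
        ("annotator_c", "DATE", 40, 44),
    ]
    print(vote(harmonize(raw)))
```

In this sketch two annotators agreeing on the same span reinforce each other, while a span reported by only one low-weight annotator is filtered out as noise.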

Page 12: Introduction to Anzo Unstructured

Point and Click Configuration of Unstructured Pipelines

Page 13: Introduction to Anzo Unstructured

Point and Click Configuration of Annotation

Page 14: Introduction to Anzo Unstructured

Unstructured Pipeline Operations Monitor

Page 15: Introduction to Anzo Unstructured

Dashboarding Structured/Unstructured Knowledge Integration

Structured property

Multiple NLP Technologies Harmonized

Overlapping annotations

Enriched property

Unstructured entity

Unstructured relationship

Archived copy for review, validation & provenance (both HTML Format & Original)

Page 16: Introduction to Anzo Unstructured

The CSI Semantic Knowledge Integration Approach to Enterprise Text Analytics

• Use Multiple NLP Engines or Annotators
• Leverage a Knowledge Integration Platform
  – Make the annotators cooperate
  – Enrich the annotations with internal or external data
  – Link annotations with existing structured data
  – Filter them down to the most relevant set
  – Harmonize ontologies and instances
  – Deal with probabilistic or uncertain information
• Quality Control
  – Manual curation and automated QC
  – Workflow, provenance lineage
• Easily Deal with Data Changes and Schema Changes
  – Both are dealt with in real-time at runtime
  – Maintenance is orders of magnitude more efficient

Page 17: Introduction to Anzo Unstructured

Use Cases in Pharma

• PV & Safety Data Management - Automatic tagging of case reports with customized curation workflow, text mining, and contextual search

• R&D Competitive Intelligence – Explore the competitive landscape for Therapeutic Area, Indication, Target, Company, Compound, & Partners

• R&D Informatics – Understand and correlate your internal research and how it may be related to any external developments or research

• Clinical Trial Site Selection and Optimization - Site selection, KOL search, trial planning

• Scientific Affairs/Medical Science Liaisons - Track Key Opinion Leaders (KOL) in literature and clinical trials & analyze feedback from medical professionals and patients

• Information Landscape - Track and monitor data stewardship and usage through the organization to drive more efficient usage.

• Commercial Analytics – Sales and Marketing, Rx Data, Text Analytics

Page 18: Introduction to Anzo Unstructured

Use Cases in Financial Services

• Compliance Policy & Procedure Management - Monitor structured and unstructured data sources for relevant regulatory changes; have collaborative workflows for policy & documentation development, approval, and control; and establish targeted policy dissemination and attestation workflows.

• Compliance Surveillance & Investigation – Legal and Compliance analysts can create structures and views that provide analysis, rules, and alert thresholds easily changed on-the-fly by investigators, who can then comprehend and interact with the big data picture.

• Market and Customer Intelligence - Understand how clients and prospects are thinking about your firm and competitors’ offerings

• Research - Automated analytics of news, chatter, IMs, secondary research reports, emails, sentiment, etc. for research alerts, semantic search, and relationship visualization, forming an integrated intelligence platform for analysts, including Complex Event Processing.

• Information Landscape - Track and monitor data stewardship and usage through the organization to drive more efficient usage.

• Commercial Analytics – Sales and Marketing, Tx Data, Text Analytics

Page 19: Introduction to Anzo Unstructured

Point and Click Configuration of Annotation

Page 20: Introduction to Anzo Unstructured

Point and Click Configuration of Annotation

Page 21: Introduction to Anzo Unstructured

Relationship Explorer
Find Unexpected Connections Between Companies | Follow Paths Out or In From Anything | Follow the Money

Page 22: Introduction to Anzo Unstructured

Incremental Semantic Overlays: Product, Brand, Offering

Page 23: Introduction to Anzo Unstructured

Semantic Correspondence Linking and Overlay

Page 24: Introduction to Anzo Unstructured

Asking Cross-Ontology Questions

Page 25: Introduction to Anzo Unstructured

Cross-Ontology Questions Meet The Network Effect

Page 26: Introduction to Anzo Unstructured

Multi-Ontology Knowledge Graph Exploration

Page 27: Introduction to Anzo Unstructured

Deep News View
Customizable Fundamental, Technical, and Thematic Filters | View Only Most Recent n Minutes | Semantic Search

Page 28: Introduction to Anzo Unstructured

Rapid Concept Drilldown
GPS for Concepts | Assisted Skimming | Interactive Annotation-Driven Navigation | Auto-translates Foreign Languages

Page 29: Introduction to Anzo Unstructured

Example: Customizable Stock Centric Surveillance
Dashboards Per Stock | Per Cohort | Per Industry | Per Custom Sector | Analyst Can Define Filters and Drilldowns

Page 30: Introduction to Anzo Unstructured

Example: Competitor Sentiment Comparison
Longitudinal | Sentiment Aggregation | By Cohort | From Single Stock Selection | Visualize Leaders and Followers

Page 31: Introduction to Anzo Unstructured

Example: Intraday Sentiment
Drill Down | Intraday Topic-Granular Sentiment | Attribute Price Action Drivers | Investigate Unusual Volumes

Page 32: Introduction to Anzo Unstructured

Longitudinal and Outlier Business Intelligence
Unstructured Data Becomes Structured

Page 33: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins CSI Web Scraper Annotator

Page 34: Introduction to Anzo Unstructured

Contextual Semantic Overlay

Page 35: Introduction to Anzo Unstructured

[Diagram: node labels I1, I2, I3, I4, E1, E2, repeated three times; pipeline labels Main Pipeline, Purple Helper Pipeline, Green Helper Pipeline]

Page 36: Introduction to Anzo Unstructured

Fuzzy Concept Matching Example: Skills
Understanding and Recognition in Semantic Search

Page 37: Introduction to Anzo Unstructured

Fuzzy Concept Matching Example: Skills
Concept Curation

• Use Excel to define each skill concept with any combination of methods
• Multiple values are comma-separated
• Patterns support wildcards, y within n words of x, and intuitive groupings
• Define more atomic concepts before more compound concepts
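
To make "y within n words of x" with wildcards concrete, here is a minimal Python sketch of a proximity check. It is an assumption-laden illustration, not the Anzo pattern syntax: the `within` function is hypothetical, `*` is the only wildcard shown, matching is case-insensitive on word tokens, and "within n words" is taken to mean the two tokens are at most n positions apart in either direction.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def within(text, x, y, n):
    """Return True if a token matching y occurs at most n tokens from a token matching x.

    x and y are lowercase glob-style patterns, so "manag*" matches
    "managing" and "manager".
    """
    tokens = tokenize(text)
    x_positions = [i for i, tok in enumerate(tokens) if fnmatchcase(tok, x)]
    y_positions = [i for i, tok in enumerate(tokens) if fnmatchcase(tok, y)]
    return any(
        i != j and abs(i - j) <= n
        for i in x_positions
        for j in y_positions
    )


if __name__ == "__main__":
    sentence = "Experienced in managing large software projects"
    print(within(sentence, "manag*", "project*", 3))
    print(within(sentence, "manag*", "project*", 2))
```

A compound concept such as a "project management" skill could then be defined in terms of simpler checks like this one, which is one way to read the advice to define atomic concepts before compound ones.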

Page 38: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins CSI Document Classifier

Page 39: Introduction to Anzo Unstructured

Indirect Filters on Domain-Specific Summaries
Auto-Summarization | Extensive Filters | Integration with Multiple Sources of News and Research | Assisted Reader

Page 40: Introduction to Anzo Unstructured

Cross-Lingual Annotation and Optional Translation

Page 41: Introduction to Anzo Unstructured

Multiple Languages, One Concept

Page 42: Introduction to Anzo Unstructured

In Situ Translation and Annotation

Page 43: Introduction to Anzo Unstructured

Automated Redaction

Page 44: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins CSI Significant Phrase Annotator

Page 45: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins CSI Custom Relationship Annotator

Page 46: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins Linguamatics I2E Annotator, Biomarkers for Diseases

Page 47: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins SciBite Termite Annotator

Page 48: Introduction to Anzo Unstructured

Anzo Unstructured NLP Plugins Lexalytics Salience

Page 49: Introduction to Anzo Unstructured

Simplified Views for Non-Technical Users
Semantic Search Made Easy

Page 50: Introduction to Anzo Unstructured

Anzo Unstructured Capabilities
APIs and SDK

Create new pipeline components for any of these tiers:
– Document Crawler / Listener
  • Obtain documents of any format from any source
– Document Rich Text, Thumbnail, and Metadata Extraction
  • Deal with custom or less-common file formats completely pluggably
– Document Format Cleansing and Transformation
  • Remove unwanted artifacts specific to your documents or translate to a particular format or language
– Full-Text Indexing
  • Pluggable corpus-level indexing and search
– Annotator
  • Already supports GATE, UIMA, and FrAU annotation frameworks
  • Provides access to annotations from any other annotator, cleansed text, format-analyzed document, and original file, supporting mixed-representation annotation
  • Multithreading safe
– Semantic Postprocessor
  • Recombine, filter, and restructure annotations
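
The tiered pipeline idea can be sketched in a few lines of Python, with one plain function per tier and a runner that applies them in order. This is only a conceptual illustration under stated assumptions: it does not use or reproduce the Anzo Unstructured SDK, every name here (`Document`, `crawl`, `extract_text`, `cleanse`, `annotate`, `postprocess`, `run_pipeline`) is hypothetical, the full-text indexing tier is omitted for brevity, and the "annotator" is a toy that tags four-digit years.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document flowing through the pipeline (hypothetical structure)."""
    source: str
    raw: str
    text: str = ""
    annotations: list = field(default_factory=list)


def crawl(items):
    """Crawler tier: turn (name, content) pairs into Document objects."""
    return [Document(source=name, raw=content) for name, content in items]


def extract_text(doc):
    """Extraction tier: strip HTML-like tags to get plain text."""
    doc.text = re.sub(r"<[^>]+>", " ", doc.raw)
    return doc


def cleanse(doc):
    """Cleansing tier: collapse runs of whitespace."""
    doc.text = " ".join(doc.text.split())
    return doc


def annotate(doc):
    """Annotator tier: tag four-digit years as (label, start, end, value)."""
    doc.annotations = [
        ("Year", m.start(), m.end(), m.group())
        for m in re.finditer(r"\b(?:19|20)\d{2}\b", doc.text)
    ]
    return doc


def postprocess(doc):
    """Postprocessor tier: keep only the first annotation for each value."""
    seen = set()
    kept = []
    for ann in doc.annotations:
        if ann[3] not in seen:
            seen.add(ann[3])
            kept.append(ann)
    doc.annotations = kept
    return doc


def run_pipeline(items, stages=(extract_text, cleanse, annotate, postprocess)):
    """Crawl the inputs, then apply each stage to every document in order."""
    docs = crawl(items)
    for stage in stages:
        docs = [stage(doc) for doc in docs]
    return docs


if __name__ == "__main__":
    items = [("memo.html", "<p>Founded in 2007.</p><p>Webinar in 2016, recorded 2016.</p>")]
    for doc in run_pipeline(items):
        print(doc.text, doc.annotations)
```

Because each tier is just a function from document to document, swapping in a different extractor or annotator means changing one entry in `stages`, which is the kind of pluggability the tier list describes.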
