Applications of Semantic Technology in the Real World Today
DESCRIPTION
Amit Sheth, "Applications of Semantic Technology in the Real World Today," talk given at the Semantic Technology Conference, San Jose, CA, March 2005. This talk reviews real-world applications, mainly deployed in the financial services industry, developed on the Semagix Freedom platform described in http://knoesis.org/library/resource.php?id=810 . The technology is based on this patent: "Semantic web and its applications in browsing, searching, profiling, personalization and advertising", http://knoesis.org/library/resource.php?id=843 . Amit Sheth founded Taalee in 1999, which merged with Voquette in 2002, and then with Semagix in 2004.
TRANSCRIPT
04/11/23
2005 SEMAGIX All rights reserved.
1
Applications of Semantic Technology in the Real World Today
Amit Sheth, CTO, Semagix Inc
Things to Consider About the Semantic (Web) Technologies
Build Ontology
• Build Schema (model level representation)
• Populate with Knowledgebase (people, location, organizations, events)
Automatic Semantic Annotation (Extract Semantic Metadata)
• Any type of document, multiple sources of documents
• Metadata can be stored with or separately from documents
Applications: search (ranked list of documents of interest, i.e. semantic search), integrate/portal, summarize/explain, analyze, make decisions
• Reasoning techniques: graph analysis, inferencing
Types of content/documents
Use of standards
Scalability
Performance
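The first two steps on this slide (build and populate an ontology, then extract semantic metadata) can be illustrated with a minimal sketch. It is not the Semagix Freedom implementation: the `Ontology` class, the `annotate` function, and the sample sentence are hypothetical names introduced here, and plain substring lookup stands in for real entity extraction.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: a schema (class names) plus a populated knowledge base."""
    classes: set = field(default_factory=set)
    instances: dict = field(default_factory=dict)  # instance name -> class name

    def add_class(self, name):
        self.classes.add(name)

    def add_instance(self, name, cls):
        # The schema constrains what the knowledge base may contain.
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.instances[name] = cls


def annotate(text, ontology):
    """Return (entity, class, offset) for the first mention of each known instance.

    Assumption: case-insensitive substring matching, standing in for a real extractor.
    """
    lowered = text.lower()
    found = []
    for name, cls in ontology.instances.items():
        start = lowered.find(name.lower())
        if start != -1:
            found.append((name, cls, start))
    return sorted(found, key=lambda item: item[2])


# Build the schema, then populate it (people, locations, organizations, events).
onto = Ontology()
for cls in ("Person", "Location", "Organization", "Event"):
    onto.add_class(cls)
onto.add_instance("Amit Sheth", "Person")
onto.add_instance("San Jose", "Location")
onto.add_instance("Semagix", "Organization")

# The metadata is kept apart from the document text, one of the two storage options.
metadata = annotate("Amit Sheth of Semagix spoke in San Jose.", onto)
```

Here `metadata` is a stand-off annotation list: it references the document by character offset, so it can be stored with the document or in a separate index.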
[Figure: lifecycle diagram. Stages: Schema Creation, Ontology Population, Metadata Extraction, BSBQ Application Creation, Analytic Application Creation. Components: Ontology API, MB, KB.]
Ontology-driven Information System Lifecycle
Building a scalable and high performance system with support for:
Ontology creation and maintenance
Ontology-driven Semantic Metadata Extraction/Annotation
Utilizing semantic metadata and ontology
Semantic search/querying/browsing
Information and application integration - normalization
Analysis/Mining/Discovery – relationships
Semantic Technology Solves These Challenges
Upper ontologies: modeling of time, space, process, etc
Broad-based or general purpose ontology/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), SWETO, WordNet
Domain-specific or Industry specific ontologies
News: politics, sports, business, entertainment
Financial Market
Terrorism
PharmaO
GlycO (Glycomics); PropeO (Proteomics)
GO (nomenclature), NCI (schema), UMLS (knowledgebase), …
Application Specific and Task specific ontologies
Anti-money laundering, NeedToKnow, (Employee or Vendor Vetting)
Equity Research
Repertoire Management
Fundamentally different approaches in developing ontologies: schema vs populated; community efforts vs reusing knowledge sources
Types of Ontologies (or things close to ontology)
More sophisticated semantic technologies exploit ontologies and
• Provide scalability and flexibility
• Handle all types of data (unstructured, semi-structured, structured)
• Create SmartData – enhancing raw data with context and relationships
• Accommodate SmartQuerying – flexible, intelligent querying
• Enable powerful enterprise decision making
Evolution of Meta Data
Real-World Applications (case studies)
Global Bank
Aim
• Legislation (PATRIOT Act) requires banks to identify ‘who’ they are doing business with
Problem
• Volume of internal and external data needed to be accessed
• Complex name matching and disambiguation criteria
• Requirement to ‘risk score’ certain attributes of this data
Approach
• Creation of a ‘risk ontology’ populated from trusted sources (OFAC etc.); sophisticated entity disambiguation
• Semantic querying, rules specification & processing
Solution
• Rapid and accurate KYC checks
• Risk scoring of relationships allowing for prioritisation of results
• Full visibility of sources and trustworthiness
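The "complex name matching" problem named above can be illustrated with a small sketch. It assumes a simple approach (token normalisation plus the standard library's `difflib.SequenceMatcher`), not the disambiguation method used in the deployed system; `normalize`, `name_similarity`, `screen`, and the 0.85 threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and sort tokens.

    Sorting tokens makes "Yaseer, Ahmed" and "Ahmed Yaseer" compare equal.
    """
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def name_similarity(a, b):
    """Similarity in [0, 1] between two names after normalisation."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def screen(customer, watchlist, threshold=0.85):
    """Return watch-list entries whose similarity to the customer name meets the threshold.

    The threshold is an assumed value; a real KYC check would tune it and use
    further attributes (date of birth, address) to disambiguate.
    """
    hits = [(entry, round(name_similarity(customer, entry), 2)) for entry in watchlist]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


matches = screen("Ahmed Yaseer", ["Yaseer, Ahmed", "John Smith"])
```

String similarity alone only produces candidates. Deciding whether two records denote the same person still needs the surrounding relationships, which is where the populated ontology comes in.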
[Figure: ontology graph. Classes: Watch list, Organization, Company. Instances: FBI Watchlist, Hamas, WorldCom, Ahmed Yaseer. Relationships: appears on Watchlist, member of organization, works for Company.]
Ahmed Yaseer:
• Appears on Watchlist ‘FBI’
• Works for Company ‘WorldCom’
• Member of organization ‘Hamas’
The Process
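The relationships on this slide can be written as subject-predicate-object triples and scored by relationship type. The sketch below assumes that representation; the point values in `RISK_POINTS` and the cap of 100 are hypothetical and are not taken from the talk.

```python
# The slide's example, expressed as (subject, predicate, object) triples.
TRIPLES = [
    ("Ahmed Yaseer", "appears on Watchlist", "FBI Watchlist"),
    ("Ahmed Yaseer", "member of organization", "Hamas"),
    ("Ahmed Yaseer", "works for Company", "WorldCom"),
]

# Hypothetical risk points per named relationship (integers avoid float rounding).
RISK_POINTS = {
    "appears on Watchlist": 50,
    "member of organization": 30,
    "works for Company": 5,
}


def relationships(entity, triples):
    """All (predicate, object) pairs in which the entity is the subject."""
    return [(p, o) for s, p, o in triples if s == entity]


def risk_score(entity, triples, points, cap=100):
    """Sum the points of the entity's relationships, capped at `cap`."""
    total = sum(points.get(p, 0) for p, _ in relationships(entity, triples))
    return min(cap, total)


score = risk_score("Ahmed Yaseer", TRIPLES, RISK_POINTS)
```

Because the relationships are named, the score can be explained by listing the triples that contributed to it, which supports the "full visibility of sources" goal stated for this case study.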
Global Investment Bank
Example of Fraud Prevention application used in financial services
User will be able to navigate the ontology using a number of different interfaces
World Wide Web content
Public Records
Blogs, RSS
Unstructured text, semi-structured data
Watch Lists
Law Enforcement
Regulators
Semi-structured Government Data
Scores the entity based on the content and entity relationships
Establishing New Account
Law Enforcement Agency
Aim
• Provision of an overarching intelligence system that provides a unified view of people and related information
Problem
• Need to create unique entities from across multiple disparate, non-standardised databases; requirement to disambiguate ‘dirty’ data
• Need to extract insight from unstructured text
Approach
• Multiple database extractors to disambiguate data and form relevant relationships
• Modelling of behaviours/patterns within very large ontology (6Mn+ entities)
Solution
• Merged and linked case data from multiple sources using effective identification, disambiguation, and link analysis
• Dynamic annotation of documents
• Single query across multiple datasets
• 360 view of an individual and relevant associations
Complex querying and characteristic modelling across information sources
Application of bespoke and pre-configured ‘profiles’ for detailed investigation
Profile Creation
Complex Querying
Summary of Results
Investigation
User configurable scoring profiles
Profile based on direct matching with case characteristics
Profiling based on link analysis through indirect relationships with other cases and information
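Link analysis through indirect relationships can be sketched as a bounded breadth-first search over a graph of cases and entities. This is an illustration under assumptions, not the profiling engine shown in the talk: the `LINKS` graph, its node names, and the `max_hops` parameter are made up for the example.

```python
from collections import deque

# Hypothetical undirected link graph between cases, people, and vehicles.
LINKS = {
    "case-1": ["person-A", "vehicle-X"],
    "person-A": ["case-1", "case-2"],
    "vehicle-X": ["case-1", "case-3"],
    "case-2": ["person-A"],
    "case-3": ["vehicle-X", "person-B"],
    "person-B": ["case-3"],
}


def linked_entities(start, links, max_hops=2):
    """Breadth-first search returning {entity: hops} for everything within max_hops.

    One hop is a direct relationship; two or more hops are indirect ones.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in links.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]  # the starting case is not linked to itself
    return hops


related = linked_entities("case-1", LINKS)
```

A scoring profile could then weight each related entity by its hop count, so that direct matches outrank indirect ones.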
Free text searching across aggregated information sources
Gisondi, white ford expedition, main street, assault, traffic offences
Unified view of direct and indirect results that best match the complex query and the profile
Knowledge Annotation of known entities from within free text
Direct and indirect relationship scoring driven by risk weightings
Aggregated knowledge from disparate sources
Scoring of key characteristics to drive relevance to original profile and query
Identification of investigation path
Visualisation of results
3D navigation of relationships and knowledge around a query
Key Characteristics of the Key Cases
Scalable end-to-end platform driven by domain specific ontologies
Expressive representation with named relationships
Populated ontologies with millions of instances
Sophisticated entity disambiguation of knowledge extracted from multiple knowledge sources
Self-maintaining ontologies – updated as needed
Key Characteristics of the Key Cases
Unified 360-degree view of entities across heterogeneous information sources
Domain specific semantic metadata extraction and enhanced annotation of heterogeneous documents and heterogeneous content
Semantic linking from internal and 3rd party content/sites
Full visibility of sources and trustworthiness
Comprehensive & high performance analytical processing
Relationship linking of information
Custom scoring of relationships within information
Help create/maintain populated ontologies that capture domain terms (do these map to classification level?)
Automatic annotation of data; possibly value added metadata enhancement (could be consistent with any metadata standard)
Providing insight into the documents (show annotated data, link concepts in documents with ontologies or context for search)
Show conceptual similarity between documents
Rule-based or pattern-based processing
Discovering links
QUESTIONS?
Semagix Product Architecture
[Figure: architecture diagram. Labels: Raw Data, XML, SMARTCentral, Thin Agile Applications; Model, Integrate, Enhance, Deliver; SMART Services, SMARTWorks, Freedom, SMARTSearch, SMARTExplore, SMARTConnect, SMARTNotify, SMARTView; Smart Data, Ontology.]
Technical Capabilities
Unified Ontology Representation Language Expressiveness
Ontology Quality and Freshness
Populated Ontology Size
Data: Type and Amount
Metadata Extraction: type
Computation: query expressiveness (over metadata and ontology), rules, ranking
Visualization