
THOMSON FINANCIAL

Ian Koenig

Chief Architect – Thomson Financial

C4. Case Study: Event Processing as a Core Capability of Your Content Distribution Fabric

20 September 2007


Copyright © Thomson Financial

Agenda

1. A framework for discussing Complex Event Processing

2. “Elementized” News as a data source

3. The “fabric” for distributing content and its emerging capabilities for stream processing

4. Enabling new types of data sources creating new ‘opportunities’

5. Hinting at a larger pattern for distributing content and the role that complex event processing will play.


The CEP Framework model

[Diagram: the CEP framework model. Labeled components: Content Sources (News, Level 1, Level 2, Other); Content Streams; Stream Adapters; CEP Engine; Stream Agents (three Event Proc boxes); Application Logic; User Interface; New Content Streams.]


Thinking Outside the CEP Box

[Diagram: the CEP framework model again, with the added label "Outside The CEP Box". Labeled components: Content Sources (News, Level 1, Level 2, Other); Content Streams; Stream Adapters; CEP Engine; Stream Agents (three Event Proc boxes); Application Logic; User Interface; New Content Streams.]


Thinking Outside the Box


The Classic 9 Dots Puzzle: Connect all 9 dots with 4 straight lines without ever taking the pencil off of the paper

To solve the problem, you have to “think outside the box”


Content Sources and Content Distribution

[Diagram. Labeled components: Content Sources (News, Level 1, Level 2, Other); Content Streams; Content Distribution Fabric; Stream Adapters; CEP Engine; Stream Agents (three Event boxes); Application Logic.]


Why News?


SEC will Allow Companies to use the Internet to Improve Investor-Management Communications
NA
From CFO.com - August 16, 2007
According to SEC chairman Christopher Cox, the commission will allow companies to use the Internet to improve investor-management communications. As currently proposed by the commission, a company interested in offering this venue to shareholders would alert them via

….News Moves Markets ….Elementized News Moves Markets …


The Metaverse


The Metadata Universe (or Metaverse) is the set of Categories (Entities and Subjects) that provide semantic understanding for text and data.

[Diagram: the Metaverse entity model.]

Categories shown:
• Geography: Regions, Countries, Physical Features
• Industry: Sector Hierarchy (Multiple Schemes)
• Event: Corp. Action, Meeting, et al
• Organization: Gov’t, Agency, Company, NGO, Market Participant
• Person (Multiple Roles)
• Quote, Trade, IOI, Advertisement, Order
• Market (Equity, Commod, FI, et al)
• Index: Financial Indexes
• Instrument: Security, Future, Derivative, et al
• Indicator: Economics, Market Stats

Relationship labels shown: Is grouped by; Officer of; Subsidiary of; Analyst For; Issues; Listed (Market Participant); Has Quotes; Operates within; Mkt. Part. – Provides Quotes; Indicator For; Index For.

Classification Standard
• GEOGRAPHY: ISO 3166
• INDUSTRY: SIC + NAICS + TSE + GICS
• MARKETS: ISO 10962
• CURRENCY: ISO 4217
• CORPORATE ACTIONS: ISO 15022
• RESEARCH: RIXML

Sample hierarchy nodes shown:
• Geography: Africa, Americas, Asia, Europe, Oceania; Central America, North America; United States of America; Alabama, Arkansas
• Industry
• Markets: Equities, Debt, Package Units, Futures, Currency, other


Categorization Mark-up Example

Entity: Schering-Plough (SGP-US) – An Organization of type: Company
Entity: Merck KGAA (MRK-US) - An Organization Entity of type: Company
Entity: Pharmaceuticals - An Industry Entity


NewsML Mark-up example


<?xml version="1.0" encoding="UTF-8"?>
<newsItem guid="urn:newsml:CBS MarketWatch:20030620:20040903-000693:2" schema="0.0" dir="ltr" version="1"
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:toc="http://data.schemas.tfn.thomson.com/Common/2007-08-01/">
  <catalogRef href="http://iptc.org/std-dev/NAR/1.0/specification/IPTC-TempCatalog-inc_4.xml"/>
  <catalogRef href="http://news.schemas.tfn.thomson.com/schemes/TF_NewsML-G2-catalog.xml"/>
  <rightsInfo>
    <copyrightNotice>(C) 1997-2004 MarketWatch.com, Inc. All rights reserved.</copyrightNotice>
  </rightsInfo>
  <itemMeta>
    <itemClass qcode="ccls:text"/>
    <provider qcode="org:TFN"/>
    <versionCreated>2001-12-17T09:30:47.0Z</versionCreated>
    <firstCreated>2001-12-17T09:30:47.0Z</firstCreated>
    <pubStatus qcode="stat:usable"/>
    <role qcode="rol:urgent"/>
    <service qcode="NewsServiceId:NSID1">
      <name>News Service 1</name>
    </service>
  </itemMeta>
  <contentMeta toc:careVersion="1" toc:careTrainingSet="2007-07-01"
      toc:dexterVersion="1" toc:dexterTrainingSet="2007-07-01"
      toc:stratifyVersion="1" toc:stratifyTrainingSet="2007-07-01">
    <urgency>3</urgency>
    <contentCreated>1967-08-13</contentCreated>
    <contentModified>1967-08-13</contentModified>
    <infoSource qcode="org:TFN"/>
    <headline>Staffing company shares mixed after jobs report</headline>
    <by>Ciara Linnane</by>
    <dateline>12:21 PM ET Sep 3, 2004</dateline>
    <language tag="en-us"/>
    <subject type="type:subject" qcode="CategoryId:1234567" creator="org:thomson"/>
    <subject type="type:subject" qcode="CategoryId:1234568" creator="sys:care" why="why:machine-generated" confidence="70" relevance="65"/>
    <subject type="type:organization" qcode="OrganizationId:0123456789"/>
  </contentMeta>
  <contentSet xmlns:tfc="http://news.schemas.tfn.thomson.com/Common/2007-07-06/"
      xsi:schemaLocation=" http://news.schemas.tfn.thomson.com/Common/2007-07-06/ NewsCommonTypes.xsd">
    <inlineXML xml:lang="en-us" contenttype="application/xhtml+xml"
        xsi:schemaLocation="http://www.w3.org/1999/xhtml xhtml11-tfnews.xsd">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>Staffing company shares mixed after jobs report</title>
        </head>
        <body>
          <p>NEW YORK (CBS.MW) -- After rallying <span guid="xxxx">for the past few sessions, </span> shares of staffing firms and payroll processors were mixed Friday as investors digested the August jobs report, showing <toc:Category xsi:type="toc:Indicator" IndicatorId="0234556">U.S. payrolls</toc:Category> rebounding after two sluggish months.</p>
          <p>The <toc:Category xsi:type="toc:Organization" OrganizationId="9000000056"> Labor Department</toc:Category> said the economy added 144,000 jobs, well above the 32,000 reading in July.</p>
          <p>The <toc:Category xsi:type="toc:Indicator" IndicatorId="0234551">unemployment rate</toc:Category> fell one-tenth of a percentage point to 5.4 percent, the lowest rate since October 2001, primarily because 152,000 adults dropped out of the labor force.</p>
          <p>Economists surveyed by CBS MarketWatch were expecting job growth of about 158,000, close to the 177,000 average for the first seven months of the year, and a jobless rate of 5.5 percent. <a href="http://cbs.marketwatch.com/news/economy/economic_calendar.asp?siteid=mktw">See Economic Calendar. </a> </p>

Document Level Mark-up (Categories only)
...
<subject type="type:subject" qcode="CategoryId:1234567" creator="org:thomson"/>
<subject type="type:subject" qcode="CategoryId:1234568" creator="sys:care" why="why:machine-generated" confidence="70" relevance="65"/>
...
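A minimal sketch of how a consumer might read these document-level subject tags, using Python's standard `xml.etree.ElementTree`. The trimmed fragment, the `MIN_CONFIDENCE` cut-off, and the rule that a tag without a `confidence` attribute counts as editorially assigned are assumptions made for illustration; the slides do not specify how consumers should treat confidence scores.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the newsItem element in the slide.
NAR = "http://iptc.org/std/nar/2006-10-01/"

# Fragment trimmed from the slide's example; only the subject tags are kept.
FRAGMENT = f"""
<contentMeta xmlns="{NAR}">
  <subject type="type:subject" qcode="CategoryId:1234567" creator="org:thomson"/>
  <subject type="type:subject" qcode="CategoryId:1234568" creator="sys:care"
           why="why:machine-generated" confidence="70" relevance="65"/>
  <subject type="type:organization" qcode="OrganizationId:0123456789"/>
</contentMeta>
"""

MIN_CONFIDENCE = 60  # hypothetical threshold, not taken from the slides


def accepted_subjects(xml_text, min_confidence=MIN_CONFIDENCE):
    """Return qcodes of subjects that are unscored or meet the threshold."""
    root = ET.fromstring(xml_text)
    accepted = []
    for subject in root.findall(f"{{{NAR}}}subject"):
        confidence = subject.get("confidence")
        # Assumption: no confidence attribute means an editor applied the tag.
        if confidence is not None and int(confidence) < min_confidence:
            continue
        accepted.append(subject.get("qcode"))
    return accepted


qcodes = accepted_subjects(FRAGMENT)
print(qcodes)
```

Raising the threshold above 70 would drop the machine-generated `CategoryId:1234568` tag while keeping the two unscored tags.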

In-line Markup (Categories + Facts)
<body>
...
<p>The <toc:Category xsi:type="toc:Indicator" IndicatorId="0234551">unemployment rate</toc:Category> fell one-tenth of a percentage point to 5.4 percent, the lowest rate since October 2001, primarily because 152,000 adults dropped out of the labor force.</p>
...
<p>"We were encouraged to see the headline payroll number meet expectations after two months of disappointments," said <toc:Category xsi:type="toc:Organization" OrganizationId="0234556">SunTrust Robinson Humphrey</toc:Category> analyst <toc:Category xsi:type="toc:Person" PersonId="122456">Tobey Sommer</toc:Category>. The report, he said, "is likely to improve investor sentiment on employment-related stocks."</p>
<p> <toc:Category xsi:type="toc:Quote" QuoteId="123456781">Manpower (MAN-US)</toc:Category> shares led the gainers, rising 2.5 percent to $44.52. <...
</body>
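The in-line tags can be pulled out with a few lines of standard-library Python. This is a sketch under assumptions: the `BODY` fragment is a shortened, well-formed excerpt of the slide's paragraphs, and the convention that the identifier attribute is named after the `xsi:type` value (for example `Indicator` gives `IndicatorId`) is inferred from the example rather than confirmed by a schema.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the slide's newsItem element.
TOC = "http://data.schemas.tfn.thomson.com/Common/2007-08-01/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened excerpt of the slide's body, closed so that it parses on its own.
BODY = f"""
<body xmlns:toc="{TOC}" xmlns:xsi="{XSI}">
  <p>The <toc:Category xsi:type="toc:Indicator" IndicatorId="0234551">unemployment rate</toc:Category> fell one-tenth of a percentage point to 5.4 percent.</p>
  <p><toc:Category xsi:type="toc:Quote" QuoteId="123456781">Manpower (MAN-US)</toc:Category> shares led the gainers, rising 2.5 percent to $44.52.</p>
</body>
"""


def inline_entities(xml_text):
    """Return (kind, id, text) for every in-line toc:Category element."""
    root = ET.fromstring(xml_text)
    found = []
    for tag in root.iter(f"{{{TOC}}}Category"):
        # xsi:type holds a prefixed name such as "toc:Indicator".
        kind = tag.get(f"{{{XSI}}}type", "").split(":")[-1]
        # Assumed convention: the id attribute is "<kind>Id".
        entity_id = tag.get(f"{kind}Id")
        found.append((kind, entity_id, (tag.text or "").strip()))
    return found


entities = inline_entities(BODY)
for entity in entities:
    print(entity)
```

Each tuple gives a downstream consumer a typed identifier it can join against other streams without parsing the prose.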


Auto-categorization Technology


• Much Financial, Legal and Medical information exists in the form of textual documents

• Traditional “Editorial” processes to tag/index documents can now be augmented by algorithms that can achieve very high precision (~95%) against very large ontologies (tens of thousands of terms)

• Thomson employs a technology called CaRE (Categorization and Recommendations Engine) to do this, which originated in the Thomson Legal and Regulatory division.

• CaRE uses a set of statistics-based algorithms that are trained to understand a specific ontology as a concept scheme.


Elementized News – Summary


In-line News vs. Document level Mark-up

• Each News story is tagged at three levels.
  • Document Level: The overall story lists all the category metadata (Entities + Subjects + Genre + Sentiment) for the story.
  • In-line Entities: Each initial reference to an Entity is marked up “in-line” in the document for additional context.
  • In-line Facts: Specific Numeric Elements (e.g. US GDP or Thomson Q3 Revenue) are tagged using XML elements.

The Value of News Elements

• Sentiment tags (e.g. positive earnings or negative rating) and Subject tags provide semantic understanding of the news story
• Numeric Facts (when Elementized) are directly process-able by algorithms.
• Entity tags (e.g. Company references) allow news to be linked and correlated to Market data streams by CEP engines, for example, to make trading decisions
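To make the entity-tag point concrete, here is a toy sketch of correlating a tagged news item with recent quotes for the same entity. It is not a CEP engine and not Thomson's implementation: the class names, the 60-second window, the entity keys, and the prices are all invented for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    entity_id: str   # hypothetical key shared by news tags and market data
    ts: float        # seconds
    price: float


@dataclass(frozen=True)
class NewsItem:
    entity_id: str
    ts: float
    sentiment: str


class NewsQuoteCorrelator:
    """Keeps a sliding window of quotes per entity and joins news against it."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.quotes = defaultdict(deque)

    def on_quote(self, quote):
        window = self.quotes[quote.entity_id]
        window.append(quote)
        # Evict quotes that have fallen out of the time window.
        while window and quote.ts - window[0].ts > self.window:
            window.popleft()

    def on_news(self, news):
        """Return quotes for the same entity seen within the window before the news."""
        return [q for q in self.quotes[news.entity_id]
                if 0 <= news.ts - q.ts <= self.window]


correlator = NewsQuoteCorrelator(window_seconds=60.0)
correlator.on_quote(Quote("MAN-US", 0.0, 43.40))    # made-up prices
correlator.on_quote(Quote("MAN-US", 100.0, 43.90))
correlator.on_quote(Quote("SGP-US", 110.0, 20.10))
matches = correlator.on_news(NewsItem("MAN-US", 120.0, "positive"))
print(matches)
```

The join is only possible because the news item carries a machine-readable entity identifier; with untagged text the correlator would have nothing to key on.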


Content Sources and Content Distribution

[Diagram. Labeled components: Content Sources (News, Level 1, Level 2, Other); Content Streams; Content Distribution Fabric; Stream Adapters; CEP Engine; Stream Agents (three Event boxes); Application Logic.]


The Content Distribution Fabric

[Diagram: three fabric patterns shown on a Content Aware Network.
• Initialization: Service Provider, Service Consumer
• Synchronization: Service Provider, Service Consumer
• Intermediation: Service Provider, Service Consumer, Service Broker, Service Contract, with the interactions Register, Find and Bind]
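The Register, Find and Bind labels in the Intermediation pattern match the familiar service-broker triangle. The sketch below illustrates that general pattern only; the slide gives no API, so `ServiceBroker`, `NewsProvider`, and the contract name `"news/v1"` are hypothetical.

```python
class ServiceBroker:
    """Toy registry mapping a service contract name to its providers."""

    def __init__(self):
        self._registry = {}

    def register(self, contract, provider):
        # Register: a provider advertises that it fulfils a contract.
        self._registry.setdefault(contract, []).append(provider)

    def find(self, contract):
        # Find: a consumer asks the broker who fulfils a contract.
        providers = self._registry.get(contract, [])
        if not providers:
            raise LookupError(f"no provider registered for {contract!r}")
        return providers[0]


class NewsProvider:
    """Hypothetical provider; returns a placeholder string instead of content."""

    def request(self, topic):
        return f"news for {topic}"


broker = ServiceBroker()
broker.register("news/v1", NewsProvider())

provider = broker.find("news/v1")
# Bind: from here on the consumer talks to the provider directly.
reply = provider.request("MAN-US")
print(reply)
```

The broker is only involved in discovery; once bound, traffic between consumer and provider does not pass through it.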


“X” Marks the spot


Content-Aware Hardware Infrastructure

[Diagram: Mobile Devices, Applications and Databases connected through a Content-Aware Network running over an IP/MPLS Network.]

Modules shown:
• Routing Module
• Transformation Module
• Advanced Interface Module
• Assured Delivery Module

Figures shown:
• 500,000 routes
• 1000’s xforms / sec
• >1MM msgs / sec
• Active/active fail-over
• 0.7ms transit for a 4K XML document
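Content-based routing, the job of the Routing Module, can be illustrated in software with a toy router that delivers a message to every subscriber whose rule matches one of the message's fields. This is only a conceptual sketch: the slide describes hardware, and the rule format (exact match on a field/value pair) and all names here are assumptions.

```python
from collections import defaultdict


class ContentRouter:
    """Routes a message to subscribers by exact match on (field, value)."""

    def __init__(self):
        self._routes = defaultdict(set)  # (field, value) -> subscriber names

    def subscribe(self, subscriber, field, value):
        self._routes[(field, value)].add(subscriber)

    def route(self, message):
        """Return the set of subscribers with at least one matching rule."""
        targets = set()
        for item in message.items():
            targets |= self._routes.get(item, set())
        return targets


router = ContentRouter()
router.subscribe("desk-a", "OrganizationId", "0123456789")
router.subscribe("desk-b", "IndicatorId", "0234551")

message = {"OrganizationId": "0123456789", "urgency": "3"}
targets = router.route(message)
print(targets)
```

Because lookups are keyed on the content of each message rather than on a destination address, adding a subscriber never requires a change on the publishing side.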


New Streaming Content Sources

[Diagram. Labeled components:
• Content Sources: News, Level 1, Level 2, Financials, Deals (M&A), Research, Briefings, Filings, Estimates
• Content Streams
• Content Distribution Fabric (Intermediation, Initialization, Synchronization)
• Complex Event Processing Applications: Stream Adapters, CEP Engine, Stream Agents (three Event boxes), Application Logic]


The Entity Model vs the Relational Model

[Diagram with two layers joined by a Transform step.
• Logical: Entities, Canonical Business Entities
• Physical: Relational Data, Tables]


Changed Data Capture as an Enabling technology

[Diagram labels: Content Source; Tables; Transaction Log; Triggers or Log Mining; Changed Data Capture (publish); Transform; Content Distribution Fabric. Steps 3 and 4 are marked on the diagram.]

1: Publishing pipeline – For Databases built using a “publishing pipeline pattern”, events can be generated directly.
2: Database Triggers – Database triggers can be used to generate events, but this is not recommended.
3: Log Mining – A technique that watches the transaction log that modern databases use to capture all changes as they are made.
4: Transformation – The final step is transforming the transactional changes made to the databases into XML messages that capture the “business event” in a form that can be processed downstream.
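Step 4 can be sketched as a small function that turns one captured row change into an XML business event. The shape of the `change` record, the element names (`businessEvent`, `key`, `changed`), and the event name `EstimateRevised` are all hypothetical; the slides do not define the message format.

```python
import xml.etree.ElementTree as ET

# Hypothetical change record, in the shape a log miner might emit.
change = {
    "table": "estimates",
    "op": "UPDATE",
    "key": {"estimate_id": 42},
    "before": {"eps": "1.10"},
    "after": {"eps": "1.25"},
}


def to_business_event(change, event_name):
    """Render one row-level change as an XML business event string."""
    event = ET.Element("businessEvent", name=event_name, source=change["table"])
    key = ET.SubElement(event, "key")
    for column, value in change["key"].items():
        ET.SubElement(key, column).text = str(value)
    # One <changed> element per modified column, carrying old and new values.
    for column, new_value in change["after"].items():
        field = ET.SubElement(event, "changed", column=column)
        field.set("old", str(change["before"].get(column, "")))
        field.text = str(new_value)
    return ET.tostring(event, encoding="unicode")


event_xml = to_business_event(change, "EstimateRevised")
print(event_xml)
```

The point of the transform is that downstream consumers see a named business event with the relevant values, not a table name and a raw row image.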


Content Distribution Pattern

[Diagram. Labeled components:
• Content Source(s)
• Ingest Interface (Feeds + Authoring)
• The Content Master
• The Enterprise Database
• Metadata
• Data Interface (Content Distribution)
• Content Distribution Fabric (Intermediation, Initialization, Synchronization)
• Canonical Data Model (in XML)
• The Application Database
• Service Interface
• Human Interface]

And if you Squint just a little tiny bit …

[Diagram: TF Information Architecture v0.7. Labeled components:
• Content Source(s)
• Ingest Interface (Ripping)
• Content Master Database
• Metadata
• Data Interface (Content Distribution)
• Application Database
• Service Interface
• Human Interface]


The World of Event-Oriented Content

[Diagram. Labeled components:
• Content Sources: News, Level 1, Level 2, Orders, IOIs, Financials, Deals (M&A), Research, Briefings, Filings, Estimates, And More
• Content Streams
• Content Distribution Fabric (Intermediation, Initialization, Synchronization)
• Complex Event Processing Applications: Stream Adapters, CEP Engine, Stream Agents (three Event boxes), Application Logic]

In this new world, all content has the potential to change “transactionally”. We have lots of interesting new content streams for CEP-aware applications and a Content Distribution Fabric that itself has event stream processing capabilities.


The End


Appendix


Thomson Master Categories: Sample Structure

[Diagram: a Canonical column mapped to a Presentation column.]

Sample hierarchy labels shown:
• Economics & Trade
• Surveys & Cyclical Indexes; National Accounts; Money & Finance
• Consumer Surveys; Business Surveys; Cyclical & Activity Indexes
• GDP by Expenditure; GDP by Industry; Incomes; Investment Capital; Exports; Imports
• Money Supply
• Activity Index; Leading Indicator

Master categories shown:
• Geography: 3353 Categories
• Industry: 2482 Categories
• Market: 354 Categories
• Instrument: Security, Future, Derivative, et al
• Indicator: Economics, Market Stats
• Event: Corp Action, et al

Canonical Terms are mapped to presentation terms at the most precise Level of the hierarchy. The presentation hierarchy contributes to search relevance.


Intelligent Network Hardware Performance

Software Infrastructure vs. Hardware Infrastructure:

• Messaging Throughput (msgs/sec): Software, tens of thousands; Hardware, > 1 million
• Messaging Latency (at 50% of peak load): Software, milliseconds; Hardware, microseconds
• Content Routing (number of rules): Software, small number of thousands; Hardware, hundreds of thousands
• Content Routing Latency (with 1000+ content rules): Software, seconds; Hardware, microseconds
• Persistent Messaging (msgs/second): Software, a few thousand; Hardware, many tens of thousands
• Transformations (sustained throughput): Software, MB/sec; Hardware, GB/sec