Transcript

Permission to reprint or distribute any content from this presentation requires the prior written approval of S&P Capital IQ. Not for distribution to the public.

Copyright © 2012 Standard & Poor’s Financial Services LLC, a subsidiary of The McGraw-Hill Companies, Inc. All rights reserved.

Big Data: Wall Street Style

Jeff Sternberg, Jen Zeralli

S&P Capital IQ

February 29, 2012

Boring Financial Chart

Boring Financial Chart: less boring with labels

As of 2/24/2012.

Boring Financial Chart = kind of interesting, actually

More than $2.35 trillion invested in Information Technology over the last 10 years.

Source: S&P Capital IQ Transaction Screening. As of 2/24/2012.

How Does That Compare?

Total Investment over the last 10 years:

• Industrials = $3.49 trillion

• Energy = $2.61 trillion

• Healthcare = $2.47 trillion

• Information Technology = $2.35 trillion

• Telecom = $2.13 trillion

Source: S&P Capital IQ Transaction Screening. As of 2/24/2012.

So Is Big Data…

Big Money?

Total Investment over the last three years:

• Information Technology = $774.4 billion

• Big Data = $32.4 billion

So, 4.2%

Hey, at least we’re not just “the 1%”

Source: S&P Capital IQ Transaction Screening. As of 2/24/2012.

But What We Really Wanted To Talk About…

Strata: Making Data Work

February 29, 2012

• S&P Capital IQ: Data Is Our Product

• About Data Collection

• Standardization

• Linking: The Curious, Special Case of Entities

• Suggesting Data

• Projections

S&P Capital IQ: Data Is Our Product

Data Is Our Product

• Capital IQ started as an investment bank in 1999*

• Data = competitive advantage over other banks

• Built a database of financial investments, relationships and transactions

*Acquired by Standard and Poor’s in 2004, now part of S&P Capital IQ.

Hey, Let’s Sell That!

For illustrative purposes only. Source: S&P Capital IQ as of 2/2012.

Data Is Our Product: What We Offer

Datasets

• Financials and Valuation
• Qualitative Data
• Global Market Data
• Sell-Side Research
• Earnings Estimates
• News and Events
• Fixed Income
• Alpha and Risk Models

Use Cases

• Research Companies
• Generate Ideas
• Build Models
• Monitor Markets
• Analyze Performance
• Quantitative Research

Tools

• Web Portal
• Real-Time Workstation
• ClariFi
• Mobile
• Data Feeds
• Web Services
• Office Plug-Ins

Data Is Our Product: Who We Help

• Investment Bankers

• Asset Managers

• Private Equity Firms

• Venture Capital Firms

• Credit/Equity Analysts

• Corporations

• Consultants and Advisors

• Academia & Government

Data Is Our Product: Some Stats

Company and Person Profiles

• Companies with full quantitative data: 100,000
• Private company profiles: 2.7 million
• Professionals and board members: 4.2 million
• Quantitative data points per company: 5,000
• Qualitative data points per company: 1,500

Transactions

• M&A transactions: 425,000
• Private placements: 190,000
• Public offerings: 138,000

News and Key Developments

• Daily news articles across 184 countries: 16,000
• Key Developments (curated news): 9.7 million

As of 2/2/2012.

Data Is Our Product

DEMO

About Data Collection

To Have A Data Product, One Must First Acquire Data.

About Data Collection

Data Collection Goals

• Coverage

• Quality

• Timeliness

• Auditability

About Data Collection

• It starts with documents – 67,000 per day

• Sources

– Company filings (SEC)

– News feeds (press releases)

– Web crawling

• We store these in our document repository

About Data Collection

• Document repository

– SQL for metadata

– “Regular” file storage for docs

– Solr/Lucene indexing for fast search

– 99.3 million documents

– 240.3 million versions (files)

As of 2/24/2012.

About Data Collection

Document Repository

SQL db:

Document_tbl
– documentID int PK
– sourceID smallint FK

Version_tbl
– versionID int PK
– documentID int FK
– rootID smallint FK
– versionIndex smallint
– filePath varchar(100)

Element_tbl
– elementID int PK
– [doc/vers/rel]ID int FK
– typeID int FK
– value [strongly typed]

ObjectRel_tbl
– relID int PK
– documentID int FK
– objectID int FK

+ Filesystem: html, pdf, text, sgml, …
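To make the document/version split concrete, here is a minimal sketch using Python's built-in sqlite3 module. It covers only Document_tbl and Version_tbl, maps the slide's int/smallint/varchar types to SQLite's INTEGER/TEXT, and leaves out rootID because the slide does not say what it points to. The helper names and sample file paths are assumptions made for illustration, not part of the actual system.

```python
import sqlite3


def build_repository():
    """Create an in-memory database with two of the tables named on the slide."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Document_tbl (
            documentID INTEGER PRIMARY KEY,
            sourceID   INTEGER NOT NULL
        );
        CREATE TABLE Version_tbl (
            versionID    INTEGER PRIMARY KEY,
            documentID   INTEGER NOT NULL REFERENCES Document_tbl(documentID),
            versionIndex INTEGER NOT NULL,
            filePath     TEXT NOT NULL  -- the file itself lives on the filesystem
        );
        """
    )
    return conn


def latest_file_path(conn, document_id):
    """Return the path of the newest version of a document, or None."""
    row = conn.execute(
        "SELECT filePath FROM Version_tbl "
        "WHERE documentID = ? ORDER BY versionIndex DESC LIMIT 1",
        (document_id,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical sample data: one document with two stored versions.
conn = build_repository()
conn.execute("INSERT INTO Document_tbl VALUES (1, 10)")
conn.executemany(
    "INSERT INTO Version_tbl VALUES (?, ?, ?, ?)",
    [(1, 1, 0, "docs/1/v0.html"), (2, 1, 1, "docs/1/v1.pdf")],
)
```

Keeping only metadata in SQL and the files on disk is what lets one document carry several versions without storing large blobs in the database.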

About Data Collection

• Content search

– Which docs have relevant content?

– Search rules drive collection workflow

– 1000+ search rules per doc

– 65,000+ automated searches per day
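As a rough illustration of how search rules could decide which documents enter the collection workflow, the sketch below runs a small set of named regular-expression rules against a document's text. The rule names and patterns are invented for this example; the production system uses Solr/Lucene searches, which this plain-Python version does not attempt to reproduce.

```python
import re

# Hypothetical rules: each name maps to a pattern that flags relevant content.
SEARCH_RULES = {
    "dividend": re.compile(r"\bdividend\b", re.IGNORECASE),
    "m_and_a": re.compile(r"\b(merger|acquisition)\b", re.IGNORECASE),
    "executive_change": re.compile(r"\b(appoints|resigns)\b", re.IGNORECASE),
}


def matching_rules(text):
    """Return the sorted names of all rules that match the document text."""
    return sorted(name for name, pattern in SEARCH_RULES.items() if pattern.search(text))
```

Each matching rule name could then be used to create a work item for the corresponding collection process.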

About Data Collection

• Collection workflow

– Core engine that routes work items

– Organized into Processes, Stages, Statuses

– Prioritization based on usage (and others)

– Simple GetNext(), Commit() API

– 177.8 million Commits in 2011

– Avg. 130K+ Commits per day in Financials

As of 2/24/2012.
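The GetNext()/Commit() idea can be sketched as a small priority queue. This is an assumption-laden toy, not the real engine: the stage names, the rule that a lower number means higher priority, and the class and method names are all made up for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical stages within a single process.
STAGES = ["extraction", "review", "done"]


@dataclass(order=True)
class WorkItem:
    priority: int                      # lower value = more urgent (assumption)
    seq: int                           # tie-breaker so equal priorities stay FIFO
    doc_id: int = field(compare=False)
    stage: str = field(compare=False, default=STAGES[0])
    status: str = field(compare=False, default="pending")


class WorkflowEngine:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()
        self.commit_count = 0

    def add(self, doc_id, priority):
        heapq.heappush(self._heap, WorkItem(priority, next(self._seq), doc_id))

    def get_next(self):
        """Hand out the most urgent pending item, or None if nothing is queued."""
        if not self._heap:
            return None
        item = heapq.heappop(self._heap)
        item.status = "in_progress"
        return item

    def commit(self, item):
        """Record finished work and route the item to its next stage."""
        self.commit_count += 1
        item.stage = STAGES[STAGES.index(item.stage) + 1]
        if item.stage == "done":
            item.status = "complete"
        else:
            item.status = "pending"
            item.seq = next(self._seq)
            heapq.heappush(self._heap, item)
        return item
```

A worker, automated or human, only ever calls get_next() and commit(); routing and prioritization stay inside the engine.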

About Data Collection

• Collection process

– Automated extraction

– Manual collection

– 1000s of quality checks: basic integrity, variance from prior period

– All data stored “as reported” with Doc ID
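The two kinds of quality check named above can be sketched in a few lines. The field names, the 1% balance tolerance and the 50% variance threshold are illustrative assumptions, not the checks actually in use.

```python
def quality_issues(record, prior=None, max_change=0.5, tolerance=0.01):
    """Return a list of human-readable problems found in one reported record."""
    issues = []

    # Auditability: every value should trace back to a source document.
    if not record.get("doc_id"):
        issues.append("missing source document ID")

    # Basic integrity: assets should equal liabilities plus equity.
    assets = record["total_assets"]
    implied = record["total_liabilities"] + record["total_equity"]
    if abs(assets - implied) > tolerance * max(abs(assets), 1):
        issues.append("balance sheet does not balance")

    # Variance from prior period: flag large swings for review.
    if prior and prior["total_assets"]:
        change = abs(assets - prior["total_assets"]) / abs(prior["total_assets"])
        if change > max_change:
            issues.append("total_assets changed by more than {:.0%}".format(max_change))

    return issues
```

A flagged record is not necessarily wrong; a large swing may be real, so checks like these would typically route the item to a person rather than reject it.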

Standardization

Compare “apples to apples” (or Facebooks)

For illustrative purposes only. Source: S&P Capital IQ as of 2/24/2012.

Linking: The Curious, Special Case Of Entities

Linking: Managing Entities

• Entities we like to think about

– Companies (public, private, investment firms)

– Government agencies (the Fed)

– Governments (munis, countries, the EU)

– Securities (equity or debt, issued by the above)

– Indices, funds, rates, other aggregations

– People (executives, board members, investors, shareholders)

Linking: Managing Entities

• Goal: Blend entity data from different sources

– Ex: unified view of stock price and ratings

• First: What’s the identifier? Or identifiers?

– Name, ticker, CUSIP®, others…

• Next: Can we auto-link?

– Use historical links to make future links easier

• Quality checks

– Look for outlier cases

• Remember that things change over time

– So entity links create a time series
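The point that entity links form a time series can be sketched as identifier links with validity dates. The class, the field names and the half-open date convention are assumptions for illustration; the entity IDs and the ticker "ABC" in the usage note are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IdentifierLink:
    id_type: str                     # e.g. "ticker", "cusip", "name"
    value: str
    entity_id: int
    valid_from: date
    valid_to: Optional[date] = None  # None means the link is still current


def resolve(links, id_type, value, as_of):
    """Return the entity an identifier pointed to on a given date, or None."""
    for link in links:
        if (
            link.id_type == id_type
            and link.value == value
            and link.valid_from <= as_of
            and (link.valid_to is None or as_of < link.valid_to)
        ):
            return link.entity_id
    return None
```

For example, if ticker "ABC" belonged to entity 1 until 2010 and to entity 2 afterwards, resolving it as of 2005 and as of 2012 gives different answers, which is why a link without dates is not enough.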

An Example Of Difficult Entity Linking: Public Ownership

• Tracks portfolio holdings and values over time

– Example: Vanguard vs. Fidelity Funds

• Many disparate sources

– Reported from both “owner” and “owned” side

– Varied requirements by exchange (50+ countries)

• Many different entity types

– People, Institutions, Pension Funds, Mutual Funds…

– Common Equity, Derivatives, Options…

• Many different security identifiers

– CUSIP®, ISIN, SEDOL, Ticker, Name, etc.

Suggesting Data

• Goal: Platform that learns from user behavior

• Suggest company profiles that the user may be interested in viewing

• Use “data exhaust” to build better products

Suggesting Data

• Challenges

– We’re an impartial data platform

– We may not provide investment advice!

– Clients are super-secret about their deals

– Ergo, can’t use collaborative filtering approach

Suggesting Data

• Advantage: We have lots of great data!

• Key developments

– Curated news product

– “Get smart” on a company

– News searches catch interesting press releases

– In-house researchers ensure quality entity linking and event typing (categorization)

Suggesting Data

• Key development event ranking

– Popular & infrequent events = interesting

– Example: Dividend increase is more noteworthy than dividend affirmation

• User selectivity

– Based on clicks

– Sector, region, company type

Suggesting Data

• Score each suggestion for each user based on signals via Hadoop + Hive

• Remove items that the user has already seen!

• Present in a “widget” on the “dashboard”

• Measure the clickthroughs

• Rinse, wash, repeat
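The scoring loop above runs on Hadoop + Hive in production; the sketch below swaps in plain in-memory Python purely to show the shape of the calculation. The weighting formula (clicks per occurrence of an event type, multiplied by the user's sector affinity), the function names and the sample company names in the usage note are all assumptions, not the actual signals.

```python
from collections import defaultdict


def event_weights(event_stats):
    """Weight each event type by clicks per occurrence.

    event_stats maps event type -> (clicks, occurrences). Types that are
    clicked often but occur rarely get the highest weight.
    """
    return {t: clicks / occ for t, (clicks, occ) in event_stats.items() if occ}


def suggest(events, weights, sector_affinity, seen, top_n=3):
    """Rank unseen companies by their best-scoring recent event.

    events is an iterable of (company, sector, event_type) tuples;
    sector_affinity maps sector -> a 0..1 score derived from the user's clicks.
    """
    scores = defaultdict(float)
    for company, sector, event_type in events:
        if company in seen:  # remove items the user has already seen
            continue
        score = weights.get(event_type, 0.0) * sector_affinity.get(sector, 0.0)
        scores[company] = max(scores[company], score)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [company for company, _ in ranked[:top_n]]
```

With a rare, heavily clicked "dividend_increase" type and a common "dividend_affirmation" type, a technology-focused user would see a technology company with a dividend increase ranked first. Measured clickthroughs on the widget would then feed back into both inputs.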

Companies You May Be Interested In

For illustrative purposes only.

Projections

As of 2/24/2012.

Projections – Simple Growth Rates

Transaction Valuation      First Year ($ billion)   3-Year Total ($ billion)
Information Technology     209.8                    774.4
Big Data                   5.0                      32.4

• Let S represent the first year

• Let T represent the 3-year total

• Let x represent the yearly growth rate (%)

• Solve for x:
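The equation itself did not survive in this transcript. One reading that reproduces the rates shown on the next slide (21.5% and 89.4%) is that spending grows by a constant rate x each year, so S + S(1+x) + S(1+x)^2 = T. Writing g = 1 + x gives the quadratic g^2 + g + (1 - T/S) = 0, which the sketch below solves. The function name is made up for the example.

```python
import math


def yearly_growth_rate(first_year, three_year_total):
    """Solve S + S(1+x) + S(1+x)^2 = T for x, assuming constant yearly growth."""
    ratio = three_year_total / first_year
    # Quadratic in g = 1 + x: g^2 + g + (1 - ratio) = 0; take the positive root.
    g = (-1 + math.sqrt(1 - 4 * (1 - ratio))) / 2
    return g - 1


it_rate = yearly_growth_rate(209.8, 774.4)       # Information Technology
big_data_rate = yearly_growth_rate(5.0, 32.4)    # Big Data
print("IT: {:.1%}, Big Data: {:.1%}".format(it_rate, big_data_rate))
```

Only the positive root is meaningful here, since a negative g would mean spending changes sign from year to year.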

As of 2/24/2012.

Projections – Simple Growth Rates

Transaction Valuation      First Year ($ billion)   3-Year Total ($ billion)   Yearly Growth Rate (%)
Information Technology     209.8                    774.4                      21.5%
Big Data                   5.0                      32.4                       89.4%

• When will Big Data catch up with IT?

• Let y be the number of years this will take

• Solve for y:
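The slide's equation for y is also missing from the transcript, so the following is only one plausible setup: assume both series keep compounding from their first-year values at the rates in the table, and find y where 5.0 × (1.894)^y = 209.8 × (1.215)^y. Taking logarithms gives y = ln(209.8 / 5.0) / ln(1.894 / 1.215). The function name and the choice of the first year as the starting point are assumptions.

```python
import math


def years_to_catch_up(small_start, small_rate, large_start, large_rate):
    """Years until the faster-growing series equals the larger one.

    Solves small_start * (1 + small_rate)**y == large_start * (1 + large_rate)**y.
    """
    if small_rate <= large_rate:
        raise ValueError("the smaller series must grow faster to catch up")
    return math.log(large_start / small_start) / math.log(
        (1 + small_rate) / (1 + large_rate)
    )


# Big Data vs. Information Technology, using the figures from the table.
y = years_to_catch_up(5.0, 0.894, 209.8, 0.215)
print("Catch-up after about {:.1f} years".format(y))
```

Under these assumptions the answer comes out at a little over eight years from the first year, but it is very sensitive to both rates holding steady, which simple growth projections rarely do.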

As of 2/24/2012.

So Is Big Data…

Big Money? YES!

Questions?

• We’re hiring!

• We want to learn from YOU!

S&P Capital IQ
http://www.spcapitaliq.com

Jeff Sternberg [email protected]

Jen Zeralli [email protected]

Copyright © 2012 by Standard & Poor’s Financial Services LLC (S&P), a subsidiary of The McGraw-Hill Companies, Inc. All rights reserved. No content (including ratings, credit-related analyses and data, model, software or other application or output therefrom) or any part thereof (Content) may be modified, reverse engineered, reproduced or distributed in any form by any means, or stored in a database or retrieval system, without the prior written permission of S&P. The Content shall not be used for any unlawful or unauthorized purposes. S&P, its affiliates, and any third-party providers, as well as their directors, officers, shareholders, employees or agents (collectively S&P Parties) do not guarantee the accuracy, completeness, timeliness or availability of the Content. S&P Parties are not responsible for any errors or omissions, regardless of the cause, for the results obtained from the use of the Content, or for the security or maintenance of any data input by the user. The Content is provided on an “as is” basis. S&P PARTIES DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR USE, FREEDOM FROM BUGS, SOFTWARE ERRORS OR DEFECTS, THAT THE CONTENT’S FUNCTIONING WILL BE UNINTERRUPTED OR THAT THE CONTENT WILL OPERATE WITH ANY SOFTWARE OR HARDWARE CONFIGURATION. In no event shall S&P Parties be liable to any party for any direct, indirect, incidental, exemplary, compensatory, punitive, special or consequential damages, costs, expenses, legal fees, or losses (including, without limitation, lost income or lost profits and opportunity costs) in connection with any use of the Content even if advised of the possibility of such damages. 
Credit-related analyses, including ratings, and statements in the Content are statements of opinion as of the date they are expressed and not statements of fact or recommendations to purchase, hold, or sell any securities or to make any investment decisions. S&P assumes no obligation to update the Content following publication in any form or format. The Content should not be relied on and is not a substitute for the skill, judgment and experience of the user, its management, employees, advisors and/or clients when making investment and other business decisions. S&P’s opinions and analyses do not address the suitability of any security. S&P does not act as a fiduciary or an investment advisor. While S&P has obtained information from sources it believes to be reliable, S&P does not perform an audit and undertakes no duty of due diligence or independent verification of any information it receives. S&P keeps certain activities of its business units separate from each other in order to preserve the independence and objectivity of their respective activities. As a result, certain business units of S&P may have information that is not available to other S&P business units. S&P has established policies and procedures to maintain the confidentiality of certain non–public information received in connection with each analytical process. S&P may receive compensation for its ratings and certain credit-related analyses, normally from issuers or underwriters of securities or from obligors. S&P reserves the right to disseminate its opinions and analyses. S&P's public ratings and analyses are made available on its Web sites, www.standardandpoors.com (free of charge), and www.ratingsdirect.com and www.globalcreditportal.com (subscription), and may be distributed through other means, including via S&P publications and third-party redistributors. Additional information about our ratings fees is available at www.standardandpoors.com/usratingsfees. 
STANDARD & POOR’S and S&P are registered trademarks of Standard & Poor’s Financial Services LLC.

www.standardandpoors.com

