Different levels of information in financial data: an overview of some widely investigated databases
Salvatore Miccichè
http://lagash.dft.unipa.it
Observatory of Complex Systems
Dipartimento di Fisica e Tecnologie Relative, Università degli Studi di Palermo
GIACS Conference “Data in Complex Systems” - Palermo, 7-9 April 2008
Overview of Databases
Observatory of Complex Systems
S. Miccichè, F. Lillo, R. N. Mantegna, M. Tumminello, G. Vaglica, C. Coronnello, M. Spanò
Econophysics, Bioinformatics, Stochastic Processes
We will present an overview of some widely investigated financial and economic databases.
Most financial databases include data about transaction prices, bid and ask quotes, and volume of transactions.
In some financial databases the information about the coded identity of the market members acting on the order book is also available.
The economic databases we will discuss contain financial and economic information on over ten million public and private companies operating in Europe and the USA.
What do we do with them?
Why physicists are interested in financial markets
Financial markets can be considered as model complex systems:
• Many agents/factors
• Interactions are not always clear/known (no equations, Hamiltonians?)
G. Parisi, cond-mat/0205297, Complex Systems: a Physicist's Viewpoint: “A system is complex if its behaviour crucially depends on the details of the system”
Econophysics is a recently established discipline whose main aim is to model some of the stylized facts empirically observed in the study of financial markets.
Overview of Databases: financial databases
Methods of Statistical Physics can be applied:
• Stochastic processes (Brownian motion, superdiffusivity, power-law tails, long-range correlation, ...)
• Scaling
• Network theory, clustering techniques, random matrices, ...
• Agent-based models, ...
• ...
Last but not least: there is a huge amount of data!
1995: 1 CD per month
2003: 12-13 CDs per month
FINANCIAL databases: TAQ, Euronext, BI, TSE, LSE, BME, MTS
Trade and Quote (NYSE)
- 1995 6.3 Gb
- 1996 8.1 Gb
- 1997 13.5 Gb
- 1998 20.0 Gb
- 1999 27.1 Gb
- 2000 63.1 Gb
- 2001 approx 110 Gb
- 2002 approx 180 Gb
- 2003 approx 215 Gb
Rebuild Order Book - LSE
- 2002 19.5 Gb (now also 2004, 2005, 2006)
OPEN BOOK - NYSE
- 2002 approx 110 Gb
Tokyo (TSE)
- 2002 trades 1.6 Gb
EURONEXT
- 2002 6.7 Gb
MTS
- 4/2003-3/2004 4.0 Gb
MILANO (BI)
- 2002 trades 2.14 Gb
- 2002 best quotes 2.43 Gb
Size: 1 Tb
Transaction prices
Quotes
To start with
Given a price S(t) at time t, the price return r_Δt(t) over a time horizon Δt is:
$$ r_{\Delta t}(t) = \frac{S(t+\Delta t) - S(t)}{S(t)} $$
ARBITRAGE
Overview of Databases: financial databases – transaction prices - synchronized
Multivariate description
COMOVEMENTS
$$ r_i(t) = \ln S_i(t+\Delta t) - \ln S_i(t) \approx \frac{S_i(t+\Delta t) - S_i(t)}{S_i(t)} $$
Δt = op-cl, 1995-2003
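The two return definitions on these slides can be compared numerically. The following is a minimal sketch in plain Python; the function names and the toy price list are illustrative assumptions, not part of the talk.

```python
import math

def simple_returns(prices):
    """r(t) = (S(t+dt) - S(t)) / S(t) for consecutive prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def log_returns(prices):
    """r(t) = ln S(t+dt) - ln S(t) for consecutive prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# Hypothetical prices, one per time step dt (not taken from any database).
prices = [100.0, 101.0, 100.5, 102.0]

for s, l in zip(simple_returns(prices), log_returns(prices)):
    # For small price changes the two definitions nearly coincide.
    print(f"simple={s:+.5f}  log={l:+.5f}")
```

For returns of the order of one percent the two values differ by roughly half the squared return, which is why the slides can treat them as interchangeable at short horizons.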
We are looking for a possible collective stochastic dynamics and/or links between price returns / volatilities of different stocks.
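PRICE RETURNS CLUSTERS
Cross-correlation clustering procedure based on a similarity measure:
$$ d_{ij} = \sqrt{2\,(1-\rho_{ij})} $$
$$ \rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}} $$
where r_i are the price return time series.
Subdominant ultrametric distance.
Hierarchical Tree (HT) and Minimum Spanning Tree (MST).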
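The chain from return series to correlation, distance and MST can be sketched in plain Python. This is only an illustration: the helper names and the toy return series are assumptions, and Prim's algorithm is used here as one standard way to obtain an MST (the slides do not say which algorithm was used).

```python
import math

def pearson(x, y):
    """Correlation coefficient rho_ij of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)

def distance_matrix(returns):
    """d_ij = sqrt(2 (1 - rho_ij)) for every pair of series."""
    n = len(returns)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rho = pearson(returns[i], returns[j])
            # max() guards against tiny negative values from rounding.
            d[i][j] = d[j][i] = math.sqrt(max(0.0, 2.0 * (1.0 - rho)))
    return d

def minimum_spanning_tree(d):
    """Prim's algorithm; returns the N-1 edges as (i, j, d_ij)."""
    n = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: d[e[0]][e[1]],
        )
        edges.append((i, j, d[i][j]))
        in_tree.add(j)
    return edges

# Hypothetical return series for four stocks (illustrative numbers only).
returns = [
    [0.010, -0.020, 0.015, 0.000, -0.010],
    [0.012, -0.018, 0.010, 0.002, -0.012],   # moves with stock 0
    [-0.010, 0.020, -0.012, 0.001, 0.008],   # moves against stock 0
    [0.003, 0.001, -0.002, 0.004, -0.001],
]
d = distance_matrix(returns)
mst = minimum_spanning_tree(d)
print(mst)
```

The distance ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated), so strongly comoving stocks end up adjacent in the tree.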
At any Δt
Multivariate description
• Compare the dynamics of price returns of stocks traded at different exchanges
 - industry sector identification at different time horizons
 - sector dynamics
 - LSE and NYSE
 - are there common (stylized) facts?
Single Linkage Clustering Analysis
MST construction (N-1 links)
At each step, when two elements, or one element and a cluster, or two clusters p and q merge into a wider single cluster t, the distance d_tr between the new cluster t and any cluster r is recursively given by d_tr = min{d_pr, d_qr}, i.e. the distance between clusters t and r is the shortest distance between any element of t and any element of r.
Planar Maximally Filtered Graph (3(N-2) links)
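The recursive rule d_tr = min{d_pr, d_qr} can be written out directly. The sketch below is a deliberately simple (not efficient) plain-Python version; the function name, the cluster numbering and the toy distance matrix are assumptions made for illustration. It also records, for every pair of elements, the height at which they first join, which is the subdominant ultrametric distance.

```python
def single_linkage(d):
    """Single linkage clustering on a symmetric distance matrix d.

    Returns (merges, ultra): merges is a list of (p, q, height) and
    ultra[i][j] is the height at which elements i and j first share a cluster.
    """
    n = len(d)
    clusters = {i: [i] for i in range(n)}          # cluster id -> members
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    ultra = [[0.0] * n for _ in range(n)]
    merges = []
    next_id = n
    while len(clusters) > 1:
        (p, q), h = min(dist.items(), key=lambda kv: kv[1])
        for a in clusters[p]:
            for b in clusters[q]:
                ultra[a][b] = ultra[b][a] = h
        merges.append((p, q, h))
        t = next_id
        next_id += 1
        new_dist = {}
        for r in clusters:
            if r in (p, q):
                continue
            d_pr = dist[(min(p, r), max(p, r))]
            d_qr = dist[(min(q, r), max(q, r))]
            new_dist[(r, t)] = min(d_pr, d_qr)     # d_tr = min{d_pr, d_qr}
        dist = {k: v for k, v in dist.items() if p not in k and q not in k}
        dist.update(new_dist)
        clusters[t] = clusters.pop(p) + clusters.pop(q)
    return merges, ultra

# Hypothetical distances between four elements.
d = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [5.0, 6.0, 3.0, 0.0],
]
merges, ultra = single_linkage(d)
print(merges)
print(ultra)
```

The N-1 merge heights coincide with the edge lengths of the MST built on the same matrix, which is why the hierarchical tree and the MST carry the same hierarchical information.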
Synchronized data
We consider:
NYSE - the 100 most capitalized stocks in 2002.
LSE - the 92 most traded stocks in 2002.
We consider high-frequency (intraday) data. Transactions do not occur at the same time for all stocks.
We have to synchronize/homogenize the data:
NYSE: 5 min, 15 min, 30 min, 65 min, 195 min, 1 day (trading time 6h30’)
LSE: 5 min, 15 min, 51 min, 102 min, 255 min, 1 day (trading time 8h30’)
Trades And Quotes (TAQ) database maintained by NYSE (1995-2003)
Rebuild Order Book (ROB) database maintained by LSE (2002)
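One common way to put asynchronous trades on a common time grid is previous-tick sampling: at each grid time, take the price of the last trade at or before it. The slides do not state which synchronization rule was used, so the sketch below is an assumption, as are the function name and the toy data.

```python
from bisect import bisect_right

def previous_tick_sample(times, prices, start, end, step):
    """Sample prices every `step` seconds in [start, end].

    Each grid time gets the price of the last trade at or before it,
    or None if no trade has occurred yet. `times` must be sorted.
    """
    sampled = []
    for t in range(start, end + 1, step):
        k = bisect_right(times, t)        # number of trades with time <= t
        sampled.append(prices[k - 1] if k > 0 else None)
    return sampled

# Hypothetical trades: seconds since the open, and trade prices.
times = [10, 200, 400, 900]
prices = [50.0, 50.5, 50.2, 51.0]
grid_prices = previous_tick_sample(times, prices, start=0, end=900, step=300)
print(grid_prices)

# The horizons listed above divide the trading day into equal windows.
NYSE_DAY_MIN, LSE_DAY_MIN = 390, 510      # 6h30' and 8h30' in minutes
nyse_windows = {m: NYSE_DAY_MIN // m for m in (5, 15, 30, 65, 195)}
lse_windows = {m: LSE_DAY_MIN // m for m in (5, 15, 51, 102, 255)}
print(nyse_windows, lse_windows)
```

The less familiar horizons (65 and 195 minutes for NYSE, 51, 102 and 255 minutes for LSE) are exactly the ones that split the trading day into a whole number of equal windows.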
The set of investigated stocks
Sector                       NYSE (100 stocks)   LSE (92 stocks)
01 Technology                 8                    4
02 Financial                 24                   20
03 Energy                     3                    3
04 Consumer non-Cyclical     11                   12
05 Consumer Cyclical          2                   10
06 Healthcare                12                    6
07 Basic Materials            6                    5
08 Services                  20                   19
09 Utilities                  2                    6
10 Capital Goods              6                    5
11 Transportation             2                    2
12 Conglomerates              4                    0
Daily data: SLCA – hierarchy & topology
[Figures: NYSE day, LSE day. High level of correlation in both.]
Daily data: PMFG
[Figures: NYSE day, LSE day]
5-min data: SLCA – hierarchy & topology
[Figures: LSE 5-min, NYSE 5-min. FINANCIAL 04 out of 20; SERVICES 02 out of 19]
5-minute data: PMFG
[Figures: LSE 5-min, NYSE 5-min]
Conclusions
• The system is more hierarchically/topologically structured at daily time horizons, confirming that the market needs a finite amount of time to assess the correct degree of cross-correlation between pairs of stocks.
• Financial and Energy seem to be structured even at short time horizons (LSE more than NYSE).
overnight
Overview of Databases: financial databases – transaction prices – tick-by-tick
A possible use of tick-by-tick data
• The “extreme events” we consider are related to the first crossing of either of the two barriers.
• The Mean Exit Time (MET) is simply the expected value of the time interval needed for this first crossing.
Financial interest: the MET provides a timescale for market movements.
[Figure: GE stock; barrier label 2L; dashed black = original data, magenta = shuffled returns only]
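An empirical MET estimate can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the figure: the barriers are taken to be symmetric at distance L above and below the starting value (an interval of width 2L), every tick is used as a starting point, and the function names and toy path are hypothetical. The second helper builds a surrogate path by shuffling the increments, in the spirit of the "shuffled returns" curve.

```python
import random

def mean_exit_time(times, x, L):
    """Average time needed for x to move at least L away from its starting value.

    times: tick times; x: log-price (or cumulative return) at each tick.
    Starting points that never reach a barrier are ignored.
    """
    exits = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(x[j] - x[i]) >= L:      # first crossing of either barrier
                exits.append(times[j] - times[i])
                break
    return sum(exits) / len(exits) if exits else None

def shuffled_path(x, seed=0):
    """Surrogate path with the same increments in random order."""
    increments = [b - a for a, b in zip(x, x[1:])]
    random.Random(seed).shuffle(increments)
    path = [x[0]]
    for step in increments:
        path.append(path[-1] + step)
    return path

# Hypothetical tick data (times in seconds, x in arbitrary units).
times = [0, 1, 2, 3, 4, 5]
x = [0.0, 0.5, 1.2, 0.4, -0.9, 0.3]
met = mean_exit_time(times, x, L=1.0)
met_shuffled = mean_exit_time(times, shuffled_path(x), L=1.0)
print(met, met_shuffled)
```

Comparing the MET of the original path with that of shuffled surrogates indicates how much of the timescale is due to temporal correlations rather than to the distribution of returns alone.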
QUOTES
Time between consecutive quotes
Overview of Databases: financial databases – bonds
Another database: MTS
These are data on bonds traded in the European markets and managed by the MTS Group, which is based in Italy. The bonds we have considered are those continuously traded in Italy throughout the year from April 2003 to March 2004.
Overview of Databases: financial databases – order book data
Order book data
The state of the complete order book can be visualized at any time by using a schematic representation.
Order book data allow one to follow the details of price formation in a financial market.
[Figure: the real behavior over a short time for a normal stock. Legend: - sell limit orders; - buy limit orders; ○ sell market orders; x buy market orders. Axes: time (s), price x 100]
Order book data: time evolution
Representation of the order book focusing on the time dependence of order flow (the plot refers to a stock traded at the London Stock Exchange).
A very special day (20 Sept 2002)
(Coded) Identity
Tick-by-tick data, volume and identity
In the LSE and BME databases the information about the coded identity of the market members (brokerages) acting on the order book is also available.
For LSE we obtained these data under a special confidentiality agreement: e.g. people who use these data MUST be traceable!
For BME the identity is transparent in the market.
Inventory variation: the value (i.e. price times volume) of an asset exchanged as a buyer minus the value exchanged as a seller in a given time interval.
Ingredients: price (2001-2004), volume, sign (+1 for buys, -1 for sells).
In this talk, we focus on a time interval of 1 trading day.
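The definition above amounts to summing sign × price × volume over all trades of a firm within the chosen interval. A minimal sketch for daily intervals follows; the record layout (day, firm, sign, price, volume), the firm codes and the numbers are hypothetical and do not reflect the actual BME or LSE file formats.

```python
from collections import defaultdict

def inventory_variation(trades):
    """Daily inventory variation per firm.

    trades: iterable of (day, firm, sign, price, volume), with sign = +1
    when the firm buys and -1 when it sells.
    Returns {(firm, day): value bought minus value sold}.
    """
    v = defaultdict(float)
    for day, firm, sign, price, volume in trades:
        v[(firm, day)] += sign * price * volume
    return dict(v)

# Hypothetical trades for two coded firms "A" and "B".
trades = [
    ("2003-01-02", "A", +1, 10.0, 100),
    ("2003-01-02", "A", -1, 10.5, 40),
    ("2003-01-02", "B", -1, 10.0, 100),
    ("2003-01-03", "A", -1, 11.0, 50),
]
v = inventory_variation(trades)
print(v)
```

A positive value means the firm was a net buyer (in value) on that day, a negative value a net seller.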
i = 1, …, 69 (BBVA)
Most active: BBVA, TEF, SAN, REP
Inventory variation correlation matrix obtained by sorting the firms in the rows and columns according to the correlation of their inventory variation with the price return.
[Figure: BBVA 2003]
[Figure: correlation matrix of the 69 firms after ordering, showing “trending” firms (momentum traders), “reversing” firms (contrarian traders) and “noisy” firms]
A classification of brokerages/firms, obtained by considering the correlation between a firm's inventory variation and the price return of the traded stock.
BBVA 2003
“Reversing”: negative correlation between inventory variation and price return.
“Noisy”: correlation between inventory variation and price return within noise confidence levels.
“Trending”: positive correlation between inventory variation and price return.
Number of firms in the group (Reversing, Noisy, Trending): 37, 21, 11
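The three-way classification can be sketched as a threshold test on the correlation between a firm's inventory variation and the price return. The slides only say "within noise confidence levels"; the threshold used below, n_sigma / sqrt(N) with n_sigma = 2, is an assumption made for illustration, as are the function names and the toy series.

```python
import math

def pearson(x, y):
    """Correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(vx * vy)

def classify_firm(inventory, returns, n_sigma=2.0):
    """Label a firm as 'trending', 'reversing' or 'noisy'."""
    rho = pearson(inventory, returns)
    threshold = n_sigma / math.sqrt(len(returns))   # assumed noise level
    if rho > threshold:
        return "trending"
    if rho < -threshold:
        return "reversing"
    return "noisy"

# Hypothetical daily price returns and inventory variations of three firms.
returns = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.005, -0.015, 0.01]
firm_buys_rises = [1000.0 * r for r in returns]
firm_sells_rises = [-1000.0 * r for r in returns]
firm_unrelated = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]

labels = [classify_firm(v, returns)
          for v in (firm_buys_rises, firm_sells_rises, firm_unrelated)]
print(labels)
```

With only nine observations the assumed threshold is large (about 0.67), so most firms would fall in the noisy group; with a full year of daily data it drops to roughly 0.13.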
Trending
- Positively correlated with price return
- Large institutions
- Acting on long time scales, splitting large orders to build a portfolio position while minimizing price impact
- Their trading activity tends to be localized in time
Reversing
- Negatively correlated with price return
- Large and small institutions
- Typically acting on a short time scale, continuously reverting their position in the market
- Their trading activity tends to be homogeneous in time
Noisy
- Poorly correlated with price return
- Large and small institutions
ECONOMIC databases: Amadeus, Compustat, INPS
Overview of Databases: economic databases
AMADEUS is a comprehensive, pan-European database containing financial information on over 10 million public and private companies in 38 European countries.
Standardised annual accounts (for up to 10 years), consolidated and unconsolidated, financial ratios, activities and ownership for approximately 9 million companies throughout Europe, including Eastern Europe.
A standard company report includes: 24 balance sheet items, 25 profit and loss account items and 26 ratios, descriptive information including trade description and activity codes (NACE 1, NAICS or US SIC can be used across the database), ownership information.
A news module contains information from Reuters, Dow Jones, the FT as well as M&A news and rumours from our own ZEPHYR.
AMADEUS also contains security and price information and links to an executive report with integral graphs plus a report comparing the financials of the company’s default peer group.
The growth of a firm was first described by Gibrat in 1931. His model concerns the logarithmic growth rate
$$ r_i(t) = \ln S_i(t+\Delta t) - \ln S_i(t) $$
where S(t) is some proxy of firm size: total assets, employees, sales, revenue, turnover, …
The Gibrat model is based on:
1) Law of proportionate effect: r_i(t) is independent of the initial size of the firm
2) r_i(t) and r_j(t) are uncorrelated
By making use (i) of the Central Limit Theorem and (ii) of the additional assumption of independence, one can show that the logarithmic growth rate should be normally distributed (equivalently, the growth ratio S_i(t+Δt)/S_i(t) should be log-normally distributed).
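The Central Limit Theorem argument can be sketched as follows. As an illustration (not stated on the slide), assume the horizon Δt is split into n elementary steps t_0 = t, ..., t_n = t + Δt, with independent growth shocks ε_i(t_k) whose logarithms ln(1 + ε_i(t_k)) have mean μ and finite variance σ²; these symbols are introduced here only for the sketch.

```latex
\begin{align}
S_i(t_{k+1}) &= S_i(t_k)\,\bigl(1+\varepsilon_i(t_k)\bigr), \qquad k = 0,\dots,n-1, \\
r_i(t) &= \ln S_i(t+\Delta t) - \ln S_i(t) = \sum_{k=0}^{n-1} \ln\bigl(1+\varepsilon_i(t_k)\bigr), \\
r_i(t) &\sim \mathcal{N}\bigl(n\mu,\; n\sigma^2\bigr) \quad \text{approximately, for large } n, \\
\frac{S_i(t+\Delta t)}{S_i(t)} &= e^{\,r_i(t)} \quad \text{is then approximately log-normal.}
\end{align}
```

The proportionate-effect assumption enters in the first line: the shock multiplies the current size, so its distribution does not depend on the size itself.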
Logarithmic growth rate
[Figure: AMADEUS database; all data are aggregated; IC fixed. Log-normal, Laplacian, what else?]
Z-transform
$$ z = \frac{r - \langle r \rangle}{\sqrt{\langle (r - \langle r \rangle)^2 \rangle}} $$
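The z-transform simply rescales a sample of growth rates to zero mean and unit variance, so that distributions from different groups can be overlaid. A minimal sketch follows; the per-sector variant, the function names and the toy numbers are assumptions added for illustration.

```python
import math
from collections import defaultdict

def z_transform(r):
    """z = (r - <r>) / sqrt(<(r - <r>)^2>) for a list of growth rates."""
    n = len(r)
    mean = sum(r) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in r) / n)
    return [(v - mean) / std for v in r]

def z_transform_by_group(r, groups):
    """Apply the z-transform separately within each group (e.g. sector)."""
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    z = [0.0] * len(r)
    for idxs in members.values():
        for idx, value in zip(idxs, z_transform([r[i] for i in idxs])):
            z[idx] = value
    return z

# Hypothetical growth rates of six firms in two sectors.
rates = [0.10, -0.20, 0.05, 0.30, -0.10, 0.00]
sectors = ["energy", "energy", "energy", "services", "services", "services"]
z_all = z_transform(rates)
z_sector = z_transform_by_group(rates, sectors)
print(z_all)
print(z_sector)
```

After the transform each sample has mean 0 and variance 1, so any remaining difference between groups is a difference in shape rather than in location or scale.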
Data allow disaggregation in terms of economic sectors of activity
within sectors
Data allow disaggregation year-by-year
Exploring the role of correlation between firms
Shuffling experiments
Conclusions
The availability of accurate databases allows for the inspection of the role that different variables play in the system.
The End
[email protected]
http://lagash.dft.unipa.it