Financial Information Grid – an ESRC e-Social Science Pilot

Page 1: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID (RES-149-25-0028)

All Hands Meeting, 31/08-3/09, 2004, Nottingham

Financial Information Grid – an ESRC e-Social Science Pilot

Khurshid Ahmad, Department of Computing, University of Surrey

Jon Nankervis, Department of Accountancy and Finance, University of Essex

Page 2: Financial Information Grid –an ESRC e-Social Science Pilot

Fingrid (RES-149-25-0028), All-Hands Meeting, Nottingham, 3 September 2004

FINGRID Project

The FINGRID project is a collaboration between econometricians at Essex and computing academics at Surrey, particularly in grid computing and artificial intelligence (plus financial traders).

The FINGRID project aims to provide a solution for the information management/processing challenge in social sciences: analysis and fusion of distributed quantitative and qualitative data and programs.

FINGRID is the third project at Surrey that deals with qualitative data (news and reports) and quantitative data (time series), following the EU projects ACE (1996-99) and GIDA (2001-03).

Page 3: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Objectives

Create a Grid environment based on Open Grid Services Architecture to provide a demonstrable software application, for analysing financial information in the form of quantitative and qualitative data.

Evaluate the benefits of the Grid approach.

Page 4: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Reflections

DAME (York): Engine Behaviour Time-series + Reports in a controlled language; Case-based Reasoning;

Belfast e-Science Centre: Value at Risk Computation;

MYGRID and MIAKT: Information Extraction + Image Annotation

Page 5: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Project Team

David Cheng, Research Officer, Text Analysis (ESRC funded);

Tuğba Taşkaya-Temizel, Tutor, Grid Computing, Grid Architect;

Lee Gillam, Research Officer, Grid Implementation;

Pensiri Manumapousat, Research Student, Text Categorisation;

Saif Ahmad, Research Student, Wavelet Analysis;

Hayssam Trablousi, Research Student, Named Entity Extraction;

Ademola Popoula, Research Student, Fuzzy Logic Analysis;

Gary Dear, Computing Officer, Grid Implementation;

Khurshid Ahmad, Principal Investigator;

Jon Nankervis, Co-Investigator (Essex)

ESRC Funding: Fifty Thousand Pounds Sterling (Gross).

Page 6: Financial Information Grid –an ESRC e-Social Science Pilot

The Problem

Social science research requires the capture and analysis of data that is quantitative (numerical data) and data that is qualitative (opinions expressed in language or other sign systems).

The fusion of multi-modal information is critical to social sciences research.

Page 7: Financial Information Grid –an ESRC e-Social Science Pilot

The Problem – Decision Making

Challenges: Hypothesis formation and theory development in financial and political economics, both by researchers and financial traders, now involves analysis of streaming time-serial data and financial and political news.

The Data:

Numerical data: time series of the price/value movement of financial instruments; c. 5MB/day, per instrument.

Textual data: text streams of different genres (news items; financial reports; company brochures; government documents); c. 20MB/day.

Page 8: Financial Information Grid –an ESRC e-Social Science Pilot

Streaming Time-serial Data and News Service

STREAMING ECONOMIC/POLITICAL NEWS: Reuters; Yahoo; Bloomberg; BBC; Al Jazeera

Page 9: Financial Information Grid –an ESRC e-Social Science Pilot

The Problem – Decision Making

• Financial and political analysis requires data over short time periods (daily) or longer time periods (5-10 years).

• This is a large volume of data which requires instant processing, much like data emerging from particle or gene factories, except that in our case the data is in two or more modalities.

• The financial/political analysis requires access to data tombs (archives) and data nurseries (streaming news and time series).

Page 10: Financial Information Grid –an ESRC e-Social Science Pilot

The Problem – Decision Making

• Decision making involves dealing with factual news (who, where, what, when) and news related to ‘market sentiment’.

• Decision making involves dealing with time-ordered data which lacks stochastic stability and has considerable variance changes.

Page 11: Financial Information Grid –an ESRC e-Social Science Pilot

Market Sentiment?

In addition to the strictly quantitative data on trading volumes and price movements, financial traders, and increasingly economists, rely on market sentiment.

The behaviour of investors, security analysts, and financial/monetary theoreticians is influenced by information other than market data: investor credulity; herding; sentiment analysis.

Page 12: Financial Information Grid –an ESRC e-Social Science Pilot

Market Sentiment? Motivation: Bounded Rationality

Herbert Simon (Nobel Prize in Economics 1978), Rational Decision Making in Business Organisations: mechanisms of bounded rationality are failures of knowing all of the alternatives, uncertainty about relevant exogenous events, and inability to calculate consequences.

Daniel Kahneman (Nobel Prize in Economics 2002), Maps of bounded rationality, intuitive judgement & choice: two generic modes of cognitive function, an intuitive mode (automatic and rapid decision making) and a controlled mode (deliberate and slower).

E-Economics? FINGRID? Computing at the limits of rationality: distributed multi-modal data analysis and fusion.

Page 13: Financial Information Grid –an ESRC e-Social Science Pilot

Market Sentiment, Behavioral Psychology

Investor sentiment has some causal relationship with stock market bubbles:

1961 -tronics mania

1967 franchise and computer ‘crazies’

1983 high tech issues

2001 dot.com

Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and the cross-section of stock returns’. Proc. Conf. on Investor Sentiment.

Page 14: Financial Information Grid –an ESRC e-Social Science Pilot

Market Sentiment, Quantitative Behavioral Psychology

Investor sentiment can be affected by:

Closed-end fund discount (CEFD);

Turnover ratio (in NYSE for example) (TURN);

Number of Initial Public Offerings (N-IPO);

Average first-day returns on IPOs (R-IPO);

Equity share (S);

Dividend premium (P);

Age of the firm, external finance, ‘size’ (log(equity)), ...

A novel composite index:

$$\mathrm{Sentiment}_t = -0.358\,\mathrm{CEFD}_t + 0.402\,\mathrm{TURN}_{t-1} + 0.414\,\mathrm{NIPO}_t + 0.464\,\mathrm{RIPO}_t + 0.371\,S_t - 0.431\,P_{t-1}$$

A very complex non-linear regression on large data sets, computed on a monthly basis.
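As a rough illustration of how the composite index above combines its six proxies, the following Python sketch applies the coefficients quoted on the slide to monthly series. It assumes the inputs are already standardised and simply drops the first month, which has no lagged TURN or P value; the function name and the list-based interface are hypothetical and not part of FINGRID.

```python
def sentiment_index(cefd, turn, nipo, ripo, s, p):
    """Composite sentiment for months 1..T-1 using the slide's coefficients.

    All inputs are equal-length monthly series (assumed standardised).
    TURN and P enter with a one-month lag, so month 0 is skipped.
    """
    index = []
    for t in range(1, len(cefd)):
        value = (
            -0.358 * cefd[t]
            + 0.402 * turn[t - 1]   # lagged turnover
            + 0.414 * nipo[t]
            + 0.464 * ripo[t]
            + 0.371 * s[t]
            - 0.431 * p[t - 1]      # lagged dividend premium
        )
        index.append(value)
    return index
```

Feeding six monthly series of length T returns T-1 index values, one for each month that has a predecessor.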

Baker, M., & Wurgler, J. (2003). ‘Investor sentiment and the cross-section of stock returns’. Proc. Conf. on Investor Sentiment.

Page 15: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Contribution

• Extraction of market sentiments using a ‘local grammar’ of rise/fall, growth/decay coupled with attributed and un-attributed news (rumours).

• Automatic analysis of terminology and ontology: Financial Trading has 25 sub-domains.

• An integrated framework of time-series analysis (pre-processing, filtering, trend and seasonality, variance change) using wavelet analysis and fuzzy-logic.

• Neural network based classifiers for classifying streaming news.

• Implementation of a Grid-based solution and ‘daily’ market report service.

Page 16: Financial Information Grid –an ESRC e-Social Science Pilot

Fusing Qualitative and Quantitative Data Analysis

We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating the sentiment and instrument time series both as text (and numbers) and graphically as well.

Page 17: Financial Information Grid –an ESRC e-Social Science Pilot

What we need…

A common infrastructure:

for interoperability and reusability;

for aggregating distributed resources to create a single source of computing power;

which provides seamless access and allows sharing of geographically distributed resources.

Page 18: Financial Information Grid –an ESRC e-Social Science Pilot

Is Grid Computing the Solution?

IBM on Financial Grid Computing: Grid computing enables the virtualisation of distributed computing and data resources

@ IBM “What is grid computing?” http://www-1.ibm.com/grid/about_grid/what_is.shtml

Page 19: Financial Information Grid –an ESRC e-Social Science Pilot

Is Grid Computing the Solution?

GRID: Resource Sharing; Collaboration: Financial Economics, Sociology of Poverty, Policy Formation.

Working with living data: much Grid work relates to data tombs, whereas the social sciences work with data nurseries.

Living data is unstable, incomplete, and requires at least two interdependent modalities, where one compensates for the other.

Software, including legacy software, is in silos and its operation is based on tradition. Packages come with experts!

‘Home’ punters: everybody plays the market.

Speed-up: a factor of 5 in text analysis; 3-4 in Monte Carlo simulations.

@ IBM “What is grid computing?” http://www-1.ibm.com/grid/about_grid/what_is.shtml

Page 20: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Infrastructure in Surrey

A 24-node data and compute Grid interfaced to a ‘real world’ data stream (Reuters News and Financial Time series Feed) for capturing, analysing and fusing quantitative and ‘qualitative’ data.

Reuters Feed: 2 dedicated data lines, PC and Sun for feed management and associated networking

Page 21: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Infrastructure: Reuters Financial Services Streaming Data and News Service

Page 22: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Architecture

A 3-tier Architecture

The first tier lets the client send a request to one of the services: Text Processing Service or Time Series Service;

The second tier handles the execution of parallel tasks: the main cluster distributes them to a set of slave machines (nodes);

The third tier comprises the connection of the slave machines to the data providers.

Page 23: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Architecture

[Diagram: Surrey Grid. Components: Streaming Textual Data; Streaming Numeric Data; GRID Cluster (24 Slaves); Main Cluster (Text and Time Series Service). Labelled steps, numbered 1 to 4 in the figure: Send Service Request; Distribute Tasks; Receive Results; Notify user about results.]

• Given an allocated task, the corresponding data is retrieved from the data providers by the slave machines.

• The main cluster monitors the slave machines until they have completed their tasks, and subsequently combines the interim results.

• The final result is sent back to the client machine.
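The distribute, monitor, and combine cycle described above can be sketched in a few lines of Python. This is only an analogy under stated assumptions: local threads stand in for the slave machines, word counting stands in for the text service, and the names `count_words`, `split_batches`, and `run_job` are hypothetical. The project itself used the Globus Toolkit and Java, which this sketch does not reproduce.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(docs):
    """Worker task: word frequencies for one batch of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def split_batches(docs, n_workers):
    """Round-robin split so each worker gets a similar number of documents."""
    return [docs[i::n_workers] for i in range(n_workers)]


def run_job(docs, n_workers=4):
    """Main-cluster role: distribute tasks, wait for workers, combine results."""
    batches = split_batches(docs, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() blocks until every worker has returned its interim result.
        partials = list(pool.map(count_words, batches))
    total = Counter()
    for partial in partials:
        total.update(partial)  # combine interim results
    return total
```

Note that the round-robin split balances the number of documents per worker, not their size, which matters when documents differ greatly in length.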

Page 24: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Technology

Globus Toolkit 3.0 (based on Open Grid Services Architecture (OGSA));

Java CoG Kit (Java Commodity Grid) for resource management;

Languages for development: Java + Reuters SSL Developer’s Kit (Java) for the connection with the Reuters streaming data;

Applications integrated: existing statistical programs in FORTRAN; Matlab: JMatlink (adapted to the Linux environment for communication with the Matlab environment);

Other technologies: XML (NewsML) for the news information; CGI for communication of the Java Applet with the server side.

Page 25: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Services

News Analysis: service for extracting MARKET SENTIMENT.

Correlation: Market sentiment correlation with financial time series.

Bootstrapping: service for computing standard errors, confidence intervals and hypothesis testing by a simulation of the time series or market sentiment series.

Page 26: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Service: Market Sentiment

At one level market sentiment is often expressed in news reports and editorials, and ranges from views about national economies to the imminent take-overs, mergers and acquisitions and from people leaving/joining an organization to news about political and economic successes and failures.

Page 27: Financial Information Grid –an ESRC e-Social Science Pilot

Market Sentiment

Sentiments are expressed using metaphors.

The metaphors bullish and bearish, so-called animal metaphors, refer to the aggressive or recessive (shy) mood of the investors and perhaps of the traders.

The sentiment words are typically used metaphorically and in general are ambiguous (‘rose’ may be used in different contexts and indeed as a proper noun).

The local grammar reduces the ambiguity by constraining the use of the sentiment words.

Page 28: Financial Information Grid –an ESRC e-Social Science Pilot

Market Sentiment

A finite-state automaton (local grammar) for identifying ‘sentiments’ in free text unambiguously was learnt by our system from a news corpus and used for extracting sentiment information.
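To make the idea of a local grammar concrete, here is a minimal hand-written Python sketch of a two-state automaton. It accepts a movement word such as ‘rose’ only when it directly follows an instrument word, which is how the constraint removes the proper-noun reading. The word lists, state names, and the function `extract_sentiments` are illustrative assumptions; the FINGRID automaton was learnt from a news corpus and is not reproduced here.

```python
# Illustrative vocabularies (assumptions, not the learnt FINGRID lexicon).
INSTRUMENTS = {"shares", "stocks", "index", "prices", "pound", "dollar"}
UP = {"rose", "gained", "climbed", "surged"}
DOWN = {"fell", "dropped", "slid", "plunged"}


def extract_sentiments(text):
    """Return (instrument, movement word, polarity) triples found in text."""
    state = "START"
    subject = None
    hits = []
    for raw in text.split():
        token = raw.strip(".,;:!?\"'").lower()
        if token in INSTRUMENTS:
            # An instrument word opens a candidate pattern.
            state, subject = "INSTRUMENT", token
        elif state == "INSTRUMENT" and token in UP:
            hits.append((subject, token, "positive"))
            state = "START"
        elif state == "INSTRUMENT" and token in DOWN:
            hits.append((subject, token, "negative"))
            state = "START"
        else:
            # Any other token resets the automaton.
            state = "START"
    return hits
```

With these word lists, "Shares rose sharply." yields one positive hit, while "Rose Smith sold her shares." yields none, because ‘Rose’ does not follow an instrument word.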

Page 29: Financial Information Grid –an ESRC e-Social Science Pilot

Market Sentiment

A finite-state automaton (local grammar) for identifying names of persons and organisations in free text unambiguously was learnt by our system from a news corpus and used for attributing sentiment information to people and organisations.

Page 30: Financial Information Grid –an ESRC e-Social Science Pilot

Case Studies & Results: Text Analysis Service

For the Brown Corpus, the number of words processed per second is similar to Hughes et al.: 7,120 versus 6,670 in a single CPU system.

Our 2-node grid implementation shows a 98% gain of performance, whereas Hughes et al. (SMP configuration, equivalent to our 2-node grid) implementation shows a 27% gain.

Relative performance of the word frequency counting experiment on the RCV1 corpus is lower than on the Brown corpus, because the XML files must be parsed prior to processing.

                        Brown     RCV1
Words/s (1 machine)     7,120     -
Words/s (2 machines)    14,091    5,334
Words/s (4 machines)    23,944    10,532
Words/s (8 machines)    31,453    14,590
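The speed-up and parallel efficiency implied by the Brown corpus column can be derived directly from the throughput figures. The short Python sketch below does this arithmetic; the dictionary holds the numbers from the table, while the function name `speedup_table` is hypothetical.

```python
# Brown corpus throughput in words per second, keyed by number of machines.
BROWN_WORDS_PER_SEC = {1: 7120, 2: 14091, 4: 23944, 8: 31453}


def speedup_table(throughput):
    """Return (machines, speed-up, efficiency) rows relative to one machine."""
    base = throughput[1]
    rows = []
    for n in sorted(throughput):
        speedup = throughput[n] / base
        rows.append((n, round(speedup, 2), round(speedup / n, 2)))
    return rows


for machines, speedup, efficiency in speedup_table(BROWN_WORDS_PER_SEC):
    print(machines, speedup, efficiency)
```

The two-machine row reproduces the 98% gain quoted above (speed-up 1.98), and efficiency falls from 0.99 on two machines to 0.55 on eight, so each added machine contributes less.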

Page 31: Financial Information Grid –an ESRC e-Social Science Pilot

Case Studies & Results: Text Analysis Service

A Java program for sentiment extraction has been developed.

Experiments on the Reuters RCV1 corpus (2.3GB) were conducted. Processing time improved from 15.9 hours on a 4-node grid to 13.1 hours on an 8-node grid.

[Chart: Text Analysis. Time required to process a month of news with different configurations. x-axis: # of machines (1, 2, 4, 8); y-axis: time in seconds (0 to 600); legend: Text Analysis (process time in ms).]

Page 32: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Service: Fusing quantitative & qualitative information

Time-serial data related to financial instruments, for example currencies, stocks, and derivatives, often exhibit nonstationarity.

In order to extract long-term trends, seasonal variation, and the random component in a complex time series, multi-scale analysis and fuzzy logic are increasingly used.

The positive and negative sentiments related to a financial instrument may be ordered as a time series.

This sentiment series is then correlated with the movement of a financial instrument.

Such correlation can be used for prediction, or better still for the analysis of (volatile) movements in the market.
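A minimal Python sketch of the correlation step described above: daily positive and negative sentiment counts are turned into a net sentiment series, prices into simple returns, and the two are compared with a Pearson coefficient. The function names and the choice of Pearson correlation are assumptions for illustration; the slides do not state which correlation measure SATISFI uses.

```python
from math import sqrt


def net_sentiment(positive, negative):
    """Daily net sentiment: positive minus negative mentions."""
    return [p - n for p, n in zip(positive, negative)]


def returns(prices):
    """Simple day-on-day returns; one element shorter than prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```

Because `returns` drops the first day, the sentiment series should be trimmed to match before correlating, for example `pearson(net_sentiment(pos, neg)[1:], returns(prices))`. Neither series may be constant, since that would make the denominator zero.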

Page 33: Financial Information Grid –an ESRC e-Social Science Pilot

Fusing Qualitative and Quantitative Data Analysis

We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating the sentiment and instrument time series both as text (and numbers) and graphically as well.

Page 34: Financial Information Grid –an ESRC e-Social Science Pilot

FINGRID Service: Bootstrapping & Large-scale simulations

The bootstrap method assumes that the observed data are representative of the unknown population.

Bootstrap procedures are data-based simulation methods that estimate the distribution of estimators by re-sampling observed data.

Statistical inferences obtained from distributions of simulated data are reported to be more reliable than inferences gained from asymptotic theory, which is strictly valid only when the sample size is infinitely large (MacKinnon 2002).

Bootstrap tests and Monte Carlo tests are examples of simulation-based tests.
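The re-sampling idea can be illustrated with a short Python sketch that estimates the standard error and a percentile confidence interval for a sample mean. This is a generic textbook bootstrap for independent observations, not the project's Fortran implementation; time series data would normally need a block or model-based variant. The function name `bootstrap_mean` and its parameters are hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_mean(data, reps=1000, alpha=0.05, seed=0):
    """Bootstrap standard error and percentile interval for the mean.

    Each replication draws len(data) observations with replacement
    and records the mean of that re-sample. Requires reps >= 2.
    """
    rng = random.Random(seed)  # fixed seed for reproducible runs
    n = len(data)
    estimates = sorted(mean(rng.choices(data, k=n)) for _ in range(reps))
    std_error = stdev(estimates)
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return std_error, (lower, upper)
```

Every replication is independent of the others, which is why the method parallelises well: each node can run its own share of the replications and return only its list of estimates.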

Page 35: Financial Information Grid –an ESRC e-Social Science Pilot

Case Studies & Results: Bootstrapping

Java-wrapped (Fortran) implementations of the bootstrapping algorithm.

The processing time of the bootstrapping program was measured with different grid node configurations, from two-node to eight-node.

[Chart: Simple Bootstrapping. x-axis: # of machines (1, 2, 4, 8); y-axis: time in seconds (0 to 2500); series: Bootstrap rep=500, Bootstrap rep=1000.]

When the number of bootstrap replications was set to 1000, 1050 seconds were required on a 2-node grid and 404 seconds on an 8-node grid.

Page 36: Financial Information Grid –an ESRC e-Social Science Pilot

Fusing Qualitative and Quantitative Data Analysis

We have developed a Sentiment and Time Series: Financial analysis system (SATISFI) for visualising and correlating the sentiment and instrument time series both as text (and numbers) and graphically as well.

Page 37: Financial Information Grid –an ESRC e-Social Science Pilot

Fusing Qualitative and Quantitative Data Analysis

Page 38: Financial Information Grid –an ESRC e-Social Science Pilot

Fusing Qualitative and Quantitative Data Analysis

Page 39: Financial Information Grid –an ESRC e-Social Science Pilot

Fusing Qualitative and Quantitative Data Analysis

Page 40: Financial Information Grid –an ESRC e-Social Science Pilot

Fusing Qualitative and Quantitative Data Analysis

Page 41: Financial Information Grid –an ESRC e-Social Science Pilot

Conclusion

We have identified the following problems that may cause performance degradation in a grid environment:

The configurations of the machines: During the distribution of tasks, we did not consider the configuration of the machines, so faster machines were idling while the rest were still processing.

One common data source: Network latency increases because all nodes share the same bandwidth to retrieve files.

Amdahl’s law: Amdahl’s law is applicable to our grid, where the fraction of code f, which cannot be parallelised, limits the speedup factor.

Program constraints: In the task distribution process, the file size is not considered.
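Amdahl’s law gives the speed-up on n nodes as S(n) = 1 / (f + (1 - f) / n), where f is the serial fraction. The Python sketch below implements that formula and, as a rough illustration only, inverts it to estimate f from two measured run times. The helper `serial_fraction` is hypothetical and assumes the run time follows T(n) = T1 * (f + (1 - f) / n) exactly, ignoring the network and load-balancing effects listed above.

```python
def amdahl_speedup(f, n):
    """Speed-up on n nodes when a fraction f of the work is serial."""
    return 1.0 / (f + (1.0 - f) / n)


def serial_fraction(t_a, n_a, t_b, n_b):
    """Estimate f from run time t_a on n_a nodes and t_b on n_b nodes.

    Solves t_a / t_b = (f + (1 - f) / n_a) / (f + (1 - f) / n_b) for f.
    """
    ratio = t_a / t_b
    a, b = 1.0 / n_a, 1.0 / n_b
    return (ratio * b - a) / ((1.0 - a) - ratio * (1.0 - b))


# Bootstrapping timings with 1000 replications, as reported in the slides.
f_estimate = serial_fraction(1050, 2, 404, 8)
print(f_estimate, amdahl_speedup(f_estimate, 24))
```

Under these idealised assumptions, the bootstrapping timings of 1050 seconds on 2 nodes and 404 seconds on 8 nodes correspond to a serial fraction of roughly 0.1, which would cap the achievable speed-up at about 10 however many nodes are added.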

Page 42: Financial Information Grid –an ESRC e-Social Science Pilot

Conclusions

The FinGrid project has achieved three major objectives.

The project demonstrates how both quantitative and qualitative data from multiple sources can be processed, analysed, and fused.

It has raised considerable interest in the financial news information market (Ahmad et al. 2004).

Contribution in terms of improvements to goods and services: financial software houses and news vendors have shown interest in the project.

A Master’s level Grid Computing module has been developed based on our experience in FinGrid.

Page 43: Financial Information Grid –an ESRC e-Social Science Pilot

Next Steps

Investigate and evaluate Condor-G, MPICH2 and OGSA-DAI for effective job management, parallel processing and database management.

Towards a knowledge grid

PARALLEL and DISTRIBUTED KNOWLEDGE DISCOVERY:

Continual analysis and fusion of text and numerical data, both real-time and historical.

KNOWLEDGE GRID SERVICES:

KNOWLEDGE RETRIEVAL: Adapt information extraction methods and systems (e.g. Surrey’s SYSTEM QUIRK) onto a GRID architecture for extended semantic analysis.

KNOWLEDGE MODELLING: Representation of non-stationary time series using Wavelet Analysis, Neural Networks and Fuzzy Logic, such that the system learns from its past experience.