
Big Data Analytics in Finance

Tze Leung Lai

Stanford University

December, 2015


Outline

Big Data Analytics for the Insurance Industry
  - Big data in the insurance industry
  - Usage-Based Insurance

Big Data Problems: Bank, SME and P2P Lending
  - Bank's risk management
  - Small and Medium-Sized Enterprises (SME)
  - Peer-to-Peer (P2P) Lending


Big data in insurance industry

- The amount and variety of data available to insurance companies today provide a wealth of new opportunities to increase revenue, control costs, and counter competitive threats.

- Huge volumes of data related to demographics, psychographics, claims trends, and product-related information are starting to enable better risk assessment and management, new product strategies, and more efficient claims processing.

- Some of the use cases for Big Data analytics in insurance include:
  - Risk avoidance
  - Product personalization
  - Cross-selling and upselling
  - Fraud detection
  - Catastrophe planning
  - Customer needs analysis


Risk avoidance

- Today, relationships between insurance agents and their customers and communities are decentralized and virtual. Insurers can, however, access a myriad of new sources of data and build statistical models to better understand and quantify risk.

- These Big Data analytical applications include behavioral models based on customer profile data compiled over time, cross-referenced with other data that is relevant to specific types of products. For example, an insurer could assess the risks inherent in insuring real estate by analyzing satellite data of properties, weather patterns, and regional employment statistics.


Product personalization

- The ability to offer customers the policies they need at the most competitive premiums is a big advantage for insurers. This is more of a challenge today, when contact with customers is mainly online or over the phone instead of in person.

- Scoring models of customer behavior based on demographics, account information, collection performance, driving records, health information, and other data can aid insurers in tailoring products and premiums for individual customers based on their needs and risk factors.

- Some insurers (in usage-based insurance) have begun collecting data from sensors in their customers' cars that record average miles driven, average speed, time of day when most driving occurs, and how sharply a person brakes.

- This data is compared with other aggregate data, actuarial data, and policy and profile data to determine the best rate for each driver based on their habits, history, and degree of risk.


Cross-selling and upselling

- Collecting data across multiple channels, including
  - Web site clickstream data,
  - social media activities,
  - account information,
  and other sources, can help insurers suggest additional products to customers that match their needs and budgets.

- This type of application can also look at customer habits to assess risks and suggest changes of habit to reduce those risks.


Fraud detection

- Insurance providers are looking beyond algorithmic fraud detection techniques that are claim-centric to ones that are person-centric.

- These techniques focus on analyzing beneficiary behavior across claims, providers, and other sources of information (e.g., how many similar claims were submitted or reported by the same individual), and extend to data sources beyond the firewall and to analytics based on external information (e.g., cohort analysis, using a person's social graph to look for similar activities among connected individuals), considering networks of people rather than just individuals.

- Collecting data on behaviors from online channels and automated systems helps determine the potential for and existence of fraud.

- These activities can help create new models to identify patterns of both normal and suspect behavior that can be used to combat the increasingly sophisticated perpetration of insurance fraud.


Catastrophe planning

- Being proactive instead of reactive when extreme weather is predicted, or during or after its occurrence, can in some cases lessen the extent of claims and accelerate responses by insurers.

- In the past, this type of analysis was done through statistical models at headquarters, but with the ability to gather data directly from customers and other sources in real time, more actionable information can be gathered and acted upon.

- USGS researchers found that people Tweeting about actual earthquakes kept their Tweets really short, even just to ask "earthquake?" Concluding that people who are experiencing earthquakes aren't very chatty, they started filtering out Tweets with more than seven words.

- They also recognized that people sharing links or the size of the earthquake were significantly less likely to be offering firsthand reports, so they filtered out any Tweets sharing a link or a number. Ultimately, this filtered stream proved highly effective at detecting when earthquakes occurred globally.
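A minimal sketch of this kind of filter, assuming simple whitespace tokenization and made-up example tweets (not the actual USGS pipeline):

```python
import re

def is_likely_firsthand(tweet: str, max_words: int = 7) -> bool:
    """Heuristic filter in the spirit of the rules described above: keep only
    short tweets that contain no links and no numbers."""
    if len(tweet.split()) > max_words:
        return False                      # long tweets are rarely firsthand reports
    if re.search(r"https?://\S+", tweet):
        return False                      # shared links suggest secondhand information
    if re.search(r"\d", tweet):
        return False                      # magnitudes and statistics suggest news reports
    return True

tweets = ["earthquake?", "did anyone else feel that", "M6.1 quake hits region http://news.example"]
print([t for t in tweets if is_likely_firsthand(t)])   # keeps the first two
```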


Customer needs analysis

- Automating the discussion between prospects and advisors about complex insurance products such as life and annuity, based on a customer's desires and resources, can enhance the sales process.

- These applications, based on business rules, go beyond simple decision trees and algorithms to provide faster and more dependable information and options as part of the sales dialogue.


Others

- Other Big Data analytics applications in insurance include
  - loyalty management,
  - advertising and campaign management,
  - agent analysis,
  - customer value management, and
  - customer sentiment analysis.

- These applications can enhance marketing, branding, sales, and operations with business insights that lead to informed actions.


Examples

- MetLife is using Big Data applications to look at hundreds of terabytes of data for patterns to gauge how well the company is doing on minimizing risk, understanding how various products are performing, and what the trends are.

- Travelers is using Big Data applications to rationalize product lines from new acquisitions and to understand the risks from global geopolitical developments.

- Progressive Insurance and Capital One are conducting experiments to segment their customers using Big Data and to tailor products and special offers based on these customer profiles.


Examples

- In China, Zhongan Insurance has been developing Shipment Fee Insurance with Alibaba, which covers a specific amount of the shipment fee, such as 9 RMB, if a customer wants to ship back a product purchased on Alibaba's online marketplace Taobao.

- Previously, the insurance premium was set by rule of thumb, such as 0.5 RMB, without precise modeling. Applying Big Data analysis, the insurance company is trying to develop models using data such as the return rate of goods, characteristics of the buyers, etc.


Usage-based insurance (UBI)

- Also known as pay-as-you-drive (PAYD), pay-how-you-drive (PHYD) and mile-based auto insurance, UBI is a type of vehicle insurance whose cost depends on the type of vehicle used, measured against time, distance, behavior and place.

- The simplest form of usage-based insurance bases the insurance costs simply on the number of miles driven.

- However, the general concept of PAYD includes any scheme where the insurance costs may depend not just on how much one drives but on how, where, and when one drives.


Pay as you drive (PAYD)

Pay as you drive (PAYD) means that the insurance premium is calculated dynamically, typically according to the amount driven. There are three types of usage-based insurance:

- Coverage is based on the odometer reading of the vehicle.

- Coverage is based on mileage aggregated from GPS data, or on the number of minutes the vehicle is being used as recorded by a vehicle-independent module transmitting data via cellphone or radio frequency (RF) technology.

- Coverage is based on other data collected from the vehicle, including speed and time-of-day information, historic riskiness of the road, and driving actions, in addition to distance or time travelled.

The formula can be a simple function of the number of miles driven, or can vary according to the type of driving or the identity of the driver. Once the basic scheme is in place, it is possible to add further details, such as an extra risk premium if someone drives too long without a break, uses their mobile phone while driving, or travels at an excessive speed.
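A minimal sketch of such a formula; the base rate, per-mile rate, and surcharge weights below are hypothetical, not taken from any insurer's actual pricing:

```python
def ubi_premium(miles: float,
                night_share: float,            # fraction of miles driven at night
                hard_brakes_per_100mi: float,
                speeding_share: float,         # fraction of driving time above the limit
                base: float = 20.0,            # illustrative monthly base rate
                per_mile: float = 0.05) -> float:
    """Illustrative monthly PAYD/PHYD premium: a mileage charge plus
    multiplicative behavior-based surcharges (all coefficients made up)."""
    premium = base + per_mile * miles
    premium *= 1.0 + 0.30 * night_share
    premium *= 1.0 + 0.02 * hard_brakes_per_100mi
    premium *= 1.0 + 0.50 * speeding_share
    return round(premium, 2)

print(ubi_premium(miles=600, night_share=0.1, hard_brakes_per_100mi=3, speeding_share=0.05))
```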


Telematic usage-based insurance

- In the latter two types, vehicle information is automatically transmitted to the system.

- This provides a much more immediate feedback loop to the driver, by changing the cost of insurance dynamically with a change of risk. This means drivers have a stronger incentive to adopt safer practices.

- For example, if a commuter switches to public transport or to working at home, this immediately reduces the risk of rush-hour accidents. With usage-based insurance, this reduction would be immediately reflected in the cost of car insurance for that month.


Measurement using smartphones

The smartphone as a measurement probe for insurance telematics has been surveyed. Benefits to drivers include:

- Reduced accident frequency and severity

- Faster emergency response time following an accident

- Improved tracking to recover stolen vehicles

- Greater accuracy in establishing fault when settling claims

- Reduced driving, pollution, traffic congestion and energy consumption


UBI in the US and Canada

- Most of the top 20 auto insurers in the U.S. have implemented or are developing UBI products.

- The expansion in Canada may be even more dramatic. In 2013, there were only small pilot UBI products in Canada before the mass-market launch of Ajusto by Desjardins in Ontario and Quebec. Just over a year later, more than half of the top 10 Canadian insurers have either launched or are developing UBI products.


DriveAbility

Towers Watson has provided the DriveAbility service for UBI. DriveAbility includes consulting guidance by industry leaders, telematics devices and services, hosted data cleansing, expert analytics, and UBI risk scoring.

- Pooled data: With access to the industry's only pooled database that merges driver behavior with actual loss costs, insurers can immediately implement our predictive DriveAbility score with confidence.

- Predictive score: Our driving score is three times more predictive than existing rating characteristics, and future scores will only improve on this.

- Hosted infrastructure: We collect, scrub, link, store and analyze data for you, saving you time and money.

- Project management: Pre-built tools for planning and implementation will increase your speed to market.


Challenges: Privacy and cost

- Privacy concerns
  - Legislation requiring disclosure of tracking practices and devices has been enacted by some US states.
  - The data that can be collected is limited.
  - Thanks to the increasing use of mainstream technology devices (such as smartphones, tablets, and GPS devices) and social media networks (such as Facebook and MySpace), acceptance of information sharing is growing.

- Cost
  - Collecting and sanitizing driving data requires costly technology.
  - There is still much uncertainty about the selection and analysis of the collected driving data. It is also sometimes unclear how to integrate the data into existing or new pricing models while maintaining profitability.
  - Putting lower-risk drivers into UBI programs that offer lower premiums could lower overall insurer profitability.


Challenges: regulatory requirements

- Regulatory requirements management
  - Many states require insurers to obtain approval for the use of new rating plans. Rate filings usually must include statistical data that supports the proposed new rating structure.
  - Although there are general studies demonstrating the link between mileage and risk, individual driving data and UBI plan specifics are considered proprietary information of the insurer. This can make it difficult for an insurer who does not have past UBI experience.
  - Other requirements that could prevent certain UBI programs include the need for continuous insurance coverage, an upfront statement of the premium charge, a set expiration date, and guaranteed renewability.
  - However, it should be noted that a Georgia Institute of Technology survey of state insurance regulations (2002) found that the majority of states had no regulatory restrictions that would prevent PAYD programs from being implemented.


Bank retail loans and FICO score

- Bank retail loans consist of residential mortgages, credit cards, car loans, personal unsecured loans and other consumer loans. To approve a loan, the bank evaluates the borrower's income and net worth as well as his/her FICO score.

- The FICO score was first introduced in 1989 by FICO, then called Fair, Isaac, and Company. The FICO model is used by the vast majority of banks and credit grantors, and is based on consumer credit files of the three national credit bureaus: Experian, Equifax, and TransUnion.

- A consumer's credit file records each application for a loan or a credit card, plus his/her payment history. Because a consumer's credit file may contain different information at each of the bureaus, FICO scores can vary depending on which bureau provides the information to FICO to generate the score.

- The large number of consumers makes this a big data problem.


Bank retail loans and FICO score

Credit scores are designed to measure the risk of default by taking into account various factors in a person's financial history. Although the exact formulas for calculating credit scores are secret, FICO has disclosed the following components:

- 35%: payment history
  - This is best described as the presence or lack of derogatory information.
  - Bankruptcy, liens, judgments, settlements, charge-offs, repossessions, foreclosures, and late payments can cause a FICO score to drop.

- 30%: debt burden
  - This considers a number of debt-specific measurements.
  - According to FICO there are some six different metrics in the debt category, including the debt-to-limit ratio, the number of accounts with balances, the amount owed across different types of accounts, and the amount paid down on installment loans.

- 15%: length of credit history, aka "time in file"
  - As a credit history ages it can have a positive impact on the FICO score.
  - There are two metrics in this category: the average age of the accounts on the report and the age of the oldest account.


Bank retail loans and FICO score

- 10%: types of credit used (installment, revolving, consumer finance, mortgage)
  - Consumers can benefit from having a history of managing different types of credit.

- 10%: recent searches for credit
  - Hard credit inquiries, which occur when consumers apply for a credit card or loan (revolving or otherwise), can hurt scores, especially if done in great numbers.
  - Individuals who are "rate shopping" for a mortgage, auto loan, or student loan over a short period (two weeks or 45 days, depending on the generation of the FICO score used) will likely not experience a meaningful decrease in their scores as a result of these types of inquiries.
  - This is because the FICO scoring model counts all such hard inquiries that occur within 14 or 45 days of each other as only one. Further, mortgage, auto, and student loan inquiries do not count at all in the FICO score if they are less than 30 days old.
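A minimal sketch of this deduplication rule, using hypothetical inquiry records and treating all inquiries as rate-shopping inquiries of the same type:

```python
from datetime import date

def count_scored_inquiries(inquiry_dates, window_days=45, ignore_if_newer_than=30,
                           as_of=date(2015, 12, 15)):
    """Illustrative version of the rate-shopping rule described above: inquiries
    within `window_days` of each other count as one, and inquiries newer than
    `ignore_if_newer_than` days are not counted at all."""
    dates = sorted(d for d in inquiry_dates if (as_of - d).days >= ignore_if_newer_than)
    count, window_start = 0, None
    for d in dates:
        if window_start is None or (d - window_start).days > window_days:
            count += 1               # a new shopping window starts here
            window_start = d
    return count

inquiries = [date(2015, 6, 1), date(2015, 9, 1), date(2015, 9, 20), date(2015, 12, 1)]
print(count_scored_inquiries(inquiries))   # -> 2 (the September pair counts once; December is too recent)
```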


Bank retail loans and FICO score

- While all credit inquiries are recorded and displayed on personal credit reports for two years, they have no effect after the first year because FICO's scoring system ignores them after 12 months.

- Credit inquiries that were made by the consumer (such as pulling a credit report for personal use), by an employer (for employee verification), or by companies initiating pre-screened offers of credit or insurance do not have any impact on a credit score:
  - these are called "soft inquiries" or "soft pulls", and do not appear on the credit reports used by lenders, only on personal reports. Soft inquiries are not considered by credit scoring systems.


Bank retail loans and FICO score

- Getting a higher credit limit can help a consumer's credit score. The higher the credit limits on the consumer's credit cards, the lower the average utilization ratio across all of his or her credit card accounts.
  - The utilization ratio is the amount owed divided by the amount of credit extended by the creditor; in general, the lower it is, the higher the FICO rating.

- For example, if a consumer has one credit card with a used balance of $500 and a limit of $1,000, as well as another with a used balance of $700 and a $2,000 limit, the average ratio is 40 percent ($1,200 total used divided by $3,000 total limits). If the first credit card company raises the limit to $2,000, the ratio lowers to 30 percent, which could boost the FICO rating (see the sketch after this list).

- Other special factors:
  - Any money owed because of a court judgment, tax lien, etc., carries an additional negative penalty, especially when recent.
  - Having one or more newly opened consumer finance credit accounts may lower the FICO score.
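A minimal sketch of the utilization-ratio arithmetic in the example above, using the illustrative balances and limits from the slide:

```python
def utilization_ratio(cards):
    """Aggregate utilization: total balances divided by total credit limits."""
    total_balance = sum(balance for balance, limit in cards)
    total_limit = sum(limit for balance, limit in cards)
    return total_balance / total_limit

cards = [(500, 1_000), (700, 2_000)]
print(f"{utilization_ratio(cards):.0%}")     # 40%
cards = [(500, 2_000), (700, 2_000)]         # first card's limit raised to $2,000
print(f"{utilization_ratio(cards):.0%}")     # 30%
```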


Types of FICO score

- There are several types of FICO credit score: classic or generic, bankcard, personal finance, mortgage, installment loan, auto loan, and NextGen score.

- The generic or classic FICO score ranges between 300 and 850; 37% of people had scores between 750 and 850 in 2013.

- According to FICO, the median classic FICO score was 723 in 2006 and 711 in 2011. The U.S. median classic FICO Score 8 was 713 in 2014. The FICO bankcard score and FICO auto score range between 250 and 900. The FICO mortgage score ranges between 300 and 850. Higher scores indicate lower credit risk.


Classic FICO score

- Each individual actually has more than 49 credit scores under the FICO scoring model because each of the three national credit bureaus, Equifax, Experian and TransUnion, has its own database.

- Data about an individual consumer can vary from bureau to bureau.
  - The FICO scores also have different names at the different credit reporting agencies: Equifax (BEACON), TransUnion (FICO Risk Score, Classic) and Experian (Experian/FICO Risk Model).

- There are four active generations of FICO scores: 1998 (FICO 98), 2004 (FICO 04), 2008 (FICO 8), and 2014 (FICO 9).

- Consumers can buy their classic FICO Score 8 for Equifax, TransUnion, and Experian from the FICO website (myFICO). Other types of FICO scores cannot be obtained by individuals, only by lenders. Some credit cards offer to include the customer's FICO score in the monthly bills.


NextGen Risk Score

- The NextGen Score is a scoring model designed by the FICO company for assessing consumer credit risk. This score was introduced in 2001, and in 2003 the second generation of NextGen was released.

- Each of the major credit agencies markets this score, generated with their data, differently:
  - Experian: FICO Advanced Risk Score
  - Equifax: Pinnacle
  - TransUnion: FICO Risk Score NextGen (formerly Precision)

- Prior to the introduction of NextGen, their FICO scores were marketed under different names:
  - Experian: FICO Risk Model
  - Equifax: BEACON
  - TransUnion: FICO Risk Score, Classic (formerly EMPIRICA)


VantageScore

- In 2006, to try to win business from FICO, the three major credit-reporting agencies introduced VantageScore.

- According to court documents filed in the FICO v. VantageScore federal lawsuit, the VantageScore market share was less than 6% in 2006.

- The VantageScore methodology initially produced a score range from 501 to 990, but VantageScore 3.0 adopted the 300-850 range in 2013. Consumers can get free VantageScores from free credit report websites.


Other Credit Scores

- Many lenders have their own credit score models compiled from application data. These lenders often have internal data not reported to or used by the three credit bureaus. To obtain updated credit scores from the credit bureaus, each lender needs to report its customers' payment performance to the three credit bureaus.

- As a result of the FACT Act (Fair and Accurate Credit Transactions Act), each legal U.S. resident is entitled to a free copy of his or her credit report from each credit reporting agency once every twelve months.

- The law requires all three agencies, Equifax, Experian, and TransUnion, to provide reports. These credit reports do not contain credit scores from any of the three agencies. The three credit bureaus run AnnualCreditReport.com, where users can get their free credit reports.


Non-traditional uses of credit scores

- Credit scores are often used in determining prices for auto and homeowner's insurance. Insurance companies use them to rate the insurance risk of potential customers.
  - Studies indicate that the majority of those who are insured pay less in insurance through the use of scores. These studies point out that people with higher scores have fewer claims.

- In 2009, TransUnion representatives testified before the Connecticut legislature about their practice of marketing credit score reports to employers for use in the hiring process.
  - Legislators in at least twelve states introduced bills, and three states have passed laws, to limit the use of credit checks during the hiring process.


Criticism

Credit scores are widely used because they are inexpensive and largely reliable, but they do have their failings.

- Easily gamed
  - Because a significant portion of the FICO score is determined by the ratio of credit used to credit available on credit card accounts, one way to increase the score is to increase the credit limits on one's credit card accounts.

- Not a good predictor of risk
  - According to a Fitch study, the accuracy of FICO in predicting delinquency has diminished in the past few years. In 2001 there was an average 31-point difference in the FICO score between borrowers who had defaulted and those who paid on time. By 2006 the difference was only 10 points.
  - Some banks have reduced their reliance on FICO scoring. For example, Golden West Financial (which merged with Wachovia Bank in 2006) abandoned FICO scores for a more costly analysis of a potential borrower's assets and employment before giving a loan.


Criticism

- Use in employment decisions
  - The use of credit reports for employment screening is allowed in all states, although some have passed legislation limiting the practice to only certain positions.
  - Eric Rosenberg, director of state government relations for TransUnion, has stated that there is no research that shows any statistical correlation between what's in somebody's credit report and their job performance or their likelihood to commit fraud.


SME in the European Union

The European definition of an SME:

- Micro: fewer than 10 employees

- Small: 10 to 49 employees

- Medium: 50 to 249 employees

- with annual turnover not exceeding 50 million euro, and/or

- a balance-sheet total not exceeding 43 million euro.

In 2009 in the EU, 92.2% of businesses were micro, 6.5% small, 1.1% medium and 0.2% large.


SME in the US

- Census Bureau data indicate that in 2011
  - there were 5.68 million employer firms in the United States,
  - 99.7% had fewer than 500 workers, and
  - 89.8% had fewer than 20 workers.

- Add in the 22.7 million nonemployer firms in 2012:
  - 99.9% have fewer than 500 workers, and
  - 98% have fewer than 20 workers.

- They produced 46 percent of the private nonfarm GDP in 2008 and accounted for 63 percent of the net new jobs created in the US.


SME Statistics

- 51.6% of businesses were operated primarily from someone's home.

- 23.8% of employer firms operated out of a home.

- 62.9% of non-employer businesses were home-based.

- About 28% of firms were family-owned. These family-owned firms accounted for 42% of all firms' receipts.

- Business owners were well educated: 50.8% of owners of respondent firms had a college degree.

- Some 13.6% of business owners were foreign-born.


SME Default Models

- Edward Altman and Gabriele Sabato published a paper on a one-year SME default prediction model in November 2006. The model was developed using logistic regression for companies with annual sales of less than $65 million. They pulled these companies' financial data from 1994 to 2002 from the WRDS COMPUSTAT database.

- In the data, there were 120 SME defaults (with no missing data) during this period.

- A Moody's 2004 study of small and medium-sized firms in the US showed that the average default rate is about 6%.

- To maintain the overall average expected default rate at 6%, 1,890 other non-defaulted firms were selected for their study.
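A minimal sketch of fitting such a one-year default model by logistic regression; the predictors and data below are simulated placeholders, not the actual Altman-Sabato variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2_010                                  # 120 defaults + 1,890 non-defaults, as in the study
X = rng.normal(size=(n, 4))                # placeholder financial ratios (leverage, liquidity,
                                           # profitability, coverage) -- not the actual variables
y = np.zeros(n, dtype=int)
y[:120] = 1                                # 1 = default within one year, 0 = no default

model = LogisticRegression().fit(X, y)
pd_hat = model.predict_proba(X)[:, 1]      # estimated one-year probabilities of default
print(f"average predicted PD: {pd_hat.mean():.3f}")   # close to the ~6% sample default rate
```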


Peer-to-Peer (P2P) Lending

- Wikipedia describes P2P lending as:
  - the practice of lending money to unrelated individuals, or "peers", without going through a traditional financial intermediary such as a bank or other traditional financial institution.

- Most P2P loans are unsecured personal loans. The interest rates are set by P2P companies based on the borrower's credit. Borrowers with a higher default risk are assigned higher rates. P2P investors can reduce their credit risk (i.e., borrowers not paying back the loan) by choosing which borrowers to lend to. P2P lenders can further reduce their principal loss risk by diversifying their investments among different borrowers. Because P2P lenders can choose their borrowers, P2P loans are legally different from deposits in financial institutions. The P2P lender's investment in the P2P loan is not protected by any government guarantee. Even the bankruptcy of the P2P company that facilitates the loan may put a lender's investment at risk.

- The lending intermediaries are for-profit businesses; they generate revenue by collecting a one-time fee on funded loans from borrowers and by taking a loan servicing fee from investors (either a fixed amount annually or a percentage of the loan amount).


Most P2P companies provide the following services:

- Providing an online investment platform to enable borrowers to attract lenders, and investors to identify and purchase loans that meet their investment criteria

- Developing credit models for loan approvals and pricing

- Verifying borrower identity, bank account, employment and income

- Checking borrower credit history and filtering out unqualified borrowers

- Processing payments from borrowers and sending payments to the lenders who invested in the loan

- Servicing loans by providing customer service to borrowers and attempting to collect payments from borrowers who are delinquent or in default

- Legal compliance and reporting

- Finding new lenders and borrowers through their marketing efforts


- Because many P2P services are automated, the intermediary companies can operate with lower overhead and can provide the service more cheaply than traditional financial institutions.

- P2P borrowers may be able to obtain money at lower interest rates and P2P lenders may be able to earn higher returns. Compared to stock markets, peer-to-peer lending tends to have both lower volatility and less liquidity.

- Because P2P loans are not secured, they are likely to have much higher loss rates than secured consumer loans such as first and second residential mortgages and auto loans.

- P2P loans are most comparable to credit card loans, and hence carry much higher interest rates, comparable to those of credit cards.

- However, credit card loans are short-term (one to three months) while P2P loans have 3- or 5-year terms. It is reasonable to expect that P2P loans will have much higher default rates than secured consumer loans or credit card loans.


Managing Catastrophic Losses is Imperative to Credit Risk Control

The emphasis is not on frequent, low losses, but on one-off catastrophic losses:

- How much is the loss?

- How often?

- Why did it happen?

How to prevent a similar loss in the future:

- Hedge

- Portfolio sale

The ultimate question is whether there is enough capital to absorb the massive loss.

The key to these questions lies in understanding the credit loss distributions.


Typical credit portfolio loss distributions

- Tail risk refers to the far end (tail) of the loss distribution.

- It is the risk of catastrophic loss, usually arriving at an unimaginable time, also known as a "Black Swan".

- The tail is used to estimate the amount of capital required to survive a catastrophic event.
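A minimal sketch of estimating tail capital from a simulated credit loss distribution, using a made-up one-factor portfolio model (all parameters are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_obligors, n_scenarios = 500, 20_000
pd, lgd, exposure, rho = 0.02, 0.45, 1.0, 0.15      # illustrative parameters

# One-factor Gaussian copula: defaults become correlated through a common factor.
z = rng.standard_normal(n_scenarios)[:, None]                # systematic factor
eps = rng.standard_normal((n_scenarios, n_obligors))         # idiosyncratic noise
asset = np.sqrt(rho) * z + np.sqrt(1 - rho) * eps
defaults = asset < norm.ppf(pd)
losses = (defaults * lgd * exposure).sum(axis=1)             # portfolio loss per scenario

expected_loss = losses.mean()
q999 = np.quantile(losses, 0.999)                            # 99.9% loss quantile
print(f"expected loss: {expected_loss:.1f}")
print(f"99.9% loss quantile: {q999:.1f}")
print(f"tail capital (quantile minus expected loss): {q999 - expected_loss:.1f}")
```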


Control Tail Risk in Credit Portfolio

Numerous causes of tail risk:

- Downward credit cycle (e.g., the overall decline in US housing prices in 2007)
- Risks internal to the financial system (such as the accumulation of complex structured credit products up to 2008)
- Economic slowdown (e.g., China's 1993 banking crisis)
- Geo-political events (the oil crisis of the 1970s)

- Tail risk is often difficult to predict, difficult to measure and even more difficult to manage
- Common symptoms: stealthy expansion of leverage and credit, an over-relaxed state of mind, lax lending standards
- The signs are easy to see for the experienced risk control officer, but not easy to quantify
- Systematic and systemic causes cannot be diversified away
- Reliable hedge management ... but it is often difficult to convince management to spend the money
- Reliable, regular portfolio rebalancing


US Example: Prosper Marketplace

- Based in San Francisco, California, Prosper Marketplace is the first (February 5, 2006) P2P lending company in the US, with more than 2.2 million members and over $5 billion in funded loans. Borrowers request personal loans on the Prosper website, and investors (individual or institutional) can fund anywhere from $2,000 to $35,000 per loan request. In addition to credit scores, ratings, and histories, investors can consider borrowers' personal loan descriptions, endorsements from friends, and community affiliations. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.

- Prosper verifies borrowers' identities and select personal data before funding loans and manages all stages of loan servicing. These unsecured personal loans are fully amortized over a period of three or five years, with no prepayment penalties. Prosper generates revenue by collecting a one-time fee ranging from 1% to 5% on funded loans from borrowers and assessing a 1% annual loan servicing fee, based on the original loan amount, to investors.


US Example: Lending Club


Lending Club: Overview

- Investors can search and browse the loan listings on the Lending Club website and select loans that they want to invest in based on the information supplied about the borrower, the amount of the loan, the loan grade, and the loan purpose. Loans can only be chosen at the interest rates assigned by Lending Club, but investors can decide how much to fund for each borrower, with a minimum investment of $25 per note.

- Investors make money from interest. Rates vary from 5.32% to 28.49%, depending on the credit grade assigned to the loan. Lending Club makes money by charging borrowers an origination fee and investors a service fee. The size of the origination fee depends on the credit grade and ranges from 1.1% to 5.0% of the loan amount. The service fee is 1% of all amounts the borrower pays. The company facilitates interest rates that are better for lenders and borrowers than they would receive from most banks. It has averaged between a six and nine percent return to investors between its founding and 2013. However, because lenders are making personal loans to individuals on the site, their gains are taxable as personal income instead of investment income. Therefore, income from Lending Club loans may be taxed at a higher rate than investments that are taxed at the capital gains rate.
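A minimal sketch of the fee arithmetic for a single note; the note size, loan rate, and term below are illustrative, and the 1% service fee on each borrower payment follows the description above:

```python
def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
    """Standard fully amortizing monthly payment."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

note = 25.0                                                 # minimum investment per note
pmt = monthly_payment(note, annual_rate=0.12, months=36)    # hypothetical 12% loan, 3-year term
service_fee = 0.01 * pmt                                    # 1% of every amount the borrower pays
net_to_investor = (pmt - service_fee) * 36                  # total cash received if no default/prepayment
print(f"monthly payment: {pmt:.2f}, total net to investor: {net_to_investor:.2f}")
```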


Lending Club: Overview

- Lending Club enables borrowers to create loan listings on its website by supplying details about themselves and the loans that they would like to request.

- All loans are unsecured personal loans and can be between $500 and $35,000. On the basis of the borrower's credit score, credit history, desired loan amount and debt-to-income ratio, Lending Club determines whether the borrower is creditworthy and assigns to its approved loans a credit grade that determines the payable interest rate and fees.

- The standard loan period is three years; a five-year period is available at a higher interest rate and additional fees. About 29% of the 650,000 loans have the 5-year term.

- The loans are fully amortizing and can be repaid at any time without penalty. Among the "Fully Paid" borrowers, about 40% paid off their loans during the first year and 30% during the second year. Only 16% of the "Fully Paid" borrowers wait until the end of the term to pay off their loans.

- From the investor's perspective, P2P loans carry both prepayment risk and default risk. While the bad-credit borrowers default, the good ones prepay their loans. This double option risk will likely make P2P loans unprofitable on an option-adjusted basis.


Default Rate Comparison: Prosper vs. Lending Club


China's credit cycle is currently declining while tail risk is rising

- Macroeconomic weakness
  - GDP growth: an average of 9.77 percent over 34 years (1979-2014); 2012: 7.8%, 2013: 7.7%, 2014: 7.4%
  - Structural and cyclical factors
  - A soft landing is most likely, with no large-scale real estate or banking system collapse

- Credit supply is in a crunch, but the money supply is still loose
  - Financing still exists, but will tighten
  - Bad debts are gradually being revealed, but will be capitalized

- Negative press coverage of Chinese P2P lending
  - Although the financial/banking system is relatively stable, the risk of a systemic crash in the P2P industry cannot be ignored
  - Survivor is king


Conclusions

- Big Data: High-Volume and Multi-Dimensional Data; Precise Modeling

- Insurance Industry: Risk Avoidance; Product Personalization; Cross-Selling and Upselling; Fraud Detection; Catastrophe Planning; Customer Needs Analysis

- Bank, SME and P2P Lending: Credit Scores; Default Models; Tail Risk Control


Forthcoming Books

- Active Risk Management: Financial Models and Statistical Methods, Chapman and Hall/CRC, 2016. (Lai and Xing)

- Quantitative Trading: Algorithms, Analytics, Data, Models, Optimization, Chapman and Hall/CRC, 2016. (Guo, Lai, Shek and Wong: academics and hedge funds)


Quantitative Trading: Algorithms, Analytics, Data, Models, Optimization

Xin Guo (Department of Industrial Engineering and Operations Research, UC Berkeley)
Tze Leung Lai (Department of Statistics, Stanford University)
Howard Shek (Tower Research Capital LLC)
Samuel Po-Shing Wong (Lattice Securities Limited)

Chapman & Hall/CRC, July 2016


TOC of the book: Chapters

1. Introduction

2. Statistical Models and Methods for Quantitative Trading

3. Active Portfolio Management and Dynamic Investment Strategies

4. Econometrics of Transactions in Electronic Platforms

5. Limit Order Book: Data Analytics and Dynamic Models

6. Order Execution and Placement

7. Market Making and Smart Order Routing

8. Risk Management


TOC of Chapter 2

Chapter 2 Statistical Models and Methods for Quantitative Trading

2.1 Stylized facts on stock price data

2.1.1 Time series of low-frequency returns

2.1.2 Discrete price changes in high-frequency data

2.2 Brownian motion at Paris Exchange and random walk down Wall Street

2.3 Modern Portfolio Theory (MPT) as a "walking shoe" down Wall Street under EMH

2.4 Statistical underpinnings of MPT

2.4.1 Multifactor pricing models

2.4.2 Bayes, shrinkage and Black-Litterman estimators

2.4.3 Bootstrapping and the resampled frontier


TOC of Chapter 2

2.5 A new approach incorporating parameter uncertainty

2.5.1 Solution of the optimization problem

2.5.2 Computation of the optimal weight vector

2.5.3 Bootstrap estimate of performance and NPEB rule

2.6 From random walks to martingales that match stylized facts

2.6.1 From Gaussian to Paretian random walks

2.6.2 Random walks with optional sampling times

2.6.3 From random walks to ARIMA, GARCH and general martingale regression models for time series data

2.7 Neo-MPT involving martingale regression models

2.7.1 Incorporating time series effects in NPEB procedure

2.7.2 Optimizing information ratios along efficient frontier

2.7.3 An empirical study of neo-MPT


TOC of Chapter 2

2.8 Statistical arbitrage and strategies beyond EMH

2.8.1 Technical rules and the statistical background in nonparametric regression and change-point modeling

2.8.2 Time series, momentum, and pairs trading strategies

2.8.3 Contrarian strategies, behavioral finance, and investors’ cognitive biases

2.8.4 From value investing to global macro strategies

2.8.5 In-sample and out-of-sample evaluation of investment strategies and statistical issues of multiple testing

2.9 Supplements and problems


TOC of Chapter 3

Chapter 3 Active Portfolio Management and Dynamic Investment Strategies

3.1 Active alpha and beta in portfolio management

3.1.1 Sources of alpha

3.1.2 Exotic beta beyond active alpha

3.1.3 A new approach to active portfolio optimization

3.2 Transaction costs and long-short constraints

3.2.1 Components of cost of transaction

3.2.2 Long-short and other portfolio constraints


TOC of Chapter 3

3.3 Multiperiod portfolio management

3.3.1 The Samuelson-Merton theory of "lifetime portfolio selection" of risky assets via stochastic control

3.3.2 Incorporating transaction costs into Merton’s problem

3.3.3 Multiperiod capital growth and volatility pumping

3.3.4 Multiperiod mean-variance portfolio rebalancing

3.3.5 Dynamic mean-variance portfolio optimization in the presence of transaction costs

3.3.6 Dynamic portfolio selection in the presence of parameter uncertainty

3.4 Supplementary notes and comments

3.5 Exercises



Time-adjusted Evaluation of Default Prediction and Other Probability Forecasts

Tze Leung Lai, Stanford (joint work with Zhiyu Wang, Stanford)

January 7-8, 2016


Outline

1 Introduction

2 Martingale-Based Approach to Forecasting Evaluation

3 Asymptotic Normality of Time-Adjusted Accuracy Ratios

4 Simulation Study and Empirical Application to SMEs

5 Comparison of the Two Models


Introduction

Diebold and Mariano (1995). After noting that "the literature contains literally thousands of forecast-accuracy comparisons; almost without exception, point estimates of forecast accuracy are examined, with no attempt to assess their sampling accuracy," they proposed an asymptotic inference approach to comparing two forecasting methods (or models) by testing the null hypothesis of equal predictive ability in terms of the mean of a loss function $g$ of the prediction errors. Specifically, letting $e_{it}$ be the prediction error of method $i$ at time $t$, which is the difference between $Y_t$ and the forecast $\hat{Y}_{it}$, their null hypothesis is that

$$E\,g(e_{1t}) = E\,g(e_{2t}), \qquad 1 \le t \le T. \tag{1}$$


Introduction

Assuming $d_t = g(e_{1t}) - g(e_{2t})$ to be covariance stationary, with an absolutely summable autocovariance function, their test statistic is $\sqrt{T}\,\bar{d}/\hat{\sigma}$, where $\bar{d}$ is the sample mean of $d_t$ and $\hat{\sigma}^2$ is a consistent estimate of the variance of $d_t$. The test statistic has a limiting standard normal distribution under the null hypothesis and additional regularity conditions, which they summarize categorically as "short memory". A multivariate extension of the test was recently proposed by Mariano and Preve (2012) for testing equal predictive ability of more than two forecasting models.
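A minimal sketch of this statistic, assuming squared-error loss and a simple Newey-West (Bartlett) estimate of the long-run variance of $d_t$:

```python
import numpy as np

def diebold_mariano(e1, e2, lag=0):
    """DM statistic sqrt(T) * dbar / sigma_hat for equal predictive ability,
    with squared-error loss and a Bartlett-kernel long-run variance estimate."""
    d = e1 ** 2 - e2 ** 2                 # loss differential d_t = g(e_1t) - g(e_2t)
    T = len(d)
    dbar = d.mean()
    dc = d - dbar
    lrv = dc @ dc / T                     # gamma_0
    for k in range(1, lag + 1):
        gamma_k = dc[k:] @ dc[:-k] / T
        lrv += 2 * (1 - k / (lag + 1)) * gamma_k
    return np.sqrt(T) * dbar / np.sqrt(lrv)

rng = np.random.default_rng(2)
e1, e2 = rng.normal(size=200), rng.normal(size=200)            # toy prediction errors
print(f"DM statistic: {diebold_mariano(e1, e2, lag=4):.2f}")   # compare with N(0,1) quantiles
```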


Introduction

West (1996) considered the case of forecasts based on estimates of a parameter $\beta^*$. A vector $f_k(\beta^*)$ of $k$-step ahead predictors ($k = 1, \dots, K$) is defined assuming knowledge of $\beta^*$. Letting $\hat\beta_t$ be the estimate of $\beta$ at time $t$ based on observations up to that time, the corresponding vector of adaptive predictors is $\hat f_t = f_t(\hat\beta_t)$. Under the assumption that $f_t(\beta^*)$ is covariance stationary and additional conditions, West (1996) studies how well the average $\bar f = n^{-1}\sum_{t=R+1}^{R+n} \hat f_t$ of the adaptive predictions over the period between $R+1$ and $R+n$ approximates the mean $f = E f_t(\beta^*)$, by using the variance of the limiting zero-mean normal distribution of $\sqrt{n}\,(\bar f - f)$, which he derived under certain assumptions.


Introduction

Earlier, Lai and Zhu (1991) proposed an alternative way to compare $f_{t+k|t}(\hat\beta_t)$ with $f_{t+k|t}(\beta^*)$ for $k$-step ahead predictions in non-linear ARX models. They used the cumulative squared prediction error $\sum_{t=R+1}^{R+n} \big[f_{t+k|t}(\hat\beta_t) - f_{t+k|t}(\beta^*)\big]^2$ with $R > k$, and made use of martingale theory to show how predictors can be constructed to attain the asymptotically minimal rate of $\log n$.


Introduction

Giacomini and White (2006) made use of the martingale CLT (central limit theorem) to derive asymptotically normal Wald-type test statistics of the null hypothesis

$$H_0:\; E\Big[\, L\big(Y_{t+k}, f_{t+k|t}(\hat\beta_t)\big) - L\big(Y_{t+k}, f'_{t+k|t}(\hat\beta'_t)\big) \,\Big|\, \mathcal{F}_t \Big] = 0, \qquad 1 \le t \le T - k \tag{2}$$

of equal predictive performance, for $k$-step ahead forecasts, of two forecasting methods $f$ and $f'$ that involve estimates $\hat\beta_t$ and $\hat\beta'_t$ of parameters $\beta$ and $\beta'$ in the respective models assumed by $f$ and $f'$.


Introduction

The predictive performance is measured by the conditional expectation, given the information set up to time $t$, of a loss function $L$ of the discrepancy between the forecast and the actual outcome variable $Y_{t+k}$. Giacomini and White (2006, Sect. 2.2) pointed out that an important advantage of this approach over those of Diebold and Mariano (1995) and West (1996) is that its asymptotic distribution theory does not require stationarity over time. Giacomini and Rossi (2010) subsequently refined the test statistics to incorporate time-varying relative performance in unstable environments.


Introduction

Another direction in the development of forecast evaluation is related to regulatory evaluation of a bank's internal loan default prediction models, as in risk assessment of the bank's retail loans. A measure widely used by regulators and rating agencies is the Accuracy Ratio. The Basel Committee on Banking Supervision (BCBS, 2006, Sect. 414) wants banks to use multi-year data in assigning ratings on a yearly basis and in evaluating them. Although there has been substantial interest in statistical inference on the accuracy ratio of a bank's default prediction methods, not much has been done concerning the time series aspects of historical data on the predicted default probability and the actual occurrence and non-occurrence of default.


Time-adjusted Version of Accuracy Ratio

We propose a time-adjusted version of the accuracy ratio, forwhich there is a comprehensive asymptotic theory for inferencewithout subjective modeling of the time series data.

The basic idea underlying the time-adjusted accuracy ratio is a martingale structure relating the predictor to the predictand. Lai, Gross and Shen (2011), abbreviated by LGS hereafter, noticed this structure and made use of martingale theory to develop a new approach to evaluating probability forecasts.


The LGS Approach

This approach is based on estimation of the average $\bar L_n = n^{-1}\sum_{t=1}^n L(p_t, \hat p_t)$, where $\hat p_t$ is the predicted probability of occurrence of an event at time $t$ and $p_t$ is the actual probability of the event, conditional on the information set prior to $t$. The estimator is simply the score $\hat L_n = n^{-1}\sum_{t=1}^n L(Y_t, \hat p_t)$, where $Y_t$ is the Bernoulli variable taking the value 1 if the event occurs, and 0 otherwise. When $L(p, \hat p)$ is linear in $p$, $L(Y_t, \hat p_t) - L(p_t, \hat p_t)$, $t \ge 1$, is a martingale difference sequence, so it follows from martingale theory that $\hat L_n - \bar L_n$ converges to 0 in probability and, suitably normalized, is asymptotically normal under some regularity conditions.
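For concreteness, a minimal sketch of the score $\hat L_n$ and of the score difference between two probability forecasters, assuming the Brier loss (the function names are ours, not from LGS):

import numpy as np

def brier_score(y, p):
    # \hat L_n = n^{-1} sum_t L(Y_t, \hat p_t) with L(y, p) = (y - p)^2
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(np.mean((y - p) ** 2))

def score_difference(y, p1, p2):
    # \hat L'_n - \hat L''_n, the quantity LGS use to compare two forecasters
    return brier_score(y, p1) - brier_score(y, p2)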


For a general class of loss functions, LGS showed that the difference in average scores $\hat L'_n - \hat L''_n$ between two forecasting methods $\hat p'_t$ and $\hat p''_t$ is a consistent and asymptotically normal estimate of the differential loss $\bar L'_n - \bar L''_n$. An important observation of LGS is that even though commonly used loss functions $L(p, \hat p)$ are not linear in $p$, they have linear equivalents $\tilde L(p, \hat p)$ that are linear in $p$ and satisfy

$$L(p, \hat p) - \tilde L(p, \hat p) \ \text{ does not depend on } \hat p. \qquad (3)$$
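As a worked example of (3), not spelled out on the slides, take the Brier loss $L(p, \hat p) = (p - \hat p)^2$ and $\tilde L(p, \hat p) = p(1 - 2\hat p) + \hat p^2$, which is linear in $p$; then

$$L(p, \hat p) - \tilde L(p, \hat p) = p^2 - p,$$

which does not depend on $\hat p$, and $E[L(Y_t, \hat p_t) \mid \mathcal{F}_{t-1}] = \tilde L(p_t, \hat p_t)$, so the realized score is an unbiased estimate of the linear-equivalent loss.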


In this case, $L(p_t, \hat p'_t) - L(p_t, \hat p''_t) = \tilde L(p_t, \hat p'_t) - \tilde L(p_t, \hat p''_t)$ is a linear function of $p_t$, and therefore the differential loss $\bar L'_n - \bar L''_n$ can be consistently estimated by the difference in average scores $\hat L'_n - \hat L''_n$, to which the martingale CLT can be applied to establish asymptotic normality.


The LGS Approach to Non Probabilistic Forecasts

Unlike the probability forecast $\hat p_t$, which gives a predictive distribution of the Bernoulli variable $Y_t$, the forecasts considered by Diebold and Mariano, West, and Giacomini and White are non-probabilistic forecasts that predict the value of the future outcome $Y_t$. The average loss differential $\hat\Delta_n = n^{-1}\sum_{t=1}^n \{L(Y_t, \hat Y'_t) - L(Y_t, \hat Y''_t)\}$ is an unbiased estimate of the conditional expected loss differential $\Delta_n = n^{-1}\sum_{t=1}^n E\{L(Y_t, \hat Y'_t) - L(Y_t, \hat Y''_t) \mid \mathcal{F}_{t-1}\}$, in which $E(\cdot \mid \mathcal{F}_{t-1})$ is taken with respect to the actual but unknown probability measure generating the future outcomes $Y_t$ given the history (information set) up to time $t-1$.


In fact, $\hat L_n = n^{-1}\sum_{t=1}^n L(Y_t, \hat Y_t)$ is an unbiased estimate of $\bar L_n = n^{-1}\sum_{t=1}^n E[L(Y_t, \hat Y_t) \mid \mathcal{F}_{t-1}]$, and $n(\hat L_n - \bar L_n)$, $n \ge 1$, is a martingale. The point is that $L(Y_t, \hat Y_t) - E[L(Y_t, \hat Y_t) \mid \mathcal{F}_{t-1}]$, $t \ge 1$, is a martingale difference sequence, and LGS make use of this and the martingale CLT to establish the asymptotic normality of $\sqrt{n}(\hat L_n - \bar L_n)$. Instead of hypothesis testing, the approach of LGS is targeted towards estimating the conditional expected loss differential $\Delta_n = n^{-1}\sum_{t=1}^n E\{L(Y_t, \hat Y'_t) - L(Y_t, \hat Y''_t) \mid \mathcal{F}_{t-1}\}$ and developing confidence intervals based on $\hat\Delta_n$.


Martingale-Based Approach to Forecasting Evaluation


Adaptive weights for loss functions and segmentation

Giacomini and Rossi (2010) note that the relative performance ofmodels may change over time in unstable environments, and thatin existing econometric literature “a forecaster may select themodel that performed best on average over a historical sample,ignoring the fact that the competing model produced moreaccurate forecasts when considering only the recent past.” Theypropose to modify previous tests of equal predictive ability of twoforecasting methods accordingly.


Instead of using hypothesis testing for inference on comparing forecasting methods, we modify the LGS approach by redefining $\bar L_n$ (or $\bar L'_n - \bar L''_n$) to take into consideration regime switches over time, and by modifying the estimates $\hat L_n$ (or $\hat L'_n - \hat L''_n$) similarly. A key idea underlying this modification is to partition time (up to $n$) into regimes by using stopping times $\tau_1 < \cdots < \tau_{k(n)}$, with $\tau_{k(n)} \le n$. A stopping time $\tau$ is a positive integer-valued random variable such that $\{\tau = t\} \in \mathcal{F}_t$ for every $t$. Suppose there are $J \ge 2$ regimes. In particular, the case $J = 2$ corresponds to the two regimes (stable and unstable) considered by Giacomini and Rossi.


To segregate evaluations for different regimes, which are assumed to be piecewise constant over time, we propose to use current and past data to estimate the regime at each time, and let $\tau_i$ be the first time that the estimated regime changes from $\hat j_{i-1}$ to $\hat j_i$. Setting $\tau_0 = 0$ and $\tau_{k(n)+1} = n$,

$$\sum_{i=1}^{k(n)+1} 1_{\{\hat j_{i-1}=j\}} \sum_{\tau_{i-1} < t \le \tau_i} \big\{L(Y_t, \hat Y_t) - E[L(Y_t, \hat Y_t) \mid \mathcal{F}_{t-1}]\big\} \qquad (4)$$

is still a martingale for fixed $j = 1, \ldots, J$. Hence the martingale CLT can still be applied to derive from (4) an asymptotically normal estimate of the conditional mean loss (for regime $j$)

$$\bar L_{n,j} = \Big\{\sum_{i=1}^{k(n)+1} 1_{\{\hat j_{i-1}=j\}} \sum_{\tau_{i-1} < t \le \tau_i} E[L(Y_t, \hat Y_t) \mid \mathcal{F}_{t-1}]\Big\} \Big/ \sum_{i=1}^{k(n)+1} (\tau_i - \tau_{i-1}) 1_{\{\hat j_{i-1}=j\}}. \qquad (5)$$


Since $\sum_{\tau_{i-1} < t \le \tau_i} E[L(Y_t, \hat Y_t) \mid \mathcal{F}_{t-1}] = \sum_{t=\tau_{i-1}}^{\tau_i - 1} E[L(Y_{t+1}, \hat Y_{t+1}) \mid \mathcal{F}_t]$, the conditional means in (5) are taken over the information sets $\mathcal{F}_t$ with $t$ in the segregated period $\tau_{i-1} \le t \le \tau_i - 1$, during which there is no regime change. Although $\tau_i - 1$ is not a stopping time, the sum $\sum_{\tau_{i-1} < t \le \tau_i}\{L(Y_t, \hat Y_t) - E[L(Y_t, \hat Y_t) \mid \mathcal{F}_{t-1}]\}$ in (4) involves the stopping time $\tau_i$ rather than $\tau_i - 1$. Thus we can use the martingale strong law and the martingale CLT to show that

$$\hat L_{n,j} = \Big\{\sum_{i=1}^{k(n)+1} 1_{\{\hat j_{i-1}=j\}} \sum_{\tau_{i-1} < t \le \tau_i} L(Y_t, \hat Y_t)\Big\} \Big/ \sum_{i=1}^{k(n)+1} (\tau_i - \tau_{i-1}) 1_{\{\hat j_{i-1}=j\}} \qquad (6)$$

is a consistent and asymptotically normal estimate of $\bar L_{n,j}$ for $1 \le j \le J$.
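A minimal sketch of the estimate (6) in Python (function and variable names are ours): each realized loss is attributed to the regime estimated from earlier data, so the attribution is $\mathcal{F}_{t-1}$-measurable and the centered sum in (4) remains a martingale; the bookkeeping exactly at the change points $\tau_i$ is simplified here.

import numpy as np

def regime_loss_estimates(losses, regime_hat, init_regime=0):
    # losses[t]     : realized loss L(Y_t, \hat Y_t)
    # regime_hat[t] : regime estimated from data up to and including time t
    # Each loss is attributed to the regime estimated one step earlier.
    losses = np.asarray(losses, dtype=float)
    labels = np.concatenate(([init_regime], np.asarray(regime_hat)[:-1]))
    return {j: float(losses[labels == j].mean()) for j in np.unique(labels)}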


More generally, the weighted sum

$$\sum_{t=1}^n w_t \big\{L(Y_t, \hat Y_t) - E[L(Y_t, \hat Y_t) \mid \mathcal{F}_{t-1}]\big\} \qquad (7)$$

is a martingale if the weight $w_t$ depends only on the information set $\mathcal{F}_{t-1}$, for $1 \le t \le n$. We call such weights adaptive. Hence the approach of LGS that uses the martingale structure in $\hat L_n - \bar L_n$ can be extended by incorporating adaptive weights $w_t$ for the losses $L(Y_t, \hat Y_t)$. The martingale CLT can be applied to construct confidence intervals for the weighted loss differential

$$\Delta_n = n^{-1}\sum_{t=1}^n w(\hat Y'_t, \hat Y''_t)\, E[L(Y_t, \hat Y'_t) - L(Y_t, \hat Y''_t) \mid \mathcal{F}_{t-1}].$$
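A correspondingly small sketch of the point estimate of this weighted differential (the caller must supply weights that depend only on $\mathcal{F}_{t-1}$, e.g. functions of the two forecasts, for the martingale argument to apply):

import numpy as np

def weighted_loss_differential(loss1, loss2, weights):
    # n^{-1} sum_t w_t { L(Y_t, \hat Y'_t) - L(Y_t, \hat Y''_t) }, with adaptive weights w_t
    loss1, loss2, w = (np.asarray(a, dtype=float) for a in (loss1, loss2, weights))
    return float(np.mean(w * (loss1 - loss2)))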


Time-adjusted grouping of multiple forecasts at each time

Grouping probability forecasts for their evaluation is a popular method because it leads to graphical displays of how well the group's forecasts approximate the relative frequency of actual occurrences in the group. A widely used approach in forecasting the probability of rainfall is to group the probability forecasts $\hat p_t$ into bins so that they are rounded to a finite set of values, denoted by $p^{(1)}, \ldots, p^{(J)}$. Corresponding to each $p^{(j)}$ is a set of observations $Y_i$, $i \in I_j$, taking values 0 and 1, where $I_j = \{i : \hat p_i = p^{(j)}\}$. In weather forecasting, the reliability diagram plots $\bar Y(j) = \big(\sum_{i\in I_j} Y_i\big)/n_j$ versus $p^{(j)}$, where $n_j$ is the size of $I_j$.
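A minimal sketch of the binning behind such a diagram (equal-width bins are an illustrative choice; rounding forecasts to a fixed grid, as in rainfall forecasting, works the same way):

import numpy as np

def reliability_diagram(p, y, bins=10):
    # Returns (bin lower edge, bin upper edge, mean forecast, observed frequency, n_j) per bin.
    p, y = np.asarray(p, dtype=float), np.asarray(y, dtype=float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.minimum(np.digitize(p, edges[1:]), bins - 1)   # bin index 0..bins-1
    out = []
    for j in range(bins):
        mask = idx == j
        if mask.any():
            out.append((edges[j], edges[j + 1], p[mask].mean(), y[mask].mean(), int(mask.sum())))
    return out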


Statistical inference for reliability diagrams has been developed in the meteorology literature under the assumption of "independence and stationarity", namely that the $(\hat p_i, Y_i)$ are i.i.d. samples from a bivariate distribution; see Wilks (2005) and Brocker and Smith (2007). Under this assumption, a $(1-\alpha)$-level confidence interval for the mean $p^{(j)}$ of the $Y_i$ with $i \in I_j$ is

$$\bar Y(j) \pm z_{1-\alpha/2}\big\{\bar Y(j)(1 - \bar Y(j))/n_j\big\}^{1/2}, \qquad (8)$$

where $z_q$ is the $q$th quantile of the standard normal distribution. However, this assumption is clearly violated in weather forecasting, and Wilks (2005, p. 331) has raised the concern that the confidence intervals (8) "are probably too narrow".


This difficulty is addressed in LGS by using time-dependent grouping of multiple forecasts so that the martingale structure of forecast-observation pairs is preserved. The basic underlying idea is that many applications involve multiple forecasts at a given time $t$. For example, in risk assessment of a bank's mortgage loans, the obligors are grouped into risk buckets within which they can be regarded as having similar risk. Forecasting $K_t$ outcome variables $Y_{t,1}, \ldots, Y_{t,K_t}$ that occur at time $t$ leads to a vector of forecasts $(\hat Y_{t,1}, \ldots, \hat Y_{t,K_t})$ at time $t-1$. Instead of the $J$ fixed groups, LGS use the time-adjusted modification that divides, at every time $t$, the $K_t$ forecast-observation pairs into $J$ groups $I_{j,t}$, $1 \le j \le J$, such that $I_{j,t}$ is based on $\mathcal{F}_{t-1}$. Typically $I_{j,t} = \{k : 1 \le k \le K_t, \ \hat Y_{t,k} \in C_j\}$.


Let $n_{j,t}$ be the cardinality of $I_{j,t}$, $n_j = \sum_{t=1}^T n_{j,t}$, $\mu_{t,k} = E(Y_{t,k} \mid \mathcal{F}_{t-1})$, and

$$\bar Y_t(j) = \Big(\sum_{k\in I_{j,t}} Y_{t,k}\Big)\Big/n_{j,t}, \qquad \mu_t(j) = \Big(\sum_{k\in I_{j,t}} \mu_{t,k}\Big)\Big/n_{j,t},$$

$$\bar Y(j) = \Big(\sum_{t=1}^T\sum_{k\in I_{j,t}} Y_{t,k}\Big)\Big/n_j, \qquad \mu(j) = \Big(\sum_{t=1}^T\sum_{k\in I_{j,t}} \mu_{t,k}\Big)\Big/n_j, \qquad (9)$$

$$v_t(j) = \sum_{k\in I_{j,t}} \big(Y_{t,k} - \bar Y_t(j)\big)^2\big/(n_{j,t} - 1), \qquad v(j) = \sum_{t=1}^T n_{j,t}\, v_t(j)\big/n_j,$$


assuming that $n_{j,t} \ge 2$ for $1 \le j \le J$. LGS apply the martingale CLT to show that

$$\sum_{t=1}^T\sum_{k\in I_{j,t}} (Y_{t,k} - \mu_{t,k}) \Big/ \Big(\sum_{t=1}^T\sum_{k\in I_{j,t}} \sigma^2_{t,k}\Big)^{1/2} \xrightarrow{D} N(0,1) \qquad (10)$$

under certain regularity conditions, where $\sigma^2_{t,k} = E\big((Y_{t,k} - \mu_{t,k})^2 \mid \mathcal{F}_{t-1}\big)$. Let $\bar v(j) = \big(\sum_{t=1}^T\sum_{k\in I_{j,t}} \sigma^2_{t,k}\big)/n_j$. We can show that $v(j) \ge \bar v(j) + o_p(1)$. Therefore a valid, albeit conservative, $(1-\alpha)$-level confidence interval for $\mu(j)$ is $\bar Y(j) \pm z_{1-\alpha/2}\big(v(j)/n_j\big)^{1/2}$, which is an adjustment of (8) for the temporal dependence between the forecast-observation pairs.
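A sketch of this adjusted interval (function and argument names are ours): it pools the within-time variances $v_t(j)$ into $v(j)$ and forms $\bar Y(j) \pm z_{1-\alpha/2}(v(j)/n_j)^{1/2}$, assuming $n_{j,t} \ge 2$ whenever group $j$ is non-empty at time $t$.

import numpy as np
from scipy.stats import norm

def time_adjusted_group_ci(Y_by_t, group_by_t, j, alpha=0.05):
    # Y_by_t[t]     : 0/1 outcomes Y_{t,k} for the K_t forecasts at time t
    # group_by_t[t] : group labels I_{j,t}, assigned using information up to t-1
    total, n_j, weighted_var = 0.0, 0, 0.0
    for Yt, gt in zip(Y_by_t, group_by_t):
        sel = np.asarray(Yt, dtype=float)[np.asarray(gt) == j]
        if sel.size >= 2:
            total += sel.sum()
            n_j += sel.size
            weighted_var += sel.size * sel.var(ddof=1)   # n_{j,t} * v_t(j)
    ybar, v_j = total / n_j, weighted_var / n_j
    half = norm.ppf(1 - alpha / 2) * np.sqrt(v_j / n_j)
    return ybar - half, ybar + half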


Asymptotic Normality of Time-Adjusted Accuracy Ratios


U-statistic representation of accuracy ratio

As noted by Engelmann, Hayden and Tasche (2003), abbreviated by EHT hereafter, the most popular measure of the predictive accuracy of a rating method or risk model for loans is the accuracy ratio (AR), which is the ratio of (a) the area between the cumulative accuracy profile (CAP) of the rating method and that of random prediction, to (b) the corresponding area for perfect prediction.


Accuracy Ratio and Power Curve

Figure: Accuracy Ratio Illustration


AR and AUC

It is shown in EHT that $\mathrm{AR} = 2\,\mathrm{AUC} - 1$, where AUC denotes the area under the ROC (receiver operating characteristic) curve, which is widely used to evaluate the discriminating performance of a classification rule in diagnostic tests. Using this formula relating AR to AUC, EHT apply known results on the asymptotic distribution of AUC to derive approximate tests and confidence intervals for accuracy ratios.
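A small sketch of the relation $\mathrm{AR} = 2\,\mathrm{AUC} - 1$, with AUC computed as the proportion of concordant (defaulter, survivor) score pairs and ties counted with weight 1/2; the convention that defaulters should receive higher scores is an assumption of this snippet, not a statement from EHT.

import numpy as np

def accuracy_ratio(default, score):
    default = np.asarray(default).astype(bool)
    score = np.asarray(score, dtype=float)
    d, s = score[default], score[~default]            # scores of defaulters / survivors
    auc = (d[:, None] > s[None, :]).mean() + 0.5 * (d[:, None] == s[None, :]).mean()
    return 2.0 * auc - 1.0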


The U-statistic representation of AUC assigns the value 1 to the pair $(i, j)$ if $Y_i < Y_j$ and $X_i < X_j$, the value $-1$ to pairs $(i, j)$ with $Y_i < Y_j$ and $X_i > X_j$, and 0 to all other $(i, j)$ pairs. In the context of default prediction, $Y_i$ is the indicator variable (1 or 0) of the event of default for the $i$th obligor and $X_i$ is the corresponding predicted probability. The asymptotic theory of U-statistics assumes that the $(X_i, Y_i)$ are i.i.d. pairs, which is clearly violated when the default predictions are made sequentially over time.


Symmetric statistics and its CLT

Let $Z_1, \ldots, Z_n$ be independent (not necessarily identically distributed) random vectors and let $S_n = S_n(Z_1, \ldots, Z_n)$ be a statistic that is invariant under permutation of its arguments. Such statistics, which include U-statistics, are called symmetric and have the Hoeffding-Efron-Stein decomposition

$$\tilde S_n = \sum_{i=1}^n E(\tilde S_n \mid Z_i) + \sum_{1\le i<j\le n} E(\tilde S_n \mid Z_i, Z_j) + \sum_{1\le i<j<k\le n} E(\tilde S_n \mid Z_i, Z_j, Z_k) + \cdots \qquad (11)$$

under the assumption $E S_n^2 < \infty$, where $\tilde S_n = S_n - E(S_n)$ and the $2^n - 1$ random variables $E(\tilde S_n \mid Z_i), E(\tilde S_n \mid Z_i, Z_j), E(\tilde S_n \mid Z_i, Z_j, Z_k), \ldots$ have mean 0 and are pairwise uncorrelated; see Lai and Wang (1993).


Assuming a Lyapunov-type condition,

$$\max_{1\le i\le n} E\big(|E(\tilde S_n \mid Z_i)|^{\delta}\big)\big/\nu_n^{\delta/2} \to 0 \ \text{ for some } \delta > 2, \quad \text{where } \nu_n = \sum_{i=1}^n E\big[\big(E(\tilde S_n \mid Z_i)\big)^2\big], \qquad (12)$$

Hoeffding's (1948) classical proof of the central limit theorem for U-statistics can be modified to show that $\tilde S_n/\sqrt{\nu_n}$ has the same limiting distribution as that of $\nu_n^{-1/2}\sum_{i=1}^n E(\tilde S_n \mid Z_i)$, which is standard normal by Lyapunov's central limit theorem.


Time-adjusted accuracy ratio and its applications to forecast evaluation

The assumption of i.i.d. pairs $(X_i, Y_i)$ underlying EHT's derivation of the asymptotic normality of AR is too strong, since it excludes time series effects in the outcome variable $Y_i$ and the predicted probability $X_i$. In particular, the same obligor may have multiple $(X_i, Y_i)$ values over different periods in the evaluation data. The idea of time-adjusted grouping of multiple forecasts in Section 2.2 can be used to remove this unrealistic assumption via

$$\text{Time-adjusted accuracy ratio} = \sum_{t=1}^T w_t\, \mathrm{AR}_t, \qquad (13)$$

in which $\mathrm{AR}_t$ is the accuracy ratio of the multiple forecasts based on the $(X^t_i, Y^t_i)$ pairs ($i = 1, \ldots, K_t$) at time $t$ and the weights $w_t$ are prespecified positive numbers summing to 1, e.g., $w_t = 1/T$.
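A sketch of (13) in Python (function names are ours); it assumes each period contains both defaulters and non-defaulters and uses equal weights $w_t = 1/T$ by default.

import numpy as np

def per_period_ar(default, score):
    # AR_t = 2*AUC_t - 1 for the obligors observed at one time point (ties weighted 1/2)
    default = np.asarray(default).astype(bool)
    score = np.asarray(score, dtype=float)
    d, s = score[default], score[~default]
    auc = (d[:, None] > s[None, :]).mean() + 0.5 * (d[:, None] == s[None, :]).mean()
    return 2.0 * auc - 1.0

def time_adjusted_accuracy_ratio(defaults_by_t, scores_by_t, weights=None):
    # (13): weighted average of the per-period accuracy ratios
    T = len(defaults_by_t)
    w = np.full(T, 1.0 / T) if weights is None else np.asarray(weights, dtype=float)
    return float(np.dot(w, [per_period_ar(d, x) for d, x in zip(defaults_by_t, scores_by_t)]))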


Note that the predicted default probabilities $X^t_i$ are based on data up to time $t-1$ and that (13) is an unbiased estimate of the prediction accuracy measure $\sum_{t=1}^T w_t\, E(\mathrm{AR}_t)$, where $E(\mathrm{AR}_t)$ is given by

$$E(\mathrm{AR}_t) = \sum_{1\le i\ne j\le K_t}\big\{P(X^t_i < X^t_j,\ Y^t_i = 1,\ Y^t_j = 0) - P(X^t_i > X^t_j,\ Y^t_i = 1,\ Y^t_j = 0)\big\}\Big/\binom{K_t}{2}. \qquad (14)$$


Simulation Study and Empirical Application to SMEs


Small and Medium Sized Enterprises

Small and medium-sized enterprises (SMEs) are widely considered the backbone of the economy in many countries around the world. For OECD members, SMEs account for more than 97 percent of all firms. In the US, SMEs provide approximately 75 percent of the net jobs added to the economy, employ around 50 percent of the private workforce, and represent 99.7 percent of all employers.


SMEs can respond quickly to changing economic conditions and meet local customers' needs, sometimes growing into large and powerful corporations or failing within a short time (Altman and Sabato, 2006). From a credit risk point of view, SMEs differ from large corporates for many reasons. For example, Dietsch and Petey (2004) conclude that they are riskier but have a lower asset correlation with each other than large businesses. Consequently, applying a default prediction model developed on large-corporate data to SMEs will presumably result in lower prediction power.


Data Sets

The data set being analyzed includes records on 2384 different companies from 1994 to 2004. We take all the data up to and including year 2000 as the in-sample training set. We normalize all the variables by total assets; for example, we use "retained earnings" divided by total assets as a covariate instead of including "retained earnings" directly.


Default Rates over Time

Figure: Trailing 12 Month Default Rate (%), 1994-2004, for SMEs and high-yield corporates.

Source: Wrds, Comp&Stats, Wharton Research Institute, Moody's


Methods and Model Selection

We first fit a logistic regression and apply forward selection with BIC to choose the important covariates. We then fit a GLMM using the covariates selected by BIC in the logistic regression as fixed effects. We add only one random-effect term, based on the firm identifier (the ticker, tic). The GLMM in this case is

$$\mathrm{logit}(\pi_{i,t}) = \beta^T x_{i,t} + b_i + \epsilon_{i,t}, \qquad (15)$$

where $P(Y_{i,t} = 1) = \pi_{i,t}$, the $x_{i,t}$ are the fixed-effect covariates, $b_i$ is the random effect, and $\epsilon_{i,t}$ is i.i.d. noise.
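A sketch of the model-selection step (software and function names are our own choice; the slides do not specify an implementation). For the GLMM with a firm-level random intercept, one could then use, for example, R's lme4::glmer or statsmodels' BinomialBayesMixedGLM with the selected covariates as fixed effects.

import numpy as np
import statsmodels.formula.api as smf

def forward_bic_logit(df, response, candidates):
    # Greedy forward selection: add, one at a time, the covariate that lowers BIC the most.
    selected, best_bic = [], np.inf
    while True:
        best_var = None
        for var in (c for c in candidates if c not in selected):
            formula = f"{response} ~ " + " + ".join(selected + [var])
            bic = smf.logit(formula, data=df).fit(disp=0).bic
            if bic < best_bic:
                best_bic, best_var = bic, var
        if best_var is None:
            return selected, best_bic
        selected.append(best_var)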


Recall Generalized Linear (Mixed) Models

Let $Y$ be the response variable and $X$ the predictor variable, with $\mu = EY$. Given $n$ samples $(X_i, Y_i)$, $1 \le i \le n$, a generalized linear model assumes that

$$g(\mu_i) = \beta^T X_i, \quad 1 \le i \le n,$$

where $g(\cdot)$ is a smooth and invertible link function. A generalized linear mixed model (GLMM) takes the form

$$g(\mu_{ij}) = \beta^T X_{ij} + b_i^T Z_{ij},$$

where the $b_i$ are subject-specific random effects drawn from a distribution with mean 0, and $\beta$ contains the fixed effects, which are the same across all subjects.


Illustration for GLMM

Figure: Linear Mixed Models


Methods and Model Selection

The model selected by BIC is

bkrpt ~ teqq + req_atq + req + cheq + DLTTQ + RECTQ

+ ln_dlcq_teqq + ln_cheq_atq + dlcq_teqq

The definitions of the variables are summarized in the table below.


bkrpt: a Bernoulli variable indicating defaulted (1) or not (0)
teqq: Stockholders Equity - Total
req: Retained Earnings
atq: Total Assets
cheq: Cash and Short-Term Investments
DLTTQ: Long-Term Debt - Total
RECTQ: Receivables - Total
dlcq: Debt in Current Liabilities

Table: Definition of Variables


Comparison of the Two Models


Accuracy Ratio Comparison

Figure: CAP curves for year 2001 (proportion of defaulted firms vs. proportion of all firms), together with the y = x line for random prediction: logistic AR = 0.61562, GLMM AR = 0.85333.


Figure: CAP curves (proportion of defaulted firms vs. proportion of all firms), together with the y = x line for random prediction. Year 2002: logistic AR = 0.7, GLMM AR = 0.8734. Year 2003: logistic AR = 0.5739, GLMM AR = 0.9515.


Brier Score Comparison

For the prediction in year 2002, the Brier score for logistic regression is $\hat L'_n = 0.0135$ and the Brier score for GLMM is $\hat L''_n = 0.0098$, so the difference between the scores is $\hat L'_n - \hat L''_n = 3.6763 \times 10^{-3}$. Using 1/4 as an upper bound for $p(1-p)$, we obtain the conservative 95% confidence interval $[1.9951 \times 10^{-3},\ 5.3576 \times 10^{-3}]$ for the expected score difference. This means that GLMM is clearly better than logistic regression, since even the conservative confidence interval does not contain 0.
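One way to form such a conservative interval, assuming the Brier loss and the bound $p(1-p) \le 1/4$ on the conditional variance of each Bernoulli outcome (the slide does not spell out its exact variance bookkeeping):

import numpy as np
from scipy.stats import norm

def conservative_brier_diff_ci(y, p1, p2, alpha=0.05):
    # d_t = (Y_t - p1_t)^2 - (Y_t - p2_t)^2 has conditional variance
    # 4 (p1_t - p2_t)^2 p_t (1 - p_t) <= (p1_t - p2_t)^2, which bounds the SE of the average.
    y, p1, p2 = (np.asarray(a, dtype=float) for a in (y, p1, p2))
    n = len(y)
    d = (y - p1) ** 2 - (y - p2) ** 2
    se_bound = np.sqrt(np.sum((p1 - p2) ** 2)) / n
    z = norm.ppf(1 - alpha / 2)
    return d.mean() - z * se_bound, d.mean() + z * se_bound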


For the prediction in year 2003, the score for GLMM is again lower, with $\hat L'_n - \hat L''_n = 7.9902 \times 10^{-4}$, but the conservative 95% confidence interval for the expected difference is now $[-7.5765 \times 10^{-4},\ 2.3557 \times 10^{-3}]$, which contains zero. This is because there are very few defaulted companies (about 0.6% in year 2003), so the default probabilities are likely to be low and the upper bound 1/4 for $p(1-p)$ is too conservative. We will instead estimate the $p_t$, using, for example, a reliability diagram.


Reliability Diagram


Quantitative Trading: Algorithms, Analytics, Data, Models, Optimization

Xin GUO (Department of Industrial Engineering and Operations Research, UC Berkeley)
Tze Leung LAI (Department of Statistics, Stanford University)
Howard SHEK (Tower Research Capital LLC)
Samuel Po-Shing WONG (5Lattice Securities Limited)

Chapman & Hall/CRC, July 2016


TOC of the book: Chapters

1. Introduction

2. Statistical Models and Methods for Quantitative Trading

3. Active Portfolio Management and Dynamic Investment Strategies

4. Econometrics of Transactions in Electronic Platforms

5. Limit Order Book: Data Analytics and Dynamic Models

6. Order execution and placement

7. Market Making and Smart Order Routing

8. Risk management


TOC of Chapter 2

Chapter 2 Statistical Models and Methods for Quantitative Trading

2.1 Stylized facts on stock price data

2.1.1 Time series of low-frequency returns

2.1.2 Discrete price changes in high-frequency data

2.2 Brownian motion at Paris Exchange and random walk down Wall Street

2.3 Modern Portfolio Theory (MPT) as a “walking shoe” down Wall Street under EMH

2.4 Statistical underpinnings of MPT

2.4.1 Multifactor pricing models

2.4.2 Bayes, shrinkage and Black-Litterman estimators

2.4.3 Bootstrapping and the resampled frontier


2.5 A new approach incorporating parameter uncertainty

2.5.1 Solution of the optimization problem

2.5.2 Computation of the optimal weight vector

2.5.3 Bootstrap estimate of performance and NPEB rule

2.6 From random walks to martingales that match stylized facts

2.6.1 From Gaussian to Paretian random walks

2.6.2 Random walks with optional sampling times

2.6.3 From random walks to ARIMA, GARCH and general martingale regression models for time series data

2.7 Neo-MPT involving martingale regression models

2.7.1 Incorporating time series e↵ects in NPEB procedure

2.7.2 Optimizing information ratios along efficient frontier

2.7.3 An empirical study of neo-MPT


2.8 Statistical arbitrage and strategies beyond EMH

2.8.1 Technical rules and the statistical background in nonparametric regression and change-point modeling

2.8.2 Time series, momentum, and pairs trading strategies

2.8.3 Contrarian strategies, behavioral finance, and investors’ cognitive biases

2.8.4 From value investing to global macro strategies

2.8.5 In-sample and out-of-sample evaluation of investment strategies and statistical issues of multiple testing

2.9 Supplements and problems


TOC of Chapter 3

Chapter 3 Active Portfolio Management and Dynamic Investment Strategies

3.1 Active alpha and beta in portfolio management

3.1.1 Sources of alpha

3.1.2 Exotic beta beyond active alpha

3.1.3 A new approach to active portfolio optimization

3.2 Transaction costs and long-short constraints

3.2.1 Components of cost of transaction

3.2.2 Long-short and other portfolio constraints


3.3 Multiperiod portfolio management

3.3.1 The Samuelson-Merton theory of “lifetime portfolio selection” of risky assets via stochastic control

3.3.2 Incorporating transaction costs into Merton’s problem

3.3.3 Multiperiod capital growth and volatility pumping

3.3.4 Multiperiod mean-variance portfolio rebalancing

3.3.5 Dynamic mean-variance portfolio optimization in the presence of transaction costs

3.3.6 Dynamic portfolio selection in the presence of parameter uncertainty

3.4 Supplementary notes and comments

3.5 Exercises
