TRANSCRIPT
RED/082311
Presenter: Jo Prichard Computerworld, Phoenix AZ, 9/20/11
LexisNexis HPCC Systems Mortgage Fraud Case Study
Three Main Components

HPCC Data Refinery (Thor)
• Massively Parallel Extract, Transform and Load (ETL) engine
– Built from the ground up as a parallel data environment. Leverages inexpensive locally attached storage; does not require a SAN infrastructure.
• Enables data integration on a scale not previously available:
– The current LexisNexis person-data build process generates 350 billion intermediate results at peak
• Suitable for:
– Massive joins/merges
– Massive sorts and transformations
– Any N² problem
– “Identify and catalog all the DNA in the oceans”

HPCC Data Delivery Engine (Roxie)
• A massively parallel, high-throughput, structured query response engine
• Ultra fast due to its read-only nature
• Allows indices to be built on data for efficient multi-user retrieval
• Suitable for:
– Volumes of structured queries
– Full-text ranked Boolean search
– “I want that fish there”

Enterprise Control Language (ECL)
• An easy-to-use, data-centric programming language optimized for large-scale data management and query processing
• Highly efficient; automatically distributes workload across all nodes
– Industry analysts estimate ECL is 80% more efficient than C++, Java and SQL, and reduces programmer time to maintain and enhance existing applications by one third
– Benchmarked against SQL, ECL is five times more efficient at code generation
• Automatic parallelization and synchronization of sequential algorithms for parallel and distributed processing
• Large library of efficient modules to handle common data manipulation tasks
Mortgage Fraud Continues to Impact the Economy
• Per FBI, pending investigations increased 12% in the fiscal year ended September 30, 2010, to 3,129 cases – this represents a 90% jump from the previous fiscal year.
• The collapse of the housing boom and the financial crisis have increased foreclosures: 2.5 million foreclosures were initiated in 2010, and 2011 should see a similar number.
• Per the FBI, mortgage origination schemes have decreased because of the depressed housing market.
• But fraud targeting troubled borrowers has increased, including loan modification scams and foreclosure rescue schemes in which perpetrators convince borrowers they can save their homes through deed transfers and upfront fees.
• Mortgage fraud hotspots include California, New York and Florida
• Source: http://www.reuters.com/article/2011/08/15/us-usa-mortgages-fraud-idUSTRE77E3UP20110815
Why is Mortgage Fraud so Difficult to Detect?
• Systems built to manage loan portfolios are not well suited to fraud detection at scale.
• Mortgage fraud is prolific and can be hard to detect, since mortgage data is not consolidated into one database.
• Data is spread across various places: financial services organizations, FinCEN SARs, government agencies, and public records such as property deeds and assessments.
• Government Agencies have limited resources to detect and investigate the bigger mortgage fraud schemes.
• The challenge is to quickly leverage readily available data to help organizations detect, prioritize and investigate large mortgage fraud schemes.
How Can HPCC Systems Detect Mortgage Fraud?
• Leverage publicly available data, such as property deeds and assessments: almost 700 million records
• Public records data has data hygiene challenges:
– Limited information on mortgage participants
– No information on the appraiser or realtor
– Names may be misspelled; there is no SSN and no DOB
• How can HPCC Systems detect mortgage fraud leveraging the big data of public records?
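Linking records despite these hygiene problems comes down to fuzzy identity matching. As a minimal, hypothetical sketch (the names, threshold and helper functions below are illustrative assumptions, not the LexisNexis linking technology), one can normalize each deed-party name and then compare normalized forms with a string-similarity score:

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Uppercase, strip punctuation, collapse whitespace."""
    cleaned = "".join(c for c in name.upper() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

def same_party(name_a: str, name_b: str, threshold: float = 0.85) -> bool:
    """Treat two deed-party names as a likely match when their
    normalized forms are sufficiently similar."""
    a, b = normalize(name_a), normalize(name_b)
    if a == b:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(same_party("John  Q. Smith", "JOHN Q SMITH"))  # exact after normalization
print(same_party("Jon Q Smith", "John Q Smith"))     # near match despite typo
print(same_party("John Q Smith", "Mary R Jones"))    # different party
```

A production system would of course add address, phone and other corroborating fields, since name similarity alone over-matches common names.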
Rules Based Fraud Detection Falls Short
Fraudsters know all the thresholds and game the system.
• Advanced Persistent Threat (APT) is not just cyber.
• Rules-based detection plays a key role in the “Giant Mortgage Fraud Magic Act”.
• The key differentiator is how to leverage BIG DATA to measure the proximity of seemingly low-risk events commonly associated with high-risk activities, in order to detect organized fraud syndicates.
Isolated risk? Lone Individuals vs. Organized Group
Variables that describe the proximity and connectedness of risk through relationships.
• Non-visual rank ordering, prioritizing for investigation and mitigation of risk.
– Suspicious insurance claims by proximity to other suspicious insurance claims, providers and body shop contacts.
– New unsecured accounts by proximity to secured accounts and other newly unsecured accounts.
– Suspicious property transactions by proximity to associated suspicious property transactions.
• Predictive analytics based on variables that contain awareness of proximity through relationships
– Predict risk through associations to keep step with emerging fraud schemes.
– Measure the predictive nature within networks of personal injury claims, suspicious mortgage transactions and potential bust out activities.
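The proximity idea above can be sketched as a small graph traversal: score each identity by how close it sits to already-flagged identities. This is a toy Python illustration on hypothetical data (the graph, flags and weighting are assumptions, not the production scoring):

```python
from collections import deque

# Toy relationship graph: identity -> directly related identities
# (e.g., co-parties on the same deeds). Hypothetical data.
edges = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}
suspicious = {"D", "E"}  # identities already flagged on their own

def proximity_risk(start: str, max_hops: int = 2) -> int:
    """Count flagged identities reachable within max_hops,
    weighting closer ones more heavily (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    score = 0
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nbr in edges.get(node, ()):
            if nbr in seen:
                continue
            seen.add(nbr)
            if nbr in suspicious:
                score += max_hops - hops  # closer flags score higher
            frontier.append((nbr, hops + 1))
    return score

# B sits one hop from flagged D, so it outranks C,
# which has no flagged identities within two hops.
print(proximity_risk("B"), proximity_risk("C"))
```

The point of the sketch is the differentiator named on this slide: "B" has no suspicious activity of its own, yet its proximity to flagged identities surfaces it for investigation.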
Property Transaction Risk
Three core transaction variables are measured:
• Velocity
• Profit (or not)
• Buyer-to-seller relationship distance (potential for collusion)

[Diagram: Flipping, Profit, Collusion]
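Two of these variables, velocity and profit, can be illustrated on a single parcel's transfer history. A minimal Python sketch over hypothetical deed records (the thresholds and data are illustrative assumptions):

```python
from datetime import date

# Hypothetical deed transfers for one parcel, oldest first:
# (sale_date, price, buyer, seller)
sales = [
    (date(2010, 1, 10), 100_000, "B1", "S0"),
    (date(2010, 2, 25), 180_000, "B2", "B1"),  # resold in 46 days at +80%
    (date(2012, 6, 1),  150_000, "B3", "B2"),
]

def transaction_risk(sales, max_days=90, min_gain=0.3):
    """Flag back-to-back transfers that look like flips:
    short holding period (velocity) plus outsized profit."""
    flags = []
    for prev, cur in zip(sales, sales[1:]):
        held = (cur[0] - prev[0]).days
        gain = (cur[1] - prev[1]) / prev[1]
        if held <= max_days and gain >= min_gain:
            flags.append((cur[0], held, round(gain, 2)))
    return flags

print(transaction_risk(sales))  # only the rapid, high-profit resale is flagged
```

The third variable, buyer-to-seller relationship distance, would come from the relationship graph sketched earlier; a short distance between "B1" and "B2" here would raise the collusion score of the flagged transfer.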
Large Scale Suspicious Cluster Ranking
Overview: a pipeline over ±700 million deeds, with public-data relationships derived from a ±50-terabyte database.
• Data Factory: clean
• Chronological analysis of all property sales
• Historical property sales indicators and counts
• Person / network level indicators and counts
• Collusion graph analytics
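The cluster-ranking step can be sketched with a classic union-find: group parties that appear on the same deeds into clusters, then rank clusters by their count of flagged transactions. A toy Python illustration with hypothetical parties and deals (not the production graph analytics):

```python
# Minimal sketch of suspicious-cluster ranking: union-find groups
# parties connected through shared deeds, then clusters are ranked
# by flagged-deal count. All names and deals are hypothetical.

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# (buyer, seller, flagged_as_suspicious)
deals = [
    ("A", "B", True),
    ("B", "C", True),
    ("D", "E", False),
    ("E", "F", True),
]

for buyer, seller, _ in deals:
    union(buyer, seller)

# Count suspicious deals per cluster and rank clusters.
counts = {}
for buyer, seller, flagged in deals:
    root = find(buyer)
    counts[root] = counts.get(root, 0) + (1 if flagged else 0)

ranking = sorted(counts.values(), reverse=True)
print(ranking)  # the A-B-C cluster outranks D-E-F
```

At the scale on this slide (±700 million deeds), the same grouping runs as a distributed join/sort on Thor rather than an in-memory dictionary, but the ranking logic is the same shape.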
Isolated risk of Mortgage Fraud? Lone Individuals vs. Organized Group
Rank the nature, connectedness and proximity of suspicious property transactions for every identity in the U.S.
• Property History Risk
– Chronological flow of transactions for a property
– Collusion to artificially inflate and strip equity
– Collusion to artificially deflate
– Flipping
– Short-sale flipping (flopping)
• Seller and Buyer Risk
– Seller and buyer property transaction history
– Seller and buyer cluster variables identifying:
– Equity stripping schemes
– Foreclosure-generating clusters
– Flipping schemes (short-sale flipping)
– Straw buyer recruiting
Example: Suspicious Equity Stripping Cluster
Results
Large-scale measurement of influencers strategically placed to potentially direct suspicious transactions.
• All BIG DATA on one supercomputer, measuring over a decade of property transfers nationwide.
• BIG DATA products to turn other BIG DATA into compelling intelligence.
• Large-scale graph analytics allow for identifying known unknowns.
• Florida Proof of Concept
– Highest-ranked influencers identified known ringleaders in flipping and equity stripping schemes.
– Ringleaders are typically not connected directly to suspicious transactions.
– Known ringleaders were not the highest ranking.
• Clusters with high levels of potential collusion.
• Clusters offloading property, generating defaults.
• Agile framework able to keep step with emerging schemes in real estate.
BIG DATA Insights on Complex Real Estate Behavior: Deeds and Flipping
Total Sales vs. Flipping
Contrast sales with flipping and potential collusion:
• Sales decline post-2003.
• The percentage of flipping and potential collusion keeps increasing in spite of declining sales.
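The trend is clearer as a share than as raw counts. A tiny arithmetic illustration with hypothetical yearly figures (the numbers below are invented for the example, not the slide's actual data):

```python
# Hypothetical yearly counts: total sales fall after 2003
# while the flip share keeps rising.
total_sales = {2003: 1000, 2005: 800, 2007: 600}
flip_sales  = {2003: 30,   2005: 40,  2007: 45}

flip_share = {yr: round(100 * flip_sales[yr] / total_sales[yr], 1)
              for yr in total_sales}
print(flip_share)  # {2003: 3.0, 2005: 5.0, 2007: 7.5}
```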
Emerging Fraud Trends – Short-Sale Flipping
Appendix
• Started in 1973 as Mead Data Central and launched the Lexis® service, which pioneered online legal research by allowing attorneys to search a case-law database from their firms via a private telecommunications network. Added more data sources and services over the next thirty years, including public records.
• Acquired by Reed Elsevier in 1994. Reed Elsevier is a world-leading provider of professional information and workflow solutions in the science, medical, legal, risk management and business sectors. Valued at $12 billion in 2010 (NYSE: ENL; NYSE: RUK); 34,000 employees.
• Today, LexisNexis has billions of searchable documents and records from more than 45,000 information sources. Headquarters: New York, NY (Legal and Professional) and Alpharetta, GA (Risk Solutions). Global reach: customers in more than 100 countries with about 15,000 global employees. Revenue was $6 billion in 2010.
• LexisNexis® Risk Solutions is a leader in providing essential information that helps companies across all industries and government predict, assess and manage risk. Formed in 2000, the business unit grew via organic growth and four acquisitions (RiskWise, Dolan, Seisint, ChoicePoint). The core capabilities of the business unit are data, linking and data analytics for customers in enterprise organizations (financial services, insurance carriers, government, law enforcement).
LexisNexis® Over 35 Years of Big Data Experience
Big Data is in the DNA of LexisNexis Risk Solutions
[Diagram: core capabilities]
• Public record and proprietary data on consumers and businesses
• Linking: advanced technology which matches and links files across disparate data sources
• Data Analytics: unique and proven analytic tools based on all data sources
• Advanced Technology: processing power to allow for complex matching, scoring and processing in real time, at the point of need, for Big Data
Big Data Examples
Over 4 petabytes of content (4 thousand terabytes)
• 34 billion records
• 45,000 sources
• 800,000 records added daily
• 4.2 billion names and addresses
• 585 million identities
• 739 million business contacts
• 3.5 billion documents
• Adding 2 million documents per day
• Processing over 100 million documents per day
• High Performance Computing Cluster Platform (HPCC) enables data integration on a scale not previously available and real-time answers to millions of users. Built for big data and proven for 10 years with enterprise customers.
• Offers a single architecture, two data platforms (query and refinery) and a consistent data-intensive programming language (ECL)
• ECL parallel programming language optimized for business-differentiating, data-intensive applications
HPCC Systems Built for Big Data, Proven for 10 Years with Enterprise Customers
Big Data · Open Source Components

INDUSTRY SOLUTIONS
• Industries: Insurance, Financial Services, Cyber Security, Government, Health Care, Retail, Telecommunications, Transportation & Logistics, Online Reservations
• Solutions: Customer Data Integration, Data Fusion, Fraud Detection and Prevention, Know Your Customer, Master Data Management, Weblog Analysis
[Chart: query time vs. TB of storage, from 50 to 300 TB; query time rises only from 10 sec to 13 sec]
Scalability: Little Degradation in Performance
Scalability
• Scales to support 1000+TB (up to petabytes) of data
• Purposely built system to do massive I/O
• Rapidly performs complex queries on structured and unstructured data to link to a variety of data sources
• Suitable for massive joins/merges, beyond the limits of relational DBs
• Scale increases with the addition of low-cost, commodity servers
HPCC – in production
• Current production systems range from 20 to 2000 nodes
• Currently supports over 150,000 customers, millions of end users
• Currently handling over 20 million transactions per day for our online and batch products, with innovation leading to better results
Complex query example (see chart above):
• Transaction latencies increase logarithmically while data sizes grow linearly