usama fayyad, barclays, presentation at the chief data & analytics officer forum, singapore

61
The Rapidly Evolving Big Data Landscape: Implications & Challenges for Enterprise Data Analytics Usama Fayyad, Ph.D. Group Chief Data Officer (departing) Barclays Twitter: @usamaf Keynote: Chief Data & Analytics Officer Forum Singapore- July 27, 2016

Upload: corinium-global

Post on 20-Jan-2017

62 views

Category:

Data & Analytics


0 download

TRANSCRIPT

PowerPoint Print - PowerPoint 2007

The Rapidly Evolving Big Data Landscape:Implications & Challenges for Enterprise Data Analytics

Usama Fayyad, Ph.D. Group Chief Data Officer (departing) BarclaysTwitter: @usamaf

Keynote: Chief Data & Analytics Officer Forum Singapore- July 27, 2016

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 OutlineBig Data all around usData Governance Some of the issues in BigDataCase studies Context and sentiment analysisA case-study on IOT4 use cases at BarclaysSummary and conclusions

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 DATA: the key to restoring the lost customer intimacy in the digital banking era

100 years agoFront office was intimatePersonal knowledgeDirect interactionsStaff knows all that is happening with client and familyPersonalized service automaticNowBack office was simpleEasy to understand riskEasy to score and set limits by intuitionKYC trivially easy and naturalControls straightforward

Front office has no knowledge or intimacyMore complex product lineData silos and high latenciesNo unified view of the customerPersonalization a challengePoor understanding of customerBack office is complexDifficult to scale because tech did not evolve No culture of automationRisk and Finance expensive hard to manage because the data is a mess and hard to accessControls a challengeFront OfficeBack Office

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016

Why data is important: Every customer interaction in an opportunity to capture data, learn & actAn instant car loan product offering is displayed in the appConnie logs into BMB to check her balance. From browsing behaviour we know she seeks a car loanCRM data is used to pre-calculate Connies borrowing limit for a car loanUsing internal/external data sources, predictive models, identify cross-sell opportunitiesConnie is offered a competitively priced bespoke offer for car servicing / MOT

Connies journey is enhanced based on previous multivariate testing results

User experience is continuously, iteratively improved by capturing user interaction in real-time every sessionConnie has a personalised journey based on calculated limit for the car loan amount Data Capture/Opportunity to learnActions Customer Interaction Event (branch, telephony, digital, mobile, sales)

Millions of customer interactions per day

Customer InteractionCRMPredictive AnalyticsMultivariate TestingTargeted offers during browsingProduct DiscoveryCross-sell Measure feedback per session

Every interaction is an opportunity to capture data, to learn and to act

Big Data platform

Imagine we could move back to this model 100 years ago on demand, consumable, understandable information to build intimate relationships & an understanding of the customer

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 What matters in the age of analytics?

1. Being able to exploit all the data that is available Not just what you have available What you can acquire and use to enhance your actions

2. Proliferating analytics throughout the organizationMake every part of your business smarterActions and not just insights

3. Driving significant business value Embedding analytics into every area of your business can significantly drive top line revenues and/or bottom line cost efficiencies

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 The DATA situation Expectation vs RealityBusiness expectationReliableAffordableTimelyAccurateComprehensive

UnifiedAccessibleEasy to understandEasy to embed into business actionsRealityUnreliableExpensiveBatch and slowQuestionable QualityIncomplete & No unstructured DataFragmentedDifficult to getConfusingUnusableBigData is a new generation of Data Technology that promises to make it easier to address the issues rapidly and affordably

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Data Fusion: Can we bring all data together for analysis & action?

Customer InteractionsCentral Data Fusion EngineIngesting, persisting, processing and servicing in Real-time.

Analysis

Transactions

Social Data

Trade

Application Logs

Network Traffic

Risk

Marketing

Financial Crime

Fraud

Cyber Security

Enabled by Hadoop

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016

Big Data: is a mix of structured, semi-structured, and unstructured dataTypically breaks barriers for traditional RDB storageTypically breaks limits of indexing by rowsTypically requires intensive pre-processing before each query to extract some structure usually using Map-Reduce type operations

Above leads to messy situations with no standard recipes or architecture: hence the need for data scientists Conduct Data Expeditions Discovery and learning on the spotWhy Big Data?A new term, with associated Data Scientist positionsA Data Scientist is someone whoKnows a lot more software engineering than Statisticians& Knows a lot more Statistics than software engineers

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016

I do not like volume metrics, but they indicate appetiteWide-ranging claims mostly bout data in the worldWe look at Enterprise Data Where companies pay money for storage as a good metric for real data appetiteHow Much Data? How Much of it is Structured?

60,000 Petabytes = 60 million Terabytes = 60 Trillion Gigabytes = 60 Exabytes

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 The 4-Vs of Big Data

Big Data is Characterized by the 3-Vs:

Volume: larger than normal challenging to load/processExpensive to do ETLExpensive to figure out how to index and retrieveMultiple dimensions that are keyVelocity: Rate of arrival poses real-time constraints on what are typically batch ETL operationsIf you fall behind catching up is extremely expensive (replicate very expensive systems)Must keep up with rate and service queries on-the-flyVariety: Mix of data types and varying degrees of structureNon-standard schemaLots of BLOBs and CLOBsDB queries dont know what to do with semi-structured and unstructured data

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016

How Classic Data Explodes: really big

age 32, MaleLives in LondonLawyer

Applied for a Barclays Personal Loan

Applied for a Barclaycard

Opened Savings Account

Searched for social medias and forums re mortgages products Searched for low Mortgage rate

Used new iPhone, iPad, downloaded Banking apps

Activated offer

Downloaded Rightmove app

Social Graph (FB)

Likes & friends likes

Professional netwk- reputation

Web searches on Banking Services and Products

MetaData on everything

Blogs, publications, news, local papers, banking, mortgages providers, credit cards

CIDAgeaddressjob

Gender

#Invited keynote talk Copyright Usama Fayyad 2015

The Distinction between Classic Data and Big Data is fast disappearing

Most real data sets nowadays come with a serious mix of semi-structured and unstructured components:ImagesVideoText descriptions and news, blogs, etcUser and customer commentaryReactions on social media: e.g. Twitter is a mix of data anyway

Using standard transforms, entity extraction, and new generation tools to transform unstructured raw data into semi-structured analyzable data

#Invited keynote talk Copyright Usama Fayyad 2015

First Principles for Data Governance & Architecture

13

#Invited keynote talk Copyright Usama Fayyad 2015

13

Reality CheckSo what should users of analytics in Big Data world do?

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Load your Data into a Data Lake !Your life will be so much easier as you can now do Data Acrobatics & other amazing data feats

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 So What About this Data Lake?What happens to this Data Lake in next weeks? Months? And years?

What is really in this store?Who defined the attributes?How do various pieces of data relate to each other?How is change in data sources tracked?What happens when business entities change?

JuiceColaMilk CreamToothpaste Soap 1 23 4 76 5 WS N 151012205010

Enterprise warehouse

Metadata

StarschemaDataMart

DataMart

DataMart

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 The Data Lake according to Waterline

We loaded the Data!CongratulationsNow What?

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Reality check: Is this problem solvable?

What does it take to make a Data Lake Sustainable?

Is it possible to govern such a thing?How do people/groups contribute to this in a rational way?What is the representationWhat is the operating model?

Can you think of an example functional healthy Data Lake that lasts for a while?

Answer?

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 From a Data Lake to Amazon Browser

Amazon Simple Search

Amazon gives you facets

Product details

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Reality check: So where do analysts & Data Scientists spend all their time?

Prepared view of DataManage data viewsManage accessData cleaningData aggregationsManaging the Data sources and Data feedsAttribute & Data changesNormalization and RegularizationFeature ExtractionData lineageETLMetadata ManagementData SelectionEntity ExtractionLets Mine the Data

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Reality check: So what do Technology people worry about these days?To Hadoop or not to Hadoop?

When to use techniques requiring Map-Reduce and grid computing?

Historically organizations tried to use Map-Reduce for everything to do with Big DataThis is actually very inefficient and often irrationalCertain operations require specialized storageUpdating segment memberships over large numbers of usersDefining new segments on user or usage data

Today, Map-Reduce is not really in use and many in-memory and other alternatives (SPARK, Lambda Achitecture) are more typical

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Drivers of Hadoop in large enterprises

Cost of storage

Fastest growing demand is more storage

Data in Data Warehouses have traditionally required expensive storage technology:

$20K to $50k per terabyte per year cost of premium storage

$1K per terabyte much lower per year cost of Hadoop on commodity storage

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Hadoop Stack

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016

Big Data Landscape

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 http://blogs.the451group.com/information_management/files/2011/04/Figures-Aslett_web.jpg25

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 http://blogs.the451group.com/information_management/files/2011/04/Figures-Aslett_web.jpg26

Data Platforms Landscape Map

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Reality check: If storage is biggest driver of Hadoop adoption, what is next biggest economic driver?

ETLExtract Transform and LoadReplaces expensive licensesMuch higher performance with lower infrastructure costs (processors, memory)Flexibility in changing schema and representationFlexibility on taking on unstructured and semi-structured dataPlus suite of really cool tools

From ETL ELT

Extract Load - Transformcomputation comes to data, data does not have to leave for processing

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016

Avoiding the Apps Database Induced Mess

Front Office Operations

App

App

App

App

App

App

App

App

App

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016

Bring the Apps to the Data rather than the reverseReal Time Data FeedsBatch Load Data Feeds

Front Office Operations

App

App

App

App

App

App

App

App

AppData Collection / Capture Platform(leveraging BigData and open source platforms)

Data Instrumentation Standards and Platform

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Overview talk Copyright Usama Fayyad 2016 Case Studies:1. IOT Case Study 2. Case Studies at Barclays

31

The Connected CowThis part of the talk is borrowed from:

Joseph SiroshVP, Machine Learning Microsoft

I'm Joseph Sirosh, Corporate VP of Machine Learning at Microsoft. I wanted to tell you a story today about connected cows. Yes, you heard me right. There is such a thing as cloud connected cows. It is a story about all the sexy buzzwords of today - the internet of things, cloud services, data and analytics. It's a sexy story. But it's also about down to earth, simple innovations that revolutionize even some of the oldest industries in the world such as farming. And as you will see it's also about clever ideas and human ingenuity.

7/26/1632 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

video courtesy of Fujitsu and MTN

So these are the connected cows were talking about. Notice the pedometers on their legs. They are connected to WiFi. They are sending data into Microsofts Azure Cloud about the number of steps the cow is taking. Constantly. Every step from every cow in the dairy farm is being counted. It's Fujitsu's GyuHo - "Cow Step" - service in the cloud.

33

video courtesy of Fujitsu and MTN

So these are the connected cows were talking about. Notice the pedometers on their legs. They are connected to WiFi. They are sending data into Microsofts Azure Cloud about the number of steps the cow is taking. Constantly. Every step from every cow in the dairy farm is being counted. It's Fujitsu's GyuHo - "Cow Step" - service in the cloud.

34

Do cows need to take 10,000 steps a day?

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Do cows also need to take 10,000 steps a day?

35

Every company is a Bigdata company

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 But before I give you the answer, let's take a step back. These days, every company is a data company, even the ones you least expect. Even dairy farms. Farms have all the constraints of a typical business. Their output is milk and beef. They need to maximize that output under the constraint of the fixed assets they have - the cow herd, the pasture, and the facilities. They need to reduce loss due to disease, and maximize the efficiency of farm labor, which btw, turns out to be expensive.

36

What a dairy farmer can do

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016

37

Improve cattle production by accurate detection of estrusDetect health issues early and prevent loss

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Think of Joe the dairy farmer. What can he do? Here are a couple of things Joe can do. First, he can prevent loss of animals due to disease - detect early, vaccinate and so on. Second, to maximize output, he can maximize calf production by accurate detection of estrus. If you remember high school biology, estrus is the scientific word for when an animal goes into heat and is ready to mate. Being in the mood, so to speak. When the magic is ready to happen. This turns out to be very leveraged as you'll see. But as with all things on this subject it turns out to be harder than it looks. What is Joe to do to detect when the time is right for hundreds of cows? It's a real heat in the haystack problem.38

70%IMPROVEMENTUP TO

Effect of estrus detection rate on increasing pregnancy rateCONCEPTION RATEESTRUS DETECTION RATEPREGNANCY RATE70%55%39%70%65%46%70%75%53%70%85%60%70%95%67%

Source: Detection of Standing Estrus in Cattle FS921B George Perry, Beef Reproduction and Management Specialist Animal and Range Sciences Department

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 To see how important accurate estrus detection is, let's look at data from a research paper. These days, with Artificial Insemination, conception rate is typically at 70%. But only if you detect estrus accurately does it work. When you multiply the two probabilities together, you see the effect - at 55% estrus detection rate, which by normal methods is good, you get a 39% probability of calf production. At 95% detection rate, you get 67%, almost 70% improvement over baseline.

39

But this is hard

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 But can you detect at 95% accuracy? It turns out to be hard.

40

Estrus lasts only for 12-18 hours every 21 daysOccurs mostly between hours of 10 pm and 8 am

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Estrous lasts only 12-18 hours and like in the case of many mammals it turns out to be mostly between 10pm and 8am. When Joe and his farm hands have other things to do as well. Remember - he gets up at 5am in the morning, and is working his farm, feeding, cleaning, milking, and the hundreds of other tasks he has to do all day.

41

How can farmers tell when the time is right for hundreds of cows?

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Could you use technology to monitor the dairy? Get a heat map?

42

AI finally meets AI

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Here's where human ingenuity steps in. A farmer in Japan asked our partner Fujitsu for help with this problem. Fujitsu consulted a university researcher. Based on some ingenious research that had already been published by then, they came up with a clever detection system. Something that was indeed 95% accurate for the detection of estrus. It's the hottest system for detecting heat.

43

SENSOR

COWSHEDDIPOLE ANTENNA

3G CARDROUTER

RECEIVER

COMMUNICATION

ANALYTICS

CLOUD CENTER

NETWORK

BASE STATION

HOUSE/OFFICEINTERNETPEDOMETER

GYUHO Cow Step Service

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 This is that system. The Fujitsu GyuHo ("Cow Step") SaaS service on Microsoft Azure. Step count data from the pedometers on each cow are sent over WiFi and the internet to a service on Azure, which analyzes the data and produces alerts for the farmer.

44

16 Hours LaterProbableFemale

ProbableMale

Steps

12th 9:0012th 9:0012th 17:0013th 7:0013th 9:0013th 17:00Elapsed timeSource: http://www.fujitsu.com/global/about/resources/news/press-releases/2013/1015-01.html

Start of Estrus DetectedOptimum For Artificial Insemination

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 It turns out there is a simple secret to detecting when a cow is in heat. You see, at the onset of estrus there is a marked increase in the number of steps a cow takes. Right - when the cow feels hot, she starts pacing furiously. And you can detect this with a simple day over day comparison. This chart, courtesy of Fujitsu, illustrates this well. The X-axis is time of night. Y-axis is the number of steps taken. The lower line is the number of steps on a normal night by our sample cow. The upper line is the number of steps when the cow goes into estrus. The number of steps per hour just shot way up! It turns out that this signal is about 95% accurate for detection of heat, especially in a controlled environment such as the dairy farm. The optimal time for Artificial Insemination is 16 hours from the onset of estrus, for maximum conception rate. That is when AI meets AI.

45

ANIMALTYPE OF ESTRUS CYCLELENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUSTIME OF OVULATIONCATTLEPolyestrous19-23Average:216-30 hoursAverage: 18 hours12 hours after end of estrusGOATSeasonal polyestrous in Fall12-24Average:201-4 daysAverage: 39 hours30-36 hours after start of estrusSHEEPSeasonal polyestrous in Fall14-20Average:1720-42 hoursAverage: 30 hoursAt or near end of estrusHORSESeasonal polyestrous in Spring10-37Average:212-6 daysAverage: 4 days24-48 hours before the start of estrus to 24 hours after the end of estrusSWINEPolyestrous 18-23Average:211-2 daysAverage: 36 hours8-12 hours before the end of estrus or 37-40 hours after start of estrus

Variability in estrus cycles of farm animals

ANIMALCATTLEHORSELENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUS19-23Average:2119-23Average:2110-37Average:212-6 daysAverage: 4 days

ANIMALCATTLEHORSELENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUS19-23Average:216-30 hoursAverage: 18 hours10-37Average:212-6 daysAverage: 4 days

HORSE10-37Average:212-6 daysAverage: 4 days

19-23Average:216-30 hoursAverage: 18 hoursLENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUSSource: Handbook of Livestock Management, Richard Battaglia

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 It turns out that all types of animal breeding are needle in a haystack problems. The duration of estrus is a short period in a fairly variable estrous cycle. We're only starting to scratch the surface of what might be possible with real time monitors and data. And we haven't even started to talk about the multi-billion dollar human fertility industry. Maybe you could learn more from your Fitbit than you thought you could.

46

Source: FujitsuSTOCK FARMER NAMEBREEDING NUMBERSBEFORE GYUHO INTRODUCTION

8 MONTHSAFTER GYUHO INTRODUCTION

8 MONTHSSHORTENED DAYSANNUAL PRODUCTION INCREASED HEADSANNUAL INCREASEIN %INCOME INCREASE BY PRODUCTION INCREASEJPY350,000/ CATTLEAVERAGEDAYS-OPENINTRAPARTUM INTERVALDAYS-OPENINTRAPARTUM INTERVAL1ASTOCK FARMER 18078363633481584%2,800,0002BSTOCK FARMER 262743595934415125%4,200,0003CSTOCK FARMER 11096377663512687%2,800,0004DSTOCK FARMER 2025433940330963%2,100,0005ESTOCK FARMER 498783635133627408%14,000,0006FSTOCK FARMER 20115443974359803718%12,950,0007GSTOCK FARMER 53711540062347537514%26,250,0008HSTOCK FARMER 273217502663511518531%29,750,0009ISTOCK FARMER 17313742267352702917%10,150,00010JSTOCK FARMER2488336850335332410%8,400,00011KSTOCK FARMER 1511023876935433139%4,550,000AVERAGE258108393 61346473112%10,722,727

Economic effects before and after utilizing Fujitsu GYUHO system on 11 dairy farms

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 So here's the table of actual experimental results from 11 farms over an 8 month period, courtesy of Fujitsu. On an average, they found a 12% increase in calves produced, with some farms as high as 31%. And these benefits do not count the labor savings for monitoring estrus, and the potential benefits of health monitoring of the animals. 47

Case Studies:1. IOT Case Study 2. Case Studies at Barclays

48

Data Fusion: Can we bring all data together for analysis & action?

Customer InteractionsCentral Data Fusion EngineIngesting, persisting, processing and servicing in Real-time.

Analysis

Transactions

Social Data

Trade

Application Logs

Network Traffic

Risk

Marketing

Financial Crime

Fraud

Cyber Security

DaaS

Use Case 1: Real-Time Notifications and Decisioning in Retail Banking

Fatafat ( ) in Hindi means FAST or INSTANTBuilt on Open Source PlatformUtilizes BigData for fast movement and processing over large data streamsAllows for decisions/prediction while data is moves

Barclays Africa Retail BankingHelp customers avoid overdraft chargesAllow customers to react in time to prevent overdraftBased on historical interactions, we PREDICT likelihood of overdraftReal-time SMS sent to customer to notify of impending event

Results? 60% response rate 80% favourability rating by customers as part of initial 50K pilotLaunched at mass end Feb 2016We took the Open Source approach See www.Kamanja.org for leveraging the approach + contributing

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Real-Time Notifications and Decisioning in Retail Banking

Results? 60% response rate 80% favourability rating by customers as part of initial 50K user pilotLaunched at mass end Feb 2016

See: http://bit.ly/1VbpaZD for press coverageWe took the Open Source approach See www.Kamanja.org for leveraging the approach + contributing

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Fast Results in Multiple Application at Barclays Africa1. Predictive Overdraft Alerts> 65% response rateConnection: 10x increase in responsivenessFeedback: +80 NPS Score

2. Influencing Customers to Move to Digital Channel10X increase in Mobile App Usage10% increase in # Transactions on DigitalEngagement: 1 in 2 more likely to use digital

3. Credit Limit Increase PersonalizationUptake: 2X increaseConnection: 30% increase via voiceValue: 10% average increase in CLI offered

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Data Exhaust is often very strategically valuable1. Financial Crime Monitoring/DetectionE.g. AML Monitoring: most transactions are goodWhere does this data go?Does FinCrime Data enrich customer understanding?

2. Fraud DetectionWhen a transaction is analyzed, and verification conductedWhere does this data go?Do fraud operations enhance my knowledge of my customer?

3. From KYC to UYCData on Know-Your-Customer is required by regulatorsWhere does this data go?Does it flow into CRM? Does it Enhance Understanding of Customer?

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Use Case 2: Why does a banks Treasury department need Big Data or Data as a Service?Treasury are major consumer of data from across the Barclays groupA large proportion of Treasurys change portfolio involves data sourcing activity - mining & refining from across Risk, Finance and Front Office origination of all of the firms business lines and legal entities.

Treasurys data infrastructure is complex and grows more so as each project reaches completionTreasury alone use and consume 2,000 different pieces of data and we presently source this from more than 100 separate systems, via 350 individual feedsThis complexity has an organisational costIt is estimated that on larger projects, data sourcing represents 25-50% project budgets. Mining and Refining is a throttle to our investment.This gets harder as we move through the Structural Reform agenda.

System LayerBig Data Platform (Hadoop - Scoop, Scala, Hive, Hue), SAS, Tableau, Solr, etcSingle Sanctions Payment ViewCentralised Watch List DepositorySingle Customer View Centralised SAR DepositoryEnabling a holistic view of Financial Crime Risk

Enhanced Financial Crime Intelligence capabilitiesData & Analytics Layer

Customers(Due Diligence, ID&V, Accounts, Risk Rating)Transactions(Wire Transfers, Checks, Trades)Payments (Cross-Border, Domestic, IBAN, Internet)

Data Detection/IntelligenceCourt OrdersFraud DataUse Case 3: Financial Crime Intelligence - Big Data in Action Statistical & Threshold-based alertingEarly warning & notificationNear repeat pattern analysisLoad forecasting and cyclical patternsBehavioral targetingAbility to handle billions of records Trending and intelligence dashboards Real time access to data information fabric integrates heterogeneous data and content repositories

Terrorism ActivityHuman & Drug TraffickingStandard Financial Crime Reports (SAR, STR, CTR)Sanctions CircumventionFinCEN & other Regulatory ReportsLaw Enforcement, Industry Forums Fraudulent transactionsHawalaMoney LaunderingE-Commerce FraudOnline Bank TheftInside Trading

Sanctions, ABC & AML Policies and Standard

Regulatory Agencies and Law Enforcement

Use Case 4: Big Data and IFRS9

Admin & Control FunctionsShared Data PlatformRisk & P&LFinancial ReportingFCCredit RiskMarket RiskPCIBWealthCorporateCardsRBB

Price TestingReporting

ReportingVaR Calc

CA, CL,HF, BB,Risk

Personal,Corporate, RiskTrades,Vals,Risk

Trades,Vals,RiskTrades,Vals,RiskReportingMCCalcRWA CalcReportingRe-RunExplainPnL PostingReference Data Sources

Market RiskTradesCredit RiskTrial BalanceInstrument StaticNetting & CollateralBook HierarchyAfrica

Trades,Vals,RiskIFRSIFRSReportingCalcsLoansvalidateAdjustThe Risk and Finance Big Data Platform (Mercury) designed to drive:Optimisation and Integrity across Risk, Finance & Treasury data and processesProcesses run off single set of standardised and consistent dataShared processes leveraging common platformsCompute/reporting comes to data avoiding dangerous extractsEnd to end process and control framework

Where does this leave us?

1. What matters in the age of analytics?. Being able to exploit all the data that is availableProliferating analytics throughout the organizationDriving significant business value

2. Where are we today?New Data Landscape via HadoopConfusion by marketers, analysts, and technical communityStruggling with basics of managing data because of the new flexibility

3. What are the real issues?Data management and governanceTalent for Data and Data Science is rare but criticalData and Analytics are specialisms that need management and know-how: not for generalists

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 The Real Issues Facing Data Scientists1. How do I get to the DataWhat data sets do I need? How do I get to them?What is the meaning of this data?Which version of the multitude of Data copies? All with small variants

2. Where is the Meta Data?Usually an afterthoughtOften insufficient, despite its criticality

3. How do I get the Data together?How can I get data from traditional RDBMS, BI systems, Logs, and Data Lakes?How much time do I need to access all these data stores?

4. How do I act on Data in Real-Time?Real-time streaming Real-time decisioningDeploy in operational systems

5. How do I get the Value fast?How do I get to results quicklyHow do I maximize productivity? -- So much Data, So little time

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Lessons LearnedReal economic drivers of Big Data not technologyLower cost, open source ETLFlexibility and avoidance of Data movement are key acceleratorsNot only unstructured data, but Apps come to Data insteadA game changer in Data ArchitectureA lot more data than qualified talentFinding talent in BigData is very difficult, Retaining talent is even harderData people need the critical mass to build effective teamsDrive data efforts by business need, not by technology prioritiesCredibility begets budget and business appetiteThe dangers of Data Chaos can be avoided Data Architecture and Data Governance are keys to avoiding the inevitable

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016 Concluding ThoughtsEvolving and confusing Data landscapeMany transitions are or will be failing before we understand itDo not confuse Technology issues with Privacy and Policy IssuesSeparate domains, and Policy/Law need a lot more attentionTrue Consumer Opt-in is the only sustainable approachRequires a true exchange of value to be sustainableRequires transparency and controlNew regulations are encouraging openness and consumer controlE.g. PSD II in EUIf solved appropriately, Technology can enable end-user controlMatters less WHERE Data is stored, what matters is WHO has the decryption keys/rightsBest for security, and best for privacy, best for business and ScienceAvoid the mess of the Consumer Internet/Advertising economy

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016

Usama Fayyad

https://uk.linkedin.com/in/ufayyad

Email: [email protected]

Assistant: [email protected]

Thank You! & Questions

Twitter @Usamaf

251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Invited keynote talk Copyright Usama Fayyad 2016