PowerPoint Print - PowerPoint 2007
From BigData to Data Science: Predictive Analytics in a Changing Data Landscape Usama Fayyad, Ph.D. Group Chief Data Officerand CIO Risk, Finance, & Treasury TechnologyBarclaysTwitter: @usamaf
Overviews for South AfricaJuly 15-16, 2015
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225OutlineBig Data all around us: the problems that Data Science is NOT addressingThe CDO role and Data AxiomsSome of the issues in BigDataCase studies Context and sentiment analysisA case-study on IOTYahoo! predictions at scaleSummary and conclusions
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Why should banks worry about data?
100 years agoSmallerMuch more localDeep understanding of customer needsDeep knowledge of customerNowDigital, decline of the branchGlobalNo face of the customerNew generation of millennials A big part of banking was knowing the customer intimately personal relationships making KYC, risk modelling simple
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Why data is important: Every customer interaction in an opportunity to capture data, learn & act
An instant car loan product offering is displayed in the appConnie logs into BMB to check her balance. From cookies and browsing behaviour we know she is looking for a car loanCRM data is used to pre-calculate Connies borrowing limit for a car loan
Based on multiple internal and external data sources and predictive models, we identify cross-sell opportunitiesConnie is offered a competitively priced bespoke offer for car servicing / MOT
Connies journey is enhanced based on previous multivariate testing results
User experience is continuously, iteratively improved by capturing user interaction in real-time during every sessionConnie has a personalised journey based on pre-calculated limit for the car loan amount Data Capture/Opportunity to learnActions Customer Interaction Event (branch, telephony, digital, mobile, sales)
Millions of customer interactions per day
Customer InteractionCRMPredictive AnalyticsMultivariate TestingTargeted offers during browsingProduct DiscoveryCross-sell Measure feedback per session
Every interaction is an opportunity to capture data, to learn and to act
Big Data platform
Imagine we could move back to this model 100 years ago on demand, consumable, understandable information to build intimate relationships & an understanding of the customer
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225What matters in the age of analytics?
1. Being able to exploit all the data that is available Not just what you have available What you can acquire and use to enhance your actions
2. Proliferating analytics throughout the organizationMake every part of your business smarterActions and not just insights
3. Driving significant business value Embedding analytics into every area of your business can significantly drive top line revenues and/or bottom line cost efficiencies
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Data Fusion: Can we bring all data together for analysis & action?
Customer and ClientInteractionsCentral Data Fusion EngineIngesting, persisting, processing and servicing in Real-time.
Analysis
Transactions
Social Data
Trade
Application Logs
Network Traffic
Risk
Marketing
Financial Crime
Fraud
Cyber Security
DaaS
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Big Data: is a mix of structured, semi-structured, and unstructured dataTypically breaks barriers for traditional RDB storageTypically breaks limits of indexing by rowsTypically requires intensive pre-processing before each query to extract some structure usually using Map-Reduce type operations
Above leads to messy situations with no standard recipes or architecture: hence the need for data scientists Conduct Data Expeditions Discovery and learning on the spotWhy Big Data?A new term, with associated Data Scientist positionsA Data Scientist is someone whoKnows a lot more software engineering than Statisticians& Knows a lot more Statistics than software engineers
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225The 4-Vs of Big Data
Big Data is Characterized by the 3-Vs:
Volume: larger than normal challenging to load/processExpensive to do ETLExpensive to figure out how to index and retrieveMultiple dimensions that are keyVelocity: Rate of arrival poses real-time constraints on what are typically batch ETL operationsIf you fall behind catching up is extremely expensive (replicate very expensive systems)Must keep up with rate and service queries on-the-flyVariety: Mix of data types and varying degrees of structureNon-standard schemaLots of BLOBs and CLOBsDB queries dont know what to do with semi-structured and unstructured data
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Male, age 32Lives in SFLawyer
Searched on from London last week
Searched on:Italian restaurantPalo Alto
Checks Yahoo! Mail daily via PC & Phone
Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k peopleSearched on:Hillary Clinton
Clicked on Sony Plasma TV SS ad Registration Campaign Behavior Unknown
Spends 10 hour/week On the internet
Purchased Da Vinci Code from Amazon
Classic Data: e.g. Yahoo! User DNA
#Invited talk #ODSC, Boston Copyright Usama Fayyad 2015
9
Male, age 32Lives in SFLawyer
Searched on from London last week
Searched on:Italian restaurantPalo Alto
Checks Yahoo! Mail daily via PC & Phone
Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k peopleSearched on:Hillary Clinton
Clicked on Sony Plasma TV SS ad
Spends 10 hour/week On the internet
Purchased Da Vinci Code from Amazon
How Data Explodes: really big
Social Graph (FB)
Likes & friends likes
Professional netwk- reputation
Web searches on this person, hobbies, work, location
MetaData on everything
Blogs, publications, news, local papers, job info, accidents
#Invited talk #ODSC, Boston Copyright Usama Fayyad 2015
10
The Distinction between Classic Data and Big Data is fast disappearing
Most real data sets nowadays come with a serious mix of semi-structured and unstructured components:ImagesVideoText descriptions and news, blogs, etcUser and customer commentaryReactions on social media: e.g. Twitter is a mix of data anyway
Using standard transforms, entity extraction, and new generation tools to transform unstructured raw data into semi-structured analyzable data
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Text Data: The Big Driver
We speak of big data and the Variety in 3-Vs
Reality: biggest driver of growth of Big Data has been text dataMost work on analysis of images and video data has really been reduced to analysis of surrounding text
Nowhere more so than on the internet
Map-Reduce popularized by Google to address the problem of processing large amounts of text data: Many operations with each being a simple operation but done at large scaleIndexing a full copy of the webFrequent re-indexing
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225A few words on: The Chief Data OfficerWhy are companies creating this position?
There is a fundamental realisation that Data needs to become a primary value driver at organizationsWe have lots of DataWe spend much on it: in technology and peopleWe are not realising the value we expect from it
A strong business need to create the CDO role:Traditional companies are not following, but adopting the model that actually works in other data-intensive industries
CDO has a seat at executive table: the voice of Data
Data done right is an essential element to unify large enterprises to unlock value from business synergies
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Fundamental Data Principles to Support AnalyticsUsamas Obvious Data Axioms
1. Data gains value exponentially when integrated and coalesced. When fragmented: dramatic value loss takes place;Increased costs; Reduced utility/integrity; Increased security risks
2. Fusing Data together from disparate/independent sources is difficult to achieve and impossible to maintainHence only viable approach is:Intercepting and documenting at the sourceFusing at the source Controlling lifecycle and flow
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Fundamental Data Principles to Support AnalyticsUsamas Obvious Data Axioms
3. Standardisation is essentialFor sustained ability to integrate data sources and hence growing value; For simplifying down-stream systems and appsFor enforcing discipline as a firm increases its data sources
4. Data governance and policy must be centralised Needs to be enforced strongly else we slip into chaos and a Babylon of terms/languagesAn Enterprise Data Architecture spanning structured and unstructured data
5. Recency Matters -> data streaming in modelling and scoringOften, accuracy of prediction drops quickly with time (e.g. consumer shopping)Value of alerts drop exponentially with timeAbility to trigger responses based on real-time scoring criticalStreaming, real-time model updates, real-time scoring
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Fundamental Data Principles to Support AnalyticsUsamas Obvious Data Axioms
6. Abstraction layer of data from Apps/platforms is a must Rapid renewal & modernization: the pace of change and development of technology are very rapidDesign for migration and infrastructure replacement via abstraction layers that remove tech dependenciesEncryption and Masking: Persisting unencrypted confidential and secret data (even within secure firewalls) is an invitation for problems and risks
7. Data is a primary competency and not a side-activity supporting other processesHence specialized skills and know-how are a mustGeneralists will create a hopeless messData is difficult: modelling, architecture, and design to support analytics
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Driving the need for integrated processes, data & architecture
Regulatory DemandData quality, completeness, lineage and aggregationCompany financial reporting presented at most granular level through various lenses: business units, legal entities, regional, sector level, Faster turnaround for increasingly complex ad hoc data requestsAdditional sophisticated stress tests delivered in a joined-up manner between Risk and FinanceEnhanced hybrid capital computation methodologies
Business DriversBetter and smarter controls
Capital precision
Better customer experience
Better colleagues experience
More cost effective
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Data Governance is BORING for most
Policy, governance & control
Business ownership & data stewardship
Definitions & metadataPermit to build(change governance)
Reporting & analytics
Data architecture
Data sourcing
Reconciliation
Data qualitymanagement
Documentation,tracking & audit
but it shouldnt be!
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Reality check: So what should users of analytics in Big Data world do?Load the Data into a Data Lake
Your life will be so much easier as you can now do Data Acrobatics & other amazing data feats
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225The Data Lake according to Waterline
We loaded the Data!CongratulationsNow What?
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225From a Data Lake to Amazon Browser
Amazon Simple Search
Amazon gives you facets
Product details
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Reality check: So where do analysts & Data Scientists spend all their time?
Prepared view of DataManage data viewsManage accessData cleaningData aggregationsManaging the Data sources and Data feedsAttribute & Data changesNormalization and RegularizationFeature ExtractionData lineageETLMetadata ManagementData SelectionEntity ExtractionLets Mine the Data
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Reality check: So what do Technology people worry about these days?To Hadoop or not to Hadoop?
When to use techniques requiring Map-Reduce and grid computing?
Typically organizations try to use Map-Reduce for everything to do with Big DataThis is actually very inefficient and often irrationalCertain operations require specialized storageUpdating segment memberships over large numbers of usersDefining new segments on user or usage data
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Drivers of Hadoop in large enterprises
Cost of storage
Fastest growing demand is more storage
Data in Data Warehouses have traditionally required expensive storage technology:
$20K to $50k per terabyte per year cost of premium storage
$2K per terabyte much lower per year cost of Hadoop on commodity storage
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Analysis & programming software
PIGHIPI
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Hadoop Stack
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Big Data Landscape
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225http://blogs.the451group.com/information_management/files/2011/04/Figures-Aslett_web.jpg28
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225http://blogs.the451group.com/information_management/files/2011/04/Figures-Aslett_web.jpg29
Data Platforms Landscape Map
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Reality check: If storage is biggest driver of Hadoop adoption, what is next biggest?
ETL
Replaces expensive licensesMuch higher performance with lower infrastructure costs (processors, memory)Flexibility in changing schema and representationFlexibility on taking on unstructured and semi-structured dataPlus suite of really cool tools
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Turning the 3-Vs of Big Data into Value
Understand context and contentWhat are appropriate actions?Is it Ok to associate my brand with this content?Is content sad?, happy?, serious?, informative?
Understand community sentimentWhat is the emotion?Is it negative or positive?What is the health of my brand online?
Understand customer intent?What is each individual trying to achieve?Can we predict what to do next?Critical in cross-sell, personalization, monetization, advertising, etc
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Many business uses of predictive analytics
Analytic techniqueUses in businessMarketing and salesIdentify potential customers; establish the effectiveness of a campaign Understanding customer behaviormodel churn, affinities, propensities, Web analytics & metricsmodel user preferences from data, collaborative filtering, targeting, etc.Fraud detectionIdentify fraudulent transactionsCredit scoringcredit worthiness of a customer requesting a loan, secured and unsecured loansManufacturing process analysisIdentify the causes of manufacturing problemsPortfolio tradingoptimize a portfolio of financial instruments by maximizing returns & minimizing risksHealthcare Applicationfraud detection, cost optimization, detection of events like epidemics, etc...Insurancefraudulent claim detection, risk assessmentSecurity and Surveillanceintrusion detection, sensor data analysis, remote sensing, object/person detection, link analysis, etc...Application Log File AnalyticsUnderstanding application, network, event logs in IT
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Case Studies:1. Context Analysis (unstructured data) 2. IOT Case Study 3. Yahoo! Predictive Modeling
Reality CheckSo who is the company we think is best at handling Big Data?
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Case Study: Biggest Big Data in advertising?Understanding context for ads
What Ad would you place here?
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Case Study: Biggest Big Data in advertising?Understanding context for ads
Damaging to Brand?
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225The Display Ads Challenge Today
What Ad would you place here?
#Invited talk #ODSC, Boston Copyright Usama Fayyad 2015
38
NetSeer: Solving accuracy issuesAmbiguity, waste, brand, safety
Why did Google Serve this Ad?39
this is how NetSeer actually sees this content
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225NetSeer: How it works40
high MPGfordlow emissionfuel efficiencyECONOMY CARSeconomy vehiclesmicroscope lensesreading glassesautofocusbifocal refractionVISION TOOLS eye chartfocus groupsA/B testingconsumer studysurveyingblind studyanalyticsMARKET RESEARCH~ ~ ~ ~ ~~ ~ ~ ~ ~
~ ~ ~ ~
~ ~ ~ ~ ~
WEBSITE.COM~ ~ ~ ~ ~
~ ~ ~ ~
~ ~ ~ ~ ~~ ~ ~ ~ ~
~ ~ ~ ~
electric vehicles
service recordsafety rating
focus
DISCERNS AND MONETIZES HUMAN INTENT+Identifies Concepts expressed on a page+Disambiguates language+Builds increasingly rich profile over time52M2.3BCONCEPTSRELATIONSHIPS BETWEEN CONCEPTS
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225NetSeer: Intent for DisplayPositive vs negative content re a particular topic
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Problem: Hard to understand user intent
Contextual Ad served by GoogleWhat NetSeer Sees:
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Case Studies:1. Context Analysis (unstructured data) 2. IOT Case Study 3. Yahoo! Predictive Modeling
The Connected CowJoseph SiroshVP, Machine Learning Microsoft
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225I'm Joseph Sirosh, Corporate VP of Machine Learning at Microsoft. I wanted to tell you a story today about connected cows. Yes, you heard me right. There is such a thing as cloud connected cows. It is a story about all the sexy buzzwords of today - the internet of things, cloud services, data and analytics. It's a sexy story. But it's also about down to earth, simple innovations that revolutionize even some of the oldest industries in the world such as farming. And as you will see it's also about clever ideas and human ingenuity.
7/15/1544 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
video courtesy of Fujitsu and MTN
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225So these are the connected cows were talking about. Notice the pedometers on their legs. They are connected to WiFi. They are sending data into Microsofts Azure Cloud about the number of steps the cow is taking. Constantly. Every step from every cow in the dairy farm is being counted. It's Fujitsu's GyuHo - "Cow Step" - service in the cloud.
45
Do cows need to take 10,000 steps a day?
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Do cows also need to take 10,000 steps a day?
46
Every company is a Bigdata company
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225But before I give you the answer, let's take a step back. These days, every company is a data company, even the ones you least expect. Even dairy farms. Farms have all the constraints of a typical business. Their output is milk and beef. They need to maximize that output under the constraint of the fixed assets they have - the cow herd, the pasture, and the facilities. They need to reduce loss due to disease, and maximize the efficiency of farm labor, which btw, turns out to be expensive.
47
What a dairy farmer can do
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
48
Improve cattle production by accurate detection of estrusDetect health issues early and prevent loss
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Think of Joe the dairy farmer. What can he do? Here are a couple of things Joe can do. First, he can prevent loss of animals due to disease - detect early, vaccinate and so on. Second, to maximize output, he can maximize calf production by accurate detection of estrus. If you remember high school biology, estrus is the scientific word for when an animal goes into heat and is ready to mate. Being in the mood, so to speak. When the magic is ready to happen. This turns out to be very leveraged as you'll see. But as with all things on this subject it turns out to be harder than it looks. What is Joe to do to detect when the time is right for hundreds of cows? It's a real heat in the haystack problem.49
70%IMPROVEMENTUP TO
Effect of estrus detection rate on increasing pregnancy rateCONCEPTION RATEESTRUS DETECTION RATEPREGNANCY RATE70%55%39%70%65%46%70%75%53%70%85%60%70%95%67%
Source: Detection of Standing Estrus in Cattle FS921B George Perry, Beef Reproduction and Management Specialist Animal and Range Sciences Department
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225To see how important accurate estrus detection is, let's look at data from a research paper. These days, with Artificial Insemination, conception rate is typically at 70%. But only if you detect estrus accurately does it work. When you multiply the two probabilities together, you see the effect - at 55% estrus detection rate, which by normal methods is good, you get a 39% probability of calf production. At 95% detection rate, you get 67%, almost 70% improvement over baseline.
50
But this is hard
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225But can you detect at 95% accuracy? It turns out to be hard.
51
Estrus lasts only for 12-18 hours every 21 daysOccurs mostly between hours of 10 pm and 8 am
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Estrous lasts only 12-18 hours and like in the case of many mammals it turns out to be mostly between 10pm and 8am. When Joe and his farm hands have other things to do as well. Remember - he gets up at 5am in the morning, and is working his farm, feeding, cleaning, milking, and the hundreds of other tasks he has to do all day.
52
How can farmers tell when the time is right for hundreds of cows?
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Could you use technology to monitor the dairy? Get a heat map?
53
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Here's where human ingenuity steps in. A farmer in Japan asked our partner Fujitsu for help with this problem. Fujitsu consulted a university researcher. Based on some ingenious research that had already been published by then, they came up with a clever detection system. Something that was indeed 95% accurate for the detection of estrus. It's the hottest system for detecting heat.
54
SENSOR
COWSHEDDIPOLE ANTENNA
3G CARDROUTER
RECEIVER
COMMUNICATION
ANALYTICS
CLOUD CENTER
NETWORK
BASE STATION
HOUSE/OFFICEINTERNETPEDOMETER
GYUHO Cow Step Service
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225This is that system. The Fujitsu GyuHo ("Cow Step") SaaS service on Microsoft Azure. Step count data from the pedometers on each cow are sent over WiFi and the internet to a service on Azure, which analyzes the data and produces alerts for the farmer.
55
16 Hours LaterProbableFemale
ProbableMale
Steps
12th 9:0012th 9:0012th 17:0013th 7:0013th 9:0013th 17:00Elapsed timeSource: http://www.fujitsu.com/global/about/resources/news/press-releases/2013/1015-01.html
Start of Estrus DetectedOptimum For Artificial Insemination
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225It turns out there is a simple secret to detecting when a cow is in heat. You see, at the onset of estrus there is a marked increase in the number of steps a cow takes. Right - when the cow feels hot, she starts pacing furiously. And you can detect this with a simple day over day comparison. This chart, courtesy of Fujitsu, illustrates this well. The X-axis is time of night. Y-axis is the number of steps taken. The lower line is the number of steps on a normal night by our sample cow. The upper line is the number of steps when the cow goes into estrus. The number of steps per hour just shot way up! It turns out that this signal is about 95% accurate for detection of heat, especially in a controlled environment such as the dairy farm. The optimal time for Artificial Insemination is 16 hours from the onset of estrus, for maximum conception rate. That is when AI meets AI.
56
ANIMALTYPE OF ESTRUS CYCLELENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUSTIME OF OVULATIONCATTLEPolyestrous19-23Average:216-30 hoursAverage: 18 hours12 hours after end of estrusGOATSeasonal polyestrous in Fall12-24Average:201-4 daysAverage: 39 hours30-36 hours after start of estrusSHEEPSeasonal polyestrous in Fall14-20Average:1720-42 hoursAverage: 30 hoursAt or near end of estrusHORSESeasonal polyestrous in Spring10-37Average:212-6 daysAverage: 4 days24-48 hours before the start of estrus to 24 hours after the end of estrusSWINEPolyestrous 18-23Average:211-2 daysAverage: 36 hours8-12 hours before the end of estrus or 37-40 hours after start of estrus
Variability in estrus cycles of farm animals
ANIMALCATTLEHORSELENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUS19-23Average:2119-23Average:2110-37Average:212-6 daysAverage: 4 days
ANIMALCATTLEHORSELENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUS19-23Average:216-30 hoursAverage: 18 hours10-37Average:212-6 daysAverage: 4 days
HORSE10-37Average:212-6 daysAverage: 4 days
19-23Average:216-30 hoursAverage: 18 hoursLENGTH OF ESTRUS CYCLE (DAYS)DURATION OF ESTRUSSource: Handbook of Livestock Management, Richard Battaglia
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225It turns out that all types of animal breeding are needle in a haystack problems. The duration of estrus is a short period in a fairly variable estrous cycle. We're only starting to scratch the surface of what might be possible with real time monitors and data. And we haven't even started to talk about the multi-billion dollar human fertility industry. Maybe you could learn more from your Fitbit than you thought you could.
57
Source: FujitsuSTOCK FARMER NAMEBREEDING NUMBERSBEFORE GYUHO INTRODUCTION
8 MONTHSAFTER GYUHO INTRODUCTION
8 MONTHSSHORTENED DAYSANNUAL PRODUCTION INCREASED HEADSANNUAL INCREASEIN %INCOME INCREASE BY PRODUCTION INCREASEJPY350,000/ CATTLEAVERAGEDAYS-OPENINTRAPARTUM INTERVALDAYS-OPENINTRAPARTUM INTERVAL1ASTOCK FARMER 18078363633481584%2,800,0002BSTOCK FARMER 262743595934415125%4,200,0003CSTOCK FARMER 11096377663512687%2,800,0004DSTOCK FARMER 2025433940330963%2,100,0005ESTOCK FARMER 498783635133627408%14,000,0006FSTOCK FARMER 20115443974359803718%12,950,0007GSTOCK FARMER 53711540062347537514%26,250,0008HSTOCK FARMER 273217502663511518531%29,750,0009ISTOCK FARMER 17313742267352702917%10,150,00010JSTOCK FARMER2488336850335332410%8,400,00011KSTOCK FARMER 1511023876935433139%4,550,000AVERAGE258108393 61346473112%10,722,727
Economic effects before and after utilizing Fujitsu GYUHO system on 11 dairy farms
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225So here's the table of actual experimental results from 11 farms over an 8 month period, courtesy of Fujitsu. On an average, they found a 12% increase in calves produced, with some farms as high as 31%. And these benefits do not count the labor savings for monitoring estrus, and the potential benefits of health monitoring of the animals. 58
Case Studies:1. Context Analysis (unstructured data) 2. IOT Case Study 3. Yahoo! Predictive Modeling
Yahoo! One of Largest Destinations on the Web80% of the U.S. Internet population uses Yahoo! Over 600 million users per month globally!
Global network of content, commerce, media, search and access products100+ properties including mail, TV, news, shopping, finance, autos, travel, games, movies, health, etc.25+ terabytes of data collected each dayRepresenting 1000s of cataloged consumer behaviors
More people visited Yahoo! in the past month than:
Use couponsVoteRecycleExercise regularlyHave children living at homeWear sunscreen regularlySources: Mediamark Research, Spring 2004 and comScore Media Metrix, February 2005.
Data is used to develop content, consumer, category and campaign insights for our key content partners and large advertisers
60With Yahoos critical mass, its no longer about tiny segments of consumers
Yahoo is in position to drive market share for Crestor
Yahoo! Big Data A league of its own
GRAND CHALLENGE PROBLEMS OF DATA PROCESSING TRAVEL, CREDIT CARD PROCESSING, STOCK EXCHANGE, RETAIL, INTERNETY! Data Challenge Exceeds others by 2 orders of magnitude
61
Behavioral Targeting (BT)
Search
Ad Clicks
Content
Search Clicks
BT
Targeting ads to consumers whose recent behaviors online indicate which product category is relevant to them
62Based on user behaviourAll actions categorised into taxonomiesE.g, searched ipod classified into Ent/Music and/or Technology/Consumer Electronics/Audio/MP3ClassificationEach country has their own personCulturally relevant news stories (e.g. flooding not just weather but UK news at the moment)Each product category needs to be treated differently flowers vs. MortgagesDifferent weights are given to each action across each of the productsRegression models with 50+ variables astro-physicistsReach vs. efficiency, they have been designed to reach the right peopleCurrently 114 available products in the UKMore in February
Male, age 32Lives in SFLawyer
Searched on from London last week
Searched on:Italian restaurantPalo Alto
Checks Yahoo! Mail daily via PC & Phone
Has 25 IM Buddies, Moderates 3 Y! Groups, and hosts a 360 page viewed by 10k peopleSearched on:Hillary Clinton
Clicked on Sony Plasma TV SS ad Registration Campaign Behavior Unknown
Spends 10 hour/week On the internet
Purchased Da Vinci Code from Amazon
Yahoo! User DNA
On a per consumer basis: maintain a behavioral/interests profile and profitability (user value and LTV) metrics
63
How it works | Network + Interests + ModellingAnalyze predictive patterns for purchase cycles in over 100 product categories
In each category, build models to describe behaviour most likely to lead to an ad response (i.e. click).
Score each user for fit with every categorydaily.
Target ads to users who get highest relevance scores in the targeting categories
Varying Product Purchase Cycles
Match Users to the Models
Rewarding Good Behaviour
Identify Most Relevant Users
64At Yahoo! weve developed models that we understand best describe behaviour in various product categories. We dynamically score users as they move around Yahoo! to identify the users most relevant for particular advertising messages. Simple process to understand (but rocket scientists beavering away in the backgroundno, really, one of them worked at NASA).
Recency Matters, So Does Intensity
Active nowand with feeling
65What makes Yahoo!s behavioural targeting model quite sophisticated is our built in measures of recency and intensity. A user has to demonstrate current interest in a product category as well as a measure of intensity of activity in that category for us to determine him as relevant to an advertiser.
Differentiation | Category specific modelling
timeintensity scoremodelled user
alternative behaviour 1alternative behaviour 2
timeintensity scoremodelled user
alternative behaviour 1alternative behaviour 2
Intense Click Zone
Example 1: Category AutomotiveExample 2: Category Travel/Last Minute
Different models allow us to weight and determine intensity and recency
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
Intense Click Zone
66Yahoo!s unique modelling capabilities weight and determine intensity and recency depending on the category. In each of the models, an ideal user is graphed, determining the appropriate threshold (border to the intense click zone). In an alternative behaviour, in this example a user consumes 5 pages, 2 search keywords, 1 search click and 1 ad click, the user still hits the threshold but a bit later in time. In the 2nd example, the same consumption leads to a different curve, meaning, the weight of the consumption differs from model to model, as does the threshold.
Differentiation | Category specific modelling
timeintensity scoremodelled user
alternative behaviour 2
Intense Click ZoneExample 1: Category Automotive
Different models allow us to weight and determine intensity and recency
with no further activity, decay takes effect
Alt Behaviour 1: 5 pages, 2 search keywords, 1 search click, 1 ad click
user is in the Intense Click Zone
alternative behaviour 1
67When the user no longer shows a relevant activity, the decay factor will come into effect, reducing the users score accordingly.
Automobile Purchase Intender ExampleA test ad-campaign with a major Euro automobile manufacturerDesigned a test that served the same ad creative to test and control groups on YahooSuccess metric: performing specific actions on Jaguar website
Test results: 900% conversion lift vs. control groupPurchase Intenders were 9 times more likely to configure a vehicle, request a price quote or locate a dealer than consumers in the control group~3x higher click through rates vs. control group
Mortgage Intender ExampleWe found:1,900,000 people looking for mortgage loans.
+122% CTR LiftMortgages Home Loans Refinancing Ditech
Financing section in Real EstateMortgage Loans area in FinanceReal Estate section in Yellow Pages+626% Conv LiftExample search terms qualified for this target:
Example Yahoo! Pages visited:
Source: Campaign Click thru Rate lift is determined by Yahoo! Internal research. Conversion is the number of qualified leads from clicks over number of impressions served. Audience size represents the audience within this behavioral interest category that has the highest propensity to engage with a brand or product and to click on an offer. Date: March 2006Results from a client campaign on Yahoo! NetworkExample: Mortgages
69As just an example, we ran a test with a mortgage lender (sorry, you know how secretive Financial Services companies can be). Even compared to our already very valuable Yahoo Finance audience we saw a more than doubling of click throughs and a seven-fold increase in conversions.
Experience summary at Yahoo!Dealing with one of the largest data sources (25 Terabyte per day)Behavioral Targeting business was grown from $20M to > $400M in 3 years of investment!Yahoo! Specific? -- BigData critical to operationsAd targeting creates huge valueRight teams to build technology (3 years of recruiting)Search is a BigData problem (but this has moved to mainstream)
Lessons LearnedA lot more data than qualified talentFinding talent in BigData is very difficultRetaining talent in BigData is even harderAt Yahoo! we created central group that drove huge value to companyData people need to feel like they have critical massMakes it easier to attract the right peopleMakes it easier to retainDrive data efforts by business need, not by technology prioritiesChief Data Officer role at Yahoo! now popular
Where does this leave us?
1. What matters in the age of analytics?. Being able to exploit all the data that is availableProliferating analytics throughout the organizationDriving significant business value
2. Where are we today?New Data Landscape via HadoopConfusion by marketers, analysts, and technical communityStruggling with basics of managing data because of the new flexibility
3. What are the real issues?Data management and governanceTalent for Data and Data Science is rare but criticalData and Analytics are specialisms that need management and know-how: not for generalists
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225The early days of mass auto production
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Todays Auto: It just works!
No need to understand what happens when you turn on ignitionVery complex inside, but all simplicity on the outside
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Data Science, AI, Algorithms
Data ScienceArtificial Intelligence / Machine Learning AlgorithmsData as a Service (DaaS)to provide our business units with better, faster, cheaper dataAlgorithms Create leading edge Algorithms for Automation via AI, Machine Learning, and leading edge data scienceAutomation Tech via AICoE for Intelligent Automation via AI, ML, case-based reasoning & PlanningSecure Cloud Data LabSecure Cloud data lab with open source stack to be able to leverage external datasets and work with the start-ups to innovate at scaleExperimentation evaluationData-fuelled evaluation of experimentation at scale, e.g. digital experiences empowered by dataData Solutions for Customers/ Clients Create a Lab for innovative solutions for customers and ClientsKEY ADVANCED CAPABILITIES
FRAUD / CYBERSECURITYReduced false alarmsLess write-offsBetter intrusion detection, cyber attacks, internal financial crimeBUSINESS IMPACT - EXAMPLESRAPID AUTOMATIONLearning algorithms automate and roboticise manual tasks automatically by crowd-sourcingAutomated reporting and reduced manual reconciliationSMARTER TRADING ALGORITHMSSmarter trading algorithms and quantitative analytics enabled by new and more powerful technologyTARGETED MARKETINGLeverage automated learning and classification to customer acquisition, retention, and activationREVENUE DRIVERLeverage social media / unstructured data for Cross-sell, up-sell, next-best-action, CRM, etc.
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Barclays Local Insights: Providing insights about peoplewww.insights.barclays.co.uk
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225
Application Dates:Rising Eagles Graduate Programme: April-June | Explorer Programme: 1 week in July
We recruit on a rolling bases & early applications are advised find out more and apply at joinus.barclays.com/Africa
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Technology opportunities at BarclaysWe seek world class technologists, data scientists, problem solvers, Data systems engineers the team that will reinvent Financial Services
Create game changing new products and services for our Africa businesses across retail, business banking, Card, Corporate/IB, and Wealth/Investment management that do not only generate new experiences and growth but new business models.We are looking for hungry data talent that is looking to move the needle in all our businesses. Yassi HadjibashiChief Data Officer, Barclays Africa - DSIAfrica is a huge growth opportunity for Barclays. If you know your Data Science, We want you! If you love Hadoop & BigData, We love you! And if you want to be part of an industry transformation around Data in Financial Services, come help us make it happen!Dr. Usama FayyadGroup Chief Data Officer, Barclays BankSecurity, Data, and Software as code. Basically everything automated for sophisticated rapid innovation, deployment cycles, and new age software engineering. Come join our journey and drive this with backing of most senior leadership and attention!!!Peter RixChief Technology Officer, Barclays Africa
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Thank you & questions
Usama Fayyad
@Usamaf
joinus.barclays.com/[email protected]
251-219-1290-57-9264-107-133128-156-1740-0-00-174-239128-205-237191-230-246236-138-64150-150-150203-81-81225-225-225Chart325499410050010005000
Terrabytes of Warehoused Data
Sheet1SABRE50VISA120NYSE225Y!7500Amazon25Korea Telecom49AT&T94Y! LiveStor100Y! Panama Warehouse500Walmart1000Y! Main warehouse5000
Sheet12549941005001000
Terrabytes of Warehoused Data
Sheet2
Sheet3
Chart250120225200014000
Millions of Events Processed Per Day
Sheet1SABRE50VISA120NYSE225YSM2000Y! Global14000
Sheet1
Millions of Events Processed Per Day
Sheet2
Sheet3