Overview of Big Data and Analytics in practice Vivek Murugesan

Post on 20-Feb-2017


Overview of Big Data and Analytics in practice

Vivek Murugesan

Contents
● What is Big Data?
● What is Analytics?
● Why do companies bother?
● Why now?
● Industries & Use cases
● Why should I bother?

Big data is about these 4 Vs

Analytics

The process of iterative, methodical exploration of an organization’s data, with emphasis on statistical analysis, to enable data-driven decision making.

Why do companies bother…?

Why now?
● Storage has become cheaper
● Availability of infrastructure in the cloud
● Open source
● Data science and machine learning moving beyond research

Data everywhere in every domain

❖ Web - content, link structure, clicks
❖ Retail - customer details, point of sale, inventory
❖ Medical - literature, patient history, drug details …
❖ Financial - stocks, currencies, financial news, commodities
❖ Insurance - customer history, claim details …
❖ Telecom - call detail records, customer history & profile …
❖ Banking - customer transactions, profile …
❖ Travel & Hospitality - travel itinerary, schedule …

Industries
● Medical, Healthcare and Life Sciences
● Automobile and Manufacturing
● Travel and Hospitality
● Retail and Ecommerce
● Web, Social Media and Digital Media
● Telecommunication
● Banking, Finance and Insurance
● Energy
● Sports, Media and Entertainment
● Niche areas like autonomous driving, image and video processing, etc.

Medical, Healthcare and Life Sciences
● Cancer research with pattern recognition on cells
● Clinical trials with millions of compositions for drugs
● Prediction of diseases with tests and probabilistic studies, e.g. diabetes and Down syndrome prediction
● Collection and storage of test results such as scan reports, blood test reports, etc.
● Image processing, text processing, complex pattern recognition analysis, etc.
● Analyzing literature and patents to find cures for diseases

Automobile and manufacturing
● One of the frontrunners in adopting big data and analytics, even before cloud computing (during the cluster computing days)
● Analyzing vast amounts of:
○ Customer feedback
○ Inventory data
○ Reports on repairs and life of parts
○ Competitive information
○ Market research data
● To come up with the best design, one that will last a long time in the market
● Some of these analyses could run for months
● The design arrived at is then tested in a simulation environment

Travel and Hospitality
● Revenue management was one of the techniques that resurrected the airline industry, which was close to death during the early 90s
● Similar techniques are used in the hospitality industry as well, given the increasing number of hotels and how competitive the market has become
● The growing number of online portals shows the amount of competition in this industry
● The data generated and consumed in this industry is really huge

Retail and Ecommerce
● Inventory tracking across franchises
● Relationship between inventory overrun and discounts
● Recommending the right products in under a second to close the customer's purchase lifecycle appropriately
● Imagine the scaling problems faced by online retailers like Amazon, Flipkart, etc., with millions of products and millions of customers to handle
● The capability to handle price elasticity in the market
● Example use case: Best Buy vs Amazon
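
The price elasticity mentioned above can be made concrete with a small calculation. The sketch below is an illustration only, not something taken from the slides: it uses the standard midpoint (arc) formula for price elasticity of demand, and the function name and the sample numbers are hypothetical.

```python
def arc_price_elasticity(q_old: float, q_new: float,
                         p_old: float, p_new: float) -> float:
    """Midpoint (arc) price elasticity of demand.

    Percentage change in quantity divided by percentage change in price,
    with both changes measured relative to the midpoint of old and new values.
    """
    if p_old == p_new:
        raise ValueError("price did not change; elasticity is undefined")
    pct_quantity = (q_new - q_old) / ((q_new + q_old) / 2)
    pct_price = (p_new - p_old) / ((p_new + p_old) / 2)
    return pct_quantity / pct_price


# Hypothetical numbers: a price cut from 100 to 90 lifts weekly sales
# from 1000 to 1200 units.
elasticity = arc_price_elasticity(q_old=1000, q_new=1200, p_old=100, p_new=90)
print(f"elasticity = {elasticity:.2f}")
```

A magnitude above 1, as in this made-up case, would indicate elastic demand: sales respond more than proportionally to the price change.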

Web, Social media and Digital media
● With the number of tweets and posts that Twitter and Facebook handle, notifying the right set of people is a daunting task
● The job recommendations and PYMK (People You May Know) that LinkedIn does are a really hard problem to handle at that scale
● The advertisement industry in digital media has a really complicated ecosystem,
○ With so many publishers, agencies and advertisements
○ To satisfy so many parameters like number of impressions, CTR, conversion, etc.
● Such a complicated ecosystem handles online bidding in microseconds to choose the advertisement to show for each page
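
The metrics named above have simple definitions: CTR is clicks divided by impressions, and conversion rate is conversions divided by clicks. The sketch below is a deliberately simplified illustration, not a description of how any real ad exchange works: it ranks candidate ads by expected revenue per impression (bid per click times CTR). The `Ad` class, its fields, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Ad:
    name: str
    bid_per_click: float  # what the advertiser pays for one click (made-up units)
    impressions: int      # times the ad was shown
    clicks: int           # times the ad was clicked
    conversions: int      # clicks that led to a purchase or sign-up

    @property
    def ctr(self) -> float:
        """Click-through rate: clicks / impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def conversion_rate(self) -> float:
        """Conversions / clicks."""
        return self.conversions / self.clicks if self.clicks else 0.0

    @property
    def expected_revenue_per_impression(self) -> float:
        """Bid per click weighted by the chance of a click."""
        return self.bid_per_click * self.ctr


def pick_ad(candidates: Sequence[Ad]) -> Ad:
    """Choose the ad with the highest expected revenue per impression."""
    return max(candidates, key=lambda ad: ad.expected_revenue_per_impression)


ads = [
    Ad("shoes", bid_per_click=2.0, impressions=10_000, clicks=100, conversions=5),
    Ad("phones", bid_per_click=0.5, impressions=10_000, clicks=500, conversions=10),
]
winner = pick_ad(ads)
print(winner.name, winner.ctr, winner.conversion_rate)
```

With these made-up numbers the lower-bidding ad wins because its CTR is five times higher, which hints at why bid price alone is not enough to decide what to show.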

Telecommunication
● More than 16 players in India running on very tight margins on call rates
● To earn revenue, they have to squeeze out interest from every single customer,
○ By targeting them with the right offer and promotion at the right time
○ They operate on micro segments of size in the 1000s out of their 160 million customers
● Huge number of mobile subscribers moving all over and making a lot of calls
● All of this generates a lot of data in the form of CDRs (call detail records), etc.
● And all of it needs to be processed, stored, analyzed and archived appropriately

Banking, Finance and Insurance
● Banks run a lot of promotions by sending emails, SMS, etc. to their customers
● They earn a profit on every single conversion from these campaigns
● Imagine how hard it is to choose the right set of customers to target with the right set of offers to maximize the revenue from these campaigns
● People who work in the finance industry (stock markets, etc.) have a large volume of data in a wide variety of forms to consume and mine for meaningful insights in order to arrive at the right investment strategy
● Processing claims and detecting fraud is a really hard problem to solve at scale
● Insurance firms have started using sophisticated techniques like text processing on claim statements to detect fraud

Energy
● The amount of image processing involved in analyzing satellite images to locate energy sources is humongous
● Even a small error in precision can introduce a huge loss
● Hence the results need to be optimized over a huge number of iterations to minimize the error

Sports, Media and Entertainment
● Football clubs and IPL franchises have started modeling players to arrive at an optimal strategy to play with
● For example, the NZ cricket team at some point started using such systems to the extent of automating team selection
● Media and Entertainment need to keep up to date with social media to compete both with it and against their peers

Online advertising industry

Showing the interactions in all directions and the companies playing in the space

List of Techniques
● Statistical testing, models (regression, forecasting, etc.)
● Machine learning (pattern recognition, classification, clustering, segmentation, etc.)
● Application Simulation and Optimization (revenue management, supply chain management, set covering, network problems, etc.)
● Recommendation (personalized, non-personalized, association rule mining, etc.)
● Text analytics, image processing
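
Of the techniques listed, association rule mining is easy to show in a few lines. The sketch below computes the two standard measures, support and confidence, for a rule of the form "customers who buy A also buy B". It is an illustration only; the basket data and function names are made up.

```python
from typing import FrozenSet, List


def support(baskets: List[FrozenSet[str]], items: FrozenSet[str]) -> float:
    """Fraction of baskets that contain every item in `items`."""
    hits = sum(1 for basket in baskets if items <= basket)
    return hits / len(baskets)


def confidence(baskets: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    antecedent_support = support(baskets, antecedent)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(baskets, antecedent | consequent) / antecedent_support


# Hypothetical point-of-sale baskets.
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "jam"}),
    frozenset({"milk", "eggs"}),
]

rule_support = support(baskets, frozenset({"bread", "butter"}))
rule_confidence = confidence(baskets, frozenset({"bread"}), frozenset({"butter"}))
print(rule_support, rule_confidence)
```

Here the rule "bread implies butter" holds in two of the four baskets overall, and in two of the three baskets that contain bread.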

Few more domains
● Education, Academics, E-learning
● Networking - Security, Adaptive routing
● Niche areas - Autonomous driving, Reinforcement learning, etc.
● Multimedia - Audio and Video analytics

Why should I bother…?

● Industry growing rapidly
● More organizations adopting
● Technology trends
● Skill gap and projection
● Skills getting obsolete

Industry 4.0

Cyber-Physical Systems (CPS) are integrations of computation, networking, and physical processes. Embedded computers and networks monitor and control the physical processes, with feedback loops where physical processes affect computations and vice versa.

Questions...

Contact: [email protected]

Linkedin: https://in.linkedin.com/in/vivek-murugesan-aa183416