Big Data Roadmap
WDABT 2016 – BHARATHIAR UNIVERSITY
Dr. V. Bhuvaneswari, Assistant Professor
Department of Computer Applications, Bharathiar University
Visit: www.budca.in/faculty.php
BIG DATA ROADMAP
Big Data Roadmap
• Timeline – Big Data Predictions
• Data Growth in Units
• Data Landscape
• Data Explosion
• Big Data Myths
• Big Data
• 5 Vs of Big Data
• Why Big Data
• Data as Data Science
Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Computer Applications, Bharathiar University
Timeline – Big Data Predictions
1944 – Yale Library in 2040 will have “approximately 200,000,000 volumes.”
1961 – Scientific journals will grow exponentially rather than linearly, doubling every fifteen years and increasing by a factor of ten during every half-century.
1975 – The Ministry of Posts and Telecommunications in Japan introduced words as a unifying unit for measuring information flow.
1997 – Michael Cox and David Ellsworth published the first article in the ACM Digital Library to use the term “big data.”
Big Data emerged in 1997, grew to greater heights in 2010 and became popular in 2012.
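The 1961 prediction is internally consistent: a quantity that doubles every fifteen years grows by a factor of 2^(50/15) over fifty years. A quick check in Python (the function name is ours, for illustration; it assumes pure exponential growth):

```python
def growth_factor(doubling_period_years: float, horizon_years: float) -> float:
    """Factor by which a quantity grows over `horizon_years` if it doubles
    every `doubling_period_years` (pure exponential growth assumed)."""
    return 2 ** (horizon_years / doubling_period_years)


# Doubling every 15 years, evaluated over a half-century (50 years).
factor = growth_factor(15, 50)
print(f"growth over 50 years: x{factor:.2f}")
```

The result, about 10.08, matches the "factor of ten during every half-century" stated in the prediction.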
Data Growth – in Units
Data Landscape
BIG DATA FACTS
Every 2 days we create as much information as we did from the beginning of time until 2003.
Over 90% of all the data in the world was created in the past 2 years.
It is expected that by 2020 the amount of digital information in existence will have grown from 3.2 zettabytes today to 40 zettabytes.
Every minute we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand tweets and upload 200,000 photos to Facebook.
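To get a feel for these rates, the per-minute figures above can be scaled to a day and the zettabyte projection expressed as a growth factor. A small Python sketch, assuming each rate stays constant around the clock (real traffic does not) and using the decimal (SI) zettabyte of 10^21 bytes:

```python
# Per-minute figures quoted on the BIG DATA FACTS slide.
PER_MINUTE = {
    "emails sent": 204_000_000,
    "Facebook likes": 1_800_000,
    "tweets sent": 278_000,
    "photos uploaded to Facebook": 200_000,
}
MINUTES_PER_DAY = 24 * 60  # 1,440
ZB_IN_BYTES = 10 ** 21     # decimal (SI) zettabyte


def per_day(per_minute: int) -> int:
    """Scale a per-minute rate to a per-day total, assuming a constant rate."""
    return per_minute * MINUTES_PER_DAY


for activity, count in PER_MINUTE.items():
    print(f"{activity}: {per_day(count):,} per day")

# Projected digital universe (40 ZB) relative to the slide's "today" (3.2 ZB).
growth = 40 / 3.2
print(f"3.2 ZB -> 40 ZB is a x{growth:.1f} increase")
print(f"40 ZB = {40 * ZB_IN_BYTES:.1e} bytes")
```

For example, 204 million emails a minute corresponds to roughly 294 billion emails a day.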
Big Data Explosion
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by the end of 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014
Data Deluge
Big Data Market Size
Potential Talent Pool – Big Data
India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years, in addition to data analysts and data managers, to support the Big Data space.
BIG DATA MYTHS
Big Data:
• is new
• is only about massive data volume
• means Hadoop
• needs a data warehouse
• means unstructured data
• is for social media and sentiment analysis
Let Us Clarify
Big Data
Big Data is:
• A complete subject with its own tools, techniques and frameworks.
• A technology that deals with large and complex datasets which vary in data format and structure and do not fit into memory.
• Not just about huge volumes of data; it provides an opportunity to find new insights in existing data and guidelines to capture and analyze future data.
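One common way to handle data that does not fit into memory is to stream it: read it record by record and keep only a small running summary. A minimal Python sketch, assuming the data is one number per line of text; the function names and the file name in the comment are hypothetical. Frameworks such as Hadoop broadly apply the same idea in parallel across many machines.

```python
import io
from typing import Iterable, Iterator


def read_values(lines: Iterable[str]) -> Iterator[float]:
    """Lazily parse one number per line; no list of all values is built."""
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines
            yield float(text)


def streaming_mean(values: Iterable[float]) -> float:
    """One pass with O(1) extra memory: only a running count and sum are kept."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
    if count == 0:
        raise ValueError("no values to average")
    return total / count


# In-memory stand-in for a large file; with a real file you would pass the
# open file object instead, e.g. open("readings.txt") (hypothetical name).
sample = io.StringIO("3\n5\n\n10\n")
print(streaming_mean(read_values(sample)))
```

Because `read_values` is a generator, memory use stays constant however large the input is.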
Big Data: A Definition
"Big data is the realization of greater business intelligence by storing, processing, and analyzing data that was previously ignored due to the limitations of traditional data management technologies."
Source: Harness the Power of Big Data: The IBM Big Data Platform
BIG DATA as Platform
Source: IBM
4 Vs of Big Data
5 Vs of Big Data
• Volume
• Velocity
• Variety
• Veracity
• Value
Why Big Data?
The 5 Key Big Data Use Cases
• Big Data Exploration: Find, visualize and understand all big data to improve decision making.
• Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources.
• Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real time.
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency.
• Operations Analysis: Analyze a variety of machine data for improved business results.
Data Science
"Data Science" was used by statisticians and economists in the early 1970s and was defined by Peter Naur in 1974.
"Data Science" has gained popularity in the last couple of years because of massive data deposits.
The use of Big Data technology to explore data in large corporations, government and industry has made the term data science catchy.
Data Science as Discipline
Data Science has emerged as a new discipline to provide deep insight into large volumes of data.
Data Science is a fusion of major disciplines such as computational algorithms, statistics and visualization.
90% of the world's data has been created in the last two years, comprising 10% structured data and 80% unstructured data.
The digital universe is experiencing a data deluge; it is estimated to be larger than the physical universe, and data is predicted to be measured in geopbytes.
Data Growth in Bytes
Data Classification
◦ Open Data
◦ Closed Data
◦ Hot Data
◦ Warm Data
◦ Cold Data
◦ Thin Data
◦ Thick Data
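Hot, warm and cold data are usually told apart by how recently or how often the data is accessed, which in turn decides where it is stored. A minimal Python sketch, assuming recency alone decides the tier and using hypothetical 7-day and 90-day cutoffs; the dataset names and dates are invented for illustration, and the other categories (open/closed, thin/thick) are not covered:

```python
from datetime import datetime, timedelta

# Hypothetical cutoffs; real tiering policies are organization-specific.
HOT_WITHIN = timedelta(days=7)
WARM_WITHIN = timedelta(days=90)


def temperature(last_accessed: datetime, now: datetime) -> str:
    """Classify data as hot, warm or cold by how recently it was accessed."""
    age = now - last_accessed
    if age <= HOT_WITHIN:
        return "hot"
    if age <= WARM_WITHIN:
        return "warm"
    return "cold"


now = datetime(2016, 1, 31)
datasets = {  # hypothetical datasets and last-access dates
    "today's clickstream": datetime(2016, 1, 30),
    "last quarter's sales": datetime(2015, 11, 15),
    "2012 archive": datetime(2012, 6, 1),
}
for name, accessed in datasets.items():
    print(f"{name}: {temperature(accessed, now)}")
```

Hot data would typically sit on fast storage, while cold data can move to cheaper archival storage.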
Data Analytics – Need for Today
Data is considered a digital asset, similar to other property. Organizations believe the data they generate will provide deep insights into their business processes for arriving at strategic decisions.
The earlier limitations of computational storage and processing have been overcome by cloud computing and big data technologies.
Data Science Components
• Pre-Processing: ETL
• Dashboards: Charts (Pie, Bar), Histogram
• Data Models: Linear Regression, Decision Tree, Dimensionality Reduction
• Clustering, Outlier Analysis, Association Analysis
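Two of the components listed above, linear regression (a data model) and outlier analysis, are small enough to sketch directly. The Python below uses only the standard library; fitting one predictor by ordinary least squares is the textbook method, while the z-score rule and its threshold of 2 are our illustrative choice, one of several ways to do outlier analysis:

```python
from statistics import mean, pstdev


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x with a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


def z_score_outliers(values: list[float], threshold: float = 2.0) -> list[float]:
    """Flag values more than `threshold` population std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(f"y = {a:.1f} + {b:.1f}x")
print(z_score_outliers([10, 11, 9, 10, 12, 10, 45]))
```

On a big data platform the same calculations would run over distributed data, but the arithmetic is unchanged.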
Data Science – Big Data Technology
Collect, Load, Transform
◦ ETL: Scribe, Flume
Store
◦ Hadoop, Spark, Storm
Process, Analyze and Reason
◦ Computational Algorithms
◦ Statistical Methods and Models
◦ R, Pig, Hive, Python, Java, Scala, Clojure, Mahout
Visualization
◦ Dashboard, App
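Hadoop's processing model, MapReduce, splits a job into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. The single-process Python sketch below imitates those three phases for a word count over a few hypothetical log lines. It does not use Hadoop's API; on a real cluster the framework runs many mappers and reducers in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


logs = ["big data needs big storage", "data science uses big data"]
print(word_count(logs))
```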
Data Science vs Data Analytics
Data Science is a discipline that groups techniques and methods from various domains to study data; data analytics is a component of Data Science.
Data Analytics is the process of analyzing a dataset to find deep insights using computational algorithms and statistical methods. There is no common procedure that applies to all datasets.
Data Analytics vs Big Data Analytics
Data Analytics is used to explore and analyze datasets using statistical methods and models.
Big Data Analytics is used to analyze data with the characteristics of Volume, Velocity and Variety by integrating statistics, mathematics and computational algorithms on a big data platform.
Data Science – Emerging Roles
Data Scientist: responsible for scrubbing data to bring out deep insights.
Skills: expertise in computer science, mathematics and statistics; works on open-ended research problems.
Data Engineer: responsible for managing and administering the infrastructure and storage of data.
Skills: strong programming and software engineering skills; deep knowledge of data warehousing; expertise in Hadoop, NoSQL and SQL technologies.
Data Analyst: views data from one source and gains deep insight into it based on organizational guidance.
Skills: competency in statistics.
Data Analytics Use Case Scenario
Data Science Applications
• Data Personalization – Logs, Tweets, Likes
• Smart Pricing – Air Transportation
• Financial Services – Fraud Detection
• Insurance
• Smart Grids – Energy Management
Air Fare Management – Use Case 1
Objective: Raise airfares based on high-value customers (CRM).
Strategic decisions require an understanding of data insights:
• How are customers divided?
• Which customers are high-value customers?
• Who are the frequent flyers?
• How can customers be retained?
Data sources: conventional enterprise information; data from weblogs, social media and competitors' pricing.
Data Engineering
• Airfare classification (Economy, Business, First)
• Analyse factors from enterprise data sources using data exploration techniques:
  ◦ Passenger booking information
  ◦ Forecasted data (statistics)
  ◦ Inventory
• Customer behavioural data: predictive analytics with statistical models (decision tree, classification)
• Information gained from websites that provide route information, dining and preferred locations
Holistic Analytics
• Analyzing customer data from social profiles, sales, CRM, etc.
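The questions in Use Case 1 (who is a frequent flyer, which customers are high value) can first be answered with plain descriptive aggregation over booking records, before any predictive model is trained. The sketch below is rule-based segmentation, not the decision-tree classification named on the slide; the `Booking` record, the cutoffs and the sample data are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Booking:
    customer_id: str
    fare_class: str  # "Economy", "Business" or "First"
    amount: float    # fare paid for this booking


# Hypothetical cutoffs, for illustration only.
FREQUENT_FLYER_TRIPS = 3
HIGH_VALUE_SPEND = 1000.0
PREMIUM_CLASSES = {"Business", "First"}


def segment(bookings: list[Booking]) -> dict[str, set[str]]:
    """Group customers into simple rule-based segments from booking history."""
    trips: dict[str, int] = {}
    spend: dict[str, float] = {}
    for b in bookings:
        trips[b.customer_id] = trips.get(b.customer_id, 0) + 1
        spend[b.customer_id] = spend.get(b.customer_id, 0.0) + b.amount
    return {
        "frequent_flyer": {c for c, n in trips.items() if n >= FREQUENT_FLYER_TRIPS},
        "high_value": {c for c, s in spend.items() if s >= HIGH_VALUE_SPEND},
        "premium_cabin": {b.customer_id for b in bookings if b.fare_class in PREMIUM_CLASSES},
    }


bookings = [  # hypothetical sample data
    Booking("C1", "Economy", 200.0),
    Booking("C1", "Economy", 200.0),
    Booking("C1", "Economy", 200.0),
    Booking("C2", "First", 1500.0),
    Booking("C3", "Business", 400.0),
]
print(segment(bookings))
```

Segments like these could later serve as labels or features for a predictive model such as a decision tree.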
Complexities and Challenges
• Data is larger than terabytes
• Data integration
• Variety of data formats
Solution
• Big data accelerators
• Hadoop ecosystem
• Analytic components
• Integrated data warehouses
Source: Big Data Spectrum, Infosys
Insurance Fraud Detection – Use Case Scenario
Data Engineering
• Verifying customer data
• Customer profile analysis
• Verification of claims raised
• Fraud detection from disparate systems
• Exact claim reimbursement
Data Sources
• Data about customers and products sold, from ERP and CRM
• Credit history from other sources
• Data from social networking (customer profiles, product ratings) and credit ratings from third parties
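Verification of claims and fraud detection across disparate systems can start with simple cross-checks between records that live in different systems. A minimal Python sketch, assuming hypothetical `Policy` (ERP/CRM) and `Claim` (claims system) records and three illustrative rules; flagged claims would be passed on for investigation or to a statistical model rather than rejected outright:

```python
from dataclasses import dataclass


@dataclass
class Policy:  # hypothetical record, e.g. exported from ERP/CRM
    policy_id: str
    customer_id: str
    sum_insured: float


@dataclass
class Claim:  # hypothetical record, e.g. exported from the claims system
    claim_id: str
    policy_id: str
    customer_id: str
    amount: float


def verify_claims(policies: list[Policy], claims: list[Claim]) -> dict[str, list[str]]:
    """Cross-check claims against policy records; return reasons per flagged claim."""
    by_id = {p.policy_id: p for p in policies}
    seen: set[tuple[str, float]] = set()
    flags: dict[str, list[str]] = {}
    for c in claims:
        reasons = []
        policy = by_id.get(c.policy_id)
        if policy is None:
            reasons.append("unknown policy")
        else:
            if policy.customer_id != c.customer_id:
                reasons.append("customer does not match policy holder")
            if c.amount > policy.sum_insured:
                reasons.append("amount exceeds sum insured")
        # Illustrative heuristic: same policy and same amount claimed again.
        key = (c.policy_id, c.amount)
        if key in seen:
            reasons.append("possible duplicate claim")
        seen.add(key)
        if reasons:
            flags[c.claim_id] = reasons
    return flags


policies = [Policy("P1", "C1", 50000.0), Policy("P2", "C2", 20000.0)]
claims = [
    Claim("K1", "P1", "C1", 12000.0),
    Claim("K2", "P1", "C1", 12000.0),
    Claim("K3", "P2", "C9", 25000.0),
    Claim("K4", "P7", "C3", 1000.0),
]
for claim_id, reasons in verify_claims(policies, claims).items():
    print(claim_id, reasons)
```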
Health Epidemics
Data Engineering
• Kinds of epidemics and target users
• Causes and effects with respect to locations
• Environmental and other related issues of epidemics
• Data on awareness
Data Sources
• EHR records, medical insurance claims, social media (awareness), ERP systems
Data Analytics
• Descriptive analytics
• Predictive analytics (model-based analysis)
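Descriptive analytics summarizes what has already happened, for example how many cases were recorded per location per week, before any predictive model is built. A minimal Python sketch over hypothetical case records; in practice they would come from sources such as EHR records and claims, and the record layout, zone names and dates here are invented for illustration:

```python
from collections import Counter
from datetime import date

# Hypothetical case records: (diagnosis date, location, disease).
cases = [
    (date(2016, 1, 4), "Zone A", "dengue"),
    (date(2016, 1, 5), "Zone A", "dengue"),
    (date(2016, 1, 6), "Zone B", "dengue"),
    (date(2016, 1, 12), "Zone A", "dengue"),
    (date(2016, 1, 13), "Zone B", "malaria"),
]


def weekly_counts(records, disease: str) -> Counter:
    """Descriptive analytics: number of cases per (location, ISO week number)."""
    return Counter(
        (loc, d.isocalendar()[1]) for d, loc, dis in records if dis == disease
    )


for (loc, week), n in sorted(weekly_counts(cases, "dengue").items()):
    print(f"{loc}, week {week}: {n} case(s)")
```

A sudden rise in one zone's weekly count is the kind of signal a predictive model would then try to anticipate.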
Big Data Challenges
• Privacy protection
  ◦ Across all big data stages: collect, store, process, knowledge
• Integration with the enterprise landscape
  ◦ All systems store data in RDBMS and data warehouses (DW)
  ◦ No support for bulk loading into the big data store
  ◦ Limited number of analytics available in Mahout
  ◦ Big data technologies lack visualization support and delivery methods
• Leveraging cloud computing for big data applications
• Addressing real-time needs with varied formats and volumes
PART B: Big Data Use Cases – Scenario
Big Data Applications
Big Data Applications – India
• Big Data – Elections
• SBI uses big data mining to check defaults
• Karnataka Govt – identifying water leakage
Big Data – Election
Data was mined from every Internet user in the country to accurately understand voter sentiments and local issues.
Data-based analysis was used to raise funds and to create different models for different regions, targeting local issues.
Indian elections involve more than 800 million voters with different ideologies and expectations.
Innovative use of Big Data marked a huge change from the way elections were traditionally fought.
Data Analytics
Modak Analytics built an electoral dataset by processing huge volumes of unstructured data (around 10 TB of PDF documents) as well as structured data.
Modak chose Hadoop and self-built a 64-node cluster with 128 TB of storage. Apart from Hadoop, the team used PostgreSQL as the front-end database.
The team developed Rapid ETL to overcome the difficulties of loading data into Hadoop.
SBI
State Bank of India (SBI) recently ran its newly acquired data-mining software to check the purity of its data.
It made an interesting find: close to one crore (10 million) account holders have not provided any nomination for their savings accounts. What is worse, over half of them are senior citizens.
To analyse banking trends, SBI has hired a whole team of statisticians and economists.
Goals: identify default patterns and high-value customers.
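A data-purity check like the one described for SBI boils down to a simple query: which accounts lack a required field, and who holds them. The sketch below is not SBI's software; it assumes a hypothetical `Account` record, treats a missing or empty nominee as "no nomination", and uses an assumed senior-citizen cutoff of 60 years:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:  # hypothetical record layout
    account_no: str
    holder_age: int
    nominee: Optional[str]  # None (or empty) when no nomination was provided


SENIOR_AGE = 60  # assumed cutoff for "senior citizen"


def nomination_gap(accounts: list[Account]) -> tuple[int, float]:
    """Count accounts without a nominee and the share of those held by seniors."""
    missing = [a for a in accounts if not a.nominee]
    if not missing:
        return 0, 0.0
    seniors = sum(1 for a in missing if a.holder_age >= SENIOR_AGE)
    return len(missing), seniors / len(missing)


accounts = [  # hypothetical sample data
    Account("A1", 72, None),
    Account("A2", 35, "spouse"),
    Account("A3", 65, ""),
    Account("A4", 28, None),
]
count, senior_share = nomination_gap(accounts)
print(f"{count} accounts without nominee; {senior_share:.0%} held by seniors")
```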
QUERIES?
THANK YOU