Big Data for Business - Working with the Elephant Made Easy

Copyright 2014 FUJITSU. Human Centric Innovation. Fujitsu Forum 2014, 19th – 20th November

Upload: fujitsu-global

Post on 30-Jun-2015

Category:

Technology

DESCRIPTION

In many Big Data use cases, there is no way around the open source software Hadoop when it comes to processing the large data volumes extracted and collected from diverse data sources. However, using Hadoop also raises challenges of many kinds. Writing Hadoop jobs for data collection and processing is a cumbersome task that requires in-depth skills which are not available in most organizations. Further questions are how to visualize the outcomes, what to consider when configuring a Hadoop solution optimally, and how to shorten time to production. Fujitsu knows the answers, simplifies working with Hadoop, and enables even business users without IT knowledge to get in touch with Big Data. Speaker: Dr. Fritz Schinkel (Fujitsu)

TRANSCRIPT

Page 1: Big Data for Business - Working with the elephant made easy

0 Copyright 2014 FUJITSU

Human Centric Innovation

Fujitsu Forum 2014

19th – 20th November

Page 2: Big Data for Business - Working with the elephant made easy

Big Data for the Business – Working with the Elephant Made Easy

Dr. Fritz Schinkel
Program Manager for Cloud Infrastructures and Big Data Innovations, Fujitsu

Page 3: Big Data for Business - Working with the elephant made easy

Data Driven Economy

Page 4: Big Data for Business - Working with the elephant made easy

Fujitsu’s vision of a Hyperconnected World

An emerging new world where people, information, things and infrastructure are connected via networks, transforming work and life everywhere

People: Enormous number of individuals
Information: Big Data methods for new value
Infrastructure: Individual end points connected to central compute & storage

Page 5: Big Data for Business - Working with the elephant made easy

People: Improve Living and Empower Individuals

Is our energy system future-proof?

Should we invest in wind energy?

Page 6: Big Data for Business - Working with the elephant made easy

Infrastructure: Transfer, Store and Process Data

Is our energy system future-proof?
Should we invest in wind energy?

Self-driving car: 3.6 TB/h
Smart meters for 80% of EU electricity consumers by 2020
PBs of data from 100 weather satellites
More than 50 billion connected things

Page 7: Big Data for Business - Working with the elephant made easy

Information: Create Insights from Collected Data

Demography prediction
Traffic trends
Wind measurement & weather trends
Weather risk assessment

Self-driving car: 3.6 TB/h
Smart meters for 80% of EU electricity consumers by 2020
PBs of data from 100 weather satellites
More than 50 billion connected things

Is our energy system future-proof?
Should we invest in wind energy?

Page 8: Big Data for Business - Working with the elephant made easy

Bringing together the 3 dimensions will realize business and social value

Expectations for Big Data Solutions

People & Business Empowerment: Connect people & empower for business ideas based on information

Creative Intelligence: Create knowledge from information fast enough

Connected Infrastructure: Connect everything, store and process collected data timely

Page 9: Big Data for Business - Working with the elephant made easy

People and Business Empowerment

Page 10: Big Data for Business - Working with the elephant made easy

Start Asking from the Business Side

Value
• What is your (new) business approach?
• What are you expecting?
• What can be earned (business priority)?

Platform
• What data do you have / need?
• What is the expected total size?
• What is your production platform?

Tools
• How will you consolidate your data?
• How do you analyze and discover meaning?
• Which analytic methods will you apply?
• How can you visualize results effectively?

Misc.
• Did you respect security, privacy, regulations?
• Which skills do you have / do you need?
• Is your concept flexible enough?

Page 11: Big Data for Business - Working with the elephant made easy

Fujitsu Consulting and Services for Big Data

Big Data Assessment Workshop: Understand the opportunities Big Data can bring to your organization through the assessment of your organization’s strategic objectives, processes, and technical assets.

Strategy Consulting: Develop the comprehensive Strategy Plan and optimal road map needed to efficiently introduce Big Data into your business.

Analytic Service: Fujitsu Big Data Analytics Services help our customers quickly implement new Big Data analytics workflows through a proven use-case-driven approach.

Hadoop Service: Pragmatic, efficient and assured services for integrating Hadoop into your business.

Integration Service: Establish the solution in your environment and connect it to IT services.

Big Data Consulting Services
• Fujitsu Big Data Assessment Workshop
• Fujitsu Big Data Strategy Consulting
• Fujitsu Big Data Analytics Services

Fujitsu Services for Hadoop

Fujitsu Integration Services

Page 12: Big Data for Business - Working with the elephant made easy

Analytic Services

Categories

Customer Intimacy: Improve customer satisfaction and service, increase customer insight
Operational Efficiency: Improve efficiency of processes and reduce cost
Risk Management: Improve fraud detection, cyber security, and compliance
Innovation: Use your data to create new business models, products and services

Adaptable use cases deliver short time-to-value

Page 13: Big Data for Business - Working with the elephant made easy

Example: Weather Trend Analysis

Value
• Investment decision for wind park
• Predict demography, traffic, wind power
• ROI optimized by wind park location

Platform
• Customer history, open weather data
• 100 TB of data is expected
• Data will be processed on Hadoop

Tools
• Import customer and weather data
• Calculate local trends for wind power
• Generate time series per location
• Visualize data as map and trend chart

Misc.
• Check compliance for customer data
• Basic analytic skills, meteorological skills
• Use concept for solar power, insurance …

Page 14: Big Data for Business - Working with the elephant made easy

Connected Infrastructure

Page 15: Big Data for Business - Working with the elephant made easy

Data & Information Flow for Big Data

Idea: Creating new business value
Sensors: Trace of the real world
Modeling: Image of parts of the real world
Feedback: Actions in the real world
Outcome: Real business value

Data Sources
• Corporate data, history
• Public data, e.g. weather
• Internet usage
• Social networks
• Smartphone usage
• Sensors, e.g. in a car
• Quantified Self
• …

Data store
• Private data store
• Online / Nearline / Archive
• Public data services
• Commercial data
• …

Data analytics
• Aggregating / Cleansing
• Modeling

Data processing
• Statistics
• Correlation
• Classification
• Prediction
• Prescription
• …

Data usage
• Information
• Recommendation
• Marketing
• Product optimization
• Decision
• Control
• …

Page 16: Big Data for Business - Working with the elephant made easy

Big Data Infrastructure Reference Architecture: Choose Platform according to Business Problem

Various data → Consolidated data → Distilled essence → Applied knowledge

Extract, Collect → Cleanse, Transform → Analyze, Visualize → Decide, Act

Data Sources
• Databases
• Application server
• Web content
• Sensor data

Analytics Platform
• Batch processing platform
• Event processing platform
• Fast response platform

Access
• Apps, Services, Queries
• Visualization, Reporting
• Notification

Page 17: Big Data for Business - Working with the elephant made easy

Example: Weather Trend Analysis – Batch Preparation and Real-time Retrieval

Import weather history (50,000 GRIB files)

Invert time series of maps to map of time series (1,000,000 files)

Fast retrieval of time series and visualization

ERA-Interim data

Page 18: Big Data for Business - Working with the elephant made easy

Platform: PRIMEFLEX for Hadoop

Software stack
• Hadoop core: Map Reduce / HDFS
• Streaming and in-memory technologies
• Analytic framework

Hadoop platform sourcing options
• On-premise: Entry or Rack option
• Off-premise: Cloud offering
• Storage- or compute-intensive workloads

Service and Consulting
• Integration Service
• Tool-supported sizing
• Hadoop and Analytic Services

[Diagram labels: Entry, Rack, Cloud; Big Data Management; Analytics; Analytic Services; Integration Service and Sizing]

Page 19: Big Data for Business - Working with the elephant made easy

Manage Risk, Gain Value

[Figure: two charts, "Classical Business Analytics" and "Iterative Big Data Analytics", each plotting invest / return over time, with phases labeled hardware, ETL, analysis and operate, and resulting values labeled value 1 to value 5.]

Incremental investments and agile iterations leverage the steep part of the value curve.

Page 20: Big Data for Business - Working with the elephant made easy

Creative Intelligence

Page 21: Big Data for Business - Working with the elephant made easy

To Be Implemented: Big Data Value Chain

Big Data
• Structured & unstructured data
• Devices, sensors, Internet of Things
• Social media, open data, linked data

Extract, Collect → Cleanse, Transform, Analyze → Find, Decide, Act

• Research & development, science
• Operation, automation, production
• Interactive reporting, advertising

Structured approach in three steps.

Page 22: Big Data for Business - Working with the elephant made easy

Implementation of Big Data Analytics

To be considered

Problem characteristic

Performance: Size / Runtime

Available skills

Implementation alternatives

Optimal Control

Complex Questions

Iterative Analysis

Find the right method for your business

Page 23: Big Data for Business - Working with the elephant made easy

Optimal Control: Map Reduce Programming

Method: Program explicit map / reduce functions

Characteristic
• Structured / unstructured data
• Parallel tasks on input data

Performance
• Fits any cluster size
• Best resource utilization

Skills
• Problem translation to Map / Reduce model
• Programming in Java or a scripting language

Use case examples
• Relations, similarities, patterns in large data sets (e.g. clickstream)
• Sort and split data along given criteria (e.g. transaction lists)
• Invert a table with respect to a certain column (e.g. web index)
• Process data in independent chunks (e.g. voice to text)
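As an illustration of the "invert a table" use case, the following is a minimal sketch of the map / shuffle / reduce pattern in plain Python. It does not use Hadoop; the function names and the sample documents are hypothetical and only mimic, in a single process, what a framework would distribute across a cluster.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, doc_ids):
    """Reduce: collapse the values of one key into a sorted posting list."""
    return word, sorted(doc_ids)

def build_index(documents):
    """Run map, shuffle and reduce in sequence and return word -> documents."""
    pairs = [pair for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(w, ids) for w, ids in shuffle(pairs).items())

# Hypothetical sample documents, for illustration only.
DOCS = {
    "page1": "wind energy trends",
    "page2": "weather trends",
    "page3": "wind and weather",
}
INDEX = build_index(DOCS)
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is why this style of program can run as parallel tasks on a cluster of any size.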

Page 24: Big Data for Business - Working with the elephant made easy

Example: Time Series Transformation

Problem:

Invert 20,000 weather maps with 1 million grid points to 1 million time series with 20,000 entries

Visualize location based results

Solution:

Dedicated map reduce job on Hadoop

Visualization based on d3 graphics package

Realization:

Development map / reduce: 4 days

Development web GUI: 5 days

Execution: 8 node cluster, 2h

[Diagram: HDFS / Map Reduce workflow]
Transfer data to HDFS (Flume) → Program and execute Map Reduce (Java) → Transfer data to web server (NFS) → Visualize data (JavaScript)
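The inversion described on this slide can be sketched as follows. This is a toy, single-process Python version under stated assumptions, not the Java job from the talk: the data layout (a dictionary of maps keyed by timestamp, each mapping a grid point to a value) and all names are hypothetical.

```python
from collections import defaultdict

def map_phase(timestamp, weather_map):
    """Map: one weather map in, one (grid_point, (timestamp, value)) pair
    per grid point out."""
    for grid_point, value in weather_map.items():
        yield grid_point, (timestamp, value)

def reduce_phase(grid_point, samples):
    """Reduce: order the samples of one grid point by time to form its series."""
    return grid_point, [value for _, value in sorted(samples)]

def invert(maps):
    """Turn a time series of maps into a map of time series."""
    grouped = defaultdict(list)
    for timestamp, weather_map in maps.items():
        for grid_point, sample in map_phase(timestamp, weather_map):
            grouped[grid_point].append(sample)
    return dict(reduce_phase(p, s) for p, s in grouped.items())

# Hypothetical toy input: 3 maps (time steps) over 2 grid points,
# deliberately given out of chronological order.
MAPS = {
    "2014-01-02": {(50, 8): 3.5, (51, 9): 1.0},
    "2014-01-01": {(50, 8): 2.0, (51, 9): 0.5},
    "2014-01-03": {(50, 8): 4.0, (51, 9): 1.5},
}
SERIES = invert(MAPS)

# Problem size from the slide: every value of every map is moved once.
TOTAL_VALUES = 20_000 * 1_000_000
```

With the figures from the slide, the job has to regroup 20,000 × 1,000,000 = 2 × 10^10 individual values, which is why the transformation is run as a batch job on the cluster rather than at query time.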

Page 25: Big Data for Business - Working with the elephant made easy

Complex Questions: SQL (Hive, Impala)

Method: Descriptive SQL queries

Characteristic
• Structured data
• Medium to complex dependencies

Performance
• Highest volumes for batch-like execution
• Medium volume for dialog execution

Skills
• Problem description in SQL syntax (e.g. Hive or Impala)
• Business knowledge, mathematics, statistics

Use case examples
• Find column correlation (e.g. pricing strategy)
• Compute statistics and derived values (e.g. averages, median, variance)
• Join data from several sources (e.g. transaction data with sentiment data)
• Ad-hoc queries in trial phase (e.g. hypothesis verification)
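To make the "join data from several sources" use case concrete, here is a small sketch of a descriptive SQL query. It runs against an in-memory SQLite database from Python purely as a stand-in; it is not Hive or Impala, and the tables, columns and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a Hive or Impala table store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (product TEXT, amount REAL);
    CREATE TABLE sentiment (product TEXT, score REAL);
    INSERT INTO transactions VALUES ('A', 10.0), ('A', 30.0), ('B', 5.0);
    INSERT INTO sentiment VALUES ('A', 0.8), ('B', -0.2);
""")

# Join transaction data with sentiment data and compute simple statistics
# per product.
QUERY = """
    SELECT t.product,
           COUNT(*)      AS n_sales,
           AVG(t.amount) AS avg_amount,
           s.score       AS sentiment
    FROM transactions AS t
    JOIN sentiment AS s ON s.product = t.product
    GROUP BY t.product, s.score
    ORDER BY t.product
"""
ROWS = conn.execute(QUERY).fetchall()
```

The query states what result is wanted and leaves the execution plan to the engine, which is the main difference from programming explicit map / reduce functions.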

Page 26: Big Data for Business - Working with the elephant made easy

Example: Temperature Weekday Dependency I

Problem:

Does local average temperature depend on weekday?

Confirm or refute the hypothesis

Solution:

Run ad-hoc query on Impala database

Do simple visualization in Excel

Realization:

Development of SQL query: 0.5 days

Visualization in Excel: 2 h

Execution: 8 node cluster, 30min

[Diagram: HDFS / Impala workflow]
Import data to HDFS (Impala) → Specify query (Impala SQL) → Download data to PC → Visualize data (Excel)
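An ad-hoc query for this hypothesis could look roughly like the sketch below. It uses SQLite from Python as a stand-in, so the weekday function is SQLite's `strftime('%w', ...)`; the date functions in Impala SQL differ, and the `readings` table with its columns and values is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (location TEXT, day TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("X", "2014-11-02", 4.0),   # Sunday
        ("X", "2014-11-09", 6.0),   # Sunday
        ("X", "2014-11-03", 8.0),   # Monday
        ("X", "2014-11-10", 10.0),  # Monday
    ],
)

# SQLite's strftime('%w', ...) yields the weekday as '0' (Sunday) to
# '6' (Saturday); the query averages the temperature per location and weekday.
QUERY = """
    SELECT location,
           CAST(strftime('%w', day) AS INTEGER) AS weekday,
           AVG(temp_c) AS mean_temp_c
    FROM readings
    GROUP BY location, weekday
    ORDER BY location, weekday
"""
WEEKDAY_MEANS = conn.execute(QUERY).fetchall()
```

The result has at most seven rows per location, small enough to download to a PC and chart in a spreadsheet, as the workflow on the slide describes.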

Page 27: Big Data for Business - Working with the elephant made easy

Iterative Analytics: Big Data Spreadsheet

Method: Spreadsheet for Big Data

Characteristic
• Structured / unstructured data
• Complex and unknown dependencies

Performance
• Highest volumes for batch-like execution
• In-memory execution for smaller problems

Skills
• Select functions and compose formulas
• Business knowledge, mathematics, statistics

Use case examples
• Find hidden dependency patterns (e.g. credit fraud behavior)
• Learn multivariate dependencies (e.g. decision trees)
• Compute statistics and derived values (e.g. averages, median, variance)
• Join data from multiple sources (e.g. weather data, traffic, sentiment)

Page 28: Big Data for Business - Working with the elephant made easy

Example: Temperature Weekday Dependency II

Problem:
• Calculate local average temperature on weekdays
• Visualize locations with strong variance (suspected local warming)

Solution:
• Use Datameer calculation of averages per weekday
• Visualize results using integrated Infographics
• Visualize hot spots via a web interface based on the d3 graphics package

Realization:
• Development of Workbook: 2 h
• Visualization via Infographics: 2 h
• Development web GUI: 5 days
• Execution: 8 node cluster, 3 h
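The calculation behind this example can be sketched in plain Python as follows. This is not a Datameer workbook; it only illustrates one plausible reading of the analysis, in which a location is flagged when the span between its warmest and coldest weekday mean exceeds a threshold. All names, the threshold and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import date

def weekday_span(readings):
    """Return {location: span}, where span is the difference between the
    warmest and the coldest weekday mean temperature at that location."""
    sums = defaultdict(lambda: [0.0, 0])  # (location, weekday) -> [sum, count]
    for location, day, temp_c in readings:
        cell = sums[(location, day.weekday())]
        cell[0] += temp_c
        cell[1] += 1
    means = defaultdict(list)  # location -> list of weekday means
    for (location, _weekday), (total, count) in sums.items():
        means[location].append(total / count)
    return {loc: max(vals) - min(vals) for loc, vals in means.items()}

def hot_spots(readings, threshold):
    """Locations whose weekday span exceeds the threshold, largest first."""
    spans = weekday_span(readings)
    return sorted((loc for loc, span in spans.items() if span > threshold),
                  key=lambda loc: -spans[loc])

# Hypothetical sample readings: (location, date, temperature in °C).
READINGS = [
    ("plant", date(2014, 11, 2), 4.0),   # Sunday
    ("plant", date(2014, 11, 9), 6.0),   # Sunday
    ("plant", date(2014, 11, 5), 9.0),   # Wednesday
    ("forest", date(2014, 11, 2), 5.0),  # Sunday
    ("forest", date(2014, 11, 5), 5.5),  # Wednesday
]
SPANS = weekday_span(READINGS)
```

Calling `hot_spots(READINGS, 1.0)` on the sample data keeps only the location whose weekday means differ by more than 1 °C, which corresponds to the threshold idea used in the map visualization.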

[Diagram: Datameer on Hadoop]
Import → Write & run Workbook → Infographic

Page 29: Big Data for Business - Working with the elephant made easy

Fujitsu’s PRIMEFLEX for Hadoop at a Glance

Complexity made easy: Get in touch with Big Data, see what is possible.

Consult & implement: Consulting and service program from strategy to implementation

Collect, Understand, Visualize: Choice of analytics for highest control or highest comfort

Store & Compute: Integrated and optimally sized on-premise or off-premise infrastructure

Page 31: Big Data for Business - Working with the elephant made easy

Showcase

Page 32: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (1)

Select connection and file type for import

Page 33: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (2)

Configure import

Page 34: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (3)

Select and modify imported fields

Page 35: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (4)

Define execution plan … save and start

Page 36: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (5)

Import is executed on the complete cluster asynchronously as planned

Page 37: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (6)

Create new workbook and add imported data

Page 38: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (7)

Create new tab and start analytics

Page 39: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (8)

Specify formulas and see results (on representative sample data) immediately

When all is complete, save workbook and press “run”

Page 40: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (9)

Create new infographic … drag new widgets into your graphic … and bind them to data …

Page 41: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (10)

Configure your widgets step by step

Page 42: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (11)

Get complete page automatically published

Locations with most significant span between warmest and coldest weekday average, as map and as list

Number of grid points with maximum / minimum temperature on a certain weekday

Locations with most significant span between warmest and coldest weekday average and warmest day on a certain weekday

Page 43: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (12)

Visualization GUI to study the span of weekday mean temperature in certain places and to look for possible reasons

Map colored for high span of weekday mean temperature

Page 44: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (13)

Sliders for span threshold, contrast and opacity of coloring.

Page 45: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (14)

And an adjustment for grid points with low temperature span over the complete observation time.

Page 46: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (15)

Using the color settings and zooming into the map, we can find areas with significant differences of weekday mean values in the observed timeframe

Page 47: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (16)

Clicking a position shows the curve of average temperature for the weekdays, the coordinates, and the total min/max temperature of the point

Page 48: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (17)

Map and satellite views can be used to find possible reasons for mean temperature related to weekdays.

Page 49: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (18)

Zoom into the source of the color cloud.

Industrial complex is shut down on Sunday?

Page 50: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (19)

US East Coast is cooler on Sunday / Monday.

Is the traffic system heating the atmosphere over the week?

Page 51: Big Data for Business - Working with the elephant made easy

Fujitsu Showcase: Weekdays and Weather (20)

South of Hudson Bay is an area with Wednesday mean temperature approx. 1 °C higher than on Saturday

Does the wood industry influence the temperature in the rhythm of the week?
