From Big to Smart Data - Smart Data Innovation Lab Overview

© 2015 IBM Corporation Smart Data Innovation Lab: From Big Data to Smart Data Session: DPA-2135 Jan Erik Sundermann Karlsruhe Institute of Technology Plamen Kiradjiev IBM Germany October 28, 2015

Upload: plamen-kiradjiev

Post on 16-Apr-2017


TRANSCRIPT

Page 1: From Big to Smart Data - Smart Data Innovation Lab Overview

© 2015 IBM Corporation

Smart Data Innovation Lab: From Big Data to Smart Data

Session: DPA-2135

Jan Erik Sundermann, Karlsruhe Institute of Technology

Plamen Kiradjiev, IBM Germany

October 28, 2015

Page 2: From Big to Smart Data - Smart Data Innovation Lab Overview

Please Note:

• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Page 3: From Big to Smart Data - Smart Data Innovation Lab Overview

Biographies

Jan Erik Sundermann
Research Associate, Karlsruhe Institute of Technology, Steinbuch Centre for Computing

Jan Erik Sundermann is a research associate at Karlsruhe Institute of Technology’s Steinbuch Centre for Computing. He is part of the team responsible for planning, deploying and operating the SDIL computing platform. Jan Erik has expertise in scientific computing, distributed computing and data analysis, which he gained during his PhD studies and as a postdoctoral researcher in experimental particle physics, participating in experiments at SLAC and CERN.

Plamen Kiradjiev
Executive Architect, Industrie 4.0 Core Tech Team Lead, IBM Ambassador @SDIL

Plamen Kiradjiev is an Executive Architect at IBM, leading a TechSales team focused on Industrie 4.0 that delivers IT solutions for machine constructors and OEMs and partners with automation providers and integrators. He has 20 years of experience in the IT business: software architectures, business development and pilot implementations. As the IBM Ambassador for SDIL, Plamen represents IBM as one of the Core Partners in the SDIL initiative.

Page 4: From Big to Smart Data - Smart Data Innovation Lab Overview

Agenda

1 Smart Data Innovation Lab – Why & What

2 KIT SCC and its role in SDIL

3 IBM’s contribution: Watson Foundation on POWER

4 First projects and experiences

Page 5: From Big to Smart Data - Smart Data Innovation Lab Overview

Steinbuch Centre for Computing

Smart Data Innovation Lab (SDIL): A joint research platform for Big Data

• Smart Data Innovation = generate knowledge from data
• SDIL: research platform from science and industry
• Aim: joint generation of added value in innovative application fields
  - With new algorithms and methods
  - On the basis of securely handled data
  - In the framework of well-defined projects

Supported by

Page 6: From Big to Smart Data - Smart Data Innovation Lab Overview

SDIL: WIN-WIN-WIN between

1. Industry:
  - Lower threshold to experiment with Big Data analytics
  - Access to cutting-edge research and technology
  - Leverage Smart Data for tangible business advantage

2. Research:
  - Prove concepts against real use cases and data
  - Use powerful cutting-edge technology

3. IT providers:
  - Showcase latest technology
  - Test and improve products for real use cases and workloads

http://www.sdil.de

German government initiative for boosting Big Data use in top-level research for four business areas

Page 7: From Big to Smart Data - Smart Data Innovation Lab Overview

SDIL Partners

Core Partners

Research Partners

Project Partners

Associations

Page 8: From Big to Smart Data - Smart Data Innovation Lab Overview

SDIL’s Basic Principle


Page 9: From Big to Smart Data - Smart Data Innovation Lab Overview

Data Protection and Privacy – SDIL’s Top Priority

• Any data processing takes place in compliance with German data protection rules and regulations.

• All data held at KIT can be stored in a highly secure format and cannot be accessed by third parties without access control. State-of-the-art security technology is used for this.

• Industry data sources are only accessible if such access was expressly granted by the data provider in advance.

• Results that are derived from the data of different data providers and whose authorship cannot be clearly established are, as a matter of principle, not saved within the platform.

Page 10: From Big to Smart Data - Smart Data Innovation Lab Overview

SDIL Platform at SCC


Page 11: From Big to Smart Data - Smart Data Innovation Lab Overview

Agenda

1 Smart Data Innovation Lab – Why & What

2 KIT SCC and its role in SDIL

3 IBM’s contribution: Watson Foundation on POWER

4 First projects and experiences

Page 12: From Big to Smart Data - Smart Data Innovation Lab Overview

Karlsruhe Institute of Technology (KIT)

One of the largest and most prestigious research and education institutions in Germany

Page 13: From Big to Smart Data - Smart Data Innovation Lab Overview


KIT – Facts and Figures

• 24 778 students
• 9 491 employees
• 355 professors
• 6 035 scientists
• ~3 200 PhD students
• 844M € budget*: 270M € federal funds, 216M € state funds, 358M € 3rd party funds
• 129 invention disclosures
• 52 patent applications
• 25 spin-offs
• 2.2M € income from KIT licenses

* Budget 2013

Page 14: From Big to Smart Data - Smart Data Innovation Lab Overview


Steinbuch Centre for Computing (SCC)

• Founded on January 1st, 2008
  - Merger of the computing centers of the former Karlsruhe University (URZ) and Research Center Karlsruhe (IWR)
• Karl Steinbuch
  - Professor at Karlsruhe University, creator of the term “Informatik”, co-founder of the first German faculty of informatics
• Two locations at KIT Campus South and North
• 189 people in total (as of 1.9.2015)
  - 60% scientists, 40% technicians, administrative personnel, trainees
  - 7 departments and 4 research groups
• Board of directors
  - Prof. Dr. Hannes Hartenstein
  - Prof. Dr. Bernhard Neumair
  - Prof. Dr. Achim Streit

Page 15: From Big to Smart Data - Smart Data Innovation Lab Overview


Who are we?

What do we do?

Which demands do we satisfy?

“Services for Science – Science for Services”

Institute in KIT with service tasks

Computational Science & Engineering (CSE)

Data-Intensive Science (DIS)

For users in KIT, BaWü, Germany and international

Research, education and innovation in Supercomputing, Big Data and secure IT-federations

Operation of large scale research facilities

Operation of basic IT services

Page 16: From Big to Smart Data - Smart Data Innovation Lab Overview

Enabling Data-Intensive Science (DIS)

• Operation of GridKa
  - German Tier-1 in WLCG for an international community
• Operation of the Large-Scale Data Facility
  - Multi-disciplinary data centre for climate research, systems biology, energy research, etc. in BaWü
• Joint R&D&I with scientific communities
  - Generic data management research
  - Data Life Cycle Labs in Helmholtz Programm SBD
• Innovation driver for SMEs, big industry and start-ups
• Active role in national and international projects & initiatives

Page 17: From Big to Smart Data - Smart Data Innovation Lab Overview

Agenda

1 Smart Data Innovation Lab – Why & What

2 KIT SCC and its role in SDIL

3 IBM’s contribution: Watson Foundation on POWER

4 First projects and experiences

Page 18: From Big to Smart Data - Smart Data Innovation Lab Overview

IBM Watson Foundations

Software
• Enterprise-grade Big Data
• Model-based Predictive Analytics
• Semantic Text Analysis
• Cognitive Computing

IBM’s Watson Foundation POWER cluster
• 260 disks with >300 TB space
• 7 nodes
• 140 cores
• 2 800 virtual systems
• 40 GB/s network switch
• 4 TB RAM

Page 19: From Big to Smart Data - Smart Data Innovation Lab Overview

Core Watson Foundation Technology for SDIL

Figure: Watson Foundations stack (diagram labels)
• Solutions: IBM Watson™ and Industry Solutions (Sales, Marketing, Finance, Operations, HR, Risk, IT, Fraud)
• Consulting and Implementation Services
• Watson Foundations: Decision Management; Planning & Forecasting; Discovery & Exploration; Business Intelligence & Predictive Analytics; Content Analytics; Information Integration & Governance; Data Mgmt & Warehouse; Hadoop System; Stream Computing; Content Management
• Big Data & Analytics Infrastructure

Page 20: From Big to Smart Data - Smart Data Innovation Lab Overview

Watson Cluster Architecture Overview


Page 21: From Big to Smart Data - Smart Data Innovation Lab Overview

Watson Foundation Bootcamp in January 2015: 84 participants trained in SPSS and BigInsights in 2 days


Page 22: From Big to Smart Data - Smart Data Innovation Lab Overview

Agenda

1 Smart Data Innovation Lab – Why & What

2 KIT SCC and its role in SDIL

3 IBM’s contribution: Watson Foundation on POWER

4 First projects and experiences

Page 23: From Big to Smart Data - Smart Data Innovation Lab Overview

SDIL Projects Overview


Industrie 4.0

Energy

Medicine

Smart Cities

• Condition-Based Maintenance (done)
• Industrial Log Analysis (running)
• SmartFactoryKL Predictive Maintenance (running)
• User Sensitive HMI (planned)
• Optimal Nesting (planned)
• Data Mining for Welding (planned)
• Paintshop Machine Learning (planned)

• Smart Brain Data Analytics (running)
• Machine learning for age-related macular degeneration treatment (running)
• Spinal cord injury analytics (approved)

• Decentralized Energy Markets Demonstrator (done)
• Predictive Analytics for Energy Management (planned)

• Disaster Management Demonstrator (done)
• Smart Grid (planned)

Page 24: From Big to Smart Data - Smart Data Innovation Lab Overview

Smart Brain Analytics: Use Case

1. Human Brain Project (HBP)
  - A human brain frozen at -80 °C
  - Cut into 70 μm thin slices
  - Take an image of the brain after each extracted slice
  - Segment the sectional planes to build a 3D model of the brain
  - Use data analysis to replace manual segmentation
  - 843 brain slices
  - 1350×1950 pixels per image
  - 6.6 GByte RGB images
  - 42 MByte mask images
  - Up to 2 PB with extremely high resolution image scanners
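The quoted data volume can be cross-checked with a short calculation. The sketch below assumes the RGB images are stored uncompressed at 8 bits per channel (3 bytes per pixel); the slide does not state the encoding, so that is an assumption, and the names used are illustrative only.

```python
# Raw size of the image stack described on the slide.
# Assumption (not stated in the deck): uncompressed 8-bit RGB, 3 bytes per pixel.
SLICES = 843
WIDTH, HEIGHT = 1350, 1950
BYTES_PER_PIXEL = 3


def raw_stack_bytes(slices, width, height, bytes_per_pixel):
    """Return the uncompressed size of an image stack in bytes."""
    return slices * width * height * bytes_per_pixel


total = raw_stack_bytes(SLICES, WIDTH, HEIGHT, BYTES_PER_PIXEL)
print(f"{total:,} bytes = {total / 1e9:.2f} GB")
# prints: 6,657,592,500 bytes = 6.66 GB
```

Under that assumption the result is in line with the roughly 6.6 GByte given for the RGB images.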

Page 25: From Big to Smart Data - Smart Data Innovation Lab Overview

Project Brain: Methodology & Realization


Page 26: From Big to Smart Data - Smart Data Innovation Lab Overview

Project Brain: Training & Testing Strategy


Page 27: From Big to Smart Data - Smart Data Innovation Lab Overview

Project Brain: Preliminary Results

Figure panels:
• Source
• Manually marked brain slice
• SPSS-determined brain slice (98.8% accuracy)
• SPSS model with feature extraction (99.53% accuracy)
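The deck does not say how the accuracy figures were computed. One common reading is pixel-wise agreement between the manually marked mask and the model's mask, and the sketch below illustrates that reading only. The function name and the toy masks are hypothetical and are not taken from the project.

```python
def pixel_accuracy(manual_mask, predicted_mask):
    """Fraction of pixels where the predicted label equals the manual label.

    Both masks are lists of rows, each row a list of integer class labels.
    """
    if len(manual_mask) != len(predicted_mask):
        raise ValueError("masks must have the same number of rows")
    total = 0
    matches = 0
    for manual_row, predicted_row in zip(manual_mask, predicted_mask):
        if len(manual_row) != len(predicted_row):
            raise ValueError("masks must have the same row length")
        for manual_label, predicted_label in zip(manual_row, predicted_row):
            total += 1
            if manual_label == predicted_label:
                matches += 1
    if total == 0:
        raise ValueError("masks must not be empty")
    return matches / total


# Toy 2x4 masks (hypothetical data): one of eight pixels disagrees.
manual = [[0, 1, 1, 0], [0, 1, 1, 0]]
predicted = [[0, 1, 1, 0], [0, 1, 0, 0]]
print(pixel_accuracy(manual, predicted))  # prints 0.875
```

A single number like this hides where the errors fall, which is one reason a high percentage can still be insufficient for a segmentation task.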

Page 28: From Big to Smart Data - Smart Data Innovation Lab Overview

Industrial Log File Analysis

Association Analysis for Data-Driven Services Based on Industrial Logs

• Challenge: existing solutions for analyzing industrial log file recordings (e.g. alarms, machine logs, error messages, user interactions) are restricted:
  - They focus on isolated problem analysis and optimization
  - They cannot cover complex functions such as revealing hidden correlations or predicting events
  - They work on relatively small data sets without parallelization and scalability

• Vision: use the potential of a holistic analysis of industrial log files, with the following goals:
  - Derive and evaluate appropriate analytical methods
  - Choose parallelization and scalability strategies for data pruning and feature extraction
  - Explore real-time and deployment options
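To make "association analysis" on log data concrete, here is a minimal sketch of support and confidence for pairs of event types that occur in the same time window. It is a generic textbook formulation, not the method used in the SDIL project, which the deck does not describe. The event names, the windowing and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(windows, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for event pairs.

    `windows` is a list of event-type lists, one list per time window.
    """
    n = len(windows)
    item_counts = Counter()
    pair_counts = Counter()
    for window in windows:
        events = sorted(set(window))  # count each event type once per window
        item_counts.update(events)
        pair_counts.update(combinations(events, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of windows containing both events
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # share of windows with the antecedent that also contain the consequent
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical log windows, for illustration only.
windows = [
    ["ALARM_TEMP", "FAN_FAIL", "OPERATOR_ACK"],
    ["ALARM_TEMP", "FAN_FAIL"],
    ["OPERATOR_ACK"],
    ["ALARM_TEMP", "FAN_FAIL", "RESTART"],
]
for rule in pair_rules(windows):
    print(rule)
```

Counting all pairs in memory like this does not scale to large logs, which is where the parallelization and pruning strategies named in the goals would come in.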

Page 29: From Big to Smart Data - Smart Data Innovation Lab Overview

Roles Profile Sensitive HMI

• Analyze user-machine interaction to predict and provide optimized HMI assistance

Challenge:
  - Anonymous and unknown users
  - Billions of interaction options, depending on production orders
  - Production orders normally never repeat

Vision:
  - (Self-)optimized user-machine interface for every machine operator
  - Increase productivity: avoid problems caused by operating issues

Page 30: From Big to Smart Data - Smart Data Innovation Lab Overview

Top 10 Best Practices & Lessons Learned

10. One common demand: faster route from research to field

9. Consider the pipeline from internal data sources to SDIL, e.g. data cleansing and pseudonymization

8. Sensitive person-related data is not the only reason for restrictive access rules

7. Data privacy & confidentiality – not a technical, but a bureaucratic challenge

6. Opportunity to rehearse processes for external data use in the cloud

5. An objective of “Yes, but we have to do something…” is not appropriate

4. Accuracy is relative: sometimes 60% is great, but 99.2% is not enough

3. Algorithms do not perform the same on real data as on samples

2. Fruitful cooperation between business, IT and research experts

1. Information, not data, is the gold of the 21st century, but… all that glitters is not gold
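Lesson 9 mentions pseudonymization as a step in the pipeline from internal data sources to SDIL. As one possible illustration, the sketch below replaces identifier fields with keyed hashes (HMAC-SHA256 from the Python standard library) before records leave the data provider. The deck does not specify any technique, so this approach, the field names and the key handling are all assumptions.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


def pseudonymize_records(records, fields, secret_key):
    """Return copies of the records with the named fields pseudonymized."""
    cleaned = []
    for record in records:
        copy = dict(record)  # leave the caller's data untouched
        for field in fields:
            if field in copy:
                copy[field] = pseudonymize(str(copy[field]), secret_key)
        cleaned.append(copy)
    return cleaned


# Hypothetical records and key, for illustration only.
key = b"example-key-kept-by-the-data-provider"
records = [
    {"operator_id": "op-17", "machine": "M3", "alarm": "E42"},
    {"operator_id": "op-17", "machine": "M5", "alarm": "E07"},
    {"operator_id": "op-23", "machine": "M3", "alarm": "E42"},
]
for row in pseudonymize_records(records, ["operator_id"], key):
    print(row)
```

Because the same key always yields the same pseudonym, records from one operator stay linkable for analysis, while reversing the mapping requires the key, which stays with the data provider.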


Page 31: From Big to Smart Data - Smart Data Innovation Lab Overview

We Value Your Feedback!

Don’t forget to submit your Insight session and speaker feedback! Your feedback is very important to us – we use it to continually improve the conference.

Access your surveys at insight2015survey.com to quickly submit your surveys from your smartphone, laptop or conference kiosk.

Page 32: From Big to Smart Data - Smart Data Innovation Lab Overview

Notices and Disclaimers

Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

Page 33: From Big to Smart Data - Smart Data Innovation Lab Overview


Notices and Disclaimers (cont’d)

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

Page 34: From Big to Smart Data - Smart Data Innovation Lab Overview


Thank You