from big to smart data - smart data innovation lab overview
© 2015 IBM Corporation
Smart Data Innovation Lab: From Big Data to Smart Data
Session: DPA-2135
Jan Erik Sundermann, Karlsruhe Institute of Technology
Plamen Kiradjiev, IBM Germany
October 28, 2015
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Biographies
Jan Erik Sundermann
Research Associate
Karlsruhe Institute of Technology
Steinbuch Centre for Computing
Jan Erik Sundermann is a research associate at Karlsruhe Institute of Technology’s Steinbuch Centre for Computing. He is part of the team responsible for planning, deployment and operation of the SDIL computing platform. Jan Erik has expertise in scientific computing, distributed computing and data analysis, which he gained during his PhD studies and as a postdoctoral researcher in experimental particle physics, participating in experiments at SLAC and CERN.
Plamen Kiradjiev
Executive Architect
Industrie 4.0 Core Tech Team Lead
IBM Ambassador @SDIL
Plamen Kiradjiev is Executive Architect at IBM, leading a TechSales team focused on Industrie 4.0, delivering IT solutions for machine constructors and OEMs, as well as partnering with automation providers and integrators. He has 20 years of experience in the IT business: software architectures, business development, pilot implementations. As the IBM Ambassador for SDIL, Plamen represents IBM as one of the Core Partners in the SDIL initiative.
Agenda
1 Smart Data Innovation Lab – Why & What
2 KIT SCC and its role in SDIL
3 IBM’s contribution: Watson Foundation on POWER
4 First projects and experiences
Steinbuch Centre for Computing
Smart Data Innovation Lab (SDIL): A joint research platform for Big Data
• Smart Data Innovation = generate knowledge from data
• SDIL: research platform from science and industry
• Aim: joint generation of added value in innovative application fields
  - With new algorithms and methods
  - On the basis of securely handled data
  - In the framework of well-defined projects
Supported by
SDIL: WIN-WIN-WIN between
1. Industry:
  - Lower threshold to experiment with Big Data analytics
  - Access to cutting-edge research and technology
  - Leverage Smart Data for tangible business advantage
2. Research:
  - Prove concepts against real use cases and data
  - Use powerful cutting-edge technology
3. IT providers:
  - Showcase latest technology
  - Test and improve products for real use cases and workload
http://www.sdil.de
German government initiative for boosting Big Data use in top-level research for four business areas
SDIL Partners
Core Partners
Research Partners
Project Partners
Associations
SDIL’s Basic Principle
Data Protection and Privacy – SDIL’s Top Priority
• Any data processing takes place in compliance with German data protection rules and regulations.
• All data held at KIT can be stored in a highly secure format and cannot be accessed by third parties without access control. State-of-the-art security technology is used for this.
• Industry data sources are only accessible if such access was expressly granted by the data provider in advance.
• Results that derive from processing data of different data providers, and whose authorship cannot be clearly established, are not saved within the platform as a matter of principle.
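The rule that industry data is only accessible after an express, prior grant by the data provider amounts to a deny-by-default check. The sketch below only illustrates that idea; `GrantRegistry`, its methods and the source and project names are hypothetical and do not describe the actual SDIL platform.

```python
from dataclasses import dataclass, field


@dataclass
class GrantRegistry:
    """Hypothetical registry of explicit access grants (not an SDIL API)."""

    # Maps a data source name to the set of projects its provider approved.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, source: str, project: str) -> None:
        """Record that the provider of `source` approved `project` in advance."""
        self.grants.setdefault(source, set()).add(project)

    def may_access(self, source: str, project: str) -> bool:
        """Deny by default: access requires a previously recorded grant."""
        return project in self.grants.get(source, set())


# Made-up names, for illustration only.
registry = GrantRegistry()
registry.grant("machine-logs", "log-analysis")
print(registry.may_access("machine-logs", "log-analysis"))  # True
print(registry.may_access("machine-logs", "smart-grid"))    # False
```

An unknown data source and a known source without a matching grant are treated the same way: both are refused.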
SDIL Platform at SCC
Karlsruhe Institute of Technology (KIT)
One of the largest and most prestigious research and education institutions in Germany
KIT – Facts and Figures
24 778 Students
9 491 Employees
355 Professors
6 035 Scientists
~3 200 PhD students
844M € Budget*
270M € Federal funds
216M € State funds
358M € 3rd party funds
129 Invention disclosures
52 Patent applications
25 Spin-offs
2.2M € Income from KIT licenses
* Budget 2013
Steinbuch Centre for Computing (SCC)
• Founded on January 1st, 2008
  - Merger of the Computing Centers of former Karlsruhe University (URZ) and Research Center Karlsruhe (IWR)
• Karl Steinbuch
  - Professor at Karlsruhe University, creator of the term “Informatik”, co-founder of the first German faculty of informatics
• Two locations at KIT Campus South and North
• 189 people in total (as of 1.9.2015)
  - 60% scientists, 40% technicians, administrative personnel, trainees
  - 7 departments and 4 research groups
• Board of directors
  - Prof. Dr. Hannes Hartenstein
  - Prof. Dr. Bernhard Neumair
  - Prof. Dr. Achim Streit
Who are we?
What do we do?
Which demands do we satisfy?
“Services for Science – Science for Services”
Institute in KIT with service tasks
Computational Science & Engineering (CSE)
Data-Intensive Science (DIS)
For users in KIT, BaWü, Germany and international
Research, education and innovation in Supercomputing, Big Data and secure IT-federations
Operation of large scale research facilities
Operation of basic IT services
Enabling Data-Intensive Science (DIS)
• Operation of GridKa
  - German Tier-1 in WLCG for an international community
• Operation of the Large-Scale Data Facility
  - Multi-disciplinary data centre for climate research, systems biology, energy research, etc. in BaWü
• Joint R&D&I with scientific communities
  - Generic data management research
  - Data Life Cycle Labs in the Helmholtz programme SBD
• Innovation driver for SMEs, big industry and start-ups
• Active role in national and international projects & initiatives
IBM Watson Foundations
Software
Enterprise-grade Big Data
Model-based Predictive Analytics
Semantic Text Analysis
Cognitive Computing
IBM’s Watson Foundation POWER cluster
260 disks with >300 TB space
7 nodes
140 cores
2,800 virtual systems
40 GB/s network switch
4 TB RAM
Core Watson Foundation Technology for SDIL
WATSON FOUNDATIONS
Sales, Marketing, Finance, Operations, HR, Risk, IT, Fraud
IBM Watson™ and Industry Solutions
SOLUTIONS
CONSULTING AND IMPLEMENTATION SERVICES
BIG DATA & ANALYTICS INFRASTRUCTURE
Decision Management
Planning & Forecasting
Discovery & Exploration
Business Intelligence & Predictive Analytics
Content Analytics
Information Integration & Governance
Data Mgmt & Warehouse
Hadoop System
Stream Computing
Content Management
Watson Cluster Architecture Overview
Watson Foundation Bootcamp in January 2015: 84 participants trained in SPSS and BigInsights in 2 days
SDIL Projects Overview
Industrie 4.0
Energy
Medicine
Smart Cities
• Condition-Based Maintenance (done)
• Industrial Log Analysis (running)
• SmartFactoryKL Predictive Maintenance (running)
• User Sensitive HMI (planned)
• Optimal Nesting (planned)
• Data Mining for Welding (planned)
• Paintshop Machine Learning (planned)
• Smart Brain Data Analytics (running)
• Machine learning for age-related macular degeneration treatment (running)
• Spinal cord injury analytics (approved)
• Decentralized Energy Markets Demonstrator (done)
• Predictive Analytics for Energy Management (planned)
• Disaster Management Demonstrator (done)
• Smart Grid (planned)
Smart Brain Analytics: Use Case
1. Human Brain Project (HBP)
  - A human brain frozen at -80 °C
  - Cut into 70 μm thin slices
  - Take an image of the brain after each extracted slice
  - Segment the sectional planes to build a 3D model of the brain
  - Use data analysis to replace manual segmentation
  - 843 brain slices
  - 1350×1950 pixels per image
  - 6.6 GByte RGB images
  - 42 MByte mask images
  - Up to 2 PB with extremely high resolution image scanners
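As a plausibility check on the data volume, the RGB figure can be recomputed from the image count and resolution given above. The sketch assumes uncompressed images with three colour channels of one byte each; the slide does not state the storage format, so that part is an assumption.

```python
# Figures taken from the slide.
NUM_IMAGES = 843
WIDTH_PX, HEIGHT_PX = 1350, 1950

# Assumption (not stated in the source): 3 colour channels, 1 byte each, uncompressed.
BYTES_PER_PIXEL = 3

bytes_per_image = WIDTH_PX * HEIGHT_PX * BYTES_PER_PIXEL
total_bytes = bytes_per_image * NUM_IMAGES

print(f"per image:  {bytes_per_image / 1e6:.1f} MB")  # 7.9 MB
print(f"all images: {total_bytes / 1e9:.2f} GB")      # 6.66 GB
```

Under that assumption the total comes to about 6.66 GB, in line with the roughly 6.6 GByte quoted for the RGB images. The 42 MByte for the mask images is far below one byte per pixel, so the masks are presumably stored in a more compact form.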
Project Brain: Methodology & Realization
Project Brain: Training & Testing Strategy
Project Brain: Preliminary Results
Manually marked brain slice
SPSS-determined brain slice (98.8% accuracy)
Source
SPSS model with feature extraction (99.53% accuracy)
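The slide does not say how the accuracy figures were measured. One common choice for segmentation is per-pixel agreement between the manually marked mask and the model's mask, sketched here on a tiny made-up example. The function name and the data are illustrative assumptions, not the project's actual evaluation code.

```python
def pixel_accuracy(manual_mask, predicted_mask):
    """Fraction of pixels on which two equally sized label masks agree."""
    if len(manual_mask) != len(predicted_mask):
        raise ValueError("masks must have the same number of rows")
    agree = total = 0
    for manual_row, predicted_row in zip(manual_mask, predicted_mask):
        if len(manual_row) != len(predicted_row):
            raise ValueError("rows must have the same length")
        for manual_label, predicted_label in zip(manual_row, predicted_row):
            agree += manual_label == predicted_label
            total += 1
    if total == 0:
        raise ValueError("masks are empty")
    return agree / total


# Made-up 3x4 masks: 1 = tissue, 0 = background.
manual = [[0, 0, 1, 1],
          [0, 1, 1, 1],
          [0, 0, 1, 1]]
predicted = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 0, 1, 0]]

print(f"{pixel_accuracy(manual, predicted):.3f}")  # 0.833 (10 of 12 pixels agree)
```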
Industrial Log File Analysis
Association Analysis for Data-Driven Services Based on Industrial Logs
• Challenge: existing solutions for analyzing industrial log file recordings (e.g. alarms, machine logs, error messages, user interactions) are restricted:
  - They focus on isolated problem analysis and optimization
  - They cannot cover complex functions such as revealing hidden correlations or predicting events
  - They work on relatively small data sets without parallelization and scalability
• Vision: use the potential of a holistic analysis of industrial log files with the following goals:
  - Derive and evaluate appropriate analytical methods
  - Choose parallelization and scalable strategies for data pruning and feature extraction
  - Explore real-time and deployment options
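To make "association analysis" on log files concrete, the sketch below counts how often event types occur together in the same time window and derives simple pairwise rules with support and confidence. It is a single-machine illustration only; the event names, the windowing and the thresholds are made up, and the slides do not describe the project's actual method.

```python
from collections import Counter
from itertools import combinations


def association_rules(windows, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for event pairs.

    `windows` is a list of event lists, one per time window of the log.
    """
    n = len(windows)
    single = Counter()  # number of windows containing each event
    pair = Counter()    # number of windows containing both events of a pair
    for events in windows:
        unique = sorted(set(events))
        single.update(unique)
        pair.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Made-up log windows for illustration.
windows = [
    ["OVERHEAT", "FAN_FAIL", "STOP"],
    ["OVERHEAT", "STOP"],
    ["DOOR_OPEN"],
    ["OVERHEAT", "FAN_FAIL", "STOP"],
]
for lhs, rhs, support, confidence in association_rules(windows):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data OVERHEAT and STOP imply each other, while FAN_FAIL implies OVERHEAT but not the other way round. Counting pairs this way grows quadratically with the number of event types per window, which hints at why pruning and parallelization appear among the project goals.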
Roles Profile Sensitive HMI
• Analyze user-machine interaction to predict and provide optimized HMI assistance
Challenge:
  - Anonymous and unknown users
  - Billions of interaction options, depending on production orders
  - Production orders normally never repeat
Vision:
  - (Self-)optimized user-machine interface for every machine operator
  - Increase productivity: avoid problems caused by operating issues
Top 10 Best Practices & Lessons Learned
10. One common demand: faster route from research to field
9. Consider the pipeline from internal data sources to SDIL, e.g. data cleansing and pseudonymization
8. Sensitive person-related data is not the only reason for restrictive access rules
7. Data privacy & confidentiality – not a technical, but a bureaucratic challenge
6. Opportunity to rehearse processes for external data use in the cloud
5. Objective “Yes, but we have to do something…” is not appropriate
4. Accuracy is relative: sometimes 60% is great, but 99.2% is not enough
3. Algorithms do not perform the same on real data as on samples
2. Fruitful cooperation between business, IT and research experts
1. Information, not data, is the gold of the 21st century, but… all that glitters is not gold
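One way to read lesson 4 is through class imbalance: when the interesting event is rare, a model that never predicts it can still score a very high accuracy. The data below is made up to mirror the 99.2% on the slide. The slide itself does not say why 99.2% was not enough, so this is only one possible explanation.

```python
def accuracy(actual, predicted):
    """Share of all cases that were labelled correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, positive=1):
    """Share of the truly positive cases that were found."""
    hits = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    positives = sum(a == positive for a in actual)
    return hits / positives


# Made-up data: 1000 machine cycles, 8 of which are failures (label 1).
actual = [1] * 8 + [0] * 992
always_ok = [0] * 1000  # a "model" that never predicts a failure

print(accuracy(actual, always_ok))  # 0.992
print(recall(actual, always_ok))    # 0.0
```

The trivial model reaches 99.2% accuracy while detecting none of the failures, which is why the target metric has to be chosen per use case.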
We Value Your Feedback!
Don’t forget to submit your Insight session and speaker feedback! Your feedback is very important to us – we use it to continually improve the conference.
Access your surveys at insight2015survey.com to quickly submit your surveys from your smartphone, laptop or conference kiosk.
Notices and Disclaimers
Copyright © 2015 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
Notices and Disclaimers (cont’d)
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
•IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Thank You