© 2013 IBM Corporation Information Management Managing the Lifecycle in Big Data Environments Wolfgang Epting – Senior Technical Sales Professional IBM Optim Classic, - zOS, - SAP, IBM InfoSphere Discovery


Page 1: Managing the Lifecycle in Big Data Environments - … Optim Classic, - zOS, - SAP, IBM InfoSphere Discovery. ... InfoSphere Optim Compressed, secure, immutable, query-able & restorable

© 2013 IBM Corporation

Information Management

Managing the Lifecycle in Big Data Environments

Wolfgang Epting – Senior Technical Sales Professional

IBM Optim Classic, - zOS, - SAP, IBM InfoSphere Discovery

Page 2

Information: secure and reliable across the entire supply chain

Trusted, Relevant, Governed

Diagram labels:
• Sources and consumers: Transactional & Collaborative Applications, Business Analytics Applications, External Information Sources
• Manage: Data, Content, Master Data, Big Data, Streaming Information, Data Warehouses, Cubes, Streams
• Integrate
• Analyze: Content Analytics
• Govern: Quality, Security & Privacy, Lifecycle, Standards
• Information Governance

Page 3

The Information Governance Community develops definitions, approaches, and concrete work products

Timeline 2004 to 2012, milestones as listed on the slide:
• IBM founds the "Information Governance Council" together with about 40 other companies
• Information Governance Framework
• Information Governance Maturity Model
• Proposals for risk prevention against the backdrop of the financial crisis
• Continuous development of best practices and the maturity model
• Information governance in the age of "Big Data"

Maturity model levels: Initial, Repeatable, Defined, Controlled, Optimized

http://www.infogovcommunity.com

Information governance is the orchestration of people, processes, and technologies that enables an organization to use information as an economic asset.

Information Governance Framework: core disciplines, supporting disciplines, goals, enablers

Page 4

Information Governance

Information governance is the specification of decision rights and an accountability framework that encourage desirable behavior in the valuation, creation, storage, use, archiving, and deletion of information.

It comprises the processes, roles, standards, and metrics that ensure the effective and efficient use of information and enable an organization to achieve its goals.

Page 5

Many organizations have begun implementing information governance concepts in recent years

Trend 1: Appointment of information governance owners

Social networking tools such as LinkedIn or XING list thousands of people with "Information Governance" or "Data Governance" in their title. There is a continuing trend toward making these people responsible for information governance topics full time.

Trend 2: Business responsibility for information governance is growing

Information governance is increasingly perceived as the function that develops the rules around data. Risk management in banks, sales promotion in retail, marketing, and accounting are all involved in or affected by information governance.

Trend 3: Continuously improving measurability of information governance

Metrics help keep the focus on information governance topics: "You only do what you can measure"

Page 6

Big Data: opening up new worlds of data

• 4.6 billion mobile phones worldwide
• 1.3 billion RFID tags in 2005, 30 billion RFID tags today
• 2 billion Internet users by 2011
• Twitter processes 7 terabytes daily
• Facebook processes 10 terabytes daily
• World Data Centre for Climate: 220 terabytes of web data, 9 petabytes of additional data
• Growth of data volume in capital markets: 1,750%, 2003-06

Page 7

What is "Big Data", actually? An attempt at a classification

Web and Social Media
• Clickstream data
• Twitter feeds
• Facebook postings
• Web content
• …

Machine data
• Smart meter data
• RFID information
• GPS signals
• Sensor data (e.g. flow rates, pressure, temperatures)
• …

Biometric data
• Face recognition
• Genetic data
• …

"Big Transaction" Data
• Telecommunications call detail records
• Energy billing data
• …

Human-generated
• Recordings of call-center conversations
• E-mails
• Written documents (medical reports, complaint reports, etc.)
• …

Page 8

An example ...

Page 9

"Big Data": diverse, fast-growing, and uncontrolled data

• Volume (data quantity): terabytes to petabytes
• Velocity (speed): analysis of data streams for decisions in fractions of a second
• Variety: structured, unstructured, text, multimedia
• Veracity (data reliability): imprecise and unreliable data types whose creation is beyond control

The current challenge is to use "Big Data" meaningfully in business decision processes.

Page 10

10

Zettabytes of data in databases

"We have for the first time an economy based on a key resource [Information] that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is."

– John Naisbitt

Variety, Volume, Velocity

Page 11

Big Data does not reinvent everything; it complements established concepts

Traditional approach: structured, analytical, logical
• Characteristics: structured, repeatable, linear
• Traditional sources: transaction data, ERP data, mainframe data, OLTP system data, internal app data
• Data warehouse
• Examples: monthly sales reports, profitability analysis, customer surveys

New approach: creative, holistic, intuitive
• Characteristics: unstructured, exploratory, iterative
• New sources: web logs, social data, text data (emails), sensor data (images), RFID
• Hadoop, Streams
• Examples: brand sentiment, product strategy, maximum asset utilization

Enterprise integration connects the two approaches.

Page 12

InfoSphere Delivers Critical Confidence for Big Data Use Cases

Use cases: Big Data Exploration, Enhanced 360° View of the Customer, Operations Analysis, Data Warehouse Augmentation, Security/Intelligence Extension

Capabilities, grouped as listed on the slide:
• Understand confidence; determine risk
• Establish master record; extend to all sources
• Automatic data protection; mask sensitive information
• High volume data integration; automatic data protection
• High volume data integration; agile big data archiving and retrieval

Page 13

IIG Is Essential: Ingest, Understand and Govern

Big Data Platform Capabilities
• Information Ingestion
• Real-time Analytics
• Warehouse & Data Marts
• Analytic Appliances

Diagram labels:
• All Data Sources: Streaming Data, Text Data, Applications Data, Time Series, Geo Spatial, Relational, Social Network, Video & Image
• Platform: Information Ingestion and Integration, Data Exploration, Archive, Real-time Analytics, Enterprise Warehouse, Data Marts, all under Information Governance, Security and Business Continuity
• Advanced Analytics Applications: Cognitive (Learn Dynamically?), Prescriptive (Best Outcomes?), Predictive (What Could Happen?), Descriptive (What Has Happened?), Exploration and Discovery (What Do You Have?)
• Further labels: Automated Process, Case Management, Analytic Applications, Watson, Cloud Services, ISV Solutions, Alerts
• Open Architecture/Multiple Product Entry Points

Page 14

Extreme data growth poses an increasing challenge for companies

• Rising costs: adding more and more storage can become expensive once operating costs are considered alongside the investment.
• Rising response times: end users and customers wait for information, and DBAs need a lot of time to solve performance problems.
• Risk and compliance: the "keep everything" strategy has a negative impact on disaster recovery as well as on retention and disclosure obligations.

Information Governance Core Disciplines, Lifecycle Management: Discover & Define, Develop & Test, Optimize & Archive, Consolidate & Retire

Page 15

… the simplest answer

Chart labels: hardware capacity, runtime behavior, database size

• Partitioning of databases
• Compression
• More storage and CPU infrastructure

Information Governance Core Disciplines, Lifecycle Management: Discover & Define, Develop & Test, Optimize, Archive & Access, Consolidate & Retire

Page 16

… the Optim idea: proactive control of data growth

An intelligent process for archiving inactive or rarely needed data that still has business relevance, while providing universal access to it.

Diagram labels:
• Production: current data
• Archive: historical data and reference data are moved into data archives
• Retrieve: selective restore of archived data (restored data)
• Universal Access to Application Data: Application, ODBC / JDBC, XML, Report Writer

Information Governance Core Disciplines, Lifecycle Management: Discover & Define, Develop & Test, Optimize & Archive, Consolidate & Retire
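The archive, retrieve, and selective-restore idea on this slide can be illustrated with a small sketch. This is not how InfoSphere Optim is implemented; the `Order` record, the `last_accessed` field, and the 365-day threshold are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Order:
    # Hypothetical record type; a real archive would capture a complete business object.
    order_id: int
    last_accessed: date


def split_by_age(rows, today, max_age_days):
    """Return (current, historical): rows accessed within max_age_days stay current."""
    cutoff = today - timedelta(days=max_age_days)
    current = [r for r in rows if r.last_accessed >= cutoff]
    historical = [r for r in rows if r.last_accessed < cutoff]
    return current, historical


def restore(archive, order_ids):
    """Selectively bring back archived rows by key."""
    wanted = set(order_ids)
    return [r for r in archive if r.order_id in wanted]


rows = [
    Order(1, date(2012, 11, 1)),
    Order(2, date(2009, 3, 15)),
    Order(3, date(2005, 7, 4)),
]
production, archive = split_by_age(rows, today=date(2012, 11, 14), max_age_days=365)
restored = restore(archive, [3])
```

Keeping the archive as a separate, queryable collection is what allows production to shrink while historical rows remain reachable on request.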

Page 17

11/14/12

Cost effective computing

Diagram labels: update access, reporting access, ad-hoc access

Page 18

Archiving data in Hadoop

• Apply Retention / Hold Policies
• Capture complete business object
• Preserve Data Integrity
• Preserve Schema Metadata
• Load data into Hadoop as needed

Diagram labels:
• Archive cold data: archive & purge data from heterogeneous DBMS (Database, IMS, VSAM, more…)
• InfoSphere Optim: compressed, immutable, auditable & restorable archives
• Archive files, Optim Hadoop Loader, Hadoop
• Query-able & analytical data store, using Hadoop

Page 19

Data Warehouse Augmentation: Queryable Archive

Use Cases
• Immediate storage alternative for cold data
• Cost savings for cold data
• Compliance requirements
• Simple analytics / exploration
• When you find new correlations, go back and re-mine the archive data to gain additional insight

Enables an immediate storage alternative. A queryable archive often serves as an initial step toward more advanced integration with the EDW and advanced Hadoop analytics.

PureData System for Analytics

PureData System for Hadoop

Page 20

Optim EasyArchive for PureData System for Hadoop

For Easy Data Provisioning from PureData System for Analytics

• Included application allows migration of data from PureData System for Analytics to PureData System for Hadoop at over 2 TB/hr, out of the box
• Provides a simple, built-in user interface that lets users migrate data between systems easily
• Enables quick configuration and scheduling of data migration
• Employs parallel processing between BigInsights and PDA/Netezza
• Leverages IBM-developed MapReduce programming for parallel processing
• Utilizes Hive to allow immediate access to migrated data

Page 21

Archiving for PureData System for Analytics (Netezza)

Manage data growth, lower TCO & meet data retention compliance

• Apply Data Retention / Hold Policies
• Capture complete business object
• Preserve Data Integrity
• Preserve Schema Metadata
• SQL Compliant
• Restore data when/where needed

Diagram labels:
• PureData System for Analytics: archive cold data (archive & purge data)
• InfoSphere Optim: compressed, secure, immutable, query-able & restorable archives
• Archive files, Optim Hadoop Loader, InfoSphere BigInsights
• PureData System for Hadoop: query-able data store of choice for analytics on cold data

Page 22

When to shut down? When to consolidate?

• Redundant applications resulting from company acquisitions and mergers
• Outdated technology is no longer compatible with the strategic IT direction
  - Databases, operating systems, and hardware are no longer supported
• Application know-how and technical skills are no longer available
• "Do more with less": cost pressure
• Business units are relocated, and applications become obsolete

In almost ALL cases, access to the data must be guaranteed for the legally mandated retention period.

Information Governance Core Disciplines, Lifecycle Management: Discover & Define, Develop & Test, Optimize & Archive, Consolidate & Retire

Page 23

The application retirement process

Diagram labels:
• Source databases: Oracle, DB2, Adabas, IMS
• Archiving of the complete business object, by year (2006, 2005, 2004, ….)
• Storage: retention, then expiry
• Catalog, profile, policy, access
• Restore on demand
• Decommissioning

Sample XML shown on the slide:

<XML>
<Name>John</Name>
<Zip>08540</Zip>
</XML>

Page 24

Gartner key findings

"We believe the market for database archiving and application retirement is vibrant and dynamic, and will see continued solid growth over the next five years."

"Organizations are looking to database archiving vendors that offer packaged and custom application support in order to control storage growth, improve application performance, and support compliance, audit and e-discovery activities."

Charts: Vendor Market Share by 2010 Revenue; Vendor Market Share by Total Number of Customers

Source: Gartner, Inc., "Market Trends: World, Database Archiving Market Continues Rapid Growth, 2011", S. Childs & A Dayley, September 2011

Page 25

http://ibmexperts.computerwoche.de/analytics-big-data/artikel/management-reagiert-zu-langsam-auf-datenmissbrauch?r=4626308161045983&lid=208618

Page 26

IBM InfoSphere Optim Data Masking Solution

Masks sensitive information with realistic but fictional data for test and development purposes (data privacy).

Requirements
• Protect confidential data in test, development, and training systems
• Mask and consolidate data consistently across a wide range of interrelated applications to ensure production-like tests
• Apply predefined and custom anonymization algorithms

Benefits
• Prevent misuse and violations
• Faster time to market through accelerated testing
• Reduce risk and manual effort

Optim Data Masking supports distributed platforms (LUW) and z/OS.

Support for the most important ERP/CRM applications

Page 27

Data anonymization in non-production environments (development, test, training)

• Mask or anonymize sensitive data that could identify an individual
• Ensure that masked data fits the context of the data it replaces, so that test quality is not affected
  • Realistic yet fictitious data
  • Masked data within the permitted limits
• Support referential integrity of the masked data to avoid errors during testing

Information that allows conclusions about individuals is replaced with realistic but fictitious data for test and development. Example: JASON MICHAELS becomes ROBERT SMITH.

PCI DSS Compliance
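One common way to keep referential integrity while masking is to make the replacement deterministic, so the same real value always maps to the same fictitious value in every table. The sketch below illustrates that idea with a keyed hash; it is an assumption for illustration, not the algorithm Optim uses, and the name lists, the key, and the `mask_name` helper are hypothetical.

```python
import hashlib
import hmac

# Hypothetical pools of fictitious names; a real tool would use far larger lookup tables.
FIRST_NAMES = ["ROBERT", "MARIA", "DANIEL", "LAURA"]
LAST_NAMES = ["SMITH", "JONES", "MILLER", "BROWN"]


def mask_name(real_name: str, key: bytes) -> str:
    """Map a real name to a fictitious one; the same input and key always give the same output."""
    digest = hmac.new(key, real_name.upper().encode("utf-8"), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"


key = b"demo-key-not-for-production"  # assumption: the secret is managed outside the test system
customers = [{"id": 1, "name": "JASON MICHAELS"}]
orders = [{"order": 77, "customer_name": "Jason Michaels"}]

masked_customers = [{**c, "name": mask_name(c["name"], key)} for c in customers]
masked_orders = [{**o, "customer_name": mask_name(o["customer_name"], key)} for o in orders]
```

Because both tables are masked with the same function and key, a join on the masked name still matches. With pools this small, different people can collide on the same fictitious name, which is why larger lookup tables are needed in practice.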

Page 28

Finding hidden sensitive data

• Simple searches usually do not find all sensitive data
  – Tables and lookup tables have to be joined
  – Data is hidden in longer fields (substring) or stored across several fields (concatenations)
  – Different representations (lookup tables and case distinctions)
• "Corporate memory" is deficient and has gaps
  – Incomplete documentation
  – Specialists usually know only one or two systems
• Hundreds of tables with millions of rows:
  – Complex
  – Hard to verify
• Poor data quality compounds the problem

Table A
Time       Phone          Date
13:52:49   555 908 1212   10-28-2008

Table B
Transaction Number
1352555908121210282008

InfoSphere Discovery

Information Governance Core Disciplines, Security and Privacy: Understand & Define, Secure & Protect, Monitor & Audit
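The Table A / Table B example shows a phone number hidden inside a concatenated transaction number. A minimal sketch of how such a substring relationship could be detected follows; it is a simple illustration, not InfoSphere Discovery's method, and `digits_only` and `find_hidden` are hypothetical helper names.

```python
import re


def digits_only(value: str) -> str:
    """Strip every non-digit so '555 908 1212' and '5559081212' compare equal."""
    return re.sub(r"\D", "", value)


def find_hidden(known_values, column_values):
    """Return (known_value, field, offset) for each known value embedded in a longer field."""
    hits = []
    for known in known_values:
        needle = digits_only(known)
        if not needle:
            continue
        for field in column_values:
            offset = digits_only(field).find(needle)
            if offset >= 0:
                hits.append((known, field, offset))
    return hits


table_a_phones = ["555 908 1212"]
table_b_transaction_numbers = ["1352555908121210282008"]
hits = find_hidden(table_a_phones, table_b_transaction_numbers)
# hits == [("555 908 1212", "1352555908121210282008", 4)]
```

The phone number appears at offset 4, after the hour and minute of the time (1352) and before the date (10282008). A brute-force scan like this grows with the number of value pairs, which is one reason hundreds of tables with millions of rows are hard to verify by hand.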

Page 29

For Web Logs, Clickstream Analysis

User IDs, Birth Date

Page 30

For XML Data references

Before XML Document

<?xml version="1.0" encoding="utf-8"?>
<customers>
  <customer>
    <!-- All Valid and Present -->
    <first_name>Bobby</first_name>
    <middle_initial>J</middle_initial>
    <last_name>Fudge</last_name>
    <address>
      <street>100 Fifth Avenue</street>
      <city>New York</city>
      <state>NY</state>
      <zip>10014</zip>
    </address>
    <ccn>5411116857029116</ccn>
    <telephone>1-609-156-5648
    </telephone>
    <email_address> [email protected]
    </email_address>
  </customer>
</customers>

After XML Document

<?xml version="1.0" encoding="utf-8"?>
<customers>
  <customer>
    <!-- All Valid and Present -->
    <first_name>Bobby</first_name>
    <middle_initial>J</middle_initial>
    <last_name>Fudge</last_name>
    <address>
      <street>100 Fifth Avenue</street>
      <city>New York</city>
      <state>NY</state>
      <zip>10014</zip>
    </address>
    <ccn>5411110000000017</ccn>
    <telephone>1-609-321-7654
    </telephone>
    <email_address> [email protected]
    </email_address>
  </customer>
</customers>
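In the XML example the masked card number keeps the first six digits of the original and still passes the Luhn check. The slide does not say how the value is generated; the sketch below is one possible scheme under stated assumptions (keep the six-digit prefix, fill the middle with a zero-padded sequence number, recompute the check digit). The function names and the `sequence` parameter are hypothetical.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of digits that lacks its check digit."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # the rightmost payload digit is doubled, then every second one
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the last digit is the correct Luhn check digit for the rest."""
    return number.isdigit() and luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(ccn: str, sequence: int) -> str:
    """Keep the 6-digit prefix, replace the middle with a sequence number, fix the check digit."""
    filler_len = len(ccn) - 7  # everything except the prefix and the check digit
    filler = str(sequence).zfill(filler_len)
    if len(filler) > filler_len:
        raise ValueError("sequence does not fit into the card number")
    payload = ccn[:6] + filler
    return payload + luhn_check_digit(payload)


masked = mask_card_number("5411116857029116", sequence=1)
# masked == "5411110000000017"
```

With `sequence=1` this sketch happens to reproduce the value shown in the after document. Format-preserving output of this kind lets downstream validation in test systems keep working on masked data.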

Page 31

For Data in NoSQL, Internet Commerce

Before masking:

{ name : "Matt Kalan",
  title : ["Account Manager", "Solutions Architect"],
  phone : "+1 347 688-5694",
  location : "New York, NY",
  email : "[email protected]",
  web : ["mongodb.com", "Mongodb.org"],
  linkedin : ["mkalan", "Mongodb"],
  twitter : ["@MatthewKalan", "@MongoDB", "@MongoDBInc"],
  facebook : ["MongoDB", "MongoDB, Inc."] }

After masking:

{ name : "Matt Kalan",
  title : ["Account Manager", "Solutions Architect"],
  phone : "+1 347 654-1234",
  location : "New York, NY",
  email : "[email protected]",
  web : ["mongodb.com", "Mongodb.org"],
  linkedin : ["mkalan", "Mongodb"],
  twitter : ["@MatthewKalan", "@MongoDB", "@MongoDBInc"],
  facebook : ["MongoDB", "MongoDB, Inc."] }

Page 32

For Call Data Records, Mobile Apps

Phone numbers, call history, IMEI

Page 33

For Medical / Healthcare / Hospital Patient Care

Masking unstructured information [documents/text]

Original document:

Date: August 29, 2008
Patient Name: Arthur Brown
Date of Birth: April 10, 1957
Social Security Number: 078-05-1121
Ref No. MR 2335/324
Insurance Provider Aetna
Background: Mr. Arthur Brown was admitted to Sioux General Hospital at 05:15 AM on 15 August 2008, transferred from Brookdale Psychiatric Hospital after a fall as a result of a left-side weakness.

Masked document:

Date: April 12, 2007
Patient Name: John Smith
Date of Birth: June 05, 1962
Social Security Number: 035-01-1271
Ref No. MR 2335/324
Insurance Provider Aetna
Background: Mr. John Smith was admitted to Sioux General Hospital at 05:15 AM on 12 April 2001, transferred from Brookdale Psychiatric Hospital after a fall as a result of a left-side weakness.

• Leverage Masking on Demand
• Conversion from image format to text for consumption by Hadoop
• Option to redact or mask

Page 34

Masking & Data Redaction integration for unstructured data

Examples of sensitive information: full name, street address, phone number, account number

• Text is fragmentized
• OCR and text extraction are performed
• A document copy is generated with the sensitive data removed
• Sensitive information is identified
• Sensitive data is replaced with redacted or masked sections

Page 35

For Text Logs, Mobile Apps or Customer Service Experience

Before:

Agent: "Mr Smith, let me verify the phone number associated with your account?"
Customer: "408-555-1212"
Agent: "Thank you. Let's discuss the problem you are having with your iPhone 5 and the battery issue"
…

After:

Agent: "[NAME], let me verify the phone number associated with your account?"
Customer: "[PHONE]"
Agent: "Thank you. Let's discuss the problem you are having with your iPhone 5 and the battery issue"
…

• Ability to parse unstructured, structured, and semi-structured content:
  • Voice-to-text logs
  • Agent notes
  • Text chats
  • Social media feeds

Page 36

Information governance is, even with Big Data or precisely BECAUSE of it, an important foundation of information management

• The disciplines of information governance remain relevant in the age of Big Data
• Big Data resides both inside and outside one's own company
• An important task is to enrich established information with Big Data
• IBM InfoSphere Optim is the central platform that helps you control your information from creation to deletion.

Information Governance Framework:
• Goals: business value
• Core disciplines: data quality management, information lifecycle management, information security & privacy
• Supporting disciplines: data processing architecture, classification & metadata, audit information, logging & reporting
• Enablers: organizational structures & awareness, stewardship, information risk management, policies & rules

The diagram links these groups with the relationships "require", "enable", and "support".

Page 37

Questions

Page 38

Dipl.-Betriebswirt

Wolfgang [email protected]

+49 160 9064 3048

IBM Software Group
Senior Technical Sales Professional
IBM Optim Classic, - zOS, - SAP, IBM InfoSphere Discovery

Please feel free to contact me if you have any questions ...