©2014 Knowledgent Group Inc. All Rights Reserved
Pragmatic Enterprise Architecture in the Era of Big Data and Quantum Computing
Agenda
• Bio / Who is Jim Luisi?
• WIIFM / What’s in it for me?
• Enterprise Architecture: Past and Future
• What is Big Data?
• Why has Big Data recently become more compelling?
• What architectural disciplines relate to Big Data?
• EA View of Big Data
• IT Challenge
• Big Data TPM, APM, DB Architectures, Quantum Computing
• Big Data Use Case Families
• Hardware Architecture / QC
• Software Architecture / QC
• Hypernumbers and QC in Perspective
• Big Data Technology as a Combination of Accelerators
• Big Data IA Foundation
• Big Data Metadata, MDM, RDM
• Big Data Ecosystem
• Questions
Bio / Who is Jim Luisi?
• 30 years of experience
• Business
- Business owner
- Big Data in the advertising industry
- Worked on the business side on Wall St.
• IT
- VLDB with the government before Big Data
- Specialized compartments in the NP-complete problem space
- Hadoop Big Data architecture with the government
- Many areas of artificial intelligence are in the Big Data space
• Philosophy
- holistic view
- Enterprise Architecture perspective
• Author
- Artificial intelligence
- Enterprise architecture
WIIFM / What’s in it for me?
• 99.9% of the Big Data mystery will be removed
• You’ll be able to engage in the conversation
• You can help transform your organization
EA Past / Enterprise Architecture
• EA was a corporate committee of generalists
• EA focused on a few parts
- A capability that answered basic questions across
– Applications
– Tools
– Organizational units
EA Future / Enterprise Architecture
• EA will render a holistic view of automation
• Standards and governance will manage complexity
EA Future (cont’d) / Enterprise Architecture
• Subject matter experts (SMEs) will make the technology ‘D’
• SMEs will collaborate with other SMEs
• SMEs will minimize complexity
What is Big Data?
• From a storage perspective
- Lots of storage
• From a distribution perspective
- Data stored in lots of places globally
• From a database design perspective
- Lots of rows
- Lots of tables
- Lots of columns
• From an algorithmic or mathematical perspective
- Lots of variables
- Lots of combinations
- Lots of permutations
- Summed up as finding one optimal answer among a large number of possibilities
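To make “one optimal answer among a large number of possibilities” concrete, the minimal Python sketch below brute-forces a tiny traveling salesman instance. The five-city distance matrix is made up purely for illustration; the point is that fixing the starting city still leaves (n-1)! candidate tours, a count that explodes as cities are added.

```python
from itertools import permutations
import math

# Hypothetical symmetric distance matrix for 5 cities (illustrative values only).
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]

def tour_length(tour):
    """Length of a closed tour that returns to its starting city."""
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def best_tour(n):
    """Exhaustively try every ordering of cities 1..n-1, with city 0 fixed as the start."""
    best = None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour)
        if best is None or length < best[1]:
            best = (tour, length)
    return best

if __name__ == "__main__":
    tour, length = best_tour(len(DIST))
    print("best tour:", tour, "length:", length)
    # The number of candidate orderings grows factorially: (n - 1)!
    for n in (5, 10, 15, 20):
        print(n, "cities ->", math.factorial(n - 1), "orderings")
```

Exhaustive search is fine for five cities but hopeless for a few dozen, which is exactly the kind of problem the later NP and quantum computing slides are concerned with.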
Why Has Big Data Become More Compelling?
• Big Data has become
- More advanced, addressing a wider array of business challenges
- More friendly to developers and end users
- A low-cost alternative to traditional transaction databases (OldSQL)
- A significant competitive differentiator in many industries
• Big Data can now address business use cases involving
- Many enterprise-grade requirements
- Vast volumes of structured, unstructured, and semi-structured data
- High-speed and complex data transformation
- High-velocity ingestion rates
- Large numbers of concurrent users
- Competitive hyper-real-time requirements (e.g., milliseconds, nanoseconds)
What architectural disciplines relate to Big Data?
• Business Architecture
• Business Continuity
• Marketing Architecture
• Operations Architecture
• Disaster Recovery Architecture
• Storage Architecture
• Infrastructure Architecture
• Network Security
• Application Architecture
• Reporting Architecture
• Integration Architecture
• Application Portfolio Management
• Technology Portfolio Management
• Application Security
• Information Architecture
• Data Architecture
• Data Governance Architecture
• Master Data Management (MDM & RDM)
• Metadata Management
• Data Security
EA View of Big Data / Enterprise Architecture
• Variety of Big Data
- database technologies
- hardware technologies
- applications
- reporting tools
• Variety of sources
- Open source providers
- Licensed providers
- In-house developed
- Custom-built providers
IT Challenge / Enterprise Architecture
• All requiring a solid foundation
• None of which are free
• Adding to the complexity of your IT landscape
Big Data TPM / Technology Portfolio Management
• Competing open source foundations
- Apache Software Foundation (ASF)
- Free Software Foundation (FSF)
- Many individual companies
• Competing open source providers of Hadoop
- Hortonworks
- Cloudera
- MapR
- IBM
- Microsoft
- Intel (IDH), now being sunset
TPM / Competing Big Data Frameworks
• and more…
Big Data APM / Application Portfolio Management
• Applications use assorted Big Data technologies
- Open source
- Licensed
- Traditional databases (OldSQL)
- Traditional Big Data technologies
- Non-traditional (NoSQL)
- OLTP (NewSQL), fully ACID
- Proprietary breed
- Quantum computing
• Approved
– Data sources
– Use cases
– Technologies
– Time period
• Approved
– Data sources
– Technologies
– Time period
• Approved
– Data sources
– Data target
– Time period
Just as there are DB Architectures of OldSQL
[Figure: OldSQL database architectures: Hierarchical (parent and child records), Network, Linked List (next, prior, first, and last set pointers), Associative (key values with record pointers), Inverted List (data addresses), Relational (Loan and Payment Coupon tables with primary keys), and the Underlying Storage Structure (a 4K DB page with header, rows, free space, and footer)]
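The relational panel of the OldSQL figure depicts a Loan table (Loan #, Date, Type, Amount) and a Payment Coupon table (Loan #, Payment #, Due Date, Due Amount) linked by the loan number. Below is a minimal Python/SQLite sketch of that structure; the column names, the sample mortgage, and the coupon amounts are hypothetical stand-ins chosen for illustration.

```python
import sqlite3

# Hypothetical schema mirroring the Loan / Payment Coupon panel of the figure.
SCHEMA = """
CREATE TABLE loan (
    loan_no   INTEGER PRIMARY KEY,
    loan_date TEXT NOT NULL,
    loan_type TEXT NOT NULL,
    amount    NUMERIC NOT NULL
);
CREATE TABLE payment_coupon (
    loan_no    INTEGER NOT NULL REFERENCES loan(loan_no),
    payment_no INTEGER NOT NULL,
    due_date   TEXT NOT NULL,
    due_amount NUMERIC NOT NULL,
    PRIMARY KEY (loan_no, payment_no)
);
"""

def build_demo():
    """Create the two tables and load one illustrative loan with three coupons."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO loan VALUES (1, '2014-01-15', 'Mortgage', 300000)")
    conn.executemany(
        "INSERT INTO payment_coupon VALUES (?, ?, ?, ?)",
        [(1, 1, "2014-02-15", 1500), (1, 2, "2014-03-15", 1500), (1, 3, "2014-04-15", 1500)],
    )
    conn.commit()
    return conn

def coupons_for_loan(conn, loan_no):
    """Join parent (loan) to children (coupons) through the shared key."""
    return conn.execute(
        """SELECT l.loan_type, p.payment_no, p.due_date, p.due_amount
           FROM loan AS l JOIN payment_coupon AS p ON p.loan_no = l.loan_no
           WHERE l.loan_no = ? ORDER BY p.payment_no""",
        (loan_no,),
    ).fetchall()

if __name__ == "__main__":
    for row in coupons_for_loan(build_demo(), 1):
        print(row)
```

Unlike the hierarchical and linked-list panels, the relationship here lives in key values rather than physical pointers, so the join is computed at query time.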
TPM / Traditional Big Data Technologies
There are database architectures of traditional Big Data technologies
Microsoft HDInsight Intel Distribution (IDH)
TPM / Non-Traditional Big Data Technologies
And there are database architectures of non-traditional Big Data technologies
TPM / Newer SQL Types
• NoSQL
- a new way of thinking about databases, founded on the belief that a relational database model may not be the best solution for all situations
• NewSQL
- a class of modern relational database management systems that seek to provide the same scalable performance as NoSQL systems for online transaction processing (OLTP) workloads while still maintaining the ACID guarantees of a traditional database system
What does ACID mean?
• Atomicity
- refers to all-or-nothing execution of a logical unit of work
• Consistency
- refers to adherence to data integrity rules that are enforced by the DB
• Isolation
- refers to the need to enforce a sequence of transactions when updating a database (e.g., two purchasers both trying to purchase the last instance of an item)
• Durability
- refers to safeguarding information once a commit has been performed to declare successful completion of a transaction
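The atomicity and “last instance of an item” points can be sketched with Python’s built-in sqlite3 module. In this minimal example (table names and the `purchase` helper are hypothetical), recording the order and decrementing stock form one unit of work: if no stock is left, the transaction is rolled back and the order row inserted a moment earlier disappears with it. It runs on a single connection, so it does not demonstrate concurrent isolation; the conditional UPDATE is what prevents a second buyer from taking an item that is already gone.

```python
import sqlite3

def setup():
    """In-memory inventory holding the last remaining unit of one item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
    )
    conn.execute("CREATE TABLE orders (buyer TEXT NOT NULL, item TEXT NOT NULL)")
    conn.execute("INSERT INTO inventory VALUES ('widget', 1)")
    conn.commit()
    return conn

def purchase(conn, buyer, item):
    """Record an order and decrement stock as one all-or-nothing unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO orders VALUES (?, ?)", (buyer, item))
            cur = conn.execute(
                "UPDATE inventory SET qty = qty - 1 WHERE item = ? AND qty > 0", (item,)
            )
            if cur.rowcount == 0:
                # No stock left: abort, which also undoes the order inserted above.
                raise RuntimeError("sold out")
        return True
    except RuntimeError:
        return False

if __name__ == "__main__":
    conn = setup()
    print(purchase(conn, "alice", "widget"))  # True
    print(purchase(conn, "bob", "widget"))    # False
    print(conn.execute("SELECT buyer FROM orders").fetchall())  # [('alice',)]
```

The CHECK constraint plays the consistency role: even a buggy update could not drive the quantity below zero.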
TPM / Proprietary Big Data Product Names
• US Government
- Hypernumbers
• Major Financial Conglomerate
- Hypercube
• High-tech Companies
- Named for each client
- Hypercompression
- LNB
- DBX / DB accelerator
TPM / Quantum Computing
• Probabilistic computing
- confidence levels
- two competing quantum computing architectures
• Gate model (aka quantum circuit)
- Shor’s algorithm (integer factoring, used for code breaking)
- present implementations have few qubits
- qubit and gate growth is linear
- prone to decoherence (quantum physics wave function collapse)
- has a complete error correction theory
• Adiabatic quantum computing (AQC)
- discrete combinatorial optimization problems (NP-complete)
- present implementations have 512 qubits
- qubit growth is geometric, doubling every 24 months
- not prone to decoherence
- lacks a complete error correction theory
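Reading the AQC growth claim literally, and assuming the 24-month doubling continues from the 512 qubits cited here, the projected qubit count after t months is a simple geometric progression:

$$q(t) = 512 \cdot 2^{t/24}, \qquad q(48) = 512 \cdot 2^{2} = 2048$$

This is only an extrapolation of the stated trend, not a vendor roadmap.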
What are some Big Data Use Case Families?
• Document / content management
• Online transaction processing (OLTP)
• Data warehousing
• Real-time analytics
• Batch analytics
• Geographic information systems (GIS)
• Search
• Predictive analytics
• Deterministic algorithms
• Probabilistic algorithms
Document / Content Management Use Cases
• IT Documents
- presentations
- word documents
- spreadsheets
- spreadsheet applications
- MS Access applications
- standards documents
- company policies
- architectural frameworks
• Business Documents
- loan applications and documents
- mortgage applications and documents
- insurance applications and documents
- new account forms
• Customer Documents
- diplomas
- birth certificates
- insurance policies
- records for tax preparation
Candidate Requirement Types
• Maximum document size
• Maximum ingestion rates
• Maximum access rates
• Maximum and mean access speed
• Maximum number of documents
• Maximum total storage
• Maximum concurrent users
• Supported data types
• Maximum number of indexes
• Global accessibility
• Multi-data center
• Fault tolerance
• Developer friendly
Candidate Solutions
• MongoDB: 16 MB limit per document; developer friendly
• Cassandra: 2 GB limit per document*; few keys
• Hadoop HDFS: no practical limit on document size; slow access
• Basho Riak: 2 GB limit per document; fault tolerant
• Couchbase: 20 MB limit per document
• MarkLogic: 512 MB limit per document; high-speed access
Online Transaction Processing Use Cases
• E-Commerce
- Global web-based transactions
- Global inventory
- Global shipping
• Marketing
- RFID supply chain
- Opportunity-based marketing
- Google Glass applications
• Consumer Products & Services
- In-home medical care
• Governmental Capabilities
- Military logistics
- Homeland security
• Financial Industry
- Global customer exposure
- Operational risk
Candidate Requirement Types
• Peak transactions per second
• Maximum length of transaction
• System availability
• System security
• High-volume transaction access paths
• Number of concurrent connections and sessions
• Internationalization (e.g., Unicode)
• Data volume requiring compression
• Geospatial index support
• Full text search
• Index support
• Sharding support
• Maximum value size
• Operating system
• Minimum memory requirements
• Real-time analytics
Candidate Solutions
• VoltDB: high-speed in-memory; ACID
• SQLFire: high-speed in-memory; ACID
• NuoDB: ACID
• Google Spanner: successor to BigTable; ACID
• Clustrix: high-speed in-memory; shared-nothing architecture
• Akiban: ACID
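Sharding support and a shared-nothing architecture both rest on partitioning data so that each node owns a disjoint slice of the keys. The sketch below shows the simplest scheme, hash partitioning, in Python; `ShardRouter` is a hypothetical helper, and the in-process dicts merely stand in for separate database nodes.

```python
import hashlib

class ShardRouter:
    """Route each row key to one of n independent shards (hash partitioning)."""

    def __init__(self, num_shards):
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self.num_shards = num_shards
        # Each dict stands in for a separate, shared-nothing database node.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # Use a stable digest: Python's built-in hash() of a str is salted per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)

if __name__ == "__main__":
    router = ShardRouter(4)
    for i in range(1000):
        router.put(f"order-{i}", i)
    print([len(shard) for shard in router.shards])  # rows per shard, roughly balanced
    print(router.get("order-42"))
```

Plain modulo hashing is easy to reason about, but changing the shard count remaps most keys; production systems often use consistent hashing or range partitioning instead.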
Data Warehousing Use Cases
• Financial Industry
- Underwriting
• Marketing
- Customer analytics
- M&A decision making
- Divestiture decision making
- Campaign management
• Science-Based Industry
- Pharmaceutical development
- Genetics research
• Governmental Capabilities
- Materiel management
- Intelligence capabilities
- Human disease management
- Food supply analysis
Candidate Requirement Types
• High data ingestion rates
• Large data persistence layer
• Large number of concurrent users
• Drill downs
• Internationalization (e.g., Unicode)
• Data volume requiring compression
• Comprehensive index support
• Sharding support
• Robust SQL interface
• Backup and restorability
• Disaster recoverability
• Commodity hardware
• Staffing and skill availability
• Trainability
• Affordability
Candidate Solutions
• Vertica: high-speed in-memory; high-speed DB layer; high ingestion rate; high concurrent users; real-time analytics
• Teradata: high-speed DB layer; high ingestion rate; high concurrent users; widest array of connectors
• Hadoop: high ingestion capacity; not for real-time analytics
• Greenplum: high-speed in-memory; high-speed DB layer; high ingestion rate; high concurrent users; real-time analytics
• SAP HANA: high-speed in-memory; high-speed DB layer; high ingestion rate; high concurrent users; real-time analytics
Real-Time Analytics Use Cases
• Financial Industry
- Investment risk
- Operational risk
- Financial risk
- Market risk
- Credit risk
- Regulatory exception reporting
- Operational performance
- Trading analytics
- Algorithmic trading
- Real-time valuations
• Governmental Capabilities
- Intelligence capabilities
- Human disease management
• Marketing
- Opportunity-based marketing
- Dynamic web advertising
Candidate Requirement Types
• High data ingestion rates
• Large data persistence layer
• Real-time analytics
• Large number of concurrent users
• Drill downs
• Internationalization (e.g., Unicode)
• Data volume requiring compression
• Comprehensive index support
• Sharding support
• Robust SQL interface
• Backup and restorability
• Disaster recoverability
• Commodity hardware
• Staffing and skill availability
• Trainability
• Affordability
Candidate Solutions
• Vertica: high-speed in-memory; high-speed DB layer; high ingestion rate; high concurrent users; real-time analytics
• Greenplum: high-speed in-memory; high-speed DB layer; high ingestion rate; high concurrent users; real-time analytics
• SAP HANA: high-speed in-memory; high-speed DB layer; high ingestion rate; high concurrent users; real-time analytics
• Teradata: high-speed DB layer; high ingestion rate; high concurrent users; widest array of connectors
Batch Analytics Use Cases
• Financial Industry
- Financial crime
- Anti-money laundering
- Sanctions
- FATCA
- Insurance fraud detection
- Back testing
- Credit risk
- Portfolio valuation
• Engineering
- Equipment failure forecasting
• Commerce
- Collusion forecasting
- Legislative forecasting
- Regulatory forecasting
• Marketing
- Customer analytics
• Governmental Capabilities
- Terrorist activity forecasting
- Terrorism event forecasting
Candidate Requirement Types
• High data ingestion rates
• Large data volumes
• Validation ability
• Traceability
• Integratability
• Maintainability
• Affordability
Candidate Solutions
• Hadoop HDFS with MapReduce: large data volumes
• Hadoop HBase with MapReduce (one HBase table as input): large data volumes
Geographic Information System Use Cases
• Address Geocoding
- Warrant serving
- Emergency services
- Crime analysis
- Public health analysis
• Linear Metric Event Modeling
- Road maintenance activities
- Roadway projects
- Traffic & safety analysis
• Cartography
- Hazardous materials tracking
- Taxable asset tracking
• Routing
- Evacuation planning
- Towing & plowing
- Refuse removal
- Emergency services
• Topological
- Elevation data
- Orthophotography
- Hydrography
Candidate Requirement Types
• User friendly
• ACID compliant
• Flexibility
• Queryable
• Full spatial function support
Candidate Solutions
• Neo4j: fully ACID
• PostGIS: open source; geographic support; built on PostgreSQL
• Oracle Spatial: spatial support in Oracle
• GeoTime: temporal 3D visual analysis
Search Use Cases
• Search as a Service (SaaS)
- Website search
• Search-Enabled Applications
- External data source identification
• E-discovery
- Legal holds
- Investigations
• Data Landscape Mapping
- Locating data across data centers
Candidate Requirement Types
• Usability
• Flexibility
• Maintainability
Candidate Solutions
• Lucidworks: built on Solr; GUI for common use cases
• Solr: flexibility
• Splunk: machine-generated output
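Solr, and Lucidworks on top of it, is built on Lucene, whose core data structure is an inverted index: a map from each term to the documents that contain it. The minimal Python sketch below shows the idea with a deliberately naive tokenizer and boolean AND queries; the function names and sample documents are hypothetical and far simpler than Lucene’s analyzers and relevance scoring.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately simple analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

if __name__ == "__main__":
    docs = {
        1: "Legal hold notice for the loan servicing team",
        2: "Loan application and mortgage documents",
        3: "Website search configuration for Solr",
    }
    index = build_index(docs)
    print(search_all(index, "loan"))           # [1, 2]
    print(search_all(index, "loan mortgage"))  # [2]
```

Queries touch only the posting lists for their terms rather than scanning every document, which is why the same structure scales from website search to e-discovery.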
Predictive Analytics Use Cases
• Financial Industry
- Capital markets fraud detection
- Wholesale banking fraud detection
- Retail banking fraud detection
- Insurance fraud detection
- Market risk forecasting
- Market opportunity forecasting
- Operational defect forecasting
• Engineering
- Equipment failure forecasting
• Commerce
- Collusion forecasting
- Regulatory shift forecasting
• Marketing
- Customer LTV scoring
- Customer defection scoring
• Governmental Capabilities
- Terrorist activity forecasting
- Terrorism event forecasting
Candidate Requirement Types
• High data ingestion rates
• Large learning sets
• Supports complex models
• Rapidly changing learning sets
• High volume for operational deployment
• Validation ability
• Traceability
• Integratability
• Maintainability
• Affordability
Candidate Solutions
• Fair Isaac HNC: large feature set; extensive professional services; high speed; high learning rate; high concurrent users; real-time deployment
• SAS CEP & Statistical Packages: high speed; supports many source types; comprehensive feature set
• Ward Systems: large feature set; extensive professional services; high speed; high learning rate; high concurrent users; real-time deployment
Particular Deterministic Algorithmic Use Cases
• Any Industry
- Matrix-vector multiplication
- Relational algebra
– Computing selections
– Computing projections
– Union, intersection, difference
– Grouping and aggregation
- Reducer size and replication rate
- Similarity joins
- Graph modeling
Candidate Requirement Types
• High data ingestion rates
• Large data volumes
• Validation ability
• Traceability
• Integratability
• Maintainability
• Affordability
Candidate Solutions
• Hadoop HDFS with MapReduce: large data volumes
• Hadoop HBase with MapReduce: large data volumes
• IBM Netezza: hardware-based algorithms
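Matrix-vector multiplication is the textbook first example of expressing a computation as MapReduce. The Python sketch below simulates the map, shuffle, and reduce phases in a single process so the data flow is visible; it is not Hadoop’s API, and, like the simplest textbook formulation, it assumes the vector fits in memory on every mapper.

```python
from collections import defaultdict

def map_phase(matrix_entries, vector):
    """Map: each nonzero entry (i, j, m_ij) emits the key-value pair (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]

def shuffle(pairs):
    """Group intermediate values by key, as the framework's shuffle/sort would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum all partial products for row i to obtain x_i."""
    return {i: sum(values) for i, values in groups.items()}

def matvec(matrix_entries, vector):
    return reduce_phase(shuffle(map_phase(matrix_entries, vector)))

if __name__ == "__main__":
    # Sparse 3x3 matrix stored as (row, col, value) triples.
    M = [(0, 0, 2), (0, 2, 1), (1, 1, 3), (2, 0, 4), (2, 2, 5)]
    v = [1, 2, 3]
    print(matvec(M, v))  # {0: 5, 1: 6, 2: 19}
```

Because each map call needs only one matrix entry, the input can be split across as many machines as there are blocks of the matrix. Rows with no nonzero entries simply produce no output key.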
NP Deterministic Algorithmic Use Cases
• NP-complete problems
- reasonably testable
• NP-hard problems
- proof is not reasonably testable
Candidate Requirement Types
• Massive number of tables involved in joins
• Massive number of rows involved in joins
• Large number of calculations
• High permutation count
• High combination count
• Massively distributed
• Super large numbers
• Real-time
Candidate Solutions (Proprietary)
• Hypernumbers
• Hypercubes
• Hypercompression
• DBX
• LNB
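What is NP?
• NP stands for ‘nondeterministic polynomial time’
• For large instances, exhaustive search on conventional computers can take billions or trillions of years
• NP-complete and NP-hard are distinguished here by the ‘checkability’ of a proposed solution in time
• NP-complete: a proposed solution can be checked in a reasonable (polynomial) length of time
• NP-hard: at least as hard as any NP-complete problem; when such a problem lies outside NP, a proposed solution may not be checkable in a reasonable length of time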
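The gap between checking and solving can be sketched with Boolean satisfiability (SAT), the classic NP-complete problem. In this minimal Python sketch (the formula and the helper names are illustrative), `check` verifies a proposed assignment in a single pass over the formula, while `brute_force` may have to try all 2^n assignments.

```python
from itertools import product

# CNF formula: a list of clauses; each clause is a list of literals.
# Literal k means "variable k is True", -k means "variable k is False" (variables numbered from 1).

def check(formula, assignment):
    """Verify a proposed assignment: one pass over the literals, i.e. polynomial time."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def brute_force(formula, num_vars):
    """Try every one of the 2**num_vars assignments: exponential time."""
    for bits in product([False, True], repeat=num_vars):
        assignment = dict(zip(range(1, num_vars + 1), bits))
        if check(formula, assignment):
            return assignment
    return None  # unsatisfiable

if __name__ == "__main__":
    # (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
    formula = [[1, -2], [2, 3], [-1, -3]]
    solution = brute_force(formula, 3)
    print(solution)                  # {1: False, 2: False, 3: True}
    print(check(formula, solution))  # True
```

Adding one variable doubles the worst-case work of `brute_force` but barely changes the cost of `check`, which is the asymmetry the NP-complete bullet describes.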
Probabilistic Use Cases
• Cryptography (e.g., code breaking)
• Prime number generation
• Traveling salesman problem
• Labeling images and objects within images
• NLP meaning extraction
• Correlations among Big Data
- genetic code correlations
• Testing a scientific hypothesis
• Machine learning for problem solving
- self-programming
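Prime number generation is a good example of trading certainty for a confidence level. The sketch below uses the Miller-Rabin test, a standard probabilistic primality test chosen here for illustration (the slide does not name a method): a “composite” answer is always correct, while a “prime” answer is wrong with probability at most 4^-rounds. The function names are hypothetical.

```python
import random

def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin: False means definitely composite; True means prime with
    error probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def random_probable_prime(bits, rng=random):
    """Draw random odd numbers of exactly `bits` bits until one passes the test."""
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, rng=rng):
            return candidate

if __name__ == "__main__":
    print(is_probable_prime(2**61 - 1))  # True (a Mersenne prime)
    print(random_probable_prime(64))
```

Each extra round cuts the worst-case error by a factor of four, so the confidence level becomes a tunable parameter rather than a fixed property of the algorithm.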
Hardware Architecture / AQC
• Commercially available platforms
- 128-qubit Rainier-4 (D-Wave One)
- 512-qubit Vesuvius 3 (D-Wave Two)
- 10’ black cube
- electromagnetically shielded
- digital optical cables
- closed liquid helium cooling system (20 mK, more than 100 times colder than interstellar space at 2.75 K)
- cylindrical magnetic shields (< 1 nanotesla, nT)
• Each qubit has four wave function values
- ‘-1-1’, ‘+1+1’, ‘-1+1’, ‘+1-1’
• Each qubit is surrounded by switches
- over 180 Josephson junctions per qubit in 3-D space
Software Architecture / AQC
[Figure: full disjunctive normal form]
AQC Software - Bottom Up
• Programming the hardware
- is all about managing the switches
- to represent a Boolean expression that is to be optimized
- to allow the expression to be preloaded with data values
- to allow the variables to be set to an unknown state
- the slowest step is the annealing process
- qubit values are inspected
- the solution is tested on a conventional computer
- the calculation is repeated multiple times
• In quantum terms, you create a mathematical formula that represents the energy state of the system in the form of a Boolean satisfiability (SAT) problem
• SAT was the first problem documented as NP-complete; no known algorithm solves it in a reasonable length of time on conventional computers*
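The program-anneal-inspect-repeat loop above can be mimicked classically. The sketch below is an assumption-laden analogue, not D-Wave’s API: it encodes a single SAT-style constraint (“exactly one of x0, x1, x2 is true”) as an energy function in quadratic unconstrained binary optimization (QUBO) form, a common input format for annealers, then runs classical simulated annealing several times and keeps the lowest-energy sample. All coefficients and function names are illustrative.

```python
import math
import random

# Energy for "exactly one of x0, x1, x2 is 1": (x0 + x1 + x2 - 1)**2, expanded with
# x*x == x for binary x into  1 - x0 - x1 - x2 + 2*x0*x1 + 2*x0*x2 + 2*x1*x2.
LINEAR = {0: -1.0, 1: -1.0, 2: -1.0}
QUADRATIC = {(0, 1): 2.0, (0, 2): 2.0, (1, 2): 2.0}
OFFSET = 1.0

def energy(x):
    """Energy of a binary assignment x (list of 0/1); 0 means the constraint holds."""
    e = OFFSET + sum(c * x[i] for i, c in LINEAR.items())
    return e + sum(c * x[i] * x[j] for (i, j), c in QUADRATIC.items())

def anneal(n, steps=2000, t_start=2.0, t_end=0.01, rng=random):
    """Classical simulated annealing: flip one bit at a time, accepting uphill moves
    with probability exp(-delta / T) while the temperature T cools geometrically."""
    x = [rng.randint(0, 1) for _ in range(n)]  # start from a random state
    e = energy(x)
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        x[i] ^= 1
        new_e = energy(x)
        delta = new_e - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = new_e
        else:
            x[i] ^= 1  # reject the move: undo the flip
    return x, e

def best_of(reads, n, rng=random):
    """Repeat the anneal several times and keep the lowest-energy sample."""
    return min((anneal(n, rng=rng) for _ in range(reads)), key=lambda sample: sample[1])

if __name__ == "__main__":
    x, e = best_of(10, 3)
    print(x, e)  # a one-hot assignment with energy 0.0
```

Because the result is sampled rather than proven, the final check (`energy(x) == 0`) runs on the conventional side, mirroring the “solution is tested on a conventional computer” step.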
AQC Software - Bottom Up (cont’d)
• System Application Program Interface (API)
- communicates directly with the hardware
- requires expertise in quantum mechanics mathematics
- is necessary to program only when
– developing new functions
– exploring quantum physics
– conducting QC experiments
• Compiler
- does not convert a higher-level language into machine language
- requires no knowledge of
– QC physics
– QC hardware
- is a layer that allows the programmer to focus on bit strings and Boolean mathematics
AQC Software - Top Layers
• Client libraries
- conventional programming languages
• Frameworks
- wrapped functions (aka toolkit) bundled for reuse
– supervised binary classification
– supervised multiple label assignment
– unsupervised feature learning
• Applications (top)
- this layer interacts with the end user GUI
- the GUI itself is on a conventional computer
– directly outside the cube
– on a conventional network or intranet, or
– on the Internet anywhere with access
D-Wave Customers
• US Government
• D-Wave’s QC is a controversial science
- scale of quantum entanglement
Hypernumbers and QC in Perspective
• AQC computing today
- Effective probabilistic approach
- Just starting to get traction
• Hypernumbers
- Highly effective deterministic approach
- Hypernumber architecture
– Deeper into the polynomial problem space
• Promise of QC
- Potentially deeper into the polynomial time space
Big Data Technology is a Combination of Accelerators
• Reduced code set
- Eliminating large amounts of DBMS code
- Eliminating large amounts of OLTP code
• Distributed processing
- Parallel processing
- Loosely or tightly coupled
• Compression
- Data encoding
- Data functions that use the least number of bits
• Proprietary hardware
- Performing algorithms at the data persistence layer
- Massively parallel platforms, networks, etc.
- Quantum computer platforms
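The compression accelerator (data encoding using the least number of bits) can be illustrated with dictionary encoding followed by bit packing, a technique common in columnar stores. This is a minimal Python sketch under that assumption, not any specific product’s storage format; the helper names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer code (dictionary encoding)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def bits_needed(num_codes):
    """Fewest bits that can represent codes 0..num_codes-1 (at least 1)."""
    return max(1, (num_codes - 1).bit_length())

def pack(codes, width):
    """Pack fixed-width codes into one integer, first code in the lowest bits."""
    packed = 0
    for position, code in enumerate(codes):
        packed |= code << (position * width)
    return packed

def unpack(packed, width, count):
    """Recover `count` fixed-width codes from a packed integer."""
    mask = (1 << width) - 1
    return [(packed >> (position * width)) & mask for position in range(count)]

if __name__ == "__main__":
    states = ["NY", "NJ", "NY", "CT", "NJ", "NY"]
    dictionary, codes = dictionary_encode(states)
    width = bits_needed(len(dictionary))
    packed = pack(codes, width)
    print(dictionary, codes, width)  # ['CT', 'NJ', 'NY'] [2, 1, 2, 0, 1, 2] 2
    print([dictionary[c] for c in unpack(packed, width, len(codes))])
```

Here six two-letter codes shrink from 96 bits of ASCII text to 12 bits of packed codes, plus the small dictionary; real engines pack into fixed-size machine words rather than one arbitrary-precision integer.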
Big Data IA Foundation
• Information architecture
- Logical data architecture
- Physical data architecture
• Master data management
- Shared master data
- Reference data management
– Code tables
– Shared files
• Metadata management
- Business data glossary
- SDLC metadata
- Big Data metadata ecosystem
Big Data Metadata Ecosystem
Integrated MDM RDM Architecture
• RDM
- Centrally maintained for the enterprise
- Passed to application databases
- Managed in HBase for reference data lookups
• MDM
- Processed prior to Big Data deployment
– Landed
– Profiled
– Cleansed
– Standardized
– Integrated
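The MDM stages above (landed, profiled, cleansed, standardized, integrated) can be sketched as a small pipeline, with an RDM code table used during standardization. Everything in this Python example is hypothetical: the code table, the field names, and the exact-match integration key. Real MDM hubs use far richer profiling and fuzzy matching.

```python
from collections import Counter

# Hypothetical RDM code table: raw country spellings -> enterprise country codes.
COUNTRY_CODES = {"us": "US", "usa": "US", "united states": "US", "uk": "GB", "united kingdom": "GB"}

def profile(records, field):
    """Profile: count distinct raw values so anomalies are visible before cleansing."""
    return Counter(r.get(field) for r in records)

def cleanse(record):
    """Cleanse: trim whitespace and drop empty fields (this sketch assumes string fields)."""
    return {k: v.strip() for k, v in record.items() if isinstance(v, str) and v.strip()}

def standardize(record):
    """Standardize: normalize name casing and map country through the RDM code table."""
    out = dict(record)
    if "name" in out:
        out["name"] = " ".join(part.capitalize() for part in out["name"].split())
    if "country" in out:
        out["country"] = COUNTRY_CODES.get(out["country"].lower(), "UNKNOWN")
    return out

def integrate(records, key=("name", "country")):
    """Integrate: merge records sharing the (exact) match key into one master record."""
    masters = {}
    for r in records:
        match_key = tuple(r.get(f) for f in key)
        masters.setdefault(match_key, {}).update(r)
    return list(masters.values())

def run_pipeline(landed):
    """Landed records flow through cleanse -> standardize -> integrate."""
    return integrate([standardize(cleanse(r)) for r in landed])

if __name__ == "__main__":
    landed = [
        {"name": " jane  doe ", "country": "USA", "email": ""},
        {"name": "Jane Doe", "country": "united states", "phone": "555-0100"},
        {"name": "raj patel", "country": "UK"},
    ]
    print(profile(landed, "country"))
    for master in run_pipeline(landed):
        print(master)
```

Two differently spelled “Jane Doe” rows collapse into one master record only after standardization makes their keys agree, which is why the slide places these steps before Big Data deployment.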
Big Data Ecosystem Without EA
• Uncoordinated
• Inconsistent
• Complex
EA Driven Big Data Ecosystem
Questions?
• Pragmatic Enterprise Architecture: Strategies to Transform Information Systems in the Era of Big Data and Quantum Computing
• James V Luisi / 732-740-2274
• Connect on LinkedIn for new books and updates
• Available at Amazon.com