tmx ben sharma - zalonidata.zaloni.com/rs/626-tfj-400/images/stratany18_tmx_&_zaloni... · big...

20
1 Governing your cloud-based enterprise data lake Selwyn Collaco Chief Data Officer TMX Ben Sharma Founder and CEO, Zaloni

Upload: others

Post on 27-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

1

Governing your cloud-based enterprise data lake

Selwyn CollacoChief Data OfficerTMX

Ben SharmaFounder and CEO, Zaloni

Page 2: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

2

TMX Data StrategyUnlocking The Business Value Of DATA, An

Enterprise Asset

Page 3: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

3

Enterprise Data Strategy

ConsolidatedData Asset

1

Secure & Accessible

2

Advanced Analytics

& Visualization

3

3

“Our strategy is that we use DATA as an Enterprise (TMX) Asset to unlock Business Value and enable Growth”

Page 4: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

4

•••

••

••

•••

••

1 2 3

4

5

3

Foundational Enablers for Transforming TMX Data Assets

Page 5: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

5

Update On Our Journey“Unlocking business value through our DATA”

Data G

overnanceG

overnance process framew

ork for managing

data

▪ Data ownership▪ Data classification▪ Data quality

Client C360 Holistic view of our customers through TMX

Enterprise BIEnterprise wide BI and Data Visualization

Analytics as a ServiceUse case driven search based analytics

Enterprise Data PlatformEnterprise Data Lake – Enterprise Wide data repository

▪ Enable cross sell & up sell ▪ Enterprise-Wide Account management▪ Enterprise Client Relationship view

▪ Reduce time and effort for Reporting & Analysis ▪ Self-serve BI and Empowered business users▪ Improved Quality and Better insights

▪ Ad-hoc analytical capability in milliseconds▪ Real-time calculations ▪ White label for monetization opportunities

▪ Data Platform supporting Business Insights ▪ Technology enabling new Business Capabilities

• Research & Product Development• Improved data fulfilment• Advanced Analytics

CloudC360

Page 6: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

6

Enterprise Data & Analytics PlatformENABLING NEW CAPABILITIES

SINGLE SOURCE OF DATA | DATA MARKET PLACE | 3600 VIEW | ADVANCED ANALYTICS & VISUALIZATION

GOVERNANCEDATA POLICIES & OVERSIGHT | DATA DEFINITION | DATA MANAGEMENT | DATA CLASSIFICATION

Data Discovery

Data Profiling & Data Preparation

Analytics As a Service

Custom Solutions

Pre-defined AnalyticTemplates

Self-Service BI & Reporting

IngestBatch/

Streaming

File/ Relational

Internal/ Externally Hosted/

Third Party

DistributeSelf-Service

Data Prep

Data Consumption

Data Catalog

GovernMaster and Metadata

Management

Data Lineage

Data Security & Access controls

Data Quality

Data Consumption

Trading Data(Equities & Derivatives)

Security Related Data

Market Data

Listing Information

Billing / Finance Data

Clearing Data

TMX Data Assets

Alternate DataCustomer Data(Salesforce)

Powered By:

Analyst Workbench Advanced Analytics

ETF Broker

Order book Visualization

Analytics GRAPEVINE

Plug & Play Applications

TMX Enterprise Data Repository

Data Lake

Page 7: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

7

Non-Invasive Data Governance Approach

7

In order to develop a Data Governance structure that is right-sized for TMX, the proposed Enterprise Data Governance structure has been developed based on key observations identified through stakeholder interviews, which provided insights in to key opportunities and challenges, data domain structure1 and in-flight data initiatives

Key Opportunities and Challenges Identified

Data Domains & In-flight Initiatives

Enterprise Data Governance Structure

Global Solutions, Insights and Analytics (GSIA)

Montreal Exchange (MX) & Regulatory (MXR) CDS / CDCCGlobal Equity Capital Markets

(includes Equity Trading & Listings)

TSX Trust

Finance

Shorcan Enterprise Strategy

Marketing Legal & ERM Human Resources

Global Technology Solutions (GTS)

TMX Group

Lack of data ownership

Data accessibility and security

Duplicative Data

Third Party Data Management

Data Sharing Capabilities

Manual Processes

Data retention / access to historical data

- Trading – Deriv.- Trading – Equities- Trading – FI- Listings- Clearing – Deriv.- Clearing – Equities- Regulatory – Deriv.- Finance

Dat

a D

omai

nsIn

-flig

ht

Initi

ativ

es

Client 360

Issuer Services Excel.

Tableau

Advance Analytics Platform

Workday

Atlas

An ongoing responsibility of the Data Governance bodies will be to prioritize and address data opportunities and challenges

Owners for each domain have been identified and will serve as the foundation for the EDGC; an in-flight project will be selected for preliminary implementation

Based on the identified key opportunities and challenges, as well as TMX’s data domain structure, a two-tier structure is created

1 Data domains and data domain owners are further detailed in the TMX Data Asset Catalogue

Note: Working sessions did not include Trayport as they were a recent acquisition

- Marketing- Customer- Legal - Human Resources- Risk- Tech. Operations- External/Alternative

Page 8: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

8

Enterprise Data Platform – Key Technology Options

➢ There are number of Technology solutions available in market place for building the Enterprise Data Platform.

➢ Technologies required for Data Platform falls in THREE categories. ➢ Chart describe the categories and key options within those categories.

Technology Selected

Technology Platform

Big Data Hadoop Distribution

Big Data Management Tool

• On – Premise Servers• Cloud – AWS*, Azure

• Cloudera (On-prem, Cloud)• Hortonworks (On-prem, Cloud)• EMR (AWS)

• Informatica (Enterprise ETL)• IBM Data Stage (Enterprise ETL)• Zaloni (DataLake)

8

Page 9: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

9

Transform your business with an integrated self-service data platform

September 12, 2018

Page 10: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

10

Zaloni Confidential and Proprietary - Provided under NDA

10 Zaloni Proprietary and Confidential

Today’s data confusion

Page 11: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

11 © Zaloni Inc. 2018, All rights reserved | Zaloni proprietary

The key is an integrated self-service data platform

Advanced Analytics,BI, Enterprise Reporting,Applications

Data Management, Governance, Self-service

Compute

Zaloni Data PlatformIntegrated self-service data platform

Storage

1. Manage data across on-prem and cloud environments

2. Metadata management to find useful data and enable governance

3. Automation of data pipelines for scale and efficiency

4. Improve time to insight with self-service for data consumers

Page 12: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

12

Zaloni Confidential and Proprietary - Provided under NDA

12 Zaloni Proprietary and Confidential

Governed Data Lake

No matter where you are on your data lake journey...

Govern

Improve governance and adoption

Modernize your platform

● Early Hadoop adopters● Not sure what data is in the swamp● Need to right-size governance● Data duplicated across ponds● Enterprise adoption● Need to demonstrate ROI

● Need more agility and flexibility● Need advanced analytics to reduce

time to insight● Do not want the complexity of

building and integrating the platform

Self-service data platform

● Acquire useful data from across enterprise

● Improved visibility and understanding via metadata management

● Ensure security/privacy of sensitive data

● Scalable production data lake for new and improved business insights

Data Warehouse

Traditional ArchitectureData Swamp/Data Ponds

Scale Out Architecture

Page 13: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

Typical data lake reference architecture

• Ad-hoc exploratory analysis

• Data scientists, data consumers with proper privileges

• Standardized on corporate governance/ quality policies

• Consumers are anyone with appropriate role-based access

TransientLanding Zone Raw Zone

Refined Zone

Trusted Zone

Sandbox

Data Lake

• Temporary store of source data

• Consumers are IT, Data Stewards

• Implemented in highly regulated industries

• Original source data ready for consumption

• Consumers are developers, data stewards, some data scientists

• Single source of truth with history

• Data required for LOB specific views - transformed from existing certified data

• Consumers are anyone with appropriate role-based access

Sensors (or other time series data)

Relational Data Stores (OLTP/ODS/DW)

Logs(or other unstructured

data)

Social and shared data

Page 14: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

14 Zaloni proprietary – do not duplicate without permission

Zaloni’s integrated self-service data platform (ZDP) accelerates time to insight

ENABLEGOVERN

ENGAGE

Enable Govern Engage• Batch Ingestion• Streaming Ingestion• Metadata Capture• Auto Discovery

• Data Quality• Data Lineage• Data Mastering • Data Privacy/Security• Data Enrichment• Data Lifecycle Management

• Data Marketplace• Self-Service Operations

Cloud, On-Premises, HybridInfrastructure and cloud-platform agnostic

Provides the foundation for your data initiatives

Page 15: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

15

Scale out ingestion of data- Batch and streaming ingestion- DB ingestion with support for full/incremental updates- Metadata capture and integration with Data Pipeline- Ingestion Wizard simplifies ingestion and creates reusable workflows

Enable the Data Lake with ZDP

Automated data inventory- Crawls the data store to catalog datasets - Automatically detects metadata in several cases

Workflow and orchestration- Robust workflow management feature with

scalable and extensible architecture

Monitor the health of the data lake- Single unified log repository for all data lake operations- Dashboards for operational health of the data lake

Page 16: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

16

Metadata management- Business, technical and operational metadata- Captures lineage from ingestion to

consumption- Integration with Enterprise Metadata

repositories

Security and access control- Masking and Tokenization of sensitive data- RBAC for data lake artifacts – Entities, DQ rules,

ingestion with push down to underlying security framework

- Supports kerberized environment

Data lifecycle management- Policy based DLM- Leverages HDFS storage tiers and

supports S3 interface

Data quality- Rules Engine for DQ- DQ reporting and analysis- Automation of DQ remediation process

Govern the Data Lake with ZDP

Data enrichment- Drag and drop enrichment of the data- Support for data operations – JOIN, FILTER,

UNION, etc. - Out of the box format conversion, parsers with

ability to extend with custom implementations

Page 17: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

17

Data marketplace- Rich data catalog that aggregates business,

technical and operational metadata- Global search allows data consumers and

producers to search across data lake zones, projects, workflows, data quality rules, transformations and other related content

- Annotate, Tag, Rate and create custom metadata

- Workspaces for collaboration

Engage the Business with ZDP

Data provisioning- Shopping cart experience- Sandbox provisioning into the data lake or

RDBMS

Page 18: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

Zaloni Confidential and Proprietary - Provided under NDA

18 Zaloni Proprietary and Confidential

AWS Data Lake Platform Components

Serving Layer

Storage Layer

Processing Layer

Ingestion

Consumption Layer

Data Mgmt and

Governance

Lambda Glue

Quicksight

EMR

Kinesis

Athena

RDS DynamoDBEMR Redshift

S3 EFS Glacier

ES

ML

StorageGateway

Zaloni DataPlatform Zaloni Data

PlatformDatabases

Files

Streams

SaaS APIs

Sources

Security and Networking components are included but not shown in this architecture

Page 19: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

Zaloni Confidential and Proprietary - Provided under NDA

19 Zaloni Proprietary and Confidential

Solution Architecture for S3 Data Lake

TrustedRaw Refined SandboxLanding zone

Files

Kinesis Kinesis Firehose

StorageGateway

Databases

Ingest Validate Enrich Provision

RedshiftESEMR

Streams

Filecopy Process

Glue

Zaloni Data Platform

Ingest Catalog DQ SecurityMetadata

Provision

Athena

Query

Directory ServiceActive

DirectoryQuicksight

Page 20: TMX Ben Sharma - Zalonidata.zaloni.com/rs/626-TFJ-400/images/StrataNY18_TMX_&_Zaloni... · Big Data Hadoop Distribution Big Data Management Tool •On – Premise Servers •Cloud

Complimentary copy of “Architecting Data Lakes”

2nd edition

Visit Booth #1115

The Data Imperative Ben Sharma, CEO Zaloni

Thursday Keynotes, 10:20am – 3E