Transcript

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

October 31, 2017 | 10:00 AM PT

Architecting an Open Data

Lake for the Enterprise

© 2017, Amazon Web Services, Inc. or its affiliates. All rights reserved.

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Today’s PresentersPratap Ramamurthy, Solutions Architect, Amazon Web Services

Ashwin Viswanath, Director, Cloud Product Marketing, Talend

Eric Anderson, Executive Director, Data, Beachbody

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Today’s Agenda• An overview of AWS and AWS Marketplace, with an

emphasis on AWS data lake solutions and Talend

• Overview of the Talend solutions featured in our story

• Challenges faced by Beachbody

• The Beachbody success story with AWS and Talend

• Q&A/Discussion

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Learning Objectives:1. How to migrate a variety of structured and unstructured data sources

to a data lake

2. How to shorten development and testing cycles

3. How to mitigate complex deployment challenges common to real-time

data

4. How to take advantage of Spark and Hadoop by generating native

code

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

The Data Lake and AWS

Drive business value with any type of data

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Legacy Data Warehouses & RDBMS

• Complex to setup and manage

• Do not scale

• Takes months to add new

data sources

• Queries take too long

• Cost $MM upfront

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Should I Build a Data Lake?

Starting by amassing "all your data" and dumping

into a large repository for the data gurus to start

finding "insights" is like trying to win the lottery by

buying all the tickets

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Rethink How to Become a Data-driven Business

• Business outcomes - start with the insights and actions

you want to drive, then work backwards to a streamlined

design

• Experimentation - start small, test many ideas, keep

the good ones and scale those up, paying only for what

you consume

• Agile and timely - deploy data processing infrastructure

in minutes, not months. take advantage of a rich platform

of services to respond quickly to changing business

needs

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Business Case Determines Platform Design

Ingest/

Collect

Consume/

visualizeStore

Process/

analyze

Data

1 40 9

5

Answers &

Insights

START HEREWITH A BUSINESS CASE

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Experiment and Scale Based on Your Business Needs

MATCHAVAILABLE DATA

Metrics and

Monitoring

Workflow

Logs

ERP

Transactions

Ingest/

Collect

Consume/

visualizeStore

Process/

analyze

Data

1 40 9

5

Answers &

Insights

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Business Outcomes on a Modern Data Architecture

Outcome 1 : Modernize and consolidate

• Insights to enhance business applications and create new digital services

Outcome 2 : Innovate for new revenues

• Personalization, demand forecasting, risk analysis

Outcome 3 : Real-time engagement

• Interactive customer experience, event-driven automation, fraud detection

Outcome 4 : Automate for expansive reach

• Automation of business processes and physical infrastructure

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Use an Optimal Combination of Highly

Interoperable Services

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Why Amazon S3 for Modern Data Architecture?

Designed for 11 9s

of durability

Designed for

99.99% availability

Durable Available High performance Multiple upload

Range GET

Store as much as you need

Scale storage and compute

independently

No minimum usage commitments

Scalable

Amazon EMR

Amazon Redshift

Amazon DynamoDB

Amazon Athena

IntegratedEasy to use

Simple REST API

AWS SDKs

Read-after-create consistency

Event notification

Lifecycle policies

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Decouple Storage and Compute

• Legacy design was large databases or

data warehouses with integrated

hardware

• Big Data architectures often benefit

from decoupling storage and compute

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Improving Data Agility with

TalendAshwin Viswanath,

Director of Cloud Product Marketing,

Talend

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Data Lakes: The First Phase and the Future

First phase: Capture and store raw data of many

different types at scale

Next phase: Augment enterprise data warehousing

strategies

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Why Data Lake Projects Fail

No DevOps

Practices for

Scalability &

Testing

Lack of Expertise Siloed

Operating Model

Poor Data

Governance

Poor Architectural

Design &

Integration

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Foundational Elements of a Data Lake

Data Preparation

Self-service Data IngestMetadata Management

Data Classification

Data Lake

Data GovernanceData Lineage

Security Data Profiling

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Streamline DevOps Process for Big Data

Custom Cluster Configuration

• Retrieve Hadoop configuration

data from job server

• Upload configuration files to

different clusters based on role:

dev/test/prod

• Enforce uniform security

standards

• Available for Spark and Spark

Streaming jobs

Portable integration jobs across your environment

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data Matching and Machine Learning on

Spark

• New Data Stewardship interface simplifies matching process

• Improved performance through continuous matching speeding time to insight

Harmonize data at scale by learning from your key experts

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Faster and Better Real-time Insights with Spark

Enterprise – class robustness and Intelligent Integration

• New Spark Support

• Production-ready with Spark 2.1

• Toggle between Spark 1.X and 2.X

• Easily upgrade to Spark 2.X

• Natural Language Processing with Spark

• Data Preparation for Spark Streaming

• Talend Data Mapper runs with Spark Streaming

• Spark Streaming support for Kerberized Kafka 0.10

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data Governance

Complete End-to-end Data Lineage

• Understand more about your unstructured data with new cloud and big data metadata bridges

• Save time by automatically harvesting data structures to build a data lake inventory

• Manage change with version control and notifications

Metadata bridges

S3, Hadoop HDFS, Hive, MongoDB,

Couchbase, Cassandra, Apache Atlas

Files systems

Amazon S3, Hadoop HDFS, Unix, Windows, Linux

File formats

CSV, Excel, JSON, Avro, Parquet

Know Your Data for Increased Data Protection, Accessibility and Compliance

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Beachbody – Fitness goes Big

DataDriving innovation with Talend on AWS

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

About Beachbody

• A leading provider of fitness, nutrition, and weight-loss

programs

• Creator of P90X® Series, INSANITY®, FOCUS T25®, 21

Day Fix®, Body Beast®, PiYo®, and Hip Hop Abs®

• Empowers over 23 Million customers

• Supports 350K+ independent “Coach” distributors

• Operates with 800+ employees

• Sees 5 million+ monthly unique visits across digital

platforms

• Reached $1 billion in gross sales in 2015

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

The Challenge - Do More Better, Faster, Cheaper

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Project Vision – Open Enterprise Data Lake

• Build an OPEN Enterprise Data Platform

• Open Source Technology: Bring Your Own Tool

• Decentralized Data Ownership: Many teams can publish

• Centralized people, processes, and tools available

• Capture All Data as real-time as possible

• Access to All raw + processed data by Authorized Users

• HIPAA/PII encrypted or masked to for compliance

• Shift Time from Collecting Data -> Analyzing Data

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

The Technology

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Data Lake Architecture

Amazon S3 Data Lake folder structure

AWS

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Architectural Design

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Data Lake Component - Storage

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Data Lake Component – Data Pipeline

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Data Lake Component – RDBMS

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Data Lake Component – Compute

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Data Lake Component – Analytics

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Business Benefits

• Reduced Data Acquisition Time by 5x

• Improved Marketing Campaigns

• Reduced Site Tagging Costs

• Improved Employee Retention and Satisfaction

• Automated Customer Self-Service Order Status

• Identified Web Funnel Conversion Opportunities (testing now)

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Next steps and further information

• Data Lake solution on AWS:

https://aws.amazon.com/big-data/data-lake-on-aws/

• Take a Free 30-Day Trial of Talend Integration Cloud:

https://iam.integrationcloud.talend.com/idp/federation/up/login

• Try AWS for free:

https://aws.amazon.com/

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Q & A

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Thank you!


Top Related