Architecting an Open Data Lake for the Enterprise

Post on 21-Jan-2018

TRANSCRIPT

  1. Architecting an Open Data Lake for the Enterprise. October 31, 2017 | 10:00 AM PT. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Today's Presenters: Pratap Ramamurthy, Solutions Architect, Amazon Web Services; Ashwin Viswanath, Director, Cloud Product Marketing, Talend; Eric Anderson, Executive Director, Data, Beachbody
  3. Today's Agenda: an overview of AWS and AWS Marketplace, with an emphasis on AWS data lake solutions and Talend; an overview of the Talend solutions featured in our story; challenges faced by Beachbody; the Beachbody success story with AWS and Talend; Q&A/discussion
  4. Learning Objectives: (1) how to migrate a variety of structured and unstructured data sources to a data lake; (2) how to shorten development and testing cycles; (3) how to mitigate complex deployment challenges common to real-time data; (4) how to take advantage of Spark and Hadoop by generating native code
  5. The Data Lake and AWS: drive business value with any type of data
  6. Legacy Data Warehouses & RDBMS: complex to set up and manage; do not scale; take months to add new data sources; queries take too long; cost $MM upfront
  7. Should I Build a Data Lake? Starting by amassing "all your data" and dumping it into a large repository for the data gurus to start finding "insights" is like trying to win the lottery by buying all the tickets.
  8. Rethink How to Become a Data-driven Business. Business outcomes: start with the insights and actions you want to drive, then work backwards to a streamlined design. Experimentation: start small, test many ideas, keep the good ones and scale those up, paying only for what you consume. Agile and timely: deploy data processing infrastructure in minutes, not months; take advantage of a rich platform of services to respond quickly to changing business
  9. Business Case Determines Platform Design. [Diagram: pipeline stages Ingest/Collect, Store, Process/Analyze, Consume/Visualize, turning Data into Answers & Insights. Callout: START HERE WITH A BUSINESS CASE]
  10. Experiment and Scale Based on Your Business Needs. [Diagram: pipeline stages Ingest/Collect, Store, Process/Analyze, Consume/Visualize, turning Data into Answers & Insights. Callout: MATCH AVAILABLE DATA, with sources labeled Metrics and Monitoring, Workflow Logs, ERP Transactions]
  11. Business Outcomes on a Modern Data Architecture. Outcome 1, modernize and consolidate: insights to enhance business applications and create new digital services. Outcome 2, innovate for new revenues: personalization, demand forecasting, risk analysis. Outcome 3, real-time engagement: interactive customer experience, event-driven automation, fraud detection. Outcome 4, automate for expansive reach: automation of business processes and physical infrastructure
  12. Use an Optimal Combination of Highly Interoperable Services
  13. Why Amazon S3 for Modern Data Architecture? Durable: designed for 11 9s of durability. Available: designed for 99.99% availability. High performance: multiple upload; Range GET. Scalable: store as much as you need; scale storage and compute independently; no minimum usage commitments. Integrated: Amazon EMR, Amazon Redshift, Amazon DynamoDB, Amazon Athena. Easy to use: simple REST API; AWS SDKs; read-after-create consistency; event notification; lifecycle policies
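
The Range GET feature named on this slide lets a client fetch an object in byte slices, which can then be downloaded in parallel. A minimal sketch of how the slices might be computed follows. The function name `byte_ranges` and the part size are assumptions made for illustration and do not come from the webinar; only the `bytes=start-end` header format (with inclusive end offsets) is standard HTTP.

```python
from typing import List


def byte_ranges(total_size: int, part_size: int) -> List[str]:
    """Split an object of total_size bytes into HTTP Range header values.

    Hypothetical helper for illustration; it only builds header strings
    and does not contact Amazon S3.
    """
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    ranges = []
    for start in range(0, total_size, part_size):
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + part_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


if __name__ == "__main__":
    # A 10-byte object fetched in 4-byte slices needs three requests.
    print(byte_ranges(10, 4))
```

Each returned string could be sent as the `Range` header of a separate GET request, and the responses concatenated in order.
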
  14. Decouple Storage and Compute: legacy design was large databases or data warehouses with integrated hardware; big data architectures often benefit from decoupling storage and compute
  15. Improving Data Agility with Talend. Ashwin Viswanath, Director of Cloud Product Marketing, Talend
  16. Data Lakes: The First Phase and the Future. First phase: capture and store raw data of many different types at scale. Next phase: augment enterprise data warehousing strategies
  17. Why Data Lake Projects Fail: no DevOps practices for scalability and testing; lack of expertise; siloed operating model; poor data governance; poor architectural design and integration
  18. Foundational Elements of a Data Lake. [Diagram: Data Lake at the center, surrounded by Data Preparation, Self-service, Data Ingest, Metadata Management, Data Classification, Data Governance, Data Lineage, Security, Data Profiling]
  19. Streamline DevOps Process for Big Data. Custom cluster configuration: retrieve Hadoop configuration data from job server; upload configuration files to different clusters based on role (dev/test/prod); enforce uniform security standards; available for Spark and Spark Streaming jobs. Portable integration jobs across your environment
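
The idea of uploading configuration files to different clusters based on role can be sketched as a simple lookup. This is not Talend's implementation: the cluster names, the `config_root` directory layout, and the `plan_uploads` function are all hypothetical. The three file names are standard Hadoop configuration files.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

# Hypothetical mapping from environment role to target cluster.
CLUSTERS = {
    "dev": "dev-cluster",
    "test": "test-cluster",
    "prod": "prod-cluster",
}

# Standard Hadoop configuration file names.
CONFIG_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml")


def plan_uploads(role: str, config_root: str = "/etc/hadoop/conf") -> List[Tuple[str, str]]:
    """Return (source_path, target_cluster) pairs for one role.

    Assumes one sub-directory of config_root per role; nothing is uploaded.
    """
    if role not in CLUSTERS:
        raise ValueError(f"unknown role: {role!r}")
    cluster = CLUSTERS[role]
    return [
        (str(PurePosixPath(config_root) / role / name), cluster)
        for name in CONFIG_FILES
    ]


if __name__ == "__main__":
    for source, target in plan_uploads("test"):
        print(source, "->", target)
```

Keeping the role-to-cluster mapping in one place is one way to apply the same file set, and therefore the same security settings, to every environment.
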
  20. Big Data Matching and Machine Learning on Spark: new Data Stewardship interface simplifies the matching process; improved performance through continuous matching, speeding time to insight; harmonize data at scale by learning from your key experts
  21. Faster and Better Real-time Insights with Spark: enterprise-class robustness and intelligent integration. New Spark support: production-ready with Spark 2.1; toggle between Spark 1.X and 2.X; easily upgrade to Spark 2.X; natural language processing with Spark. Data preparation for Spark Streaming: Talend Data Mapper runs with Spark Streaming; Spark Streaming support for Kerberized Kafka 0.10
  22. Big Data Governance: Complete End-to-end Data Lineage. Understand more about your unstructured data with new cloud and big data metadata bridges; save time by automatically harvesting data structures to build a data lake inventory; manage change with version control and notifications. Metadata bridges: S3, Hadoop HDFS, Hive, MongoDB, Couchbase, Cassandra, Apache Atlas. File systems: Amazon S3, Hadoop HDFS, Unix, Windows, Linux. File formats: CSV, Excel, JSON, Avro, Parquet. Know your data for increased data protection, accessibility, and compliance
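
Harvesting data structures to build an inventory can be illustrated with a small sketch that reads field names from two of the file formats listed on this slide. It is a simplified stand-in, not the Talend metadata bridge: the `harvest_structure` function and the shape of the inventory entry are assumptions, and only CSV headers and top-level JSON object keys are handled.

```python
import csv
import io
import json
from typing import Dict


def harvest_structure(name: str, text: str) -> Dict[str, object]:
    """Return a hypothetical inventory entry: file name, format, field names."""
    if name.endswith(".csv"):
        # The first CSV row is assumed to be the header.
        header = next(csv.reader(io.StringIO(text)), [])
        return {"name": name, "format": "csv", "fields": header}
    if name.endswith(".json"):
        record = json.loads(text)
        # Only top-level keys of a single JSON object are collected.
        fields = sorted(record) if isinstance(record, dict) else []
        return {"name": name, "format": "json", "fields": fields}
    return {"name": name, "format": "unknown", "fields": []}


if __name__ == "__main__":
    inventory = [
        harvest_structure("orders.csv", "id,total\n1,9.5\n"),
        harvest_structure("event.json", '{"user": 1, "action": "view"}'),
    ]
    for entry in inventory:
        print(entry)
```
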
  23. Beachbody Fitness goes Big Data: driving innovation with Talend on AWS
  24. About Beachbody: a leading provider of fitness, nutrition, and weight-loss programs; creator of P90X Series, INSANITY, FOCUS T25, 21 Day Fix, Body Beast, PiYo, and Hip Hop Abs; empowers over 23 million customers; supports 350K+ independent Coach distributors; operates with 800+ employees; sees 5 million+ monthly unique visits across digital platforms; reached $1 billion in gross sales in 2015
  25. The Challenge: Do More Better, Faster, Cheaper
  26. Project Vision: Open Enterprise Data Lake. Build an OPEN enterprise data platform. Open source technology: bring your own tool. Decentralized data ownership: many teams can publish. Centralized people, processes, and tools available. Capture all data as close to real time as possible. Access to all raw and processed data by authorized users. HIPAA/PII data encrypted or masked for compliance. Shift time from collecting data to analyzing data
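
One common way to mask PII fields before they reach shared storage is keyed hashing, which replaces a value with a stable token so records can still be joined. The sketch below assumes this approach; the slides do not say which masking technique Beachbody used. The field names in `PII_FIELDS` and the `mask_record` function are hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical set of fields treated as PII.
PII_FIELDS = {"email", "phone"}


def mask_record(record: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Return a copy of record with PII fields replaced by HMAC-SHA256 tokens."""
    masked = {}
    for field, value in record.items():
        if field in PII_FIELDS and value is not None:
            # Same key and value always give the same token, so joins still work.
            token = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            masked[field] = token.hexdigest()
        else:
            masked[field] = value
    return masked


if __name__ == "__main__":
    secret = b"replace-with-a-managed-secret"  # placeholder key for the demo
    print(mask_record({"email": "a@example.com", "order_id": 7}, secret))
```

The original value cannot be read back from the token without the key, so the key itself needs the same protection as the data it masks.
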
  27. The Technology
  28. Data Lake Architecture. [Diagram labels: Amazon S3; Data Lake folder structure; AWS]
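
The transcript does not preserve the folder structure shown on this slide. As an illustration only, the sketch below builds object keys for one commonly used layout: a zone, a source system, a dataset, and date partitions. The zone names, the `year=/month=/day=` convention, and the `object_key` function are assumptions, not the layout Beachbody presented.

```python
from datetime import date


def object_key(zone: str, source: str, dataset: str, day: date, filename: str) -> str:
    """Build an S3 object key under a hypothetical zone/source/dataset/date layout."""
    return (
        f"{zone}/{source}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


if __name__ == "__main__":
    # Raw and processed copies of the same dataset live under different zones.
    print(object_key("raw", "web", "clicks", date(2017, 10, 31), "part-0000.json"))
    print(object_key("processed", "web", "clicks", date(2017, 10, 31), "part-0000.parquet"))
```

Date-based prefixes like these let query engines and lifecycle rules target a subset of the data by prefix instead of scanning the whole bucket.
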
  29. Architectural Design
  30. Data Lake Component: Storage
  31. Data Lake Component: Data Pipeline
  32. Data Lake Component: RDBMS
  33. Data Lake Component: Compute
  34. Data Lake Component: Analytics
  35. Business Benefits: reduced data acquisition time by 5x; improved marketing campaigns; reduced site tagging costs; improved employee retention and satisfaction; automated customer self-service order status; identified web funnel conversion opportunities (testing now)
  36. Next steps and further information: Data Lake solution on AWS (https://aws.amazon.com/big-data/data-lake-on-aws/); free 30-day trial of Talend Integration Cloud (https://iam.integrationcloud.talend.com/idp/federation/up/login); try AWS for free (https://aws.amazon.com/)
  37. Q & A
  38. Thank you!
