files.ondemand.cloudera.com...storing relational and key-value data: amazon rds and dynamodb chapter...
Post on 29-Jun-2020
Cloud Fundamentals
191213
Copyright © 2010–2020 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.
Chapter 1: Introduction
Course Chapters
▪ Introduction
▪ An Overview of the Cloud with Cloudera
▪ Getting Started with the Cloud
▪ Estimating, Managing, and Monitoring Costs
▪ Understanding Cloud Security: Amazon Web Services
▪ Regions and Availability Zones
▪ Networking
▪ Computing Power in AWS
▪ Protecting Your Infrastructure: Security Groups & Network ACLs
▪ Storing Files and Objects: Instance Store, EBS and S3
▪ Storing Relational and Key-Value Data: Amazon RDS and DynamoDB
▪ Migrating Data to the Cloud
▪ Modeling Infrastructure Using AWS CloudFormation
Trademark Information
▪ The names and logos of Apache products mentioned in Cloudera training courses, including those listed below, are trademarks of the Apache Software Foundation
Apache Accumulo, Apache Hive, Apache Pig
Apache Avro, Apache Impala, Apache Ranger
Apache Ambari, Apache Kafka, Apache Sentry
Apache Atlas, Apache Knox, Apache Solr
Apache Bigtop, Apache Kudu, Apache Spark
Apache Crunch, Apache Lucene, Apache Sqoop
Apache Druid, Apache Mahout, Apache Storm
Apache Flink, Apache NiFi, Apache Tez
Apache Flume, Apache Oozie, Apache Tika
Apache Hadoop, Apache ORC, Apache Zeppelin
Apache HBase, Apache Parquet, Apache ZooKeeper
Apache HCatalog, Apache Phoenix
▪ All other product names, logos, and brands cited herein are the property of their respective owners
Chapter Topics
Introduction
▪ About This Course
▪ Introductions
▪ About Cloudera
▪ About Cloudera Educational Services
▪ Course Logistics
Course Objectives
During this course, you will learn
▪ The advantages of deploying infrastructure as a service in the cloud
▪ How to estimate and optimize the cost of running services in the cloud
▪ How to secure cloud resources
▪ How to create and manage a network in the cloud
▪ How to deploy, modify, and delete new resources in the cloud
▪ How to deploy and manage compute resources
▪ How to store data in the cloud using object stores and databases
▪ How to create and work with cloud managed services
▪ How to deploy infrastructure programmatically
Introductions
▪ About your instructor
▪ About you
  ─ Currently, what do you do at your workplace?
  ─ What is your experience with database technologies, programming, and query languages?
  ─ How much experience do you have with UNIX or Linux?
  ─ What is your experience with big data?
  ─ What do you expect to gain from this course? What would you like to be able to do at the end that you cannot do now?
About Cloudera
THE ENTERPRISE DATA CLOUD COMPANY
▪ Cloudera (founded 2008) and Hortonworks (founded 2011) merged in 2019
▪ The new Cloudera improves on the best of both companies
  ─ Introduced the world’s first Enterprise Data Cloud
  ─ Delivers a comprehensive platform for any data from the Edge to AI
  ─ Leads in training, certification, support, and consulting for data professionals
  ─ Remains committed to open source and open standards
Cloudera Data Platform
A suite of products to collect, curate, report, serve, and predict
▪ Cloud native or bare metal deployment
▪ Powered by open source
▪ Analytics from the Edge to AI
▪ Unified data control plane
▪ Shared Data Experience (SDX)
Cloudera Shared Data Experience (SDX)
▪ Full data lifecycle: Manages your data from ingestion to actionable insights
▪ Unified security: Protects sensitive data with consistent controls
▪ Consistent governance: Enables safe self-service access
Self-Serve Experiences for Cloud Form Factors
▪ Services customized for specific steps in the data lifecycle
  ─ Emphasize productivity and ease of use
  ─ Auto-scale compute resources to match changing demands
  ─ Isolate compute resources to maintain workload performance
Cloudera DataFlow
▪ Data-in-motion platform
▪ Reduces data integration development time
▪ Manages and secures your data from edge to enterprise
Cloudera Machine Learning
▪ Cloud-native enterprise machine learning
  ─ Fast, easy, and secure self-service data science in enterprise environments
  ─ Direct access to a secure cluster running Spark and other tools
  ─ Isolated environments for running Python, R, and Scala code
  ─ Teams, version control, collaboration, and project sharing
Cloudera Data Hub
Customize your own experience in cloud form factors
▪ Integrated suite of analytic engines
▪ Cloudera SDX applies consistent security and governance
▪ Fueled by open source innovation
Cloudera Educational Services
▪ We offer a variety of ways to take our courses
  ─ Instructor-led, both in physical and virtual classrooms
    ─ Private and customized courses also available
  ─ Self-paced, through Cloudera OnDemand
▪ Courses for all kinds of data professionals
  ─ Executives and managers
  ─ Data scientists and machine learning specialists
  ─ Data analysts
  ─ Developers and data engineers
  ─ System administrators
  ─ Security professionals
Cloudera Education Catalog
▪ A broad portfolio across multiple platforms
  ─ Not all courses shown here
  ─ See our website for the complete catalog
[Figure: course catalog grid by role (Administrator, Data Analyst, Developer & Data Engineer, Data Scientist) and platform (CDH, HDP, CDP, CDF), with delivery formats Private Class, Public Class, and OnDemand. Courses shown include Administrator, Security, NiFi, AWS Fundamentals for CDP, Data Analyst, Hive 3, Kudu, Cloudera Data Warehouse, Spark, Spark Performance Tuning, Stream Developer, Kafka Operations, Search | Solr, Architecture Workshop, Data Scientist, Cloudera DS Workbench, and CML.]
Cloudera OnDemand
▪ Our OnDemand catalog includes
  ─ Courses for developers, data analysts, administrators, and data scientists, updated regularly
  ─ Exclusive OnDemand-only courses, such as those covering security and Cloudera Data Science Workbench
  ─ Free courses such as Essentials and Cloudera Director available to all with or without an OnDemand account
▪ Features include
  ─ Video lectures and demonstrations with searchable transcripts
  ─ Hands-on exercises through a browser-based virtual environment
  ─ Discussion forums monitored by Cloudera course instructors
  ─ Searchable content within and across courses
▪ Purchase access to a library of courses or individual courses
▪ See the Cloudera OnDemand information page for more details or to make a purchase, or go directly to the OnDemand Course Catalog
Accessing Cloudera OnDemand
▪ Cloudera OnDemand subscribers can access their courses online through a web browser
▪ Cloudera OnDemand is also available through an iOS app
  ─ Search for “Cloudera OnDemand” in the iOS App Store
Cloudera Certification
▪ The leader in Apache Hadoop-based certification
▪ Cloudera certification exams favor hands-on, performance-based problems that require execution of a set of real-world tasks against a live, working cluster
▪ We offer two levels of certifications
  ─ Cloudera Certified Associate (CCA)
    ─ CCA Spark and Hadoop Developer
    ─ CCA Data Analyst
    ─ CCA CDH Administrator and CCA HDP Administrator
  ─ Cloudera Certified Professional (CCP)
    ─ CCP Data Engineer
Logistics
▪ Class start and finish time
▪ Lunch
▪ Breaks
▪ Restrooms
▪ Wi-Fi access
▪ Virtual machines
Your instructor will give you details on how to access the course materials for the class
Chapter 2: An Overview of the Cloud with Cloudera
Chapter Topics
An Overview of the Cloud with Cloudera
▪ Cloud Fundamentals
▪ Evolution from the Data Center to the Cloud
▪ Amazon Web Services (AWS)
▪ Essential Points
Cloud Fundamentals Objectives
In this training, you will learn
▪ Fundamentals of cloud computing
  ─ Key concepts of Amazon Web Services (AWS)
  ─ Prerequisites to work with Cloudera products and services
▪ Step-by-step demonstrations and exercises
▪ History of Amazon Web Services
The Evolution to Big Data
▪ The need to organize data
  ─ Analog era
▪ Digital era
  ─ Spreadsheets, databases, and even bigger databases
▪ Information explosion
▪ Big Data era
  ─ Data beyond a manageable size
    ─ Single computing device
  ─ Parallel computing
▪ Multiple machines in a data center
Big Data Era in the Corporate Data Center
▪ Required
  ─ Large number of machines working in parallel
  ─ Sizeable data repositories
▪ Potential drawbacks
  ─ Large upfront capital expense
  ─ Requires planning and approval
  ─ May be over- or under-utilized
  ─ Virtualization added flexibility
▪ Other options are available
The Cloud
▪ What is cloud computing?
  ─ Someone else’s computers
  ─ In charge of the infrastructure
  ─ Offered as services
▪ Different modalities
  ─ Infrastructure-as-a-Service (IaaS)
    ─ Amazon Web Services
  ─ Platform-as-a-Service (PaaS)
    ─ Heroku or OpenShift
  ─ Software-as-a-Service (SaaS)
    ─ Cloudera CDP
On-premises and Cloud Offerings
▪ Resource administration
▪ On-premises
  ─ You manage everything

What               IaaS   PaaS   SaaS
Applications       You    You    AWS
Data               You    You    AWS
Operating system   You    AWS    AWS
Virtualization     AWS    AWS    AWS
Servers            AWS    AWS    AWS
Storage            AWS    AWS    AWS
Networking         AWS    AWS    AWS
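
The responsibility table above can also be read as a small lookup structure. The following is a minimal sketch that transcribes the table into Python; the names `LAYERS`, `RESPONSIBILITY`, and `managed_by_you` are illustrative and do not come from the course. The on-premises row reflects the "You manage everything" bullet.

```python
# Layers from the table above, top of the stack first.
LAYERS = ["Applications", "Data", "Operating system", "Virtualization",
          "Servers", "Storage", "Networking"]

# Who manages each layer under each model. "AWS" stands for the cloud
# provider, as in the slide. Names here are illustrative only.
RESPONSIBILITY = {
    "On-premises": ["You"] * 7,
    "IaaS": ["You", "You", "You", "AWS", "AWS", "AWS", "AWS"],
    "PaaS": ["You", "You", "AWS", "AWS", "AWS", "AWS", "AWS"],
    "SaaS": ["AWS"] * 7,
}


def managed_by_you(model):
    """Return the layers the customer still manages under the given model."""
    owners = RESPONSIBILITY[model]
    return [layer for layer, owner in zip(LAYERS, owners) if owner == "You"]


if __name__ == "__main__":
    for model in RESPONSIBILITY:
        print(model, "->", managed_by_you(model))
```

Reading the result per model shows the list of customer-managed layers shrinking as you move from on-premises toward SaaS.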
Advantages of Cloud Computing (1)
▪ Flexible environment
  ─ Adapts to your needs
▪ Wide range of services available in AWS
▪ Pay-as-you-go approach
  ─ Cost savings
  ─ Operating expense
    ─ Not a capital expense
The Advantages of Cloud Computing (2)
▪ Near-infinite scalability
  ─ Test and develop with a subset of resources
    ─ Grow as needed
  ─ Cost is more commonly the limit
▪ Worldwide availability
▪ Focus on your applications and clusters
  ─ Not infrastructure
Comparing Corporate Data Center and Cloud
▪ Analogy: Using a car
Modality   Similar to    Details
Owning     Data center   Requires large up-front investment
Renting    Cloud         On-demand; pay only for what you use
Leasing    Cloud         Longer-term commitment; take advantage of discounts
Amazon Web Services (AWS)
▪ Prelude started in early 2000s
  ─ Work began on merchant.com
  ─ Intended for use by other retailers
▪ Realization
  ─ Better service decoupling needed
  ─ Vision paper published in 2003
▪ Officially launched in 2006
  ─ Storage was the first service
    ─ Simple Storage Service (S3)
Amazon Web Services (AWS)
▪ Largest by market share
  ─ Competitors catching up
▪ Provides infrastructure as a service
  ─ Compute, storage, networking, databases, security, and related services
▪ Set up and run clusters
  ─ In the cloud
Essential Points
▪ Amazon Web Services (AWS)
  ─ First commercial cloud
  ─ Largest by market share
▪ Concepts required
  ─ Cloudera products and services
  ─ General introduction to the cloud
▪ Infrastructure
  ─ Required for processing large amounts of data
  ─ Data center
    ─ Potential drawbacks
▪ Cloud provides an alternative
  ─ Flexible, cost-effective, and near-infinitely scalable
Chapter 3: Getting Started with the Cloud
Chapter Topics
Getting Started with the Cloud
▪ Getting Started with AWS
▪ AWS Management Console
▪ AWS Account and Resource Identifiers
▪ AWS Command Line Interface (CLI)
▪ Hands-On Exercise: Accessing the Amazon Cloud
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ How to create an AWS account
▪ How to access the AWS management console
▪ How to obtain the unique AWS identifiers for your account
▪ How to work with the AWS API via the command line
Getting Started with AWS
▪ Begin by creating a primary AWS account
  ─ Also known as a root account
▪ This account has full administrative privileges
  ─ Access all services
  ─ Account and administrative tasks
  ─ Create resources and additional accounts
Creating the Primary Account
▪ Create an AWS account
  ─ https://aws.amazon.com/
▪ Click Create an AWS Account
▪ Provide account information
  ─ Email
  ─ Password
  ─ Account name
Account Information
▪ Contact information
▪ Phone number
  ─ Reachable immediately
▪ Payment method
  ─ Credit card, debit card, EFT, ACH, or SEPA (Europe)
▪ Support plan
  ─ Free or paid
▪ Activate the account
AWS Management Console
▪ Web application
  ─ Manage AWS services
  ─ Account information
    ─ Billing
▪ Starting point
  ─ Work with resources
  ─ Account-related tasks
Management Console Home Screen
▪ Menu bar
  ─ Customizable
  ─ Links of interest
▪ Search area
  ─ Find services
▪ List of all services
  ─ Divided by type
▪ Additional resources
  ─ Simple wizards and automated workflows
[Screenshot: AWS Management Console]
A Service from the Management Console
▪ Elastic Compute Cloud (EC2) as example
  ─ Virtual machines in the cloud
▪ Left menu
  ─ Navigation pane
▪ Resources in use
  ─ Instances in the current region
▪ Launch instance button
▪ Health information
▪ Additional information
[Screenshot: A Service from the Management Console]
Account Identifiers
▪ Each account has two unique identifiers (IDs)
  ─ Account ID
    ─ Used in Amazon Resource Names (ARN)
  ─ Canonical User ID
    ─ Storage services (S3)
▪ Available
  ─ My Account
  ─ My Security Credentials
Amazon Resource Names (ARN)
▪ Uniquely identify resources across AWS
▪ A sample ARN format
arn:partition:service:region:account-id:resource-id
▪ Components and values vary by service
  ─ Paths allowed
▪ The ARN for a storage resource
arn:aws:s3:::cloudfundamentals-dl
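
To make the colon-separated ARN format above concrete, here is a minimal sketch that splits an ARN into the fields named in the sample format. The `Arn` type and `parse_arn` function are hypothetical helpers written for this illustration, not part of any AWS SDK. As the S3 example shows, the region and account ID fields can be empty for some services.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    """Fields of arn:partition:service:region:account-id:resource-id."""
    partition: str
    service: str
    region: str
    account_id: str
    resource_id: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN string into its components (illustrative helper)."""
    # Split on the first five colons only: the resource part may itself
    # contain colons or slash-separated paths.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


if __name__ == "__main__":
    print(parse_arn("arn:aws:s3:::cloudfundamentals-dl"))
```

For the storage ARN from the slide, the parsed partition is `aws`, the service is `s3`, the region and account ID are empty strings, and the resource ID is `cloudfundamentals-dl`.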
[Screenshot: Account ID]
[Screenshot: Canonical User ID]
AWS Command Line Interface (CLI)
▪ Unified tool
  ─ Manage all AWS services
  ─ Command-line
▪ Text commands
  ─ Ability to script
  ─ Share and version
▪ Requires credentials
AWS Command Line Interface (CLI)
▪ Available
  ─ Unix-like shells
    ─ Linux, macOS, or Unix
  ─ Windows
    ─ CMD and PowerShell
  ─ Remotely
    ─ PuTTY, SSH, or AWS Systems Manager
▪ Written in Python
▪ Open-source
  ─ https://github.com/aws/aws-cli
Command Line Interface (CLI)
▪ Install the CLI
▪ Execute and provide the necessary parameters
$ ./aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
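
The usage line above fixes the order of the pieces: options, then command, then subcommand, then parameters. As a rough sketch of that grammar, the hypothetical helper `build_aws_command` below assembles an argument list in that order, which a script could then pass to `subprocess.run`. The helper is an assumption made for illustration and is not part of the AWS CLI. It does not run anything and does not check that the command exists.

```python
import shlex


def build_aws_command(command, subcommand, options=None, parameters=None):
    """Assemble argv as: aws [options] <command> <subcommand> [parameters].

    Illustrative helper only; it performs no validation against the CLI.
    """
    argv = ["aws"]
    # Global options come before the command.
    for name, value in (options or {}).items():
        argv += [f"--{name}", str(value)]
    argv += [command, subcommand]
    # Command-specific parameters come after the subcommand.
    for name, value in (parameters or {}).items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    argv = build_aws_command("ec2", "describe-instances",
                             options={"region": "us-east-1"})
    # shlex.join renders the list as a shell-safe command line.
    print(shlex.join(argv))
```

Building the command as a list rather than a string avoids quoting problems when a script later executes it.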
Hands-On Exercise: Accessing the Amazon Cloud
▪ In this exercise, you will create an account in AWS, access the management console, and explore a service
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points
▪ Start by creating an account
  ─ Provide requested information
  ─ Activation required
▪ Access via the management console
  ─ Web application
  ─ Access to all services
▪ Individual screens for services
▪ Account ID and canonical user ID
▪ Amazon Resource Name (ARN)
▪ Command-line (CLI)
Chapter 4: Estimating, Managing, and Monitoring Costs
Chapter Topics
Estimating, Managing, and Monitoring Costs
▪ Cloud Economics: Understanding Costs
▪ Estimating Cost
▪ Controlling and Viewing Costs
▪ Hands-On Exercise: Estimating and Viewing Costs
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ How infrastructure pricing works in the cloud
▪ How to estimate cost of infrastructure
▪ How to manage infrastructure cost
▪ How to monitor cost
Cloud Economics: Understanding and Managing Cost
▪ Cloud changes how cost works
  ─ Pay only for what you use
    ─ Operating expense
  ─ Scale up and down as needed
▪ Costs in the cloud
  ─ Computing, storage, bandwidth, and managed services
  ─ Some services are free
▪ Important
  ─ Estimate, monitor, and control cost
Computing Costs
▪ Compute power in the cloud (virtual machines)
  ─ Amazon Elastic Compute Cloud (EC2)
  ─ Tends to be a high cost
▪ Pay for running instances
  ─ Resources
    ─ CPU, memory, and storage
  ─ Different types of instances
  ─ Turn off when not in use
    ─ Pay for storage
Amazon EC2 Payment Options
▪ Different pricing schemes
  ─ Free tier to try
  ─ Requirements and commitment
    ─ Opportunity for cost savings
▪ Instance pricing

On-demand   Pay for use, with no long-term commitment
Reserved    Significant discount, requires upfront payment and commitment
Spot        Deeper discount on spare capacity, but not guaranteed
Dedicated   Physical EC2 server dedicated for your use
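
The trade-off between on-demand and reserved pricing comes down to how many hours the instance actually runs. The sketch below compares the two for a one-year term. All rates are made-up numbers chosen for illustration, not AWS prices, and the function names are hypothetical. It assumes a reserved instance is billed for every hour of the term whether used or not.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


def on_demand_cost(hourly_rate, hours_used):
    """On-demand: pay only for the hours the instance runs."""
    return hourly_rate * hours_used


def reserved_cost(upfront, hourly_rate, months):
    """Reserved (assumed model): upfront fee plus a lower rate for the full term."""
    return upfront + hourly_rate * HOURS_PER_MONTH * months


def cheaper_option(utilization, months=12,
                   od_rate=0.10, upfront=300.0, reserved_rate=0.04):
    """Compare both options for a given utilization (0.0 to 1.0).

    The default rates are hypothetical, not real AWS prices.
    """
    hours_used = utilization * HOURS_PER_MONTH * months
    od = on_demand_cost(od_rate, hours_used)
    rs = reserved_cost(upfront, reserved_rate, months)
    return "on-demand" if od <= rs else "reserved"


if __name__ == "__main__":
    for utilization in (0.25, 0.5, 0.75, 1.0):
        print(f"{utilization:.0%} utilization -> {cheaper_option(utilization)}")
```

With these sample rates, an instance that runs only part of the time is cheaper on-demand, while one that runs continuously is cheaper reserved.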
AWS Storage Costs
▪ Types of storage
  ─ Attached to instances (virtual machines)
    ─ Instance store and block-level storage
  ─ Object storage
▪ Charges vary
  ─ Pay for what you use
    ─ Tiered, with discounts for larger utilization
  ─ Provisioned capacity
  ─ Storage medium
    ─ Magnetic (HDD) or solid state (SSD)
Costs of Various Storage Types
Type             How it is paid
Instance store   Local storage at no extra cost
                 Not available for all machines
                 Potential drawbacks
Block store      Billed by gigabyte-month (GB-month)
                 Provisioned capacity
Object store     Pay for what you use
                 Tiered pricing
                 Location
                 Access frequency
                 Stand-alone
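
Tiered pricing for object storage means each successive block of gigabytes is billed at its own rate. The following sketch shows the arithmetic. The tier sizes and prices in `EXAMPLE_TIERS` are hypothetical placeholders, not published AWS rates, and `tiered_cost` is an illustrative helper.

```python
def tiered_cost(gb_stored, tiers):
    """Monthly cost for gb_stored under tiered per-GB pricing.

    tiers is a list of (tier_size_gb, price_per_gb_month) pairs, applied
    in order; a tier_size_gb of None means the tier is unbounded.
    """
    remaining = gb_stored
    total = 0.0
    for size, price in tiers:
        if remaining <= 0:
            break
        # Bill only the portion of the data that falls inside this tier.
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * price
        remaining -= in_tier
    return total


# Hypothetical tiers: first 50,000 GB, next 450,000 GB, then everything else.
EXAMPLE_TIERS = [(50_000, 0.023), (450_000, 0.022), (None, 0.021)]

if __name__ == "__main__":
    for gb in (1_000, 60_000, 600_000):
        print(f"{gb} GB -> {tiered_cost(gb, EXAMPLE_TIERS):.2f} per month")
```

Block storage billed by provisioned capacity is simpler: the provisioned gigabytes multiplied by a single GB-month rate, regardless of how much of the volume is filled.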
Bandwidth Costs
▪ Free
  ─ Inbound
  ─ Between virtual machines
    ─ Within the same geographical area (availability zone)
▪ Associated cost
  ─ Outbound data
  ─ Between AWS geographic regions
Managed Services Costs
▪ Pay for a managed service
  ─ Infrastructure
  ─ Software
▪ License
  ─ Bring your own license (BYOL)
  ─ Included in the cost
▪ Examples
  ─ Relational Database Service (RDS)
  ─ DynamoDB
Other Costs
▪ Many different types of costs
  ─ VPN connections
  ─ Metrics
  ─ Requests
  ─ Queries
▪ High granularity
▪ Inspect costs of each service carefully
Estimating Cost
▪ Unlimited budget?
  ─ Not a good practice
▪ Estimate costs upfront
  ─ Based on your known requirements
    ─ Projections
  ─ Fees may vary based on usage
▪ Tools available from AWS to estimate cost
  ─ Simple Monthly Calculator
    ─ Legacy
  ─ Pricing Calculator
▪ Estimate, save, export, and share
AWS Simple Monthly Calculator
▪ Select one service
  ─ Configuration
  ─ Commitment
    ─ Not available for all services
▪ Add all other services
▪ Get an estimate
[Screenshot: AWS Simple Monthly Calculator]
AWS Pricing Calculator
▪ Estimate prices in two ways
  ─ Quick estimate
  ─ Advanced estimate
▪ Steps to follow
  ─ Select and configure each service
  ─ Add other services
  ─ Get an estimated cost of the infrastructure in the cloud
[Screenshot: AWS Pricing Calculator]
Controlling and Viewing Cost
▪ Cost management tools
─ Explore cost
─ Set budgets
─ Create alarms
─ Reports
─ Account billing-related
AWS Cost Explorer
▪ View, understand, and manage your costs
─ Time intervals
─ Filter and drill down
▪ Using the Cost Explorer tool is free
─ Charge associated with programmatic calls (API)
▪ Reports
─ Save and share
▪ Recommendations
Budgets
▪ Monitor cost
▪ Create custom budgets
─ Alerts
─ Notified when exceeded
▪ Types of budgets
─ Cost
─ Usage
─ Reservation
─ Savings plan
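The alerting idea behind a cost budget can be sketched as a threshold check. The function name and the 80% / 100% thresholds below are illustrative assumptions, not defaults of the AWS Budgets service.

```python
def budget_alerts(budget, actual_spend, thresholds=(0.8, 1.0)):
    """Return the thresholds (fractions of the budget) that spending has reached."""
    return [t for t in thresholds if actual_spend >= budget * t]


# With a budget of 100, spending 85 crosses only the 80% threshold.
print(budget_alerts(100, 85))
```

An empty list means no notification would be sent under these assumed thresholds.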
Hands-On Exercise: Estimating and Viewing Costs
▪ In this exercise, you will estimate the cost of infrastructure in the cloud, view costs, and create a budget
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points
▪ Cost in the cloud
─ Operating expense
▪ Pay for
─ Computing, storage, and bandwidth
─ Managed services
▪ Estimate
─ Simple Monthly Calculator
─ Pricing Calculator
▪ Monitor and control cost
─ Cost Explorer
─ Budgets
Chapter 5: Understanding Cloud Security: Amazon Web Services
Chapter Topics
Understanding Cloud Security: Amazon Web Services
▪ Security in the Cloud
▪ Security Credentials
▪ Permissions and Policies
▪ Identity and Access Management (IAM)
▪ AWS Access Keys
▪ Amazon EC2 Key Pairs
▪ AWS Key Management Service (KMS)
▪ Hands-On Exercise: Securing the Amazon Cloud
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ The steps required to secure your environment in the cloud
▪ Which actions your security credentials can perform
▪ How to provide permissions using policies
▪ How to use identity and access management (IAM) to provide limited permissions
▪ How to provide programmatic access using access keys
▪ How to create key-pairs to access virtual machines
Security in the Cloud
▪ Cloud data security is a top concern
─ Highest priority at AWS
─ High priority at your company too
▪ Shared responsibility model
─ AWS protects the infrastructure
  ─ Security of the cloud
─ Customer security responsibilities
  ─ Security in the cloud
Security Topics
▪ Security Credentials
▪ Permissions and Policies
▪ Identity and Access Management (IAM)
─ Users, groups, and roles
▪ Access Keys
▪ Key Pairs
Security Credentials
▪ Proof of identity
▪ Who you are
─ Authentication
▪ Whether you have permission
─ Authorization
▪ Some actions do not require security credentials
─ Public access
Credential Types
▪ Email and password
─ Access via management console
─ Registration
  ─ Root user credentials
▪ Access keys
─ Programmatic access
─ Command-line
Root Credentials
▪ Required for all AWS accounts
▪ Provide full access
─ Do not share them
─ Access cannot be limited
▪ Tasks that require root credentials
─ Change support plan
─ Billing and cost management
─ Restoring user permissions
─ Close account
─ Other configuration settings
Root Credentials Recommendations
▪ Do not use for everyday tasks
▪ Enable multi-factor authentication (MFA)
─ Virtual device, U2F security key, hardware device, or SMS text message
▪ Delete root access keys
─ Disable
▪ Recommendation
─ Use identity and access management (IAM) instead
Permissions and Policies
▪ Who or what has access to which resources
▪ Permission
─ Specify access
  ─ Allow or deny
▪ Policy
─ Defines set of permissions
─ Associates
  ─ An identity, like ClouderaJoe
  ─ A resource, like EC2
Policy Types
▪ Identity-based policies
▪ Resource-based policies
▪ Permission boundaries
▪ Service control policy (SCP)
▪ Access control lists
▪ Session policies
Comparing Identity-based and Resource-based Policies
Identity-based              Resource-based
ClouderaJoe                 Resource X
▪ Can List, Read            ▪ ClouderaJoe
▪ On resource X             ▪ List, Read
ClouderaJane                Resource X
▪ Can Write                 ▪ ClouderaJane
▪ On resource X, Y          ▪ Denied access
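How the two policy types combine can be illustrated with a toy evaluator. This is a heavily simplified sketch that assumes a single account and only two rules: an explicit deny always wins, and anything not explicitly allowed is denied. The statement layout and the `is_allowed` helper are hypothetical; real AWS evaluation also considers permission boundaries, SCPs, and session policies.

```python
ALL_ACTIONS = {"List", "Read", "Write"}

# Statements mirroring the comparison above (identity-based and resource-based).
statements = [
    {"effect": "Allow", "principal": "ClouderaJoe", "actions": {"List", "Read"}, "resources": {"X"}},
    {"effect": "Allow", "principal": "ClouderaJane", "actions": {"Write"}, "resources": {"X", "Y"}},
    {"effect": "Deny", "principal": "ClouderaJane", "actions": ALL_ACTIONS, "resources": {"X"}},
]


def is_allowed(statements, principal, action, resource):
    """Explicit deny overrides allow; no matching allow means implicit deny."""
    matching = [
        s for s in statements
        if s["principal"] == principal
        and action in s["actions"]
        and resource in s["resources"]
    ]
    if any(s["effect"] == "Deny" for s in matching):
        return False
    return any(s["effect"] == "Allow" for s in matching)


print(is_allowed(statements, "ClouderaJane", "Write", "X"))  # denied on X
print(is_allowed(statements, "ClouderaJane", "Write", "Y"))  # still allowed on Y
```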
Create Policy Screen
▪ Two ways to create a policy in AWS
─ Visual editor
─ JSON
▪ Steps in the visual editor
─ Select a service, actions, and resources
─ Request conditions
─ Additional permissions
─ Name and description
─ Review summary
JSON Policy
▪ Policy specified as a JSON text
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:ListJobs"
      ],
      "Resource": "*"
    }
  ]
}
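For readers who prefer to build the policy document programmatically, here is a minimal sketch using only the Python standard library. The `find_problems` helper is hypothetical and checks only the fields shown on the slide; it is not a substitute for the validation AWS performs.

```python
import json

# The same policy as above, expressed as a Python dictionary.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": ["s3:ListAllMyBuckets", "s3:ListJobs"],
            "Resource": "*",
        }
    ],
}


def find_problems(doc):
    """Return a list of structural problems (empty if none were found)."""
    problems = []
    if "Version" not in doc:
        problems.append("missing Version")
    for i, stmt in enumerate(doc.get("Statement", [])):
        if stmt.get("Effect") not in ("Allow", "Deny"):
            problems.append(f"statement {i}: Effect must be Allow or Deny")
        for key in ("Action", "Resource"):
            if key not in stmt:
                problems.append(f"statement {i}: missing {key}")
    return problems


print(json.dumps(policy, indent=2))  # text suitable for the JSON tab
print(find_problems(policy))
```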
Identity and Access Management (IAM)
▪ Service to securely control access
─ Resources
─ Use instead of root credentials
  ─ Root user creates IAM identities
▪ Manage access
─ Create policies
─ Attach to an IAM identity
  ─ IAM user
  ─ IAM role
  ─ IAM group
IAM User
▪ Represents a person or an AWS service
─ Interact with AWS
  ─ Management console
  ─ Programmatic requests
▪ Grant permission
─ Add to a group
  ─ Permission policies attached
▪ Clone
─ Member of same group
─ Same policies
IAM Groups
▪ Collection of IAM users
─ Specify permissions
▪ Associates policies with users in the group
─ Multiple users at a time
▪ Simplifies user management
─ Add and remove users from a group
─ Provide or take away permissions
IAM Roles
▪ Entity that defines a set of permissions for making service requests
─ Identity with permission policies
─ Specify access to resources
▪ Does not have any credentials
─ Password or access keys
▪ Assumed by anyone who needs the role
─ Take on permissions temporarily
  ─ To complete a task
Cross-Account Role
▪ Grant access to your resources
─ To another AWS account
▪ Create role
▪ Specify account
─ Account ID
─ External ID
▪ Create policy
─ Attach permissions
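Specifying the account ID and external ID corresponds to the role's trust policy. The sketch below builds one as a Python dictionary; the `trust_policy` helper, the account ID `123456789012`, and the external ID value are placeholders chosen for illustration.

```python
import json


def trust_policy(account_id, external_id):
    """Build a trust policy letting another account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The trusted (external) AWS account
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present the agreed external ID
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


print(json.dumps(trust_policy("123456789012", "example-external-id"), indent=2))
```

The permissions the role grants are defined separately, in the permission policy attached to the role.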
AWS Access Keys
▪ Programmatic access and command-line
─ Access Key ID
─ Secret Access Key
▪ Used together
─ Like a username and password
─ CLI and API calls
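One way to picture how the two values work together: the access key ID identifies the caller, while the secret access key signs the request and is never sent. The sketch below shows that idea with a plain HMAC. It is a deliberate simplification and not the actual AWS Signature Version 4 algorithm; the key values and function names are made up.

```python
import hashlib
import hmac

ACCESS_KEY_ID = "EXAMPLE-KEY-ID"      # placeholder, sent with the request
SECRET_ACCESS_KEY = "EXAMPLE-SECRET"  # placeholder, kept private


def sign_request(secret_access_key, message):
    """Compute an HMAC-SHA256 signature of the request text."""
    return hmac.new(
        secret_access_key.encode(), message.encode(), hashlib.sha256
    ).hexdigest()


def verify_request(secret_access_key, message, signature):
    """Recompute the signature on the receiving side and compare safely."""
    expected = sign_request(secret_access_key, message)
    return hmac.compare_digest(expected, signature)


request = "GET /my-bucket"
signature = sign_request(SECRET_ACCESS_KEY, request)
print(ACCESS_KEY_ID, signature)
```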
▪ Safeguard them
─ Used to create resources
  ─ Which you will have to pay for, even if the keys are stolen
─ Delete if you suspect keys have been stolen
▪ Limit
─ Two keys per IAM user
▪ Status
─ Newly created
  ─ Active
─ Deactivated
  ─ If no longer in use
▪ Access key age
─ Rotated for security
▪ Last activity
An Example Access Key
Access Key ID Secret Access Key
AKIAIOSFODNN7EEX4MPSR wJalrXUtnFEMI/K7MDENG/bPxRfiCYE4AMfL34EY
Amazon EC2 Key Pairs
▪ EC2 uses public-key cryptography
▪ Encrypt and decrypt login information
─ Securely access EC2 instances
  ─ Without a password
  ─ Secure Shell (SSH)
▪ Key pair
─ Public key
─ Private key
▪ Not related to access keys
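The relationship between the two halves of a key pair can be shown with textbook RSA on tiny numbers: the private key produces a signature and the public key checks it. This is a teaching sketch only. The primes are far too small to be secure, there is no hashing or padding, and it is not how SSH or EC2 implement key handling.

```python
# Textbook RSA parameters (classic small example values).
p, q = 61, 53
n = p * q   # 3233, shared by both keys
e = 17      # public exponent  -> public key is (n, e)
d = 2753    # private exponent -> private key is (n, d); (e * d) % 3120 == 1


def sign(message, private_exponent=d):
    """Sign a small integer message with the private key."""
    return pow(message, private_exponent, n)


def verify(message, signature, public_exponent=e):
    """Check a signature using only the public key."""
    return pow(signature, public_exponent, n) == message


challenge = 65
signature = sign(challenge)
print(verify(challenge, signature))
```

Only the holder of the private exponent can produce a signature that passes `verify`, which is why the private key file must be kept secret.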
Public Key
▪ Validates the digital signature
▪ Sample public key
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQClKsfkNkuSevGj3eYhCe53pcjqP3maAhDFcvBS7O6Vhz2ItxCih+PnDSUaw+WNQn/mZphTk/a/gU8jEzoOWbkM4yxyb/wB96xbiFveSFJuOp/d6RJhJOI0iBXrlsLnBItntckiJ7FbtxJMXLvvwJryDUilBMTjYtwB+QhYXUMOzce5Pjz5/i8SeJtjnV3iAoG/cQk+0FzZqaeJAAHco+CY/5WrUBkrHmFJr6HcXkvJdWPkYQS3xqC0+FmUZofz221CBt5IMucxXPkX4rWi+z7wB3RbBQoQzd8v7yeb7OzlPnWOyN0qFU0XA246RA8QFYiCNYwI3f05p6KLxEXAMPLE my-key-pair
* Modified for display
Private Key
▪ Creates the digital signature
▪ Sample private key
-----BEGIN RSA PRIVATE KEY-----MIIEowIBAAKCAQEAh1PkSPTC5xnpi6fgU9Wz+mQy4HcXM96f9Vxj4gaEaOCqao5L1gwYEHLSBvXeMG9Ja1rBPqdR8hFM9tHDJEy8A22OReXuRays8NfaTRUdzFGMkJX7wCG+qgSOg6yIwgJVsdF4Y3eUH9cPewRk3UMr21NBKayhLVKc3PHz5/XlsXbmCA27wETaHnlF1i/WZHaxUc0YsuRzE8qMyMUATllUITgJTkoGYsu8XC/qocou3v0NQAWM/nGdyhaoFWO/haGklE06RgZf6G9UswlsLttI/+wfpVUKYFiCOS1fcKnjixSEky6mYrUqwo10bq5L8+tP3hOj4Uki7jlA5CkRQSyrwIDAQABAoIBAB3Go7A5yriyYa4r9LcuHMNSnGG41fwcGKvjdDAWZxCz3iRA1Sfa+NeMF9eMj0vwmx+4hh5NkxS2kmVvOFNkLSw+EDdD6HQZi3N75q9VSYurGQKBgFElLTssHXNSCUeecAcGC+0VElj2eY+oZytGhWQr3lU98e00w2zlwNJnHjZxw4AIaHrvss6YldVmDbatscBJ51rLmXDuyoVpTHOroMp3RMoRFYbHs8iq080zuZ7sUXVXSekeFj1LxDrVN0fOCguHQL0OKFmH5hgWWTT90pqX9pRAoGBALi1/fnyl5CcJPBiBli6mfMELdrO9c6qR7YqLF81csSPIF2IQDBO0xdykvD9h9p76j/MpSkgHqHMI+CXTdcbepbyvCSQ514cxTtJROq2W5l0pWncInmfu198hSCX+g2Us6/yyKq888q3SLLQvjbzPGyy2uBjLV5hk8lBz-----END RSA PRIVATE KEY-----
* Modified for display
Creating a Key Pair
▪ Several ways to create a key pair
─ From the console
  ─ EC2 navigation pane
─ Using the CLI
─ A key pair can be imported into AWS
▪ Download immediately after creation
─ PEM file
─ Cannot retrieve afterwards
AWS Key Management Service (KMS)
▪ Centralized location to manage encryption keys
─ Create, describe, list, enable, disable, and delete master keys
─ Customer or AWS managed keys
▪ Control access to your data
─ Encrypt and decrypt data stored in AWS
  ─ Managed encryption
─ Audit usage
▪ Integrated with multiple AWS services
Hands-On Exercise: Securing the Amazon Cloud
▪ In this exercise, you will secure your cloud account using security credentials, permission policies, IAM identities, access keys, and key pairs
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points (1)
▪ Credentials are the proof of identity
─ Authentication
─ Username/password or access keys
▪ Policy defines set of permissions
─ Access to resources
─ Allow or deny
▪ IAM to control access
─ Resources
─ Users, groups, and roles
  ─ Cross-account role
Essential Points (2)
▪ Access keys used for programmatic access
─ Access key ID and secret access key
▪ Key pairs to log in to EC2 instances
─ Public and private key
▪ KMS to create and control encryption keys
Chapter 6: Regions and Availability Zones
Chapter Topics
Regions and Availability Zones
▪ Picking a Location: Regions and Availability Zones
▪ Hands-On Exercise: Working with Regions and AZs
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ How AWS provides worldwide coverage and resiliency using regions and availability zones
▪ How to select and change regions
▪ How to check the health of services within a region
Picking a Location for Your Cloud Infrastructure
▪ Geographic location is an important consideration
─ Having clusters close to your customers has many advantages
  ─ Performance, compliance, disaster recovery, and more
▪ Implementation of local clusters is difficult
─ Multiple corporate data centers
  ─ Longer planning process
  ─ Lengthier provisioning cycles
▪ The cloud provides a better way
AWS Global Infrastructure
▪ The cloud brings your clusters closer to your customers
─ Multiple locations available worldwide
─ Known as regions
* Source: AWS documentation
Region
▪ A region is a separate geographical area
─ Northern Virginia, Oregon, Ireland, Tokyo...
─ Most regions are readily available, but some require opt-in
▪ Each region hosts a collection of resources
─ Independent from resources in other regions
─ Resources in one region do not exist in other regions
  ─ There are some exceptions, like IAM
▪ View and work with resources per region
▪ A region consists of one or more availability zones
Availability Zones (AZ)
▪ Isolated location within a region
─ Provides maximum resiliency
─ One or more AZs per region
▪ Each AZ belongs to a single region
─ Made up of multiple data centers, typically three
▪ Relationship between regions and availability zones
Regions and Availability Zones
▪ Network traffic among regions uses the AWS global network backbone
─ Cost associated with data transfer
▪ AZs within a region are connected via low-latency links
─ Data transfer is free within an AZ
  ─ Using private IP
─ Not free between AZs
Region and Availability Zone Identifiers
▪ Regions are represented using an identifier
─ Northern Virginia is us-east-1 and Ohio is us-east-2
─ Ireland is eu-west-1
▪ AZ uses the region code followed by a letter identifier
─ Zone a for Northern Virginia is us-east-1a
▪ AZ naming varies by account for even distribution and load
─ us-east-1a in one account may be us-east-1b in another
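The naming scheme above can be sketched as a small parser that splits an availability zone name into its region code and zone letter. The `split_az` function is hypothetical and handles only the standard pattern shown on this slide.

```python
def split_az(az_name):
    """Split an AZ name such as 'us-east-1a' into (region, zone letter)."""
    region, letter = az_name[:-1], az_name[-1]
    if not letter.isalpha() or not region or not region[-1].isdigit():
        raise ValueError(f"not a standard availability zone name: {az_name!r}")
    return region, letter


print(split_az("us-east-1a"))
print(split_az("eu-west-1c"))
```

Because the letter-to-location mapping varies by account, the letter identifies a zone only within one account.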
Creating Resources in a Region and Availability Zone
▪ Creating resources in a specific AZ may be restricted if capacity is low
─ Priority given when you already have resources in the AZ
─ Possible to create resources in a different AZ
▪ Pick the region that works best for you
─ Cloudera products and services do not support all regions
Hands-On Exercise: Working with Regions and AZs
▪ In this exercise, you will identify your current region, change region, and check the health of services per region
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points
▪ Being close to your customers has many advantages
─ A potential challenge with corporate data centers
▪ The cloud allows you to deploy resources worldwide
─ In different geographic locations, called regions
▪ Each region consists of one or more availability zones
─ Isolated, for resiliency
▪ Pick the region that works best for your scenario
▪ Create resources in a supported region
Chapter 7: Networking
Chapter Topics
Networking
▪ Private Networks in the Cloud: VPCs and Subnets
▪ External Networking: Route 53, Elastic IPs, and ELB
▪ Hands-On Exercise: Configuring Your Network
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ How to create your private network in AWS using a VPC
▪ How to segment your VPC using subnets
▪ How AWS provides friendly names to IP addresses
▪ How to obtain static IP addresses
▪ How AWS balances load
Your Private Network in AWS
▪ Applications co-exist in the cloud
─ Isolation is important
  ─ Security concerns
▪ Achieved with
─ Virtual Private Cloud (VPC)
─ Subnets
Amazon Virtual Private Cloud (VPC)
▪ Virtual network
─ Dedicated to your account
─ Logically isolated from other VPCs
  ─ Even within your own account
─ Spans all availability zones in a region
▪ Similar to an on-premises network
─ More automation and scale
▪ Default VPC per region
─ Create your own
▪ Launch resources into a VPC
─ Machines, databases, and storage
▪ IP address associated to resources
─ Range specified using CIDR block
▪ Secure resources
─ Network ACLs
─ Security groups
▪ Segmented into one or more subnets
Subnets
▪ Logical subdivision of a VPC
─ Each subnet resides in one availability zone
  ─ Cannot span zones
▪ Specify non-overlapping IP ranges
─ Within the VPC
▪ Two types of subnets
─ Private, without internet access
─ Public, with access to the internet
▪ Different scenarios
─ Combinations of private and public subnets
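Carving a VPC's CIDR block into non-overlapping subnet ranges can be sketched with Python's standard `ipaddress` module. The `10.0.0.0/16` block, the `/24` subnet size, and the zone names are illustrative assumptions.

```python
import ipaddress
from itertools import islice

vpc = ipaddress.ip_network("10.0.0.0/16")  # assumed VPC CIDR block

# Take the first three /24 ranges, one per (assumed) availability zone.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
subnets = list(islice(vpc.subnets(new_prefix=24), len(zones)))

for zone, subnet in zip(zones, subnets):
    print(zone, subnet, subnet.num_addresses)


def overlaps_any(candidate, existing):
    """True if the candidate range collides with any existing subnet."""
    return any(candidate.overlaps(s) for s in existing)


# A range inside 10.0.1.0/24 would be rejected; 10.0.3.0/24 is still free.
print(overlaps_any(ipaddress.ip_network("10.0.1.128/25"), subnets))
print(overlaps_any(ipaddress.ip_network("10.0.3.0/24"), subnets))
```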
External Networking
▪ Location on the internet
─ Represented as an IP address
  ─ 13.57.68.67 points to a web site
─ Convenient for machines
▪ Problem for humans
─ Memorizing IP addresses is not practical
Domain Name System (DNS)
▪ Hierarchical and decentralized naming system
─ Computers, services, and other resources
─ Connected to the internet
▪ Phonebook of the internet
▪ Translates a host name to an IP address
─ www.cloudera.com corresponds to 13.57.68.67
─ Human-readable names are useful for services
  ─ Example: my-ml-cluster.services.cloudera.com
Amazon Route 53
▪ AWS DNS service
─ Designed to work with other AWS services
─ Highly available and scalable
▪ Three functions
─ Domain registration
─ DNS routing
─ Health checking
Amazon Route 53 Routing and Health Checking
▪ Route end-users
─ Internet-facing applications and services
─ Infrastructure running in AWS
▪ Policy routing
─ Direct traffic based on rules
▪ Health checks for DNS failover
─ CloudWatch metrics
▪ Balance load
─ High-demand and fault-tolerance
Amazon Elastic Load Balancer (ELB)
▪ Load-balancing service─ Even distribution of load─ Applications and services
▪ Automatically distributes incoming traffic─ Across multiple resources and services
▪ Fault tolerance─ Across availability zones
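The idea of distributing traffic evenly while skipping failed targets can be sketched in a few lines. Round robin is used here purely for illustration; it is not a statement of the algorithm ELB itself applies. The target names, zones, and the `distribute` function are hypothetical.

```python
from itertools import cycle

# Hypothetical targets: (instance name, availability zone, healthy?)
TARGETS = [
    ("web-1", "us-east-2a", True),
    ("web-2", "us-east-2b", True),
    ("web-3", "us-east-2c", False),  # failed instance, receives no traffic
]


def distribute(request_count: int) -> dict:
    """Hand out requests round-robin to the healthy targets only."""
    healthy = [name for name, _zone, ok in TARGETS if ok]
    if not healthy:
        raise RuntimeError("no healthy targets")
    counts = {name: 0 for name in healthy}
    for _, name in zip(range(request_count), cycle(healthy)):
        counts[name] += 1
    return counts


print(distribute(10))  # {'web-1': 5, 'web-2': 5}
```

Because the healthy targets sit in different availability zones, losing one zone removes only some of the targets from the rotation.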
Types of Load Balancers
▪ Application load balancer─ HTTP and HTTPS
▪ Network load balancer─ TCP, UDP, and TLS
▪ Classic load balancer─ Legacy applications
Load Balancer
Elastic IPs (EIP)
▪ Static IPv4 address─ Reachable from the internet
▪ Associated with your account─ Allocate
▪ Map to an instance─ Remap to mask failure
▪ Free as long as they are in use (at the time of writing; AWS has charged for all public IPv4 addresses since February 2024)
Elastic IPs
Hands-On Exercise: Configuring Your Network
▪ In this exercise, you will create a virtual private cloud (VPC) and create subnets within it
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points
▪ Applications co-exist in the cloud─ Isolate using a VPC and subnets
▪ Memorizing IP addresses is not practical─ DNS maps human-readable host names to IP addresses
▪ Route 53─ Policy routing, health checks, and load balancing
▪ Elastic Load Balancer (ELB)─ Even distribution of load─ Fault tolerance
Computing Power in AWS
Chapter 8
Chapter Topics
Computing Power in AWS
▪ Computing Power with Elastic Compute Cloud (EC2)
▪ Running Containers with the Elastic Kubernetes Service (EKS)
▪ Monitoring EC2 Instances with CloudWatch Logs
▪ Hands-On Exercise: Launching EC2 Instances
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ Which types of virtual machines are available in AWS
▪ How to launch and configure a virtual machine
▪ How to manage a virtual machine
▪ The similarities and differences between virtual machines and containers
Virtual Machine
▪ Emulation of a computer system─ Managed by a hypervisor
▪ Virtual machine inside a physical machine (host)
▪ Uses physical host resources─ CPU, memory, and storage
▪ Full copy of the operating system
▪ Multiple machines per host
Compute Power in the Cloud
▪ Virtual machines
▪ Used for applications and services─ Process large amounts of data
▪ Massively parallel processing engines─ Apache Impala or Apache Spark─ Tens, hundreds, or even thousands of machines
Amazon Elastic Compute Cloud (EC2)
▪ Scalable computing capacity─ Virtual machines
─ When you need them, in minutes
▪ Create instances─ Different ways to launch an instance
▪ Execute jobs or process data─ Multiple scenarios
▪ Deprovision when no longer required
An Instance in the Cloud
▪ Instance lifecycle─ Create, register, launch, deregister, and copy
▪ Instance state─ Pending, running, shutting-down, terminated, stopping, or stopped
Amazon Machine Images (AMI)
▪ Multiple images available─ Known as an AMI─ Different configurations and operating systems
─ Red Hat, CentOS, Windows, and more
▪ Preconfigured services and applications available
Instance Type Families
▪ Instances are grouped together─ Based on their purpose─ Known as families
▪ Varying combinations of CPU, memory, storage, and networking capacity─ 1 to 96 vCPUs─ 2 to 488 GiB RAM─ Other configurations
Instance Type Families
▪ General purpose
▪ Compute optimized
▪ Memory optimized
▪ Accelerated computing
▪ Storage optimized
General Purpose Family
▪ Several types of instances─ A1, T3, T3a, T2, M5, M5a, M5n, and M4
▪ T3 is the next generation burstable instance type─ Baseline level CPU performance─ Ability to increase CPU performance (burst)
─ Short periods of time─ Balance of compute, memory, and network resources
Available Sizes for the T3 Instance Type
Instance    vCPU  CPU Credits/hour  Mem (GiB)  Storage   Network Performance (Gbps)
t3.nano     2     6                 0.5        EBS-Only  Up to 5
t3.micro    2     12                1          EBS-Only  Up to 5
t3.small    2     24                2          EBS-Only  Up to 5
t3.medium   2     24                4          EBS-Only  Up to 5
t3.large    2     36                8          EBS-Only  Up to 5
t3.xlarge   4     96                16         EBS-Only  Up to 5
t3.2xlarge  8     192               32         EBS-Only  Up to 5
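The CPU credits column in the table above can be converted into a baseline CPU utilization per vCPU. The sketch below assumes AWS's definition of a CPU credit for burstable instances (one credit is one vCPU running at 100% for one minute), so the baseline is credits per hour divided by 60 minutes and by the number of vCPUs. The names `T3_SIZES` and `baseline_percent` are hypothetical.

```python
# (vCPUs, CPU credits earned per hour), copied from the table above.
T3_SIZES = {
    "t3.nano": (2, 6),
    "t3.micro": (2, 12),
    "t3.small": (2, 24),
    "t3.medium": (2, 24),
    "t3.large": (2, 36),
    "t3.xlarge": (4, 96),
    "t3.2xlarge": (8, 192),
}


def baseline_percent(size: str) -> float:
    """Baseline utilization per vCPU, as a percentage."""
    vcpus, credits_per_hour = T3_SIZES[size]
    # Assumption: one credit = one vCPU at 100% for one minute,
    # so an hour offers 60 * vcpus credit-minutes of full-speed CPU.
    return credits_per_hour * 100 / (60 * vcpus)


for size in T3_SIZES:
    print(f"{size:<11} {baseline_percent(size):.0f}%")
```

For example, t3.nano earns 6 credits per hour across 2 vCPUs, which works out to a 5% baseline per vCPU; sustained usage above the baseline spends accumulated credits.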
Creating An EC2 Instance
▪ Multiple steps to launch an instance manually from the console─ Step 1: Choose an Amazon Machine Image (AMI)─ Step 2: Choose an Instance Type─ Step 3: Configure Instance Details─ Step 4: Add Storage─ Step 5: Add Tags─ Step 6: Configure Security Group─ Step 7: Review Instance Launch
▪ Can launch via CLI or CloudFormation
▪ Some Cloudera products and services─ Launch instances automatically as needed
An EC2 Instance Information
▪ Instance name, ID, type, and state─ AMI ID
▪ Availability zone
▪ Key pair name and security groups
▪ VPC and subnet ID
▪ Public and private IP and DNS─ One or more network interfaces
▪ Root and block devices
▪ And more
Connecting to EC2 Instances with Secure Shell (SSH)
▪ SSH─ A network protocol─ Secure way to connect to a computer or server─ Over an unsecured network
▪ Connect to Linux instances─ Useful for system administrators
▪ Command to connect
$ chmod 400 cf-kp.pem
$ ssh -i "cf-kp.pem" root@ec2-18-225-0-136.us-east-2.compute.amazonaws.com
Containers
▪ A way of packaging software─ Code, libraries, and dependencies─ Bundled together
▪ Only the OS is virtualized─ Share resources
Comparing Containers and Virtual Machines
▪ Virtual machine provides a full abstraction of a machine─ Container provides an abstract operating system─ All containers in a host share OS kernel
▪ Host can handle larger number of containers─ Than equivalent virtual machines
▪ Several advantages of containers─ Predictable, repeatable, and immutable─ Lightweight─ Faster startup
Kubernetes (k8s)
▪ Open-source container orchestration system─ Automating application deployment, scaling, and management─ Across clusters of hosts
▪ Containers in the cloud─ Package and deploy services─ Scale up or down quickly
─ Based on demand
Amazon Elastic Kubernetes Service (EKS)
▪ AWS-managed k8s service─ Certified Kubernetes conformant
─ Works with existing tools and services
▪ Run management infrastructure across multiple AZs─ Eliminate single points of failure─ Automatically detects and replaces unhealthy nodes
▪ Secure by default
▪ Fast
Amazon CloudWatch Logs
▪ Centralize logs from all systems, applications, and services─ In a single, highly scalable service─ View events and search for specific codes or patterns
▪ Monitor, store, and access log files─ EC2 instances─ Route 53, CloudTrail, and other sources
▪ Start with free tier─ Paid tier, cost based on metrics and API calls
Hands-On Exercise: Launching EC2 Instances
▪ In this exercise, you will create, configure, and launch an EC2 instance
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points
▪ EC2 is scalable computing capacity─ Instance families and sizes─ Multiple configurations available for your needs
▪ Create machines as required─ Deprovision when no longer required
▪ Different cost structures based on commitment and needs─ On-demand, reserved, spot, and dedicated
▪ Run containers in the cloud with EKS
▪ CloudWatch Logs to monitor, store, and access EC2 log files
Protecting Your Infrastructure: Security Groups & Network ACLs
Chapter 9
Chapter Topics
Protecting Your Infrastructure: Security Groups & Network ACLs
▪ Protecting Your EC2 Instances: Security Groups
▪ Protecting Your Subnets: Network ACLs
▪ Comparing Security Groups and Network ACLs
▪ Hands-On Exercise: Setting Up Security Groups and Network ACLs
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ How to secure EC2 instances using security groups
▪ How to secure subnets using network access control lists (ACL)
▪ The differences between security groups and network ACLs
Protecting Your EC2 Instances
▪ Securing your instances is a top priority
▪ Possible with a private VPC─ Isolated from the outside world
─ Not from other instances─ Not suitable for most scenarios
▪ Need to control traffic to and from an EC2 instance
▪ Firewall─ Possible to configure per instance
─ Challenges
Security Groups
▪ Virtual firewall─ Control traffic─ Instance level
▪ No associated cost
▪ Launch an instance─ Assign a security group
─ Custom or default security group─ Associate with a network interface
Security Group Lifecycle
▪ Create─ Within a VPC─ Name cannot start with sg-
─ Must be unique within the VPC
▪ Rules─ Add, update, and delete
▪ Describe
▪ Delete
Security Group Rules
▪ Type, protocol, port range, source or destination, and description
▪ Outbound default is to allow all traffic─ Restrict
▪ Stateful
▪ Permissive─ "Allow" rules
─ Most permissive applies─ Cannot create "deny" rules
▪ Multiple security groups on one instance─ Rules aggregated
Sample Rules For a Security Group
▪ Inbound
Type   Protocol  Port Range  Source                   Description
SSH    TCP       22          My IP 190.7.211.138/32   SSH allowed
MySQL  TCP       3306        sg-0a75536f              Manager database
▪ Outbound
Type     Protocol  Port Range  Destination                Description
All TCP  TCP       0 - 65535   Anywhere 0.0.0.0/0, ::/0   All TCP traffic
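The inbound rules in the sample can be read as an allow list: traffic is admitted if any rule matches, and dropped otherwise, because security groups have no "deny" rules. The sketch below models only that matching step. The `AllowRule` class and `is_allowed` function are hypothetical, and stateful handling of return traffic is not modeled.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AllowRule:
    protocol: str
    from_port: int
    to_port: int
    cidr: Optional[str] = None          # source given as an address range
    source_group: Optional[str] = None  # or as another security group ID


# Inbound rules from the sample table above.
INBOUND = [
    AllowRule("tcp", 22, 22, cidr="190.7.211.138/32"),
    AllowRule("tcp", 3306, 3306, source_group="sg-0a75536f"),
]


def is_allowed(rules, protocol, port, source_ip, source_groups=()):
    """True if ANY rule matches; with no match the traffic is dropped."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.protocol != protocol:
            continue
        if not rule.from_port <= port <= rule.to_port:
            continue
        if rule.cidr and addr in ipaddress.ip_network(rule.cidr):
            return True
        if rule.source_group and rule.source_group in source_groups:
            return True
    return False


print(is_allowed(INBOUND, "tcp", 22, "190.7.211.138"))  # True
print(is_allowed(INBOUND, "tcp", 22, "190.7.211.139"))  # False
```

The MySQL rule shows the second kind of source: instead of an address range, it admits traffic from any instance that belongs to the referenced security group.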
Security Group
Security Group Limits
▪ Default limits
Description                                     Limit
Security groups per network interface           5
Inbound and outbound rules per security group   60
Security groups per region                      2500
▪ Possible to request an increase
Protecting Your Subnets with Network ACLs
▪ Security groups protect EC2 instances─ May need optional extra layer of security
▪ Network access control list (ACL)─ Firewall to control traffic in and out of a subnet
▪ Each VPC comes with a default network ACL─ Allows all inbound and outbound traffic
Network ACL Basics
▪ Create custom network ACL─ Denies all inbound and outbound traffic
─ Add rules to allow desired traffic
▪ One network ACL─ Associated with one or more subnets
▪ One subnet─ Associated to only one network ACL
▪ Stateless─ Explicit rule required for inbound and outbound traffic
Network ACL Rules
▪ Separate inbound and outbound rules─ Allow or deny traffic
▪ Parts of a rule─ Number, type, protocol, port range, source or destination, allow or deny
▪ Processed in order─ From 1 to 32766─ Asterisk (*)
─ If no other rule matches
▪ Rules may affect other AWS services
Sample Network ACL Rules
▪ Inbound
#    Type         Protocol  Port Range  Source            Allow/Deny
100  HTTPS (443)  TCP       443         0.0.0.0/0         ALLOW
101  SSH (22)     TCP       22          190.7.211.138/32  ALLOW
*    ALL Traffic  ALL       ALL         0.0.0.0/0         DENY
▪ Outbound
#    Type         Protocol  Port Range    Destination  Allow/Deny
100  Custom ICMP  ICMP      Echo Request  0.0.0.0/0    DENY
101  ALL TCP      TCP       0 - 65535     0.0.0.0/0    ALLOW
*    ALL Traffic  ALL       ALL           0.0.0.0/0    DENY
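Unlike security groups, network ACL rules are processed in order by rule number, and the first matching rule decides the outcome. The sketch below models the sample inbound rules; the `evaluate` function and the tuple layout are hypothetical, and the final `*` rule is represented by the fallback return value. Because network ACLs are stateless, return traffic would need its own outbound rule, which is not modeled here.

```python
import ipaddress

# Inbound rules from the sample table: (number, protocol, port, source, action).
INBOUND = [
    (100, "tcp", 443, "0.0.0.0/0", "ALLOW"),
    (101, "tcp", 22, "190.7.211.138/32", "ALLOW"),
]


def evaluate(rules, protocol, port, source_ip):
    """Return the action of the lowest-numbered matching rule."""
    addr = ipaddress.ip_address(source_ip)
    for _number, rule_proto, rule_port, cidr, action in sorted(rules):
        if (rule_proto == protocol and rule_port == port
                and addr in ipaddress.ip_network(cidr)):
            return action  # first match wins; later rules are not consulted
    return "DENY"  # the "*" rule: applies if no other rule matches


print(evaluate(INBOUND, "tcp", 443, "8.8.8.8"))  # ALLOW
print(evaluate(INBOUND, "tcp", 22, "8.8.8.8"))   # DENY

# A lower-numbered DENY placed ahead of rule 100 overrides it for that range.
with_block = INBOUND + [(90, "tcp", 443, "203.0.113.0/24", "DENY")]
print(evaluate(with_block, "tcp", 443, "203.0.113.7"))  # DENY
```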
Network ACLs
Comparing Security Groups and Network ACLs
Security Group                    Network ACL
Instance level                    Subnet level
Allow only                        Allow or deny
Stateful                          Stateless
All rules evaluated               Rules processed in order
Applies to associated instances   Applies to all instances in the subnet
Security Groups and Network ACLs
Hands-On Exercise: Setting Up Security Groups and Network ACLs
▪ In this exercise, you will create security groups and network ACLs
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points (1)
▪ Security groups─ Virtual firewall
─ Control traffic to and from EC2 instances─ Permissive rules
─ Allow─ Type, protocol, port range, source or destination
─ Stateful
Essential Points (2)
▪ Network ACLs─ Optional extra layer of security─ Firewall at subnet level─ Rules
─ Allow or deny─ Priority, type, protocol, port range, source or destination
─ Stateless
Storing Files and Objects: Instance Store, EBS and S3
Chapter 10
Chapter Topics
Storing Files and Objects: Instance Store, EBS and S3
▪ Storing Files and Objects in the Cloud
▪ Amazon EC2 Instance Store
▪ Amazon Elastic Block Store (EBS)
▪ Amazon Elastic File System (EFS)
▪ Amazon Simple Storage Service (S3)
▪ Understanding and Comparing S3 and HDFS
▪ Hands-On Exercise: Storing Data in the Cloud
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ Which storage mechanisms AWS offers for storing data
▪ How to store data using local storage for virtual machines
▪ How to store data using block-level storage for virtual machines
▪ How to store data in an NFS file system
▪ How to store data in large-object storage
Storage in the Cloud
▪ Storage─ Key component of any system
▪ Virtual machine disks─ OS and general files─ Intermediate location for processing
▪ Data storage─ Raw and processed data
Different Types of Storage
▪ Local storage─ Disks physically attached to host machines─ Block storage
▪ Network storage
▪ Object storage─ Repository for large amounts of data
─ Data lake─ Backup and archive
The Storage Options Available in AWS
▪ Instance store or ephemeral
▪ Elastic block store (EBS)
▪ Elastic file system (EFS)
▪ Simple storage service (S3)
Amazon EC2 Instance Store
▪ Ephemeral storage
▪ Temporary block-level storage for EC2 instances─ Not supported by all instance types
▪ Disk physically attached to the host computer
▪ Ideal for temporary storage─ Buffer, cache, or for intermediate processing─ Cost included in the instance
▪ Different volume types and sizes available─ Determined by instance type
Amazon EC2 Instance Store Lifetime
▪ Attached only on launch─ Cannot detach and attach to another instance
▪ Data persists during instance lifetime─ Including reboot
▪ Data lost─ Disk failure─ Instance stopped or terminated
▪ Do not use for valuable data
Amazon Elastic Block Store (EBS)
▪ Block storage service for EC2 instances─ Highly available block level storage volumes
▪ Mount one or many volumes─ Attached to only one instance at a time
▪ Data persists beyond the instance lifetime
▪ Detach─ Attach to other instances
▪ Supports live configuration changes─ Volume size change up to petabytes of data
Amazon Elastic Block Store (EBS)
▪ Encryption─ At-rest and in-transit
▪ Snapshots─ Backup of critical workloads
▪ Replication─ Within availability zone for resiliency
▪ Volumes for different scenarios─ Price vs. performance
EBS Volume Types
▪ SSD─ Optimized for transactional workloads with many read and write operations
Type                     Use Case
General Purpose (gp2)    Price performance balance for frequent access workloads
Provisioned IOPS (io1)   Highest performance for latency-sensitive workloads
▪ HDD─ Optimized for large workloads at a lower cost
Type                         Use Case
Throughput Optimized (st1)   Low cost volume for frequently accessed workloads
Cold (sc1)                   Lowest cost for infrequently accessed workloads
Amazon Elastic File System (EFS)
▪ NFS file system─ Scalable and fully managed
▪ Network file system─ Access files over the network─ Can be mounted in an on-premises machine
▪ Storage classes─ Standard─ Infrequent access
Create File System
▪ Configure file system access─ VPC and availability zones─ Security group
▪ Configure optional settings─ Tags, lifecycle management, and encryption─ Throughput mode and performance mode
▪ Review and create
Amazon Simple Storage Service (S3)
▪ S3 is 'storage for the Internet'─ Large object repository
─ Data lake─ Text files, binary files, media, and more
▪ Object storage service─ Infinite scalability and high availability─ Secure, low latency, and cost efficient
▪ Does not need to be attached to a virtual machine
Amazon Simple Storage Service (S3)
▪ Feature set─ Focuses on simplicity and robustness
▪ Eventual data consistency (at the time of writing)─ Changes made to files on S3 may not be visible for some period of time─ S3 has provided strong read-after-write consistency since December 2020─ S3Guard
─ Feature used to address the eventual consistency problem
S3 Buckets
▪ Bucket is the base storage location─ Similar to a folder
─ Stores objects─ Subfolders
▪ Cannot nest buckets
▪ Region-specific─ Created in one region, but the name must be globally unique─ Choose the region to optimize latency, minimize cost, and meet regulatory requirements
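The "subfolders" in a bucket are a naming convention rather than real directories: S3 keeps a flat set of object keys, and a delimiter such as `/` inside the key gives the appearance of folders. The sketch below is a plain-Python model of listing one level under a prefix; it makes no call to S3, and the `list_level` function and the sample keys are hypothetical.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split keys under `prefix` into direct objects and 'subfolders'."""
    objects, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so this key sits in a "subfolder".
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(folders)


KEYS = [
    "readme.txt",
    "raw/2020/01/events.csv",
    "raw/2020/02/events.csv",
    "processed/summary.parquet",
]

print(list_level(KEYS))                 # (['readme.txt'], ['processed/', 'raw/'])
print(list_level(KEYS, prefix="raw/"))  # ([], ['raw/2020/'])
```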
S3 Objects
▪ Object is the basic storage unit
  ─ Resides in a bucket
  ─ File or collection of data
    ─ Key, version ID, value, and metadata
    ─ Subresources and access control information
▪ Store unlimited number of objects in a bucket
▪ Different tiers
  ─ Storage class
  ─ Vary in price
    ─ Depends on your use case
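The relationship between buckets, objects, and "subfolders" can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the bucket name and key below are made up, and the helper names `split_s3_uri` and `keys_under_prefix` are hypothetical. It assumes the usual `s3://bucket/key` URI form and treats a "subfolder" as nothing more than a shared key prefix.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything after the bucket name is the object key. The slashes in it
    # are ordinary characters that tools choose to display as folders.
    return parsed.netloc, parsed.path.lstrip("/")


def keys_under_prefix(keys, prefix):
    """Return the keys that appear to live inside the given 'folder'."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical bucket and key, for illustration only.
bucket, key = split_s3_uri("s3://example-bucket/logs/2020/01/events.json")
print(bucket)
print(key)
print(keys_under_prefix([key, "data/part-0000.parquet"], "logs/"))
```

Because the key is a single string, a bucket cannot contain another bucket, while any number of folder-like prefixes can appear inside one.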
Storage Class
Class                  Use Case
Standard               Frequently accessed data
Intelligent-Tiering    Long-lived data with changing or unknown access patterns
Standard-IA            Long-lived yet infrequently accessed data
One Zone-IA            Long-lived yet infrequently accessed non-critical data
Glacier                Data archival with varying retrieval options
Glacier Deep Archive   Archival of rarely used data
Reduced Redundancy     Frequently accessed but non-critical data
Amazon Simple Storage Service (S3)
Hadoop Distributed File System (HDFS)
▪ Primary data storage used by Apache Hadoop
  ─ Originally closely tied to MapReduce
  ─ Provides the storage layer for many distributed processing frameworks
    ─ MapReduce and Apache Spark
▪ Breaks data into blocks
  ─ Distributes blocks across the cluster nodes
  ─ Replicates data
    ─ Fault-tolerant
▪ "Bring compute to the data"
  ─ Parallel processing
▪ Things changed with the cloud
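The idea of breaking data into blocks and replicating them can be sketched as follows. This is a simplified illustration under stated assumptions: a 128 MiB block size and a replication factor of 3 are assumed here (both are configurable in a real cluster), the node names are made up, and the round-robin placement is a stand-in for illustration only. Real HDFS placement also takes racks and node load into account.

```python
import math

BLOCK_SIZE = 128 * 1024**2  # assumed block size in bytes (128 MiB)
REPLICATION = 3             # assumed replication factor


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold a file of file_size bytes."""
    return math.ceil(file_size / block_size)


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


one_gib = 1024**3
blocks = block_count(one_gib)
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4"])
print(blocks)        # 8 blocks for a 1 GiB file
print(placement[0])  # ['node1', 'node2', 'node3']
```

Losing any single node in this sketch still leaves at least two copies of every block, which is the fault tolerance the slide refers to.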
Comparing S3 and HDFS in the Cloud Era (1)
▪ HDFS requires a running cluster
▪ S3 is independent from compute
▪ Data lake can be shared among multiple clusters
  ─ Transient clusters
  ─ Cost-efficient
  ─ Higher number of machines
    ─ Working in parallel
Comparing S3 and HDFS in the Cloud Era (2)
HDFS                                   S3
Distributed file system                Object store
Bound by cluster available storage     Infinite storage
Higher cost                            Cost effective
Requires a running cluster             Available for transient clusters
Replication makes it fault tolerant    99.999999999% durability & 99.99% availability
Bring the compute to the data          As many instances as needed can access data
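The durability and availability figures in the table are easier to appreciate as plain arithmetic. The sketch below assumes both percentages are annual averages (durability as the chance that a given object survives the year, availability as the fraction of the year the service responds). The function names and the object count are illustrative only.

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60


def expected_objects_lost_per_year(object_count, durability_percent):
    """Average number of objects lost per year at the given durability."""
    loss_fraction = 1 - Decimal(durability_percent) / 100
    return object_count * loss_fraction


def expected_downtime_minutes_per_year(availability_percent):
    """Average minutes per year the service would be unavailable."""
    return (1 - Decimal(availability_percent) / 100) * MINUTES_PER_YEAR


# Percentages are passed as strings so Decimal keeps them exact.
lost = expected_objects_lost_per_year(10_000_000, "99.999999999")
down = expected_downtime_minutes_per_year("99.99")
print(lost)  # 0.0001 objects per year, about one object every 10,000 years
print(down)  # about 52.56 minutes per year
```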
Hands-On Exercise: Storing Data in the Cloud
▪ In this exercise, you will work with the different cloud storage services
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points (1)
▪ Different types of storage
▪ Instance store
  ─ Physically attached disks
  ─ Buffer, cache, or intermediate processing
    ─ Data can be lost in certain scenarios
▪ EBS
  ─ Highly available block-level storage
    ─ Persistent and supports live configuration changes
  ─ gp2, io1, st1, and sc1
Essential Points (2)
▪ EFS
  ─ Scalable and fully managed NFS file system
  ─ Mounted on an instance
  ─ Also available to on-premises servers
▪ S3
  ─ Large object repository with infinite scalability and high availability
  ─ Buckets and objects, with eventual consistency
Storing Relational and Key-Value Data: Amazon RDS and DynamoDB
Chapter 11
Chapter Topics
Storing Relational and Key-Value Data: Amazon RDS and DynamoDB
▪ Databases and Key-Value Stores
▪ Storing Relational Data: Amazon RDS
▪ Storing Key-Value Data: Amazon DynamoDB
▪ Hands-On Exercise: Setting up RDS and DynamoDB
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ Which relational and key-value stores are available in AWS
▪ How to deploy a managed database using RDS
▪ How to create a DynamoDB table
Databases and Key-Value Stores
▪ Data is an integral part of an application
▪ Different types of data
  ─ Raw or processed for analysis
  ─ Configuration and metadata to control applications and services
▪ AWS databases and key-value stores
  ─ Relational Database Service (RDS)
  ─ DynamoDB
Relational Databases
▪ Store structured data
  ─ Configuration and state information
  ─ Health and task progress
▪ Components that use a relational database
  ─ Cloudera Manager, Ambari, and CDP
  ─ Oozie Server, Sqoop Server, Activity Monitor, and Reports Manager
  ─ Hive Metastore Server, Hue Server, and Sentry Server
▪ Install and administer your own database
  ─ Or use a managed database
Amazon Relational Database Service (RDS)
▪ Databases in the cloud
  ─ As a managed service
▪ No need to worry about
  ─ Setup, operation, and scaling
  ─ Backups, replication, and software patching
  ─ Failure detection and recovery
▪ License
  ─ Included
  ─ Bring your own license (BYOL)
Amazon Relational Database Service (RDS)
▪ Scale, resiliency, and fault-tolerance
  ─ Multi-AZ deployment (high availability)
  ─ Read-only replicas
▪ Security group to control access
  ─ IP address ranges
  ─ EC2 instances
Instance Types
▪ Optimized for memory, performance, or I/O
  ─ Instance class
▪ Scale each component
  ─ As required
  ─ Independently
▪ Several supported database engines
  ─ MariaDB, MySQL, Oracle, and PostgreSQL
Database Instance
▪ Basic building block
  ─ EC2 instance
    ─ Resources according to requirements
▪ Isolated database environment
  ─ Supports multiple user-created databases
  ─ Accessed with existing tools and applications
▪ Management console
Creating an RDS Database (1/3)
▪ Create database
  ─ Standard Create or Easy Create
▪ Engine options
  ─ Database type
  ─ Edition and version
▪ Template
  ─ Production, Dev/Test, or Free Tier
▪ Settings
  ─ Instance identifier
  ─ Credential settings
Creating an RDS Database (2/3)
▪ DB instance size
  ─ Standard, memory optimized, or burstable
▪ Storage
  ─ Type, allocated storage, and provisioned IOPS
  ─ Autoscaling options
▪ Availability and durability
  ─ Multiple availability zone deployment
Creating an RDS Database (3/3)
▪ Connectivity
  ─ VPC
  ─ Cannot be changed after creation
▪ Database authentication
  ─ Password
  ─ IAM users and roles
▪ Additional configuration
▪ Estimated costs
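The console choices on the three "Creating an RDS Database" slides map onto a single API request. The sketch below only builds and validates the request as a dictionary and does not contact AWS. The key names are intended to mirror the parameters of the RDS CreateDBInstance API and should be checked against the current AWS documentation before use. The helper name, identifier, credentials, and sizes are hypothetical placeholders.

```python
def build_db_instance_request(identifier, engine, instance_class,
                              allocated_gib, master_username, master_password,
                              *, multi_az=False, storage_type="gp2", iops=None,
                              security_group_ids=(), iam_auth=False):
    """Collect the console choices into one request dictionary."""
    request = {
        "DBInstanceIdentifier": identifier,           # Settings
        "Engine": engine,                             # Engine options
        "DBInstanceClass": instance_class,            # DB instance size
        "AllocatedStorage": allocated_gib,            # Storage (GiB)
        "StorageType": storage_type,
        "MasterUsername": master_username,            # Credential settings
        "MasterUserPassword": master_password,
        "MultiAZ": multi_az,                          # Availability and durability
        "VpcSecurityGroupIds": list(security_group_ids),  # Connectivity
        "EnableIAMDatabaseAuthentication": iam_auth,  # Database authentication
    }
    if storage_type == "io1":
        # Provisioned IOPS storage needs an explicit IOPS figure.
        if iops is None:
            raise ValueError("io1 storage requires an iops value")
        request["Iops"] = iops
    return request


# Placeholder values for illustration; never hard-code real credentials.
request = build_db_instance_request(
    "demo-db", "mysql", "db.t3.micro", 20, "admin", "change-me", multi_az=True
)
print(sorted(request))
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("rds").create_db_instance(**request)
```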
Relational Database Service (RDS)
Key-Value Stores
▪ Relational databases can solve many problems
  ─ Not suitable for all
  ─ Alternatives required for other types of problems
▪ Key-value store
  ─ Type of nonrelational database
  ─ NoSQL store
How Key-Value Stores Work
▪ Data stored as key-value pairs
  ─ Key is the unique identifier
▪ Keys and values
  ─ Simple or complex objects
▪ Highly partitionable
▪ Horizontal scaling
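Why key-value data is "highly partitionable" can be shown with a toy in-memory store. This is a sketch under stated assumptions, not how DynamoDB is implemented: the class name `PartitionedStore` and the use of SHA-256 to pick a partition are choices made for this illustration. The point is that the key alone decides where an item lives, so partitions can sit on different machines and be added as the data grows.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class PartitionedStore:
    """Toy key-value store; each dict stands in for a separate server."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def put(self, key, value):
        index = partition_for(key, len(self.partitions))
        self.partitions[index][key] = value

    def get(self, key):
        index = partition_for(key, len(self.partitions))
        return self.partitions[index][key]


store = PartitionedStore(num_partitions=4)
store.put("user#1", {"name": "Ada", "langs": ["en", "fr"]})  # complex value
store.put("user#2", "plain string value")                    # simple value
print(store.get("user#1"))
print([len(p) for p in store.partitions])  # items held by each partition
```

No lookup ever needs to consult more than one partition, which is what makes horizontal scaling straightforward for this data model.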
Amazon DynamoDB
▪ Managed key-value store
▪ Serverless
▪ Enterprise ready
▪ Utilized for S3Guard
  ─ Consistent store of metadata
  ─ For objects in an S3 bucket
    ─ Addresses S3 eventual consistency
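The role a consistent metadata store plays alongside an eventually consistent object store can be illustrated with a toy model. The sketch below is not S3Guard's implementation and makes no calls to S3 or DynamoDB. Both classes are hypothetical, and the `settle` method simply stands in for the passage of time.

```python
class EventuallyConsistentListing:
    """Toy object store whose listing lags behind its writes."""

    def __init__(self):
        self._objects = {}    # key -> data
        self._listed = set()  # keys currently visible to list()

    def put(self, key, data):
        self._objects[key] = data  # the listing is not updated yet

    def list(self):
        return sorted(self._listed)

    def settle(self):
        """Simulate time passing: the listing catches up with the writes."""
        self._listed = set(self._objects)


class GuardedStore:
    """Wraps the store and records every write in a consistent table."""

    def __init__(self, store):
        self._store = store
        self._metadata = set()  # stands in for the consistent metadata table

    def put(self, key, data):
        self._store.put(key, data)
        self._metadata.add(key)

    def list(self):
        # Merge what the store reports with what the metadata table knows.
        return sorted(self._metadata | set(self._store.list()))


raw = EventuallyConsistentListing()
guarded = GuardedStore(raw)
guarded.put("logs/part-0000", b"...")
print(raw.list())      # [] because the listing has not caught up yet
print(guarded.list())  # ['logs/part-0000']
raw.settle()
print(raw.list())      # ['logs/part-0000']
```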
Create Table
▪ Table name
▪ Partition key
  ─ String, binary, number
▪ Settings
  ─ Default
  ─ Custom
    ─ Secondary indexes and read/write capacity mode
    ─ Provisioned capacity, auto scaling, encryption, and tags
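The fields on the Create Table slide correspond to a small request structure. The sketch below only assembles that structure and does not call AWS. The key names are intended to follow the DynamoDB CreateTable API and should be verified against the AWS documentation. The helper name, table name, and key name are hypothetical.

```python
ATTRIBUTE_TYPES = {"string": "S", "number": "N", "binary": "B"}


def build_create_table_request(table_name, partition_key, key_type="string",
                               *, read_capacity=None, write_capacity=None):
    """Build a create-table request for a table with a partition key only."""
    if key_type not in ATTRIBUTE_TYPES:
        raise ValueError(f"partition key type must be one of {sorted(ATTRIBUTE_TYPES)}")
    request = {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": partition_key,
             "AttributeType": ATTRIBUTE_TYPES[key_type]},
        ],
        # "HASH" marks the attribute as the partition key.
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
    }
    if read_capacity is None and write_capacity is None:
        request["BillingMode"] = "PAY_PER_REQUEST"  # on-demand capacity mode
    elif read_capacity is None or write_capacity is None:
        raise ValueError("set both read_capacity and write_capacity, or neither")
    else:
        request["BillingMode"] = "PROVISIONED"
        request["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_capacity,
            "WriteCapacityUnits": write_capacity,
        }
    return request


# Hypothetical table and key names.
request = build_create_table_request("demo-metadata", "path")
print(request["KeySchema"])
# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("dynamodb").create_table(**request)
```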
Amazon DynamoDB
Hands-On Exercise: Setting up RDS and DynamoDB
▪ In this exercise, you will create an RDS instance and a DynamoDB table
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points
▪ Clusters require data stores
  ─ Store data for analysis
  ─ Configuration and metadata
▪ Databases and key-value stores
▪ Relational Database Service (RDS)
  ─ Managed relational database
  ─ Focus on your data
    ─ Not on the database administration
  ─ Cloudera supported databases available
  ─ DB instance classes
    ─ Scale as needed
Essential Points
▪ DynamoDB
  ─ Managed key-value store
  ─ Utilized for S3Guard
    ─ Metadata about S3 objects
    ─ Solves the "eventual consistency" problem
Migrating Data to the Cloud
Chapter 12
Chapter Topics
Migrating Data to the Cloud
▪ Migrating Data to the Cloud
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ Which traditional methods can be used to migrate data to the cloud
▪ Which tools AWS provides for online data migration
▪ Which methods AWS provides for offline bulk data migration
Migrating to the Cloud
▪ Cloud adoption
  ─ Clusters
    ─ Lift-and-shift
    ─ Cloud-native
▪ Transfer data to the cloud
  ─ Data migration
    ─ To data lake or other storage type
▪ Different ways to migrate data to AWS
  ─ Online or offline
Traditional Online Data Migration
▪ Traditional tools
  ─ SFTP, SCP, Rsync, and other tools
▪ Inbound data transfer is free
  ─ Upload data
    ─ Not the most efficient mechanism
    ─ Especially for large amounts of data
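A quick estimate shows why uploading over an ordinary network link becomes impractical for large data sets. The figures below are illustrative assumptions, not measurements: a 1 Gbps link, 80% sustained utilization, and decimal terabytes. The function name is hypothetical.

```python
def transfer_days(data_terabytes, link_mbps, utilization=0.8):
    """Estimate days needed to upload data over a link of the given speed."""
    bits = data_terabytes * 1e12 * 8                  # decimal TB to bits
    seconds = bits / (link_mbps * 1e6 * utilization)  # effective bits per second
    return seconds / 86_400                           # seconds per day


# 100 TB over an assumed 1 Gbps link at 80% utilization.
print(round(transfer_days(100, 1000), 1))  # 11.6 (days)
```

The same calculation for a petabyte on the same link gives well over three months, which is the gap the dedicated and offline options on the following slides are meant to close.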
Online Data Migration using AWS Direct Connect
▪ Dedicated network connection─ On-premises to AWS
▪ Compatible with all AWS services
▪ Reduces bandwidth costs
▪ Private, secure, and cost-efficient
Online Data Migration using AWS DataSync
▪ AWS DataSync
  ─ Automate moving data from on-premises
    ─ To S3 and EFS
▪ Deploy agent as VM on-premises
  ─ In charge of moving your data from on-premises to AWS
  ─ Connect to your storage
  ─ Specify destination
  ─ Process starts
    ─ Preserves metadata
    ─ Integrity checks
Offline Data Migration with AWS Snowball
▪ AWS Snowball
  ─ Petabyte-scale data transport solution
  ─ Secure appliances
  ─ Transport data in and out of AWS
  ─ Solves many data transfer limitations
Offline Data Migration with AWS Snowball Edge
▪ AWS Snowball Edge
  ─ Similar to Snowball
  ─ Provides on-board storage and computing capabilities
  ─ Processing at the edge
Offline Data Migration with AWS Snowmobile
▪ Snowmobile
  ─ Exabyte-scale data transport solution
  ─ 45-foot shipping container pulled by a semi-trailer truck
  ─ Extremely cost-efficient for very large uploads
  ─ Requires a custom engagement
Essential Points
▪ Cloud adoption
  ─ Clusters
  ─ Data
▪ Upload data using traditional methods
  ─ Inbound data is free
▪ Online and offline data migration tools
  ─ Online
    ─ Direct Connect and DataSync
  ─ Offline
    ─ Snowball, Snowball Edge, and Snowmobile
Modeling Infrastructure Using AWS CloudFormation
Chapter 13
Chapter Topics
Modeling Infrastructure Using AWS CloudFormation
▪ Modeling Infrastructure with AWS CloudFormation
▪ Hands-On Exercise: Deploying Infrastructure with CloudFormation
▪ Essential Points
Chapter Objectives
In this chapter, you will learn
▪ How to model infrastructure-as-code in AWS using CloudFormation
▪ How to deploy infrastructure using text files with CloudFormation
Modeling Infrastructure with AWS CloudFormation
▪ Model and provision infrastructure
  ─ Using text files
▪ Treat your infrastructure as code
▪ Automate deployments
  ─ No need to manually create resources
▪ Same result, every time
  ─ Quickly replicate infrastructure
  ─ Consistent and repeatable fashion
▪ Free service
Describing and Creating Clusters With CloudFormation (1)
▪ Create a template
  ─ JSON or YAML
    ─ Add to source control, version, and share
▪ Specify resources
  ─ Created in the correct order
▪ CloudFormation makes all the necessary underlying API calls
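A template is just structured text, so it can be generated and inspected with ordinary tools. The sketch below builds a very small template as a Python dictionary, prints it as JSON, and derives a creation order from explicit `DependsOn` entries. The logical names `DataBucket` and `MetadataTable` are hypothetical, and the ordering function is a simplification for illustration: CloudFormation also infers dependencies from references between resources, which this sketch ignores. `graphlib` requires Python 3.9 or later.

```python
import json
from graphlib import TopologicalSorter

# A minimal template with two resources; logical names are made up.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Illustrative stack: one bucket and one table",
    "Resources": {
        "DataBucket": {"Type": "AWS::S3::Bucket"},
        "MetadataTable": {
            "Type": "AWS::DynamoDB::Table",
            "DependsOn": "DataBucket",
            "Properties": {
                "AttributeDefinitions": [
                    {"AttributeName": "path", "AttributeType": "S"}
                ],
                "KeySchema": [{"AttributeName": "path", "KeyType": "HASH"}],
                "BillingMode": "PAY_PER_REQUEST",
            },
        },
    },
}


def creation_order(template):
    """Order resources so that each one follows what it depends on."""
    graph = {}
    for name, resource in template["Resources"].items():
        depends = resource.get("DependsOn", [])
        if isinstance(depends, str):
            depends = [depends]
        graph[name] = set(depends)
    return list(TopologicalSorter(graph).static_order())


print(json.dumps(template, indent=2))  # text that can go into source control
print(creation_order(template))        # ['DataBucket', 'MetadataTable']
```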
Describing and Creating Clusters With CloudFormation (2)
▪ Collection of resources managed as a unit
  ─ Called a stack
▪ Delete the stack when it is no longer required
  ─ All resources decommissioned
AWS CloudFormation
Hands-On Exercise: Deploying Infrastructure with CloudFormation
▪ In this exercise, you will deploy infrastructure with text files using CloudFormation
▪ Please refer to the Hands-On Exercise Manual for instructions
Essential Points
▪ Describe and provision infrastructure
  ─ As code
    ─ JSON or YAML
    ─ Add to source control, version, and share
  ─ Replicate infrastructure
    ─ Consistent and repeatable fashion
▪ Stack
  ─ Specify resources
    ─ Single unit
    ─ Created in the correct order
    ─ Deprovisioned together