Cloud Foundry Summit 2015: Building a Robust Cloud Foundry (HA, Security and DR)

Post on 25-Jul-2015


Building a Robust Cloud Foundry: HA, Security and DR

Haydon Ryan | Duncan Winn

This Talk

• High Availability (HA)

• Security

• Backing Up to Mitigate Disasters

© Copyright 2014 Pivotal. All rights reserved.

HA

High Availability Focus

Keep apps and services running in a performant, reliable and recoverable manner with timely error detection

1. Application Instances

2. Platform Processes

3. Platform VMs

4. Availability Zones

Keep Cloud Foundry running in a performant, reliable and recoverable manner with timely error detection

HA Deployments

[Diagram: Single Foundation Deployment vs Dual Foundation Deployment. Labels shown: Data Center, AZ, RDS.]

WHAT IF I TOLD YOU

IT’S POSSIBLE TO SANELY STRETCH LAYER 2

[Diagram: a user targets myapp.mycf.com; DNS resolution is handled by DNS Global Traffic Management (GTM). On each side of the diagram: NSX Boundary, VIP, SSL Termination, LTM Appliance (x2), HA Proxy (x2).]

Domains: System and Application

• Client targets myapp.mycf.com
• CF1: Developer uses cf api against api.runtime-cf1.com and cf push myapp to cf1.com
• CF2: Developer uses cf api against api.runtime-cf2.com and cf push myapp to cf2.com
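Because each foundation has its own system domain, an app has to be pushed to both before the shared application domain can be served from either site. A rough sketch of that loop, which only builds the command lines: the API endpoints and app name are the ones on the slide, while the `https://` scheme and the `build_push_plan` helper are assumptions made for illustration, and authentication is left out.

```python
from typing import Dict, List

# System-domain API endpoints as shown on the slide; the https scheme is assumed.
FOUNDATIONS: Dict[str, str] = {
    "CF1": "https://api.runtime-cf1.com",
    "CF2": "https://api.runtime-cf2.com",
}


def build_push_plan(app: str, foundations: Dict[str, str]) -> List[List[str]]:
    """Return the cf CLI invocations needed to push one app to every foundation."""
    plan: List[List[str]] = []
    for api_endpoint in foundations.values():
        plan.append(["cf", "api", api_endpoint])  # target the foundation
        plan.append(["cf", "push", app])          # push the same app there
    return plan


for command in build_push_plan("myapp", FOUNDATIONS):
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review before handing each entry to a process runner.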

Services

[Diagram: Apps and their Services]


Customer Requirements

• AWS with One VPC
• Specific IP Ranges
• Using their internal corporate DNS
• No ELBs or Route 53 due to security setup
• Multiple Deployments of Cloud Foundry
• Availability Requirements:
  • App uptime
  • Failure matrix for downtime situations

[Diagram: Bind DNS, two pairs of HA Proxy (SSL Termination) and two CF Routers]

Who does the deployment need to be highly available for?

• Users
• Developers
• Operations

Any non-critical jobs?

• clock_global
  • Used to clean up cc jobs.
  • Rely on Resurrector?
  • Redeploy to a different AZ by changing the resource_pool
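The last bullet can be illustrated on an already-parsed deployment manifest: the job is pointed at a resource pool that lives in the other AZ, and the deployment is then run again. This is only a sketch; the pool names `small_z1` and `small_z2`, the helper name and the exact manifest layout are assumptions, not taken from the deployment described here.

```python
from typing import Any, Dict


def move_job_to_pool(manifest: Dict[str, Any], job_name: str, pool_name: str) -> Dict[str, Any]:
    """Point one job at a different resource pool, after checking the pool exists."""
    pools = {pool["name"] for pool in manifest.get("resource_pools", [])}
    if pool_name not in pools:
        raise ValueError(f"unknown resource pool: {pool_name}")
    for job in manifest.get("jobs", []):
        if job["name"] == job_name:
            job["resource_pool"] = pool_name
            return manifest
    raise ValueError(f"unknown job: {job_name}")


# Hypothetical, heavily trimmed manifest with one pool per AZ.
manifest = {
    "resource_pools": [{"name": "small_z1"}, {"name": "small_z2"}],
    "jobs": [{"name": "clock_global", "resource_pool": "small_z1"}],
}
move_job_to_pool(manifest, "clock_global", "small_z2")
print(manifest["jobs"][0])
```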

Critical Jobs & VMs

• haproxy
• router
• nats
• cloud controller
• uaa/login?
• doppler?

Any less-critical jobs?

• loggregator / doppler
• loggregator traffic controller
• etcd
• Jumpbox?
• bosh?

Caveats with this design

• Single points of failure?
  • DNS
  • Bosh
  • Jumpbox
• Human interaction required in outage
• Bind DNS does not do health monitoring. Monitoring scripts were outside the scope of the engagement.
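Since Bind DNS does no health monitoring, a monitoring script would have to decide which HA Proxy addresses are still safe to publish. A minimal sketch of that check, assuming a plain TCP connect to port 443 is an adequate health signal; the function names and the idea of feeding the result into a DNS update are hypothetical and were not part of the engagement.

```python
import socket
from typing import Callable, Iterable, List


def tcp_probe(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_addresses(
    addresses: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> List[str]:
    """Keep only the addresses whose probe succeeds, preserving their order."""
    return [addr for addr in addresses if probe(addr)]
```

Calling `healthy_addresses` with the HA Proxy addresses of both AZs returns the subset that still answers; an empty result is the point at which a human has to step in. The probe is injectable so the selection logic can be exercised without touching the network.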

[Diagram: AWS deployment. VPC 10.202.64.0/19, connected by Direct Connect to an interstate data center; parts of the diagram are labelled Customer Managed. AZ 1 and AZ 2 each contain a private subnet; there are also a Bosh Subnet and an RDS Subnet. Security groups: CF SG, Bosh SG, RDS SG. Components shown: jumpbox, bosh, bastion, NAT, bind dns, ha proxy (x4), router, login, uaa, dea, cc, cc worker, nats, health, etcd, doppler, loggregator traffic controller, clock, apps manager, and the databases boshdb, uaadb and ccdb.]

How We Deployed Services

• Proxy is a Single Point of Failure
• No Load Balancer to use
• Accepted by the customer in the failure matrix

[Diagram: App, Proxy (x3), Server (x2)]

Best Practices for Services

• By default the service binding uses the first proxy address only

[Diagram: App, Load Balancer, Proxy (x2), Server (x3)]
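Where no load balancer is available and the binding only exposes the first proxy address, the app itself can carry the full list of proxies and fall back to the next one when a connection fails. A minimal sketch, assuming a plain TCP service; the `connect_first_available` helper and any addresses passed to it are hypothetical.

```python
import socket
from typing import Any, Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]


def connect_first_available(
    proxies: Sequence[Address],
    connect: Optional[Callable[[Address], Any]] = None,
    timeout: float = 3.0,
) -> Tuple[Address, Any]:
    """Try each proxy in order; return (address, connection) for the first that accepts."""
    if connect is None:
        connect = lambda addr: socket.create_connection(addr, timeout=timeout)
    last_error: Optional[OSError] = None
    for addr in proxies:
        try:
            return addr, connect(addr)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next proxy
    raise ConnectionError(f"no proxy reachable out of {len(proxies)}") from last_error
```

The `connect` argument exists so that a real database driver, or a stub in a test, can replace the raw socket call.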

Which Deployment

[Diagram: three options. Dual Foundation Deployment; Single Foundation, Dual AZs; Single Foundation, Single DC. Labels shown: Data Center, AZ, RDS.]


Security and Networking (on AWS)

Security

• Security is Hard
• Three main concepts
  • Restrict
  • Limit scope if Compromised
  • Mitigate
• Feedback Loop

Restrict Users

• Individual Multi Factor Authentication
  • IaaS Console/Hypervisor
  • Jumpbox
• Separate accounts
  • jumpbox
  • bosh
  • github

Restrict Packets

• IaaS
  • Security Groups (Instance Level) (better)
  • ACLs (Subnet Level)
  • Routes

Restrict Containers

• Cloud Foundry
  • Application Security Groups
  • dea network properties (allow_networks, deny_networks)
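Application Security Groups are defined as a JSON list of egress rules that is handed to the cf CLI. A small sketch that generates such a file; the CIDR range, the addresses, the ports and the helper names are invented for illustration.

```python
import json
from typing import Dict, List


def asg_rule(destination: str, ports: str, protocol: str = "tcp") -> Dict[str, str]:
    """Build one egress rule with protocol, destination and ports fields."""
    return {"protocol": protocol, "destination": destination, "ports": ports}


def asg_rules_json(rules: List[Dict[str, str]]) -> str:
    """Serialise a list of rules to the JSON document passed to the cf CLI."""
    return json.dumps(rules, indent=2)


# Hypothetical rules: let app containers reach a MySQL proxy subnet and a DNS server.
rules = [
    asg_rule("10.0.16.0/24", "3306"),
    asg_rule("10.0.0.2", "53", protocol="udp"),
]
print(asg_rules_json(rules))
```

The output could be saved as `rules.json` and applied with `cf create-security-group`, then bound to a space with `cf bind-security-group`.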

Pivotal Cloud Foundry for AWS 1.4

[Diagram: VPC 10.0.0.0/16 with a Public Subnet, a Private Subnet and an RDS Subnet, plus an Internet Gateway, ELB and NAT. Security groups: Ops Manager SG, Elastic Runtime SG, ELB SG, NAT SG, RDS SG. Components shown: Ops Manager, micro, login, uaa, router, dea, cc, cc worker, nats, health, etcd, doppler, loggregator traffic controller, clock, autoscaling, and the databases boshdb, uaadb, ccdb and apps manager db. Legend: common traffic flow; sg allow rules. Rule labels: vpc all; restricted ip 80, 443, 22*; 80?, 443.]

Was it just DEAs that used NAT?

Limit Scope if Compromised

• Different user/pass for each component
• Strong passwords (and usernames)
  • 20 Characters Long
  • RANDOM
  • Both Cases
  • Best to avoid special characters
  • e.g. YxLIodYrUBQJrvMRYSQL
• Avoid cloud cow

Image: http://vanmethod.deviantart.com/art/Purple-Cow-on-a-Cloud-146265642
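The password rules above (20 characters, random, both cases, no special characters) are easy to automate so that every component gets its own credential. A minimal sketch using Python's `secrets` module; the function name and the component list are illustrative only.

```python
import secrets
import string

ALPHABET = string.ascii_letters  # both cases, no special characters


def generate_credential(length: int = 20) -> str:
    """Return a random letters-only string that contains both upper and lower case."""
    if length < 2:
        raise ValueError("length must be at least 2 to include both cases")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Re-draw in the unlikely event that only one case is present.
        if any(c.islower() for c in candidate) and any(c.isupper() for c in candidate):
            return candidate


# One distinct credential per component; the names here are just examples.
for component in ("ccdb", "uaadb", "nats", "router"):
    print(component, generate_credential())
```

The same function can be used for usernames, which the slide also suggests randomising.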

Limit Scope if Compromised

[Diagram: Runner, UAA, Login, uaadb, MySQL, App Data]

Post Breach Security Measures

• Roll
  • AWS Credentials
  • Username and password (Manifest)
  • PEMs
• Investigate:
  • VM Logs (stored in Splunk / CloudWatch Logs)
  • Bosh and Login Audit Trail
  • Isolate the VM for investigation
  • Resurrector will resurrect a non compromised VM
• Feedback:
  • Incident Reports and Management Support

Paranoid Level Security for AWS

• CloudTrail
  • Alerts
  • Audit Logs
  • Rollback
• Remove ability to delete
  • S3 buckets
  • subnets / VPC
  • backups
• Everything else can be recovered from a backup…


Disaster Recovery

Backing Up Cloud Foundry

• Configuration
• Databases: CCDB, UAADB, Apps Manager DB, BOSH DB
• Blobstore / NFS Server

SCENARIO ONE: LOSE PCF OPS-MGR OR CF DEPLOYMENT

Restoring Ops Manager

Export Configuration

Create New Ops Manager

Import Configuration

Backup Ops Manager

Configuration

Backup Deployment Manifests

scp ubuntu@<OPS MGR HOST>:/var/tempest/workspaces/default/deployments/*yml .

Deployment Manifests in BOSH

~$ bosh deployments

bosh download manifest cf-c700aee17d9f801eb152 cfmanifest.yml
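The two manifest backups above lend themselves to a scheduled script. A minimal sketch that assembles the same `scp` and `bosh download manifest` commands with a dated target directory; the host name, the `backups/` directory layout and the helper name are assumptions, and nothing is executed here.

```python
from datetime import date
from typing import List

MANIFEST_GLOB = "/var/tempest/workspaces/default/deployments/*yml"


def backup_commands(ops_mgr_host: str, deployment: str, day: date) -> List[List[str]]:
    """Return the commands that copy Ops Manager manifests and download the BOSH manifest."""
    target = f"backups/{day.isoformat()}"  # hypothetical local layout
    return [
        ["mkdir", "-p", target],
        ["scp", f"ubuntu@{ops_mgr_host}:{MANIFEST_GLOB}", f"{target}/"],
        ["bosh", "download", "manifest", deployment, f"{target}/cfmanifest.yml"],
    ]


# The deployment name is the one on the slide; the host is a placeholder.
for cmd in backup_commands("opsmgr.example.com", "cf-c700aee17d9f801eb152", date(2015, 7, 25)):
    print(" ".join(cmd))
```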

SCENARIO TWO: LOSE BOSH

Restoring Bosh With PCF

Export Configuration, then Import Configuration

:/var/tempest/workspaces/default/deployments/micro

BOSH Director + bosh.yml

Restoring Bosh Manually

[Diagram: items to back up. BOSH: bosh.yml. BOSH DB: pg_dump, or External MySQL. Volume: /dev/xvda, /dev/sdb, /dev/sdf, /var/vcap/store. Blobstore.]

Critical Databases

Backup Cloud Controller DB Encryption Credentials

Locate Databases Info From Deployment Manifest

bosh download manifest cf-c700aee17d9f801eb152 cfmanifest.yml
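Once the manifest is downloaded, the database endpoints and the Cloud Controller encryption key can be pulled out of its `properties` section. A sketch over an already-parsed manifest (for example, loaded with a YAML parser); the property names `ccdb`, `uaadb`, `address`, `port` and `cc.db_encryption_key` are assumptions about the manifest layout and should be checked against the actual file.

```python
from typing import Any, Dict


def locate_databases(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Collect database endpoints and the CC DB encryption key from a parsed manifest."""
    props = manifest.get("properties", {})
    found: Dict[str, Any] = {}
    for db_name in ("ccdb", "uaadb"):  # assumed property names
        db = props.get(db_name)
        if db:
            found[db_name] = {"address": db.get("address"), "port": db.get("port")}
    # Encrypted fields in a restored CCDB cannot be read without this key.
    found["cc_db_encryption_key"] = props.get("cc", {}).get("db_encryption_key")
    return found


# Tiny hand-written stand-in for cfmanifest.yml.
example = {
    "properties": {
        "ccdb": {"address": "ccdb.example.internal", "port": 3306},
        "uaadb": {"address": "uaadb.example.internal", "port": 3306},
        "cc": {"db_encryption_key": "REDACTED"},
    }
}
print(locate_databases(example))
```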

NFS / Blobstore

✦ Managing Access with ACLs
✦ Create Group Bucket Policy for “Deny DeleteBucket”
✦ Turn on versioning

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "s3:DeleteBucket",
        "s3:DeleteObjectVersion"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
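The policy above denies the two actions on every resource. Where the group should only be restricted on specific buckets, the same statement can be generated per bucket. A hedged sketch: the bucket name and the `deny_delete_policy` helper are hypothetical, and the ARNs follow the usual S3 bucket and object forms.

```python
import json
from typing import Any, Dict, List


def deny_delete_policy(buckets: List[str]) -> Dict[str, Any]:
    """Build a policy denying bucket and object-version deletion for the named buckets."""
    resources: List[str] = []
    for name in buckets:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket (s3:DeleteBucket)
        resources.append(f"arn:aws:s3:::{name}/*")  # its objects (s3:DeleteObjectVersion)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["s3:DeleteBucket", "s3:DeleteObjectVersion"],
                "Resource": resources,
            }
        ],
    }


# Hypothetical blobstore bucket name.
print(json.dumps(deny_delete_policy(["my-cf-blobstore"]), indent=2))
```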


Takeaway

Takeaways

✦ Tradeoffs: No “One Size Fits All”

✦ Service Layer

✦ Existing: Environmental Security and Networking Constraints

✦ Backup: Configuration, Databases, Blobstore (This is your CF).

KEEP CALM

AND

CF PUSH
