Pivotal Ready Architecture for vSphere Reference Architecture
Pivotal Cloud Foundry 2.4 edition
Pre-Reqs

● VMware vSphere 6.5U2 (not v6.7)
● VMware NSX-T 2.3 or newer
● Pivotal Ops Manager 2.4 or newer
● Pivotal PAS 2.4 or newer
● Pivotal VMware PKS 1.2 or newer
● VxRail ESSM 4.5.150 or newer
● Dell EMC ECS 3.1 or newer

This recommended architecture includes VMware vSphere and NSX-T, a software-defined network virtualization platform that runs on VMware ESXi hosts and combines routing, firewall, NAT/SNAT, and load-balancing services. For architectures that do not rely on NSX-T, see the non-SDN sections below. To use all features listed here with PAS, NSX-T requires at least Advanced licensing from VMware; the equivalent of that licensing is included with the PKS product. For more information about installing and configuring NSX-T for use with PCF on vSphere, see VMware's official documentation and the Pivotal Cookbook for NSX-T.

This design supports long-term use, capacity growth at both the IaaS (vSphere) and PaaS (PAS/PKS) levels, and maximum installation security through the NSX-T firewall. It allocates a minimum of four servers to each cluster to meet VMware's best practices for HA and vSAN, and it spreads PCF components across three clusters, as recommended for high availability. The major areas of consideration are:

1. Introducing Dell EMC PRA
   a. Key Benefits
   b. Hardware Reference Design
2. SDN or No SDN Deployments (Will NSX be used?)
3. PAS without NSX-T
4. PKS without NSX-T
5. Network design and allocations
6. PAS and PKS with NSX-T
7. Storage design
8. Compute and HA considerations
9. Scaling up and capacity management considerations

If your design approach includes PKS, NSX-T is included with it; though not required, that technology opens up many interesting and flexible options. If you want to deploy without NSX-T (see separate materials on NSX-V design considerations), you can be successful with PAS and/or PKS running natively on vSphere, but the design choices are very different.

Introducing Dell EMC Pivotal Ready Architecture

Pivotal Ready Architecture (PRA) is an integrated development platform based on cloud computing and storage. It offers performance and convenience, and it addresses the challenges of creating, testing, and updating applications in a consolidated production environment.

Key Benefits

PRA enables you to quickly and reliably create an integrated development platform and provides a variety of deployment options to meet your needs. It is a tested PCF deployment that uses the scalable, hyperconverged VxRail Appliance and VMware vSphere with VMware vSAN and NSX-T. With PRA, you can deploy a production-ready PAS environment or PKS Kubernetes clusters more quickly and reliably than with a do-it-yourself approach. The PAS and PKS deployments are designed to meet application operational needs.

Dell EMC, VMware, and Pivotal Services can help with installing and testing PRA in your IT environment. IT professionals tasked with planning compute, network, and storage capacity can work with Dell EMC Services to determine the resource requirements for each component.

PRA enables developers to publish, run, and scale legacy, COTS, and cloud-native applications. The PRA platform is based on an integrated infrastructure that enables IT operators to easily manage and scale the development environment with tools such as VxRail Manager and PCF Operations Manager. With PRA, you benefit from:

● A reference architecture for a variety of highly available Pivotal software deployments

● A private cloud and infrastructure service with simplified deployment

● A highly available platform to develop and quickly deploy applications, reducing the time to application delivery

● A modern developer platform that boosts developer productivity by combining application services, service discovery, container management, and orchestration with an ecosystem of plug-ins for developer tools

● Increased cloud solution portability and agility, because you are not locked into public cloud APIs

● External S3-compatible storage or an internal Web Distributed Authoring and Versioning (WebDAV) server for the Pivotal Application Service (PAS) blobstore

Architectural Overview  

PRA is a highly available reference architecture that describes how to create a production-ready PCF deployment using vCenter and NSX-T for PAS and/or PKS. The Management and Edge cluster is spread across three racks and runs a vCenter that manages the compute clusters, also known as Availability Zones (AZs), where NSX-T is deployed. You can deploy and run PAS, PKS, or both on the three compute clusters (AZs). The following figure shows a high-level overview of PRA.

Management and Edge Cluster for NSX-T

The Management cluster is used to deploy the NSX-T Management Plane (the NSX-T Manager and three Controller instances). The Edge cluster is used to deploy large Edge VMs (16 GB memory, 8 vCPUs), two instances per rack. The Management and Edge cluster is deployed using the internal vCenter Server. A separate vCenter is used to manage the compute clusters (the hosts used to deploy PAS or PKS VMs and application containers). All PAS and PKS deployments use the external vCenter Server. The vCenter setups use a three-cluster, six-node arrangement to provide high availability.

The minimum sizing for the Management cluster is four nodes; the standard is two nodes from each rack, for six nodes in total. For NSX-T Edges, the minimum sizing co-locates the Edges with the Management components on four nodes, the standard sizing co-locates them with the six-node/three-rack Management components, and for maximum performance, a stand-alone four-node cluster is dedicated exclusively to Edges.

Top-of-rack switches  

The VxRail Appliance is compatible with many customer-supplied networks and switches. VxRail Appliance nodes communicate over one or more customer-provided network switches, typically a top-of-rack (ToR) switch. The ToR switches must support IPv6 multicast pass-through and IPv4 unicast. A single ToR switch can be a single point of failure, so the recommendation is to use dual ToR switches, such as the Dell EMC Switch S4048. Each rack's ToRs must be connected to the customer's spine/core network.

Management and Edge cluster node networking

Compute cluster networking

Non-SDN Deployments 

VLAN Considerations

Without an SDN environment, you will draw network address space and broadcast domains from the greater datacenter pool. For this approach, you will need to deploy PAS and PKS in a particular alignment:

● Choose VLANs and address space for PAS that do not overlap with PKS.
● Load balancing will be performed external to the PCF installation.
● Client SSL termination will need to happen at the load balancer, at the gorouters, or both, so plan for certificates to accomplish this.
● Firewalling will be accomplished external to the PCF installations.

C2C & CNI Considerations ● Container-to-container (C2C) networking is a feature of the tile, not the environment ● Changing the C2C strategy after deployment is possible, but with NSX-T, you’ll need both the C2C tile

and a infrastructure of NSX-T to be successful. Migrating to an SDN-enabled solution from this design is possible but best considered as a greenfield deployment; inserting a SDN layer under an active PCF installation is very disruptive.

PAS Without SDN

Pivotal PAS, as part of the PCF umbrella, is fully functional without an SDN overlay. Flexibility and other features are lost without SDN, but it is a valid approach. There is no hard dependency on SDN features in Pivotal PAS to date.

Refer to Installing PAS on vSphere for more information. A small-scale implementation would start with a Management stack and a single AZ of compute for the PAS workloads. More AZs can be added as new capacity is needed in the future. This approach is represented here:

A full-scale production approach would include three AZs of compute capacity from the start, with the growth vector being vertical (adding more compute hosts to the existing three clusters). All components would be spread across multiple racks to manage fault domains. This approach is represented here:

PKS Without SDN 

Pivotal Container Service (PKS) is supplied with NSX-T SDN included, along with the associated licensing. The assumption is that the reader will want to use all of this technology at once to get full benefit from both. However, the bundled NSX-T SDN solution can be ignored, as there is a built-in network stack (flannel) that takes care of container networking. Refer to Installing PKS on vSphere for more information on PKS without NSX-T.

SDN-Enabled Deployments using NSX-T 

Network Requirements for PAS

PAS requires a number of statically defined networks to host its main elements. If we consider the Tenant side of an NSX-T deployment, we define a series of non-routable address banks that the NSX-T routers will manage for us. Similar to reference designs in the past, those are the following:

NOTE: Relative to reference designs from the PCF v1.xx days, these static networks have a smaller host address space assigned. With NSX-T, you will find that building for many dynamically assigned networks is a better strategy than a few, very large networks as in the past.

● Infrastructure - 192.168.1.0/24
● Deployment - 192.168.2.0/24
● Services - 192.168.3.0/24

The Services network is handy for use with Ops Manager tiles that you might add in addition to PAS. You can think of the Services network as the “everything else not PAS” network. Some of these tiles can call for additional network capacity to grow into on demand. Historically, we’ve allowed for a network called “Dynamic Services” for this use. Going forward, it is recommended that you consider adding a network per tile that would need this and pair them up in the tile’s config. For Example:

● OD-Services# - 192.168.4.0 - 192.168.9.0 in /24 segments

Example: the Redis tile asks for "Network" and "Services Network". The first one is for placing the broker and the second one is for deploying worker VMs to support the service. For this we would deploy a new network "OD-Services1" and tell Redis to use the "Services" network for the broker and the "OD-Services1" network for the workers. The next tile can also use "Services" for the broker and a new "OD-Services2" network for workers, and so on.

● Isolation Segments - 192.168.10.0 - 192.168.63.0 in /24 segments

Isolation Segments can be added quickly and easily to an existing PAS installation. This range of address space is designated for use when adding one or more of them. A new /24 network from this range should be deployed for each new Isolation Segment. There is reserved capacity for more than 50 Isolation Segments of more than 250 VMs each in this address bank.

● PAS Dynamic Orgs - 192.168.128.0/17 = 128 orgs/foundation

New to the PCF 2.x reference design is the concept of dynamically assigned Org networks that are attached to automatically generated NSX-T Tier-1 routers, a benefit of NSX-T. The operator doesn't define these networks in Ops Manager, but rather allows them to be built by supplying a non-overlapping block of address space for this purpose. This is configurable under NCP Configuration in the NSX-T tile in Ops Manager. The default is /24 (every Org gets a new /24). This reference uses a pattern that follows previous references, with the main difference being that all networks break on the /24 boundary (for easier CIDR math) and the network octet is now numerically sequential (1-2-3).

Network Requirements for PKS 

● PKS Clusters - 172.24.0.0/14
● PKS Pods - 172.28.0.0/14

New to PCF 2.x is Pivotal Container Service (PKS). When deploying PKS via Ops Manager, you must allow for two blocks of address space for dynamic networks that PKS will deploy per cluster and per namespace for pods. The recommended address spaces are distinctly different from those used with PAS, to give the operator a quick visual cue as to which jobs relate to which service.

Summary

To summarize the complete addressing strategy:

● Infrastructure - 192.168.1.0/24
● Deployment - 192.168.2.0/24
● Services - 192.168.3.0/24
● OD-Services# - 192.168.4.0 - 192.168.9.0 in /24 segments
● Isolation Segments - 192.168.10.0 - 192.168.63.0 in /24 segments
● Undefined - 192.168.64.0 - 192.168.127.0
● PAS Dynamic Orgs - 192.168.128.0/17
● PKS Clusters - 172.24.0.0/14
● PKS Pods - 172.28.0.0/14
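The counts quoted above (128 Org networks per foundation, more than 50 Isolation Segments of roughly 250 VMs each) follow directly from CIDR arithmetic. A minimal Python sketch using the standard ipaddress module, with the CIDR values taken from the plan above (the script itself is illustrative only):

import ipaddress

# PAS Dynamic Orgs: one /24 per Org carved from the /17 block.
orgs_block = ipaddress.ip_network("192.168.128.0/17")
org_networks = list(orgs_block.subnets(new_prefix=24))
print(len(org_networks))                    # 128 Orgs per foundation
print(org_networks[0].num_addresses - 2)    # 254 usable addresses per Org

# Isolation Segments: one /24 per segment, 192.168.10.0 through 192.168.63.0.
iso_segments = [ipaddress.ip_network(f"192.168.{n}.0/24") for n in range(10, 64)]
print(len(iso_segments))                    # 54 segments, i.e. more than 50

# Sanity check: the static PAS networks do not overlap the dynamic Org block.
static = [ipaddress.ip_network(c) for c in ("192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24")]
print(any(net.overlaps(orgs_block) for net in static))   # False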

Network Requirements for External Routing: 

Routable external IPs on the Provider side (for example, for NATs, PAS Orgs, and load balancers) are assigned to the T0 router fronting the PCF installation. There are two approaches to assigning address space blocks to this job, each with its own ramifications.

Without PKS at all, or with PKS Ingress: The T0 router needs some routable address space to advertise on the BGP network with its peers (or not advertise, if using static routing). Select a network range with ample address space that can be split into two logical jobs: one advertised as a route for traffic, and the other for aligning T0 DNATs/SNATs, load balancers, and other jobs. Unlike NSX-V, you will consume much more address space for SNATs than before.

With PKS No Ingress: Relative to the above, this approach has much higher address space consumption for load balancer VIPs, so allow for 4x the address space, because Kubernetes LoadBalancer service types allocate addresses very frequently. Reference: Kubernetes Ingress

● Provider Routable Address Space: /25 (or /23 for PKS No Ingress)
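The difference between the two recommendations is simple host-count math. A small sketch (the 10.0.0.0 prefixes are placeholders for whatever routable block your network team assigns):

import ipaddress

with_ingress = ipaddress.ip_network("10.0.0.0/25")   # placeholder routable block
no_ingress = ipaddress.ip_network("10.0.0.0/23")     # placeholder routable block

print(with_ingress.num_addresses - 2)   # 126 usable IPs for NATs/SNATs/VIPs
print(no_ingress.num_addresses - 2)     # 510 usable IPs, roughly 4x the /25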

NSX-T handles the routing between a T0 router and any T1 routers connected to it. There are no design parameters needed from Pivotal to accomplish this. For the purposes of PCF, the T0/T1 routing relationship "just works" (the T1 Router is enabled with Route Advertisement and advertises all NSX-connected routes on the NSX end). T1 Routers associated with Load Balancers should additionally have Advertise All LB VIP Routes enabled. These settings are all handled within NSX-T.

PAS With NSX-T

Expanding PAS with SDN features is best considered as a greenfield effort. Inserting an SDN layer under a working PAS installation is non-trivial and likely will trigger a rebuild. NSX-T constructs that can be used by PAS include:

● Logical Networks, encapsulated broadcast domains
● VLAN exhaustion avoidance through the use of Logical Networks
● Routing services and NAT/SNAT to network-fence the PCF installation
● Load balancing services to pool systems such as go-routers
● SSL termination at the load balancer
● Distributed routing and firewalling services at the hypervisor

Using the available NCP tile enables a sophisticated container networking stack in place of the built-in "silk" stack and interlocks with the NSX-T already deployed in the IaaS. The NCP tile cannot be used unless NSX-T has already been established in the IaaS. These technologies cannot be retrofitted into an established PKS or PAS installation; please plan accordingly. Be prepared to supply:

● NSX Manager Host
● Username/Password
● CA Cert of NSX Manager
● PAS Foundation Name (must match the tag added to the T0 router, External NAT Pool, and IP Block)
● Subnet Prefix (controls the size of each Org; defaults to /24, i.e. 254 addresses)
● Enable SNAT for Container Networks (automatically uses an IP from the external NAT pool to SNAT outbound traffic from each Org on a unique IP)

Each new "job", like an Isolation Segment, falls to a broadcast domain/logical switch connected to a T1 router acting as the gateway to that network. This approach provides a convenient DNAT/SNAT control point and a firewall boundary. The one exception is the Services network, which shares a T1 router with any associated OD-Services# networks, because all of these are considered part of the same system and inserting NATs or firewall rules between them is neither needed nor recommended.

Load Balancing for PAS

Without NSX-T in use, a suitable load balancer must be used to send traffic to the go-routers and other systems. All installations approaching production-level use will make use of external load balancing from hardware appliance vendors or other network-layer solutions. With NSX-T, load balancing is available as a feature of the SDN layer. These load balancers are logical entities tied to the resources in the Edge cluster and align to the network(s) represented by a T1 router. They function as Layer 4 load balancers. SSL termination is available on the NSX-T load balancer, but it is recommended to pass SSL through to the go-routers. Common deployments of load balancing in PAS are:

● HTTP/HTTPS traffic to/from go-routers
● TCP traffic to/from TCP routers
● MySQL proxies
● Diego Brains

NSX-T load balancers can support many VIPs, so it is best to deploy one load balancer per network (T1) and one-to-many VIPs on that load balancer per job. Edge cluster resources are consumed for load balancing, so carefully weigh how many load balancers are needed against how much capacity is available for them. BOSH can manage the members of the server pools for the NSX-T load balancers using NS Groups. To configure the integration, see the NSX-T cookbook.

 

PKS With NSX-T

As noted above, Pivotal Container Service is supplied with NSX-T SDN and the associated licensing, and the assumption is that you will want to use all of this technology at once to get full benefit from both. However, the bundled NSX-T SDN solution can be ignored in favor of the built-in flannel network stack for container networking.

Deploying Without NSX-T

Deploy the PKS tile in Ops Manager (either alongside the PAS tile or on its own) and choose "flannel" on the networking tab. Select from networks already identified in Ops Manager to deploy the PKS API Gateway and the cluster(s). If following the reference design for PAS, you can use the "Services" and "OD-Services#" networks for these jobs. If no PAS networks are available, define a network named "PKS-Services" and another named "PKS-Cluster" in Ops Manager for these jobs.

Deploying With NSX-T

You will supply certain details about the IaaS SDN layer to allow for dynamic constructs to be built. You'll be asked to supply:

● NSX Manager Host
● Username/Password

● CA Cert of NSX Manager
● T0 Router (to connect dynamically created namespace networks to)
● NSX IP Block (from which to pull networks for pods per namespace)
● NSX IP Block (from which to pull networks for each new cluster)
● Floating IP Pool ID
  ○ IP Pool to use for externally facing IPs (NATs/VIPs)

New T1 routers will be spawned on demand as new clusters and namespaces are added to PKS. These can grow rapidly, so a large address block is desired for this use. Allocate a large block such as a /14 in NSX-T; NSX-T carves out /24 address blocks from it by default. A /14 CIDR would provide 1024 namespaces/clusters.

CIDR block    Number of namespaces/clusters available
/12           4096
/14           1024
/16           256
/18           64
/20           16
/22           4
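The table values are just powers of two: each namespace or cluster consumes one /24 from the block. A one-line Python check reproduces them:

for prefix in (12, 14, 16, 18, 20, 22):
    print(f"/{prefix}: {2 ** (24 - prefix)} namespaces/clusters")
# /12: 4096, /14: 1024, /16: 256, /18: 64, /20: 16, /22: 4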

In PKS 1.1.5 and above, multiple master nodes are a supported feature. The number of master nodes is defined per plan in the PKS tile in Ops Manager and should be an odd number to allow etcd to form a quorum. It is recommended to have at least one master node per AZ for HA and DR (if a shared storage option is available).

With the dynamic creation of networks for both PKS clusters and pods, multi-tenancy becomes a discussion topic. It is recommended to use multiple clusters as opposed to a single cluster with multiple namespaces. Multiple clusters provide security, customization per cluster, privileged containers, failure domains, version choice, and so on. Namespaces are a naming construct as opposed to a tenancy construct.
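To see why an odd master count is recommended, here is a short sketch of the standard etcd quorum arithmetic (general etcd behavior, not specific to PKS):

# etcd needs a majority (quorum) of members; even counts add no extra fault tolerance.
for masters in (1, 2, 3, 4, 5):
    quorum = masters // 2 + 1
    print(f"{masters} masters: quorum {quorum}, tolerates {masters - quorum} failure(s)")
# 3 masters tolerate 1 failure; 4 still tolerate only 1; 5 tolerate 2.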

NSX-T Edge Type and Load Balancer Capacity Planning

The size of the NSX-T Edge and the release of NSX-T determine the number of Load Balancers (which varies by LB size) available to the PKS clusters, and thus the number of PKS clusters that can be created.

Edge Node           Max # of LB sized small   Max # of LB sized medium   Max # of LB sized large
NSX-T release       2.2      2.3              2.2      2.3               2.2      2.3
Edge VM - Small     -        -                -        -                 -        -
Edge VM - Medium    1        1                -        -                 -        -
Edge VM - Large     40       40               1        4                 -        -
Edge Bare Metal     750      750              75       75                1        7

It is recommended to go with a minimum of 4 Large Edge VMs (2 instances is the bare minimum and is necessary for HA) to have a decent number of available Load Balancer instances. A production-grade deployment should use Medium load balancers (Small is really only for POC/dev-scale load). So even though there might be 40 Small load balancers available on a Large Edge VM, only the 4 Medium load balancers can be used for production. The size of the Load Balancer determines the number of Virtual Servers, Pools, and Pool Members per LB instance. Note: the Large LB is only available with the Bare Metal Edge.

LB Service                    Small          Medium          Large (only with Bare Metal Edge)
NSX-T release                 2.2     2.3    2.2     2.3     2.2      2.3
# of Virtual Servers per LB   10      10     100     100     1000     1000
# of Pools per LB             10      20     100     200     1000     2000
# of Pool Members per LB      30      200    300     2000    3000     10000

Important Note: Because PAS and PKS use an active/standby Edge Node LB Service, always consider capacity per Edge pair when deriving the available number of Load Balancer Services. If there are two Edge nodes, only the load balancer service capacity of one instance can be counted, because the other is treated as standby. The following table therefore shows the capacity of 2 instances of Large Edge VM as 4 Medium load balancers, even though each Edge instance by itself has capacity for 4 Medium load balancer instances.

Edge Cluster          # LB sized small   # LB sized medium   # LB sized large
NSX-T release         2.2      2.3       2.2      2.3        2.2      2.3
2x Edge VM - Large    40       40        1        4          -        -
4x Edge VM - Large    80       80        2        8          -        -
8x Edge VM - Large    160      160       4        16         -        -
2x Edge Bare Metal    750      750       75       75         1        7
4x Edge Bare Metal    1500     1500      150      150        2        14
8x Edge Bare Metal    3000     3000      300      300        4        28

The number of available load balancer instances, which is tied to the Edge instances, is directly proportional (1:1) to the maximum number of Kubernetes clusters that can be supported. Each Kubernetes cluster uses a minimum of one load balancer from an active Edge instance. Because the number of load balancer instances is fixed by the type of load balancer and Edge, the number of Kubernetes clusters that can be created on the Edge is constrained by the number of free load balancer services on the active Edge. Note: the maximum number of Edge nodes per Edge Cluster is 10 (as of NSX-T 2.3); if more than 10 Edge nodes are needed, create additional Edge clusters. If the load balancer service capacity is fully utilized on a given Edge pair (one active, the other standby), install and bring up additional Edge VM instance pairs in the same Edge cluster to handle the requirements for additional load balancers (for existing or new PKS clusters).
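For planning purposes, the usable capacity can be expressed as a small calculation. A sketch assuming Large Edge VMs on NSX-T 2.3, with the per-Edge figure taken from the tables above:

MEDIUM_LB_PER_LARGE_EDGE_VM = 4   # NSX-T 2.3, Large Edge VM (see table above)

def usable_medium_lbs(edge_vm_count):
    pairs = edge_vm_count // 2    # one node of each pair is standby
    return pairs * MEDIUM_LB_PER_LARGE_EDGE_VM

for edges in (2, 4, 8):
    lbs = usable_medium_lbs(edges)
    # Each PKS cluster needs at least one load balancer, so this also caps cluster count.
    print(f"{edges}x Large Edge VM: {lbs} Medium LBs, up to {lbs} PKS clusters")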

Note: PKS 1.2+ is expected to support specifying the size of the Load Balancer at cluster creation time. Bare Metal Edge support for PKS is expected after the PKS 1.2 GA. The current limitation of a single T0 Router per Edge instance (even for Bare Metal) is expected to be removed in the same time frame.

Ingress Routing and Load Balancing for PKS

How you select ingress routing influences load balancing choices. You're going to want both of the following:

1. Ingress Routing - Layer 7
2. Service Type: LoadBalancer - Layer 4

Ingress Routing - Layer 7

The NSX-T native ingress router is included when deploying with NSX-T. Third-party options include Istio or NGINX running as containers in the cluster. Wildcard DNS entries are needed to point at the ingress service, in the style of go-routers in PAS. Domain information for ingress is defined in the manifest of the Kubernetes deployment. Here is an example:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: music-ingress
  namespace: music1
spec:
  rules:
  - host: music1.pks.domain.com
    http:
      paths:
      - path: /.*
        backend:
          serviceName: music-service
          servicePort: 8080

Service Type: LoadBalancer - Layer 4

When pushing a Kubernetes deployment with the type set to "LoadBalancer", NSX-T automatically creates a new VIP for the deployment on the existing load balancer for that namespace. You will need to specify a listening port and a translation port in the service, along with a name for tagging. You will also specify a protocol to use. Here is an example:

apiVersion: v1
kind: Service
metadata:
  ...
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: web

Storage Design

PRA offers the following options for the PAS blobstore:

● ECS for external on-premises storage

● Virtustream Storage Cloud for off-premises external storage

● Pivotal Internal WebDAV server for non-high-availability deployments

● Customer-supplied S3-compatible blobstore, which is not supported by Dell EMC

Elastic Cloud Storage

ECS object storage features a flexible software-defined architecture that provides the simplicity and low-cost benefits of the public cloud without the risk, compliance, and data sovereignty concerns. PRA can use the ECS platform as its blobstore location to store application code, buildpacks, and applications. The blobstore uses the S3 protocol and can be managed by ECS nodes. The ECS platform provides software-defined object storage that is designed for modern cloud-scale storage requirements. ECS benefits include:

● Simplicity and scale that comes from a single global namespace and unlimited applications, users, and files

● Universal accessibility, with support for object, file, and HDFS on a single platform
● Faster application development enabled by API-accessible storage and strong consistency, which accelerate cloud applications and analytics
● Multitenancy with self-service access and metering capabilities
● Cross-site geo-replicated copies of your PAS blobstore data in dual-site ECS deployments
● Flexible on-premises deployment as an appliance, a software-only solution, or a combination of both

Virtustream Storage Cloud

Virtustream Storage Cloud is a hyper-scale, enterprise-class, public cloud object storage platform built for resiliency and performance. Virtustream provides a unique value proposition with enterprise-class off-premises primary and backup storage. Virtustream Storage Cloud provides:

● Simplicity in consumption and management
● Enterprise-class performance, availability, and data durability
● Hyper-scale capacity for large data stores
● Efficiency in data transport and storage to lower costs
● Security, whether onsite, in-flight, or at-rest
● Seamless, single-source billing and support

Pivotal internal blobstore

Pivotal provides an internal WebDAV blobstore that PAS uses to store persistent data. The PAS internal blobstore is deployed as a single virtual machine in PAS. It does not offer high availability but can be backed up using Pivotal BOSH Backup and Restore (BBR). It is designed for small, non-production deployments. For PAS production-level deployments, Dell EMC and Pivotal recommend using an external S3-compatible blobstore.

Storage In General

Shared storage is a requirement for PCF. You can allocate networked storage to the host clusters following one of two common approaches, horizontal or vertical. The approach you follow should reflect how your data center arranges its storage and host blocks in its physical layout.

Horizontal: You grant all hosts access to all datastores and assign a subset to each installation. This type of storage is required for PKS multi-AZ deployments on PRA but is not provided by the platform natively. In PRA, all storage is deployed vertically aligned, so an external datastore must be sourced and networked/cabled to the installation for PKS needs. Single-cluster/AZ PKS does not have this need.

Vertical: You grant each cluster its own dedicated datastores, creating a "cluster-aligned" storage strategy. vSphere vSAN is an example of this architecture. For example, with six datastores ds01 through ds06, you assign datastores ds01 and ds02 to your first cluster, ds03 and ds04 to your second cluster, and ds05 and ds06 to your third cluster. You then provision your first PCF installation to use ds01, ds03, and ds05, and your second PCF installation to use ds02, ds04, and ds06. With this arrangement, all VMs in the same installation and cluster share a dedicated datastore.

Note: If a datastore is part of a vSphere Storage Cluster using sDRS (Storage DRS), you must disable the s-vMotion feature on any datastores used by PCF. Otherwise, s-vMotion activity can rename independent disks and cause BOSH to malfunction. For more information, see How to Migrate PCF to a New Datastore in vSphere.

Storage Capacity and Type

Pivotal recommends the following capacity allocation for PAS installations:

● For production use, at least 8 TB of data storage, either as one 8 TB store or a number of smaller volumes adding up to 8 TB. Frequently used development may require significantly more storage to accommodate new code and buildpacks.
● For small installations without many tiles, 4-6 TB.

The primary consumer of storage is the NFS/WebDAV blobstore. PCF does not currently support using vSphere Storage Clusters with the latest versions of PCF validated for the reference architecture. Datastores should be listed in the vSphere tile by their native name, not the cluster name created by vCenter for the storage cluster. PKS storage sizing is a unique science. Consider the following chart for guidance on how much storage to allocate based on Pod and Node requirements.

Compute and HA Considerations

For PAS, a consolidation ratio for containers should follow a conservative 4:1 ratio of vCPUs to pCPUs; an even more conservative (safer) choice is 2:1. While this may seem light compared to standard vSphere practice, keep in mind that standard vSphere modeling is based on one OS and one (or only a few) apps per VM. With PAS, you will quickly get to a dozen apps per VM. A standard Diego cell is 4x16 (XL) by default. At a 4:1 ratio you have a 16 vCPU budget, or about 12 containers after some waste for running the OS. If you want more containers in a cell, be sure to scale the vCPUs accordingly, with the caveat that VMs with high core counts become increasingly hard to schedule on the pCPUs unless the socket has a high physical core count.
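One possible reading of the cell arithmetic above, as a minimal sketch (the 4-vCPU cell size and the 4:1 ratio come from the text; the OS overhead value is an assumption chosen to match the roughly-12-container figure):

CELL_VCPUS = 4        # standard 4x16 (XL) Diego cell
RATIO = 4             # conservative 4:1 vCPU-to-pCPU consolidation
OS_OVERHEAD = 4       # assumed budget lost to the cell OS ("some waste")

budget = CELL_VCPUS * RATIO        # 16-vCPU container budget per cell
containers = budget - OS_OVERHEAD  # about 12 containers per cell
print(budget, containers)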
HA considerations: PAS isn't considered HA until at least two AZs are defined (three are recommended). Using vSphere HA capabilities in conjunction with PCF HA capabilities is the best of all worlds. PCF gains redundancy through the AZ construct, and the loss of an AZ isn't considered catastrophic. The BOSH Resurrector can replace lost VMs as needed to repair a foundation. Foundation backup and restoration is accomplished externally by BBR. Designing for HA in vSphere is beyond the scope of this document, but guidance is available from VMware. PKS has no inherent HA capabilities to design for, so make best efforts to provide an HA design at the IaaS, storage, power, and access layers to support PKS.

Pulling It All Together

A fully meshed PCF 2.4 installation based on best practices and reference design considerations will look as follows:

Common elements are the NSX T0 router and the associated T1 routers. This approach allows any possible cross traffic between PKS and PAS apps to stay within the bounds of the T0 and not exit. It also provides a convenient, one-stop access point to the whole installation, making deployment of multiple, identical installations easier to automate. Note that each design, green for PKS and orange for PAS, depicts its own Ops Manager + BOSH Director installation. This is the recommended approach at the time of this publication; you should not share an Ops Manager + BOSH Director installation between PKS and PAS.

AZs are aligned to vSphere clusters, with resource pools as an optional organizational tool to place multiple foundations into the same capacity. You can align PKS to any AZ/cluster desired, but there is no driving need to keep PKS and PAS in completely separate AZs. It continues to be a good design choice to use Resource Pools in vSphere clusters as AZ constructs to stack different installations of PCF. As server capacity continues to increase, the efficiency of deploying independent clusters of servers for just one install of one product is low. Customers commonly deploy servers with 768 GB RAM and beyond, so stacking many PCF installations in clusters of these makes good consumption sense.

Pay close attention to the maximum capacities allowed per NSX-T installation. These targets change quickly as the product matures, but you may find that an NSX-T installation per foundation (as shown above) potentially consumes all of the capacity of an NSX install (not a vSphere install). It is reasonable to consider an NSX-T installation per foundation to allow maximum capacity growth. You may be tempted to split the PAS and PKS installations into separate network trees behind separate T0 routers. Before doing this, review VMware's best practices on T0-to-T0 routing efficiencies and weaknesses to be sure that approach meets your needs. Please review the reference guidance from VMware for this solution, which is much more detailed in the IaaS constructs than shown here: https://communities.vmware.com/docs/DOC-38609