
Post on 30-Jul-2020


Service Mesh

Technology Deep Dive and Reasons for Adoption

Diógenes Rettori - @rettori

Executive Director - Cloud Architecture

JPMorgan Chase & Co.

One Message.

rettori

Agenda

- Quick Introduction to Service Mesh: 5m
- Service Mesh x Distributed Systems: 5m
- Technology Options: 2m
- Istio and Linkerd Deep Dive: 20m
- How to choose: 5m
- Recommended tools: 3m

Currently Writing

Service Mesh: Istio & Linkerd

Diogenes Rettori & Tiago Vieira

Quick Intro to Service Mesh

[Diagram sequence: four microservices (booking, payments, catalog, notifications). Successive builds add a "?" between the services, place them across availability zones AZ-1 and AZ-2, show replicated instances of payments, catalog and notifications, and annotate the traffic with "1000 / day" and "50 / second".]

A Service Mesh is an intelligent communications network that understands the relationships between microservices and intervenes in the traffic to increase the reliability and security of the whole system.


Addresses needs of distributed systems.

Service Mesh x Distributed Systems

Fallacies of Distributed Systems

- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.

Given that they are fallacies, we should assume the opposite.

| Fallacy of Distributed Systems | Service Mesh feature |
| --- | --- |
| The network is reliable. | Circuit breaking and load balancing |
| Latency is zero. | Timeouts and retries |
| Bandwidth is infinite. | Rate limiting |
| The network is secure. | Mutual TLS |
| Topology doesn’t change. | Service discovery |
| There is one administrator. | Role-based access control |
| Transport cost is zero. | gRPC / RSocket |
| The network is homogeneous. | Dynamic routing: A/B, canary deployments |

Technology Options

[Logo slide; recoverable labels: AWS App Mesh, Service Mesh]

Istio & Linkerd Deep Dive

● Traffic Management

● Security

● Installation / Configuration

● Supported Environments

● Observability

● Policy Management

● Performance

Traffic Management

| Feature | Istio | Istio comments | Linkerd | Linkerd comments |
| --- | --- | --- | --- | --- |
| TCP Proxying | Yes | | Yes | |
| Load Balancing | Yes | Supports Round Robin, Least Conn, Random and Passthrough | Yes | Uses EWMA (exponentially weighted moving average) to identify optimal targets |
| Subset Load Balancing | Yes | Useful for Canary, Blue/Green deployments and A/B tests | No | |
| Session Affinity | Yes | Cookie hash-based LB for HTTP, providing soft session affinity | No | |
| Circuit Breaking | Yes | Implemented as Outlier Detection | See comments | No configuration options. EWMA balancing will give less preference to unhealthy targets, achieving something circuit-breaker-like |
| Retries | Yes | | Yes | |
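The circuit-breaking row contrasts explicit outlier detection with EWMA-based de-prioritisation of slow targets. As a rough illustration of the first idea only, the sketch below ejects a host from the load-balancing pool after a run of consecutive failures and readmits it after a cool-off period. The class name, the threshold of 5 errors and the 30-second ejection time are assumptions made for this example; it is not Envoy's or Istio's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class OutlierDetector:
    """Toy outlier detection: eject hosts after consecutive failures."""
    consecutive_errors: int = 5      # assumed threshold
    ejection_seconds: float = 30.0   # assumed cool-off period
    _errors: dict = field(default_factory=dict)
    _ejected_until: dict = field(default_factory=dict)

    def record(self, host: str, ok: bool, now: float) -> None:
        """Record the outcome of one request sent to `host` at time `now`."""
        if ok:
            self._errors[host] = 0
            return
        self._errors[host] = self._errors.get(host, 0) + 1
        if self._errors[host] >= self.consecutive_errors:
            # Too many failures in a row: take the host out of rotation.
            self._ejected_until[host] = now + self.ejection_seconds
            self._errors[host] = 0

    def healthy(self, hosts: list, now: float) -> list:
        """Return the hosts that are not currently ejected."""
        return [h for h in hosts if self._ejected_until.get(h, 0.0) <= now]


detector = OutlierDetector()
hosts = ["catalog-1", "catalog-2"]
for second in range(5):
    detector.record("catalog-2", ok=False, now=float(second))
print(detector.healthy(hosts, now=5.0))    # catalog-2 is ejected
print(detector.healthy(hosts, now=40.0))   # catalog-2 is back after the cool-off
```

A load balancer would call `healthy()` before choosing a target, so an ejected host simply receives no traffic until its cool-off expires.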

Traffic Management

Points to Consider

- Load Balancing algorithms
- Subset Load Balancing
- Session Affinity

Istio: Round Robin, Least Conn, Random and Passthrough

Linkerd (Peak EWMA): maintain a moving average of each replica’s round-trip time, weighted by the number of outstanding requests, and distribute traffic to replicas where that cost function is smallest.

[Diagrams: subset routing across catalog v1, catalog v2 and catalog v3; blue/green deployment with catalog blue and catalog green.]
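The Peak EWMA description above can be sketched as a small replica picker. The exact cost function is not given on the slide, so the one used here (average round-trip time multiplied by the number of outstanding requests plus one) is an assumption, as are the `Replica` class, the smoothing factor and the replica names. Linkerd's real balancer is more elaborate; this only illustrates "pick the replica with the smallest cost".

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    ewma_rtt_ms: float = 0.0   # moving average of observed round-trip times
    outstanding: int = 0       # requests sent but not yet answered

    def observe(self, rtt_ms: float, smoothing: float = 0.5) -> None:
        """Fold a new round-trip time into the moving average."""
        self.ewma_rtt_ms = smoothing * rtt_ms + (1 - smoothing) * self.ewma_rtt_ms

    def cost(self) -> float:
        """Assumed cost function: average latency scaled by queue depth."""
        return self.ewma_rtt_ms * (self.outstanding + 1)


def pick(replicas: list) -> Replica:
    """Choose the replica whose cost is currently smallest."""
    return min(replicas, key=lambda r: r.cost())


replicas = [
    Replica("catalog-a", ewma_rtt_ms=20.0, outstanding=3),   # cost 80
    Replica("catalog-b", ewma_rtt_ms=35.0, outstanding=0),   # cost 35
    Replica("catalog-c", ewma_rtt_ms=300.0, outstanding=0),  # cost 300
]
print(pick(replicas).name)  # catalog-b
```

Note that the fastest replica on average (catalog-a) is not chosen here, because its three outstanding requests raise its cost above that of the idle catalog-b.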

Peak EWMA

$$\mathrm{EWMA}_t = \lambda Y_t + (1 - \lambda)\,\mathrm{EWMA}_{t-1}, \quad t = 1, 2, \ldots, n$$

Where:

- $\mathrm{EWMA}_0$ is the mean of historical data (target)
- $Y_t$ is the observation at time $t$
- $n$ is the number of observations to be monitored, including $\mathrm{EWMA}_0$
- $0 < \lambda \le 1$ is a constant that determines the depth of memory of the EWMA.

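| T | node1 | node2 | node3 | EWMA node1 | EWMA node2 | EWMA node3 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 32 | 43 | 33 | | | |
| 2 | 35 | 64 | 43 | 30.00 | 30.00 | 30.00 |
| 3 | 64 | 24 | 53 | 32.50 | 47.00 | 36.50 |
| 4 | 53 | 53 | 63 | 48.25 | 35.50 | 44.75 |
| 5 | 13 | 31 | 24 | 50.63 | 44.25 | 53.88 |
| 6 | 24 | 14 | 35 | 31.81 | 37.63 | 38.94 |
| 7 | 53 | 32 | 64 | 27.91 | 25.81 | 36.97 |
| 8 | 45 | 43 | 52 | 40.45 | 28.91 | 50.48 |
| 9 | 65 | 352 | 22 | 42.73 | 35.95 | 51.24 |
| 10 | 75 | 124 | 3402 | 53.86 | 193.98 | 36.62 |
| 11 | 14 | 464 | 35 | 64.43 | 158.99 | 1719.31 |
| 12 | 24 | 32 | 45 | 39.22 | 311.49 | 877.16 |
| 13 | 26 | 35 | 452 | 31.61 | 171.75 | 461.08 |
| 14 | 63 | 131 | 53 | 28.80 | 103.37 | 456.54 |
| 15 | 134 | 24 | 234 | 45.90 | 117.19 | 254.77 |
| 16 | 1353 | 531 | 53 | 89.95 | 70.59 | 244.38 |
| 17 | 314 | 132 | 522 | 721.48 | 300.80 | 148.69 |
| | | | | 517.74 | 216.40 | 335.35 |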
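The EWMA columns in the table above appear consistent with λ = 0.5 and a starting value EWMA_0 = 30, with each row showing the average computed from the observations in the rows above it, starting at T = 2. Those parameters are inferred from the numbers, not stated on the slides. Under that assumption, a minimal sketch of the recurrence is:

```python
def ewma_series(observations, smoothing=0.5, initial=30.0):
    """Return [EWMA_0, EWMA_1, ...] for the given observations.

    Implements EWMA_t = smoothing * Y_t + (1 - smoothing) * EWMA_{t-1}.
    The defaults (0.5 and 30.0) are inferred from the table, not given in it.
    """
    values = [initial]
    for y in observations:
        values.append(smoothing * y + (1 - smoothing) * values[-1])
    return values


# node1 observations from rows T=2 to T=5 of the table
node1 = [35, 64, 53, 13]
for value in ewma_series(node1):
    print(value)
```

The printed values (30.0, 32.5, 48.25, 50.625, 31.8125) line up with the node1 EWMA column in rows T=2 to T=6 once rounded to two decimals. With λ = 0.5, a single outlier such as the 3402 observed on node3 dominates the average for several steps, which is what steers traffic away from that node.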

Traffic Management

| Feature | Istio | Istio comments | Linkerd | Linkerd comments |
| --- | --- | --- | --- | --- |
| Retry Budgets | No | | Yes | Used to avoid retry storms and unnecessary retries. |
| Timeouts | Yes | | Yes | |
| Fault Injection | Yes | | No | |
| Ingress | Yes | Provided by the Istio ingress-gateway; other gateways are supported as well. | See comments | Linkerd does not ship its own Ingress proxy but can be configured to work with popular options such as Nginx, Gloo, and others. |
| Traffic Filters | Yes | Custom Envoy filters can be added to the chain. | No | |
| External Routing | Yes | | Yes | |
| Header-based matching | Yes | | No | Only path-based matching |
| Add/Change/Remove custom headers | Yes | | No | |

Traffic Management

Points to Consider

- Fault Injection
- Custom Envoy Filters
- Header Based Matching
- Add / Remove Headers
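To make the difference between header-based and path-only matching concrete, here is a minimal sketch of a route table evaluated in order. The `Route` class, the `x-canary` header and the destination names are hypothetical and chosen for illustration; this is not Istio's or Linkerd's routing configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    destination: str
    path_prefix: str = "/"
    headers: dict = field(default_factory=dict)  # header name -> required value

    def matches(self, path: str, request_headers: dict) -> bool:
        """A route matches when the path prefix and all required headers match."""
        if not path.startswith(self.path_prefix):
            return False
        return all(request_headers.get(k) == v for k, v in self.headers.items())


def resolve(routes: list, path: str, request_headers: dict) -> str:
    """Return the destination of the first matching route."""
    for candidate in routes:
        if candidate.matches(path, request_headers):
            return candidate.destination
    raise LookupError("no route matched")


routes = [
    # Header-based rule: only requests carrying the (hypothetical) canary header.
    Route("catalog-v2", path_prefix="/catalog", headers={"x-canary": "true"}),
    # Path-only rule: everything else under /catalog.
    Route("catalog-v1", path_prefix="/catalog"),
]
print(resolve(routes, "/catalog/items", {"x-canary": "true"}))  # catalog-v2
print(resolve(routes, "/catalog/items", {}))                    # catalog-v1
```

With path-only matching, the first rule could not be expressed: both requests share the same path, so they would always reach the same destination.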

Custom Envoy Filter - Gloo Example

Traffic Management

Retry Budgets and Retry Storm

A retry storm is an undesirable client/server failure mode where one or more peers become unhealthy, causing clients to retry a significant fraction of requests. This has the effect of multiplying the volume of traffic sent to the unhealthy peers, exacerbating the problem.

[Diagram: booking calling payments, shown first with one "!" marker, then with three, then with none.]

Retry budget parameters:

- Retry Ratio: the number of retries allowed relative to the number of requests, for example 20%
- TTL: how long requests should be considered
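The two parameters above can be sketched as a sliding-window counter: a retry is granted only while the retries in the window stay within the configured ratio of the requests in the same window. The class and method names, the 10-second TTL and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions for illustration, not Linkerd's implementation.

```python
from collections import deque


class RetryBudget:
    """Toy retry budget: retries may not exceed a ratio of recent requests."""

    def __init__(self, retry_ratio: float = 0.2, ttl_seconds: float = 10.0):
        self.retry_ratio = retry_ratio
        self.ttl_seconds = ttl_seconds
        self._requests = deque()  # timestamps of original requests
        self._retries = deque()   # timestamps of retries already granted

    def _expire(self, now: float) -> None:
        """Forget events older than the TTL window."""
        for events in (self._requests, self._retries):
            while events and events[0] <= now - self.ttl_seconds:
                events.popleft()

    def record_request(self, now: float) -> None:
        """Count one original (non-retry) request."""
        self._expire(now)
        self._requests.append(now)

    def try_retry(self, now: float) -> bool:
        """Grant a retry only while the budget for the window allows it."""
        self._expire(now)
        if len(self._retries) + 1 > self.retry_ratio * len(self._requests):
            return False
        self._retries.append(now)
        return True


budget = RetryBudget(retry_ratio=0.2, ttl_seconds=10.0)
for _ in range(10):
    budget.record_request(now=0.0)
granted = [budget.try_retry(now=1.0) for _ in range(4)]
print(granted)  # [True, True, False, False]
```

With 10 requests in the window and a 20% ratio, only two retries are granted; the rest are refused, which caps the extra load an unhealthy peer receives instead of multiplying it.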

Security

| Feature | Istio | Istio comments | Linkerd |
| --- | --- | --- | --- |
| Supports mTLS | Yes | | Yes |
| TLS on by default | See comments | Istio instructions include details on how to install with TLS in both permissive and strict mode | Yes |
| Certificate Rotation | Yes | | Yes |
| External Root Certificate Support | Yes | | Yes |

Both technologies support Mutual TLS and can rely on external root certificates.

Installation and Configuration

| Feature | Istio | Linkerd | Comments |
| --- | --- | --- | --- |
| Prerequisites check | No | Yes | linkerd check --pre |
| Requires Sidecar | No | Yes | |
| Supports automatic Sidecar Injection | Yes | Yes | |

For Linkerd, the pre-check (linkerd check --pre) verifies whether you have the permissions to create the Kubernetes resources required during the install process.

Supported Environments and Deployment Models

| Feature | Istio | Istio comments | Linkerd |
| --- | --- | --- | --- |
| Kubernetes | Yes | | Yes |
| Non Kubernetes | Yes | Virtual Machines, Cloud Foundry, Consul/Nomad | No |
| Multi-cluster Support - Multiple Control Planes | Yes | | Yes |
| Multi-cluster Support - Single Control Plane | Yes | | No |

Points to Consider

- Linkerd 2.3 only supports Kubernetes.
- Both support multi-cluster with multiple control planes.
- Istio handles more complex multi-cluster scenarios.

Istio - Multi-Cluster - Multiple Control Planes

Istio - Multi-Cluster - Single Control Plane - VPN

Istio - Multi-Cluster - Single Control Plane - Border Gateways

Observability

| Feature | Istio | Istio comments | Linkerd | Linkerd comments |
| --- | --- | --- | --- | --- |
| Admin Dashboard | No | | Yes | |
| Observability Dashboard | Yes | Includes Kiali | Yes | Includes the Linkerd dashboard and also pre-configured Grafana dashboards |
| Tracing | Yes | | No | Tracing can still be achieved by instrumenting applications. For debugging purposes, the Tap feature allows you to 'listen' to traffic on a resource. |

Point to Consider

- Linkerd does not have Distributed Tracing but has a Tap feature.


Policy Management

| Template | Provider |
| --- | --- |
| API Key | |
| Analytics | Apigee |
| Authorization | Apigee, OPA |
| Check Nothing | Denier |
| Edge | |
| Kubernetes | Kubernetes Env |
| List Entry | Denier, List |
| Log Entry | Fluentd, SolarWinds, Stackdriver, Stdio |
| Metric | Apache SkyWalking, Circonus, CloudWatch, Datadog, Prometheus, SignalFx, SolarWinds, Stackdriver, StatsD, Stdio, Wavefront by VMware |
| Quota | Denier, Memory quota, Redis Quota |
| Report Nothing | |
| Trace Span | SignalFx, Stackdriver, Zipkin |

Policy Management

| Feature | Istio | Linkerd | Comments |
| --- | --- | --- | --- |
| OIDC/OAuth2 | Yes | No | Principal authentication is delegated to the applications |
| Rate Limits | Yes | No | |
| Adapter Support | Yes | No | |

Point to Consider

- Linkerd does not have a policy management system such as Istio's. Policy needs to be implemented at the Ingress or application level.


Performance

On the server side, the Istio/Envoy sidecar uses ~60% more CPU than Linkerd.

Source: https://medium.com/@michael_87395/benchmarking-istio-linkerd-cpu-c36287e32781

Performance

On the Linkerd2-meshed setup, the p99.9 latency (red) ranged from 8.0 ms to 12.0 ms.

On the Istio-meshed setup, the p99.9 latency (red) ranged from 35.0 ms to 55.0 ms. The p99 latency (orange) fell in the range of 22.6 ms to 27.2 ms.

Source: https://medium.com/@ihcsim/linkerd-2-0-and-istio-performance-benchmark-df290101c2bb

How to Know if You Need a Service Mesh

1. Running Distributed Systems
2. Advanced CI/CD Pipelines
3. Service Availability / SLA
4. Multiple Language Platforms
5. Service Governance

Recommended Tools

SuperGloo (supergloo.solo.io)

$ supergloo install istio

$ supergloo install linkerd

Kiali (kiali.io): Service Mesh Observability

Flagger (flagger.app): Flagger is a Kubernetes operator that automates the promotion of canary deployments using Istio or App Mesh routing for traffic shifting and Prometheus metrics for canary analysis.

Thank you.