microservices, continuous delivery, and elasticsearch at capital one

Post on 06-Apr-2017


Capital One

3/8/2017

Microservices, Continuous Delivery, and Elasticsearch at Capital One

Noriaki (Nori) Tatsumi, Bingchen (Ben) Hu, Anne Cather

Security breaches dominate the news

CYBER TECH DATA LAKE

Build vs. buy

• Industry tools only meet ~80% of our requirements

• Vendors’ priorities don’t align with ours

• Elasticsearch is an open source solution

• Open source technology is extensible

How we got here

Scale, new features, NFRs

• More data

• More processing

• Longer data retention

• More consumers

• Alerts console

• Cyber threat intelligence repository

• And more!

Our initial requirements

• Uptime and DR

• Security

• Compliance

• Data management

The prototype we had

[Diagram: Elasticsearch data nodes and master nodes; an Elasticsearch client node running a Kibana fork with SSO integration; AD SSO]

MORE REQUIREMENTS, DELIVERY DATES, BIGGER TEAMS = HIGHER COMPLEXITY

Monolith

• Work in parallel

• Do one scope of things well

• Easy to understand and maintain

• Technology stack choice for features and teams

• Quicker, smaller, & independent deploys

• Fault isolation

What we wanted

MICROSERVICES

No SSO Integration!

Embracing microservices

[Diagram: the prototype (Elasticsearch data, master, and client nodes; Kibana fork with SSO integration; AD SSO) alongside Alerts-API, Alerts-UI, and CTI Repo services]

• A well known entry point to the system

• Security

• Dynamic routing

• Resiliency

• Latency and fault tolerance

• Monitoring and stats collection

Edge gateway: Align the same qualities across downstream services

• Spring Boot for developer productivity

• JVM-based for production supportability

• Netflix OSS: proven microservices technology

Spring Cloud: Foundation for our web microservices

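import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

// @SpringBootApplication already implies @EnableAutoConfiguration;
// @EnableZuulProxy is the only extra annotation needed to run a Zuul edge gateway.
@SpringBootApplication
@EnableZuulProxy
public class EdgeGateway {
    public static void main(String[] args) throws Exception {
        SpringApplication.run(EdgeGateway.class, args);
    }
}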

Getting started with Netflix Zuul is easy


zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.url=https://172.20.10.15:5601

Routing with Zuul

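The `zuul.routes.kibana.path` property above sends every request under `/kibana/` to a single upstream URL. The plain-Java sketch below approximates that lookup under stated assumptions: the `PathRouter` class is hypothetical (it is not Zuul's route locator), it only understands patterns ending in `/**`, and it strips the matched prefix before forwarding, which mirrors Zuul's default `stripPrefix` behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical prefix router in the spirit of zuul.routes.<name>.path / .url.
// Not Zuul's implementation; only patterns ending in "/**" are supported.
class PathRouter {
    // Routes are tried in the order they were added; the first match wins.
    private final Map<String, String> routes = new LinkedHashMap<>();

    void addRoute(String pattern, String targetUrl) {
        if (!pattern.endsWith("/**")) {
            throw new IllegalArgumentException("only '/**' patterns are supported: " + pattern);
        }
        // Store the prefix without the trailing "/**", e.g. "/kibana".
        routes.put(pattern.substring(0, pattern.length() - 3), targetUrl);
    }

    // Returns the upstream URL for a request path, with the matched prefix stripped.
    Optional<String> resolve(String requestPath) {
        for (Map.Entry<String, String> route : routes.entrySet()) {
            String prefix = route.getKey();
            // Match whole path segments so "/kibana" does not match "/kibanax".
            if (requestPath.equals(prefix) || requestPath.startsWith(prefix + "/")) {
                return Optional.of(route.getValue() + requestPath.substring(prefix.length()));
            }
        }
        return Optional.empty();
    }
}
```

With the route from the slide, a request for `/kibana/app/kibana` resolves to `https://172.20.10.15:5601/app/kibana`, while `/other` has no route.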

Zuul: the edge gateway

[Diagram: Edge Gateway in front of Kibana on Elasticsearch client nodes, the Alerts API, Alerts UI, Reports UI, and CyberTech Reports Repo; AD SSO and Auth; Elasticsearch master and data nodes]

Asking engineers to maintain IP addresses

• Use cases

  • Service connection information lookup

  • Automated configuration of load balancing and failover

• Alternatives to Eureka with Spring Cloud

  • HashiCorp Consul

  • Apache ZooKeeper

Discover service: Automate orchestration with Netflix Eureka

<application>
  <name>...</name>
  <instance>
    <instanceId>...</instanceId>
    <hostName>...</hostName>
    <app>...</app>
    <ipAddr>...</ipAddr>
    <status>UP</status>
    <overriddenstatus>UNKNOWN</overriddenstatus>
    <port enabled="false">...</port>
    <securePort enabled="true">...</securePort>
    <countryId>1</countryId>
    <dataCenterInfo class="com.netflix.appinfo.AmazonInfo">
      <name>Amazon</name>
      <metadata>
        <accountId>...</accountId>
        <local-hostname>...</local-hostname>
        <instance-id>...</instance-id>
        <local-ipv4>...</local-ipv4>
        <instance-type>...</instance-type>
        <vpc-id>...</vpc-id>
        <ami-id>...</ami-id>
        <mac>...</mac>
        <availability-zone>...</availability-zone>
      </metadata>
    </dataCenterInfo>
    <leaseInfo>
      <renewalIntervalInSecs>...</renewalIntervalInSecs>
      <durationInSecs>...</durationInSecs>
      …..
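The `leaseInfo` element hints at how Eureka tracks liveness: each instance renews its lease every `renewalIntervalInSecs`, and an instance that goes `durationInSecs` without a renewal becomes eligible for eviction (Eureka's defaults are 30 and 90 seconds). A minimal in-memory sketch of that idea follows. It assumes a hypothetical `LeaseRegistry` class with a caller-supplied clock so it is easy to test, requires Java 16+ for records, and leaves out Eureka's replication and self-preservation mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lease-based service registry; not Eureka's implementation.
class LeaseRegistry {
    record Instance(String app, String instanceId, String host, int port) {}

    private record Lease(Instance instance, long expiresAtMillis) {}

    private final long leaseDurationMillis;
    // app name -> (instanceId -> current lease)
    private final Map<String, Map<String, Lease>> apps = new ConcurrentHashMap<>();

    LeaseRegistry(long leaseDurationSecs) {
        this.leaseDurationMillis = leaseDurationSecs * 1000;
    }

    void register(Instance instance, long nowMillis) {
        apps.computeIfAbsent(instance.app(), k -> new ConcurrentHashMap<>())
            .put(instance.instanceId(), new Lease(instance, nowMillis + leaseDurationMillis));
    }

    // Heartbeat: extends the lease. Returns false for an unknown instance, which must re-register.
    boolean renew(String app, String instanceId, long nowMillis) {
        Map<String, Lease> instances = apps.get(app);
        if (instances == null) return false;
        Lease lease = instances.get(instanceId);
        if (lease == null) return false;
        instances.put(instanceId, new Lease(lease.instance(), nowMillis + leaseDurationMillis));
        return true;
    }

    // Lookup used by clients such as a gateway: only instances with an unexpired lease are returned.
    List<Instance> lookup(String app, long nowMillis) {
        List<Instance> live = new ArrayList<>();
        for (Lease lease : apps.getOrDefault(app, Map.of()).values()) {
            if (lease.expiresAtMillis() > nowMillis) live.add(lease.instance());
        }
        return live;
    }
}
```

Because lookups filter on lease expiry, a crashed Kibana instance drops out of routing once its lease lapses, without anyone editing an IP list.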

zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.serviceId=kibana

kibana.ribbon.listOfServers=172.20.10.11:5601,172.20.10.12:5601,172.20.10.13:5601,172.20.10.14:5601
ribbon.eureka.enabled=false

Routing with Zuul without Eureka

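With `ribbon.eureka.enabled=false`, Ribbon balances across the fixed `kibana.ribbon.listOfServers` entries, so every address change means editing that property. The sketch below shows the simplest strategy over such a list, plain round robin. `StaticServerList` is a hypothetical class; Ribbon's actual load-balancing rules and health pinging are not modeled (Java 16+ for `Stream.toList`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin chooser over a comma-separated server list
// such as kibana.ribbon.listOfServers. Not Ribbon's implementation.
class StaticServerList {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    StaticServerList(String listOfServers) {
        this.servers = Arrays.stream(listOfServers.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toList();
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers configured");
    }

    // Thread-safe: each call advances a shared counter; floorMod keeps the index valid after overflow.
    String choose() {
        return servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
    }
}
```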

zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.serviceId=kibana

Routing with Zuul with Eureka


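import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

// Registers this service with the discovery server
// (Eureka, when the Eureka client starter is on the classpath).
@SpringBootApplication
@EnableDiscoveryClient
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}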

Making Spring Boot app discoverable with Eureka


• Eureka Client (Java)

• Eureka-js-client (JavaScript)

• Eureka REST API (Polyglot)

• *Sidecar/App gateway (Polyglot)

Making any app discoverable with Eureka
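For non-Java services, the Eureka REST API option comes down to a few HTTP calls: register an instance, send periodic heartbeats, and deregister on shutdown. The sketch below only builds those requests with `java.net.http` (Java 11+) and does not send them. The base URL and the `EurekaRestRequests` class are hypothetical. Netflix's REST documentation lists these operations under `/eureka/v2/apps`, while a Spring Cloud Eureka server typically serves them under `/eureka/apps`, so the path prefix is an assumption to verify against your server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds (but does not send) Eureka REST requests.
class EurekaRestRequests {
    private final String baseUrl; // e.g. "http://eureka.example.internal:8761/eureka" (hypothetical host)

    EurekaRestRequests(String baseUrl) {
        // Normalize away a trailing slash so paths are joined consistently.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    // Register: POST {base}/apps/{appId} with an instance document as the body.
    HttpRequest register(String appId, String instanceJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/apps/" + appId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(instanceJson))
                .build();
    }

    // Heartbeat: PUT {base}/apps/{appId}/{instanceId}, sent once per renewal interval.
    HttpRequest heartbeat(String appId, String instanceId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/apps/" + appId + "/" + instanceId))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Deregister: DELETE {base}/apps/{appId}/{instanceId} on shutdown.
    HttpRequest deregister(String appId, String instanceId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/apps/" + appId + "/" + instanceId))
                .DELETE()
                .build();
    }
}
```

A sidecar or app gateway (the last option in the list) would issue these same calls on behalf of an app that cannot.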

Solving the configuration nightmare

[Diagram: Edge Gateway with AD SSO; Eureka Discovery Service; /kibana route; three Kibana Gateways, each in front of Kibana on an Elasticsearch client node; Elasticsearch master and data nodes; Alerts-UI, Alerts-API, CyberTech Reports UI, and CyberTech Reports API]

Multi-config Kibanas

[Diagram: Edge Gateway with AD SSO and an Authorization Service; /kibana routes to Kibana Gateways in front of Kibana (Console Off) on Elasticsearch client nodes; /kibana-admin routes to a Kibana Gateway in front of Kibana (Console On); Elasticsearch master and data nodes]
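Here the Authorization Service sits next to a separate `/kibana-admin` route, which presumably leads to the Console-on Kibana, alongside the regular `/kibana` route. One way to express such a rule is a table from route prefix to required group. The group names and the `RouteAuthorizer` class below are hypothetical, and how AD group membership reaches the gateway is left out.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical route-level authorization check; group names are illustrative only.
// Assumes route prefixes do not nest (e.g. "/kibana" and "/kibana-admin", not "/a" and "/a/b").
class RouteAuthorizer {
    private final Map<String, String> requiredGroupByPrefix;

    RouteAuthorizer(Map<String, String> requiredGroupByPrefix) {
        this.requiredGroupByPrefix = Map.copyOf(requiredGroupByPrefix);
    }

    // Match whole path segments so "/kibana" does not accidentally cover "/kibana-admin".
    private static boolean matches(String path, String prefix) {
        return path.equals(prefix) || path.startsWith(prefix + "/");
    }

    // Default deny: an unknown route, or a user lacking the required group, is rejected.
    boolean isAllowed(String path, Set<String> userGroups) {
        for (Map.Entry<String, String> e : requiredGroupByPrefix.entrySet()) {
            if (matches(path, e.getKey())) {
                return userGroups.contains(e.getValue());
            }
        }
        return false;
    }
}
```

Segment matching matters in this layout: a naive `startsWith("/kibana")` check would also let `/kibana-admin` through under the `/kibana` rule.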

Protected Elasticsearch gate

[Diagram: Edge Gateway with AD SSO and an Authorization Service; routes /kibana, /kibana-admin, and /esclient; Kibana Gateways and a Kibana-Admin Gateway in front of Kibana (Console OFF and Console ON); an Elasticsearch Gateway in front of each Elasticsearch client node; Elasticsearch master and data nodes]
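An Elasticsearch Gateway in front of each client node gives one place to filter what reaches the cluster over `/esclient`. The slides do not spell out the policy, so the following is purely a hypothetical example: it passes only `GET`/`POST` requests to search-style endpoints and refuses everything else, such as index deletes or cluster-settings changes. The `EsGatewayPolicy` class and its endpoint list are illustrative assumptions.

```java
import java.util.Set;

// Hypothetical request filter for an Elasticsearch gateway: only search-style
// endpoints pass; deletes, settings changes, and other writes are refused.
class EsGatewayPolicy {
    private static final Set<String> READ_ENDPOINTS = Set.of("_search", "_msearch", "_count");

    boolean permits(String method, String path) {
        if (!method.equals("GET") && !method.equals("POST")) return false;
        // Drop any query string, then inspect the final path segment, e.g. "/logs-*/_search".
        String cleanPath = path.split("\\?", 2)[0];
        String[] segments = cleanPath.split("/");
        if (segments.length == 0) return false; // path was just "/"
        return READ_ENDPOINTS.contains(segments[segments.length - 1]);
    }
}
```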

Spring Boot Admin for Spring Cloud microservices

https://github.com/codecentric/spring-boot-admin

Distributed tracing with Spring Cloud Sleuth

https://cloud.spring.io/spring-cloud-sleuth/

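Sleuth ties the hops of one request together by giving every hop the same trace ID and each hop its own span ID, propagated in B3 headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`). A minimal sketch of that propagation follows, assuming a hypothetical `Span` record rather than Sleuth's own API (Java 16+).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical trace-context propagation with B3-style headers; not Sleuth's API.
record Span(String traceId, String spanId, String parentSpanId) {

    // 64-bit IDs rendered as 16 lowercase hex characters.
    private static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // The entry point (e.g. the edge gateway) starts a new trace; the root span has no parent.
    static Span root() {
        String id = newId();
        return new Span(id, id, null);
    }

    // Each downstream call keeps the trace ID, gets a new span ID, and records its parent.
    Span child() {
        return new Span(traceId, newId(), spanId);
    }

    Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) headers.put("X-B3-ParentSpanId", parentSpanId);
        return headers;
    }

    // The receiving service rebuilds the span context from the incoming headers.
    static Span fromHeaders(Map<String, String> headers) {
        return new Span(headers.get("X-B3-TraceId"), headers.get("X-B3-SpanId"),
                headers.get("X-B3-ParentSpanId"));
    }
}
```

Searching logs or a tracing UI for one trace ID then shows every service the request touched, from the edge gateway down to the Elasticsearch gateway.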

• Successes

• Short circuited

• Thread timeouts

• Thread-pool rejections

• Failures/exceptions

• Error percentage

(Rolling 10 second counters)

Circuit breaker monitoring
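These rolling counters are what a Hystrix circuit breaker acts on: once enough requests in the window fail, the breaker opens and short-circuits calls, then lets a single trial request through after a sleep window. The single-threaded sketch below follows that logic with a hypothetical `SimpleCircuitBreaker` class and caller-supplied timestamps (Java 16+). It is not Hystrix's code; Hystrix's own defaults are a 10-second window, 20 requests, 50% errors, and a 5-second sleep window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded circuit breaker in the spirit of Hystrix (not its code).
// Outcomes are kept for a rolling time window; the breaker trips on volume plus error rate.
class SimpleCircuitBreaker {
    private record Outcome(long atMillis, boolean success) {}

    private final long windowMillis;
    private final int volumeThreshold;
    private final int errorThresholdPercent;
    private final long sleepWindowMillis;
    private final Deque<Outcome> outcomes = new ArrayDeque<>();
    private long openedAtMillis = -1; // -1 means the breaker is closed
    private boolean trialInFlight = false;

    SimpleCircuitBreaker(long windowMillis, int volumeThreshold,
                         int errorThresholdPercent, long sleepWindowMillis) {
        this.windowMillis = windowMillis;
        this.volumeThreshold = volumeThreshold;
        this.errorThresholdPercent = errorThresholdPercent;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // Drop outcomes that have fallen out of the rolling window.
    private void expire(long nowMillis) {
        while (!outcomes.isEmpty() && outcomes.peekFirst().atMillis() <= nowMillis - windowMillis) {
            outcomes.removeFirst();
        }
    }

    int errorPercentage(long nowMillis) {
        expire(nowMillis);
        if (outcomes.isEmpty()) return 0;
        long failures = outcomes.stream().filter(o -> !o.success()).count();
        return (int) (failures * 100 / outcomes.size());
    }

    boolean isOpen() {
        return openedAtMillis >= 0;
    }

    // Closed: allow. Open: short-circuit until the sleep window passes, then allow one trial call.
    boolean allowRequest(long nowMillis) {
        if (!isOpen()) return true;
        if (!trialInFlight && nowMillis - openedAtMillis >= sleepWindowMillis) {
            trialInFlight = true;
            return true;
        }
        return false;
    }

    // Report the outcome of a call that allowRequest permitted.
    void record(boolean success, long nowMillis) {
        if (isOpen()) {
            // This was the trial call: close on success, restart the sleep window on failure.
            trialInFlight = false;
            if (success) {
                openedAtMillis = -1;
                outcomes.clear();
            } else {
                openedAtMillis = nowMillis;
            }
            return;
        }
        outcomes.addLast(new Outcome(nowMillis, success));
        int errors = errorPercentage(nowMillis); // also expires old outcomes
        if (outcomes.size() >= volumeThreshold && errors >= errorThresholdPercent) {
            openedAtMillis = nowMillis;
        }
    }
}
```

For example, `new SimpleCircuitBreaker(10_000, 20, 50, 5_000)` opens once 20 calls in the window include 10 failures, rejects calls for the next 5 seconds, and closes again if the trial call succeeds.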

Crushed it!

Elasticsearch

Kibana

Product delivered and released on time

MICROSERVICES = PROFIT!

ELASTICSEARCH OPERATIONS

Cluster on fire!

• Stability issues from end user queries

• Data ingestion latency problems

• Insufficient monitoring

Compliance requiring AMI refresh every 60 days

Finding the causes

• Inconsistent OS, JVM, and Elasticsearch configurations across cluster

• No circuit breakers

• Elasticsearch index templates were missing

• Shards improperly sized

• Incorrect field mappings

• Improper cluster sizing

DEV + OPS

CONTINUOUS DELIVERY = REQUIREMENT

Configuration management + Automation

Hello

Hardware Playbook

• Spin up AWS infrastructure

• Tag for purpose

• Configure subnet, security group, VPC, etc.

Software Playbook

• Install common dependencies

• AWS tags determine software

• Deploy latest artifacts per environment

Ansible deployment breakdown

Hardware playbook example

roles:
  - role: servers
    instances:
      - name: Elasticsearch_Master
        instance_type: m4.2xlarge
        number_of_instances: 3
      - name: Elasticsearch_Data
        instance_type: m4.4xlarge
        number_of_instances: 100
        additional_volume_sizes: [1000, 1000, 1000]

- hosts: tag_{{ ansible_ec2_tag }}_Elasticsearch_Data
  become: true
  roles:
    - role: elasticsearch
      es_heap_size: '{{ [(ansible_memtotal_mb / 1024) / 2, 16] | min | int }}g'
      es_plugins:
        - '{{ es_plugin_license }}'
        - '{{ es_plugin_marvel_agent }}'
        - '{{ es_plugin_cloud_aws }}'
      es_config:
        cluster.name: '{{ elasticsearch_cluster_name }}'
        node.name: '{{ ansible_default_ipv4.address }}'
        node.master: false
        node.data: true
        indices.fielddata.cache.size: 10%
        indices.breaker.fielddata.limit: 15%
        indices.breaker.request.limit: 15%
        indices.breaker.total.limit: 30%
        network.breaker.inflight_requests.limit: 75%

Software playbook example
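The `es_heap_size` expression gives each data node half of its RAM as JVM heap, capped at 16 GB and truncated to whole gigabytes. A hypothetical Java helper that mirrors the Jinja2 arithmetic makes it easy to check the result for a given `ansible_memtotal_mb`; the class and method names are illustrative only.

```java
// Hypothetical helper mirroring the playbook's Jinja2 expression
// '{{ [(ansible_memtotal_mb / 1024) / 2, 16] | min | int }}g'.
class HeapSize {
    static final double CAP_GB = 16;

    static String heapSizeFor(long memTotalMb) {
        double halfOfRamGb = (memTotalMb / 1024.0) / 2.0; // Jinja2 "/" is true division
        return (int) Math.min(halfOfRamGb, CAP_GB) + "g"; // "| int" truncates toward zero
    }
}
```

A host reporting 65536 MB gets `16g` (the cap applies), one reporting 32000 MB gets `15g`, and one reporting 16384 MB gets `8g`. Leaving the other half of RAM to the operating system follows Elastic's general heap-sizing advice, since Lucene relies on the filesystem cache.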

./hardware-playbook.yml --extra-vars @dev-vars.yml

./software-playbook.yml --extra-vars @dev-vars.yml

How to use

Monitor everything! Don’t run a black box

• Cloud metrics

• Server metrics

• JVM metrics (we even built our own JVM agent)

• Application metrics

• …

What we should monitor
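JVM metrics such as heap usage and GC activity are available in-process through the standard `java.lang.management` API. This is not the custom JVM agent mentioned above. It is a hypothetical `JvmSnapshot` helper with made-up metric names, showing the kind of values such an agent could collect and ship to a metrics store.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical snapshot of a few JVM metrics using only the standard management API.
class JvmSnapshot {
    static Map<String, Long> take() {
        Map<String, Long> metrics = new LinkedHashMap<>();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("jvm.heap.used.bytes", heap.getUsed());
        metrics.put("jvm.heap.committed.bytes", heap.getCommitted());
        long gcCount = 0;
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not define the value; skip those.
            if (gc.getCollectionCount() > 0) gcCount += gc.getCollectionCount();
            if (gc.getCollectionTime() > 0) gcTimeMs += gc.getCollectionTime();
        }
        metrics.put("jvm.gc.count", gcCount);
        metrics.put("jvm.gc.time.ms", gcTimeMs);
        metrics.put("jvm.threads.live", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        return metrics;
    }
}
```

Taking a snapshot on a fixed schedule and indexing it with a timestamp yields the kind of time series that dashboards like Grafana can plot.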

Time-series dashboards with Grafana

ANOTHER SERVICE?

Metrics cluster integration

[Diagram: Edge Gateway with AD SSO; routes /kibana, /esclient, and /metrics; Kibana Gateways and Elasticsearch Gateways in front of Kibana and Elasticsearch client nodes; Elasticsearch CyberLake nodes; a Kibana-Metrics Gateway; a separate Elasticsearch Metrics Cluster; Eureka Discovery Service; flows labeled "ES query data" and "Service Availability Data"]

PLATFORM STABILITY

TAKEAWAYS

• Microservices architecture works for us

• Increase velocity and reduce maintenance effort

• The Elastic Stack integrates easily

• Continuous Delivery must be a requirement

• Monitor everything!

Takeaways

MICROSERVICES + CONTINUOUS DELIVERY = PROFIT!

More Questions?

Visit us at the AMA
