Capital One
3/8/2017
Microservices, Continuous Delivery, and Elasticsearch at Capital One
Noriaki (Nori) Tatsumi, Bingchen (Ben) Hu, Anne Cather
Security breaches dominate the news
CYBER TECH DATA LAKE
Build vs. buy
• Industry tools only meet ~80% of our requirements
• Vendors’ priorities don’t align with ours
• Elasticsearch is an open source solution
• Open source technology is extensible
30+ data sources, 3B events, and 6 TB of data per day
How we got here
Scale
• More data
• More processing
• Longer data retention
• More consumers
New features
• Alerts console
• Cyber threat intelligence repository
• And more!
NFRs (non-functional requirements)
Our initial requirements
• Uptime and DR
• Security
• Compliance
• Data management
The prototype we had
[Diagram: Elasticsearch data nodes and master nodes, an Elasticsearch client node running a Kibana fork with SSO integration, and AD SSO]
MORE REQUIREMENTS, DELIVERY DATES, BIGGER TEAMS = HIGHER COMPLEXITY
Monolith
• Work in parallel
• Do one scope of things well
• Easy to understand and maintain
• Technology stack choice for features and teams
• Quicker, smaller, & independent deploys
• Fault isolation
What we wanted
MICROSERVICES
Embracing microservices
[Diagram: the prototype (Elasticsearch data and master nodes, an Elasticsearch client node with the Kibana fork with SSO integration, AD SSO) plus the new Alerts-API, Alerts-UI, and CTI Repo services, annotated "No SSO Integration!"]
• A well known entry point to the system
• Security
• Dynamic routing
• Resiliency
• Latency and fault tolerance
• Monitoring and stats collection
Edge gateway: apply the same qualities uniformly to downstream services
• Spring Boot for developer productivity
• JVM-based for production supportability
• Netflix OSS that’s proven microservices technology
Spring Cloud: foundation for our web microservices
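import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;

// @SpringBootApplication already implies @EnableAutoConfiguration, so it is not repeated here
@SpringBootApplication
@EnableZuulProxy // turns this Spring Boot app into a Zuul reverse proxy (the edge gateway)
public class EdgeGateway {
    public static void main(String[] args) {
        SpringApplication.run(EdgeGateway.class, args);
    }
}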
Getting started with Netflix Zuul is easy
Edge gateway
zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.url=https://172.20.10.15:5601
Routing with Zuul
Edge gateway
[Diagram: the edge gateway routing to two Elasticsearch client node + Kibana instances]
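Conceptually, each `zuul.routes.<name>.path` pattern ending in `/**` claims every request under its prefix. The JDK-only sketch below models that matching for the Kibana route above. It strips the matched prefix before forwarding, which we believe mirrors Zuul's default `stripPrefix` behavior, but it is a simplified model for illustration, not Zuul's code; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Simplified model of prefix routing; NOT Zuul's implementation. */
final class RouteTable {

    private static final class Route {
        final String prefix;    // pattern with the trailing "/**" removed, e.g. "/kibana"
        final String targetUrl; // where matching requests are forwarded
        Route(String prefix, String targetUrl) {
            this.prefix = prefix;
            this.targetUrl = targetUrl;
        }
    }

    private final List<Route> routes = new ArrayList<>();

    /** This sketch only supports patterns of the form "/prefix/**". */
    void add(String pattern, String targetUrl) {
        if (!pattern.endsWith("/**")) {
            throw new IllegalArgumentException("sketch only supports '/prefix/**' patterns");
        }
        routes.add(new Route(pattern.substring(0, pattern.length() - 3), targetUrl));
    }

    /** First matching route wins; the matched prefix is stripped before forwarding. */
    Optional<String> resolve(String path) {
        for (Route r : routes) {
            // "/kibana" and "/kibana/..." match, "/kibanax" does not
            if (path.equals(r.prefix) || path.startsWith(r.prefix + "/")) {
                return Optional.of(r.targetUrl + path.substring(r.prefix.length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        RouteTable table = new RouteTable();
        table.add("/kibana/**", "https://172.20.10.15:5601");
        System.out.println(table.resolve("/kibana/app/kibana")); // Optional[https://172.20.10.15:5601/app/kibana]
        System.out.println(table.resolve("/other"));              // Optional.empty
    }
}
```

Keeping routes in an ordered list makes "first match wins" explicit, which matters once more specific and more general patterns coexist.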
Zuul: the edge gateway
[Diagram: AD SSO and the edge gateway in front of the Elasticsearch client node with Kibana, Alerts UI, Reports UI, Alerts API, CyberTech Reports Repo, and Auth, backed by Elasticsearch master and data nodes]
Asking engineers to maintain IP addresses
• Use cases
• Service connection information lookup
• Automated configuration of load balancing and failover
• Alternatives to Eureka with Spring Cloud
• HashiCorp Consul
• Apache Zookeeper
Discover service: automate orchestration with Netflix Eureka
<application>
  <name>...</name>
  <instance>
    <instanceId>...</instanceId>
    <hostName>...</hostName>
    <app>...</app>
    <ipAddr>...</ipAddr>
    <status>UP</status>
    <overriddenstatus>UNKNOWN</overriddenstatus>
    <port enabled="false">...</port>
    <securePort enabled="true">...</securePort>
    <countryId>1</countryId>
    <dataCenterInfo class="com.netflix.appinfo.AmazonInfo">
      <name>Amazon</name>
      <metadata>
        <accountId>...</accountId>
        <local-hostname>...</local-hostname>
        <instance-id>...</instance-id>
        <local-ipv4>...</local-ipv4>
        <instance-type>...</instance-type>
        <vpc-id>...</vpc-id>
        <ami-id>...</ami-id>
        <mac>...</mac>
        <availability-zone>...</availability-zone>
      </metadata>
    </dataCenterInfo>
    <leaseInfo>
      <renewalIntervalInSecs>...</renewalIntervalInSecs>
      <durationInSecs>...</durationInSecs>
      …..
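The `leaseInfo` fields in this registry record hint at how a Eureka registry stays fresh: instances renew their lease on an interval (`renewalIntervalInSecs`), and an instance that stops renewing for longer than `durationInSecs` can be evicted. The toy registry below models only that lease logic, with the clock passed in so it is deterministic. The class and method names are hypothetical, and it leaves out everything else Eureka does (server replication, self-preservation mode, the REST API).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy lease-based service registry; callers pass the clock in so behavior is deterministic. */
final class LeaseRegistry {

    private static final class Instance {
        final String app;
        final String hostPort;
        long lastRenewalMillis;
        Instance(String app, String hostPort, long nowMillis) {
            this.app = app;
            this.hostPort = hostPort;
            this.lastRenewalMillis = nowMillis;
        }
    }

    private final long leaseDurationMillis;                          // plays the role of durationInSecs
    private final Map<String, Instance> instances = new TreeMap<>(); // keyed by instance id

    LeaseRegistry(long leaseDurationSecs) {
        this.leaseDurationMillis = leaseDurationSecs * 1000L;
    }

    void register(String app, String instanceId, String hostPort, long nowMillis) {
        instances.put(instanceId, new Instance(app, hostPort, nowMillis));
    }

    /** Heartbeat; returns false if the instance is unknown (e.g. already evicted). */
    boolean renew(String instanceId, long nowMillis) {
        Instance i = instances.get(instanceId);
        if (i == null) {
            return false;
        }
        i.lastRenewalMillis = nowMillis;
        return true;
    }

    /** Drops every instance whose last heartbeat is older than the lease duration. */
    void evictExpired(long nowMillis) {
        instances.values().removeIf(i -> nowMillis - i.lastRenewalMillis > leaseDurationMillis);
    }

    /** host:port of every live instance of an app, in instance-id order. */
    List<String> lookup(String app) {
        List<String> result = new ArrayList<>();
        for (Instance i : instances.values()) {
            if (i.app.equals(app)) {
                result.add(i.hostPort);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LeaseRegistry registry = new LeaseRegistry(90); // example lease duration of 90 s
        registry.register("kibana", "kibana-1", "172.20.10.11:5601", 0);
        registry.register("kibana", "kibana-2", "172.20.10.12:5601", 0);
        registry.renew("kibana-1", 60_000); // only kibana-1 keeps sending heartbeats
        registry.evictExpired(100_000);      // kibana-2 has been silent for 100 s > 90 s
        System.out.println(registry.lookup("kibana")); // [172.20.10.11:5601]
    }
}
```

This is the property that removes hand-maintained IP lists: a consumer asks for `kibana` by name and only gets instances that are still heartbeating.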
zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.serviceId=kibana
kibana.ribbon.listOfServers=172.20.10.11:5601,172.20.10.12:5601,172.20.10.13:5601,172.20.10.14:5601
ribbon.eureka.enabled=false
Routing with Zuul without Eureka
Discover service
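Without Eureka, Ribbon balances on the client side across the static `listOfServers` value above. Round-robin is one simple policy for picking from such a list; the JDK-only sketch below models it. It is not Ribbon's implementation (Ribbon's load-balancing rules are pluggable), and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/** Client-side round-robin over a fixed server list; a sketch, not Ribbon's code. */
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    /** Parses a comma-separated host:port list such as the listOfServers value. */
    RoundRobinBalancer(String listOfServers) {
        List<String> parsed = Arrays.stream(listOfServers.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("no servers configured");
        }
        this.servers = Collections.unmodifiableList(parsed);
    }

    /** Thread-safe; floorMod keeps the index valid even after the counter overflows. */
    String choose() {
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                "172.20.10.11:5601,172.20.10.12:5601,172.20.10.13:5601,172.20.10.14:5601");
        for (int n = 0; n < 5; n++) {
            System.out.println(lb.choose()); // .11, .12, .13, .14, then .11 again
        }
    }
}
```

The weakness the slides point at is visible here: the list is fixed at startup, so adding or replacing a Kibana node means editing configuration, which is what Eureka removes.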
zuul.routes.kibana.path=/kibana/**
zuul.routes.kibana.serviceId=kibana
Routing with Zuul with Eureka
Discover service
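import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@SpringBootApplication
@EnableDiscoveryClient // registers this app with the discovery server (Eureka here)
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}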
Making Spring Boot app discoverable with Eureka
Discover service
• Eureka Client (Java)
• Eureka-js-client (JavaScript)
• Eureka REST API (Polyglot)
• *Sidecar/App gateway (Polyglot)
Discover service: making any app discoverable with Eureka
Solving the configuration nightmare
[Diagram: AD SSO and the edge gateway in front of three Kibana gateways, each fronting Kibana and an Elasticsearch client node, backed by Elasticsearch master and data nodes; also shown: the Eureka discovery service, Alerts-UI, Alerts-API, CyberTech Reports UI, and CyberTech Reports API; route: /kibana]
Multi-config Kibanas
[Diagram: AD SSO, the edge gateway, and an authorization service in front of Kibana gateways, each fronting Kibana and an Elasticsearch client node, backed by Elasticsearch master and data nodes; some Kibana instances run with the console off and one with the console on; routes: /kibana and /kibana-admin]
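With one Kibana configuration per audience, the authorization decision becomes a simple mapping from route to required group. The sketch below assumes the console-on Kibana sits behind `/kibana-admin`, that any authenticated user may use `/kibana`, and that admins are identified by an AD group; the group name `cyber-es-admins` and the class are hypothetical, not taken from the talk.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical route-level authorization for the two Kibana configurations. */
final class KibanaAccess {
    // Hypothetical AD group; the talk does not say how admins are identified.
    static final String ADMIN_GROUP = "cyber-es-admins";

    enum Decision { ALLOW, DENY }

    static Decision authorize(String path, Set<String> userGroups) {
        // Check the admin route first; "/kibana-admin" does not start with "/kibana/" anyway.
        if (path.equals("/kibana-admin") || path.startsWith("/kibana-admin/")) {
            return userGroups.contains(ADMIN_GROUP) ? Decision.ALLOW : Decision.DENY;
        }
        if (path.equals("/kibana") || path.startsWith("/kibana/")) {
            return Decision.ALLOW; // console-off Kibana for any authenticated user
        }
        return Decision.DENY; // this sketch only knows about the two Kibana routes
    }

    public static void main(String[] args) {
        Set<String> analyst = Collections.emptySet();
        Set<String> admin = Collections.singleton(ADMIN_GROUP);
        System.out.println(authorize("/kibana/app/kibana", analyst));       // ALLOW
        System.out.println(authorize("/kibana-admin/app/kibana", analyst)); // DENY
        System.out.println(authorize("/kibana-admin/app/kibana", admin));   // ALLOW
    }
}
```

Because the dangerous capability (the console) is removed from the regular Kibana's configuration rather than hidden in the UI, the authorization service only has to decide which instance a user may reach.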
Protected Elasticsearch gate
[Diagram: AD SSO, the edge gateway, and an authorization service in front of Kibana gateways and a Kibana-Admin gateway; each Kibana (console OFF or console ON) reaches its Elasticsearch client node through an Elasticsearch gateway, backed by Elasticsearch master and data nodes; routes: /kibana, /kibana-admin, /esclient]
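The protected gate places an Elasticsearch gateway in front of every client node, including the `/esclient` route. The talk does not say what that gateway enforces, so the policy below is hypothetical: read requests pass, while writes and cluster-level APIs (`/_cluster`, `/_nodes`, `/_snapshot`) are refused. It only illustrates the kind of check such a gateway can make; the class name is an assumption.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical allow-list policy for an Elasticsearch gateway; not the policy from the talk. */
final class EsGate {
    // Cluster-level APIs never reachable through the gate in this sketch.
    private static final List<String> BLOCKED_PREFIXES =
            Arrays.asList("/_cluster", "/_nodes", "/_snapshot");

    // Endpoints whose POST bodies carry queries rather than writes.
    private static final List<String> READ_POST_SUFFIXES =
            Arrays.asList("/_search", "/_msearch", "/_count");

    /** Returns true if the request may be forwarded to the Elasticsearch client node. */
    static boolean allowed(String method, String rawPath) {
        String path = rawPath.split("\\?", 2)[0]; // ignore the query string
        for (String prefix : BLOCKED_PREFIXES) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        if (method.equals("GET") || method.equals("HEAD")) {
            return true;
        }
        if (method.equals("POST")) {
            for (String suffix : READ_POST_SUFFIXES) {
                if (path.endsWith(suffix)) {
                    return true;
                }
            }
        }
        return false; // PUT, DELETE, and all other POSTs are refused
    }

    public static void main(String[] args) {
        System.out.println(allowed("POST", "/alerts/_search?size=10")); // true
        System.out.println(allowed("DELETE", "/alerts"));                // false
        System.out.println(allowed("GET", "/_cluster/settings"));        // false
    }
}
```

A real policy would also have to let Kibana save its own objects (saved searches and dashboards are stored in an Elasticsearch index), which this sketch ignores.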
Spring Boot Admin for Spring Cloud microservices
https://github.com/codecentric/spring-boot-admin
Distributed tracing with Spring Cloud Sleuth
https://cloud.spring.io/spring-cloud-sleuth/
• Successes
• Short circuited
• Thread timeouts
• Thread-pool rejections
• Failures/exceptions
• Error percentage
(Rolling 10-second counters)
Circuit breaker monitoring
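These dashboard numbers are rolling 10-second counters. One common way to keep such a window is a ring of time buckets that are recycled as time moves on. The sketch below is a simplified, single-threaded model of that idea (10 buckets of 1 second in the example), not Hystrix's implementation; it assumes non-negative timestamps that never go backwards, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Bucketed rolling window of successes/failures; a simplified model, not Hystrix code. */
final class RollingCounter {
    private final long bucketMillis;
    private final int buckets;
    private final long[] bucketStart; // start time of the interval each slot currently holds
    private final long[] successes;
    private final long[] failures;

    RollingCounter(long windowMillis, int buckets) {
        if (buckets <= 0 || windowMillis % buckets != 0) {
            throw new IllegalArgumentException("window must split evenly into buckets");
        }
        this.bucketMillis = windowMillis / buckets;
        this.buckets = buckets;
        this.bucketStart = new long[buckets];
        Arrays.fill(bucketStart, Long.MIN_VALUE); // mark every slot as never used
        this.successes = new long[buckets];
        this.failures = new long[buckets];
    }

    /** Finds the slot for 'now', clearing it first if it still holds an older interval. */
    private int slotFor(long nowMillis) {
        long interval = nowMillis / bucketMillis;
        long start = interval * bucketMillis;
        int idx = (int) (interval % buckets);
        if (bucketStart[idx] != start) {
            bucketStart[idx] = start;
            successes[idx] = 0;
            failures[idx] = 0;
        }
        return idx;
    }

    void recordSuccess(long nowMillis) { successes[slotFor(nowMillis)]++; }

    void recordFailure(long nowMillis) { failures[slotFor(nowMillis)]++; }

    /** Failures as a percentage of all calls in the buckets still inside the window. */
    int errorPercentage(long nowMillis) {
        long currentStart = (nowMillis / bucketMillis) * bucketMillis;
        long oldestStart = currentStart - (buckets - 1) * bucketMillis;
        long s = 0;
        long f = 0;
        for (int i = 0; i < buckets; i++) {
            if (bucketStart[i] >= oldestStart && bucketStart[i] <= currentStart) {
                s += successes[i];
                f += failures[i];
            }
        }
        long total = s + f;
        return total == 0 ? 0 : (int) (f * 100 / total);
    }

    public static void main(String[] args) {
        RollingCounter c = new RollingCounter(10_000, 10);
        for (int i = 0; i < 8; i++) c.recordSuccess(500);
        for (int i = 0; i < 2; i++) c.recordFailure(1_500);
        System.out.println(c.errorPercentage(2_000));  // 20
        System.out.println(c.errorPercentage(10_600)); // 100: the successes' bucket has rolled out
        System.out.println(c.errorPercentage(11_000)); // 0: both buckets have rolled out
    }
}
```

Recycling a fixed number of buckets keeps memory constant no matter the request rate, at the cost of the window advancing in one-bucket steps rather than smoothly.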
Crushed it!
Elasticsearch
Kibana
Product delivered and released on time
MICROSERVICES = PROFIT!
ELASTICSEARCH OPERATIONS
Cluster on fire!
• Stability issues from end user queries
• Data ingestion latency problems
• Insufficient monitoring
Compliance requires an AMI refresh every 60 days
Finding the causes
• Inconsistent OS, JVM, and Elasticsearch configurations across cluster
• No circuit breakers
• Elasticsearch index templates were missing
• Shards improperly sized
• Incorrect field mappings
• Improper cluster sizing
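Shard sizing is arithmetic worth doing before writing index templates, because an index's primary shard count is set when the index is created. The sketch below divides a daily volume by a target shard size. The 50 GB target is a hypothetical value, and treating the talk's 6 TB/day as the on-disk size of a single daily index is an assumption (with 30+ data sources it is more likely split across many indices); the class name is also hypothetical.

```java
/** Back-of-the-envelope shard counts for one daily index. */
final class ShardSizing {

    /** Ceiling division: the fewest primaries that keep every shard at or under the target. */
    static int primaryShards(long dailyGb, long targetShardGb) {
        if (dailyGb <= 0 || targetShardGb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) ((dailyGb + targetShardGb - 1) / targetShardGb);
    }

    /** Shard copies the cluster must host for the index, replicas included. */
    static int totalShards(int primaries, int replicas) {
        return primaries * (1 + replicas);
    }

    public static void main(String[] args) {
        long dailyGb = 6_000; // 6 TB/day from the talk, assumed to be the on-disk index size
        long targetGb = 50;   // hypothetical target shard size
        int primaries = primaryShards(dailyGb, targetGb);
        System.out.println(primaries + " primaries, "
                + totalShards(primaries, 1) + " shards with 1 replica"); // 120 primaries, 240 shards with 1 replica
    }
}
```

Multiplying by the retention period in days gives the number of shards the data nodes must carry at steady state, which ties shard sizing back to cluster sizing.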
DEV + OPS
CONTINUOUS DELIVERY = REQUIREMENT
Configuration management + automation
Hardware Playbook
• Spin up AWS infrastructure
• Tag for purpose
• Configure subnet, security group, VPC, etc.
Software Playbook
• Install common dependencies
• AWS tags determine software
• Deploy latest artifacts per environment
Ansible deployment breakdown
Hardware playbook example
roles:
  - role: servers
    instances:
      - name: Elasticsearch_Master
        instance_type: m4.2xlarge
        number_of_instances: 3
      - name: Elasticsearch_Data
        instance_type: m4.4xlarge
        number_of_instances: 100
        additional_volume_sizes: [1000, 1000, 1000]
Software playbook example
- hosts: tag_{{ ansible_ec2_tag }}_Elasticsearch_Data
  become: true
  roles:
    - role: elasticsearch
      es_heap_size: '{{ [(ansible_memtotal_mb / 1024) / 2, 16] | min | int }}g'
      es_plugins:
        - '{{ es_plugin_license }}'
        - '{{ es_plugin_marvel_agent }}'
        - '{{ es_plugin_cloud_aws }}'
      es_config:
        cluster.name: '{{ elasticsearch_cluster_name }}'
        node.name: '{{ ansible_default_ipv4.address }}'
        node.master: false
        node.data: true
        indices.fielddata.cache.size: 10%
        indices.breaker.fielddata.limit: 15%
        indices.breaker.request.limit: 15%
        indices.breaker.total.limit: 30%
        network.breaker.inflight_requests.limit: 75%
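The `es_heap_size` expression in the software playbook takes half of the host's RAM in GiB, caps it at 16, and truncates to an integer; the cache and breaker settings are percentages of that heap. The sketch below reproduces the heap rule and converts each percentage into an approximate MiB figure. The `ansible_memtotal_mb` input is a hypothetical value, and the integer MiB rounding is this sketch's, not necessarily how Elasticsearch rounds internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Mirrors the playbook's heap formula and shows breaker percentages in absolute terms. */
final class HeapSizing {

    /** Same rule as the Jinja expression: min((memtotal_mb / 1024) / 2, 16), truncated to int. */
    static int heapGb(double memTotalMb) {
        double halfRamGb = (memTotalMb / 1024.0) / 2.0;
        return (int) Math.min(halfRamGb, 16.0);
    }

    /** Approximate limit in MiB for a percentage-of-heap setting (integer arithmetic). */
    static long limitMb(int heapGb, int percent) {
        long heapMb = heapGb * 1024L;
        return heapMb * percent / 100;
    }

    public static void main(String[] args) {
        int heap = heapGb(64_000); // hypothetical ansible_memtotal_mb for a 64 GiB data node
        System.out.println("es_heap_size: " + heap + "g"); // es_heap_size: 16g

        Map<String, Integer> percents = new LinkedHashMap<>();
        percents.put("indices.fielddata.cache.size", 10);
        percents.put("indices.breaker.fielddata.limit", 15);
        percents.put("indices.breaker.request.limit", 15);
        percents.put("indices.breaker.total.limit", 30);
        percents.put("network.breaker.inflight_requests.limit", 75);
        for (Map.Entry<String, Integer> e : percents.entrySet()) {
            System.out.printf("%s: %d%% of heap = ~%d MiB%n",
                    e.getKey(), e.getValue(), limitMb(heap, e.getValue()));
        }
    }
}
```

With a 16g heap the fielddata cache tops out around 1638 MiB while its breaker trips at about 2457 MiB, so the cache should evict entries before the breaker rejects requests; the 30% total (parent) limit, about 4915 MiB, bounds the fielddata and request breakers combined. A host reporting 30,000 MB would instead get `14g`, because the result is truncated rather than rounded.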
./hardware-playbook.yml --extra-vars @dev-vars.yml
./software-playbook.yml --extra-vars @dev-vars.yml
How to use
Monitor everything! Don’t run a black box
• Cloud metrics
• Server metrics
• JVM metrics (we even built our own JVM agent)
• Application metrics
• …
What we should monitor
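The deck mentions a home-grown JVM agent but not its design. As a rough illustration of what JVM metrics are available without any extra tooling, the sketch below reads the standard `java.lang.management` MXBeans. The metric names are made up for this example, and shipping the snapshot to a metrics store or Grafana is left out.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** One-shot snapshot of basic JVM metrics from the platform MXBeans. */
final class JvmMetrics {

    static Map<String, Long> snapshot() {
        Map<String, Long> m = new LinkedHashMap<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        m.put("jvm.heap.used.bytes", heap.getUsed());
        m.put("jvm.heap.committed.bytes", heap.getCommitted());
        m.put("jvm.heap.max.bytes", heap.getMax()); // -1 if the max is undefined

        long gcCount = 0;
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMs += Math.max(0, gc.getCollectionTime());
        }
        m.put("jvm.gc.count", gcCount);
        m.put("jvm.gc.time.ms", gcTimeMs);

        m.put("jvm.threads.live", (long) ManagementFactory.getThreadMXBean().getThreadCount());
        m.put("jvm.uptime.ms", ManagementFactory.getRuntimeMXBean().getUptime());
        return m;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Sampling a snapshot like this on a fixed interval and indexing it with a timestamp is enough to draw heap and GC time-series dashboards.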
Time-series dashboards with Grafana
ANOTHER SERVICE?
Metrics cluster integration
[Diagram: AD SSO and the edge gateway in front of Kibana gateways and Elasticsearch gateways (each with Kibana and an Elasticsearch client node) for the CyberLake Elasticsearch nodes, plus a Kibana-Metrics gateway and an Elasticsearch client node for a separate Elasticsearch metrics cluster; the Eureka discovery service is also shown; routes: /metrics, /kibana, /esclient; labeled data flows: ES query data and service availability data]
PLATFORM STABILITY
TAKEAWAYS
• Microservices architecture works for us
• Increase velocity and reduce maintenance effort
• Elastic stack can integrate easily
• Continuous Delivery must be a requirement
• Monitor everything!
Takeaways
MICROSERVICES + CONTINUOUS DELIVERY = PROFIT!
More Questions?
Visit us at the AMA