monitoring kubernetes across data center and cloud

Monitoring Kubernetes Across Data Center and Cloud Specifically Tectonic and Google Container Engine using Datadog Presenters: Ilan Rabinovitch, Director of Technical Community, Datadog Aleks Saul, Customer-Facing Engineer, CoreOS Aparna Sinha, Senior Product Manager, Google

Upload: datadog

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Monitoring kubernetes across data center and cloud

Monitoring Kubernetes Across Data Center and Cloud

Specifically Tectonic and Google Container Engine using Datadog

Presenters:
Ilan Rabinovitch, Director of Technical Community, Datadog
Aleks Saul, Customer-Facing Engineer, CoreOS
Aparna Sinha, Senior Product Manager, Google

Page 2: Monitoring kubernetes across data center and cloud

Google Cloud Platform

Kubernetes at a glance

Open source production-grade container scheduling and management

● Top 0.01% of all GitHub projects: 950+ contributors & 35,000+ commits

Run Anywhere: multi-cloud, on-prem, bare-metal, OpenStack, etc.

Broad industry adoption

Commercial Enterprise Support


Page 3: Monitoring kubernetes across data center and cloud


Kubernetes provides container-centric infrastructure

Once specific containers are no longer bound to specific machines/VMs, host-centric infrastructure no longer works

• Scheduling: Decide where my containers should run

• Lifecycle and health: Keep my containers running despite failures

• Scaling: Make sets of containers bigger or smaller

• Naming and discovery: Find where my containers are now

• Load balancing: Distribute traffic across a set of containers

• Storage volumes: Provide data to containers

• Logging and monitoring: Track what’s happening with my containers

• Debugging and introspection: Enter or attach to containers

• Identity and authorization: Control who can do things to my containers

Page 4: Monitoring kubernetes across data center and cloud


Kubernetes offers choice and flexibility for Hybrid Cloud

Setting up and managing a cluster
• Choose a cloud: GCP, AWS, Azure, Rackspace, on-premises, ...
• Choose a node OS: CoreOS, Atomic, RHEL, Debian, CentOS, Ubuntu, ...
• Provision machines: create VMs, install Docker, ...
• Configure networking: IP ranges for Pods, Services, SDN, firewalls, ...
• Start cluster services: DNS, logging, monitoring, …
• Start and configure Kubernetes
• Manage nodes: kernel upgrades, OS updates, hardware failures, …

GKE is Google-hosted and managed Kubernetes
• Directly uses upstream open source
• Rolls out within 3-5 business days of the latest open source release
• Alpha features also now available through ‘alpha clusters’

Page 5: Monitoring kubernetes across data center and cloud


Google Container Engine (GKE)

“It delivers a high-performing, flexible infrastructure that lets us independently scale components for maximum efficiency” ~ Philips (Hue Lights)

“Made our engineers more productive and helped us do more work with less staff” ~ CCP Games (EVE Online)

Page 6: Monitoring kubernetes across data center and cloud


How Monitoring Works in Google Container Engine

[Diagram: Master with Heapster and a Storage Backend; two Nodes, each with a Kubelet and cAdvisor]
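
The components named on this slide can be read as a collection pipeline: per-node stats come from the Kubelet (backed by cAdvisor), an aggregator on the master gathers them, and the results go to a storage backend. The following is a minimal in-memory sketch of that flow, not Heapster's actual code. The class names, the `kubelet_stats` method, and the CPU figures are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a node whose Kubelet exposes cAdvisor stats."""
    name: str
    container_cpu: Dict[str, float]  # container name -> CPU cores in use

    def kubelet_stats(self) -> Dict[str, float]:
        # A real aggregator would make an HTTP call to the Kubelet here.
        return dict(self.container_cpu)


@dataclass
class StorageBackend:
    """Hypothetical stand-in for the storage backend; keeps rows in memory."""
    rows: List[Tuple[str, str, float]] = field(default_factory=list)

    def write(self, node: str, container: str, cpu: float) -> None:
        self.rows.append((node, container, cpu))


def collect(nodes: List[Node], backend: StorageBackend) -> float:
    """Poll every node once, store each sample, and return cluster-wide CPU."""
    total = 0.0
    for node in nodes:
        for container, cpu in node.kubelet_stats().items():
            backend.write(node.name, container, cpu)
            total += cpu
    return total
```

Calling `collect([Node("node-a", {"web": 0.5})], StorageBackend())` polls one node and stores one row; a real deployment would repeat this on a timer.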

Page 7: Monitoring kubernetes across data center and cloud


Google Container Engine Monitoring Server

Metrics are used for self-repair and exposed to end users via Stackdriver

Primary job is to ensure that each Kubernetes master is available
● Implements the repair logic for when a cluster is non-responsive
● Automatically resizes master machines as the number of nodes grows

Also collects metrics for each cluster
● Number of resources (nodes, pods, services, namespaces, etc.)
● CPU usage, limit, utilization ratio; memory usage and limit; page faults; disk usage and limit; uptime
● Uses the number of nodes to report billing status

Page 8: Monitoring kubernetes across data center and cloud


Pluggable interface for cloud monitoring

Run Influx and Grafana in the cluster
● Alternative to Google Cloud Monitoring

Plug in your own!
● e.g., Prometheus, Datadog, etc.

Kube State Metrics: node status, node capacity, replica state, etc.

Prometheus

Page 9: Monitoring kubernetes across data center and cloud


Kube State Metrics

● Generates metrics about the state of Kubernetes logical objects (node status, node capacity, replica state, etc.)

● Deployed alongside your other applications as a Kubernetes service.

● Exposes metrics via HTTP API or Prometheus format
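
To make the Prometheus format concrete, here is a minimal sketch of reading a text-format metrics payload into (name, labels, value) samples. The payload and its metric names are hypothetical examples, not actual kube-state-metrics output, and the parser is deliberately simplified: it skips lines with timestamps and does not handle escaped quotes in label values.

```python
import re
from typing import Dict, List, Tuple

# Illustrative payload only; these metric names are hypothetical.
SAMPLE = """\
# HELP example_node_capacity_cpu_cores Total CPU cores on the node.
# TYPE example_node_capacity_cpu_cores gauge
example_node_capacity_cpu_cores{node="node-a"} 4
example_node_capacity_cpu_cores{node="node-b"} 8
example_deployment_replicas_available{deployment="web",namespace="default"} 3
"""

# metric_name, optional {labels}, then a single value token.
LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse simple text-format lines, ignoring comments and blank lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments carry no sample
        match = LINE.match(line)
        if match is None:
            continue  # unsupported shape (e.g. trailing timestamp)
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Because every sample carries its labels, a consumer can aggregate however it likes, for example summing the hypothetical capacity metric across nodes to get a cluster total.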

Page 10: Monitoring kubernetes across data center and cloud


Extending Kubernetes for the Enterprise

We focus on delivering the capabilities required by enterprise organizations to run and manage Kubernetes at scale...
● Cluster installers (for AWS and bare metal, to start).
● Management software to upgrade, back up, roll back, and scale the cluster up and down.
● Console UI that surfaces management functionality, cluster information, and compute usage to the user and includes add-on services (Quay, identity and authentication).

Page 11: Monitoring kubernetes across data center and cloud


Tectonic Extends Upstream Kubernetes

● Container orchestration
● Horizontal scale
● High availability
● Service discovery & load balancer
● Installer
● Management console
● Painless updates
● Cluster scaling
● Disaster recovery
● Alerts and logging
● Security (integrated)
● Container registry (Quay)
● Integration across environments

[Diagram: Extending Kubernetes for the Enterprise. Stack layers: Security, Mgmt, Kubernetes, CoreOS Linux, Cloud Integration, Container Registry, Storage & Compute, apps/container/microservices]

Page 12: Monitoring kubernetes across data center and cloud


Tectonic Kubernetes Security

● Clair: container vulnerability scanning
● KMS integration
● LDAP integration
● RBAC integration

[Diagram: Extending Kubernetes for the Enterprise. Stack layers: Mgmt, Kubernetes, CoreOS Linux, Cloud Integration, Container Registry, Storage & Compute, apps/container/microservices, Security]

Page 13: Monitoring kubernetes across data center and cloud

Datadog Overview

• SaaS-based infrastructure and application monitoring
• Focus on modern environments
  • Cloud, Containers, Microservices
  • Dynamic configuration models
• Processing nearly a trillion data points per day
• Intelligent Alerting and Insightful Dashboards
  • Anomaly and Outlier Detection

Page 14: Monitoring kubernetes across data center and cloud

Collecting data is cheap; not having it when you need it can be expensive

Page 15: Monitoring kubernetes across data center and cloud

Monitor Everything

Operating Systems, Cloud Providers, Containers, Web Servers, Datastores, Caches, Queues and more...

Page 16: Monitoring kubernetes across data center and cloud

Datadog
● Deployed as a DaemonSet, one instance per node.
● Collects metrics and events from:
  ○ Container engine (e.g., Docker)
  ○ Kubernetes Heapster
  ○ kube-state-metrics
  ○ Deployed applications
  ○ Google Monitoring APIs
● Exposes a StatsD endpoint for custom metrics.
● Metrics are automatically tagged by pods, labels, etc.
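
As an illustration of the custom-metrics path, here is a minimal sketch of an application emitting a tagged gauge to a StatsD endpoint over UDP. It assumes the agent is reachable at `127.0.0.1:8125` (the conventional StatsD port; the right address inside a pod depends on how the DaemonSet is exposed) and uses the DogStatsD `|#tag:value` extension for tags. The metric name and tags are hypothetical.

```python
import socket
from typing import Dict, Optional


def format_datagram(name: str, value: float, metric_type: str = "g",
                    tags: Optional[Dict[str, str]] = None) -> str:
    """Build one StatsD line; tags use the DogStatsD `|#k:v,...` suffix."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort tags so the same input always yields the same datagram.
        tag_text = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        line += f"|#{tag_text}"
    return line


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; the client does not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

A call such as `send_metric(format_datagram("checkout.queue_depth", 12, "g", {"app": "checkout", "version": "2"}))` would emit one gauge sample. Because the transport is UDP, a missing or slow agent does not block the application.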

Page 17: Monitoring kubernetes across data center and cloud
Page 18: Monitoring kubernetes across data center and cloud

Operational Complexity Increases with...

• Number of things to measure
• Velocity of change

Page 19: Monitoring kubernetes across data center and cloud

How much do we measure?
1 instance
• 10 metrics from cloud providers
1 operating system (e.g., Linux)
• 100 metrics
~50 metrics per application

Page 20: Monitoring kubernetes across data center and cloud
Page 21: Monitoring kubernetes across data center and cloud

Operational Complexity

100 instances

500 containers

Page 22: Monitoring kubernetes across data center and cloud

Operational Complexity: Scale

160 metrics per host

800 metrics per host

Assuming 5 containers per host

Page 23: Monitoring kubernetes across data center and cloud

Operational Complexity: Scale

100 instances

80,000 metrics

Assuming 5 containers per host
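
The figures on these scale slides can be reproduced with a few lines of arithmetic. This sketch takes every number from the slides as given; the 800-per-host figure is used as quoted (it happens to equal 5 × 160, but the slides do not spell out its derivation). The variable names are illustrative.

```python
CLOUD_PROVIDER_METRICS = 10     # per instance
OS_METRICS = 100                # per operating system
APP_METRICS = 50                # approximate, per application

# One instance running one application, without containers.
per_host_plain = CLOUD_PROVIDER_METRICS + OS_METRICS + APP_METRICS

CONTAINERS_PER_HOST = 5         # the slides' stated assumption
PER_HOST_WITH_CONTAINERS = 800  # figure quoted on the slide
INSTANCES = 100

total_containers = INSTANCES * CONTAINERS_PER_HOST
total_metrics = INSTANCES * PER_HOST_WITH_CONTAINERS

print(per_host_plain, total_containers, total_metrics)  # prints: 160 500 80000
```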

Page 24: Monitoring kubernetes across data center and cloud
Page 25: Monitoring kubernetes across data center and cloud

How much do we measure?
1 instance
• 10 metrics from cloud providers
1 operating system (e.g., Linux)
• 100 metrics
~50 metrics per application
N containers
• 150*N metrics

Metrics Overload!

Page 26: Monitoring kubernetes across data center and cloud

Operational Complexity Increases with...

• Number of things to measure
• Velocity of change

Page 27: Monitoring kubernetes across data center and cloud

Source: Datadog

Page 28: Monitoring kubernetes across data center and cloud

Operational Complexity Increases with...

• Number of things to measure
• Velocity of change

Page 29: Monitoring kubernetes across data center and cloud

Monitoring Questions

• Where is a given container running?
• What is the overall capacity of my cluster?
• What port(s) are my applications running on?
• What’s the total throughput of my application?
• What’s its response time per tag? (app, version, data center)
• What’s the distribution of 5xx errors per container? What about by data center?
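
Several of these questions reduce to "group tagged data by some tag, then aggregate". The sketch below shows that idea on a handful of hypothetical request records. The field names and values are invented for illustration and do not reflect any particular monitoring API.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List

# Hypothetical request records; every field name here is illustrative.
REQUESTS: List[Dict[str, Any]] = [
    {"container": "web-1", "datacenter": "us-east", "version": "1", "status": 200, "ms": 40},
    {"container": "web-1", "datacenter": "us-east", "version": "1", "status": 503, "ms": 120},
    {"container": "web-2", "datacenter": "us-east", "version": "2", "status": 500, "ms": 80},
    {"container": "web-3", "datacenter": "eu-west", "version": "2", "status": 200, "ms": 60},
    {"container": "web-3", "datacenter": "eu-west", "version": "2", "status": 502, "ms": 100},
]


def count_5xx(requests: List[Dict[str, Any]], tag: str) -> Counter:
    """Count 5xx responses, grouped by the given tag."""
    counts: Counter = Counter()
    for request in requests:
        if 500 <= request["status"] <= 599:
            counts[request[tag]] += 1
    return counts


def mean_response_ms(requests: List[Dict[str, Any]], tag: str) -> Dict[str, float]:
    """Average response time in milliseconds, grouped by the given tag."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for request in requests:
        buckets[request[tag]].append(request["ms"])
    return {key: sum(values) / len(values) for key, values in buckets.items()}
```

The same two functions answer the per-container and per-data-center variants just by changing the `tag` argument, for example `count_5xx(REQUESTS, "container")` versus `count_5xx(REQUESTS, "datacenter")`.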

Page 30: Monitoring kubernetes across data center and cloud

Host Centric

Page 31: Monitoring kubernetes across data center and cloud

Service Centric

Page 32: Monitoring kubernetes across data center and cloud
Page 33: Monitoring kubernetes across data center and cloud
Page 34: Monitoring kubernetes across data center and cloud

Query-Based Monitoring

“What’s the average throughput of application:nginx per version?”

“Alert me when one of my pods from replication controller:foo is not behaving like the others.”

“Show me the rate of HTTP 500 responses from nginx”
“… grouped by data center … running my app version 2…”
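
The "not behaving like the others" query is an outlier check across the members of a group. One simple way to sketch it is with the median absolute deviation (MAD). This is a stand-in technique chosen for illustration, not a description of how Datadog's outlier detection works, and the pod names, values, and threshold are hypothetical.

```python
from statistics import median
from typing import Dict, List


def find_outliers(values_by_pod: Dict[str, float], threshold: float = 3.0) -> List[str]:
    """Return pods whose value is more than `threshold` MADs from the median."""
    values = list(values_by_pod.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Most pods report the same value; flag any pod that differs at all.
        return sorted(p for p, v in values_by_pod.items() if v != center)
    return sorted(p for p, v in values_by_pod.items()
                  if abs(v - center) / mad > threshold)
```

Given latencies of 101, 99, 100, and 240 ms for four pods of one replication controller, only the 240 ms pod is flagged. Comparing pods to their peers rather than to a fixed threshold means the check keeps working as the group's normal level shifts.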

Page 35: Monitoring kubernetes across data center and cloud

Service Discovery

[Diagram labels: Docker API; Kubernetes; Monitoring Agent Container; Containers List & Metadata; Additional Metadata (Tags, etc.); Config Backends; Integration Configurations; Host Level Metrics]
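
One way to read the service discovery slide: the agent learns which containers are running, looks up an integration configuration template for each, and fills in the container's address. The sketch below illustrates that matching step under stated assumptions. The container records, the template keys, and the lookup by image name are hypothetical simplifications, and the `%%host%%` / `%%port%%` placeholders are used here purely as illustrative template variables.

```python
from typing import Any, Dict, List

# Hypothetical container metadata, as an agent might assemble it from the
# Docker API and Kubernetes.
CONTAINERS: List[Dict[str, Any]] = [
    {"image": "nginx", "ip": "10.0.0.5", "port": 80, "labels": {"app": "web"}},
    {"image": "redis", "ip": "10.0.0.9", "port": 6379, "labels": {"app": "cache"}},
    {"image": "busybox", "ip": "10.0.0.12", "port": 0, "labels": {}},
]

# Hypothetical integration templates keyed by image name, as they might be
# stored in a config backend.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "nginx": {"check": "nginx", "url": "http://%%host%%:%%port%%/status"},
    "redis": {"check": "redis", "address": "%%host%%:%%port%%"},
}


def render(template: Dict[str, str], container: Dict[str, Any]) -> Dict[str, Any]:
    """Fill a template's placeholders with one container's address and tags."""
    config: Dict[str, Any] = {}
    for key, value in template.items():
        config[key] = (value.replace("%%host%%", container["ip"])
                            .replace("%%port%%", str(container["port"])))
    config["tags"] = [f"{k}:{v}" for k, v in sorted(container["labels"].items())]
    return config


def discover(containers: List[Dict[str, Any]],
             templates: Dict[str, Dict[str, str]]) -> List[Dict[str, Any]]:
    """Render a check config for every container that has a matching template."""
    return [render(templates[c["image"]], c)
            for c in containers if c["image"] in templates]
```

Containers without a template (the `busybox` entry here) are simply skipped, so new workloads can appear and disappear without anyone editing per-host configuration.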

Page 36: Monitoring kubernetes across data center and cloud
Page 37: Monitoring kubernetes across data center and cloud

Q&A

You can also follow us on Twitter:
@datadoghq
@googlecloud
@tectonicstack