Distributed Tracing with OpenTracing, Zipkin and Kubernetes
TRANSCRIPT
container-solutions.com | @containersoluti | [email protected]
Distributed Tracing with Zipkin & Kubernetes
Maximilian Schöfmann @schoefmann
Container Solutions AG @containersoluti
Microservices...
In short, the microservice architectural style is an approach to develop a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.
-- James Lewis and Martin Fowler
www.container-solutions.com | [email protected]
The “Socks Shop”...
Let’s install it in the cluster...
Distributed Tracing | container-solutions.com
Microservice benefits
● Modeling after business domains
● Independent deployment
● Technology diversity
Microservice costs
● Distribution
● Eventual Consistency
● Operational complexity
Microservice requirements
● Rapid provisioning
● Monitoring
● Rapid deployment
● Autonomous teams
Microservice architectures
● Monolithic to microservice architecture
● Apps as a collection of distributed services
● Tools becoming necessary to gather metrics
Why distributed tracing?
Example: Google search query
● Multiple index lookups
● Selecting Ads
● Check spelling
● Personalise results
● Filter DMCA takedowns
● Include relevant images...
● ...and videos
● ...and news
● ...
Why distributed tracing?
“Per-process logging and metric monitoring have their place, but neither can reconstruct the elaborate journeys that transactions take as they propagate across a distributed system. Distributed traces are these journeys.”
-- Chris Aniszczyk, Cloud Native Computing Foundation
Fundamental requirements to make it work
● Ubiquitous deployment
● Continuous monitoring
See also: “Dapper, a Large-Scale Distributed Systems Tracing Infrastructure” http://research.google.com/pubs/pub36356.html (2010)
Requirements to make it useful
● Low overhead
● Application-level transparency
● Scalability
● (Timely) data availability
A distributed trace...
“A tracing infrastructure for distributed services needs to record information about all the work done in a system, on behalf of a given initiator”
Data aggregation
Message record:
Record = Message identifier + timestamped event
Data aggregation classes:
● Black box
● Annotation-based
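The record definition above can be sketched in a few lines. This is a minimal illustration of the annotation-based class, where every record carries an explicit message identifier; the `Record` type, its field names, and the sample events are assumptions made for this example and are not taken from Dapper or Zipkin.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    message_id: str   # identifier shared by all events of one request
    timestamp: float  # seconds, as reported by the emitting process
    event: str        # what happened


def aggregate(records):
    """Group records by message identifier and order each group by time."""
    by_message = defaultdict(list)
    for record in records:
        by_message[record.message_id].append(record)
    return {
        message_id: sorted(group, key=lambda r: r.timestamp)
        for message_id, group in by_message.items()
    }


# Hypothetical records, arriving interleaved and out of order.
records = [
    Record("req-2", 10.2, "frontend: request received"),
    Record("req-1", 10.5, "cart: items loaded"),
    Record("req-1", 10.1, "frontend: request received"),
]
journeys = aggregate(records)
```

A black-box scheme has no `message_id` to group by and has to infer the association statistically, which is why it needs no instrumentation but tends to be less precise.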
Trace data model
● Trace as a tree of nested calls
● Trace trees and spans
Span
Logged event in a typical span
● Span name
● Span start time
● Span end time
● Trace id
● Span id
● Span parent id
● Any timing information recorded by the instrumentation library (RPC, HTTP)
● Additional custom labels (“foo”)
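The span fields listed above are enough to rebuild the trace tree. The following is a rough sketch, not Zipkin's actual data model: the `Span` class, the helper names, and the sample request are hypothetical, and times are plain milliseconds.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    start: float              # milliseconds
    end: float                # milliseconds
    labels: Dict[str, str] = field(default_factory=dict)

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_tree(spans):
    """Return (root, children), where children maps a span id to its child spans."""
    children = {}
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children.setdefault(span.parent_id, []).append(span)
    return root, children


def render(span, children, depth=0):
    """Return the tree below `span` as indented lines, children ordered by start time."""
    lines = ["  " * depth + f"{span.name} ({span.duration:.0f} ms)"]
    for child in sorted(children.get(span.span_id, []), key=lambda s: s.start):
        lines.extend(render(child, children, depth + 1))
    return lines


# Hypothetical spans of one request, in arbitrary arrival order.
spans = [
    Span("sql INSERT", "t1", "d", "c", 60, 100, {"foo": "bar"}),
    Span("GET /checkout", "t1", "a", None, 0, 120),
    Span("orders.create", "t1", "c", "a", 45, 110),
    Span("cart.load", "t1", "b", "a", 5, 40),
]
root, children = build_tree(spans)
tree_lines = render(root, children)
```

Only the parent id links spans together, so spans can be reported independently by each service and assembled later by the collector.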
OpenTracing & Zipkin
Common libraries for several programming languages
➔ Libraries attach a trace context to thread-local storage
➔ RPC friendly (especially when using gRPC)
➔ The data is language-independent
opentracing.io zipkin.io
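How a library might keep the trace context in thread-local storage can be sketched as follows. This is an illustration of the idea only, not the OpenTracing or Zipkin API: `start_span`, `current_context`, and the span dictionaries are names invented for this example.

```python
import threading
import uuid
from contextlib import contextmanager

_active = threading.local()  # each thread sees its own `context` attribute


def current_context():
    """Return the (trace_id, span_id) pair active on this thread, or None."""
    return getattr(_active, "context", None)


@contextmanager
def start_span(name, finished):
    """Open a span as a child of whatever span is active on this thread."""
    parent = current_context()
    trace_id = parent[0] if parent else uuid.uuid4().hex
    span_id = uuid.uuid4().hex[:16]
    _active.context = (trace_id, span_id)
    try:
        yield
    finally:
        finished.append({
            "name": name,
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": parent[1] if parent else None,
        })
        _active.context = parent  # restore the caller's context


finished = []
seen_in_worker = []
with start_span("checkout", finished):
    with start_span("load-cart", finished):
        pass  # nested call picks up "checkout" as its parent
    # A new thread does not inherit the context automatically.
    worker = threading.Thread(target=lambda: seen_in_worker.append(current_context()))
    worker.start()
    worker.join()
```

The worker thread sees no context, which is the usual catch with this approach: work handed to thread pools or async callbacks needs the context passed along explicitly.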
Let’s install that also in the cluster...
Supported languages
➔ JavaScript
➔ Python
➔ Java
➔ Scala
➔ Ruby
➔ C#
➔ Golang
Supported frameworks
➔ Express (Node.js, HTTP)
➔ Jersey, RESTEasy, JAX-RS 2, Apache HttpClient, MySQL (Java, HTTP, gRPC)
➔ HDFS, HBase
➔ Spring, Spring Cloud
➔ Apache Cassandra
➔ Finagle
➔ Rack
➔ Golang Context
➔ GoKit
➔ Akka, Spray, Play
➔ Dropwizard
➔ Roll your own
Opentracing Example (Go)
Explicitly instrumenting a SQL query in a service written in Go
Annotations
● Arbitrary text
● Key/Value pairs
➔ Can be used for common vocabulary, e.g. “http.status_code”, “peer.service”, “sampling.priority”
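The two kinds of annotation, and why a common vocabulary helps, can be illustrated with a small sketch. `AnnotatedSpan` and `with_tag` are hypothetical names for this example and do not reproduce any tracer's API; the tag keys are the ones named on the slide, while the span names and values are made up.

```python
import time


class AnnotatedSpan:
    """Hypothetical span holding both kinds of annotations."""

    def __init__(self, name):
        self.name = name
        self.tags = {}  # key/value pairs
        self.logs = []  # (timestamp, arbitrary text)

    def set_tag(self, key, value):
        self.tags[key] = value
        return self

    def log(self, text):
        self.logs.append((time.time(), text))
        return self


def with_tag(spans, key, value):
    """Select spans by tag, regardless of which service emitted them."""
    return [s for s in spans if s.tags.get(key) == value]


orders = AnnotatedSpan("POST /orders").set_tag("http.status_code", 500)
orders.set_tag("peer.service", "payment").log("payment declined, retrying")
cart = AnnotatedSpan("GET /cart").set_tag("http.status_code", 200)

failed = with_tag([orders, cart], "http.status_code", 500)
```

Because every service uses the same key for the HTTP status, one query finds failing calls across the whole system; free-text annotations stay useful for humans but are harder to query.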
Architecture (Zipkin with Scribe + Cassandra)
Performance
Low overhead is the key!
Sampling is the solution!
… at least partially...
Sampling
➔ 2-stage sampling:
a. Client: Don’t send every instrumented trace
● limits client-side CPU and bandwidth overhead
● adjustable per service, hard to change in one go
b. Server: Don’t persist every trace received
● limits server-side IO and data volume overhead
● adjustable centrally with simple config change
➔ Adaptive sampling to trade off overhead against missing relevant traces
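The two sampling stages can be sketched as below. This is a simplified illustration under stated assumptions, not how Zipkin implements it: the function names, the 10% and 50% rates, and the bucket count are all chosen for the example. The server-side decision is derived from the trace id so that every span of a trace gets the same answer.

```python
import random

BUCKETS = 10_000  # resolution of the server-side rate (assumption for this sketch)


def client_should_send(rate, rng=random):
    """Stage a: decide once per trace, at the root span, whether to report it."""
    return rng.random() < rate


def server_should_store(trace_id_hex, rate):
    """Stage b: deterministic per trace id, so all spans of a trace agree."""
    return int(trace_id_hex, 16) % BUCKETS < round(rate * BUCKETS)


# Simulate 20,000 traces with a 10% client rate and a 50% server rate.
rng = random.Random(42)
trace_ids = [f"{rng.getrandbits(64):016x}" for _ in range(20_000)]
sent = [t for t in trace_ids if client_should_send(0.1, rng)]
stored = [t for t in sent if server_should_store(t, 0.5)]
```

With independent stages the effective rate is the product of the two, so roughly 5% of all traces end up persisted here. Only the second factor can be changed in one place.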
But what about...
● Proprietary services?
● Ancient/Legacy Services?
● 3rd-Party services outside your control?
Proxying!
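What a tracing proxy does for a service that cannot be instrumented can be sketched with Zipkin's B3 propagation headers. The header names are the B3 ones; the function itself is a hypothetical illustration and not linkerd's implementation, and it assumes header names arrive in exactly this casing.

```python
import secrets


def forward_headers(incoming):
    """Build the trace headers a proxy could send upstream for one proxied request."""
    trace_id = incoming.get("X-B3-TraceId") or secrets.token_hex(8)  # start a trace if none exists
    outgoing = {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": secrets.token_hex(8),  # new span for the proxied hop
    }
    if "X-B3-SpanId" in incoming:
        outgoing["X-B3-ParentSpanId"] = incoming["X-B3-SpanId"]
    if "X-B3-Sampled" in incoming:
        outgoing["X-B3-Sampled"] = incoming["X-B3-Sampled"]  # keep the sampling decision
    return outgoing


# Hypothetical request that is already part of a trace.
incoming = {"X-B3-TraceId": "463ac35c9f6413ad", "X-B3-SpanId": "a2fb4a1d1a96d312", "X-B3-Sampled": "1"}
outgoing = forward_headers(incoming)
fresh = forward_headers({})
```

The proxy can time and report the hop on behalf of the service, but a service behind it still has to copy the incoming trace headers onto its own outgoing calls, otherwise each downstream call shows up as a separate trace.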
Linkerd overview
● Intelligent, adaptive load-balancing
● Global, fine-grained instrumentation
● Application-centric naming
● Powerful traffic routing mechanisms
Some of the questions answered with a distributed tracing system:
● Which parts of my system are slow?
● Which call patterns can be optimized with parallelization?
● Which calls are redundant?
● Which routes are affected by this failing part?
● Under which circumstances is it failing?
● How often is it failing?
● Which queries go to read/write masters instead of read-only replicas?
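As one example of answering the first question, the time a span spends in its own code (as opposed to waiting for children) can be computed from the span data alone. This is a sketch under assumptions: spans are plain dictionaries with hypothetical keys, names are unique within the trace, and times are milliseconds on a shared clock.

```python
def self_times(spans):
    """Map span name -> time spent in the span itself, excluding its children."""
    children = {}
    for s in spans:
        children.setdefault(s["parent_id"], []).append(s)

    result = {}
    for s in spans:
        covered = 0.0
        cursor = s["start"]  # everything before the cursor is already counted
        for c in sorted(children.get(s["id"], []), key=lambda c: c["start"]):
            begin = max(c["start"], cursor)  # skip overlap with earlier children
            end = min(c["end"], s["end"])    # clip children that outlive the parent
            if end > begin:
                covered += end - begin
                cursor = end
        result[s["name"]] = (s["end"] - s["start"]) - covered
    return result


# Hypothetical trace: "catalogue" and "cart" overlap, "db" runs inside "cart".
trace = [
    {"id": "a", "parent_id": None, "name": "frontend", "start": 0, "end": 100},
    {"id": "b", "parent_id": "a", "name": "catalogue", "start": 10, "end": 40},
    {"id": "c", "parent_id": "a", "name": "cart", "start": 30, "end": 70},
    {"id": "d", "parent_id": "c", "name": "db", "start": 35, "end": 65},
]
times = self_times(trace)
slowest = max(times, key=times.get)
```

Sorting services by self time rather than total duration points at the component doing the slow work instead of the one that merely waits for it.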
A word of caution about distributed tracing
● Documentation is still rather poor
● Yet another moving part
● Can accumulate huge amounts of data
● Metrics need to be interpreted
● Commercial APM solutions might be an easier route for your use case...
Demo time...
Questions? Want to learn more?
● Come to our 2-day workshop: tinyurl.com/microservice-workshop
(November 8 and 9, or at your company on request)
● Follow us on Twitter: @containersoluti
● Read more on our blog: container-solutions.com/blog
● Or just get in touch: [email protected]