REALTIME
AT
STREAMROOT
What does Streamroot do?
o Provide drop-in/transparent P2P functionality for large video broadcasters (VOD, live)
o Allow broadcasters to save up to 70% bandwidth and handle huge ramp-ups/spikes for live events
Ingredients used to build the platform
o Love & compassion
o Test Driven Development
o Pragmatism (YAGNI!!!)
o Ship when you are green (CD, CR)
o Beck's Simple Design rules:
  1. Passes the tests
  2. Reveals intention
  3. No duplication
  4. Fewest elements
  - Kent Beck, 1990
REALTIME
CONCURRENT
LOW LATENCY
RESPONSIVE
SCALABILITY != PERFORMANCE
SCALABILITY ~ CONCURRENCY
(Thinking more in process distribution rather than CPU perf)
… so Go was a good candidate from the start
[Diagram: INPUT → SYSTEM → BUSINESS & OPERATIONAL OUTPUT]
Processing time ~ real-life events
Realtime components at Streamroot
o Tracker: matching viewers for P2P
o Signaling server: initial exchange of clients' metadata before direct communication
o Autoscaler & traffic reporter
o Data pipeline for realtime display
Go signaling server
o Relayer: very easy logic
o Persistent connections (WebSockets)
o Locking + map registry
o One unit holds 150k connections with no load balancer (HAProxy), 35,000 msg/second
o Spike in Elixir/Erlang: 200k
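The locking + map registry mentioned above can be sketched roughly like this; the `Registry` type, its method names, and the channel-based outbox are illustrative assumptions, not Streamroot's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// Registry holds active peer connections behind a single RWMutex.
// Lookups take the shared (read) lock; add/remove take the exclusive one.
type Registry struct {
	mu    sync.RWMutex
	peers map[string]chan []byte // peer ID -> outbound message queue
}

func NewRegistry() *Registry {
	return &Registry{peers: make(map[string]chan []byte)}
}

// Add registers a peer and returns its inbox channel.
func (r *Registry) Add(id string) chan []byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	ch := make(chan []byte, 16)
	r.peers[id] = ch
	return ch
}

func (r *Registry) Remove(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.peers, id)
}

// Relay forwards a signaling payload to the target peer, if connected.
func (r *Registry) Relay(to string, msg []byte) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ch, ok := r.peers[to]
	if !ok {
		return false
	}
	ch <- msg
	return true
}

func main() {
	reg := NewRegistry()
	inbox := reg.Add("peer-1")
	reg.Relay("peer-1", []byte("offer"))
	fmt.Println(string(<-inbox)) // prints "offer"
}
```

In a real relayer each WebSocket connection would drain its own channel in a dedicated goroutine, so a slow peer never blocks the map.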
Go Tracker
o A tracker remembers and matches viewers amongst themselves
o Responsiveness: in memory vs. interprocess
o Go has few data structures (usually composition of the map primitive)
o Tracking and handling concurrent access
o Locking (W, RW), snapshot isolation
o Optimistic control, channels
Our Autoscaler (and realtime reporter)
o One Go process is very good at doing different things simultaneously (a cron that exposes JSON on HTTP)
o Autoscaler: watch overall load; control cloud instances; report/store historically on action/data
o What I call the Octopus pattern (built-in concurrency, solid timers/tickers, HTTP cancellation)
o Go is terse! The module has 619 lines of code (including both Azure and AWS controls)
o Stable: has been running for months without any quirks
o (Demo image)
REALTIME DATA
PIPELINE
Non-functional requirements
DISPLAY REALTIME MULTIPLE GRAPHS & AGGREGATES
…ALL UNDER A SECOND
[Architecture: COLLECTOR × 3 → KAFKA CLUSTER → CONSUMER 1, CONSUMER 2 → TIME SERIES STORAGE]
Low latency
o The realtime pipeline's components all have high write throughput (fast writes)
o Kafka: contiguous writes, retention, …, no ack if needed
o Time series DB (InfluxDB, written in Go): line protocol, batches, write-ahead log, UDP client if really needed, ...
o … Collectors (HTTP endpoints)
Collectors
o Simple Go HTTP endpoints that receive, validate and push JSON text to the pipeline
o An out-of-the-box Go stdlib endpoint holds 10,000 to 20,000 requests per second (C10k problem solved ;) )
Collectors - Fail fast
o Bail out as soon as possible, so you're back on your feet for others
o Go is a systems language: great io and http packages
o io.LimitedReader, http.MaxBytesReader
Collectors – Payload size
o For uncompressed JSON, payload size matters
o Go un/marshaling is based on reflection. Larger payloads suffer
o github.com/pquerna/ffjson on 1.5MB payloads did not help
o Avoid switching on payload type. Use a URL path per type when needed
o Do not nest JSON until necessary. Friendlier down the pipeline
Collectors – Reuse resources
o Great Go package: sync
o sync.Pool (reminiscent of the Flyweight pattern)
o (example)
o … have not tried/measured it yet. YAGNI ;)
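Since the slide leaves `sync.Pool` deliberately untried, here is only a generic sketch of how buffer reuse with it typically looks, not Streamroot's code:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool recycles buffers between requests (the Flyweight idea):
// instead of allocating a fresh buffer per payload, Get one,
// reset it, and Put it back when done.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// process copies a payload through a pooled buffer.
func process(payload []byte) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset() // a recycled buffer may hold stale data
	buf.Write(payload)
	return buf.String()
}

func main() {
	fmt.Println(process([]byte(`{"event":"play"}`)))
}
```

Whether this beats plain allocation depends on payload size and GC pressure, which is exactly why the slide says to measure first.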
Collectors - Benching
o https://github.com/tsenart/vegeta
o Great first results even from your local computer
o Easily push your benching command-line binary to any cloud machine with more cores
Permanent data storage: InfluxDB
o Flexible and powerful time series DB … approaching version 1.0 ;)
o Allows fast queries with useful functions (max, percentile, median, derivative, …)
o … lower point density leads to faster queries overall
Go consumers
o https://github.com/Shopify/sarama
o Lots of redundancy in incoming data (JSON payloads) that can be reduced per broadcaster, per content, etc.
o Consumers:
  a. pull JSON payloads from Kafka
  b. apply logic (reduce, discard)
  c. push to backend storage or anything (live geo map, etc.)
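Step (b), the reduce, is pure and easy to sketch; the `Event` shape and the key format are assumptions (with sarama, the JSON payloads would arrive on the partition consumer's `Messages()` channel and be decoded into such events):

```go
package main

import "fmt"

// Event is one decoded JSON payload (illustrative shape,
// not Streamroot's actual schema).
type Event struct {
	Broadcaster string
	Content     string
	Bytes       int64
}

// reduce collapses redundant events into per-broadcaster/content
// totals before they are pushed to backend storage.
func reduce(events []Event) map[string]int64 {
	totals := make(map[string]int64)
	for _, e := range events {
		totals[e.Broadcaster+"/"+e.Content] += e.Bytes
	}
	return totals
}

func main() {
	batch := []Event{
		{"acme", "live-1", 100},
		{"acme", "live-1", 150},
		{"beta", "vod-9", 50},
	}
	fmt.Println(reduce(batch)["acme/live-1"]) // 250
}
```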
Go consumers (2)
o The consumer's main logic pattern is our replay aggregator
o Aggregator:
  a. consume live payloads
  b. wait for a configured time
  c. flush
  d. repeat
o The design allows restarting at any time, tolerates failure, and lets us stop the service for long periods if needed
DASHBOARD DEMO SCREENS
MISCELLANEOUS
YAGNI deployment
o Disclaimer: I am not a sysadmin, although I enjoy it
o Any cloud → Ubuntu → systemd
o Use conventions for your binary deployment
o Capture your conventions with Ansible
o Easy rollback / respawn
YAGNI deployment (2)
ARTIFACT_NAME=$PROJECT_NAME-`git rev-parse --short HEAD`-`date +%Y-%m-%d`
# example: collector-073b570-2016-05-27
GOARCH=amd64 GOOS=linux go build -o pkg/$ARTIFACT_NAME
scp pkg/$ARTIFACT_NAME $HOST_ALIAS:~/goapps/$PROJECT_NAME/
ln -fs $ARTIFACT_NAME $PROJECT_NAME
sudo systemctl restart $PROJECT_NAME
YAGNI deployment (3)
[Unit]
Description=Payload collector
After=network.target

[Service]
LimitNOFILE=1000000
Environment=KAFKA_BROKERS=x.x.x.x:9092,x.x.x.x:9092,x.x.x.x:9092
SyslogIdentifier=streamroot-traff-collector
ExecStart=/home/streamroot/goapps/collector/collector
Restart=always

[Install]
WantedBy=multi-user.target
Neutral binary. Injection from environment
Dependencies surface
$ REPO=github.com/streamroot
$ for PROJECT in `ls $GOPATH/src/$REPO`; do
    go list -f '{{ join .Deps "\n" }}' $REPO/$PROJECT | grep -v $REPO | grep '\.' | grep -v 'internal' | sort | uniq
  done

github.com/Azure/azure-sdk-for-go
github.com/influxdata/influxdb
github.com/dgrijalva/jwt-go
github.com/gorilla/context
github.com/gorilla/websocket
golang.org/x/time/rate
gopkg.in/mgo.v2
github.com/Shopify/sarama
github.com/rcrowley/go-metrics
github.com/jeromer/syslogparser
Great place to work. You are responsible for your shit!
http://www.streamroot.io/
[email protected] (with Subject: Golang meetup)
Core JS developer
Backend Scalability Engineer
Our developer's blog
https://indevwith.streamroot.io/
Thank you!
Any questions?
(I am available for new projects)