WebCamp: Front-end Developers Day. Aleksey Yashchenko, Sergey Rudenko...

Posted on 08-Aug-2015

Frontend Monitoring @ Grammarly

Grammarly Products

● Web editor - single page app

● Browser extensions for Chrome, Safari, Firefox

● M$ Office Add-in

● Funnels

Grammarly Products :: Load

● 1M+ active users; 0.5M+ daily active

● 10M+ users planned next year

● 30 services on 300+ servers

● 130M+ requests/day

● 3.8M+ WebSocket connections/day

What’s ur problem, bro?

The problem

● Model != Reality

● 1B websites * X browsers

● Free users: Problem? => ⌘+W | Alt+F4

● Paid users: Problem? => Let’s torture support

The problem :: CI / …

● Bugs

● Daily releases

● Performance testing

● A/B testing

The Solution...

The Solution :: Monitoring

● Monitoring != Tracking

● Monitoring:

○ all data are volatile

○ helps to assess quality in terms of

■ stability and

■ performance

○ fast problem detection and alerting

○ troubleshooting

○ different data sources, incl. tracking events

Grammarly FE Monitoring :: The Saga

● Manual testing

● New Relic

● Errorception

● Sentry

● …

● Profit (custom solution)

FE Monitoring @ Grammarly

● Logging

○ events with context (userId, UserAgent, stack traces, etc.)

○ special cases only, no ‘blah-blah’ trace logs

● TS (time-series) metrics

○ everything else :)

● Alerting

FE Monitoring

[Architecture diagram. Components: Web browser, Nginx (access logs), Logstash, StatsD, Graphite, Grafana, Elasticsearch x 4, Kibana, Sensu checks, Sensu server, OpsGenie]
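Alerting in this setup runs through Sensu checks, which follow the Nagios plugin convention: the check prints one line of output and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The slides do not show Grammarly's checks, so the sketch below is a hypothetical one: it evaluates the page-load error percentage (one of the case-study metrics) against illustrative thresholds. The names `checkPageLoadErrors`, `warnPct` and `critPct` are made up for this example.

```typescript
// Exit codes used by Sensu checks (Nagios plugin convention).
const CheckStatus = { OK: 0, WARNING: 1, CRITICAL: 2, UNKNOWN: 3 } as const;
type CheckStatus = (typeof CheckStatus)[keyof typeof CheckStatus];

interface CheckResult {
  status: CheckStatus;
  output: string;
}

// Hypothetical check: alert when the share of failed page loads is too high.
// warnPct and critPct are percentages; the values used below are illustrative only.
function checkPageLoadErrors(
  successes: number,
  errors: number,
  warnPct: number,
  critPct: number,
): CheckResult {
  const total = successes + errors;
  if (total === 0) {
    // No data at all is suspicious in itself, so report UNKNOWN rather than OK.
    return { status: CheckStatus.UNKNOWN, output: "UNKNOWN: no page loads reported" };
  }
  const errorPct = (errors / total) * 100;
  const summary = `page load errors ${errorPct.toFixed(2)}% (${errors}/${total})`;
  if (errorPct >= critPct) {
    return { status: CheckStatus.CRITICAL, output: `CRITICAL: ${summary}` };
  }
  if (errorPct >= warnPct) {
    return { status: CheckStatus.WARNING, output: `WARNING: ${summary}` };
  }
  return { status: CheckStatus.OK, output: `OK: ${summary}` };
}

const result = checkPageLoadErrors(9700, 300, 2, 5);
console.log(result.output); // WARNING: page load errors 3.00% (300/10000)
```

A real check script would print `result.output` and then exit with `result.status` (for example via Node's `process.exit`), so that the Sensu server can route non-OK results to OpsGenie.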

Logging :: Backend

[Same architecture diagram as on the FE Monitoring slide]

TS Metrics :: Backend

[Same architecture diagram as on the FE Monitoring slide]

FE Monitoring

● Client data reaches Nginx over HTTP and is written to the Nginx access logs

● Logstash tails the *.log files

○ metrics data: ~2600 RPS, ~90 GiB/day; metrics codec => StatsD (UDP)

○ logs data: ~450 RPS, ~50 GiB/day; logs codec (+ source maps) => Elasticsearch

FE Monitoring in numbers

● 38M logs/day

● up to 3K logs/sec @ busy hours

● ~100 Graphite metrics

● 6 servers + 2 shared w/ backend monitoring
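A quick cross-check of the throughput figures from the data-flow slide, assuming the RPS values are daily averages and that each GiB/day figure covers the same traffic as the RPS figure next to it (the slides do not state either explicitly):

```latex
\begin{aligned}
\text{logs:}\quad & 450~\mathrm{req/s} \times 86\,400~\mathrm{s/day} = 3.888 \times 10^{7}~\mathrm{req/day} \approx 38.9\mathrm{M} \\
\text{metrics:}\quad & 2600~\mathrm{req/s} \times 86\,400~\mathrm{s/day} = 2.2464 \times 10^{8}~\mathrm{req/day} \\
\text{avg. log request:}\quad & \frac{50 \times 2^{30}~\mathrm{B}}{3.888 \times 10^{7}} \approx 1381~\mathrm{B} \\
\text{avg. metrics request:}\quad & \frac{90 \times 2^{30}~\mathrm{B}}{2.2464 \times 10^{8}} \approx 430~\mathrm{B} \\
\text{peak / average (logs):}\quad & \frac{3000~\mathrm{logs/s}}{450~\mathrm{logs/s}} \approx 6.7
\end{aligned}
```

The first line agrees with the 38M logs/day figure, and the ~3K logs/sec busy-hour peak is roughly 6.7× the daily average.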

Logging :: JS Library

● legacy codebase from raven-js

● named loggers

● log levels (info, warn, error)

● default data in all events (aka MDC, mapped diagnostic context)

● scopes (lifetime, session, document)
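The slides list the library's features without showing its API. The following is a minimal sketch of how named loggers, log levels and scoped default data (MDC) can fit together; `LogContext`, `Logger` and `getLogger` are hypothetical names, not the Grammarly or raven-js API, and the rule that narrower scopes override wider ones is an assumption made for this example. Events are collected in an array instead of being sent to a server.

```typescript
type Level = "info" | "warn" | "error";
type Scope = "lifetime" | "session" | "document";
type Fields = Record<string, unknown>;

interface LogEvent {
  logger: string;
  level: Level;
  message: string;
  data: Fields;
}

// Default data attached to every event (MDC-style), kept per scope.
// Assumption: narrower scopes override wider ones (lifetime < session < document).
class LogContext {
  private readonly scopes: Record<Scope, Fields> = { lifetime: {}, session: {}, document: {} };

  set(scope: Scope, key: string, value: unknown): void {
    this.scopes[scope][key] = value;
  }

  // E.g. drop document-scoped fields when the page is unloaded.
  clear(scope: Scope): void {
    this.scopes[scope] = {};
  }

  snapshot(): Fields {
    return { ...this.scopes.lifetime, ...this.scopes.session, ...this.scopes.document };
  }
}

class Logger {
  private readonly name: string;
  private readonly context: LogContext;
  private readonly sink: (event: LogEvent) => void;

  constructor(name: string, context: LogContext, sink: (event: LogEvent) => void) {
    this.name = name;
    this.context = context;
    this.sink = sink;
  }

  info(message: string, extra: Fields = {}): void { this.log("info", message, extra); }
  warn(message: string, extra: Fields = {}): void { this.log("warn", message, extra); }
  error(message: string, extra: Fields = {}): void { this.log("error", message, extra); }

  private log(level: Level, message: string, extra: Fields): void {
    // Per-call fields win over the default (MDC) data.
    const data = { ...this.context.snapshot(), ...extra };
    this.sink({ logger: this.name, level, message, data });
  }
}

// Usage: collect events in memory instead of sending them to a server.
const sent: LogEvent[] = [];
const context = new LogContext();
const getLogger = (name: string) => new Logger(name, context, (e) => sent.push(e));

context.set("lifetime", "userId", "u-42");
context.set("document", "url", "https://example.com/doc");
getLogger("editor.socket").warn("reconnect failed", { attempt: 3 });
context.clear("document");
getLogger("editor.socket").error("socket closed");
```

After this runs, the first event carries `userId`, `url` and `attempt`, while the second carries only `userId`, because the document scope was cleared in between.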

Logging :: JS Library

[Kibana screenshot]

TS Metrics

● StatsD metrics:

○ counter (inc/dec)

○ timer: values for which StatsD calculates avg, min, max, and percentiles

○ set: number of unique values seen per flush interval
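StatsD receives these metric types as plain-text lines of the form `<name>:<value>|<type>` (`c` for counters, `ms` for timers, `s` for sets), usually over UDP. The sketch below shows both sides: formatting lines on the client, and the kind of per-flush aggregation a server computes for a timer. The helper names are hypothetical, and the percentile uses the textbook nearest-rank definition, which may differ in rounding details from StatsD's own `upper_90` computation.

```typescript
// StatsD plain-text line protocol: "<name>:<value>|<type>".
function counterLine(name: string, delta = 1): string {
  return `${name}:${delta}|c`; // a negative delta decrements the counter
}

function timerLine(name: string, ms: number): string {
  return `${name}:${ms}|ms`;
}

function setLine(name: string, value: string): string {
  return `${name}:${value}|s`; // the server counts distinct values per flush
}

interface TimerStats {
  count: number;
  mean: number;
  lower: number;
  upper: number;
  upperPct: number; // nearest-rank percentile, e.g. the 90th
}

// Aggregates the timer values received during one flush interval.
function aggregateTimer(values: number[], pct = 90): TimerStats {
  if (values.length === 0) throw new Error("no timer values in this interval");
  const sorted = [...values].sort((a, b) => a - b);
  const count = sorted.length;
  const mean = sorted.reduce((sum, v) => sum + v, 0) / count;
  // Nearest rank: the smallest value with at least pct% of samples at or below it.
  const rank = Math.max(1, Math.ceil((pct * count) / 100));
  return { count, mean, lower: sorted[0], upper: sorted[count - 1], upperPct: sorted[rank - 1] };
}

console.log(counterLine("ui.editor.open")); // ui.editor.open:1|c
console.log(timerLine("ui.performance.chrome.popup.load", 320)); // ui.performance.chrome.popup.load:320|ms
console.log(aggregateTimer([120, 80, 300, 95, 110, 1000, 105, 90, 100, 130]));
```

In the sample, one 1000 ms outlier pulls the mean up to 213 ms, while the 90th percentile stays at 300 ms. This is why percentiles are usually more useful than averages for load-time metrics.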

TS Metrics :: JS Library

Metric name: ui.performance.chrome.popup.load

Cardinality is limited by Graphite storage (Whisper):

● product

● version

● browser

● region (US | World)
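Graphite uses `.` as its path separator, and Whisper stores every distinct metric name as its own file. Because of this, each dimension encoded in the name multiplies the number of stored series. The sketch below illustrates both points. The helper names and the dimension sizes in the example are hypothetical and are not Grammarly's actual naming scheme.

```typescript
// Dots inside a value (e.g. a version such as "14.0.3") would create extra
// hierarchy levels in Graphite, so replace them and other unsafe characters.
function sanitizeSegment(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9_-]/g, "_");
}

function metricName(parts: string[]): string {
  return parts.map(sanitizeSegment).join(".");
}

// Each distinct name becomes one Whisper file, so the number of series is the
// number of base metrics times the number of distinct values of each dimension.
function seriesCount(dimensionSizes: Record<string, number>, baseMetrics: number): number {
  return Object.values(dimensionSizes).reduce((acc, n) => acc * n, baseMetrics);
}

console.log(metricName(["ui", "performance", "Chrome", "popup", "load"])); // ui.performance.chrome.popup.load
console.log(sanitizeSegment("14.0.3")); // 14_0_3
// Illustrative sizes only: 100 base metrics split by 4 products, 10 versions, 3 browsers, 2 regions.
console.log(seriesCount({ product: 4, version: 10, browser: 3, region: 2 }, 100)); // 24000
```

Keeping dimensions such as `version` bounded (for example, only the last few releases) is what keeps this product, and the Whisper disk usage, under control.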


TS Metrics :: UI

Case study

● “Creeping” versions

● Active users

● WebSocket errors

● Stability

● Performance

● Page load success/error percentage

● Bugs: …

The Solution :: Adoption Problems

● Bugs in JS monitoring code =>

○ wrong data

○ self-DDoS

● False-positive (FP) alerts even on correct data

● Developers aren’t very passionate about writing logs and metrics

● Some education is needed to promote usage

● Turning monitoring into an engineering practice
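The slides name self-DDoS as a failure mode but do not say how it was addressed. One common client-side guard is a token bucket in front of the sender. It lets short bursts through, caps the sustained rate, and drops (while counting) anything over budget, so a buggy loop cannot flood the monitoring backend. The sketch below is an assumption for illustration only. The class name and parameters are hypothetical, and the clock is injectable so that the behaviour can be tested.

```typescript
// Token bucket: allows bursts of up to `capacity` events, then on average at most
// `refillPerSec` events per second. Events over budget are dropped and counted.
class TokenBucket {
  dropped = 0;
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.last = now();
  }

  // Returns true if the event may be sent now, false if it should be dropped.
  tryTake(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.last = t;
    // Refill in proportion to elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    this.dropped += 1;
    return false;
  }
}

// Usage: a runaway loop tries to log 100 events at the same instant.
let fakeMs = 0;
const bucket = new TokenBucket(5, 1, () => fakeMs);
let sentCount = 0;
for (let i = 0; i < 100; i++) if (bucket.tryTake()) sentCount++;
console.log(sentCount, bucket.dropped); // 5 95
fakeMs += 3000; // three seconds later, three more tokens are available
```

Reporting the `dropped` counter itself, for example as a StatsD counter, keeps the throttling visible instead of silently hiding a bug.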

The Solution :: Other Problems

● Lack of a data verification mechanism

● Graphite disk space issues

● High load as the user base grows

● Monitoring infrastructure stability

Near Future Plans

● Graphite disk space scaling (Cassandra)

● Client/server protocol optimization

● Simple API for getting monitoring data for tests

● Trends / New Events dashboards/facility

● Simplified ES => Graphite metrics routing

● Automatic verification of code changes with A/B testing & logs/metrics analysis

Questions?

Sergey Rudenko

Frontend engineer

fb://rudenko.sergii

@Rudeg

rudenko.sergey92@gmail.com

Aleksey Yashchenko

Backend engineer

fb://tuxslayer

@tuxslayer

tuxslayer@gmail.com
