Tapjoy OpenStack Summit Paris Breakout Session

Tapjoy & OpenStack: Delivering Billions of Requests Daily. Wes Jossey, Head of Operations @Tapjoy

Upload: weston-jossey

Post on 13-Jul-2015


TRANSCRIPT

Page 1: Tapjoy OpenStack Summit Paris Breakout Session

Tapjoy & OpenStack: Delivering Billions of Requests Daily

Wes Jossey, Head of Operations @Tapjoy

Page 2: Tapjoy OpenStack Summit Paris Breakout Session

Tapjoy

● Global App-Tech Startup
● We Power For Mobile Developers:
  ○ Monetization
  ○ Analytics
  ○ User Acquisition
  ○ User Retention
● 450M+ Monthly Users Across 270k+ Apps
● Worldwide Presence

Page 3: Tapjoy OpenStack Summit Paris Breakout Session

Technical Details

● Early AWS Adopter
● Grew Predominantly on AWS
● Over 1,100 AWS VMs Daily (10/2014)
● Active Regions in Asia, Europe, N.A.
● Over One Trillion Requests Handled Annually
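As a back-of-the-envelope check, the "one trillion requests annually" figure can be converted to average daily and per-second rates. This is only a sketch: it treats the slide's number as a lower bound and assumes a 365-day year and evenly spread traffic, which real traffic is not.

```python
# Average request rates implied by "over one trillion requests annually".
# The annual figure comes from the slide; the even spread is an assumption.
REQUESTS_PER_YEAR = 1_000_000_000_000
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(requests_per_year: int) -> tuple[float, float]:
    """Return (requests per day, requests per second) as plain averages."""
    per_day = requests_per_year / DAYS_PER_YEAR
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


per_day, per_second = average_rates(REQUESTS_PER_YEAR)
print(f"{per_day / 1e9:.2f} billion requests/day")          # 2.74 billion requests/day
print(f"{per_second:,.0f} requests/second on average")      # 31,710 requests/second on average
```

The daily average of roughly 2.7 billion is consistent with the talk's "billions of requests daily" title.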

Page 4: Tapjoy OpenStack Summit Paris Breakout Session

Tech Philosophy

● Compute (EC2 & Nova) Driven Company
  ○ Operate Your Own Infrastructure
    ■ But Not Necessarily Built-From-Scratch
  ○ Zero Heart-Attack Nodes
    ■ All Nodes Are Ephemeral
    ■ Data is Always Distributed
    ■ Failure is Always Tolerated
    ■ Misbehaving Instances Are Terminated Quickly

Page 5: Tapjoy OpenStack Summit Paris Breakout Session

Services We Use

● SQS
  ○ Simple, Inexpensive, Durable
  ○ Currently Building New Internal System Influenced by SQS, but with Different Guarantees
  ○ No Lock-In (See https://github.com/Tapjoy/chore)
● RDS
  ○ No Lock-In. Simple. Easy.
● CloudWatch (but also statsd)

Page 6: Tapjoy OpenStack Summit Paris Breakout Session

Services We Use Cont.

● ELB
  ○ SSL Termination Only. Routing Handled Elsewhere.
● Auto-Scaling
  ○ Traffic can fluctuate 30% peak to valley
● S3
  ○ Where we store ALL the things
  ○ Still price competitive for what it provides. No plans to leave as of today.
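To illustrate why a 30% peak-to-valley swing makes auto-scaling worthwhile, here is a minimal sizing sketch. None of the numbers below come from the talk: the peak load, per-instance capacity, and headroom are hypothetical, and "30% peak to valley" is read as "the valley is 30% below the peak". The function is a generic capacity calculation, not the AWS Auto Scaling API.

```python
import math


def instances_needed(requests_per_second: float,
                     capacity_per_instance: float,
                     headroom: float = 0.2) -> int:
    """Smallest instance count that serves the load with spare headroom."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    required = requests_per_second * (1 + headroom) / capacity_per_instance
    return max(1, math.ceil(required))


# Hypothetical figures, chosen only to make the example concrete.
peak_rps = 40_000.0
valley_rps = peak_rps * 0.70        # assumed reading of "30% peak to valley"
capacity_per_instance = 500.0       # hypothetical requests/second per instance

peak_fleet = instances_needed(peak_rps, capacity_per_instance)
valley_fleet = instances_needed(valley_rps, capacity_per_instance)
print(f"peak fleet: {peak_fleet}, valley fleet: {valley_fleet}")
print(f"instances released at the valley: {peak_fleet - valley_fleet}")
```

Under these assumptions a fixed fleet sized for the peak would run a sizable share of idle instances through every valley, which is the cost auto-scaling avoids.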

Page 7: Tapjoy OpenStack Summit Paris Breakout Session

Use Compute Everywhere

● Every Dev Has Access to Either AWS or Tapjoy-1 (Tapjoy’s OpenStack Deployment)
● Simulate Changes Against Useful Data
● Test Algorithms on Large Hadoop Clusters
● Practice for Failure With Access to Real Services (not mock endpoints)

Page 8: Tapjoy OpenStack Summit Paris Breakout Session

Going Hybrid

● We Spend in the Millions on AWS
● Picked Data-Science Infrastructure because of Portability, and Ability to Leverage More Nodes
● Lower Risk than Tier-1 Production Services
● Wanted a Partner to Maintain OpenStack like Amazon ‘Maintains’ AWS
● We Want to Operate Apps

Page 9: Tapjoy OpenStack Summit Paris Breakout Session

OpenStack Timeline

Page 10: Tapjoy OpenStack Summit Paris Breakout Session

Vendors (It Matters)

● Metacloud
  ○ Verified our Design
  ○ Deployed OpenStack
  ○ Provisioned Network
  ○ Allowed Us to Focus on Business Applications
● Equinix
  ○ Cooling & Power Design
  ○ Remote Hands
  ○ Went Above and Beyond on Numerous Occasions

Page 11: Tapjoy OpenStack Summit Paris Breakout Session

Vendors: Full List

● Metacloud
● Equinix
● Quanta
● Cumulus
● Level3
● Newegg

Page 12: Tapjoy OpenStack Summit Paris Breakout Session

Challenges

● Hardware Delays Killed Our Timelines
  ○ Blew through our contingency windows.
  ○ Hurt our budgets.
  ○ Delayed subsequent purchases.
● Setting Up IP Transit Can Be Slow
● No Physical Presence in DC
  ○ Also a Pro
● No Internal Previous Success Story… So Lots of Skepticism

Page 13: Tapjoy OpenStack Summit Paris Breakout Session

The Not So Glamorous Job

● Negotiations Can Be Exhausting
● If You’re An Engineer, the Turn Around Time Can Be Frustrating
● You Probably Need a Gantt Chart
● There’s Nothing Agile About Writing a Big Check

Page 14: Tapjoy OpenStack Summit Paris Breakout Session

Tapjoy-1: Data Nodes

348 ‘Data’ All Purpose Nodes

● Quanta S910-X31E: 12 Node Configuration
● Per Node
  ○ Intel 1265Lv3 @ 2.5GHz
  ○ 4x1TB 7200RPM
  ○ 32GB RAM
  ○ Dual 1Gig NIC
● ‘Recyclable’ for Other Tasks if we Evolve

Page 15: Tapjoy OpenStack Summit Paris Breakout Session

Tapjoy-1: Management Nodes

12 ‘Management’ Nodes

● Quanta S180: 4 Node Configuration
● Per Node
  ○ Intel 2650v2 x2 @ 2.60GHz
  ○ 128GB RAM
  ○ 6x480GB SSD
  ○ Dual 10Gig NIC
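The per-node figures on the data-node and management-node slides can be rolled up into cluster totals. The sketch below does only that arithmetic. It assumes "12 Node Configuration" and "4 Node Configuration" mean nodes per chassis, uses decimal units (1 TB = 1000 GB), and reports raw capacity before any replication or filesystem overhead; the `NodeClass` type is a hypothetical helper, not something from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeClass:
    """One class of Tapjoy-1 hardware, using the figures from the slides."""
    name: str
    node_count: int
    nodes_per_chassis: int  # assumed reading of "N Node Configuration"
    ram_gb: int
    disks_per_node: int
    disk_gb: int

    @property
    def chassis(self) -> int:
        # Ceiling division: a partly filled chassis still counts as one.
        return -(-self.node_count // self.nodes_per_chassis)

    @property
    def total_ram_gb(self) -> int:
        return self.node_count * self.ram_gb

    @property
    def raw_storage_tb(self) -> float:
        # Raw capacity only; replication and formatting overhead are ignored.
        return self.node_count * self.disks_per_node * self.disk_gb / 1000


DATA = NodeClass("data", node_count=348, nodes_per_chassis=12,
                 ram_gb=32, disks_per_node=4, disk_gb=1000)
MGMT = NodeClass("management", node_count=12, nodes_per_chassis=4,
                 ram_gb=128, disks_per_node=6, disk_gb=480)

for nc in (DATA, MGMT):
    print(f"{nc.name}: {nc.chassis} chassis, "
          f"{nc.total_ram_gb} GB RAM, {nc.raw_storage_tb:.2f} TB raw storage")
```

Under these assumptions the data tier works out to 29 chassis, 11,136 GB of RAM, and 1,392 TB of raw disk, while the management tier is 3 chassis with 1,536 GB of RAM and 34.56 TB of raw SSD.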

Page 16: Tapjoy OpenStack Summit Paris Breakout Session

Glamor Shot

Page 17: Tapjoy OpenStack Summit Paris Breakout Session

Same Price, Different Outcome

Page 18: Tapjoy OpenStack Summit Paris Breakout Session

Diagrams!

Page 19: Tapjoy OpenStack Summit Paris Breakout Session

High-Level Request Flow Architecture

Page 20: Tapjoy OpenStack Summit Paris Breakout Session

Detailed Flow

Page 21: Tapjoy OpenStack Summit Paris Breakout Session

Data Pipeline

Tapjoy-1

Page 22: Tapjoy OpenStack Summit Paris Breakout Session

Plan For Failure

● Hardware
  ○ I’m Not Saying You Shouldn’t Use CEPH…
    ■ But You’ll Notice it’s Absent Here
● Service Boundaries
  ○ Have Hardware & Software Contingencies
    ■ Backup Links
    ■ Temporary Cache(s)
  ○ Actually Test Failure in Production
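One possible reading of "Backup Links" and "Temporary Cache(s)" at a service boundary is sketched below: try each link in order, and if every link is down, serve the last value seen. This is an illustration only, not Tapjoy's implementation. `FallbackFetcher`, `primary_link`, and `backup_link` are hypothetical names, and link failure is modeled as a `ConnectionError`.

```python
from typing import Callable, Dict, Optional, Sequence


class FallbackFetcher:
    """Try each link in order; if all fail, serve the last cached value."""

    def __init__(self, links: Sequence[Callable[[str], str]]) -> None:
        self._links = list(links)
        self._cache: Dict[str, str] = {}  # temporary cache of last good values

    def fetch(self, key: str) -> Optional[str]:
        for link in self._links:
            try:
                value = link(key)
            except ConnectionError:
                continue  # this link is down; try the next one
            self._cache[key] = value  # remember the last good answer
            return value
        # Every link failed: fall back to the cache (None if never seen).
        return self._cache.get(key)


# Hypothetical links used to exercise the failure paths.
backup_up = True


def primary_link(key: str) -> str:
    raise ConnectionError("primary link is down")


def backup_link(key: str) -> str:
    if not backup_up:
        raise ConnectionError("backup link is down")
    return f"value-for-{key}"


fetcher = FallbackFetcher([primary_link, backup_link])
first = fetcher.fetch("offer-1")    # served by the backup link
backup_up = False
second = fetcher.fetch("offer-1")   # both links down: served from the cache
missing = fetcher.fetch("offer-2")  # both links down and nothing cached
print(first, second, missing)
```

Deliberately failing the primary link in the example mirrors the slide's last point: the fallback paths only earn trust once they are exercised.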

Page 23: Tapjoy OpenStack Summit Paris Breakout Session

Info

● Twitter! @dustywes
● Email: [email protected]