aws start-up tour 2009 / sharethis

ShareThis, AWS Start-Up Tour 2009, Sunnyvale

Page 1: AWS Start-Up Tour 2009 / ShareThis

ShareThis on AWS

Paco Nathan, Data Insights ShareThis.com

AWS Start-Up Tour 2009-06-16

Page 2

What Does ShareThis Do?

• “Make it simple to share any online content”

• Social content sharing platform

• ESPN, FOX, CS Monitor, HuffPost, CBS MarketWatch, Wired, TechCrunch, ThinkGeek, etc.

• When a news story goes viral on a major publisher’s site, our sharing services must scale out to keep pace
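At its simplest, keeping pace with a viral spike is an arithmetic problem. If each node handles a roughly fixed request rate, the number of nodes needed grows linearly with peak traffic. The Java sketch below works through that calculation. The class name, the 300 requests/second per-node capacity, and the 1.5x headroom factor are hypothetical values chosen for illustration. They are not ShareThis figures.

```java
// Hypothetical back-of-the-envelope sizing: how many identical nodes a
// service needs to absorb a traffic spike. All inputs are illustrative.
class SpikeSizing {

    /**
     * Nodes required so that peak load, inflated by a safety headroom
     * factor, fits within the combined capacity of the cluster.
     */
    static int nodesNeeded(double peakRequestsPerSec,
                           double perNodeRequestsPerSec,
                           double headroom) {
        if (perNodeRequestsPerSec <= 0 || headroom < 1.0) {
            throw new IllegalArgumentException("capacity must be > 0 and headroom >= 1");
        }
        double required = peakRequestsPerSec * headroom / perNodeRequestsPerSec;
        // Round up: a fractional node still means one more machine.
        return (int) Math.ceil(required);
    }

    public static void main(String[] args) {
        // Assumed baseline: 1,200 req/s, 300 req/s per node, 1.5x headroom.
        System.out.println(nodesNeeded(1_200, 300, 1.5));
        // Assumed viral spike: traffic jumps to 10x the baseline.
        System.out.println(nodesNeeded(12_000, 300, 1.5));
    }
}
```

Under these assumed numbers, a tenfold spike from 1,200 to 12,000 requests/second takes the cluster from 6 to 60 nodes. A jump of that size is practical only when capacity can be added simply by launching more nodes.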

Page 4

Why Our Company Uses AWS

• >10^6 publishers, >10^9 users, >10^10 URLs

• Early-stage start-up, fewer than 25 people, “wearing lots of hats”, ultra-fast-paced R&D

• Spikes in popular stories impose demands throughout the architecture: API services, loggers, data warehouse (DW), business intelligence (BI), etc.

• How can this level of service be built 100% in the cloud?

Page 5

http://shar.es/1B7

Page 6

System Architecture

• Each service designed for cost-effective, horizontal scale-out

• API served by a cluster of LAMP-stack servers plus a cluster of nginx servers

• Aster Data: nCluster infrastructure in a “hub-and-spoke” pattern

• Cascading: abstraction layer for tying together components

• Batch jobs on Elastic MapReduce and Aster Data SQL/MR

• SQS, EBS, SimpleDB, MTurk, plus other AWS services
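Horizontal scale-out means adding interchangeable nodes behind a front end rather than growing a single machine. A minimal Java sketch of that idea, assuming stateless request handling, is a round-robin dispatcher whose pool can grow at runtime. NodePool and the host names are hypothetical. The slides do not say how the LAMP and nginx clusters actually distribute requests.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a round-robin dispatcher over a pool of identical,
// interchangeable nodes. Scaling out means calling addNode(); the dispatch
// logic itself never changes. Host names below are made up.
class NodePool {
    // Copy-on-write list: cheap reads on every request, rare writes on resize.
    private final List<String> nodes = new CopyOnWriteArrayList<>();
    private final AtomicLong counter = new AtomicLong();

    void addNode(String host) {
        nodes.add(host);
    }

    boolean removeNode(String host) {
        return nodes.remove(host);
    }

    /** Returns the next node in round-robin order. */
    String next() {
        // Snapshot the pool so a concurrent add/remove cannot change its size
        // between picking an index and reading the element.
        String[] snapshot = nodes.toArray(new String[0]);
        if (snapshot.length == 0) {
            throw new IllegalStateException("no nodes in pool");
        }
        long ticket = counter.getAndIncrement();
        return snapshot[(int) Math.floorMod(ticket, (long) snapshot.length)];
    }

    public static void main(String[] args) {
        NodePool pool = new NodePool();
        pool.addNode("api-1");
        pool.addNode("api-2");
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + pool.next());
        }
        // A story goes viral: add capacity at runtime, with no recoding or rebuild.
        pool.addNode("api-3");
        for (int i = 4; i < 7; i++) {
            System.out.println("request " + i + " -> " + pool.next());
        }
    }
}
```

Round-robin is the simplest policy that treats nodes as fully interchangeable. Because no request is tied to a particular node, a newly added node can take traffic on the very next call.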

Page 8

Key Learnings

• Capability to scale out horizontally without recoding or rebuilding: just add new EC2 nodes to the clusters

• Authoritative data and backups kept in S3, a great approach for disaster recovery (DR)

• Wide range of use cases implemented: widget API, log clean-up, vertical search, business intelligence, etc.

• Developers launch their own sandbox instances, which makes dev/test/debug cycles more efficient

• Staff enabled to “wear even more hats” with less risk

Page 9

Cascading + Elastic MapReduce

Page 10

Cascading + Elastic MapReduce

• “Syntax is for humans, APIs are for software”

• Defines apps as set operations applied to data flows

• Engineers & data scientists don’t think in terms of MapReduce primitives, key/value pairs, etc.

• Integrates Hadoop API + other APIs (S3, SQS, JDBC)

• Expresses end-points as Java design patterns in compiled code, not just a scramble of scripts
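The sketch below contrasts the two styles using one computation, counting shares per URL. It is written in plain Java and does not use the Cascading API. The first version uses explicit map, shuffle, and reduce steps over key/value pairs. The second expresses the same work as a single declarative group-and-count over a stream of records. The Share type, its fields, and the sample URLs are hypothetical. Cascading expresses the second style through its own pipe-assembly classes, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch in plain Java (NOT the Cascading API): the same
// "shares per URL" count written two ways.
class ShareCounts {

    /** One share event; the fields are illustrative, not a real log schema. */
    static final class Share {
        final String url;
        final String service;

        Share(String url, String service) {
            this.url = url;
            this.service = service;
        }
    }

    /** MapReduce style: emit (key, 1) pairs, group them by key, then sum. */
    static Map<String, Long> countWithKeyValuePairs(List<Share> shares) {
        // Map phase: emit one (url, 1) pair per event.
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (Share s : shares) {
            emitted.add(Map.entry(s.url, 1L));
        }
        // Shuffle phase: collect all values for each key, sorted by key.
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> kv : emitted) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Reduce phase: sum the values for each key.
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    /** Data-flow style: declare "group by url, count each group" directly. */
    static Map<String, Long> countAsDataFlow(List<Share> shares) {
        return shares.stream()
                .collect(Collectors.groupingBy(s -> s.url, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Share> shares = List.of(
                new Share("http://example.com/a", "email"),
                new Share("http://example.com/b", "twitter"),
                new Share("http://example.com/a", "facebook"));
        // Both print {http://example.com/a=2, http://example.com/b=1}
        System.out.println(countWithKeyValuePairs(shares));
        System.out.println(countAsDataFlow(shares));
    }
}
```

Both methods return the same counts. The second states *what* to compute (group, then count) and leaves the *how* to the framework, which is the property the slide attributes to Cascading.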

Page 11

Cascading + Elastic MapReduce

• Highly scalable, fault-tolerant framework for batch jobs

• Dramatically reduced Ops overhead

• Excellent command-line tools make the dev/test/debug cycle very efficient with “Big Data”

• Highly expert staff, very responsive and helpful in the forums

• Cascading example code in the developer resources: “LogAnalyzer for CloudFront” and “Multitool”
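Elastic MapReduce runs Hadoop. Much of Hadoop's fault tolerance comes from re-executing failed tasks rather than failing the whole job. Below is a minimal Java sketch of that retry idea only, assuming an in-process task and a fixed attempt limit. RetryingTaskRunner and runWithRetries are hypothetical names. Real Hadoop reschedules failed task attempts across nodes, and this sketch does not model that scheduling.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of task-level fault tolerance in a batch framework:
// a failed task is re-executed, up to a fixed number of attempts, instead
// of failing the whole job. Names and the attempt limit are illustrative.
class RetryingTaskRunner {

    static <T> T runWithRetries(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                // Record the failure and try again with a fresh attempt.
                last = e;
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        // Every attempt failed: surface the final error to the caller.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // A flaky task that fails twice (e.g., a simulated lost node), then succeeds.
        int[] calls = {0};
        String out = runWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("simulated node failure");
            }
            return "task output after " + calls[0] + " attempts";
        }, 4);
        System.out.println(out);
    }
}
```

This design works because batch tasks read immutable input and write fresh output, so re-running one is safe. That same property lets a framework absorb node failures without operator involvement.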

Page 12

Hadoop Book / Case Study

ShareThis case study, “Cascading” by Chris K. Wensel, in…

Page 13

Contacts

http://sharethis.com

@pacoid on Twitter
