aws start-up tour 2009 / sharethis
DESCRIPTION
ShareThis, AWS Start-Up Tour 2009, Sunnyvale
TRANSCRIPT
ShareThis on AWS
Paco Nathan, Data Insights ShareThis.com
AWS Start-Up Tour 2009-06-16
What Does ShareThis Do?
• “Make it simple to share any online content”
• Social content sharing platform
• ESPN, FOX, CS Monitor, HuffPost, CBS Marketwatch,
Wired, TechCrunch, ThinkGeek, etc.
• When a news story goes viral on a major publisher,
our sharing services must scale-out to keep pace
Why Our Company Uses AWS
• >10^6 publishers, >10^9 users, >10^10 URLs
• Early stage start-up, < 25 people, “wearing lots of hats”,
ultra fast-paced R&D
• Spikes in popular stories impose demands throughout
the architecture: API services, loggers, DW, BI, etc.
• How can this level of service be built 100% in the cloud?
http://shar.es/1B7
System Architecture
• Each service designed for cost-effective, horizontal scale-out
• API served by cluster of LAMP stack + cluster of Nginx
• AsterData: nCluster infrastructure “hub-and-spoke” pattern
• Cascading: abstraction layer for tying together components
• Batch jobs on Elastic MapReduce, AsterData SQL/MR
• SQS, EBS, SimpleDB, MTurk, plus other AWS services
Key Learnings
• Capability to scale-out horizontally without having to
recode, rebuild, etc. — add new EC2 nodes to clusters
• Authoritative data + backups in S3, great approach for DR
• Wide range of use cases implemented: widget API,
log clean-up, vertical search, business intelligence, etc.
• Developers launch their own sandbox instances —
makes dev/test/debug cycles more efficient
• Staff enabled to “wear even more hats” with less risk
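To make the first learning concrete, here is a minimal sketch of scale-out as a membership change rather than a code change. The `Cluster` class, its round-robin policy, and the node names are all hypothetical; the slides do not describe how ShareThis actually dispatches requests across EC2 nodes.

```python
class Cluster:
    """Hypothetical pool of identical worker nodes behind a round-robin dispatcher."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # running counter used to pick the next node

    def add_node(self, name):
        # Scaling out only changes cluster membership; dispatch logic is untouched.
        self.nodes.append(name)

    def dispatch(self, request):
        # Hand each request to the next node in turn.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node, request


cluster = Cluster(["node-1", "node-2"])
before = [cluster.dispatch(i)[0] for i in range(4)]

cluster.add_node("node-3")  # e.g. a newly launched instance joins the pool
after = [cluster.dispatch(i)[0] for i in range(3)]
```

After `add_node`, the new node starts receiving requests on the next dispatch cycle, with no rebuild of the dispatcher itself.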
Cascading + Elastic MapReduce
• “Syntax is for humans, APIs are for software”
• Defines apps as set operations applied to data flows
• Engineers & data scientists don’t think in terms of
MapReduce primitives, key/value pairs, etc.
• Integrates Hadoop API + other APIs (S3, SQS, JDBC)
• Expresses end-points as Java design patterns,
compiled code — not just a scramble of scripts
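The idea of "set operations applied to data flows" can be sketched as follows. This is plain Python, not the Cascading API (which is Java); the `Flow` class, its method names, and the sample records are hypothetical and only illustrate how select, join, and group-and-count steps chain together without the author writing key/value plumbing by hand.

```python
from collections import defaultdict


class Flow:
    """Tiny, hypothetical stand-in for a data-flow pipeline (not the Cascading API)."""

    def __init__(self, records):
        self.records = list(records)

    def where(self, predicate):
        # Selection: keep only the records that satisfy the predicate.
        return Flow(r for r in self.records if predicate(r))

    def join(self, other, key):
        # Inner join on a shared field name.
        index = defaultdict(list)
        for r in other.records:
            index[r[key]].append(r)
        return Flow({**left, **right}
                    for left in self.records
                    for right in index[left[key]])

    def count_by(self, key):
        # Grouping plus aggregation: count records per distinct key value.
        counts = defaultdict(int)
        for r in self.records:
            counts[r[key]] += 1
        return dict(counts)


shares = Flow([
    {"url": "a", "channel": "email"},
    {"url": "b", "channel": "twitter"},
    {"url": "a", "channel": "twitter"},
    {"url": "c", "channel": "twitter"},
])
pages = Flow([
    {"url": "a", "publisher": "pub1"},
    {"url": "b", "publisher": "pub2"},
    {"url": "c", "publisher": "pub1"},
])

twitter_shares_by_publisher = (
    shares.where(lambda r: r["channel"] == "twitter")
          .join(pages, "url")
          .count_by("publisher")
)
```

The pipeline reads as a sequence of operations on record sets; how those steps would be planned into map and reduce jobs is left to the framework.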
Cascading + Elastic MapReduce
• Highly scalable, fault-tolerant framework for batch jobs
• Dramatically reduced need for Ops overhead
• Excellent command line tools make the dev/test/debug
cycle very efficient with “Big Data”
• Highly expert staff, very responsive and helpful in forums
• Cascading example code in developer resources:
“LogAnalyzer for CloudFront” and “Multitool”
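For readers unfamiliar with the batch model that Elastic MapReduce runs (Hadoop MapReduce), the three phases can be sketched in a few lines of single-process Python. The log format, function names, and sample lines are assumptions for illustration; this is not the "LogAnalyzer for CloudFront" or "Multitool" code mentioned above.

```python
from collections import defaultdict


def map_phase(lines):
    # Emit (url, 1) for each well-formed line; malformed lines are dropped,
    # which doubles as a simple log clean-up step.
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            yield parts[1], 1


def shuffle(pairs):
    # Group all values that share a key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Collapse each key's values into a single total.
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical log lines in an assumed "<event> <url>" format.
log_lines = ["share /a", "share /b", "garbled", "share /a"]
counts = reduce_phase(shuffle(map_phase(log_lines)))
```

On a real cluster the map and reduce functions run in parallel across nodes and the framework handles the shuffle and retries; the sketch only shows the shape of the computation.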
Hadoop Book / Case Study
ShareThis case study, "Cascading"
by Chris K Wensel, in…
Contacts
http://sharethis.com
@pacoid on Twitter