webinar: scaling mongodb through sharding - a case study with cignex datamatics

CIGNEX Datamatics Confidential, www.cignex.com. Scaling MongoDB with Sharding – A Case Study. Presented by Yash Badiani and Rahul Nair.

Upload: mongodb

Post on 15-Jan-2015


Category:

Technology



DESCRIPTION

This webinar walks through the solution CIGNEX developed for a real-time event logging application, along with some of the key technical considerations, such as selecting the proper shard key. Yash will explain the key decision factors and performance statistics that went into the solution. With the correct shard key, MongoDB is able to handle approximately 30 million inserts and 5 million updates per hour. The case study covers everything from hardware recommendations to cluster configuration management at scale.

TRANSCRIPT

Page 1: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Scaling MongoDB with Sharding – A Case Study

Presented by Yash Badiani and Rahul Nair

Page 2: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

About CIGNEX Datamatics

A subsidiary of Datamatics Global Services Limited

Page 3: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Introduction of Datamatics (DGSL)

• Mission – Experts in improving enterprise productivity through Process Engineering & Information Management Solutions

• Key Highlights
– Founded in 1975
– Publicly listed in India
– Annual consolidated revenue of US$100 million
– Fortune 500 clients
– 4,400+ employees across 22 offices in 9 countries

Strategic Alliances

Page 4: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

What Does CIGNEX Datamatics Do?

Since 2000, making Open Source work for the enterprise through adoption and integration to:

• Address business goals
• Increase business velocity
• Lower the cost of doing business
• Reduce TCO
• Gain competitive advantage

Portal Solutions, Content Solutions, Big Data Solutions

400+ implementations worldwide across industries

Page 5: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Where We Can Help You

SOLUTIONS

User eXperience Platform (Portals): Liferay, Drupal, JBoss, ZK, HTML5, MuleSoft
• Intranet • Extranet • EAI • SOA • Social Collaboration • Mobile Portals

Enterprise Content Management (Content): Alfresco, Adobe CQ, Drupal, Magento, JBoss, Moodle, Ephesoft, Liferay
• WCM • DM • RM • CMS • DAM • E-Commerce • E-learning • ERP • Imaging Solutions

Making Data Work (Big Data): Hadoop, MongoDB, Neo4j, Flume, Hive, Solr, Pentaho, Jaspersoft
• Analytics • Mobile • Social • Web • Real-time • DW - BI • Log Processing and Analysis • Enterprise Search

SERVICES

• Managed Cloud Services - Develop, Deploy, Manage
• VAR/Annual Product Subscription - Liferay, Alfresco, Cloudera Hadoop, MongoDB
• Extended Development Center – Center of Excellence
• UI, Development, Integration, Customization, Migration, Testing, Training, Support (24*7)

Page 6: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

About the Presenters

• Yash Badiani is the Big Data Practice Lead at CIGNEX Datamatics and focuses on Big Data technologies including MongoDB & Hadoop. He has worked extensively on large data warehousing & business intelligence projects with tools such as Business Objects, Microsoft SQL Server, MicroStrategy, and IBM Cognos.

• Gaurav Khambhala works at CIGNEX Datamatics as Technical Lead. He is the senior member of the PHP Practice at CIGNEX Datamatics and is involved in various technology initiatives like Big Data, where he focuses on integration of PHP with NoSQL sources like MongoDB. He has wide industry experience in software development & management in Open Source technologies such as Drupal & Moodle.

Page 7: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Agenda

• CIGNEX Datamatics – Introduction & Offerings
• Use Case & Database Requirements
• Challenges with Traditional Databases
• Why MongoDB?
• Solution
– Approach
– Architecture and Hardware Sizing
• Scaling with Sharding
– Sharding Basics
– Sharding – Choosing the RIGHT Shard Key
– Benchmarking with Results
• Key Takeaways

Page 8: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Big Data Practice At CIGNEX Datamatics

Brief Snapshot

• ~40 employee Big Data Practice focused on Hadoop, MongoDB, Neo4j, Solr
• Professionals formally trained / certified by Cloudera and 10gen
• Expertise in the Hadoop ecosystem (HBase, Pig, Hive, Flume, Sqoop, Oozie, Zookeeper)

Technology Partnership

• Strong partnerships:
– System Integration partner with Cloudera for CDH
– Global partner with 10gen for MongoDB – multiple webinars on different solutions

Page 9: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Our Offerings – Big Data

Consulting
• Business Analysis
• Technology Evaluation
• Architecture
• Design Framework
• Cluster sizing
• Deployment planning
• Proof-of-Concept
• Health Check
• Performance Benchmarking

Implementation
• UI Development
• Application Integration
• Customization
• Migration
• Testing
• Performance Tuning

Support & Training
• DBA Support
• Application Support
• Enhancements
• 24*7 Production Support (Tier 1/2/3)
• Trainings

Page 10: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Use Case

End Users
• 7 million users
• Spread across geography

Devices
• 8 devices / user
• Home/Office/Anywhere

App. Layer: Load Balancer
• Receives high volume of concurrent CRUD requests
• Routes request traffic to DB cluster

Data Storage: MongoDB cluster
• Sharding
• Replication with automatic failover
• Indexes

Page 11: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Database Requirements

• Flexibility in Schema
• High Performance
• Availability
• Agility in Development & Deployment
• Enterprise Level Support

Page 12: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Limitations of RDBMS

RDBMS can’t manage all dimensions of data with speed & at lower cost.

• Support limited to terabytes
• Manage only structured data
• RDBMS doesn’t scale inherently
• Feature rich but slow performance
• Complex to shard/partition due to maintenance of schema
• Limitations in scaling high volume of concurrent CRUD
• Specialized hardware - expensive
• Vertical scaling expensive and difficult to scale

Page 13: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Why MongoDB?

Flexibility in Schema (schema free)
• Easy integration
• Ease of schema design
• Document oriented storage

High Performance (indexes & sharding)
• Concurrent CRUD
• Fast updates
• Write distribution with sharding

Availability (replication)
• Automatic failover
• Redundancy
• 100% uptime

Agility in Development & Deployment (driver support)
• Programming language drivers
• Shorter dev cycle
• Faster deployment

Enterprise Level Support (strong community)
• Global coverage
• 24x7 support
• Ease of maintenance

Page 14: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Solution: Approach

• Schema: schema design; collections and field definitions
• Database Size: document size; total expected data size
• Concurrent Load: frequency of CRUD operations; read/write ratio
• Availability: automatic failover; replication and backup
• Indexing: working set; access patterns
• Sharding: horizontal scaling; query performance
• Hardware Sizing: cluster sizing; RAM and disk storage

Page 15: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Solution: Architecture

[Architecture diagram]

App Tier: a load balancer in front of six app servers and six mongos routers.

Data Tier: Shards 1 to 4, each a replica set made up of a mongod primary, a mongod secondary, and a mongod arbiter; a fifth replica set of the same shape, which appears to be the target of the "routed for non-sharded collections" path; three mongod config servers.

Request paths: routed requests from mongos to shards; requests routed for non-sharded collections.
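The replica-set shape in the diagram (primary, secondary, arbiter per shard) can be written down as data. Below is a minimal sketch, assuming hypothetical host names under `example.net` and port 27018, neither of which comes from the case study. The dictionary layout follows the replica set configuration document that `rs.initiate()` accepts (`_id`, `members`, `arbiterOnly`).

```python
def replica_set_config(name, hosts, port=27018):
    """Build a replica set configuration document for one shard.

    hosts is (data_host_1, data_host_2, arbiter_host). All host names and
    the port are hypothetical placeholders, not taken from the case study.
    """
    first, second, arbiter = hosts
    return {
        "_id": name,
        "members": [
            {"_id": 0, "host": f"{first}:{port}"},
            {"_id": 1, "host": f"{second}:{port}"},
            # The arbiter votes in elections but stores no data.
            {"_id": 2, "host": f"{arbiter}:{port}", "arbiterOnly": True},
        ],
    }


def cluster_layout(shard_count=4):
    """Return one configuration document per shard (Shard 1 to Shard 4)."""
    layout = []
    for n in range(1, shard_count + 1):
        hosts = tuple(f"shard{n}-{role}.example.net" for role in ("a", "b", "arb"))
        layout.append(replica_set_config(f"shard{n}", hosts))
    return layout


if __name__ == "__main__":
    for cfg in cluster_layout():
        print(cfg["_id"], [m["host"] for m in cfg["members"]])
```

The configuration does not name a primary: the two data-bearing members elect one, and the arbiter only breaks ties, which is what makes automatic failover possible with two data copies.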

Page 16: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Sharding – What is it?

• Distributes a single logical database system across a cluster of machines

• Allows a collection to be partitioned across a number of mongod instances (shards)

• Advantages:
– Increases write capacity
– Ability to support larger working sets
– Raises limits of data size beyond a single node

 

Page 17: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Sharding - Features

• Range-based data partitioning

• Automatic data volume distribution

• Transparent query routing

• Horizontal capacity
– Additional write capacity through distribution
– Right shard key allows expansion of working set
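Range-based partitioning with transparent routing can be illustrated with a toy model. This is a sketch of the idea only, not MongoDB's implementation: the class name, split points, and shard names are all hypothetical, and a real cluster splits and migrates chunks automatically rather than using a fixed table.

```python
import bisect


class RangeRouter:
    """Toy range-based router: sorted split points define chunks,
    and each chunk is owned by exactly one shard."""

    def __init__(self, split_points, chunk_owners):
        split_points = list(split_points)
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        # n split points produce n + 1 chunks:
        # (-inf, s0), [s0, s1), ..., [s_last, +inf)
        if len(chunk_owners) != len(split_points) + 1:
            raise ValueError("need exactly one owner per chunk")
        self.split_points = split_points
        self.chunk_owners = list(chunk_owners)

    def shard_for(self, key):
        """Return the shard that owns the chunk containing key."""
        chunk_index = bisect.bisect_right(self.split_points, key)
        return self.chunk_owners[chunk_index]


if __name__ == "__main__":
    router = RangeRouter([1000, 2000, 3000],
                         ["shard1", "shard2", "shard3", "shard4"])
    for key in (999, 1000, 1500, 5000):
        print(key, "->", router.shard_for(key))
```

The caller only supplies a key and never needs to know where the data lives, which is the role mongos plays for the application servers.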

 

 

Page 18: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Sharding – When to use?

• Storage drive: your data set approaches or exceeds the storage capacity of a single node in your system.

• Working set / RAM: the size of your system’s active working set will soon exceed the maximum amount of RAM available to your system.

• Storage drive (write load): your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.
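The three conditions above can be turned into a simple checklist. This is a minimal sketch under stated assumptions: the `NodeStats` fields, the 80% headroom threshold, and the sample figures are all hypothetical, and in practice these numbers would come from monitoring.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Capacity and load figures for a single node (all hypothetical)."""
    data_size_gb: float
    disk_capacity_gb: float
    working_set_gb: float
    ram_gb: float
    required_writes_per_sec: float
    measured_max_writes_per_sec: float


def reasons_to_shard(stats, headroom=0.8):
    """Return which of the three sharding triggers currently apply.

    headroom is an assumed safety margin: a resource counts as
    "approaching" its limit once usage reaches this fraction of capacity.
    """
    reasons = []
    if stats.data_size_gb >= headroom * stats.disk_capacity_gb:
        reasons.append("storage")
    if stats.working_set_gb >= headroom * stats.ram_gb:
        reasons.append("working set")
    if stats.required_writes_per_sec > stats.measured_max_writes_per_sec:
        reasons.append("write throughput")
    return reasons


if __name__ == "__main__":
    node = NodeStats(900, 1000, 6, 8, 6000, 2900)
    print(reasons_to_shard(node))
```

An empty result suggests a single replica set still has room; the write-throughput check only makes sense after other tuning options have been tried, as the slide notes.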

Page 19: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Shard Keys

Shard key: a field (or set of fields) that exists in every document in a collection and that MongoDB uses to distribute documents among the shards. Like indexes, shard keys can be either a single field or a compound key.

• The ideal shard key:
– Is easily divisible, which makes it easy for MongoDB to distribute content among the shards
– Has higher “randomness”
– Supports targeted queries
– May need to be computed

Page 20: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Choosing Right Shard Key

Different approaches for shard keys

• Approach 1: Random key – UserId

• Approach 2: Coarsely ascending key + random key – YearMonth + UserId
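The difference between the two approaches can be explored with a small simulation. This is a sketch of one common explanation, not the presenters' own analysis: with a purely random key, new writes land in every chunk, while a coarsely ascending prefix confines them to the chunks of the current month and the random suffix still spreads them within it. The field values, document counts, and chunk count are hypothetical.

```python
import bisect
import random


def shard_key(doc, approach):
    """Approach 1: UserId only. Approach 2: (YearMonth, UserId) compound key."""
    if approach == 1:
        return (doc["UserId"],)
    return (doc["YearMonth"], doc["UserId"])


def chunk_boundaries(keys, chunk_count):
    """Split the sorted keys into chunk_count equal-sized ranges."""
    ordered = sorted(keys)
    step = len(ordered) // chunk_count
    return [ordered[i * step] for i in range(1, chunk_count)]


def chunks_touched(history, new_docs, approach, chunk_count=24):
    """Count how many distinct chunks a batch of new inserts lands in."""
    splits = chunk_boundaries([shard_key(d, approach) for d in history], chunk_count)
    return len({bisect.bisect_right(splits, shard_key(d, approach)) for d in new_docs})


rng = random.Random(42)  # fixed seed so the run is repeatable
months = [201401 + m for m in range(12)]  # hypothetical YearMonth values
history = [{"YearMonth": m, "UserId": rng.randrange(1_000_000)}
           for m in months for _ in range(2_000)]
# New traffic arrives for the latest month only, from random users.
new_docs = [{"YearMonth": months[-1], "UserId": rng.randrange(1_000_000)}
            for _ in range(2_000)]

hot_1 = chunks_touched(history, new_docs, approach=1)
hot_2 = chunks_touched(history, new_docs, approach=2)
print(f"Approach 1 touches {hot_1} of 24 chunks; Approach 2 touches {hot_2}")
```

Fewer hot chunks means a smaller slice of the index has to stay in RAM for current writes, which is consistent with the RAM-utilization point in the takeaways.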

 

Page 21: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Benchmarking / Load Testing Approach

Automated scripts with varied load
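The slide gives no detail on the scripts themselves, so the following is only a sketch of what a varied-load harness might look like. The function name, document shape, and batch sizes are hypothetical, and an in-memory list stands in for the database so the example runs without a cluster.

```python
import time


def measure_throughput(operation, batch_sizes):
    """Run operation(batch) once per batch size and report docs/sec.

    Returns a list of (batch_size, docs_per_second) tuples.
    """
    results = []
    next_id = 0
    for size in batch_sizes:
        batch = [{"UserId": next_id + i, "seq": i} for i in range(size)]
        next_id += size
        started = time.perf_counter()
        operation(batch)
        # Guard against a zero reading on very coarse timers.
        elapsed = max(time.perf_counter() - started, 1e-9)
        results.append((size, size / elapsed))
    return results


# Stand-in target: an in-memory list. In a real run this would be replaced
# by a driver call such as pymongo's Collection.insert_many(batch).
store = []
rates = measure_throughput(store.extend, batch_sizes=[100, 1_000, 10_000])
for size, rate in rates:
    print(f"batch of {size}: {rate:,.0f} docs/sec")
```

Keeping the operation pluggable lets the same harness time inserts, updates, or a mix, and compare sharded and non-sharded targets under identical load.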

 

Page 22: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Results - INSERTS

• Approach 1: over 80 million documents inserted, with the insert rate decreasing past the 10 million mark

• Approach 2: over 225 million documents inserted at a stable rate of 6000 documents/sec

Benchmarks done on 8GB test H/W machines

Page 23: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Results - UPDATES

• Approach 1: over 50 million documents updated at avg. 400 documents/sec

• Approach 2: over 100 million documents updated at as high as 4000 documents/sec

Benchmarks done on 8GB test H/W machines

Page 24: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Results – INSERT, UPDATE (Approach 2)

• Simultaneous INSERT: >6000 documents/second, >70 million records

• Simultaneous UPDATE: >6000 documents/second, >50 million records

Benchmarks done on 8GB test H/W machines

Page 25: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Benchmarking – Sharding Vs Non Sharding

Operation                     Sharding (YearMonth + UserId)      Non-Sharding
INSERTS                       ~6000 docs/sec                     ~2900 docs/sec
UPDATES                       ~4000 docs/sec                     ~620 docs/sec
INSERT & UPDATES (together)   ~6000 docs/sec & ~6100 docs/sec    ~2000 docs/sec & ~600 docs/sec

Benchmarks done on 8GB test H/W machines
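The table's figures can be converted into speedups and hourly volumes with a few lines of arithmetic. The sketch below uses only the INSERTS and UPDATES rows from the table; the function and variable names are illustrative.

```python
SECONDS_PER_HOUR = 3600

# docs/sec from the benchmark table: (sharded, non-sharded)
RESULTS = {
    "INSERTS": (6000, 2900),
    "UPDATES": (4000, 620),
}


def summarize(results):
    """Return speedup and sharded docs/hour for each operation."""
    summary = {}
    for operation, (sharded, unsharded) in results.items():
        summary[operation] = {
            "speedup": sharded / unsharded,
            "per_hour": sharded * SECONDS_PER_HOUR,
        }
    return summary


if __name__ == "__main__":
    for operation, row in summarize(RESULTS).items():
        print(f"{operation}: {row['speedup']:.1f}x, "
              f"{row['per_hour'] / 1e6:.1f} million docs/hour sharded")
```

On the 8GB test machines this works out to about 21.6 million inserts per hour and 14.4 million updates per hour with sharding, roughly 2.1 times and 6.5 times the non-sharded rates.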

Page 26: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

Key Takeaways

• Take a comprehensive approach to performance tuning

• Plan early for performance

• MongoDB scales & shines

• Sharding scales INSERTS/UPDATES compared with non-sharding

• Sharding with Approach 2 (coarsely ascending key + random key) provides sustained results & better utilization of the RAM

• Use a separate set of servers for non-sharded collections

• Define indexes carefully

• Keep the number of indexes on sharded collections to a minimum

 

 

Page 27: Webinar: Scaling MongoDB through Sharding - A Case Study with CIGNEX Datamatics

For queries reach out to us at [email protected]

Thank You. Any Questions?

Making  Open  Source  Work