Scaling MongoDB: Avoiding Common Pitfalls

Jon Tobin, Senior Systems Engineer
Jon.Tobin@percona.com · @jontobs · www.linkedin.com/in/jonathanetobin

www.percona.com

Agenda

• Document Design
• Data Management
• Replication & Failover
• Sharding

Document Design

• We’re Not in SQL Anymore
• Embed vs Reference
• The Gray Area

Document Design

• Most common source of confusion & substandard performance
• Dynamic schemas are meant for agility (flexibility)
  • No substitute for careful design
  • You have to live with your decisions
• Consider design carefully
  • NoSQL for a reason
  • Denormalization is common
  • Flexibility means that there are often no right or wrong answers
  • Only the correct design for your use case
• MongoDB is not always the correct solution
  • Consider mixing databases to maximize strengths

QnD Doc Design Pointers

Embed
• Query performance priority
• Fields are fairly static
• Size of doc can be reasonably determined
• Eventual consistency acceptable

    {
      "_id" : ObjectId("53d98f1…"),
      "firstName" : "Jonathan",
      "lastName" : "Tobin",
      "year" : 3,
      "classes" : [
        {
          "class" : "Calc 101",
          "credits" : 3
        },
        { -etc- }
      ]
    }

Reference
• Insert performance priority
• Updates are common
• Immediate consistency necessary
• Field size can’t be determined

    {
      "_id" : ObjectId("53d98f1…"),
      "firstName" : "Jonathan",
      "lastName" : "Tobin",
      "year" : 3,
      "classes" : [
        ObjectId(<of_class_1>),
        ObjectId(<of_class_2>),
        ObjectId(<of_class_3>)
      ]
    }
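The trade-off between the two layouts can be sketched in plain JavaScript with no server involved. The in-memory `classes` map, the string ids, and the "Physics 101" class are hypothetical stand-ins added for illustration; only the student fields come from the slide.

```javascript
// Hypothetical in-memory stand-in for a "classes" collection; no MongoDB server is involved.
const classes = new Map([
  ["c1", { _id: "c1", class: "Calc 101", credits: 3 }],
  ["c2", { _id: "c2", class: "Physics 101", credits: 4 }],
]);

// Embedded design: one read returns everything, but class data is copied into each student.
const embeddedStudent = {
  _id: "s1", firstName: "Jonathan", lastName: "Tobin", year: 3,
  classes: [
    { class: "Calc 101", credits: 3 },
    { class: "Physics 101", credits: 4 },
  ],
};

// Referenced design: the student stores only class ids.
const referencedStudent = {
  _id: "s1", firstName: "Jonathan", lastName: "Tobin", year: 3,
  classes: ["c1", "c2"],
};

// Total credits from the embedded form: a single document is enough.
function creditsEmbedded(student) {
  return student.classes.reduce((sum, c) => sum + c.credits, 0);
}

// Total credits from the referenced form: one extra lookup per class id.
function creditsReferenced(student, classCollection) {
  return student.classes
    .map((id) => classCollection.get(id))
    .filter((c) => c !== undefined)
    .reduce((sum, c) => sum + c.credits, 0);
}

// Updating a class changes a single place in the referenced design;
// the embedded copy keeps the old value until it is rewritten separately.
classes.get("c1").credits = 4;
```

After the update, the referenced form reflects the new credit value immediately, while the embedded copy is stale. This is the "eventual consistency acceptable" versus "immediate consistency necessary" distinction in the lists above.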

Finding Middle Ground

• Embed fields that are often fetched
  • If they don’t grow boundlessly
• Limit growing keys to 1 per doc
  • Move to last key
• Reference fields that are volatile
  • Or are occasionally queried
• Atomicity can be achieved @ single doc level
  • Take care in design
  • Try TokuMX (ACID & MVCC)
• Index judiciously
  • Re-evaluate often
• Store relevant data
  • Archive old data (when possible)
  • Or delete

Doc Design – Useful Resources

• MongoDB: The Definitive Guide - On Amazon
  • A little outdated, but still useful
• MongoDB Doc Design
• MongoDB Use Cases - Design

Data Management

• Why & How
• Details & Uses
• Other Options

Data Management

Why?
• Ephemeral data
  • Logs
  • Sensor readings
• Space management
• Performance??

• wiredTiger Options
• TTL
• Capped Collection
• Partitioned Collections (TokuMX)
• “Ghetto Partitioning”

Capped Collections

Good
• Easy to use
  • db.createCollection("test", { capped : true, size : <size_in_bytes>, max : <opt_max_no_of_docs> })
• Automatically drop documents
  • Based on insertion order
• Easy to tail

Bad
• No sharding
• Limited ability to update docs
• No deletes
• Unpredictable performance @ scale
  • Especially queries
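The insertion-order eviction described above can be modeled with a small toy class. This is only a sketch of the behavior, not MongoDB's storage implementation; the `CappedModel` name and the JSON-length size estimate are assumptions made for illustration.

```javascript
// Toy model of capped-collection eviction; not MongoDB's storage implementation.
class CappedModel {
  constructor({ size, max = Infinity }) {
    this.size = size; // byte budget, like the `size` option
    this.max = max;   // optional document limit, like the `max` option
    this.docs = [];   // kept in insertion order
    this.bytes = 0;
  }

  // Approximate a document's size by its JSON length (an assumption for illustration).
  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const docBytes = CappedModel.sizeOf(doc);
    if (docBytes > this.size) throw new Error("document larger than collection");
    this.docs.push(doc);
    this.bytes += docBytes;
    // Evict the oldest documents until both limits are satisfied again.
    while (this.bytes > this.size || this.docs.length > this.max) {
      const oldest = this.docs.shift();
      this.bytes -= CappedModel.sizeOf(oldest);
    }
  }
}

// Five inserts into a collection capped at three documents.
const log = new CappedModel({ size: 1024, max: 3 });
for (let i = 1; i <= 5; i++) log.insert({ seq: i });
const remaining = log.docs.map((d) => d.seq);
```

Only the three most recent documents survive, which is why capped collections suit logs that are tailed rather than data that must be deleted selectively.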

TTL Internals

• Additional index
  • Single key only (date())
• Uses a background thread on every RS member
  • Runs EVERY 60 seconds
  • Query full collection by TTL
  • Deletes every doc “>” expireAfterSeconds older than current server time
  • Single doc deletes
• Docs are only deleted from primary
  • Thread still runs on every secondary
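One pass of the background thread can be approximated over an in-memory array. The function below is a sketch of the selection rule only, with made-up sample dates; it is not server code and does not touch a database.

```javascript
// One simulated TTL pass over an in-memory array (an illustration, not server code).
// `field` is the indexed date field; `expireAfterSeconds` matches the index option.
function ttlPass(docs, field, expireAfterSeconds, now) {
  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  const kept = [];
  let deletes = 0;
  for (const doc of docs) {
    const value = doc[field];
    // Documents without a Date in the indexed field are never expired.
    if (value instanceof Date && value.getTime() < cutoff) {
      deletes += 1; // each expired document is removed individually
    } else {
      kept.push(doc);
    }
  }
  return { kept, deletes };
}

// Hypothetical sample data.
const now = new Date("2020-06-10T12:00:00Z");
const docs = [
  { _id: 1, modDate: new Date("2020-06-10T11:45:00Z") }, // 15 minutes old
  { _id: 2, modDate: new Date("2020-06-10T11:55:00Z") }, // 5 minutes old
  { _id: 3 },                                            // no date field
];
const result = ttlPass(docs, "modDate", 600, now);
```

With a 600 second expiry, only the 15-minute-old document is removed; the counter makes visible that the cost grows by one delete per expired document.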

Time to Live (TTL) Index

Good
• Easy to use
  • db.coll.createIndex( { "modDate" : 1 }, { expireAfterSeconds : 600 } )
• Automatic expiration of docs

Bad
• Query & single doc deletes
• No guarantees

Math
1000 inserts/sec * 600 (expireAfterSeconds) = 600,000 single doc deletes!!
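The arithmetic can be written out as a small helper. It assumes a constant insert rate and a 60 second pass interval; the per-pass figure is a derived reading of the slide, not a number the slide states.

```javascript
// Steady-state TTL delete volume, assuming a constant insert rate.
function ttlDeleteLoad(insertsPerSec, expireAfterSeconds, passIntervalSec = 60) {
  return {
    // documents alive inside one expiry window, each of which is later deleted one by one
    docsInWindow: insertsPerSec * expireAfterSeconds,
    // documents that become eligible between two consecutive TTL passes
    deletesPerPass: insertsPerSec * passIntervalSec,
  };
}

const load = ttlDeleteLoad(1000, 600);
```

Under these assumptions, `load.docsInWindow` is the slide's 600,000 figure, and each pass would need to issue roughly 60,000 single-document deletes just to keep up.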

Other Options

• Partitioned Collections
  • TokuMX v1.5 and later
  • Splits collection into separate files based on key
  • Like partitioned tables in MySQL
  • Deletes are “batched” and lightning fast
• “Ghetto Partitioning”
  • Separate your data into collections based on time
  • Difficult to stitch collections together
  • Need to keep track of collections in application
  • Dropping a collection is very efficient (relatively speaking)
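The application-side bookkeeping for time-based collections might look like the sketch below. The daily granularity, the `events_YYYYMMDD` naming scheme, and all function names are hypothetical choices for illustration.

```javascript
// Name of the daily collection for a timestamp, e.g. "events_20200610" (naming is hypothetical).
function collectionFor(prefix, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${prefix}_${y}${m}${d}`;
}

// Collections a query over [from, to] has to visit; the application stitches results together.
function collectionsInRange(prefix, from, to) {
  const names = [];
  const day = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate()));
  while (day.getTime() <= to.getTime()) {
    names.push(collectionFor(prefix, day));
    day.setUTCDate(day.getUTCDate() + 1);
  }
  return names;
}

// Collections older than the retention period are candidates for a whole-collection drop.
function expiredCollections(existing, prefix, now, retainDays) {
  const cutoff = new Date(now.getTime() - retainDays * 24 * 60 * 60 * 1000);
  const oldestKept = collectionFor(prefix, cutoff);
  return existing.filter((name) => name.startsWith(prefix + "_") && name < oldestKept);
}
```

Because the date is zero-padded, names sort chronologically, so a plain string comparison is enough to find collections to drop. Each returned name would then be dropped as a unit instead of deleting its documents one at a time.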

TTL Tips

• When adding TTL to existing collection
  • Be careful of first run!!!
• Monitor disk utilization closely
  1. Increase resource pool
  2. Shard
• Keep an eye on fragmentation
• Consider using an expireAt field instead
  • Will set the clock time expiration instead
  • Can use low workload period for deletions
• Check out preso by Kim Wilkins from ObjectRocket
  • In “Useful Resources” slide (hidden)
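The expireAt approach can be sketched as a helper that pushes each document's expiry to a quiet hour. The function name, the 03:00 UTC quiet hour, and the `readings` collection are assumptions for illustration; the index form with `expireAfterSeconds: 0` is the documented way to expire at a clock time.

```javascript
// Earliest instant at or after `created + minRetentionSec` that falls on `quietHourUtc`:00 UTC.
function expireAtQuietHour(created, minRetentionSec, quietHourUtc) {
  const earliest = new Date(created.getTime() + minRetentionSec * 1000);
  const candidate = new Date(Date.UTC(
    earliest.getUTCFullYear(), earliest.getUTCMonth(), earliest.getUTCDate(), quietHourUtc
  ));
  // If that day's quiet hour has already passed, use the next day's.
  if (candidate.getTime() < earliest.getTime()) {
    candidate.setUTCDate(candidate.getUTCDate() + 1);
  }
  return candidate;
}

const created = new Date("2020-06-10T14:30:00Z");
const doc = {
  msg: "sensor reading",
  // keep at least 10 minutes, then expire at the next 03:00 UTC
  expireAt: expireAtQuietHour(created, 600, 3),
};

// In the mongo shell the matching index would be (collection name is hypothetical):
// db.readings.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 })
```

Documents then live somewhat longer than the minimum retention, in exchange for the deletes clustering in a low-workload period.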

TTL - Useful Resources

• Kimberly Wilkins (ObjectRocket) Math is Hard: TTL Configuration & Considerations
• MongoDB Indexing Whitepaper
• Open TTL Jira Ticket
• MongoDB TTL Tutorial
• MongoDB Google Group Discussion - Partitioning
• TokuMX Partitioned Collections

Durability & High Availability

• Life of a Write
• Durability
• Replication’s Effects
• Considerations Along Your Journey

Life of a Write

1. Send by application
2. Receipt by mongod
3. Application in memory
4. Application to journal (if enabled)
5. Write to data file
6. Application to replication journal (oplog)
7. writeConcern determines “strength of guarantee” (more to come)
   • Default “acknowledged” (in memory)
8. Acknowledgment of write sent to host (depends on writeConcern)
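How writeConcern selects the point at which the acknowledgment is sent can be illustrated with a small mapping. The level names and the 5000 ms timeout are hypothetical; the `w`, `j`, and `wtimeout` fields are standard write concern options.

```javascript
// Map a (hypothetical) durability level name to a MongoDB write concern document.
function writeConcernFor(level) {
  switch (level) {
    case "fire-and-forget":
      return { w: 0 };                          // no acknowledgment requested
    case "acknowledged":
      return { w: 1 };                          // acknowledged once applied in memory on the primary
    case "journaled":
      return { w: 1, j: true };                 // acknowledged after the journal write
    case "replicated":
      return { w: "majority", wtimeout: 5000 }; // acknowledged by a majority of the replica set
    default:
      throw new Error(`unknown durability level: ${level}`);
  }
}

// Mongo shell usage (collection name is hypothetical):
// db.orders.insertOne({ item: "abc" }, { writeConcern: writeConcernFor("journaled") })
```

Stronger levels wait for later steps in the list above before acknowledging, so they trade write latency for a stronger guarantee.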

Replication

• Every shard consists of a replica set
• Each replica set has a Primary & (n) Secondaries or Arbiters (no data)
• Secondaries read the Primary’s replication log (oplog) and apply it asynchronously

Replication Lag

• Secondaries may lag behind the primary
• rs.status() will get the current replica set config and replication information
• Will affect:
  • Durability
  • Elections
  • Write throughput (depending on writeConcern)

Why?
• Primary has concurrency. Replication is single threaded
• High resource utilization on secondary
  • Likely I/O bound
• Network latency
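Lag can be estimated from the member list that rs.status() returns. The sketch below uses a hand-written sample object containing only the fields it reads (`name`, `stateStr`, `optimeDate`); the host names and timestamps are made up.

```javascript
// Per-secondary lag in seconds, computed from an rs.status()-shaped object.
function replicationLag(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary (for example mid-election): lag is undefined
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSec: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
    }));
}

// Hand-written sample with only the fields used above; host names are made up.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-06-10T12:00:10Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-06-10T12:00:05Z") },
    { name: "arb:27017", stateStr: "ARBITER" },
  ],
};
const lag = replicationLag(sampleStatus);
```

In the mongo shell the same function could be called as `replicationLag(rs.status())`. Arbiters are skipped because they hold no data.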

Concerns, Tips & Tricks

• Lag affects everything we’ve talked about
  • 1000 inserts/sec + 5 sec lag + election
  • 5000 inserts rolled back
• Determine its source and attempt to eliminate
  • Most common cause is single threaded replication
  • Double check network latency between hosts
• Solution: micro-sharding
  • One server – multiple mongods
  • Parallelize replication
• TokuMX – Read Free Replication
  • Use Fractal Tree to speed up replay
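The rollback estimate is a simple product, shown here with its inverse. Both helpers assume a constant write rate and writes acknowledged by the primary only; the function names are hypothetical.

```javascript
// Writes at risk of rollback if the primary fails while secondaries are `lagSec` behind.
function writesAtRisk(writesPerSec, lagSec) {
  return writesPerSec * lagSec;
}

// Largest lag (in seconds) that keeps the potential rollback within a given budget.
function maxLagForBudget(writesPerSec, maxRolledBack) {
  return maxRolledBack / writesPerSec;
}

const atRisk = writesAtRisk(1000, 5);
const lagBudget = maxLagForBudget(1000, 500);
```

At 1000 inserts per second, tolerating at most 500 rolled-back writes would mean keeping lag under half a second, which shows how quickly lag turns into lost writes at high insert rates.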

Replication – Useful Resources

• K Chodorow Election Internals Blogs
• MongoDB Replication Whitepaper
• MongoDB Troubleshooting Replica Sets

Sharding & Balancing

• Setup Tips
• Shard Keys
• A Look Into the Balancer

Setup Tips & Gotchas

• Make use of multiple config servers
  • Three is a safe bet
  • Back them up regularly
• One mongos per app server
• When converting from single RS
  • Point all app servers to mongos
  • Prevent app servers from connecting directly to RS
• Chunk size determines “efficiency” of balancing
  • Also affects frequency of splitting overweight chunks
• Make sure config servers stay up
  • One server down prevents shard reconfig ops
    • Balancing
    • Splitting
• Use DNS CNAMEs to refer to config servers
  • Otherwise restart every mongod & mongos on rename

Shard Keys

• For insert speed:
  • High-entropy shard key (mostly random).
  • Balances load across all shards.
  • Avoid migrations, too expensive in MongoDB.
  • “Scatter” is good.
• For query speed:
  • Low-entropy shard key (mostly sequential).
  • Range queries should only hit 1 shard.
  • “Scatter” is bad.
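The opposing effects of the two key styles can be seen in a toy placement model. The four shards, the 1000-key chunk width, and the multiplicative hash are all assumptions for illustration; MongoDB's real chunk ranges and hashed index work differently in detail.

```javascript
const SHARDS = 4;

// Ranged placement: contiguous blocks of 1000 keys per chunk, chunks dealt round-robin.
function rangedShard(key) {
  return Math.floor(key / 1000) % SHARDS;
}

// Hashed placement: a multiplicative hash stands in for MongoDB's hashed index.
function hashedShard(key) {
  return (Math.imul(key, 2654435761) >>> 16) % SHARDS;
}

// Number of distinct shards touched by every key in [from, to].
function shardsTouched(placement, from, to) {
  const hit = new Set();
  for (let k = from; k <= to; k++) hit.add(placement(k));
  return hit.size;
}

// 500 sequential keys, read either as an insert burst or as one range query.
const rangedHit = shardsTouched(rangedShard, 2000, 2499);
const hashedHit = shardsTouched(hashedShard, 2000, 2499);
```

With the sequential key, all 500 keys land on one shard: good for a range query, but a hot spot for inserts. With the hashed key the same keys spread over every shard: good for insert load, but a range query must scatter to all of them.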

Balancing

• Main problem: random workload
  • Everything relies on shard (secondary) key
  • Documents aren’t stored in chunk (shard key) order
• Three biggest operations are high activity
  1. Range query on shard key for chunk
  2. Donation to recipient shard
  3. Range delete on shard key for donating shard

Take away:
• You have to balance
  • Don’t put it off for long periods
  • Schedule balancing during low impact periods
  • Manually balance
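Scheduling balancing for a low-impact period comes down to a time window that may wrap past midnight. The helper below is a sketch for checking such a window in application or monitoring code; the 23:00 to 6:00 window is a hypothetical choice.

```javascript
// Minutes since midnight for an "HH:MM" string.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True if `now` ("HH:MM") falls inside the window; windows may wrap past midnight.
function inBalancerWindow(now, window) {
  const t = toMinutes(now);
  const start = toMinutes(window.start);
  const stop = toMinutes(window.stop);
  return start <= stop ? t >= start && t < stop : t >= start || t < stop;
}

// Hypothetical low-impact window.
const activeWindow = { start: "23:00", stop: "6:00" };

// The same window can be stored for the balancer from a mongos, in the config database:
// db.settings.updateOne({ _id: "balancer" }, { $set: { activeWindow: activeWindow } }, { upsert: true })
// Manual control is available through sh.stopBalancer() and sh.startBalancer().
```

A window that is too short for the migration backlog leaves the cluster unbalanced, so the window length should be checked against how many chunks actually move per night.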

Sharding Resources

• MongoDB Balancing Process
• MongoDB Balancing Thresholds
• MongoDB: The Definitive Guide - On Amazon
  • A little outdated, but still useful
• TokuMX Balancing Improvements & Benchmarks
• Managing the Balancer

Use Discount Code “WebinarPLAM” for an additional 40 Euros off standard registration rates!