Building a CRM on top of ElasticSearch
DESCRIPTION
How EverTrue is building a donor CRM on top of ElasticSearch. We cover some of the issues around scaling ElasticSearch and which aspects of ElasticSearch we are using to deliver value to our customers.
TRANSCRIPT
![Page 1: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/1.jpg)
How we’re building a CRM on top of ElasticSearch
![Page 2: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/2.jpg)
About me (quickly)
Director of Engineering @ EverTrue
Love distributed data stores, love them!
Using ElasticSearch for ~1 year
Mark Greene / @markjgreene
![Page 3: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/3.jpg)
What does EverTrue do?
We help nonprofits raise more money
by allowing them to identify and build relationships with potential donors
![Page 4: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/4.jpg)
How do we do that?
Obligatory database tube
Resolving identities across third party data sources
![Page 5: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/5.jpg)
Cluster Setup
•3 masters, 2 data nodes, AZ aware
•~40m documents, ~25GB
•1 index, 7 types
•5 shards, 1 replica
•Peak work loads equate to 4-5k ops/s
•Using mostly default settings
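As a rough sanity check on these numbers, here is a minimal sketch of the per-shard arithmetic. It assumes documents are spread evenly across shards and that the ~25GB figure refers to primary data only; the slide states neither, so treat the output as an estimate. The `index_settings` body uses the standard `number_of_shards` and `number_of_replicas` index settings.

```python
def shard_stats(total_docs, total_gb, primary_shards, replicas):
    """Average per-shard figures, assuming an even spread of documents."""
    return {
        "docs_per_primary": total_docs / primary_shards,
        "gb_per_primary": total_gb / primary_shards,
        # Every primary has `replicas` copies, so count primaries plus copies.
        "total_shard_copies": primary_shards * (1 + replicas),
    }


# Index settings matching "5 shards, 1 replica" on the slide.
index_settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}

stats = shard_stats(
    total_docs=40_000_000,  # ~40m documents
    total_gb=25,            # ~25GB (assumed to be primary data)
    primary_shards=index_settings["settings"]["number_of_shards"],
    replicas=index_settings["settings"]["number_of_replicas"],
)
print(stats)
```

With only two data nodes and one replica, each node ends up holding a copy of every shard under these assumptions.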
![Page 6: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/6.jpg)
Data Model
•Mapping contains ~50 default fields
•Most fields are stored as both analyzed and not analyzed
•Leverage dynamic templates for custom fields created by our customers
•Each custom field is stored as both analyzed and not analyzed
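One way to get "analyzed and not analyzed" for every new custom field is a dynamic template with a multi-field. The sketch below uses ES 1.x mapping syntax; the template name `custom_strings`, the sub-field name `raw`, and the type name `contact` are illustrative assumptions, not EverTrue's actual mapping.

```python
import json


def custom_field_template(raw_suffix="raw"):
    """Build a dynamic template that indexes new string fields twice."""
    return {
        "dynamic_templates": [
            {
                "custom_strings": {  # hypothetical template name
                    "match_mapping_type": "string",
                    "mapping": {
                        # Analyzed copy, for full-text search.
                        "type": "string",
                        "index": "analyzed",
                        "fields": {
                            # Not-analyzed copy, for exact filters and aggregations.
                            raw_suffix: {"type": "string", "index": "not_analyzed"}
                        },
                    },
                }
            }
        ]
    }


# "contact" is a hypothetical type name used only for illustration.
mapping = {"contact": custom_field_template()}
print(json.dumps(mapping, indent=2))
```

A field called `nickname` would then be searchable as `nickname` (analyzed) and filterable as `nickname.raw` (not analyzed).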
![Page 7: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/7.jpg)
Write Path
[Diagram labels: SQS, Background Jobs]
![Page 8: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/8.jpg)
Read Path
1. Submit EverTrue DSL query
2. Translate to ES query, returns contact IDs
3. Load full contact objects w/ meta
[Diagram labels: Search API, Contacts API, Offline streaming jobs]
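The translate step in the read path could look roughly like the sketch below. The EverTrue DSL is not shown in the slides, so the input shape here (`{"must": [{"field": ..., "eq": ...}]}`) is entirely made up for illustration, as is the `.raw` sub-field name. The output uses the ES 1.x `filtered` query with `_source` disabled, so only contact IDs come back.

```python
def translate(dsl):
    """Translate a (hypothetical) DSL query into an ES 1.x request body."""
    clauses = [
        # Exact-match filter on the not-analyzed copy of each field.
        {"term": {cond["field"] + ".raw": cond["eq"]}}
        for cond in dsl["must"]
    ]
    return {
        "_source": False,  # only IDs are needed; full objects load in a later step
        "query": {"filtered": {"filter": {"bool": {"must": clauses}}}},
    }


def contact_ids(es_response):
    """Pull the document IDs out of a standard ES search response."""
    return [hit["_id"] for hit in es_response["hits"]["hits"]]


es_query = translate({"must": [{"field": "city", "eq": "Boston"}]})

# Stand-in for a real search response, trimmed to the fields used here.
fake_response = {"hits": {"hits": [{"_id": "c-1"}, {"_id": "c-2"}]}}
ids = contact_ids(fake_response)
print(es_query, ids)
```

The IDs would then go to the Contacts API to load the full contact objects.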
![Page 9: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/9.jpg)
•Arbitrary field filtering
•Aggregations
•ES Hadoop Plugin
![Page 10: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/10.jpg)
Fielddata Cache: Our first scaling issue
Turns out the fielddata cache is unbounded by default...
![Page 11: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/11.jpg)
First Solution
• We set indices.fielddata.cache.size to 50%
• No more OOME Crashes
• Then something else happened... really slow queries (problem sign #1)
![Page 12: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/12.jpg)
![Page 13: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/13.jpg)
Slow Query?... More Hardware Right?!
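| Type | m1.xlarge | r3.2xlarge | r3.2xlarge |
|---|---|---|---|
| Hardware: CPU | 4 CPU | 8 CPU | 8 CPU |
| Hardware: RAM | 15GB RAM | 60GB RAM | 60GB RAM |
| Hardware: disk | Round disk thingy | SSDs | SSDs |
| ES Version | v1.1.2 | v1.1.2 | v1.3.2 |
| has_child query time | 12-15s | 6-8s | ~100ms |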
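For readers who haven't used parent/child documents, the `has_child` query timed above returns parent documents that have at least one matching child. A minimal sketch of the request body follows; the child type `gift` and the field `campaign` are hypothetical names, since the slides don't say which types are involved.

```python
def has_child_query(child_type, field, value):
    """Match parent documents that have a child where `field` equals `value`."""
    return {
        "query": {
            "has_child": {
                "type": child_type,  # the child document type to search
                "query": {"term": {field: value}},
            }
        }
    }


# Hypothetical example: parents with at least one "gift" in a given campaign.
body = has_child_query("gift", "campaign", "annual-fund")
print(body)
```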
![Page 14: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/14.jpg)
Lessons Learned
•Watch the release notes & GH issues like a hawk
•Don’t fall too far behind w/r/t versions
•We waited too long (6 months)
•Keep ES fed with plenty of memory
•Need monitoring to have any hope of understanding operational issues
![Page 15: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/15.jpg)
Settings We Tweaked
• indices.store.throttle.max_bytes_per_sec
• Default 20mb -> 60mb (SSDs can handle it)
• indices.fielddata.cache.size
• Set to 70% of heap
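A sketch of how these two settings might be applied. The slides don't say how the changes were made, so this assumes the store throttle goes through the cluster update settings API (`PUT /_cluster/settings`), where it is dynamically updatable in ES 1.x, and that the fielddata cache size is a node-level line in `elasticsearch.yml`.

```python
import json


def throttle_settings_body(max_bytes_per_sec="60mb", persistent=True):
    """Build the body for PUT /_cluster/settings to raise the store throttle."""
    # Persistent settings survive a full cluster restart; transient ones do not.
    scope = "persistent" if persistent else "transient"
    return {scope: {"indices.store.throttle.max_bytes_per_sec": max_bytes_per_sec}}


settings_body = throttle_settings_body()
print("PUT /_cluster/settings")
print(json.dumps(settings_body))

# Assumed to be set per node in elasticsearch.yml rather than via the API.
yaml_line = "indices.fielddata.cache.size: 70%"
print(yaml_line)
```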
![Page 16: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/16.jpg)
ES Hadoop Integration
•We use it for a lot of our offline jobs
•One map task per shard
•Small shard deployments may underutilize your Hadoop cluster
•Mapper inputs do not contain meta fields like _version
•Forces another read for write back scenarios
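To see why a small shard count can leave a Hadoop cluster idle, here is a back-of-the-envelope sketch. It takes the one-map-task-per-shard rule from this slide and the 5-shard index from the cluster setup; the 40 map slots figure is a made-up assumption, not EverTrue's actual cluster size.

```python
def map_task_utilization(shards, map_slots):
    """Return (map task count, fraction of map slots kept busy)."""
    tasks = shards  # one map task per shard
    busy_slots = min(tasks, map_slots)
    return tasks, busy_slots / map_slots


# 5 shards as in the cluster setup; 40 map slots is a hypothetical cluster size.
tasks, utilization = map_task_utilization(shards=5, map_slots=40)
print(tasks, utilization)
```

Under these assumptions only 5 of the 40 slots do any reading, no matter how large the index grows.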
![Page 17: Building a CRM on top of ElasticSearch](https://reader033.vdocuments.site/reader033/viewer/2022061202/547c558eb4af9fa11f8b45f8/html5/thumbnails/17.jpg)
tail -f ~/questions