Search | Discover | Analyze
Confidential and Proprietary © Copyright 2013
Benchmarking Solr Performance
June 18, 2014
Timothy Potter
My SolrCloud Experience
• At LucidWorks, mostly focused on hardening SolrCloud; Lucene/Solr committer
• Operated a 36-node cluster in AWS for Dachis Group (1.5 years ago, 18 shards, ~900M docs)
• Built a Fabric/boto framework for deploying and managing a cluster in EC2
• Co-author of Solr In Action
Agenda
• Indexing performance tests
• Solr Scale Toolkit
• Next steps
Cluster sizing
How many servers do I need to index X docs?
... shards ... ?
... replicas ... ?
I need N queries per second over M docs, how many servers do I need?
It depends?!?
Methodology
• Transparent, repeatable results
– Ideally hoping for something owned by the community
• Synthetic docs ~ 1K each on disk, mix of field types
– Data set created using code borrowed from PigMix
– English text fields generated using a Zipfian distribution
• Java 1.7u55, Amazon Linux, r3.2xlarge nodes
– enhanced networking enabled, placement group, same AZ
• Stock Solr (cloud) 4.8.1
– Using Shawn Heisey’s GC tuning parameters
• Use Elastic MapReduce to generate load
– As many nodes as I need to drive Solr!
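The methodology mentions English text fields generated using a Zipfian distribution. A minimal sketch of that idea, assuming a tiny hypothetical vocabulary and a helper named zipf_text (neither is from the talk or from PigMix): the word at rank r is drawn with weight 1 / r**s, so a few words dominate and the rest form a long tail.

```python
import random
from itertools import accumulate


def zipf_text(vocab, n_words, rng, s=1.0):
    """Draw n_words from vocab; the word at rank r has weight 1 / r**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, len(vocab) + 1)]
    cum_weights = list(accumulate(weights))
    return " ".join(rng.choices(vocab, cum_weights=cum_weights, k=n_words))


# Hypothetical vocabulary, ordered from most to least frequent.
# A real data set would use a much larger word list.
VOCAB = ["solr", "index", "shard", "replica", "query", "facet", "commit", "leader"]

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs produce the same docs
    print(zipf_text(VOCAB, 20, rng))
```

Seeding the generator keeps the data set repeatable, which matches the goal of transparent, repeatable results.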
Indexing Results
Cluster Size   # of Shards   # of Replicas   Reducers   Time (secs)   Docs / sec
10             10            1               48         1762          73,780
10             10            2               34         3727          34,881
10             20            1               48         1282          101,404
10             20            2               34         3207          40,536
10             30            1               72         1070          121,495
10             30            2               60         3159          41,152
15             15            1               60         1106          117,541
15             15            2               42         2465          52,738
15             30            1               60         827           157,195
15             30            2               42         2129          61,062
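The numbers in the table can be cross-checked with a little arithmetic. The sketch below copies the rows as given and derives two things: the approximate number of docs per run (time multiplied by docs/sec, which comes out near 130 million for every row) and the throughput ratio when going from 1 to 2 replicas for the same cluster size and shard count. The function names are illustrative only.

```python
# (cluster size, shards, replicas, reducers, time in secs, docs/sec), copied from the table
RESULTS = [
    (10, 10, 1, 48, 1762, 73780),
    (10, 10, 2, 34, 3727, 34881),
    (10, 20, 1, 48, 1282, 101404),
    (10, 20, 2, 34, 3207, 40536),
    (10, 30, 1, 72, 1070, 121495),
    (10, 30, 2, 60, 3159, 41152),
    (15, 15, 1, 60, 1106, 117541),
    (15, 15, 2, 42, 2465, 52738),
    (15, 30, 1, 60, 827, 157195),
    (15, 30, 2, 42, 2129, 61062),
]


def total_docs(row):
    """Approximate docs indexed in a run: elapsed seconds * docs per second."""
    return row[4] * row[5]


def replication_ratio(results):
    """Map (cluster size, shards) -> docs/sec with 2 replicas divided by docs/sec with 1."""
    by_key = {}
    for size, shards, replicas, _reducers, _secs, rate in results:
        by_key.setdefault((size, shards), {})[replicas] = rate
    return {
        key: round(rates[2] / rates[1], 2)
        for key, rates in by_key.items()
        if 1 in rates and 2 in rates
    }


if __name__ == "__main__":
    for row in RESULTS:
        print(row[:3], total_docs(row))
    print(replication_ratio(RESULTS))
```

Note that the reducer counts differ between the 1-replica and 2-replica runs, so the ratios compare whole configurations rather than isolating the cost of replication.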
Direct Updates
[Diagram: Indexing Client 1 uses CloudSolrServer (SolrJ), which watches /clusterstate.json in ZooKeeper, computes the shard assignment for each batch on the client, and sends each <doc> directly to the leader of Shard 1, Shard 2, or Shard 3.]
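The client-side shard assignment in this slide can be sketched as follows. This is an illustration under stated assumptions, not SolrJ's code: it uses zlib.crc32 as a stand-in hash, splits an unsigned 32-bit hash space into equal ranges, and all function names are hypothetical. The point is only that the client can group a batch by shard and send each group straight to that shard's leader.

```python
import zlib
from collections import defaultdict


def shard_ranges(num_shards):
    """Split the unsigned 32-bit hash space into num_shards contiguous ranges."""
    step = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * step
        # The last range absorbs any remainder so the whole space is covered.
        hi = 2**32 - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((lo, hi))
    return ranges


def shard_for(doc_id, ranges):
    """Return the index of the range that owns doc_id's hash."""
    # Stand-in hash for illustration only; this is NOT the hash Solr uses.
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    for idx, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return idx
    raise ValueError("hash outside all ranges")


def split_batch(docs, ranges):
    """Group one client batch into per-shard sub-batches."""
    batches = defaultdict(list)
    for doc in docs:
        batches[shard_for(doc["id"], ranges)].append(doc)
    return dict(batches)


if __name__ == "__main__":
    ranges = shard_ranges(3)
    docs = [{"id": "doc-%d" % i} for i in range(10)]
    for shard, sub_batch in sorted(split_batch(docs, ranges).items()):
        print("shard", shard + 1, [d["id"] for d in sub_batch])
```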
Replication
[Diagram: CloudSolrServer (SolrJ) watches /clusterstate.json in ZooKeeper and sends each <doc> to the leader of Shard 1, Shard 2, or Shard 3; each leader forwards the <doc> to its replica (Shard 1, Shard 2, Shard 3 replicas) and blocks for the response from the replica(s).]
Don’t swamp your servers!
Lessons Learned
• Know what throughput your client side is capable of generating
– If in MapReduce, index from reducers with speculative execution disabled
• Don’t change Solr config without good reasons for doing so
• Overshard (but not too much)
• Near-linear scalability as I added nodes!
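One way to act on the first lesson is to measure the client alone before pointing it at Solr. The sketch below is a hypothetical harness (the names measure_docs_per_sec, make_doc, and send_batch are not from the talk): it builds docs, hands them to a pluggable sender in batches, and reports docs/sec. With a no-op sender the result is an upper bound on what this client can generate.

```python
import time


def measure_docs_per_sec(make_doc, send_batch, n_docs, batch_size=1000):
    """Generate n_docs, pass them to send_batch in batches, return (sent, docs/sec)."""
    sent = 0
    batch = []
    start = time.perf_counter()
    for i in range(n_docs):
        batch.append(make_doc(i))
        if len(batch) == batch_size:
            send_batch(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        send_batch(batch)
        sent += len(batch)
    elapsed = time.perf_counter() - start
    rate = (sent / elapsed) if elapsed > 0 else float("inf")
    return sent, rate


if __name__ == "__main__":
    # No-op sender: measures doc generation and batching only, no network or Solr.
    sent, rate = measure_docs_per_sec(
        make_doc=lambda i: {"id": str(i), "text": "synthetic doc %d" % i},
        send_batch=lambda batch: None,
        n_docs=100_000,
    )
    print("%d docs at %.0f docs/sec (client-side ceiling)" % (sent, rate))
```

If the ceiling measured this way is close to the rate observed against the cluster, the client is the bottleneck rather than Solr.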
Query Performance Tests
• All nodes in SolrCloud perform indexing and execute queries
• Using the TermsComponent to build queries based on the terms in each field.
• Harder to accurately simulate user queries over synthetic data
– Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc.
• Does the randomness in your test queries model (expected) user behavior?
• Start with one server (1 shard) to determine baseline query performance.
– Look for inefficiencies in your schema and other config settings
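A sketch of how randomized test queries might be assembled from per-field terms. The TERMS map stands in for what the TermsComponent would return; the field names, the probabilities, and the "id asc" sort are all hypothetical choices for illustration. The request parameters themselves (q, fq, start, rows, sort, facet, facet.field) are standard Solr query parameters.

```python
import random
from urllib.parse import urlencode

# Hypothetical field -> terms map, standing in for TermsComponent output.
TERMS = {
    "text1_en": ["solr", "index", "shard"],
    "string1_s": ["alpha", "beta", "gamma"],
}


def random_query(terms_by_field, rng, rows=10):
    """Build one randomized set of Solr request parameters."""
    fields = sorted(terms_by_field)
    field = rng.choice(fields)
    params = {
        "q": "%s:%s" % (field, rng.choice(terms_by_field[field])),
        "rows": rows,
        # Most requests ask for the first page; a few page deeper.
        "start": rng.choice([0, 0, 0, rows, 2 * rows]),
    }
    if rng.random() < 0.5:  # assumed share of queries that carry a filter
        ffield = rng.choice(fields)
        params["fq"] = "%s:%s" % (ffield, rng.choice(terms_by_field[ffield]))
    if rng.random() < 0.3:  # assumed share of queries that facet
        params["facet"] = "true"
        params["facet.field"] = rng.choice(fields)
    if rng.random() < 0.2:  # assumed share of queries that sort
        params["sort"] = "id asc"  # assumes the schema has a sortable id field
    return params


def to_query_string(params):
    """URL-encode the parameters for use in a /select request."""
    return urlencode(params)


if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(5):
        print(to_query_string(random_query(TERMS, rng)))
```

The probabilities are the part that should be tuned until the mix resembles expected user behavior.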
Solr Scale Toolkit
• Fabric / Python-based toolset for deploying and managing SolrCloud clusters
• SolrJ-based client application useful for building tools that need access to cluster state information in ZooKeeper
• Code to support benchmarks for Solr
Python-based Tools
boto – Python API for AWS (EC2, S3, etc)
Fabric – Python-based tool for automating system admin tasks over SSH
pysolr – Python library for Solr (sending commits, queries, ...)
kazoo – Python client tools for ZooKeeper
Supporting Cast:
JMeter – run tests, generate reports
collectd – system monitoring
Logstash4Solr – log aggregation
JConsole/VisualVM – monitor JVM during indexing / queries
Solr Scale Toolkit: Demo
• Launch a meta node
– Log agg / basic monitoring using SiLK
• Launch ZooKeeper Ensemble
– 3 nodes to establish quorum
– Set up a cron job to clean up snapshots
• Launch SolrCloud cluster
• Create new collection and index some docs
– Attach JConsole while indexing
• Run a healthcheck on the collection
• Check out the Banana dashboard
• Backup / Restore
– Requires patch for SOLR-5956
– Use fab patch_jars to update jars and do a rolling restart
Provisioning machines
• Custom built AMI?
• Block device mapping
– dedicated disk per Solr node
• Launch and then poll status until they are live
– verify SSH connectivity
• Tag each instance with a cluster ID and username
fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
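The "launch and then poll status until they are live" step can be sketched as a small polling loop. This is not the toolkit's code: wait_until_live and the get_state callback are hypothetical, and the callback is where a real implementation would ask EC2 for the instance state (the sketch treats the state name "running" as live).

```python
import time


def wait_until_live(instance_ids, get_state, timeout_secs=300, poll_secs=5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_state(instance_id) until every instance reports 'running'.

    Raises TimeoutError if some instances are still not running at the deadline.
    """
    pending = set(instance_ids)
    deadline = clock() + timeout_secs
    while pending:
        # Keep only the instances that are not live yet.
        pending = {i for i in pending if get_state(i) != "running"}
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError("not running: %s" % ", ".join(sorted(pending)))
        sleep(poll_secs)
    return True


if __name__ == "__main__":
    # Fake state source for demonstration: each instance is live on its 3rd poll.
    polls = {"i-aaa": 0, "i-bbb": 0}

    def fake_state(instance_id):
        polls[instance_id] += 1
        return "running" if polls[instance_id] >= 3 else "pending"

    print(wait_until_live(polls.keys(), fake_state, poll_secs=0))
```

A follow-up check for SSH connectivity would go after this loop, since an instance can report as running before sshd accepts connections.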
ZooKeeper
• Two options:
– provision 1 to N nodes when you launch Solr cluster
– use existing named ensemble
• Fabric command simply creates the myid files and zoo.cfg file for the ensemble
– and some cron scripts for managing snapshots
• Basic health checking of ZooKeeper status:
– echo srvr | nc localhost 2181
fab new_zk_ensemble:zk1,n=3
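A sketch of what creating the myid files and zoo.cfg for an ensemble involves, plus a socket equivalent of the srvr check. The function names, the data directory, and the timing values are assumptions for illustration, not the toolkit's actual output; the zoo.cfg keys and the server.N=host:2888:3888 form are standard ZooKeeper configuration.

```python
import socket


def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Render a zoo.cfg listing every ensemble member as server.N."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        "dataDir=%s" % data_dir,  # assumed location
        "clientPort=%d" % client_port,
    ]
    for myid, host in enumerate(hosts, start=1):
        lines.append("server.%d=%s:2888:3888" % (myid, host))
    return "\n".join(lines) + "\n"


def render_myids(hosts):
    """Map each host to the content of its myid file (matches server.N above)."""
    return {host: "%d\n" % myid for myid, host in enumerate(hosts, start=1)}


def zk_srvr(host="localhost", port=2181, timeout=5.0):
    """Send the 'srvr' four-letter command, like: echo srvr | nc localhost 2181."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line from srvr output, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    hosts = ["zk1-a", "zk1-b", "zk1-c"]  # hypothetical host names
    print(render_zoo_cfg(hosts))
    print(render_myids(hosts))
```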
SolrCloud
• Upload a BASH script that starts/stops Solr
• Set system props: jetty.port, host, zkHost, JVM opts
• One or more Solr nodes per machine
• JVM mem opts dependent on instance type and # of Solr nodes per instance
• Optionally configure log4j.properties to append messages to RabbitMQ for Logstash4Solr integration
fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
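A sketch of how the per-node JVM options could be derived from the instance size and the number of Solr nodes per host. The heap heuristic (half of RAM split across the Solr JVMs, leaving the rest for the OS page cache), the base port, and the function name are assumptions made for this example; only the system property names (jetty.port, host, zkHost) come from the slide.

```python
def solr_java_opts(host, zk_host, nodes_per_host, instance_mem_gb,
                   base_port=8984, heap_fraction=0.5):
    """Return {port: [JVM options]} for each Solr node on one machine."""
    # Assumed heuristic: split heap_fraction of RAM evenly across the Solr JVMs.
    heap_gb = max(1, int(instance_mem_gb * heap_fraction / nodes_per_host))
    opts_by_port = {}
    for n in range(nodes_per_host):
        port = base_port + n  # one Jetty port per Solr node on this host
        opts_by_port[port] = [
            "-Xms%dg" % heap_gb,
            "-Xmx%dg" % heap_gb,
            "-Djetty.port=%d" % port,
            "-Dhost=%s" % host,
            "-DzkHost=%s" % zk_host,
        ]
    return opts_by_port


if __name__ == "__main__":
    # Hypothetical values: a 61 GB machine running two Solr nodes.
    opts = solr_java_opts("10.0.0.1", "zk1:2181,zk2:2181,zk3:2181",
                          nodes_per_host=2, instance_mem_gb=61)
    for port, jvm_opts in sorted(opts.items()):
        print(port, " ".join(jvm_opts))
```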
solr-ctl.sh
• BASH script that implements:
– start/stop Solr nodes on each EC2 instance
– sets JVM memory options, system properties (jetty.port), enable remote JMX, etc.
– backup log files before restarting nodes
– ensure JVM is killed correctly before restarting
• Environment variables in: solr-ctl-env.sh
Miscellaneous Utility Tasks
• Deploy a configuration directory to ZooKeeper
• Create a new collection
• Attach a local JConsole/VisualVM to a remote JVM
• Rolling restart (with Overseer awareness)
• Build Solr locally and patch remote
– Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes from within the network
• Put/get files
• Grep over all log files (across the cluster)
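The slides do not spell out what "Overseer awareness" means for a rolling restart. One plausible reading, sketched below as an assumption rather than the toolkit's actual behavior, is to restart the node currently hosting the Overseer last, so the Overseer role has to move at most once, and to wait for each node to come back before touching the next. All names here are hypothetical, and restart and is_healthy are callbacks a real tool would implement over SSH and the cluster state.

```python
def rolling_restart_order(nodes, overseer_node):
    """Return nodes in restart order, with the Overseer's node (if present) last."""
    order = [n for n in nodes if n != overseer_node]
    if overseer_node in nodes:
        order.append(overseer_node)
    return order


def rolling_restart(nodes, overseer_node, restart, is_healthy):
    """Restart one node at a time; stop at the first node that fails to recover."""
    restarted = []
    for node in rolling_restart_order(nodes, overseer_node):
        restart(node)
        if not is_healthy(node):
            raise RuntimeError("node did not recover: %s" % node)
        restarted.append(node)
    return restarted


if __name__ == "__main__":
    nodes = ["solr1:8984", "solr2:8984", "solr3:8984"]  # hypothetical node names
    done = rolling_restart(
        nodes,
        overseer_node="solr2:8984",
        restart=lambda node: print("restarting", node),
        is_healthy=lambda node: True,
    )
    print(done)
```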
Other useful stuff ...
• fab mine: See clusters I’m running (or for other users too)
• fab kill_mine: Terminate all instances I’m running
– Use termination protection in production
• fab ssh_to: Quick way to SSH to one of the nodes in a cluster
• fab stop/recover/kill: Basic commands for controlling specific Solr nodes in the cluster
• fab jmeter: Execute a JMeter test plan against your cluster
– Example test plan and Java sampler is included with the source
SolrCloud Tools (SolrJ client app)
• Java-based command-line application that uses SolrJ’s CloudSolrServer to perform advanced cluster management operations:
– healthcheck: collect metadata and health information from all replicas for a collection from ZooKeeper
– backup: create a snapshot of each shard in a collection for backing up to remote storage (S3)
• Framework for building complex tools that benefit from having access to cluster state information in ZooKeeper
./tools.sh -tool healthcheck
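The real healthcheck tool is a Java application built on SolrJ's CloudSolrServer. The Python sketch below only illustrates the kind of logic such a check might apply once the cluster state has been read from ZooKeeper. It assumes a clusterstate.json-like layout (collection, then shards, then replicas with state and leader entries); the status labels and the function name are invented for this example.

```python
def summarize_health(cluster_state, collection):
    """Return {shard name: status} for one collection.

    Status labels (assumed for this sketch):
      healthy  - an active leader exists and every replica is active
      degraded - an active leader exists but some replica is not active
      down     - no active leader
    """
    report = {}
    shards = cluster_state[collection]["shards"]
    for shard_name, shard in sorted(shards.items()):
        replicas = list(shard.get("replicas", {}).values())
        has_leader = any(
            r.get("leader") == "true" and r.get("state") == "active"
            for r in replicas
        )
        if not has_leader:
            report[shard_name] = "down"
        elif all(r.get("state") == "active" for r in replicas):
            report[shard_name] = "healthy"
        else:
            report[shard_name] = "degraded"
    return report


# Hypothetical cluster state for demonstration.
SAMPLE_STATE = {
    "demo": {
        "shards": {
            "shard1": {"replicas": {
                "core_node1": {"state": "active", "leader": "true"},
                "core_node2": {"state": "active"},
            }},
            "shard2": {"replicas": {
                "core_node3": {"state": "active", "leader": "true"},
                "core_node4": {"state": "recovering"},
            }},
            "shard3": {"replicas": {
                "core_node5": {"state": "down"},
            }},
        }
    }
}

if __name__ == "__main__":
    for shard, status in summarize_health(SAMPLE_STATE, "demo").items():
        print(shard, status)
```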
SiLK Integration
• SiLK: Solr integrated with Logstash and Kibana
– Index time-series data, such as log data (collectd, Solr logs, ...)
– Build cool dashboards with Banana (fork of Kibana)
• Easily aggregate all WARN and more severe log messages from all Solr servers into logstash4solr
• Send collectd metrics to logstash4solr
What’s Next?
• Migrate to using Apache libcloud instead of using boto directly
• Benchmark mixed workloads (queries and indexing)
• SiLK is improving rapidly!
• Chaos monkey tests
– integrate jepsen?
• Open source so please kick the tires!
Wrap-up
• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk
• LucidWorks: http://www.lucidworks.com
• SiLK: http://www.lucidworks.com/lucidworks-silk/
• Solr In Action: http://www.manning.com/grainger/
• Connect: @thelabdude / [email protected]
Questions?