Benchmarking Solr Performance

June 18, 2014

Timothy Potter


My SolrCloud Experience

• At LucidWorks, mostly focused on hardening SolrCloud; Lucene/Solr committer

• Operated a 36-node cluster in AWS for Dachis Group (1.5 years ago, 18 shards, ~900M docs)

• Built a Fabric/boto framework for deploying and managing a cluster in EC2

• Co-author of Solr In Action


Agenda

• Indexing performance tests

• Solr Scale Toolkit

• Next steps


Cluster sizing

How many servers do I need to index X docs?

... shards ... ?

... replicas ... ?

I need N queries per second over M docs, how many servers do I need?

It depends?!?


Methodology

• Transparent, repeatable results

– Ideally hoping for something owned by the community

• Synthetic docs, ~1K each on disk, mix of field types

– Data set created using code borrowed from PigMix

– English text fields generated using a Zipfian distribution

• Java 1.7u55, Amazon Linux, r3.2xlarge nodes

– Enhanced networking enabled, placement group, same AZ

• Stock Solr (cloud) 4.8.1

– Using Shawn Heisey’s GC tuning parameters

• Use Elastic MapReduce to generate load

– As many nodes as I need to drive Solr!


Indexing Results

Cluster Size   # of Shards   # of Replicas   Reducers   Time (secs)   Docs / sec
10             10            1               48         1762           73,780
10             10            2               34         3727           34,881
10             20            1               48         1282          101,404
10             20            2               34         3207           40,536
10             30            1               72         1070          121,495
10             30            2               60         3159           41,152
15             15            1               60         1106          117,541
15             15            2               42         2465           52,738
15             30            1               60          827          157,195
15             30            2               42         2129           61,062
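To make the cost of a second replica easier to see, the docs/sec figures above can be paired up by cluster size and shard count. This is a small sketch over the numbers in the table only; note that the paired runs also used different reducer counts, so the ratio is not purely a replication effect.

```python
# (cluster_size, shards, replicas, docs_per_sec), copied from the results table
RESULTS = [
    (10, 10, 1, 73780), (10, 10, 2, 34881),
    (10, 20, 1, 101404), (10, 20, 2, 40536),
    (10, 30, 1, 121495), (10, 30, 2, 41152),
    (15, 15, 1, 117541), (15, 15, 2, 52738),
    (15, 30, 1, 157195), (15, 30, 2, 61062),
]


def replication_cost(results):
    """Return {(cluster_size, shards): rate with 2 replicas / rate with 1 replica}."""
    by_config = {}
    for nodes, shards, replicas, rate in results:
        by_config.setdefault((nodes, shards), {})[replicas] = rate
    return {
        config: round(rates[2] / rates[1], 2)
        for config, rates in sorted(by_config.items())
        if 1 in rates and 2 in rates
    }


if __name__ == "__main__":
    for (nodes, shards), ratio in replication_cost(RESULTS).items():
        print(f"{nodes} nodes, {shards} shards: {ratio:.2f}x of the single-replica rate")
```

In every configuration the two-replica run indexes at under half the single-replica rate.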


Direct Updates

[Diagram: IndexingClient 1 uses CloudSolrServer (SolrJ), which watches /clusterstate.json in ZooKeeper, computes the shard assignment for each batch on the client, and sends each <doc> directly to the leader of Shard 1, Shard 2, or Shard 3.]
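The idea of computing the shard assignment on the client can be sketched as follows. This is an illustration only, not SolrJ's code: it uses zlib.crc32 as a stand-in hash (Solr's own router uses a different hash function), and it builds equal hash ranges itself, whereas a real client would take each shard's range from the cluster state. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_ranges(num_shards):
    """Split the unsigned 32-bit hash space into contiguous, equal-sized ranges."""
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * size
        # The last range absorbs any remainder so the whole space is covered.
        hi = 2**32 - 1 if i == num_shards - 1 else lo + size - 1
        ranges.append((lo, hi))
    return ranges


def shard_for(doc_id, ranges):
    """Return the index of the shard whose hash range contains this doc id."""
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF  # stand-in hash, not Solr's
    for index, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return index
    raise ValueError("hash ranges do not cover the hash space")


def split_batch(docs, ranges):
    """Group a batch of docs by target shard so each group goes to one leader."""
    per_shard = defaultdict(list)
    for doc in docs:
        per_shard[shard_for(doc["id"], ranges)].append(doc)
    return dict(per_shard)


if __name__ == "__main__":
    batch = [{"id": f"doc-{n}"} for n in range(10)]
    for shard, docs in sorted(split_batch(batch, shard_ranges(3)).items()):
        print(f"shard {shard + 1}: {len(docs)} docs")
```

Because the client does the grouping, each sub-batch goes straight to the right leader with no extra hop inside the cluster.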


Replication

[Diagram: CloudSolrServer (SolrJ) watches /clusterstate.json in ZooKeeper and sends each <doc> to a shard leader (Shard 1, 2, or 3); the leader forwards the <doc> to its replica and blocks for a response from the replica(s).]


Don’t swamp your servers!


Lessons Learned

• Know what throughput your client side is capable of generating

– If in MapReduce, index from reducers with speculative execution disabled

• Don’t change Solr config without good reasons for doing so

• Overshard (but not too much)

• Near-linear scalability as I added nodes!
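One way to put a number on "near-linear" is to compare the measured speedup with the increase in node count, using the two 30-shard configurations from the results table. This is a sketch over those published figures only; the reducer counts differed between the runs, so treat the values as rough.

```python
def scaling_efficiency(nodes_a, rate_a, nodes_b, rate_b):
    """Measured speedup divided by the ideal (linear) speedup; 1.0 means linear."""
    speedup = rate_b / rate_a
    ideal = nodes_b / nodes_a
    return speedup / ideal


if __name__ == "__main__":
    # 30 shards, 10 nodes vs 15 nodes, docs/sec taken from the results table
    one_replica = scaling_efficiency(10, 121495, 15, 157195)
    two_replicas = scaling_efficiency(10, 41152, 15, 61062)
    print(f"1 replica: {one_replica:.2f}, 2 replicas: {two_replicas:.2f}")
```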


Query Performance Tests

• All nodes in SolrCloud perform indexing and execute queries

• Using the TermsComponent to build queries based on the terms in each field.

• Harder to accurately simulate user queries over synthetic data

– Need mix of faceting, paging, sorting, grouping, boolean clauses, range queries, boosting, filters (some cached, some not), etc ...

• Does the randomness in your test queries model (expected) user behavior?

• Start with one server (1 shard) to determine baseline query performance.

– Look for inefficiencies in your schema and other config settings
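Building test queries from the terms in each field might look like the sketch below. It assumes the per-field terms and document frequencies have already been fetched (for example from the TermsComponent) into a plain dict; the function name, the dict layout, and the fixed rows value are illustrative choices, not the toolkit's code. Terms are sampled in proportion to their frequency, which is one possible model of user behavior and may not match yours.

```python
import random


def build_queries(terms_by_field, num_queries, seed=42):
    """Generate simple field:term queries, sampling terms weighted by frequency.

    terms_by_field maps a field name to a list of (term, doc_frequency) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the test run repeatable
    fields = sorted(terms_by_field)
    queries = []
    for _ in range(num_queries):
        field = rng.choice(fields)
        terms, counts = zip(*terms_by_field[field])
        term = rng.choices(terms, weights=counts, k=1)[0]
        queries.append({"q": f"{field}:{term}", "rows": 10})
    return queries


if __name__ == "__main__":
    sample_terms = {
        "text_en": [("solr", 900), ("index", 400), ("shard", 50)],
        "string_s": [("alpha", 10), ("beta", 5)],
    }
    for params in build_queries(sample_terms, 5):
        print(params)
```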


Solr Scale Toolkit

• Fabric / Python based toolset for deploying and managing SolrCloud clusters

• SolrJ-based client application useful for building tools that need access to cluster state information in ZooKeeper

• Code to support benchmarks for Solr


Python-based Tools

boto – Python API for AWS (EC2, S3, etc)

Fabric – Python-based tool for automating system admin tasks over SSH

pysolr – Python library for Solr (sending commits, queries, ...)

kazoo – Python client tools for ZooKeeper

Supporting Cast:

JMeter – run tests, generate reports

collectd – system monitoring

Logstash4Solr – log aggregation

JConsole/VisualVM – monitor JVM during indexing / queries


Solr Scale Toolkit: Demo

• Launch a meta node

– Log agg / basic monitoring using SiLK

• Launch ZooKeeper Ensemble

– 3 nodes to establish quorum

– Set up cron job to clean up snapshots

• Launch SolrCloud cluster

• Create new collection and index some docs

– Attach JConsole while indexing

• Run a healthcheck on the collection

• Check out Banana Dashboard

• Backup / Restore

– Requires patch for SOLR-5956

– Use fab patch_jars to update jars and do a rolling restart


Provisioning machines

• Custom built AMI?

• Block device mapping

– dedicated disk per Solr node

• Launch and then poll status until they are live

– verify SSH connectivity

• Tag each instance with a cluster ID and username

fab new_ec2_instances:test1,n=3,instance_type=m3.xlarge
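The "launch and then poll status until they are live" step can be sketched as a generic polling loop. This is not the toolkit's implementation: wait_until_live and its parameters are hypothetical, and get_state stands in for whatever call reports an instance's state (EC2 reports "running" once an instance is up). The SSH check would follow as a separate step.

```python
import time


def wait_until_live(instance_ids, get_state, timeout_secs=300, poll_secs=5,
                    sleep=time.sleep):
    """Poll until every instance reports 'running', or raise after the timeout.

    get_state(instance_id) must return the instance's current state string.
    """
    pending = set(instance_ids)
    waited = 0
    while True:
        pending = {i for i in pending if get_state(i) != "running"}
        if not pending:
            return True
        if waited >= timeout_secs:
            raise TimeoutError(f"instances not live: {sorted(pending)}")
        sleep(poll_secs)
        waited += poll_secs


if __name__ == "__main__":
    # Fake state source: each instance is 'pending' for two polls, then 'running'.
    polls = {}

    def fake_state(instance_id):
        polls[instance_id] = polls.get(instance_id, 0) + 1
        return "running" if polls[instance_id] > 2 else "pending"

    print(wait_until_live(["i-1", "i-2", "i-3"], fake_state, sleep=lambda s: None))
```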


ZooKeeper

• Two options:

– provision 1 to N nodes when you launch Solr cluster

– use existing named ensemble

• Fabric command simply creates the myid files and zoo.cfg file for the ensemble

– and some cron scripts for managing snapshots

• Basic health checking of ZooKeeper status:

– echo srvr | nc localhost 2181

fab new_zk_ensemble:zk1,n=3
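The same srvr health check can be done from Python without nc. A minimal sketch, assuming the server answers the srvr four-letter command with "Key: value" lines and then closes the connection; the function names are illustrative.

```python
import socket


def parse_srvr(text):
    """Turn the 'Key: value' lines of a srvr response into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def zk_srvr(host="localhost", port=2181, timeout=5.0):
    """Send 'srvr' to a ZooKeeper server and return the parsed response."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return parse_srvr(b"".join(chunks).decode("utf-8"))


if __name__ == "__main__":
    status = zk_srvr()
    print(status.get("Mode"), status.get("Node count"))
```

The "Mode" value (leader, follower, or standalone) is a quick way to confirm that each node has joined the ensemble.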


SolrCloud

• Upload a BASH script that starts/stops Solr

• Set system props: jetty.port, host, zkHost, JVM opts

• One or more Solr nodes per machine

• JVM mem opts dependent on instance type and # of Solr nodes per instance

• Optionally configure log4j.properties to append messages to RabbitMQ for Logstash4Solr integration

fab new_solrcloud:test1,zk=zk1,nodesPerHost=2
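How the system properties combine when running more than one Solr node per machine can be sketched as below. The jetty.port, host, and zkHost properties come from the slide; the heap value, the base port, and the function name are placeholders for illustration, and the real control script is BASH rather than Python.

```python
def solr_start_commands(host, zk_host, nodes_per_host, heap="4g", base_port=8983):
    """Build one start command per Solr node on a host, each on its own port.

    heap and base_port are illustrative defaults, not the toolkit's values.
    """
    commands = []
    for i in range(nodes_per_host):
        port = base_port + i
        commands.append(" ".join([
            "java",
            f"-Xms{heap}", f"-Xmx{heap}",   # JVM memory options
            f"-Djetty.port={port}",          # one port per node on the host
            f"-Dhost={host}",
            f"-DzkHost={zk_host}",           # ZooKeeper connection string
            "-jar", "start.jar",
        ]))
    return commands


if __name__ == "__main__":
    zk = "zk1:2181,zk2:2181,zk3:2181"  # hypothetical ensemble address
    for cmd in solr_start_commands("solr1.example.com", zk, nodes_per_host=2):
        print(cmd)
```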


solr-ctl.sh

• BASH script that implements:

– start/stop Solr nodes on each EC2 instance

– sets JVM memory options, system properties (jetty.port), enable remote JMX, etc

– backup log files before restarting nodes

– ensure JVM is killed correctly before restarting

• Environment variables in: solr-ctl-env.sh


Miscellaneous Utility Tasks

• Deploy a configuration directory to ZooKeeper

• Create a new collection

• Attach a local JConsole/VisualVM to a remote JVM

• Rolling restart (with Overseer awareness)

• Build Solr locally and patch remote

– Use a relay server to scp the JARs to Amazon network once and then scp them to other nodes from within the network

• Put/get files

• Grep over all log files (across the cluster)
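The slide does not spell out what "Overseer awareness" means. One plausible reading, shown here purely as an assumption, is that the node hosting the Overseer is restarted last so the Overseer role moves at most once during the rolling restart. The restart and is_healthy callables are hypothetical hooks.

```python
def rolling_restart_order(nodes, overseer):
    """Return the nodes in restart order, with the Overseer node (if any) last."""
    others = [n for n in nodes if n != overseer]
    return others + ([overseer] if overseer in nodes else [])


def rolling_restart(nodes, overseer, restart, is_healthy):
    """Restart one node at a time, stopping if a node does not come back healthy."""
    restarted = []
    for node in rolling_restart_order(nodes, overseer):
        restart(node)
        if not is_healthy(node):
            raise RuntimeError(f"{node} did not recover; aborting rolling restart")
        restarted.append(node)
    return restarted


if __name__ == "__main__":
    cluster = ["solr1:8983", "solr2:8983", "solr3:8983"]
    done = rolling_restart(cluster, "solr1:8983",
                           restart=lambda n: print("restarting", n),
                           is_healthy=lambda n: True)
    print(done)
```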


Other useful stuff ...

• fab mine: See clusters I’m running (or for other users too)

• fab kill_mine: Terminate all instances I’m running

– Use termination protection in production

• fab ssh_to: Quick way to SSH to one of the nodes in a cluster

• fab stop/recover/kill: Basic commands for controlling specific Solr nodes in the cluster

• fab jmeter: Execute a JMeter test plan against your cluster

– Example test plan and Java sampler is included with the source


SolrCloud Tools (SolrJ client app)

• Java-based command-line application that uses SolrJ’s CloudSolrServer to perform advanced cluster management operations:

– healthcheck: collect metadata and health information from all replicas for a collection from ZooKeeper

– backup: create a snapshot of each shard in a collection for backing up to remote storage (S3)

• Framework for building complex tools that benefit from having access to cluster state information in ZooKeeper

./tools.sh –tool healthcheck
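The kind of information a health check can derive from cluster state is sketched below. It works on an already-parsed copy of /clusterstate.json (a dict in the Solr 4 layout of collection, shards, and replicas) plus the set of live node names. It is a Python illustration, not the toolkit's Java healthcheck tool, and the function name is hypothetical.

```python
def summarize_collection(clusterstate, collection, live_nodes):
    """Per shard: total replicas, replicas that are active on a live node, and leader."""
    report = {}
    for shard_name, shard in sorted(clusterstate[collection]["shards"].items()):
        active = 0
        leader = None
        for replica_name, replica in shard["replicas"].items():
            # A replica only counts if it is marked active AND its node is live.
            is_up = replica["state"] == "active" and replica["node_name"] in live_nodes
            if is_up:
                active += 1
                if replica.get("leader") == "true":
                    leader = replica_name
        report[shard_name] = {
            "replicas": len(shard["replicas"]),
            "active": active,
            "leader": leader,
        }
    return report


if __name__ == "__main__":
    state = {"test1": {"shards": {"shard1": {"replicas": {
        "core_node1": {"state": "active", "node_name": "n1:8983_solr", "leader": "true"},
        "core_node2": {"state": "down", "node_name": "n2:8983_solr"},
    }}}}}
    print(summarize_collection(state, "test1", {"n1:8983_solr", "n2:8983_solr"}))
```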


SiLK Integration

• SiLK: Solr integrated with Logstash and Kibana

– Index time-series data, such as log data (collectd, Solr logs, ...)

– Build cool dashboards with Banana (fork of Kibana)

• Easily aggregate all WARN and more severe log messages from all Solr servers into logstash4solr

• Send collectd metrics to logstash4solr


What’s Next?

• Migrate to using Apache libcloud instead of using boto directly

• Benchmark mixed work-loads (queries and indexing)

• SiLK is improving rapidly!

• Chaos monkey tests

– integrate jepsen?

• Open source so please kick the tires!


Wrap-up

• Solr Scale Toolkit: https://github.com/LucidWorks/solr-scale-tk

• LucidWorks: http://www.lucidworks.com

• SiLK: http://www.lucidworks.com/lucidworks-silk/

• Solr In Action: http://www.manning.com/grainger/

• Connect: @thelabdude / [email protected]

Questions?
