SolrCloud and Shard Splitting
DESCRIPTION
Presented on 8th June 2013 at the first Bangalore Lucene/Solr Meetup.

TRANSCRIPT
SolrCloud and Shard Splitting
Shalin Shekhar Mangar
Bangalore Lucene/Solr Meetup, 8th June 2013
Who am I?
● Apache Lucene/Solr Committer and PMC member
● Contributor since January 2008
● Currently: Engineer at LucidWorks
● Formerly with AOL
● Email: [email protected]
● Twitter: shalinmangar
● Blog: http://shal.in
SolrCloud: Overview
● Distributed searching/indexing
● No single points of failure
● Near Real Time Friendly (push replication)
● Transaction logs for durability and recovery
● Real-time get
● Atomic Updates
● Optimistic Concurrency
● Request forwarding from any node in cluster
● A strong contender for your NoSQL needs as well
Document Routing

[Diagram: a collection created with numShards=4 and router=compositeId. The 32-bit hash space is divided into four ranges (00000000-3fffffff, 40000000-7fffffff, 80000000-bfffffff and c0000000-ffffffff), one per shard. A document with id "BigCo!doc5" is hashed with MurmurHash3 to 1f273c71; the "BigCo!" shard key confines all such documents to the range 1f270000 to 1f27ffff, which lies entirely within shard1. A query can then be directed to just that shard with q=my_query&shard.keys=BigCo!]
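The routing in the diagram can be sketched in Python. This is an illustrative stand-in, not Solr's Java implementation: it assumes the compositeId router takes the top 16 bits of the routing hash from the shard key (the part before "!") and the bottom 16 bits from the hash of the rest of the id, which is why every "BigCo!" document lands in one narrow slice of the hash ring. The mapping of hash to shard is simplified to unsigned arithmetic over equal ranges.

```python
def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    n = len(data) & ~3
    for i in range(0, n, 4):                       # process 4-byte blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    k, tail = 0, data[n:]                          # mix the remaining bytes
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= len(data)                                 # finalization (avalanche)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

def route_hash(doc_id: str) -> int:
    """CompositeId-style: 16 bits from the shard key, 16 from the doc id."""
    if "!" in doc_id:
        key, rest = doc_id.split("!", 1)
        return ((murmur3_32(key.encode()) & 0xFFFF0000)
                | (murmur3_32(rest.encode()) & 0x0000FFFF))
    return murmur3_32(doc_id.encode())

def shard_for(doc_id: str, num_shards: int = 4) -> int:
    """Map the routing hash onto num_shards equal ranges of the 32-bit space."""
    return route_hash(doc_id) * num_shards // 2**32

# All documents sharing the "BigCo!" prefix route to the same shard:
assert shard_for("BigCo!doc5") == shard_for("BigCo!doc6")
```

Because the shard key fixes the top 16 bits, all "BigCo!" documents occupy one 65536-wide slice of the ring, so a query with shard.keys=BigCo! only needs to touch the shard owning that slice.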
SolrCloud Collections API
● /admin/collections?action=CREATE&name=mycollection
  – &numShards=3
  – &replicationFactor=4
  – &maxShardsPerNode=2
  – &createNodeSet=node1:8080,node2:8080,node3:8080,...
  – &collection.configName=myconfigset
● /admin/collections?action=DELETE&name=mycollection
● /admin/collections?action=RELOAD&name=mycollection
● /admin/collections?action=CREATEALIAS&name=south
  – &collections=KA,TN,AP,KL,...
● Coming soon: Shard aliases
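The CREATE call above can be assembled programmatically. A minimal sketch; the helper function and host names are placeholders, and actually issuing the request against a live cluster is omitted:

```python
from urllib.parse import urlencode

def collections_api_url(host: str, action: str, **params: str) -> str:
    """Build a Collections API URL (hypothetical helper, admin path assumed)."""
    query = urlencode({"action": action, **params})
    return f"http://{host}/solr/admin/collections?{query}"

# Mirror the CREATE example from the slide:
url = collections_api_url(
    "node1:8080", "CREATE",
    name="mycollection",
    numShards="3",
    replicationFactor="4",
    maxShardsPerNode="2",
    createNodeSet="node1:8080,node2:8080,node3:8080",
    **{"collection.configName": "myconfigset"},  # dotted key needs ** syntax
)
assert "action=CREATE" in url and "numShards=3" in url
```

Note that urlencode percent-escapes the ':' and ',' characters in createNodeSet, which Solr accepts.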
Shard Splitting: Background
● Before Solr 4.3, the number of shards had to be fixed at the time of collection creation
● This forced people to start with a large number of shards
● If a shard ran too hot, the only fix was to re-index and re-balance the entire collection
● Each shard is assigned a hash range
● Each shard also has a state which defaults to 'ACTIVE'
Shard Splitting: Features
● Seamless on-the-fly splitting – no downtime required
● Retried on failures
● /admin/collections?action=SPLITSHARD&collection=mycollection
  – &shard=shardId
● A lower-level CoreAdmin API comes free!
  – /admin/cores?action=SPLIT&core=core0&targetCore=core1&targetCore=core2
  – /admin/cores?action=SPLIT&core=core0&path=/path/to/index/1&path=/path/to/index/2
Shard Splitting

[Diagram: a three-shard cluster (Shard1, Shard2, Shard3), each shard with a leader and a replica. Shard2 is being split into sub-shards Shard2_0 and Shard2_1 while updates continue to arrive at the Shard2 leader.]
Shard Splitting: Mechanism
● New sub-shards created in “construction” state
● Leader starts forwarding applicable updates, which are buffered by the sub-shards
● Leader index is split and installed on the sub-shards
● Sub-shards apply buffered updates
● Replicas are created for sub-shards and brought up to speed
● Sub-shards become “active” and the old shard becomes “inactive”
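The steps above can be modeled as a toy state machine. The class and method names here are illustrative assumptions for exposition, not Solr internals:

```python
class SubShard:
    """Toy model of one sub-shard during a split (not Solr's implementation)."""

    def __init__(self):
        self.state = "construction"
        self.docs = {}      # will hold this sub-shard's slice of the parent index
        self.buffer = []    # updates forwarded by the parent leader during the split

    def forward_update(self, doc_id, doc):
        if self.state == "construction":
            self.buffer.append((doc_id, doc))   # buffered, not yet searchable
        else:
            self.docs[doc_id] = doc             # normal indexing once active

    def install_split_index(self, docs):
        self.docs = dict(docs)                  # parent's split index installed

    def apply_buffered_and_activate(self):
        for doc_id, doc in self.buffer:         # replay updates buffered above
            self.docs[doc_id] = doc
        self.buffer.clear()
        self.state = "active"

# An update arriving mid-split is buffered, then applied after the index lands:
sub = SubShard()
sub.forward_update("doc9", {"f": 1})
sub.install_split_index({"doc1": {"f": 0}})
sub.apply_buffered_and_activate()
assert sub.state == "active" and "doc9" in sub.docs
```

The key property the buffering preserves: no update sent to the parent leader during the split is lost, whether it arrives before or after the index is physically split.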
Shard Splitting: Tips and Gotchas
● Supports collections with a hash-based router, i.e. the “plain” or “compositeId” routers
● The operation is executed by the Overseer node, not by the node that received your request
● The HTTP request is synchronous but the operation is async; a read timeout does not mean failure!
● The operation is retried on failure. Check the parent leader's logs before you re-issue the command or you may end up with more shards than you want
Shard Splitting: Tips and Gotchas
● Solr Admin GUI is not aware of shard states yet so the inactive parent shard is also shown in “green”
● The CoreAdmin split command can be used against non-cloud deployments. It will spread docs alternately among the sub-indexes
● Inactive shards have to be cleaned up manually. Solr 4.4 will have a delete shard API
● Shard splitting in the 4.3 release is buggy. Wait for 4.3.1
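The alternating spread used by the non-cloud CoreAdmin split can be sketched as a round-robin distribution; the function name is hypothetical:

```python
def split_alternately(doc_ids, num_targets):
    """Spread documents alternately (round-robin) across target sub-indexes."""
    targets = [[] for _ in range(num_targets)]
    for i, doc_id in enumerate(doc_ids):
        targets[i % num_targets].append(doc_id)   # alternate among targets
    return targets

# Five docs split into two targets, alternating:
halves = split_alternately(["d0", "d1", "d2", "d3", "d4"], 2)
assert halves == [["d0", "d2", "d4"], ["d1", "d3"]]
```

With no hash ranges available outside SolrCloud, alternating is a simple way to get roughly equal sub-index sizes.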
Shard Splitting: Looking towards the future
● GUI integration and better progress reporting/monitoring
● Better support for custom sharding use-cases
● More flexibility in the number of sub-shards, hash ranges, number of replicas, etc.
● Store replication factor per shard
● Suggest splits to admins based on cluster state and load
About LucidWorks
• Intro to LucidWorks (formerly Lucid Imagination)
  – Follow: @lucidworks, @lucidimagineer
  – Learn: http://www.lucidworks.com
• Check out SearchHub: http://www.searchhub.org
• Solr 4.1 Reference Guide: http://bit.ly/11KSiMN
  – Older versions: http://bit.ly/12t1Egq
• Our Products
  – LucidWorks Search
  – LucidWorks Big Data
• Lucene Revolution
  – http://www.lucenerevolution.com
Thank you
Shalin Shekhar Mangar
LucidWorks