Queue Based Solr Indexing with Collection Management: Presented by Devansh Dhutia, Gannett Co.
Agenda
• Solr @ Gannett • Current State • Collection Management • Queuing Solution • Future Work • Questions
Solr @ Gannett
• Site Search • CMS Search • Analytics • Personalization
• 40+ applications: integral pillar of Gannett’s Digital Platform • 20M+ total documents • 800,000+ per month: growing rapidly • 100,000+ requests per minute: highly available • ~100 ms average response time: extremely fast • 8 nodes • 256 GB memory per availability zone
Current State
• Synchronous Operations • Near Realtime • Time-consuming schema changes • Visible outage impact
Collection Management
• Create Collection • Deploy Batch Indexer • Index new Collection • Update Alias to new Collection • Run catch up • Deploy Search/Index Apps
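The create and alias-swap steps map onto two Solr Collections API actions, CREATE and CREATEALIAS. The following is a minimal sketch that only builds the request URLs; the base URL, shard and replica counts, and the collection, alias, and configset names are assumptions for illustration and do not come from the talk.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: a local SolrCloud node


def collections_api_url(action: str, **params: str) -> str:
    """Build a Solr Collections API URL for the given action."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def rollover_plan(alias: str, new_collection: str, config_name: str) -> list[str]:
    """Return the API calls for the 'create' and 'update alias' steps."""
    create = collections_api_url(
        "CREATE",
        name=new_collection,
        numShards="2",          # hypothetical sizing
        replicationFactor="2",  # hypothetical sizing
        **{"collection.configName": config_name},
    )
    # Batch indexing into new_collection would run between these two calls.
    swap = collections_api_url(
        "CREATEALIAS", name=alias, collections=new_collection
    )
    return [create, swap]


if __name__ == "__main__":
    for url in rollover_plan("news", "news_v2", "news_conf"):
        print(url)
```

Because search clients query the alias rather than a concrete collection, pointing the alias at the new collection switches traffic without a client redeploy; the catch-up run then replays whatever was written while the batch index was building.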
Outage Problems
• Spinning Wheel • Duplicate content • Unable to find new content • Frustrated editors • UX & other presentation layers
Enter Queues
• Asynchronous Write Operations • Near Realtime • Faster schema changes • Auto scale indexing workers • Low authoring outage impact • Eventually consistent
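The shift from synchronous to asynchronous writes can be illustrated with an in-process queue. This is only a sketch using Python's standard library in place of a real broker; `publish`, `indexing_worker`, and the `indexed` list (a stand-in for the Solr collection) are hypothetical names.

```python
import queue
import threading

index_queue = queue.Queue()  # stand-in for the message broker
indexed = []                 # stand-in for the Solr collection


def publish(doc: dict) -> None:
    """Authoring side: enqueue the document and return immediately."""
    index_queue.put(doc)


def indexing_worker() -> None:
    """Drain the queue in FIFO order; None is the shutdown signal."""
    while True:
        doc = index_queue.get()
        if doc is None:
            break
        indexed.append(doc)  # a real worker would send the doc to Solr here


def run_demo() -> list:
    worker = threading.Thread(target=indexing_worker)
    worker.start()
    for i in range(3):
        publish({"id": f"story-{i}"})
    index_queue.put(None)  # tell the worker to stop after the backlog
    worker.join()
    return [d["id"] for d in indexed]


if __name__ == "__main__":
    print(run_demo())
```

The author's save returns as soon as the message is queued, so a slow or unavailable index delays visibility of the document rather than blocking the editor. That trade is what "eventually consistent" refers to.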
RabbitMQ
• Clustered & Highly Available • FIFO • pub/sub model • Consistent Hash / Multiple Queues
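Consistent hashing across multiple queues keeps every update for a given document on the same queue, so per-document FIFO order survives even with several workers. RabbitMQ offers this through its consistent hash exchange plugin; the sketch below is not that plugin, just a plain-Python hash ring showing the idea, with hypothetical queue names and document IDs.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring that maps document IDs to queue names."""

    def __init__(self, queues, points_per_queue=100):
        # Each queue owns many points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{q}#{i}"), q)
            for q in queues
            for i in range(points_per_queue)
        )
        self._keys = [h for h, _ in self._ring]

    def queue_for(self, doc_id: str) -> str:
        # The first ring point clockwise from the ID's hash owns the ID.
        idx = bisect.bisect(self._keys, _hash(doc_id)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["index.q1", "index.q2", "index.q3"])
    print(ring.queue_for("story-42"))
```

Adding a queue to the ring only reassigns the IDs that land on the new queue's points; every other ID keeps its existing queue.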
Components
• Realtime Queue • Batch Queue • Prep Queue • Deadletter Queue • Indexing Service • Prep mode • Batch Push Service
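How a message might move from the realtime queue to the deadletter queue can be sketched as follows. The retry limit, message shape, and the `drain` and `index_doc` names are assumptions for illustration; the slides do not describe the actual retry policy.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry limit


def drain(realtime: deque, deadletter: deque, index_doc) -> int:
    """Index every message; park messages that keep failing in deadletter."""
    done = 0
    while realtime:
        msg = realtime.popleft()
        try:
            index_doc(msg["doc"])
            done += 1
        except Exception as exc:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                msg["error"] = str(exc)  # keep the reason for later triage
                deadletter.append(msg)
            else:
                realtime.append(msg)  # retry later, behind newer messages
    return done


if __name__ == "__main__":
    def index_doc(doc):
        if doc["id"] == "bad":
            raise ValueError("schema mismatch")

    rt = deque({"doc": {"id": i}} for i in ("a", "bad", "b"))
    dl = deque()
    print(drain(rt, dl, index_doc), list(dl))
```

Requeueing a failed message at the back keeps one bad document from blocking the rest, at the cost of strict ordering for that document.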
Future Work
• Continuous Delivery of schema • Build payload in one zone only • Automated Deadletter handling • Earlier notification of potential failure