NetDocuments: Journey from FAST to Solr

Journey from FAST to Solr

Presented by: David Hamson, Mou Nandi

Goal of the Session

•  NetDocuments
•  Why move to Solr from FAST
•  Architecting Solr to work as a core module for a cloud document management product: user interface building and document discovery
•  Testing and benchmarking Solr to scale and perform for billions of documents with 200 QPS and 200 DPS
•  Lessons learned / shortcuts found migrating from FAST to Solr

2/14

Who We Are

A leading cloud content management and collaboration service for small to medium businesses (SMB) and professional services firms

Who We Serve

We serve over 1,000 customers across 128 countries and host more than 250 million documents.

Why Migrate to Solr

•  Product roadmap does not fit with company roadmap
•  Large hardware footprint, expensive to scale
•  High indexing latency
•  Unpredictable and untraceable document loss
•  A black-box search engine, with dependency on the Microsoft FAST support team
•  No control over new features
•  Expensive license

•  Solr supports massive indexes
•  Active, hardworking development community
•  Access to what's happening under the hood
•  Improved hardware footprint
•  Reduced licensing cost

Migration to Solr

[Diagram: existing FAST architecture. Labels: ND Document, FAST Doc Processors, FIXML, FAST Indexer, FAST Instance 1, FAST Instance 2, More FAST Instances, MDI + FTI]

•  95% of searches are metadata searches; the metadata index does not need rich text processing
•  Flexibility to implement different architectures for MDI and FTI
•  Even the highest level of logging cannot trace the document loss during heavy feeding traffic
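
The split between a metadata index (MDI) and a full-text index (FTI) can be illustrated with a minimal Python sketch. It assumes each ND document is turned into two index documents that share an id. The `NDDocument` class, the `body` field name, and the choice to keep metadata out of the FTI document are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class NDDocument:
    """Hypothetical in-memory form of an ND document."""
    doc_id: str
    metadata: dict   # e.g. cabinet, ACL, dates; needs no rich text processing
    body_text: str   # extracted text; goes through the text processing pipeline


def split_for_indexes(doc):
    """Return (md_doc, ft_doc): one document per index, joined by a shared id."""
    md_doc = {"id": doc.doc_id, **doc.metadata}          # destined for the MDI
    ft_doc = {"id": doc.doc_id, "body": doc.body_text}   # destined for the FTI
    return md_doc, ft_doc


if __name__ == "__main__":
    doc = NDDocument("d1", {"ndcabinets": ["cab1"]}, "contract text")
    md_doc, ft_doc = split_for_indexes(doc)
    print(md_doc)
    print(ft_doc)
```

Keeping the two documents separate is what would let the MDI and the FTI use different hardware, schemas, and commit settings.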

Migration to Solr – Solr Indexing

[Diagram: Solr indexing flow. Labels: ND Document, ND Pipeline, Aspire, Solr MD XML, Solr FT XML, Solr MD Instances (MDI), Solr FT Instances (FTI)]

The Migration Project

Phase 1 - MDI

•  Only create MDI
•  Use FAST data to prototype Solr
•  Use the FIXMLs to build the Solr index
•  Use 100% filter queries

Phase 2 – FTI

•  Build a robust feeding pipeline to handle both MD and FT
•  Build a text processing pipeline

Phase 3

•  Implement new Solr features

Some ft. view of NetDocuments Search Architecture

[Diagram: NDPipeline. Labels: Web App, File System, Web Queue, MD Handler Pool (MDH1 to MDH5), FT Queue, FT Processor Pool (FTP1 to FTP5), Dispatcher Queue, Dispatcher Pool (D1 to D5), Solr MDI, Solr FTI, Query Distributor, Administration (monitoring, debugging, stats)]

Benchmarking Solr Config Parameters for Indexing

•  Created Solr indexes from the FIXMLs with different RAM buffer, merge factor, and autocommit configurations

Testing with HDD and SSD

•  We did not see any performance difference between HDD (15k rpm) and the ioDrive2 with ND documents
•  15 threads running at a time from the client feeder application
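
A client feeder that keeps 15 threads busy can be sketched in Python as below. This is only an illustration of the idea, not the feeder used in the benchmark: the batch size, the `feed` and `batches` helpers, and the `fake_send` stand-in (which would be replaced by an HTTP POST to a Solr update handler) are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

FEEDER_THREADS = 15  # thread count reported on the slide


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def feed(docs, send_batch, batch_size=100, threads=FEEDER_THREADS):
    """Send docs in batches from a fixed thread pool; return the number sent.

    `send_batch` takes a list of documents and returns how many it accepted.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send_batch, batches(docs, batch_size)))


def fake_send(batch):
    """Hypothetical stand-in for posting one batch to Solr."""
    return len(batch)


if __name__ == "__main__":
    docs = [{"id": str(i)} for i in range(1050)]
    print(feed(docs, fake_send))  # prints 1050
```

Passing the sender in as a function keeps the threading logic testable without a running Solr instance.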

Testing Using Different File Systems

•  We did not see a large performance difference between ext3 and XFS on HDD or SSD with ND documents
•  We chose ext3 for FTI, on 15K HDD in RAID 10
•  We use XFS on the ioDrive for MDI, as suggested by Fusion-io

Benchmarking Solr Indexing and Query Process

Metric                   Search over 5 shards   Search over 10 shards
SolrMeter instances      5                      10
Queries per shard        3,000/min              1,500/min
Total queries            15,000/min             15,000/min
Avg. response time       8 ms                   12 ms
CPU                      20%                    32%
RAM                      52 GB                  53 GB
Cache warm-up time       2.5 s                  2.7 s
Cache hit ratio          0.98                   0.98
Cache size               2,276                  2,276
Evictions                none                   none
Index update interval    7 s                    7 s
Test duration            5 min                  8 min
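
To relate these per-minute figures to the 200 QPS goal stated at the start of the session, the arithmetic can be written out as a short Python sketch. The numbers come from the two benchmark runs above; the function and variable names are only illustrative.

```python
TARGET_QPS = 200  # goal stated at the start of the session

# (number of shards, queries per minute served by each shard)
RUNS = {
    "5 shards": (5, 3000),
    "10 shards": (10, 1500),
}


def per_second(per_minute):
    """Convert a per-minute rate to a per-second rate."""
    return per_minute / 60


def total_qps(shards, per_shard_qpm):
    """Aggregate queries per second across all shards."""
    return per_second(shards * per_shard_qpm)


if __name__ == "__main__":
    for name, (shards, qpm) in RUNS.items():
        total = total_qps(shards, qpm)
        print(f"{name}: {per_second(qpm):.0f} QPS per shard, "
              f"{total:.0f} QPS total, meets target: {total >= TARGET_QPS}")
```

Both runs work out to 250 queries per second in total (50 per shard with 5 shards, 25 per shard with 10), which is above the 200 QPS target.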

Implemented multi-core index processing and compared its indexing and query performance with a single-core index

Benchmark: QTime increase as Solr scales and start row increases

QTime does not vary much as the start row increases.

Tuning System queries for Solr

•  System searches are metadata searches
•  Thousands of real-life queries were extracted from the FAST query log
•  Extensive use of filter queries and the filter cache gives excellent response times for complex queries

•  Example queries:

FAST query: ANDNOT(ANDNOT(ANDNOT(AND(AND(ndcabinets:string("cab1", mode="and"),ndcredate:range(2011-09-26T00:00:00,2012-04-13T23:59:59)),FILTER(ndacl:string("acl1 acl2 acl3", mode="OR"))),nddeletedcabs:string("cab1", mode="and")),ndexten:string("ndws", mode="and")),ndexten:string("ndflt", mode="and"))

Solr query: http://solrserver:port/solrSearch/core0/select?shards=solrserver:port/solrSearch/core0,1solrserver:port/solrSearch/core1&start=0&rows=500&fl=ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std&sort=ndlastmoddate_tdt_idx+desc&q=ndenvurl:*&fq=ndcabinets_smulti_idx:cab1&fq=ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]&fq={!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2 OR ndacl_smulti_idx:acl3)&fq=-nddeletedcabs_smulti_idx:cab1&fq=-ndexten_s_idx:ndws&fq=-ndexten_s_idx:ndflt
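
The pattern in the Solr example (a match-all main query with every restriction moved into its own `fq` filter) can be sketched in Python as follows. This is an illustration only: the `build_select_params` helper is hypothetical, the `_smulti_idx` spelling of the field suffix is an assumption, the `shards` parameter is left out, and values are assumed to need no query escaping.

```python
from urllib.parse import urlencode


def build_select_params(cabinet, created_from, created_to, acls,
                        excluded_cabinets, excluded_extensions,
                        start=0, rows=500):
    """Return Solr /select parameters as a list of (name, value) pairs.

    Every restriction is a separate fq so Solr can cache and reuse each one
    independently; the ACL filter is marked uncached with a high cost, as in
    the example query.
    """
    params = [
        ("q", "ndenvurl:*"),
        ("start", start),
        ("rows", rows),
        ("fl", "ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std"),
        ("sort", "ndlastmoddate_tdt_idx desc"),
        ("fq", f"ndcabinets_smulti_idx:{cabinet}"),
        ("fq", f"ndcredate_tdt_idx:[{created_from} TO {created_to}]"),
    ]
    acl_clause = " OR ".join(f"ndacl_smulti_idx:{acl}" for acl in acls)
    params.append(("fq", "{!cache=false cost=100}(" + acl_clause + ")"))
    # Each ANDNOT branch of the FAST query becomes a negative filter query.
    params += [("fq", f"-nddeletedcabs_smulti_idx:{c}") for c in excluded_cabinets]
    params += [("fq", f"-ndexten_s_idx:{e}") for e in excluded_extensions]
    return params


if __name__ == "__main__":
    params = build_select_params(
        "cab1", "2011-09-26T00:00:00Z", "2012-04-13T23:59:59Z",
        ["acl1", "acl2", "acl3"], ["cab1"], ["ndws", "ndflt"],
    )
    print(urlencode(params))  # URL-encoded query string for /select
```

Returning a list of pairs rather than a dict matters here because `fq` is repeated; `urlencode` preserves each occurrence.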

THANK YOU
