NetDocuments: Journey from FAST to Solr

Journey from FAST to Solr. Presented by: David Hamson, Mou Nandi

Upload: lucenerevolution

Post on 07-Jul-2015


DESCRIPTION

Presented by David Hamson and Mou Nandi, NetDocuments. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

NetDocuments, a SaaS document management company, is migrating its large document repository from Microsoft FAST to Solr. In this presentation, the speakers discuss the entire process, including major decision points and lessons learned. The migration is a two-phase implementation. The first phase is a shortcut: moving the FAST XML data directly to Solr to make a Solr metadata index available quickly. The second phase implements the full architecture, including both metadata and full-text processing and search. The presenters talk about architecting Solr to meet the company's requirements of scaling to billions of work-product documents, low indexing latency, and high availability. NetDocuments uses the search engine both to build the user experience and for document discovery by users. Solr was architected to scale and perform for these two very different needs and to match all the features and functionality available with FAST. Finally, the presenters share benchmark results from tests run on various hardware configurations and file systems, along with results from search quality testing of Solr on a single server, with both a single Solr core and multiple Solr cores.

TRANSCRIPT

Page 1: NetDocuments- Journey from FAST to Solr

Journey from FAST to Solr

Presented by: David Hamson, Mou Nandi

Page 2: NetDocuments- Journey from FAST to Solr

Goal of the Session

• NetDocuments
• Why move to Solr from FAST
• Architecting Solr to work as a core module for a cloud document management product: user interface building and document discovery
• Testing and benchmarking Solr to scale and perform for billions of documents with 200 QPS and 200 DPS
• Lessons learned and shortcuts found migrating from FAST to Solr

2/14

Page 3: NetDocuments- Journey from FAST to Solr

Who We Are

A leading cloud content management and collaboration service for small to medium businesses (SMB) and professional services firms

Page 4: NetDocuments- Journey from FAST to Solr

Who We Serve

We serve over 1,000 customers across 128 countries and host more than 250 million documents.

Page 5: NetDocuments- Journey from FAST to Solr

Why Migrate to Solr

• Product roadmap does not fit the company roadmap
• Large hardware footprint, expensive to scale
• High indexing latency
• Unpredictable and untraceable document loss
• A black-box search engine, with dependency on the Microsoft FAST support team
• No control over new features
• Expensive license

• Solr supports massive indexes
• Active, hardworking development community
• Access to what's happening under the hood
• Improved hardware footprint
• Reduced licensing cost

Page 6: NetDocuments- Journey from FAST to Solr

Migration to Solr

[Diagram: FAST indexing flow. Labels: ND Document, FAST Doc Processors, FIXML, FAST Indexer, MDI + FTI; shown for FAST Instance 1, FAST Instance 2, and more FAST instances]

• 95% of searches are metadata searches; the metadata index does not need rich text processing

• Flexibility to implement different architectures for MDI and FTI

• Even the highest level of logging cannot trace document loss during heavy feeding traffic

Page 7: NetDocuments- Journey from FAST to Solr

Migration to Solr – Solr Indexing

[Diagram: Solr indexing flow. Labels: ND Document, ND Pipeline, Aspire, Solr MD XML, Solr FT XML, Solr MD Instance 1 (Solr MDI), Solr FT Instance (Solr FTI)]

Page 8: NetDocuments- Journey from FAST to Solr

The Migration Project

Phase 1 - MDI

• Only create MDI
• Use FAST data to prototype Solr
• Use the FIXMLs to build the Solr index
• Use 100% filter queries

Phase 2 - FTI

• Build a robust feeding pipeline to handle both MD and FT
• Build a text processing pipeline

Phase 3

• Implement new Solr features
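As a rough illustration of the Phase 1 step of building the Solr index from FAST data, the sketch below turns already-extracted metadata into a Solr XML update message (`<add><doc><field name="...">`). It assumes the FIXML parsing has happened elsewhere; the function name `to_solr_add_xml` and the example values are hypothetical, and the field names are borrowed from the example query later in the deck.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(fields):
    """Build a Solr <add><doc>...</doc></add> update message from a field dict."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        # Multi-valued fields are emitted as repeated <field> elements.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical metadata, assumed to be already extracted from one FIXML file.
example = {
    "ndenvurl": "/doc/0001",
    "ndcabinets_smulti_idx": ["cab1", "cab2"],
    "ndexten_s_idx": "docx",
}
solr_xml = to_solr_add_xml(example)
print(solr_xml)
```

The resulting string could be posted to a Solr update handler; how NetDocuments actually fed Solr is not described on the slide.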

Page 9: NetDocuments- Journey from FAST to Solr

Some ft. view of NetDocuments Search Architecture

[Diagram: NDPipeline. Labels: Web App, File System, Web Queue, MD Handler Pool (MDH1-MDH5), FT Queue, FT Processor Pool (FTP1-FTP5), Dispatcher Queue, Dispatcher Pool (D1-D5), Solr MDI, Solr FTI, Query Distributor, Administration (monitoring, debugging, stats)]
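The queue-and-pool layout in this diagram can be sketched with standard Python threads. This is only a minimal illustration under assumptions: the wiring (web queue to MD handlers, MD handlers to the FT queue and dispatcher queue, dispatchers to Solr) is inferred from the diagram labels, all function names are hypothetical, and the `sent` list stands in for real posts to Solr.

```python
import queue
import threading

POOL_SIZE = 5  # the diagram shows five workers per pool (MDH1-5, FTP1-5, D1-5)


def start_pool(name, inbox, handle, size=POOL_SIZE):
    """Start `size` worker threads that drain `inbox` until they see None."""
    def worker():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: shut this worker down
                break
            handle(item)
    threads = [threading.Thread(target=worker, name=f"{name}{i + 1}")
               for i in range(size)]
    for t in threads:
        t.start()
    return threads


def stop_pool(threads, inbox):
    """Send one sentinel per worker, then wait for all workers to exit."""
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()


web_queue, ft_queue, dispatch_queue = queue.Queue(), queue.Queue(), queue.Queue()
sent = []                     # stand-in for documents posted to Solr
sent_lock = threading.Lock()


def md_handler(doc):
    dispatch_queue.put(("MDI", doc))  # metadata goes to the metadata index
    ft_queue.put(doc)                 # then on to full-text processing (assumed)


def ft_processor(doc):
    dispatch_queue.put(("FTI", doc))


def dispatcher(item):
    with sent_lock:
        sent.append(item)


pools = [
    (start_pool("MDH", web_queue, md_handler), web_queue),
    (start_pool("FTP", ft_queue, ft_processor), ft_queue),
    (start_pool("D", dispatch_queue, dispatcher), dispatch_queue),
]
for doc_id in range(20):
    web_queue.put(f"doc-{doc_id}")
for threads, inbox in pools:  # stop upstream pools first so nothing is lost
    stop_pool(threads, inbox)
print(len(sent))
```

Stopping the pools in upstream-to-downstream order means every document has been forwarded before the next stage receives its shutdown sentinels.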

Page 10: NetDocuments- Journey from FAST to Solr

Benchmarking Solr Config Parameter for indexing

• Created Solr indexes from FIXMLs with different RAM buffer, merge factor, and auto-commit configurations

Testing with HDD and SSD

• We did not see any performance difference between HDD (15k rpm) and the ioDrive2 with ND documents
• 15 threads running at a time from the client feeder application
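One way to organize an indexing benchmark over RAM buffer, merge factor, and auto-commit settings is to enumerate every combination up front. The sketch below does that; the candidate values are hypothetical (the slides do not list the settings actually tested), and `autoCommitMaxTimeMs` is just a label chosen here for the auto-commit interval.

```python
from itertools import product

# Hypothetical candidate values; the slides do not say which were tested.
RAM_BUFFER_MB = [32, 128, 512]
MERGE_FACTOR = [10, 25]
AUTO_COMMIT_MS = [7000, 60000]


def benchmark_grid():
    """Yield one configuration dict per combination of the three settings."""
    for ram, merge, commit in product(RAM_BUFFER_MB, MERGE_FACTOR, AUTO_COMMIT_MS):
        yield {
            "ramBufferSizeMB": ram,
            "mergeFactor": merge,
            "autoCommitMaxTimeMs": commit,
        }


grid = list(benchmark_grid())
print(len(grid))
```

Each dict would correspond to one index build from the same FIXML set, so that indexing times can be compared across configurations.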

Page 11: NetDocuments- Journey from FAST to Solr

Testing using different file systems

• We did not see a large performance difference between ext3 and xfs on HDD or SSD with ND documents
• We chose ext3 for the FTI, with 15K HDD on RAID 10
• We are using xfs on the ioDrive for the MDI, as suggested by Fusion-io

Page 12: NetDocuments- Journey from FAST to Solr

Benchmarking Solr Indexing and Query Process

Metric                   5 shards      10 shards
Search going to          5 shards      10 shards
SolrMeter instances      5             10
Queries per shard        3000/min      1500/min
Total queries            15000/min     15000/min
Avg response time        8 ms          12 ms
CPU                      20%           32%
RAM                      52 G          53 G
Cache warmup time        2.5 s         2.7 s
Cache hit ratio          .98           .98
Cache size               2276          2276
Eviction                 none          none
Index updated every      7 sec         7 sec
Test ran                 5 min         8 min

Implemented multi-core index processing and compared its indexing and query performance against a single-core index
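The figures on this slide can be checked against the 200 QPS goal from the start of the session with a little arithmetic. The sketch below is only a worked calculation on the reported numbers; the function and variable names are made up for illustration.

```python
def queries_per_second(shards, per_shard_per_min):
    """Aggregate query rate across shards, converted from per-minute to per-second."""
    return shards * per_shard_per_min / 60


TARGET_QPS = 200  # goal stated at the start of the session

# Numbers taken from the two benchmark runs on this slide.
runs = {
    "5 shards": {"shards": 5, "per_shard_per_min": 3000, "avg_ms": 8, "cpu_pct": 20},
    "10 shards": {"shards": 10, "per_shard_per_min": 1500, "avg_ms": 12, "cpu_pct": 32},
}

for label, run in runs.items():
    qps = queries_per_second(run["shards"], run["per_shard_per_min"])
    print(f"{label}: {qps:.0f} QPS, meets target: {qps >= TARGET_QPS}")

base_ms = runs["5 shards"]["avg_ms"]
latency_increase = (runs["10 shards"]["avg_ms"] - base_ms) / base_ms
print(f"avg response time increase from 5 to 10 shards: {latency_increase:.0%}")
```

Both runs work out to 250 queries per second in total, above the 200 QPS goal, while going from 5 to 10 shards raised the average response time from 8 ms to 12 ms (50%) and CPU from 20% to 32%.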

Page 13: NetDocuments- Journey from FAST to Solr

Benchmark: QTime increase as Solr scales and start row increases

QTime does not vary much as the start row increases.

Page 14: NetDocuments- Journey from FAST to Solr

Tuning System queries for Solr

• System searches are metadata searches
• Thousands of real-life queries were extracted from the FAST query log
• Extensive use of filter queries and the filter cache gives excellent response time for complex queries
• Example queries:

FAST Query:
ANDNOT(ANDNOT(ANDNOT(AND(AND(ndcabinets:string("cab1", mode="and"),ndcredate:range(2011-09-26T00:00:00,2012-04-13T23:59:59)),FILTER(ndacl:string("acl1 acl2 acl3",mode="OR"))),nddeletedcabs:string("cab1", mode="and")),ndexten:string("ndws", mode="and")),ndexten:string("ndflt", mode="and"))

Solr Query:
http://solrserver:port/solrSearch/core0/select?shards=solrserver:port/solrSearch/core0,solrserver:port/solrSearch/core1&start=0&rows=500&fl=ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std&sort=ndlastmoddate_tdt_idx+desc&q=ndenvurl:*&fq=ndcabinets_smulti_idx:cab1&fq=ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]&fq={!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2 OR ndacl_smulti_idx:acl3)&fq=-nddeletedcabs_smulti_idx:cab1&fq=-ndexten_s_idx:ndws&fq=-ndexten_s_idx:ndflt
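The Solr query on this slide is a match-all style `q` plus one `fq` per constraint, which is what lets the filter cache do the work. A minimal sketch of assembling such a URL with Python's standard library follows; the helper name `build_select_url` and the port `8983` are assumptions (the slide only says `solrserver:port`), and the ACL filter is marked `cache=false` as in the slide.

```python
from urllib.parse import urlencode


def build_select_url(base, shards, filters, fields, sort, start=0, rows=500):
    """Assemble a distributed Solr select URL with one fq parameter per filter."""
    params = [
        ("shards", ",".join(shards)),
        ("start", start),
        ("rows", rows),
        ("fl", ",".join(fields)),
        ("sort", sort),
        ("q", "ndenvurl:*"),
    ]
    params += [("fq", fq) for fq in filters]
    return base + "/select?" + urlencode(params)


HOST = "solrserver:8983"  # hypothetical host and port

filters = [
    "ndcabinets_smulti_idx:cab1",
    "ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]",
    # Per-user ACL clause: not cached, and given a high cost as on the slide.
    "{!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2"
    " OR ndacl_smulti_idx:acl3)",
    "-nddeletedcabs_smulti_idx:cab1",
    "-ndexten_s_idx:ndws",
    "-ndexten_s_idx:ndflt",
]

url = build_select_url(
    base=f"http://{HOST}/solrSearch/core0",
    shards=[f"{HOST}/solrSearch/core0", f"{HOST}/solrSearch/core1"],
    filters=filters,
    fields=["ndenvurl", "nddocmodnum_s_std", "ndtitle_t_idx_std"],
    sort="ndlastmoddate_tdt_idx desc",
)
print(url)
```

Unlike the hand-written URL on the slide, `urlencode` percent-encodes characters such as `:`, `[`, and `{`, which Solr decodes back to the same parameter values.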

Page 15: NetDocuments- Journey from FAST to Solr
Page 16: NetDocuments- Journey from FAST to Solr

THANK YOU