Content Search for Business Using Solr: Presented by Wei Zhao, Box
Box mission
To make organizations more productive, competitive and collaborative by connecting people and their most important information.
Agenda
1 Sharding – splitting the index
2 Highly available search
3 A few more things
4 Currently working on
5 Q&A
A typical Solr document
File ID: 12345
Owner ID: user1
Parent Folder IDs: folder1, folder2
File Name: Solr.ppt
File Content: blah
...
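As a rough sketch, a document like the one above could be built as a Python dict and sent to Solr's JSON update handler, which accepts a JSON array of documents. The field names (`id`, `owner_id`, `parent_folder_ids`, `file_name`, `file_content`) and the helper names are hypothetical; the talk does not show Box's actual schema.

```python
import json

def make_solr_doc(file_id, owner_id, parent_folder_ids, file_name, file_content):
    """Build one Solr document for a file (hypothetical field names)."""
    return {
        "id": str(file_id),                              # unique key of the document
        "owner_id": owner_id,                            # single-valued owner field
        "parent_folder_ids": list(parent_folder_ids),    # multi-valued: ancestor folders
        "file_name": file_name,
        "file_content": file_content,                    # extracted text for full-text search
    }

def to_update_payload(docs):
    """Serialize documents as a JSON array, the shape Solr's /update handler accepts."""
    return json.dumps(docs)

doc = make_solr_doc(12345, "user1", ["folder1", "folder2"], "Solr.ppt", "blah")
payload = to_update_payload([doc])
print(payload)
```

Keeping the parent folder IDs as a multi-valued field lets a query filter on "any of these folders" without joining against another data source at search time.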
User1 with no shared folder
• File 1 (Owner: User1, Parent: Folder1)
• File 2 (Owner: User2, Parent: Folder3)
• File 3 (Owner: User2, Parent: Folder2)
• File 4 (Owner: User1, Parents: Folder1, Folder4)
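One plausible way to use the owner and parent-folder fields is a permission filter query (`fq`): a user sees a file if they own it, or if one of its ancestor folders is shared with them. The sketch below assumes the hypothetical `owner_id` and `parent_folder_ids` fields and reads the four files in the diagram in order; it illustrates the idea and is not Box's implementation.

```python
# Hypothetical file metadata, read from the diagram in order (File 1..File 4).
FILES = {
    "File 1": {"owner_id": "user1", "parent_folder_ids": ["folder1"]},
    "File 2": {"owner_id": "user2", "parent_folder_ids": ["folder3"]},
    "File 3": {"owner_id": "user2", "parent_folder_ids": ["folder2"]},
    "File 4": {"owner_id": "user1", "parent_folder_ids": ["folder1", "folder4"]},
}

def build_access_filter(user_id, shared_folder_ids=()):
    """Solr fq string: files the user owns, or files under a folder shared with them."""
    clauses = [f"owner_id:{user_id}"]
    if shared_folder_ids:
        clauses.append("parent_folder_ids:(" + " OR ".join(shared_folder_ids) + ")")
    return " OR ".join(clauses)

def can_see(doc, user_id, shared_folder_ids=()):
    """The same rule evaluated locally, to check which files the filter admits."""
    return doc["owner_id"] == user_id or bool(
        set(doc["parent_folder_ids"]) & set(shared_folder_ids)
    )

def visible_files(user_id, shared_folder_ids=()):
    return [name for name, doc in FILES.items() if can_see(doc, user_id, shared_folder_ids)]

print(build_access_filter("user1"))               # owner_id:user1
print(visible_files("user1"))                     # ['File 1', 'File 4']
print(build_access_filter("user1", ["folder2"]))  # owner_id:user1 OR parent_folder_ids:(folder2)
print(visible_files("user1", ["folder2"]))        # ['File 1', 'File 3', 'File 4']
```

Real IDs would need query-syntax escaping. For users with many shared folders, Solr's `terms` query parser is often a better fit than a long `OR` list.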
User2 shares Folder2 with User1
• File 1 (Owner: User1, Parent: Folder1)
• File 2 (Owner: User2, Parent: Folder3)
• File 3 (Owner: User2, Parent: Folder2)
• File 4 (Owner: User1, Parents: Folder1, Folder4)
User2 shares Folder2 with User1, then File 3 is moved out of Folder2
• File 1 (Owner: User1, Parent: Folder1)
• File 2 (Owner: User2, Parent: Folder3)
• File 3 (Owner: User2, Parent: Folder5), moved out of Folder2
• File 4 (Owner: User1, Parents: Folder1, Folder4)
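Because visibility depends on the stored ancestor folders, moving File 3 out of the shared Folder2 only takes effect in search once that file's document is updated. Here is a minimal sketch, assuming the same hypothetical fields. It builds a Solr atomic-update document using the `set` modifier and applies the equivalent change to an in-memory stand-in for the index. Real atomic updates also have schema requirements (fields must be retrievable, for example stored), which this sketch does not check.

```python
import json

def can_see(doc, user_id, shared_folder_ids):
    # Same rule as the filter: owner, or any ancestor folder shared with the user.
    return doc["owner_id"] == user_id or bool(
        set(doc["parent_folder_ids"]) & set(shared_folder_ids)
    )

def atomic_set_parents(file_id, new_parent_ids):
    """Atomic-update document that replaces only the ancestor-folder field."""
    return {"id": file_id, "parent_folder_ids": {"set": list(new_parent_ids)}}

def apply_locally(index, update):
    """Apply 'set' operations to a dict-based stand-in for the index."""
    doc = dict(index[update["id"]])
    for field, op in update.items():
        if field != "id":
            doc[field] = op["set"]
    index[update["id"]] = doc

index = {"file3": {"owner_id": "user2", "parent_folder_ids": ["folder2"]}}
shared_with_user1 = ["folder2"]

print(can_see(index["file3"], "user1", shared_with_user1))  # True: File 3 is under shared Folder2
update = atomic_set_parents("file3", ["folder5"])
print(json.dumps([update]))  # body one would POST to the collection's /update handler
apply_locally(index, update)
print(can_see(index["file3"], "user1", shared_with_user1))  # False: no longer under Folder2
```

If a whole folder moves instead of a single file, every descendant document's ancestor list changes. That fan-out of updates is one cost of storing ancestors in each document.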
Indexing pipeline (diagram)
Box Front End -> Upload -> Index Queue (Queue 1, Queue 2, Queue 3) -> Indexer 1, Indexer 2, Indexer 3 -> MySQL, Index1, Index2
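The slide does not say how work is spread across the queues, indexers, and indexes. A common approach, shown here only as an assumption, is to hash a stable key. Routing every update for the same file ID to the same queue lets one indexer apply that file's updates in order, and the same idea can pick an index shard. `NUM_QUEUES`, `NUM_SHARDS`, `bucket`, and `route` are hypothetical names. `zlib.crc32` is used because Python's built-in `hash()` for strings is randomized per process.

```python
import zlib

NUM_QUEUES = 3  # Queue 1..3 in the diagram
NUM_SHARDS = 2  # hypothetical shard count (Index1, Index2)

def bucket(key: str, n: int) -> int:
    """Deterministic bucket in [0, n) for a string key."""
    return zlib.crc32(key.encode("utf-8")) % n

def route(file_id: str) -> dict:
    """Pick a queue and a shard (both 1-based) for one index job."""
    return {
        "queue": bucket(file_id, NUM_QUEUES) + 1,
        "shard": bucket(file_id, NUM_SHARDS) + 1,
    }

print(route("12345"))
```

With plain modulo hashing, changing the shard count remaps most documents. That is one reason schemes such as consistent hashing, or SolrCloud's compositeId router, are often preferred when shards are added over time.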
Highly available search (diagram)
Box Front End -> query -> HA Proxy -> Head node -> HA Proxy -> nodes 1, 2, 3 ... N
The same stack is repeated on the other side of the data center boundary.
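In the diagram, the HA Proxy layers hide failed nodes, and the whole stack is duplicated across data centers. The client-side failover loop below only illustrates that idea under stated assumptions. The backend names are hypothetical, and `send` stands in for whatever actually issues the HTTP request. A proxy such as HAProxy does this with health checks inside the load balancer rather than in application code.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def query_with_failover(backends: Sequence[str], send: Callable[[str], T]) -> T:
    """Try backends in order (e.g. local data center first) and return the first success."""
    failed: List[str] = []
    for backend in backends:
        try:
            return send(backend)
        except ConnectionError:
            failed.append(backend)  # node or data center unreachable; try the next one
    raise ConnectionError(f"all backends failed: {failed}")

# Usage with a fake transport where the first data center is down.
def fake_send(backend: str) -> str:
    if backend.startswith("dc1-"):
        raise ConnectionError(backend)
    return f"results from {backend}"

print(query_with_failover(["dc1-head", "dc2-head"], fake_send))  # results from dc2-head
```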
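Language-specific indexing (diagram)
Raw file content goes through a language detector, which routes it to the matching tokenizer and field:
• English tokenizer -> file_content_en
• Spanish tokenizer -> file_content_es (e.g. "hola")
• Japanese tokenizer -> file_content_ja
• German tokenizer -> file_content_de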
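A minimal sketch of the routing step, assuming the detector returns an ISO 639-1 code and that unsupported languages fall back to the English field (that fallback is an assumption, not something the slide states). `detect` is a stand-in for a real detector. Solr also ships language-identification update processors whose field-mapping option can do similar routing at index time.

```python
from typing import Callable, Dict

# Languages with a dedicated field in the diagram.
SUPPORTED = {"en", "es", "ja", "de"}

def language_field(lang_code: str, default: str = "en") -> str:
    """Map a detected language code to its language-specific content field."""
    code = lang_code if lang_code in SUPPORTED else default
    return f"file_content_{code}"

def add_content_field(doc: Dict[str, str], text: str,
                      detect: Callable[[str], str]) -> Dict[str, str]:
    """Store `text` under the field chosen by `detect` (a stand-in for a real detector)."""
    doc[language_field(detect(text))] = text
    return doc

# Usage with a trivial fake detector; each field would have its own tokenizer in the schema.
print(add_content_field({"id": "1"}, "hola", lambda text: "es"))
# {'id': '1', 'file_content_es': 'hola'}
```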
Search warm-up
• The front end tells the backend to warm up as soon as the search box gets keyboard focus
• The backend prepares the search filter and caches it in a search session
• The backend sends a warm-up query to Solr
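Here is a minimal sketch of the first two warm-up steps. It assumes the expensive part is building the permission filter. The front end's focus event calls `warm_up`, and the search that follows reuses the cached filter. `SearchSessionCache`, the TTL, and `build_filter` are hypothetical. The warm-up query to Solr is left out. It could be a cheap query that uses the same filter, so Solr's filter cache already holds it when the real search arrives.

```python
import time

class SearchSessionCache:
    """Hypothetical per-user cache of the search filter, filled on warm-up."""

    def __init__(self, build_filter, ttl_seconds=300.0, clock=time.monotonic):
        self._build_filter = build_filter  # expensive, e.g. collecting shared folder IDs
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions = {}                # user_id -> (filter, created_at)

    def warm_up(self, user_id):
        """Called when the front end reports keyboard focus on the search box."""
        fq = self._build_filter(user_id)
        self._sessions[user_id] = (fq, self._clock())
        return fq

    def get_filter(self, user_id):
        """Return the cached filter, rebuilding it if missing or expired."""
        entry = self._sessions.get(user_id)
        if entry is None or self._clock() - entry[1] > self._ttl:
            return self.warm_up(user_id)
        return entry[0]
```

The injectable `clock` makes expiry easy to test. A real deployment would also need to invalidate sessions when sharing changes.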
Things we are working on
• Search suggestions
• Search operators
• Use machine learning to influence ranking
• Logical sharding