Solr
Code & Beer Topic: Apache Solr
![Page 1: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/1.jpg)
by NNNN (周世恩), Code & Coffee, 2013/11/1
![Page 2: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/2.jpg)
What is Solr?
![Page 3: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/3.jpg)
What is Lucene?
![Page 4: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/4.jpg)
• Full-featured text search
• High performance
• Index size: roughly 20-30% of the size of the indexed text
• Small RAM requirements (about 1 MB of heap)
• Powerful, accurate and efficient search algorithms
• 100% Java (^^)
![Page 5: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/5.jpg)
Lucene (cont.)
• Multiple Analyzer / Tokenizer
• Fields Searching
• Merge results
• Flexible faceting, highlighting, joins and result grouping
• Typo-tolerant suggesters (you have to build them yourself, of course)
• Customizable ranking models (VSM, BM25)
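To make the BM25 bullet concrete, here is a minimal sketch of the textbook BM25 score for a single query term. It is not Lucene's source code; the function names are made up for illustration, and `k1=1.2`, `b=0.75` are the commonly used defaults, assumed here.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """Inverse document frequency: rarer terms get a larger weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(term_freq, doc_len, avg_doc_len, doc_count, doc_freq,
               k1=1.2, b=0.75):
    """BM25 score of one query term against one document.

    term_freq:   occurrences of the term in the document
    doc_len:     length of the document in tokens
    avg_doc_len: average document length in the collection
    doc_count:   number of documents in the collection
    doc_freq:    number of documents containing the term
    """
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(doc_count, doc_freq) * term_freq * (k1 + 1) / (term_freq + norm)


if __name__ == "__main__":
    # Same term, same collection, two documents of different length.
    print(bm25_score(term_freq=2, doc_len=100, avg_doc_len=100, doc_count=1000, doc_freq=10))
    print(bm25_score(term_freq=2, doc_len=400, avg_doc_len=100, doc_count=1000, doc_freq=10))
```

The score grows with term frequency but saturates, and it shrinks as the document gets longer than average; that is the main difference from a plain TF-IDF vector space model.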
![Page 6: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/6.jpg)
Lucene (Query)
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/Query.html
![Page 7: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/7.jpg)
http://www.ibm.com/developerworks/cn/java/j-lo-lucene1/fig001.jpg
![Page 8: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/8.jpg)
Where is index file stored?
• Memory
• File System
• HDFS
• Set the FileSystem config to HDFS
![Page 9: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/9.jpg)
#Note
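• Only indexed fields can be searched
• A field can be stored only, without being indexed
• Supports multiple types (Long, Int, String, Text...)
• The Tokenizer (Analyzer) has to be decided at indexing time
• Supports searching and indexing at the same time?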
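The indexed/stored distinction in the notes above can be sketched with a toy in-memory index. This is an illustration only, not how Lucene is implemented; the schema, the field names and the `ToyIndex` class are all hypothetical.

```python
from collections import defaultdict

# Hypothetical schema: each field declares whether it is indexed and/or stored.
SCHEMA = {
    "title": {"indexed": True, "stored": True},
    "body": {"indexed": True, "stored": False},
    "thumbnail_url": {"indexed": False, "stored": True},
}


class ToyIndex:
    def __init__(self, schema):
        self.schema = schema
        self.postings = defaultdict(set)  # (field, token) -> set of doc ids
        self.stored = {}                  # doc id -> stored field values

    def add(self, doc_id, doc):
        self.stored[doc_id] = {}
        for field, value in doc.items():
            flags = self.schema[field]
            if flags["indexed"]:
                # The tokenizer is fixed at indexing time:
                # here, lowercase + whitespace split.
                for token in value.lower().split():
                    self.postings[(field, token)].add(doc_id)
            if flags["stored"]:
                self.stored[doc_id][field] = value

    def search(self, field, token):
        """Return the stored fields of every document matching field:token."""
        doc_ids = sorted(self.postings.get((field, token.lower()), set()))
        return [self.stored[doc_id] for doc_id in doc_ids]


if __name__ == "__main__":
    index = ToyIndex(SCHEMA)
    index.add(1, {"title": "Apache Solr", "body": "search platform",
                  "thumbnail_url": "http://example.com/1.jpg"})
    print(index.search("body", "platform"))       # matches, but body is not returned
    print(index.search("thumbnail_url", "http://example.com/1.jpg"))  # no match
```

A search on `body` finds the document but cannot return the body text (indexed, not stored); a search on `thumbnail_url` finds nothing even though the value comes back with other hits (stored, not indexed).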
![Page 10: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/10.jpg)
#Note
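• Shake well before use
• Know from the very start which Fields you need
• This reduces the chance of having to rebuild the index (with an RDB you only need to run one command)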
![Page 11: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/11.jpg)
A Lucene index is made up of many kinds of files; lose a single one and it's GG
![Page 12: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/12.jpg)
What is Solr?
![Page 13: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/13.jpg)
A super awesome, enterprise-grade, free
![Page 14: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/14.jpg)
Search Platform
![Page 15: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/15.jpg)
Everything Lucene can do, Solr can do too
![Page 16: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/16.jpg)
Solr also adds...
• A nice-looking Admin Interface!
• REST-like API (easy to integrate with other apps)
• Dynamic clustering
• Database integration
• Geospatial search (Google Map?)
• Tunable cache sizes
![Page 17: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/17.jpg)
Remember the advantages of the cloud...
• Highly reliable
• Scalable
• Fault tolerant
• Distributed indexing
• Replication
• Load-balanced
• Automated failover and recovery(?)
![Page 18: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/18.jpg)
![Page 19: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/19.jpg)
Commonly used config files
• schema.xml (defines each field)
• solrconfig.xml (defines the URI of each handler)
• jetty.xml (!)
• solr.xml (defines the number of cores)
![Page 20: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/20.jpg)
Real-time indexing?
![Page 21: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/21.jpg)
Near Real-time indexing
• Documents are available for search almost immediately after being indexed...
• It still takes a commit before the documents count (....)
https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
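As a hedged sketch of the commit point above: the request below posts JSON documents to the update handler and asks Solr to commit within a time window via the `commitWithin` parameter, or immediately via `commit=true`. The host, port and core name (`collection1`) are assumptions for illustration, and the helper name is made up. The request is only built here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, docs, commit_within_ms=None, commit=False):
    """Build (but do not send) a JSON update request for a Solr core."""
    params = {"wt": "json"}
    if commit:
        # Hard commit as part of this request.
        params["commit"] = "true"
    if commit_within_ms is not None:
        # Ask Solr to make the documents visible within this many milliseconds.
        params["commitWithin"] = str(commit_within_ms)
    url = f"{base_url}/update?{urlencode(params)}"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


if __name__ == "__main__":
    # Assumed local example core; adjust to your own setup.
    request = build_update_request(
        "http://localhost:8983/solr/collection1",
        [{"id": "1", "title": "Apache Solr"}],
        commit_within_ms=1000,
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Without either parameter the documents are accepted but do not show up in search results until some later commit happens.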
![Page 22: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/22.jpg)
Searching
• Query
  "id:{id} AND name:{Name} OR title:{text}"
• Highlighting
• Projection
• Sorting (asc, desc)
• Output format: JSON, CSV, XML
• Others: spellcheck, Wildcard Query, +-*/
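The search options listed above map onto request parameters of the select handler: `q` (query), `fl` (projection), `sort`, `hl` / `hl.fl` (highlighting) and `wt` (output format). Below is a minimal sketch that only builds the URL; the host, core name and helper name are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields=None, sort=None,
                     highlight_fields=None, wt="json", rows=10):
    """Build a Solr select URL from the common search options."""
    params = {"q": query, "wt": wt, "rows": str(rows)}
    if fields:
        # Projection: only return these fields.
        params["fl"] = ",".join(fields)
    if sort:
        # For example "id desc" or "price asc".
        params["sort"] = sort
    if highlight_fields:
        params["hl"] = "true"
        params["hl.fl"] = ",".join(highlight_fields)
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local example core; adjust to your own setup.
    print(build_select_url(
        "http://localhost:8983/solr/collection1",
        "title:solr",
        fields=["id", "title"],
        sort="id desc",
        highlight_fields=["title"],
    ))
```

Switching `wt` to `xml` or `csv` changes only the output format, not the query.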
![Page 23: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/23.jpg)
Sample Output
![Page 24: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/24.jpg)
Import Data From DB
• Configured by editing solrconfig.xml
http://wiki.apache.org/solr/DataImportHandler
![Page 25: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/25.jpg)
The difference between Solr and an RDB
• Solr is for indexed text or lots of unstructured docs.
• Solr is optimized for searching, not for storage and retrieval of individual records.
http://stackoverflow.com/questions/5814050/solr-or-database
![Page 26: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/26.jpg)
Distributed Search cluster
• Set up Solr on many machines, then pick one of them to join the results together
• Does this need to be set in the config?
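On the config question above: in plain distributed search the list of shards can be passed per request through the `shards` parameter, so one node queries the others and merges the results. A minimal sketch follows; the host names and the helper name are hypothetical, and the URL is only built, not requested.

```python
from urllib.parse import urlencode


def build_distributed_query_url(coordinator, shards, query):
    """Build a select URL that asks one node to query several shards.

    coordinator: "host:port/solr" of the node that merges the results
    shards:      list of "host:port/solr" entries, one per shard
    """
    params = {"q": query, "shards": ",".join(shards), "wt": "json"}
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    return f"http://{coordinator}/select?{urlencode(params, safe=':/,')}"


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own machines.
    print(build_distributed_query_url(
        "host1:8983/solr",
        ["host1:8983/solr", "host2:8983/solr"],
        "title:solr",
    ))
```

The same shard list can instead be set as a default on a request handler in solrconfig.xml, which keeps client URLs short.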
![Page 27: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/27.jpg)
Distributed Solr Cluster & Load balancer
http://wiki.apache.org/solr/SolrReplication
![Page 28: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/28.jpg)
http://wiki.apache.org/solr/SolrReplication
![Page 29: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/29.jpg)
#Note
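• You can build the index directory in advance with a Java application that embeds Lucene indexing, then hand it over to Solr
• Before Solr performs an update, make sure no other application is in the middle of writing to the index, otherwise GG
• While indexing, whether with Solr or Lucene, do not carelessly delete the write lock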
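Why deleting the write lock is dangerous can be shown with a toy single-writer lock based on exclusive file creation. This is not Lucene's implementation, only a sketch of the idea; the class name is made up, and `write.lock` is used because that is the lock file name found in a Lucene index directory.

```python
import os
import tempfile


class ToyWriteLock:
    """Toy single-writer lock: whoever creates the lock file first may write."""

    def __init__(self, index_dir, name="write.lock"):
        self.path = os.path.join(index_dir, name)
        self.fd = None

    def acquire(self):
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except FileExistsError:
            return False

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None


def demo():
    """Two writers compete for the same index directory."""
    index_dir = tempfile.mkdtemp()
    first = ToyWriteLock(index_dir)
    second = ToyWriteLock(index_dir)
    results = [first.acquire(), second.acquire()]  # second writer is refused
    first.release()
    results.append(second.acquire())               # allowed once the lock is free
    second.release()
    return results


if __name__ == "__main__":
    print(demo())
```

If someone removes the lock file by hand while the first writer is still active, the second writer's `acquire()` succeeds and both write to the same index at once, which is exactly the GG case from the notes.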
![Page 30: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/30.jpg)
Live Demo
The gods once said this is very dangerous
![Page 31: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/31.jpg)
Download the latest version of Solr, then:
$ cd solr-4.4.0/example
$ java -Xmx2048m -jar start.jar
![Page 32: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/32.jpg)
The End
![Page 33: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/33.jpg)
Good tools to share
• Luke (for inspecting an index)
• Apache Tika
• Apache Hadoop
• Apache Tomcat
![Page 34: Solr](https://reader031.vdocuments.site/reader031/viewer/2022020217/54c6caeb4a795943608b4575/html5/thumbnails/34.jpg)
BBQ (Bonus)
• Customize tokenizer
• Document Boosting
• Field Boosting
• Field aliasing / renaming