Examiness: hints and tips from the trenches
Examine presentation slides for a session at the Umbraco UK Festival 2012
![Page 1: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/1.jpg)
Ismail Mayat
Senior Web Developer @ The Cogworks
![Page 2: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/2.jpg)
Examiness
Hints and tips from the trenches
![Page 3: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/3.jpg)
What this talk is not
• How to install
• How to configure
![Page 4: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/4.jpg)
What we will cover
• Tools to help you
• Hints and tips regarding indexing
• GatheringNodeData event is your friend!
• Indexing media (PDF, Word, etc.)
• Deep in the bowels with the DocumentWriting event
• Search highlighting
• Deployment to staging / production environments
• Faceting (not exactly Examine, but still useful)
• Food for thought
• Questions and answers
![Page 5: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/5.jpg)
Tools to help you
![Page 6: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/6.jpg)
Tools to help you
“Use the source, Luke!”
http://code.google.com/p/luke/
![Page 7: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/7.jpg)
Tools to help you
• http://luke.codeplex.com/ (.NET port)
• Only a subset of the common features is present
• Scripting with Rhino is missing, etc.
![Page 8: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/8.jpg)
Using Luke
• Write out generated queries so you can test them in Luke

    var criteria = searcher.CreateSearchCriteria(IndexTypes.Content);
    IBooleanOperation query = criteria.NodeTypeAlias("NewsItem");
    query = query.Not().Field("umbracoNaviHide", 1.ToString());
    var results = searcher.Search(query.Compile());
    // ToString() on the criteria gives the generated query text to paste into Luke
    var generatedQuery = criteria.ToString();

Generates the following query:

    SearchIndexType: content, LuceneQuery: +(+__NodeTypeAlias:newsitem -umbracoNaviHide:1) +__IndexType:content
![Page 9: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/9.jpg)
Tools to help you
http://our.umbraco.org/projects/developer-tools/examine-dashboard
![Page 10: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/10.jpg)
GatheringNodeData
• Examine has a rich event system
• In all my implementations I have used GatheringNodeData for:
– Merging into one contents field
– Searching on path
– Adding a nodeTypeAlias field into the PDF index
![Page 11: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/11.jpg)
GatheringNodeData
Merge into contents field
• Example query without a merged field: every field has to be listed

    var query = searchCriteria.Field("nodeName", "hello").Or().Field("metaTitle", "hello").Or().Field("metaDescription", "hello").Compile();
![Page 12: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/12.jpg)
GatheringNodeData
Merge to contents field

    using System.Text;
    using Examine;
    using umbraco.BusinessLogic;

    public class ExamineEvents : ApplicationBase
    {
        public ExamineEvents()
        {
            // Subscribe to the indexer's GatheringNodeData event at start-up
            ExamineManager.Instance.IndexProviderCollection[Constants.ATGMainIndexerName].GatheringNodeData
                += ATGMainExamineEvents_GatheringNodeData;
        }

        void ATGMainExamineEvents_GatheringNodeData(object sender, IndexingNodeDataEventArgs e)
        {
            AddToContentsField(e);
        }

        // Concatenate every field value into a single "contents" field
        private void AddToContentsField(IndexingNodeDataEventArgs e)
        {
            var fields = e.Fields;
            var combinedFields = new StringBuilder();
            foreach (var keyValuePair in fields)
            {
                combinedFields.AppendLine(keyValuePair.Value);
            }
            e.Fields.Add("contents", combinedFields.ToString());
        }
    }
![Page 13: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/13.jpg)
GatheringNodeData
Merge to contents field
• Query now looks like query.Field("contents", "hello")
• Adding new fields is just a case of rebuilding the index
![Page 14: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/14.jpg)
GatheringNodeData
Creating a searchable path
• Path is in the index as 1,1056,1078 and is not tokenised
• Add a new field with each , replaced by a space
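The path trick above can be sketched as a small helper. This is only an illustration: the field name searchablePath is made up for the example, and it assumes the event's field collection behaves like a string dictionary, as the merge example suggests.

```csharp
using System;
using System.Collections.Generic;

public static class SearchablePath
{
    // Hypothetical name for the tokenisable copy of the path
    public const string FieldName = "searchablePath";

    // "1,1056,1078" becomes "1 1056 1078" so each id can be indexed as its own term
    public static string ToSearchable(string path)
    {
        return (path ?? string.Empty).Replace(',', ' ');
    }

    // Adds the copy alongside the original "path" value when one is present
    public static void AddTo(IDictionary<string, string> fields)
    {
        string path;
        if (fields.TryGetValue("path", out path))
        {
            fields[FieldName] = ToSearchable(path);
        }
    }
}
```

Inside a GatheringNodeData handler this would be called as SearchablePath.AddTo(e.Fields); a search for everything under node 1056 could then query the new field for 1056.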
![Page 15: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/15.jpg)
GatheringNodeData
• How do you query when there is no value, e.g. a SQL query like select where value=''?
• Select all
• You cannot do a query like this in Examine / Lucene
• However, you can use the GatheringNodeData event to inject some arbitrary value, then query on that.
• E.g. field noData_Title with value 1
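A minimal sketch of the marker-field idea, again assuming the event's fields behave like a string dictionary. The helper name EmptyValueMarker is hypothetical; noData_Title and the value 1 come from the slide.

```csharp
using System;
using System.Collections.Generic;

public static class EmptyValueMarker
{
    // Writes markerField = "1" when the property is missing or blank,
    // so "no value" becomes something that can be searched for.
    public static void Mark(IDictionary<string, string> fields, string alias, string markerField)
    {
        string value;
        var hasValue = fields.TryGetValue(alias, out value) && !string.IsNullOrWhiteSpace(value);
        if (!hasValue)
        {
            fields[markerField] = "1";
        }
    }
}
```

In the handler this would be EmptyValueMarker.Mark(e.Fields, "title", "noData_Title"), and the "empty title" search becomes a plain field query on noData_Title for 1.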
![Page 16: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/16.jpg)
GatheringNodeData
• Re-indexing errors
• E.g. an MNTP field referencing a node that no longer exists
• Use try / catch and log the offending node
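One way to apply the try / catch advice is to resolve each picked id separately, so a single dead reference is logged and skipped instead of failing the whole node. The sketch below is not Examine or Umbraco API: the lookup and log delegates are placeholders for whatever node lookup and logger a project already has.

```csharp
using System;
using System.Collections.Generic;

public static class SafeLookup
{
    // Resolves each id in a CSV list; ids that cannot be resolved are logged and skipped.
    public static List<string> ResolveNames(string csvIds, Func<int, string> lookup, Action<string> log)
    {
        var names = new List<string>();
        if (string.IsNullOrEmpty(csvIds))
        {
            return names;
        }

        foreach (var raw in csvIds.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            try
            {
                names.Add(lookup(int.Parse(raw.Trim())));
            }
            catch (Exception ex)
            {
                // Deliberately broad: one bad reference should not stop the index build
                log("Indexing: could not resolve node '" + raw + "': " + ex.Message);
            }
        }
        return names;
    }
}
```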
![Page 17: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/17.jpg)
Document writing event
• Use it when you need lower-level Lucene access
• E.g. boosting a field
• What is boosting? Not all documents are equal: sometimes you need to artificially give a higher ranking to certain documents, when sort-by is just not enough, e.g.
– Person doc type. If they have an important title they need to appear at the top of the person search list
– Boost documents by age. Penalize older documents; useful for news and business documents.
– Boost based on unique views (you would need to know these up front; could also be based on time slots, e.g. last month, last week)
– Documents with more likes (custom like functionality)
– Tagging using XFS Term selector with weighting http://our.umbraco.org/projects/website-utilities/xfs-term-selector
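For the "boost by age" case, the boost value itself has to come from somewhere. The sketch below is one possible formula, not anything prescribed by Examine or Lucene: the boost halves every halfLifeDays and never drops below minBoost. Both parameters and their defaults are assumptions to tune per site.

```csharp
using System;

public static class AgeBoost
{
    // Exponential decay: a document published today gets 1.0,
    // one that is halfLifeDays old gets 0.5, and so on down to minBoost.
    public static float For(DateTime published, DateTime now, double halfLifeDays = 30, float minBoost = 0.1f)
    {
        var ageDays = Math.Max(0.0, (now - published).TotalDays);
        var boost = (float)Math.Pow(0.5, ageDays / halfLifeDays);
        return Math.Max(minBoost, boost);
    }
}
```

The result would be passed to the document's boost in a DocumentWriting handler. Because the boost is written at index time, documents only "age" when they are re-indexed, so this approach needs a periodic rebuild.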
![Page 18: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/18.jpg)
Document writing event

    var indexer = (UmbracoContentIndexer)ExamineManager.Instance.IndexProviderCollection[Constants.ATGMDirectoryIndexerName];
    indexer.DocumentWriting += indexer_DocumentWriting;

    void indexer_DocumentWriting(object sender, Examine.LuceneEngine.DocumentWritingEventArgs e)
    {
        // Get() returns the field's string value; GetField() would return the Field object
        var title = e.Document.Get("title");
        if (title == "Partner")
        {
            e.Document.SetBoost(1.5f);
        }
    }
![Page 19: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/19.jpg)
Indexing media
• PDF indexer. Only indexes PDF content.
• CogUmbracoExamineMediaIndexer (available as a package on our.umbraco.org)
– Uses Apache Tika. Indexes content and any associated metadata
– XML and derived formats
– Microsoft Office document formats
– OpenDocument Format
– Portable Document Format
– Electronic Publication Format
– Rich Text Format
– Compression and packaging formats
– Text formats
– Audio formats (MP3 etc.)
– Image formats
– Video formats
– Java class files and archives
– The mbox format
![Page 20: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/20.jpg)
Search highlighting
• Lucene contrib package Highlighter.Net
• Highlights occurrences of your search term in the search results summary fragment.
• Wiki on our.umbraco.org: http://our.umbraco.org/wiki/how-tos/how-to-highlight-text-in-examine-search-results
![Page 21: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/21.jpg)
Deployment to staging / production environments
• Cannot copy the index
• Can check it in, but it could get corrupted
• Selenium with an ashx handler to rebuild the index
![Page 22: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/22.jpg)
Deployment to staging / production environments

    using System;
    using System.Collections.Generic;
    using System.Web;
    using Examine;

    public class RebuildIndexes : IHttpHandler
    {
        readonly List<string> indexes = new List<string> { "ATGIndexer", "InternalIndexer", "directoryIndexer" };

        public void ProcessRequest(HttpContext context)
        {
            context.Response.ContentType = "text/plain";
            try
            {
                var requested = context.Request.QueryString["index"];
                if (string.IsNullOrEmpty(requested))
                {
                    // No index named on the query string: rebuild them all
                    foreach (var index in indexes)
                    {
                        ExamineManager.Instance.IndexProviderCollection[index].RebuildIndex();
                    }
                }
                else
                {
                    ExamineManager.Instance.IndexProviderCollection[requested].RebuildIndex();
                }
                context.Response.Write("done");
            }
            catch (Exception ex)
            {
                context.Response.Write(ex.ToString());
            }
        }

        public bool IsReusable
        {
            get { return false; }
        }
    }
![Page 23: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/23.jpg)
Deployment to staging / production environments

    [SetUp]
    public void SetupTest()
    {
        selenium = new DefaultSelenium("localhost", 4444, "*chrome", "http://mydevsite");
        selenium.Start();
        _verificationErrors = new StringBuilder();
    }

    [Test]
    public void RebuildIndex()
    {
        // Not a proper test but a hack to get indexes rebuilt after a deployment
        try
        {
            selenium.Open("/umbraco/webservices/RebuildIndexes.ashx");
        }
        catch (SeleniumException se)
        {
            // A rebuild can outlast the Selenium timeout, so timeouts are ignored
            if (!se.Message.StartsWith("Timed out"))
            {
                throw;
            }
        }
        catch (AssertionException e)
        {
            _verificationErrors.Append(e.Message);
        }
    }
![Page 24: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/24.jpg)
Faceting
• Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted classification system, allowing users to explore a collection of information by applying multiple filters
• Examples: Amazon, LinkedIn http://www.linkedin.com/search/fpsearch?type=people&keywords=umbraco&pplSearchOrigin=GLHD&pageKey=member-home&search=Search
• LinkedIn uses Bobo browser. Written in Java, it has been ported to .NET: http://bobo.codeplex.com/
• The demo uses SimpleFacetHandler; others are available, e.g. RangeFacet, PathFacet, GetFacet
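To make the idea of a facet concrete, the sketch below counts how many results carry each value of a field, which is the list of numbers a faceted UI shows next to each filter. It is plain LINQ over in-memory results, not Bobo and not how Bobo computes facets; the SimpleFacets name and the dictionary shape of a result are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimpleFacets
{
    // Returns value -> number of results having that value for the given field.
    // Results that do not have the field at all are ignored.
    public static Dictionary<string, int> Count(IEnumerable<IDictionary<string, string>> results, string field)
    {
        return results
            .Where(r => r.ContainsKey(field))
            .GroupBy(r => r[field])
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Counting in memory like this only works for small result sets; a facet library does the equivalent against the index itself.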
![Page 25: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/25.jpg)
Food for thought
• Using the index as an object DB, à la RavenDB
• Scenario: you have nodes with a large number of multi-node tree pickers (MNTP) used as lookups
![Page 26: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/26.jpg)
Index as object db
![Page 27: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/27.jpg)
Index as object db
![Page 28: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/28.jpg)
Index as object db
![Page 29: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/29.jpg)
Food for thought
• In the index, node ids are stored as a CSV list if MNTP is set to CSV.
• Use the GatheringNodeData event to do the lookups, create a POCO with the lookup data, serialise the POCO to JSON and store that in the index.
• Advantage: instant lookup, all data ready to use
• Disadvantage: you need to keep up with lookup changes. E.g. if a country code changes, you would need to find the documents already using that code and update them.
• Nice approach if lookup data is fairly static
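The serialise-and-store step might look like the sketch below. It uses System.Text.Json purely for illustration (a project of this era would more likely use Json.NET), and the CountryLookup type and countriesJson field name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical POCO holding the data that would otherwise need a lookup per node id
public class CountryLookup
{
    public int NodeId { get; set; }
    public string Code { get; set; }
    public string Name { get; set; }
}

public static class LookupJson
{
    // Hypothetical index field that holds the serialised lookups
    public const string FieldName = "countriesJson";

    // Index time: store the resolved lookups as one JSON string
    public static void Store(IDictionary<string, string> fields, List<CountryLookup> lookups)
    {
        fields[FieldName] = JsonSerializer.Serialize(lookups);
    }

    // Search time: hydrate the lookups straight from the stored field
    public static List<CountryLookup> Read(IDictionary<string, string> fields)
    {
        string json;
        return fields.TryGetValue(FieldName, out json)
            ? JsonSerializer.Deserialize<List<CountryLookup>>(json)
            : new List<CountryLookup>();
    }
}
```

Store would be called from a GatheringNodeData handler after resolving the picked node ids; Read would be called on the fields of a search result.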
![Page 30: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/30.jpg)
Food for thought
• POCO hydration using activelucenenet, à la USiteBuilder
• Create POCOs and decorate them with attributes

    public class Product
    {
        [LuceneField("sku")]
        public string Sku { get; set; }

        [LuceneField("productName")]
        public string ProductName { get; set; }
    }
![Page 31: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/31.jpg)
Food for thought

    var luceneProductDoc = GetItFromLucene(1234);
    var product = LuceneMediator<Product>.ToRecord(luceneProductDoc);

Would need to use Lucene directly, as there is no way of getting the Lucene document from the Examine search result wrapper?
![Page 32: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/32.jpg)
Take home today
• Use the index!!!
![Page 33: Examiness hints and tips from the trenches](https://reader031.vdocuments.site/reader031/viewer/2022020713/548c08b7b479599b348b45da/html5/thumbnails/33.jpg)
Questions
• ????
• http://twitter.com/ismailmayat