Multi-language Content Discovery Through Entity Driven Search: Presented by Alessandro Benedetti, Zaizi
Multi-language Content Discovery Through Entity Driven Search
Alessandro Benedetti
Search Consultant and R&D Software Engineer
Zaizi
http://uk.linkedin.com/in/alexbenedetti
Who I am
Alessandro Benedetti
Apache ManifoldCF committer
Search Consultant
R&D Software Engineer
Master in Computer Science
Information Retrieval Background
Semantic, NLP, Machine Learning Technologies Enthusiast
Beach Volleyball Player & Snowboarder
ZAIZI
Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
Alfresco & Ephesoft certified Platinum Partner
Red Hat Enterprise Linux Ready Partner
R&D department specialising in Open Source Search Solutions
Alfresco Partner of the Year 2012 and 2013
Agenda
Context
Problem
Solution
Demo
What's upcoming
Zaizi R&D Department
Giving sense to the content
Enriching it semantically
Adding value to ECM/CMS
More structured content that is easy to manage, link and search
Improving search
Across different domains, data sources, User Experience
Machine Learning applied research
Content Organization – Recommendation Systems
Enterprise Search Problems
Challenge: search within big and heterogeneous repositories
Heterogeneous data sources
Filesystems, DB, ECM/CMS, Email, …
Unstructured content in different formats
PDF, plain text, Word …
Documents not linked to each other
Federated Search
across data sources
preserving permissions
centralized endpoint
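The permission-preserving, single-endpoint search described above can be sketched as follows. This is a minimal illustration, assuming that documents from every repository land in one central index together with allow/deny access tokens (loosely in the spirit of ManifoldCF's document security model). The `IndexedDoc` fields, the token strings and the "no allow tokens means public" rule are hypothetical choices for this sketch, not Sensefy's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    """One document in the central index, whatever repository it came from."""
    doc_id: str
    source: str                             # originating repository, e.g. "filesystem"
    text: str
    allow_tokens: frozenset = frozenset()   # hypothetical ACL fields
    deny_tokens: frozenset = frozenset()


def can_see(doc, user_tokens):
    """Deny tokens win; a document without allow tokens is treated as public (assumption)."""
    if doc.deny_tokens & user_tokens:
        return False
    return not doc.allow_tokens or bool(doc.allow_tokens & user_tokens)


def search(index, query, user_tokens):
    """Single endpoint: keyword match across all sources, then permission filtering."""
    terms = query.lower().split()
    return [
        doc.doc_id
        for doc in index
        if all(t in doc.text.lower().split() for t in terms) and can_see(doc, user_tokens)
    ]


index = [
    IndexedDoc("fs-1", "filesystem", "Quarterly sales report", frozenset({"group:sales"})),
    IndexedDoc("ecm-7", "alfresco", "Sales contract draft", frozenset({"group:legal"})),
    IndexedDoc("mail-3", "email", "Sales kickoff agenda"),
    IndexedDoc("fs-2", "filesystem", "Sales salaries",
               frozenset({"group:sales"}), frozenset({"user:bob"})),
]

print(search(index, "sales", {"group:sales", "user:bob"}))  # ['fs-1', 'mail-3']
```

The same query returns different results for different users, because the security check travels with each indexed document rather than with the source it came from.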
Sensefy
Semantic Enterprise Search Engine
Federated Search
Evolved User Experience
Based on cutting-edge Open Source Frameworks
![Page 10: Multi-language Content Discovery Through Entity Driven Search: Presented by Alessandro Benedetti, Zaizi](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3ce7e1a28ab2c0d8b470e/html5/thumbnails/10.jpg)
Architecture
Entity Driven Search
Moving from keywords to entities, which are more understandable to humans
Process the unstructured text at indexing time
Enrich it
Build specific indexes
Use entities and concepts in searches
• Trying to foresee the concepts the user wants to express
What is an Entity in our domain?
Real world concepts
Linked Data resources
RDF (XML) structured data
• Unique identifier + properties
Stored in a Knowledge Base (Freebase, DBpedia, custom dataset)
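To make "unique identifier + properties" concrete, here is a small sketch of an entity record that can serialise itself as RDF N-Triples. The `Entity` class and its fields are hypothetical; the DBpedia URIs are used only as illustrative identifiers, and the literal escaping is deliberately minimal.

```python
from dataclasses import dataclass, field

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Entity:
    uri: str                                         # unique identifier in the knowledge base
    labels: dict = field(default_factory=dict)       # language code -> label
    types: list = field(default_factory=list)        # entity type URIs
    properties: dict = field(default_factory=dict)   # property URI -> list of resource URIs

    def label(self, lang, fallback="en"):
        """Label in the requested language, falling back to `fallback`, then to the URI."""
        return self.labels.get(lang) or self.labels.get(fallback) or self.uri

    def to_ntriples(self):
        """Serialise labels, types and properties as N-Triples lines."""
        def lit(s):
            # escape backslashes and double quotes inside the literal
            return s.replace("\\", "\\\\").replace('"', '\\"')
        lines = [f'<{self.uri}> <{RDFS_LABEL}> "{lit(v)}"@{lang} .'
                 for lang, v in self.labels.items()]
        lines += [f"<{self.uri}> <{RDF_TYPE}> <{t}> ." for t in self.types]
        lines += [f"<{self.uri}> <{p}> <{o}> ."
                  for p, objs in self.properties.items() for o in objs]
        return lines


rome = Entity(
    uri="http://dbpedia.org/resource/Rome",
    labels={"en": "Rome", "it": "Roma"},
    types=["http://dbpedia.org/ontology/City"],
    properties={"http://dbpedia.org/ontology/country": ["http://dbpedia.org/resource/Italy"]},
)
print(rome.label("it"), rome.label("fr"))  # Roma Rome
```

Because the identifier is language-neutral and the labels are per language, the same entity can be matched from an Italian or an English document.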
Redlink
Semantic Cloud platform
Providing Software as a Service
Text analysis and Entity Linking using Knowledge Bases
Linked Data Publishing
Enterprise Data Linking
Open-Source based components
Indexing - NLP & Semantic Enrichment
Apache ManifoldCF custom processors/output connectors
From unstructured to structured
NLP analysis:
• POS tagging
• Named Entity Recognition
• Entity Linking using Knowledge Bases
• Disambiguation
Indexing in specific Solr collections:
• Primary Index (documents)
• Entity Index
• Entity Types
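A rough sketch of how one enrichment pass could feed the three collections listed above. A toy gazetteer lookup stands in for the real NLP chain (POS tagging, NER, entity linking and disambiguation), and all field names (`entities`, `entity_types`, `notable_type`, `occurrences`) are assumptions for illustration. Note how the Italian document resolves to the same URIs as the English one.

```python
from collections import Counter

# Toy gazetteer standing in for the real NLP chain (hypothetical data).
# Surface form -> (URI, label, entity type).
GAZETTEER = {
    "cristiano ronaldo": ("dbr:Cristiano_Ronaldo", "Cristiano Ronaldo", "Football Player"),
    "real madrid": ("dbr:Real_Madrid_CF", "Real Madrid", "Football Team"),
    "christian bale": ("dbr:Christian_Bale", "Christian Bale", "Actor"),
}


def link_entities(text):
    """Return (uri, label, type) for every gazetteer surface form found in the text."""
    lowered = text.lower()
    return [GAZETTEER[surface] for surface in GAZETTEER if surface in lowered]


def build_index_docs(raw_docs):
    """Turn raw text into documents for the primary, entity and entity-type collections."""
    primary = []
    entity_counts, type_counts = Counter(), Counter()
    entity_meta = {}
    for doc_id, text in raw_docs.items():
        linked = link_entities(text)
        uris = {uri for uri, _, _ in linked}
        etypes = {etype for _, _, etype in linked}
        primary.append({"id": doc_id, "text": text,
                        "entities": sorted(uris), "entity_types": sorted(etypes)})
        entity_counts.update(uris)    # occurrences = number of documents mentioning it
        type_counts.update(etypes)
        for uri, label, etype in linked:
            entity_meta[uri] = (label, etype)
    entity_index = [
        {"id": uri, "label": label, "notable_type": etype, "occurrences": entity_counts[uri]}
        for uri, (label, etype) in entity_meta.items()
    ]
    type_index = [{"id": t, "type": t, "occurrences": n} for t, n in type_counts.items()]
    return primary, entity_index, type_index


primary, entities, types = build_index_docs({
    "d1": "Cristiano Ronaldo signed for Real Madrid.",
    "d2": "Il gol di Cristiano Ronaldo al Real Madrid.",
    "d3": "Christian Bale plays Batman.",
})
```

In a real deployment each list would be posted to its own Solr collection; here they are plain dictionaries so the shape of each document is easy to inspect.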
Search - Smart Autocomplete
Multi-phase suggestions
Closer to natural language query formulation
Named Entities
Entity Types
Document Titles
Smart Autocomplete – Named Entities
Infix suggestion (ron → Cristiano Ronaldo)
Fuzzy suggestion (cristinao → Cristiano Ronaldo)
Brief description of the suggested entity
Specific Solr index for the entities
• Schema (label, notable_type, occurrences...)
• Edge n-gram token filtered label field
• Fuzzy queries with variable distance / classic queries on the label suggestion field
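The behaviour of the entity suggester can be approximated in plain Python. Each label token is expanded into edge n-grams, so a prefix of any word in the label matches (which gives the infix effect), and a fuzzy match with a length-dependent edit distance is used as a fallback. This is a sketch under assumptions: the distance thresholds are invented, ranking by `occurrences` is a guess, and it uses classic Levenshtein distance, whereas Lucene's FuzzyQuery by default also counts a transposition as a single edit.

```python
from collections import defaultdict


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Prefixes of a token, as an edge n-gram token filter would emit them."""
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}


def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def max_edits(term):
    """Variable fuzzy distance: stricter for short inputs (thresholds are assumptions)."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2


class EntitySuggester:
    def __init__(self, entities):
        self.entities = entities
        self.grams = defaultdict(set)   # edge n-gram -> entity positions
        self.tokens = defaultdict(set)  # full label token -> entity positions
        for pos, entity in enumerate(entities):
            for token in entity["label"].lower().split():
                self.tokens[token].add(pos)
                for gram in edge_ngrams(token):
                    self.grams[gram].add(pos)

    def suggest(self, term, limit=5):
        """Single-term suggestion: edge n-gram match first, fuzzy match as a fallback."""
        term = term.lower().strip()
        hits = set(self.grams.get(term, ()))
        if not hits:
            k = max_edits(term)
            for token, positions in self.tokens.items():
                if levenshtein(term, token[:len(term)]) <= k or levenshtein(term, token) <= k:
                    hits |= positions
        ranked = sorted(hits, key=lambda pos: -self.entities[pos]["occurrences"])
        return [(self.entities[pos]["label"], self.entities[pos]["notable_type"])
                for pos in ranked[:limit]]


suggester = EntitySuggester([
    {"label": "Cristiano Ronaldo", "notable_type": "Football Player", "occurrences": 120},
    {"label": "Ronald Reagan", "notable_type": "Politician", "occurrences": 80},
    {"label": "Christian Bale", "notable_type": "Actor", "occurrences": 50},
])
print(suggester.suggest("ron"))
# [('Cristiano Ronaldo', 'Football Player'), ('Ronald Reagan', 'Politician')]
print(suggester.suggest("cristinao"))
# [('Cristiano Ronaldo', 'Football Player')]
```

Returning the notable type next to the label is what lets the UI show a brief description of each suggested entity.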
Smart Autocomplete – Entity Types
Infix suggestion (play → Football Player)
Fuzzy suggestion (foobtall → Football Team)
Multi-language (calcia → Calciatore [it] (Football Player [en]))
Multi-phase suggestion through properties (ital → football player nationality italian)
Specific Solr collection for the entity types
• Each SolrDocument is an entity type (type, occurrences, attributes, type hierarchy...)
• Edge n-gram token filtered type field
• Multi-language suggestion highlighting
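A sketch of the entity-type suggestions. Phase one matches the typed prefix against the labels in every language and highlights the match, showing the display-language label alongside when the hit came from another language; phase two completes property values once a type has been picked. The `EntityType` structure, the `<em>` highlight markup and the output strings are assumptions chosen to reproduce the examples above, and fuzzy matching is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    type_id: str
    labels: dict                                    # language code -> label
    occurrences: int
    attributes: dict = field(default_factory=dict)  # property -> known values


def highlight(label, prefix):
    """Wrap the matched prefix of the first matching token in <em> tags."""
    out, done = [], False
    for token in label.split():
        if not done and token.lower().startswith(prefix):
            out.append(f"<em>{token[:len(prefix)]}</em>{token[len(prefix):]}")
            done = True
        else:
            out.append(token)
    return " ".join(out)


def suggest_types(types, prefix, display_lang="en"):
    """Phase 1: match the prefix against every language label of every type."""
    prefix = prefix.lower()
    results = []
    for t in sorted(types, key=lambda et: -et.occurrences):
        for lang, label in t.labels.items():
            if any(token.lower().startswith(prefix) for token in label.split()):
                text = f"{highlight(label, prefix)} [{lang}]"
                if lang != display_lang:
                    text += f" ({t.labels[display_lang]} [{display_lang}])"
                results.append(text)
                break  # one suggestion per type
    return results


def suggest_properties(entity_type, prefix, lang="en"):
    """Phase 2: the type is chosen; complete with property values."""
    prefix = prefix.lower()
    base = entity_type.labels[lang].lower()
    return [f"{base} {prop} {value}"
            for prop, values in entity_type.attributes.items()
            for value in values if value.lower().startswith(prefix)]


player = EntityType("FootballPlayer", {"en": "Football Player", "it": "Calciatore"}, 300,
                    {"nationality": ["italian", "british", "portuguese"]})
team = EntityType("FootballTeam", {"en": "Football Team", "it": "Squadra di calcio"}, 150)

print(suggest_types([player, team], "calcia"))  # ['<em>Calcia</em>tore [it] (Football Player [en])']
print(suggest_properties(player, "ital"))       # ['football player nationality italian']
```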
Smart Autocomplete – configuration
Knowledge base: used for entity linking and dereferencing (DBpedia, Freebase, custom dataset)
Properties: for each entity type of interest, LDPath is used to identify the property in the graph
Hierarchy: all sub-types of a type automatically inherit their parent's properties, which eases configuration
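The hierarchy rule can be illustrated with a small resolver that walks from a type up to its root and merges the configured properties, so a sub-type only declares what it adds. The configuration layout is hypothetical, and the property values are written in the style of LDPath path expressions purely as examples, not as a verified LDPath program.

```python
# Illustrative configuration: each type lists only the properties it adds.
# Values mimic LDPath-style paths (examples only, not a tested program).
TYPE_CONFIG = {
    "dbo:Person": {"parent": None,
                   "properties": {"name": "rdfs:label[@en]",
                                  "nationality": "dbo:nationality / rdfs:label[@en]"}},
    "dbo:Athlete": {"parent": "dbo:Person", "properties": {}},
    "dbo:SoccerPlayer": {"parent": "dbo:Athlete",
                         "properties": {"team": "dbo:team / rdfs:label[@en]"}},
}


def effective_properties(type_id, config):
    """Merge properties from the root type down, so sub-types inherit (and may override)."""
    chain, seen = [], set()
    while type_id is not None:
        if type_id in seen:
            raise ValueError(f"cycle in type hierarchy at {type_id}")
        seen.add(type_id)
        chain.append(type_id)
        type_id = config[type_id]["parent"]
    merged = {}
    for t in reversed(chain):  # root first, most specific type last
        merged.update(config[t]["properties"])
    return merged


print(sorted(effective_properties("dbo:SoccerPlayer", TYPE_CONFIG)))
# ['name', 'nationality', 'team']
```

Applying the root's properties first means a sub-type can redefine a property simply by declaring it again.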
Semantic Search
Search by Named Entity
Ex. Give me all the documents related to Christian Bale
Search by Entity Type
Ex. Give me all the documents about football players
Search by Entity Type + properties
Ex. Give me all the documents about football players whose nationality is British
Query-time join: Entity / Entity Type collection → primary index
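For the join, Solr's join query parser (`{!join from=... to=... fromIndex=...}`) can select matching entities in the entity collection and return the primary-index documents that reference them. The helper below only builds the query string; the field names (`id`, `entities`, `type`, `nationality`) and the collection name `entities` are assumptions. Cross-collection joins have deployment constraints in SolrCloud (for example, the `fromIndex` collection generally has to be single-sharded and co-located with the searched one), so check the Solr reference guide for your version.

```python
from urllib.parse import urlencode


def _quote(value):
    """Quote a value as a Lucene phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def entity_type_join_query(entity_type, properties=None,
                           from_field="id", to_field="entities", from_index="entities"):
    """Select matching entities, then join to the primary-index docs that mention them."""
    clauses = [f"type:{_quote(entity_type)}"]
    for prop, value in (properties or {}).items():
        clauses.append(f"{prop}:{_quote(value)}")
    inner = " AND ".join(clauses)
    return f"{{!join from={from_field} to={to_field} fromIndex={from_index}}}{inner}"


q = entity_type_join_query("football player", {"nationality": "British"})
print(q)
# {!join from=id to=entities fromIndex=entities}type:"football player" AND nationality:"British"

# Query string for the primary collection's /select handler
params = urlencode({"q": q, "fl": "id,title", "wt": "json"})
```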
Semantic Facets
Dynamically calculated semantic facets based on the types and entities found in documents
Improve the navigation of results
Allow refined search through semantic information
Configurable custom layer on top of Solr faceting component
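Underneath the custom layer, the facets can come from ordinary Solr field faceting on the enrichment fields; the layer then turns entity URIs into readable labels in the user's language. This is a sketch: the field names, the label map and the English fallback are assumptions, while `facet`, `facet.field`, `facet.mincount` and `facet.limit` are standard Solr parameters and the flat `[value, count, ...]` list is the default JSON shape of `facet_fields`.

```python
def facet_params(fields=("entity_types", "entities"), limit=10):
    """Standard Solr faceting parameters for the (hypothetical) semantic fields."""
    params = [("facet", "true"), ("facet.mincount", "1"), ("facet.limit", str(limit))]
    params += [("facet.field", f) for f in fields]  # repeated key, so a list of pairs
    return params


def semantic_facets(facet_fields, labels, lang="en", fallback="en"):
    """Turn Solr's flat [value, count, value, count, ...] lists into labelled pairs.

    `labels` maps a URI to its per-language labels; unknown URIs are shown as-is.
    """
    out = {}
    for field_name, flat in facet_fields.items():
        pairs = zip(flat[0::2], flat[1::2])
        out[field_name] = []
        for uri, count in pairs:
            names = labels.get(uri, {})
            out[field_name].append((names.get(lang) or names.get(fallback) or uri, count))
    return out


# Shape of response["facet_counts"]["facet_fields"] with Solr's default JSON output
facet_fields = {"entities": ["dbr:Cristiano_Ronaldo", 12, "dbr:Rome", 4]}
labels = {"dbr:Cristiano_Ronaldo": {"en": "Cristiano Ronaldo"},
          "dbr:Rome": {"en": "Rome", "it": "Roma"}}
print(semantic_facets(facet_fields, labels, lang="it"))
# {'entities': [('Cristiano Ronaldo', 12), ('Roma', 4)]}
```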
Semantic More Like This
Search for similar documents based on Entities and Entity Types
Similarity function based on document meaning
Multi-language: based on concepts rather than text tokens
Solr More Like This on custom fields
Entity Frequency / Inverse Document Frequency
Entity Type Frequency / Inverse Document Frequency
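Solr's own MoreLikeThis picks the most "interesting" terms of a source document (tuned by parameters such as `mlt.fl`, `mlt.mintf` and `mlt.mindf`) and turns them into a query. The sketch below swaps that for a plain cosine similarity over TF-IDF-weighted entity and entity-type identifiers, just to show why concept-based similarity works across languages. The identifiers and the single combined entity/type vector are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: doc id -> list of entity / entity-type identifiers found in it."""
    n = len(docs)
    df = Counter(uri for ids in docs.values() for uri in set(ids))
    return {
        doc_id: {uri: tf * math.log(n / df[uri]) for uri, tf in Counter(ids).items()}
        for doc_id, ids in docs.items()
    }


def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(doc_id, docs, top_n=3):
    """Rank the other documents by cosine similarity of their entity vectors."""
    vecs = tfidf_vectors(docs)
    scores = [(other, cosine(vecs[doc_id], vecs[other])) for other in docs if other != doc_id]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


docs = {
    "en_1": ["dbr:Cristiano_Ronaldo", "dbr:Real_Madrid_CF", "type:FootballPlayer"],
    "it_1": ["dbr:Cristiano_Ronaldo", "dbr:Real_Madrid_CF", "type:FootballPlayer"],  # Italian text
    "en_2": ["dbr:Cristiano_Ronaldo", "dbr:Juventus_FC", "type:FootballPlayer"],
    "en_3": ["dbr:Christian_Bale", "dbr:The_Dark_Knight", "type:Actor"],
}
print([d for d, _ in more_like_this("en_1", docs)])  # ['it_1', 'en_2', 'en_3']
```

The Italian article ranks first for the English one even though they share no words, because both were reduced to the same entity identifiers at indexing time.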
Live Demo
What's upcoming
Machine Learning components:
– Classification
– Topic annotation
– Clustering
Secured Entity Search
Image and Media searches
Advanced Geo-search
Personalized/collaborative search
Recommendations
Q&A
Advanced configurable Admin Dashboard
Any Questions?
Alessandro Benedetti
Search Consultant and R&D Software Engineer
Zaizi
Email: [email protected]
@Zaizi