13.03.2018 | Computer Science Department | UKP Lab | Beto Boullosa
Searching in INCEpTION
Beto Boullosa, Michael Bugert, Jan-Christoph Klie, Peter Jiang, Wei Ding,
Maximilian Fuchs, Richard Eckart de Castilho, Iryna Gurevych
Searching in INCEpTION
Search over annotations
Inside the projects
Focus: search passages which the user is
interested in analyzing / annotating
Search over background corpus
Outside the project
Focus: search passages / documents which the
user wants to import into the project
(subcorpus creation)
Search over annotations
Users search over their own annotations / text
Admins / curators search over annotations / text
from all users
Usually fewer results
All annotation layers / types are potentially
searchable
Current status: in progress
Search over background corpus
Users search over large background corpora
Results can be imported into projects
Documents are automatically pre-annotated with
standard annotation types
Tokens
Lemmas
POS-tags
etc.
Current status: To do.
Search and index
What is indexed
Arbitrary span and relation annotations
Arbitrary features
Hierarchical structures (syntactic constituent
trees or document structure)
No restart when changing schema
Concurrent access
Search and index - components
Search service
API for indexing documents and doing queries
Search sidebar
Allows searching for passages among the
documents of a project
Search page
Allows searching for documents in a reference
corpus
Useful for subcorpus creation
Search and index - providers
Providers
Mimir
Based on GATE
Initial search provider
Mtas
Based on Solr/Lucene
Meertens Institute (Amsterdam)
CQL query syntax
Modular architecture allows adding more
providers as needed
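A minimal sketch of how a search service with pluggable providers could be organised. It is written in Python for brevity and is not INCEpTION's actual API: the names SearchProvider, SubstringProvider and SearchService, and their methods, are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SearchProvider(ABC):
    """Hypothetical provider interface (not the real INCEpTION API)."""

    @abstractmethod
    def index(self, doc_id: str, text: str) -> None:
        """Add or replace a document in the provider's index."""

    @abstractmethod
    def query(self, q: str) -> List[str]:
        """Return the ids of the documents matching the query."""


class SubstringProvider(SearchProvider):
    """Toy provider: keeps the raw text and does substring matching."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def query(self, q: str) -> List[str]:
        return sorted(d for d, t in self._docs.items() if q in t)


class SearchService:
    """Forwards indexing to every provider and queries to a named one."""

    def __init__(self) -> None:
        self._providers: Dict[str, SearchProvider] = {}

    def register(self, name: str, provider: SearchProvider) -> None:
        self._providers[name] = provider

    def index(self, doc_id: str, text: str) -> None:
        for provider in self._providers.values():
            provider.index(doc_id, text)

    def query(self, provider_name: str, q: str) -> List[str]:
        return self._providers[provider_name].query(q)


service = SearchService()
service.register("toy", SubstringProvider())
service.index("d1", "Galicia is in Spain")
service.index("d2", "to be or not to be")
print(service.query("toy", "Galicia"))
```

In this sketch, adding another provider means registering one more SearchProvider implementation; the service itself does not change.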
Concept + Text Search
The goal is to query across text and knowledge base:
“Show me all occurrences of a named entity that is a deity”.
Annotation in text
Information from knowledge base
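A toy sketch of the "named entity that is a deity" query, assuming each named-entity annotation carries an identifier pointing into the knowledge base. The EntityMention class, the kb_id field and the KB_CLASSES table are all hypothetical; the slides do not specify how the link is stored.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class EntityMention:
    text: str
    begin: int
    end: int
    kb_id: str  # hypothetical link from the annotation to a KB entry


# Toy knowledge base: entry id -> classes the entry belongs to (made-up data).
KB_CLASSES: Dict[str, Set[str]] = {
    "kb:Zeus": {"deity"},
    "kb:Athena": {"deity"},
    "kb:Athens": {"city"},
}


def mentions_of_class(mentions: List[EntityMention], kb_class: str) -> List[EntityMention]:
    """Keep the mentions whose linked KB entry belongs to kb_class."""
    return [m for m in mentions if kb_class in KB_CLASSES.get(m.kb_id, set())]


mentions = [
    EntityMention("Zeus", 0, 4, "kb:Zeus"),
    EntityMention("Athens", 20, 26, "kb:Athens"),
    EntityMention("Athena", 40, 46, "kb:Athena"),
]
print([m.text for m in mentions_of_class(mentions, "deity")])
```

The text side supplies the annotation (a named-entity span), and the knowledge base supplies the class information used as the filter.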
Search syntax (Mtas provider)
Based on CQL
Searches over single / multiple tokens
Searches over annotations and features
Sequential searches
Operators
intersecting, containing, within, |, !
Search syntax examples
Single token
All occurrences of a given word:
Galicia
Multiple tokens
All occurrences of a given expression:
to be or not to be
Single token annotations
All tokens annotated as a NOUN:
[POS.PosValue="NOUN"]
Search syntax examples
Multi-token annotations
Named entities:
<Named_entity/>
Named entities of type location:
<Named_entity.value="LOC"/>
Sequence queries
All occurrences of "suddenly" immediately followed by a verb:
[Token="suddenly"][POS.PosValue="VERB"]
All occurrences of a verb immediately followed by a named entity:
[POS.PosValue="VERB"]<Named_entity/>
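To make the semantics of a sequence query concrete, here is a toy matcher over an in-memory token list. It only illustrates what "immediately followed by" means and is not how Mtas evaluates queries; the token dictionaries and the match_sequence helper are assumptions for this sketch.

```python
from typing import Dict, List


def match_sequence(tokens: List[Dict[str, str]], pattern: List[Dict[str, str]]) -> List[int]:
    """Return the start indices where every constraint in the pattern
    matches the corresponding consecutive token."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(
            all(tok.get(key) == value for key, value in constraint.items())
            for tok, constraint in zip(window, pattern)
        ):
            hits.append(start)
    return hits


tokens = [
    {"Token": "He", "POS.PosValue": "PRON"},
    {"Token": "suddenly", "POS.PosValue": "ADV"},
    {"Token": "left", "POS.PosValue": "VERB"},
    {"Token": "suddenly", "POS.PosValue": "ADV"},
    {"Token": ".", "POS.PosValue": "PUNCT"},
]

# Rough equivalent of [Token="suddenly"][POS.PosValue="VERB"]
pattern = [{"Token": "suddenly"}, {"POS.PosValue": "VERB"}]
print(match_sequence(tokens, pattern))
```

Only the first "suddenly" is reported, because the second one is followed by punctuation rather than a verb.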
Search syntax examples
Sequence queries
Two named entities in a row:
<Named_entity/> {2}
<Named_entity/> <Named_entity/>
A named entity followed by a token and another named entity:
<Named_entity/> [] <Named_entity/>
A named entity followed by an optional token and another named
entity:
<Named_entity/> []? <Named_entity/>
Two named entities separated by between 1 and 3 tokens:
<Named_entity/> []{1,3} <Named_entity/>
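The gap patterns above can be read as bounds on the number of tokens between two entity spans. The sketch below assumes spans are (begin, end) token offsets with an exclusive end; pairs_with_gap is a hypothetical helper, not part of any search provider.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) in token offsets, end exclusive


def pairs_with_gap(spans: List[Span], min_gap: int, max_gap: int) -> List[Tuple[Span, Span]]:
    """Return the ordered pairs (a, b) where b starts between min_gap and
    max_gap tokens after a ends."""
    result = []
    for a in spans:
        for b in spans:
            gap = b[0] - a[1]
            if min_gap <= gap <= max_gap:
                result.append((a, b))
    return result


entities = [(0, 2), (3, 4), (8, 9)]

# []{1,3} corresponds to a gap of 1 to 3 tokens,
# [] to exactly 1, and []? to 0 or 1.
print(pairs_with_gap(entities, 1, 3))
```

Here only the first two entities qualify: one token separates them, while the third entity starts four tokens after the second ends.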
Search syntax examples
Complex queries
Determiners inside a named entity:
[POS.PosValue="DET"] within <Named_entity/>
Determiners not inside a named entity:
[POS.PosValue="DET"] !within <Named_entity/>
Named entities containing a determiner:
<Named_entity/> containing [POS.PosValue="DET"]
Named entities not containing a determiner:
<Named_entity/> !containing [POS.PosValue="DET"]
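One way to think about within and containing is as the same span-inclusion test, differing only in which side of the query is returned. The following is a sketch under that reading, with spans as (begin, end) token offsets (end exclusive); the helper names are made up for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) in token offsets, end exclusive


def within(inner: Span, outer: Span) -> bool:
    """True if inner lies entirely inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def select_within(candidates: List[Span], containers: List[Span], negate: bool = False) -> List[Span]:
    """candidates within containers (or !within when negate is True)."""
    return [c for c in candidates if any(within(c, k) for k in containers) != negate]


def select_containing(containers: List[Span], candidates: List[Span], negate: bool = False) -> List[Span]:
    """containers containing candidates (or !containing when negate is True)."""
    return [k for k in containers if any(within(c, k) for c in candidates) != negate]


determiners = [(0, 1), (5, 6)]
named_entities = [(0, 3), (8, 10)]

print(select_within(determiners, named_entities))            # DET within NE
print(select_within(determiners, named_entities, True))      # DET !within NE
print(select_containing(named_entities, determiners))        # NE containing DET
print(select_containing(named_entities, determiners, True))  # NE !containing DET
```

The within variants return determiner spans, whereas the containing variants return named-entity spans.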
Search syntax examples
Complex queries
Named entities of type LOC or OTH contained in a semantic argument:
(<Named_entity.value="OTH"/> | <Named_entity.value="LOC"/>) within <SemArg/>
Named entities of type LOC or OTH intersecting with a semantic argument:
(<Named_entity.value="OTH"/> | <Named_entity.value="LOC"/>) intersecting <SemArg/>
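The intersecting operator is looser than within: the two spans only need to share at least one token. The sketch below combines it with the | alternative over entity types, again on made-up (begin, end) token spans; the function names and the data are assumptions for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (begin, end) in token offsets, end exclusive


def intersects(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def entities_intersecting(
    entities: List[Tuple[Span, str]], values: Set[str], sem_args: List[Span]
) -> List[Span]:
    """Entities whose value is one of `values` (the | alternative) and
    which overlap at least one semantic argument."""
    return [
        span
        for span, value in entities
        if value in values and any(intersects(span, arg) for arg in sem_args)
    ]


entities = [((0, 2), "LOC"), ((4, 6), "PER"), ((9, 12), "OTH"), ((15, 16), "LOC")]
sem_args = [(1, 5), (10, 11)]

print(entities_intersecting(entities, {"LOC", "OTH"}, sem_args))
```

Neither matching entity lies fully inside a semantic argument, so a within query over the same data would return nothing, while intersecting returns both.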
Future steps
Implement the external search
Extend the search to semantic
annotations
Integrate SPARQL into the search
Integrate machine learning methods into
subcorpus creation