13.03.2018 | Computer Science Department | UKP Lab | Beto Boullosa

Searching in INCEpTION

Beto Boullosa, Michael Bugert, Jan-Christoph Klie, Peter Jiang, Wei Ding, Maximilian Fuchs, Richard Eckart de Castilho, Iryna Gurevych

Searching in INCEpTION

Search over annotations

Inside the projects

Focus: find passages that the user is interested in analyzing / annotating

Search over background corpus

Outside the project

Focus: find passages / documents that the user wants to import into the project (subcorpus creation)

Search over annotations

Users search over their own annotations / text

Admins / curators search over annotations / text from all users

Usually fewer results

All annotation layers / types are potentially searchable

Current status: In progress.
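
The visibility rule above (users see their own annotations, admins and curators see everyone's) can be illustrated with a minimal Python sketch. The role names, the dictionary layout, and the function `visible_annotations` are assumptions made for this example and are not taken from INCEpTION itself.

```python
def visible_annotations(annotations, user, role):
    """Return the annotations a search by `user` may cover.

    `role` is one of the hypothetical values "annotator", "curator", "admin".
    Each annotation is a dict with at least a "user" key (assumed layout).
    """
    if role in {"admin", "curator"}:
        # Admins and curators search over annotations from all users.
        return list(annotations)
    # Everyone else only searches over their own annotations.
    return [a for a in annotations if a["user"] == user]
```

Restricting the search to one user's annotations is also one reason such searches usually return fewer results than a search across all users.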

Search over background corpus

Users search over large background corpora

Results can be imported into projects

Documents are automatically pre-annotated with standard annotation types

Tokens

Lemmas

POS-tags

etc.

Current status: To do.

Search and index

What is indexed

Arbitrary span and relation annotations

Arbitrary features

Hierarchical structures (syntactic constituent trees or document structure)

No restart when changing schema

Concurrent access

Search and index - components

Search service

API for indexing documents and doing queries

Search sidebar

Allows searching for passages among the documents of a project

Search page

Allows searching for documents in a reference corpus

Useful for subcorpus creation

Search and index - providers

Providers

Mimir

Based on GATE

Initial search provider

Mtas

Based on Solr/Lucene

Meertens Institute (Amsterdam)

CQL query syntax

Modular architecture allows adding more providers as needed
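
To make the idea of pluggable providers concrete, here is a minimal sketch in Python. It is not INCEpTION's actual API: the names `SearchProvider`, `InMemoryProvider`, and `SearchService` are hypothetical, and the toy provider does plain substring matching as a stand-in for a real backend such as Mtas or Mimir.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SearchProvider(ABC):
    """Hypothetical provider interface: index documents and answer queries."""

    @abstractmethod
    def index(self, doc_id: str, text: str) -> None:
        ...

    @abstractmethod
    def query(self, query: str) -> list[str]:
        ...


class InMemoryProvider(SearchProvider):
    """Toy provider: case-insensitive substring search over stored texts."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def query(self, query: str) -> list[str]:
        needle = query.lower()
        return sorted(d for d, t in self._docs.items() if needle in t.lower())


class SearchService:
    """Hypothetical service that delegates to a registered provider by name."""

    def __init__(self) -> None:
        self._providers: dict[str, SearchProvider] = {}

    def register(self, name: str, provider: SearchProvider) -> None:
        self._providers[name] = provider

    def index(self, name: str, doc_id: str, text: str) -> None:
        self._providers[name].index(doc_id, text)

    def query(self, name: str, query: str) -> list[str]:
        return self._providers[name].query(query)
```

Under this design, adding a provider means implementing the two abstract methods and registering the instance; the sidebar and search page would only talk to the service.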

Search over annotations

Cross-document full text search

Concept + Text Search

The goal is queries across text and knowledge base:

“Show me all occurrences of a named entity that is a deity”.

In this query, “named entity” refers to an annotation in the text and “deity” to information from the knowledge base.
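
A toy Python sketch of such a combined query follows. It assumes that each named-entity annotation carries a link to a knowledge-base entry and that the knowledge base maps entries to classes; the `Annotation` type, the `identifier` field, the `kb:` identifiers, and the miniature knowledge base are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    doc_id: str
    begin: int
    end: int
    text: str
    identifier: str  # hypothetical link to a knowledge-base entry


# Hypothetical miniature knowledge base: entry identifier -> set of classes.
KNOWLEDGE_BASE = {
    "kb:Zeus": {"deity"},
    "kb:Athena": {"deity"},
    "kb:Athens": {"city"},
}


def occurrences_of_class(annotations, kb, wanted_class):
    """Return annotations whose linked knowledge-base entry has `wanted_class`.

    Annotations linked to entries missing from `kb` are skipped.
    """
    return [a for a in annotations if wanted_class in kb.get(a.identifier, set())]
```

The text side contributes the annotation spans, and the knowledge base contributes the class membership used as the filter.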

Search syntax (Mtas provider)

Based on CQL

Searches over single / multiple tokens

Searches over annotations and features

Sequential searches

Operators

intersecting, containing, within, |, !

Search syntax examples

Single token

All occurrences of a given word:

Galicia

Multiple tokens

All occurrences of a given expression:

to be or not to be

Single-token annotations

All tokens annotated as a NOUN:

[POS.PosValue="NOUN"]

Search syntax examples

Multi-token annotations

Named entities:

<Named_entity/>

Named entities of type location:

<Named_entity.value="LOC"/>

Sequence queries

All occurrences of “suddenly” immediately followed by a verb:

[Token="suddenly"][POS.PosValue="VERB"]

All occurrences of a verb immediately followed by a named entity:

[POS.PosValue="VERB"]<Named_entity/>
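
When queries like these are generated by a program, small helper functions keep the bracket and quote syntax consistent. The Python sketch below only builds query strings in the form shown on the slides; the helper names `token`, `span`, and `sequence` are hypothetical, and values containing double quotes are not escaped.

```python
from typing import Optional


def token(feature: str, value: str) -> str:
    """Single-token condition, e.g. [POS.PosValue="NOUN"]."""
    return f'[{feature}="{value}"]'


def span(layer: str, value: Optional[str] = None) -> str:
    """Multi-token annotation, e.g. <Named_entity/> or <Named_entity.value="LOC"/>."""
    if value is None:
        return f"<{layer}/>"
    return f'<{layer}.value="{value}"/>'


def sequence(*parts: str) -> str:
    """Concatenate query parts so that each one immediately follows the previous."""
    return "".join(parts)


# Example: a verb immediately followed by a named entity.
verb_then_entity = sequence(token("POS.PosValue", "VERB"), span("Named_entity"))
```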

Search syntax examples

Sequence queries

Two named entities in a row (either form):

<Named_entity/> {2}

<Named_entity/> <Named_entity/>

A named entity followed by a token and another named entity:

<Named_entity/> [] <Named_entity/>

A named entity followed by an optional token and another named entity:

<Named_entity/> []? <Named_entity/>

Two named entities separated by between 1 and 3 tokens:

<Named_entity/> []{1,3} <Named_entity/>
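
The gap quantifiers can be read as a constraint on the number of tokens between the end of the first entity and the start of the second. The Python sketch below spells out that reading on token-index spans; it illustrates the intended semantics only, is not how Mtas evaluates queries, and the function name `entity_pairs` is made up for the example.

```python
def entity_pairs(entities, min_gap, max_gap):
    """Find ordered pairs of entity spans separated by min_gap..max_gap tokens.

    Each span is (start, end) in token indices with `end` exclusive, so the
    gap between span a and a later span b is b[0] - a[1].
    """
    pairs = []
    for a in entities:
        for b in entities:
            gap = b[0] - a[1]
            if min_gap <= gap <= max_gap:
                pairs.append((a, b))
    return pairs


# Hypothetical named-entity spans in one document.
entities = [(0, 2), (3, 4), (8, 9), (9, 10)]
in_a_row = entity_pairs(entities, 0, 0)      # like <Named_entity/> <Named_entity/>
optional_gap = entity_pairs(entities, 0, 1)  # like <Named_entity/> []? <Named_entity/>
one_to_three = entity_pairs(entities, 1, 3)  # like <Named_entity/> []{1,3} <Named_entity/>
```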

Search syntax examples

Complex queries

Determiners inside a named entity:

[POS.PosValue="DET"] within <Named_entity/>

Determiners not inside a named entity:

[POS.PosValue="DET"] !within <Named_entity/>

Named entities containing a determiner:

<Named_entity/> containing [POS.PosValue="DET"]

Named entities not containing a determiner:

<Named_entity/> !containing [POS.PosValue="DET"]
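
The four queries above differ only in which side is returned and whether the condition is negated. A small Python sketch over (begin, end) spans makes this explicit. It assumes that "within" means full inclusion with end-exclusive offsets; the exact boundary handling in Mtas may differ, and the function names are hypothetical.

```python
def is_within(inner, outer):
    """True if span `inner` lies completely inside span `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def within(hits, containers, negate=False):
    """Keep hits that are (or, with negate=True, are not) inside any container."""
    return [h for h in hits if any(is_within(h, c) for c in containers) != negate]


def containing(hits, contents, negate=False):
    """Keep hits that do (or, with negate=True, do not) contain any content span."""
    return [h for h in hits if any(is_within(c, h) for c in contents) != negate]


# Hypothetical spans: two determiners and two named entities.
determiners = [(0, 1), (5, 6)]
named_entities = [(0, 3), (7, 9)]
```

With these spans, `within(determiners, named_entities)` keeps only the determiner at (0, 1), while `containing(named_entities, determiners, negate=True)` keeps only the entity at (7, 9).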

Search syntax examples

Complex queries

Named entities of type LOC or OTH contained in a semantic argument:

(<Named_entity.value="OTH"/> | <Named_entity.value="LOC"/>) within <SemArg/>

Named entities of type LOC or OTH intersecting with a semantic argument:

(<Named_entity.value="OTH"/> | <Named_entity.value="LOC"/>) intersecting <SemArg/>
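
The difference between the two queries is that "within" requires full inclusion, whereas "intersecting" only requires some overlap. The Python sketch below contrasts the two on (begin, end) spans with end-exclusive offsets and models the | operator as a simple union of hit lists. The span values and function names are invented for the example, and the precise overlap rule used by Mtas is an assumption here.

```python
def is_within(inner, outer):
    """True if span `inner` lies completely inside span `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def intersects(a, b):
    """True if spans `a` and `b` share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def intersecting(hits, others):
    """Keep hits that overlap at least one span in `others`."""
    return [h for h in hits if any(intersects(h, o) for o in others)]


def within(hits, containers):
    """Keep hits that lie completely inside at least one container."""
    return [h for h in hits if any(is_within(h, c) for c in containers)]


# Hypothetical spans for OTH entities, LOC entities, and semantic arguments.
oth_entities = [(0, 2)]
loc_entities = [(4, 7), (10, 12)]
sem_args = [(1, 5)]

either = oth_entities + loc_entities  # models the | operator
overlapping = intersecting(either, sem_args)
enclosed = within(either, sem_args)
```

Here two entities overlap the semantic argument but none is fully enclosed by it, so the "intersecting" query returns hits while the "within" query returns nothing.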

Future steps

Implement the external search

Extend the search to semantic annotations

Integrate SPARQL into the search

Integrate machine learning methods into subcorpus creation

That’s it!

THANK YOU!

QUESTIONS?