
Post on 25-Feb-2016


Page 1: Enhance legal retrieval applications with an automatically induced knowledge base

Enhance legal retrieval applications with an automatically induced knowledge base

Ka Kan Lo

Page 2

Contents

- Introduction
- Practice in legal retrieval
- Generation of Background concepts
- Combining concepts and contexts
- Conclusion

Page 3

Introduction

Why do we need advanced legal retrieval and e-discovery?

- Document collections
- Legal requirements
- Efficiency

Page 4

Introduction

What are the challenges?

- Explosive growth in document size
- Extensive document sources
- Expanding collection of document formats
- Informal language

Page 5

Introduction

Opportunities:

- Make use of background contexts
- Search documents deeply for every possible piece of evidence
- Example – TREC: the complaint as background information
- More context information: the Web and its links

Page 6

Practice in Retrieval Process

TREC Legal Track practice:

- Defendants devise queries
- Plaintiffs take their turn
- Final queries for the production request
- Documents retrieved

Page 7

Practice in Retrieval Process

What can be added to the process?

- Exploit the background information – complaints
- Merge with the larger background – the Web and links
- Proposal in this work – use Wikipedia as an example

Page 8

Modeling

Page 9

Generation of Background concepts

Representation of Background concepts:

- Entities & Relations
- Ease the conversion from texts to concepts
- Facilitate unsupervised operations

Page 10

Generation of Background concepts

Concepts source – Wikipedia

- Page: a document
- Title: the central concept described by a document
- Links: a set of concepts / terms pointing to other pages
- Words: a set of words
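
The four fields above could be held in a small record type. This is only a minimal sketch: the names `WikiPage` and `make_page` are hypothetical and do not come from the slides, and the whitespace tokenization is a placeholder for whatever tokenizer a real system would use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WikiPage:
    """One Wikipedia page viewed as a concept record (hypothetical structure)."""
    title: str                                     # central concept of the page
    links: set[str] = field(default_factory=set)   # concepts / terms linking to other pages
    words: set[str] = field(default_factory=set)   # the words of the page, reduced to a set


def make_page(title: str, text: str, links: list[str]) -> WikiPage:
    # Naive whitespace tokenization with punctuation stripping (an assumption).
    words = {w.strip(".,;:()").lower() for w in text.split()}
    words.discard("")
    return WikiPage(title=title, links=set(links), words=words)


page = make_page(
    "Complaint",
    "A complaint is a pleading filed by a plaintiff.",
    ["Pleading", "Plaintiff"],
)
print(page.title, sorted(page.links))
```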

Page 11

Generation of Background concepts

Facilitate lexical realization from texts to concepts:

- Surface concepts: mentioned by a page
- Hidden concepts: indexed by no page but present within pages
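
One possible reading of the surface / hidden split, sketched as set operations. The interpretation is an assumption: a concept is treated as "surface" when it is mentioned in some page and also has a page of its own, and as "hidden" when it is mentioned but no page is titled after it. The function name `split_concepts` and the toy data are hypothetical.

```python
from __future__ import annotations


def split_concepts(pages: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """pages maps a page title to the set of concept terms found in that page."""
    titles = set(pages)
    mentioned = set().union(*pages.values())
    surface = mentioned & titles   # mentioned and backed by a page of its own
    hidden = mentioned - titles    # present in pages, but indexed by no page
    return surface, hidden


# Toy data, not taken from Wikipedia.
toy_pages = {
    "Complaint": {"Pleading", "Plaintiff", "Cause of action"},
    "Plaintiff": {"Complaint", "Lawsuit"},
    "Pleading": {"Complaint"},
}
surface, hidden = split_concepts(toy_pages)
print(sorted(surface))  # ['Complaint', 'Plaintiff', 'Pleading']
print(sorted(hidden))   # ['Cause of action', 'Lawsuit']
```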

Page 12

Generation of Background concepts

Entities:

- Basic objects – named entities, locations, organizations, ...
- Definitions: e ⊂ c, e ≠ r, e ∈ role of relations

Page 13

Generation of Background concepts

Relations:

- Relationships between concepts
- Definitions: r ⊂ c, r ≠ e, r = <role_1, role_2, role_3>, role_i = e
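
The entity and relation definitions can be illustrated with two small types. This is a sketch under assumptions: the class names, the `kind` and `label` fields, and the example relation are all hypothetical. Only the constraints come from the slides: entities and relations are both concepts, they are distinct from each other, and every role of a relation is filled by an entity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str = "unknown"   # e.g. named entity, location, organization


@dataclass(frozen=True)
class Relation:
    label: str
    roles: tuple[Entity, ...]   # r = <role_1, role_2, role_3>, role_i = e

    def __post_init__(self) -> None:
        # Enforce role_i = e: a role may only be filled by an entity.
        if not all(isinstance(role, Entity) for role in self.roles):
            raise TypeError("every role must be filled by an Entity")


# e ⊂ c and r ⊂ c: a concept is either an entity or a relation.
Concept = Union[Entity, Relation]

plaintiff = Entity("Plaintiff", "party")
defendant = Entity("Defendant", "party")
court = Entity("District Court", "organization")
sues = Relation("sues", (plaintiff, defendant, court))
print(sues.label, [role.name for role in sues.roles])
```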

Page 14

Semantical Domain

Semantical Domain:

- A group of inter-related concepts, as defined by Wikipedians
- Groups can be configured and reconfigured, depending on the size and nature of the domains
- Represents background information of different sizes, natures, and structures

Page 15

Semantical Domain

Operations:

- D = {page_i} where page_i ∈ E
- Overlap
- Subsumed
- Join
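
If a domain D is modelled as a set of pages, the three operations map naturally onto set operations. The slides do not define them further, so reading Overlap as intersection, Subsumed as a subset test, and Join as union is an assumption, as are the function names and the toy domains.

```python
from __future__ import annotations


def overlap(d1: frozenset[str], d2: frozenset[str]) -> frozenset[str]:
    """Pages shared by both domains (assumed: set intersection)."""
    return d1 & d2


def is_subsumed(d1: frozenset[str], d2: frozenset[str]) -> bool:
    """True when every page of d1 also belongs to d2 (assumed: subset test)."""
    return d1 <= d2


def join(d1: frozenset[str], d2: frozenset[str]) -> frozenset[str]:
    """Merge two domains into a larger one (assumed: set union)."""
    return d1 | d2


# Toy domains identified by page titles.
contract_law = frozenset({"Contract", "Breach of contract", "Damages"})
tort_law = frozenset({"Tort", "Negligence", "Damages"})
remedies = frozenset({"Damages"})

print(sorted(overlap(contract_law, tort_law)))   # ['Damages']
print(is_subsumed(remedies, tort_law))           # True
print(len(join(contract_law, tort_law)))         # 5
```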

Page 16

Knowledge Extraction, Parsing

Parsing:

- Conversion of syntactic parses into concept representations
- Dependency parsing
- Fill in the entities and relations automatically
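
A minimal sketch of how a dependency parse might be turned into a relation with filled roles. It assumes the sentence is already parsed (no parser is invoked here) and that labels follow the Universal Dependencies names `nsubj` and `obj`. The `Token` type and `extract_relations` function are hypothetical, and the subject-verb-object pattern is a simplification, not the method used in this work.

```python
from __future__ import annotations

from typing import NamedTuple


class Token(NamedTuple):
    text: str
    head: int     # index of the head token, -1 for the root
    deprel: str   # dependency label, e.g. "nsubj", "obj"


def extract_relations(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Return (predicate, subject, object) triples found in one parsed sentence."""
    relations = []
    for i, tok in enumerate(tokens):
        # Collect the direct dependents of token i, keyed by dependency label.
        deps = {t.deprel: t.text for t in tokens if t.head == i}
        if "nsubj" in deps and "obj" in deps:
            relations.append((tok.text, deps["nsubj"], deps["obj"]))
    return relations


# Hand-written parse of "The plaintiff sues the defendant".
sentence = [
    Token("The", 1, "det"),
    Token("plaintiff", 2, "nsubj"),
    Token("sues", -1, "root"),
    Token("the", 4, "det"),
    Token("defendant", 2, "obj"),
]
print(extract_relations(sentence))  # [('sues', 'plaintiff', 'defendant')]
```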

Page 17

Entities & Relations

Highlights of the process:

- Syntactic parsing of sentences
- Conversion from linguistic representation to concept representation
- Constrain the concept spaces by different sizes and scopes

Page 18

Combining the concepts and background contexts

Algorithm:

1. Filter the background text and request text
2. Match the term set against Wikipedia
3. Build the network of concepts and relations
4. Combine into a single network and filter out unnecessary concepts
5. Extract terms and concepts and expand the query string
6. Fire the query at the retrieval system
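
The steps of the algorithm can be sketched end to end on toy data. Everything concrete here is an assumption made for illustration: the stopword list, the tiny `WIKI_LINKS` graph standing in for Wikipedia, the rule that keeps a neighbour only when at least two matched concepts link to it, and all function names. The final retrieval call is left out because the slides do not name a retrieval engine.

```python
from __future__ import annotations

import re

STOPWORDS = {"a", "all", "an", "and", "for", "in", "of", "on", "the", "to"}

# Toy stand-in for Wikipedia: concept -> concepts it links to.
WIKI_LINKS = {
    "complaint": {"pleading", "plaintiff"},
    "tobacco": {"nicotine", "cigarette"},
    "advertising": {"marketing", "cigarette"},
}


def filter_text(text: str) -> set[str]:
    # Step 1: lowercase, tokenize, drop stopwords.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def match_terms(terms: set[str], wiki: dict[str, set[str]]) -> set[str]:
    # Step 2: keep only terms that correspond to a known concept.
    return {t for t in terms if t in wiki}


def build_network(concepts: set[str], wiki: dict[str, set[str]]) -> dict[str, set[str]]:
    # Step 3: each matched concept with its linked neighbours.
    return {c: set(wiki[c]) for c in concepts}


def combine_and_filter(networks: list[dict[str, set[str]]], min_support: int = 2) -> set[str]:
    # Step 4: merge into one network, keep neighbours linked from enough concepts.
    merged: dict[str, set[str]] = {}
    for net in networks:
        for concept, neighbours in net.items():
            merged.setdefault(concept, set()).update(neighbours)
    support: dict[str, int] = {}
    for neighbours in merged.values():
        for n in neighbours:
            support[n] = support.get(n, 0) + 1
    return {n for n, count in support.items() if count >= min_support}


def expand_query(request: str, background: str, wiki: dict[str, set[str]]) -> str:
    # Step 5: add the surviving concepts to the request terms.
    request_terms = filter_text(request)
    background_terms = filter_text(background)
    networks = [
        build_network(match_terms(terms, wiki), wiki)
        for terms in (request_terms, background_terms)
    ]
    return " ".join(sorted(request_terms | combine_and_filter(networks)))


expanded = expand_query(
    "All documents on tobacco advertising to minors",
    "The complaint alleges deceptive tobacco advertising",
    WIKI_LINKS,
)
print(expanded)  # advertising cigarette documents minors tobacco
# Step 6 would pass `expanded` to the retrieval engine.
```

In this toy run only "cigarette" is added, because it is the one neighbour reached from both "tobacco" and "advertising"; raising or lowering `min_support` trades recall against query drift.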

Page 19

Conclusion

Page 20

Conclusion

- Challenges in legal retrieval
- Background contexts
- Generation of background concepts
- Project the context onto concepts
- Expand the queries for retrieval

Page 21

Conclusion

Current work:

- Integration of language learning (not only parsing) and the concept generation process
- Large-scale construction of networks with the full document set in 3 languages on Grid:
  - English: 1.7 million
  - Spanish: 300 thousand
  - Chinese: 200 thousand

Page 22

Conclusion

Current work:

- Experiments running on a corpus of 20M web pages for expanded links
- Generated language and concept spaces used in other Natural Language Technologies (NLT)
- TREC-Legal: testing the integration of the knowledge base with the complaint text for queries
- TREC-Legal: building a new matching mechanism (from KB induction) on a small, concise set of documents

Page 23

Thank you

Q&A