
http://www.sekt-project.com

AIFB

Iuriservice II Ontology Development

Núria Casellas, Denny Vrandečić, Joan Josep Vallbé, Aleks Jakulin, Mercedes Blázquez

Workshop on Artificial Intelligence and Law
XXII World Congress of Philosophy of Law and Social Philosophy

Granada, May 2005

May 25, 2005 2 http://www.sekt-project.com


Agenda

• Introduction to SEKT Project and Legal Case Study
• Methodology
• OPJK
• Improving knowledge discovery on the competency questions
• Architecture


The inSEKTs

BT

University of Sheffield

Vrije Universiteit Amsterdam

Sirma AI

Empolis

Universität Karlsruhe

Ontoprise

Universitat Autònoma de Barcelona

Universität Innsbruck

Jozef Stefan Institute

iSOCO

Kea-pro

SEKT

• Main goals of SEKT
  • European leadership in Semantic Technologies
• Core research
  • Combine Human Language Technologies, Knowledge Discovery and Ontology Technologies
  • Provide intelligent knowledge access

Description of the Problem: Legal Domain

• In general:
  • Complaints about the diligence of the legal administration.
  • Judges are overworked.
• In particular:
  • New judges
    • Have a lot of theoretical knowledge but little practical knowledge.
    • Serve on duty.
    • When they face situations in which they are not sure what to do, they "disturb" experienced judges with typical questions, usually their former tutor (preparador).
• Existing technology
  • Legal databases
    • Essential in judges' daily work
    • Based on keywords and Boolean operators
    • A search retrieves a huge number of hits


Description of the Problem: Legal Domain

• Solution:
  • Design an intelligent system to help new judges with their typical problems.
    • Extended FAQ system using Semantic Web technologies
    • Connect the FAQ system with the existing jurisprudence.
  • Search jurisprudence using Semantic Web technologies.


State of the Art in Legal Ontologies

• LLD [Language for Legal Discourse, L.T. McCarty, 1989]: Atomic formulae, Rules and Modalities.

• NOR [Norma, R.K. Stamper, 1991, 1996]: Agents, Behavioral invariants, Realizations.

• LFU [Functional Ontology for Law, A. Valente, 1995]: Normative knowledge, World knowledge, Responsibility knowledge, Reactive knowledge and Creative knowledge.

• FBO [Frame-Based Ontology of Law, R.W. van Kralingen; P.R.S. Visser, 1995]: Norms, Acts and Concept descriptions.

• LRI-Core Legal Ontology [J. Breuker et al., 2002]: Objects, Processes, Physical entities, Mental entities, Agents, Communicative acts.

• IKF-IF-LEX Ontology for Norm Comparison [A. Gangemi et al., 2001]: Agents, Institutive norms, Instrumental provisions, Regulative norms, Open-textured legal notions, Norm dynamics.


Conceptual distinctions

• Professional Knowledge (PK)
• Legal Knowledge (LK): Legal Core Ontologies (LCO) [based on General Theories of Law]
• Legal Professional Knowledge (LPK): OPLK
• Judicial Professional Knowledge (JPK): OPJK


Ethnographic survey

[Figure: survey counts per region; the individual values cannot be assigned to regions from the extracted text.]

Total Autonomous Communities: 14 (out of 17)


Preliminary exploitation of data

Statistical analysis of results
• Judicial units: heterogeneity
• Judge's profile

Protocols of analysis
• Literal transcripts
• Completed questionnaires
• List of extracted questions


OPJK Modeling

• Identification of possible concepts through ALCESTE's results and TextToOnto conceptual distribution

• Domain detection

• Competency questions discussion and concept extraction


Intuitive ontological subdomains

[Diagram: JUDGE at the centre, surrounded by the subdomains below]

• ON-DUTY
• FAMILY ISSUES
• IMMIGRATION
• REAL ESTATE
• DECISION-MAKING & JUDGMENTS
• PROCEEDINGS
• JUDICIAL CLERKS
• COMMERCIAL LAW
• CONTRACT LAW
• CRIMINAL LAW
• GENDER VIOLENCE
• ORDER OF PROTECTION / INJUNCTION

Term extraction using TextToOnto

Term extraction using TextToOnto and Spanish GATE


1. Identify important concepts that should be represented
2. Hierarchy construction
3. Identify relations between them
4. Redefine the ontology, repeating steps 1-4


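Competency question discussion

Selecting (underlining) all the nouns (usually concepts) and adjectives (usually properties) contained in the competency questions.

• ¿Cuál es el tratamiento de las denuncias manifiestamente inverosímiles o relativas a hechos que evidentemente carecen de tipicidad?
  (English: How should complaints be handled that are manifestly implausible or that concern facts which evidently do not constitute an offence?)

• ¿Y si se trata de una querella que reúne todos los demás presupuestos procesales pero los hechos objeto de la misma carecen de relevancia penal o manifiestamente falsos?
  (English: And what if it is a criminal complaint (querella) that meets all the other procedural requirements, but the facts it concerns lack criminal relevance or are manifestly false?)

• ¿Qué ocurre si comparece en el juzgado una persona que quiere denunciar hechos difícilmente creíbles, sin relación entre sí, dudándose por el juez de la capacidad mental del denunciante?
  (English: What happens if a person comes to the court wanting to report facts that are hardly credible and unrelated to one another, and the judge doubts the complainant's mental capacity?)

• ¿Ante quién debe interponerse el recurso de reforma contra la prisión, delante del juez de guardia o del juez que dictó el correspondiente auto de prisión?
  (English: Before whom must the appeal for reconsideration (recurso de reforma) against imprisonment be lodged: the on-duty judge or the judge who issued the corresponding detention order?)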

OPJK classes identified

OPJK and Proton Integration


Improving knowledge discovery on the competency questions


Data and Method

Data: 3 text corpora (judges' questions):
• Corpus 1: Scholar "on duty" questions (Spanish Judicial School; n = 99)
• Corpus 2: Practical "on duty" questions (field work; n = 163)
• Corpus 3: All practical questions (field work; n = 756)

Method:
• TEXT GARDEN (J. Stefan Institute, Ljubljana)
• ALCESTE: analysis of the co-occurring lexemes within the simple statements of a text [Reinert 2002, 2003]


Analysis of Text

The text needs to be represented in an appropriate way for statistical analysis:

1. Breaking text into "units" (lines, sentences, …)
2. Morphological categorization (adjectives, prepositions, …)
3. Putting words into canonical form:
   a) Lemmatization (is, was, are → be)
   b) Stemming (loved, loving → lov+)
4. Analysis:
   a) Clustering
   b) Latent semantic indexing
   c) Correspondence analysis
   d) Classification
   e) Visualization
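Steps 1 and 3 of the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the project: the lemma dictionary, the suffix list and all function names are hypothetical, and step 2 (morphological categorization) and step 4 (analysis) are left out.

```python
import re

# Hypothetical toy lemma dictionary; a real system would use a full lexicon.
LEMMAS = {"is": "be", "was": "be", "are": "be", "loved": "love", "loving": "love"}

# Illustrative suffix list for the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def split_units(text):
    """Step 1: break the text into sentence-like units."""
    return [unit.strip() for unit in re.split(r"[.!?]+", text) if unit.strip()]


def tokenize(unit):
    """Lower-case a unit and split it into alphabetic tokens."""
    return re.findall(r"[a-záéíóúüñ]+", unit.lower())


def lemmatize(token):
    """Step 3a: dictionary lookup; unknown tokens are returned unchanged."""
    return LEMMAS.get(token, token)


def stem(token):
    """Step 3b: strip the first matching suffix and mark the cut with '+'."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)] + "+"
    return token


units = split_units("She was loved. They are loving!")
lemmatized_units = [[lemmatize(t) for t in tokenize(u)] for u in units]
stemmed_units = [[stem(t) for t in tokenize(u)] for u in units]
```

Here `lemmatized_units` maps "was" and "are" to "be", while `stemmed_units` reduces both "loved" and "loving" to "lov+", mirroring the two examples on the slide.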

ALCESTE (Reinert, 1988)

[Diagram, reading order inferred from the labels:]
Corpus → Segmented in chunks → Hierarchical descending clustering → Classes of related chunks → List of typical words related to each class → Correspondence analysis → Geometric representation

Folch & Habert (2000)

Example of Correspondence Analysis and Visualization

ALCESTE

[Figure: two-axis correspondence-analysis plot of word stems. Stems related to family law and gender violence (separacion+, orden+, violencia, alejamiento, proteccion, victima+) lie to the left of the vertical axis; stems related to proceedings and enforcement (ejecucion+, embarg+, finca+, monitorio, demand+, juicio+) lie in the upper right; on-duty stems (guardia+, juez, policia, detenido+, forense) lie at the bottom.]

TEXT GARDEN

Example of Clustering

Class 1: Judicial unit
funcionar+(21), juzgar(26), oficina(11), trabaj+(13), decir(26), llam+(16), mand+(12), acudir(11), adjunto(4), busc+(4), consult+(4), dato(6), hablar(4), jurisprudencia(3), local+(3), material(6), necesit+(7), policia(14), prensa(4), sala(4), funerari+(2), hurto(3), informacion(5), miedo(3), robo(3), servicio+(7), sustitu+(4), tecnico(2), venir(15)

Class 2: Family law
alejamiento(22), malo(22), medida(16), orden+(23), proteccion(17), senor+(13), trat+(22), victima(11), mujer(11), padre(7), denunci+(12), domestico(8), violencia(8), agresor(4), dict+(10), madre(7), marido(6), nino(5), pension(4), psicolog+(5), separacion(5), abus+(5), alimento(3), ayud+(4), casa(3), cautelar+(3), divorcio(2), empresa(3), hijo(4), lesion+(6)

Class 3: Proceedings
escrit+(9), fiscal+(13), instruccion(9), ordinario(5), seguir(11), acumular(5), audiencia-provincia(2), conform+(2), contradictori+(3), criterio+(10), cuantia(5), falt+(7), injusto(3), interpretacion(3), ley(6), motiv+(3), pendiente(2), perito(5)

Class 4: Enforcement (judgment)
ejecucion(14), ejecut+(15), embarg+(11), finca+(9), depositar+(6), interes+(6), pago(6), suspension(5), deposito(6), entreg+(6), quiebra(5), sentencia(9), solicit+(9), vehiculo(4), acreedor(3), administracion(4), cantidad(4), conden+(4), cost+(4), dinero(4), edicto(2), imposibilidad(3), multa(3), notificacion(4), pagar+(4)


Stemming vs Lemmatization

Stemming: the longest string of characters that is common to different words.

Stemming yields the same stem, lov+, for all the variants of 'love' and also for 'lover' (noun) and 'lovely' (adjective).

Lemmatization respects the category:

3 different lemmas: love (verb), lover (noun), lovely (adjective)

If we apply this process to Spanish or Catalan (or any Romance language), which are highly inflected (60 forms per verb, not counting compound forms), stemming would hide a lot of information.

EXAMPLES

Stem          Lemma
acumulacion   acumulación
acumularse    acumular
acumul+       ---
admision      admisión
admit+        admitir
celebracion   celebración
celebr+       celebrar
misma+        mismo
mismo+        ---
suspenderse   suspender
suspend+      ---

Quantitative Comparison

                        Stemmed corpus   Lemmatized corpus
Num. different forms    3074             2064
Num. occurrences        19861            19946
Max. freq. of a form    1230             2208
Hapax                   1666             934

• The lemmatized corpus has fewer word-forms than the stemmed version.
• The LSI on the lemmatized corpus is able to reconstruct documents better, especially in few dimensions.
• The lemmatized corpus clustering is more detailed.
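The four figures in the table are standard vocabulary statistics. As a minimal sketch, assuming a corpus that has already been reduced to a flat list of stems or lemmas, they could be computed as shown below; the function name and the sample tokens are hypothetical and are not taken from the project's corpora.

```python
from collections import Counter


def corpus_stats(tokens):
    """Return vocabulary statistics for a list of normalized tokens."""
    counts = Counter(tokens)
    return {
        # Number of distinct word-forms (vocabulary size).
        "different_forms": len(counts),
        # Total number of tokens in the corpus.
        "occurrences": sum(counts.values()),
        # Frequency of the most frequent form (0 for an empty corpus).
        "max_freq": max(counts.values()) if counts else 0,
        # Hapax legomena: forms that occur exactly once.
        "hapax": sum(1 for c in counts.values() if c == 1),
    }


# Hypothetical toy input, for illustration only.
sample = ["juez", "guardia", "juez", "denuncia", "juez", "guardia"]
stats = corpus_stats(sample)
```

Running the same function on the stemmed and on the lemmatized version of a corpus gives two dictionaries that can be compared row by row, as in the table.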


Comparison of Clustering Results

1. Clustering with the stemmed corpus offers 4 classes:
   1. 'On-duty' actions (mixed with Judicial Office) (54.06%)
   2. Proceedings and Trial (18.10%)
   3. Enforcement (judgments) (14.39%)
   4. Family Law (gender violence, divorce, separation…) (13.46%)

2. Clustering with the lemmatized corpus is more detailed and offers 6 classes:
   1. Judicial Office (20.11%)
   2. 'On-duty' actions (27.25%)
   3. Family Law (gender violence, divorce, separation…) (14.55%)
   4. Proceedings (15.61%)
   5. Trial (8.47%)
   6. Enforcement (judgments) (14.02%)

Take-Home Messages

• Do text analysis of legal documents!

• If you do that, do lemmatization!


Methodology

Initial Methodology

+ Based on 800 competency questions

+ Questions were clustered

+ Middle-out strategy

– Usage of ontology not considered

– Repetitive discussions

– Long discussions

Considering the “Why”

• No normative knowledge

• Stick to the questions as sources

• Model the questions, not the answers


Wiki visualization

DILIGENT Argumentation Ontology

• Argumentation ontology defined

• Based on Case Studies to identify the most effective types of arguments

• Argument type recognition based on RST

Methodology changes

Using DILIGENT made the ontology engineering…

• … much faster

• … amenable to distributed development

• … better documented

• … trackable

• … more manageable

DILIGENT itself was also changed!

Outlook

• Better tool support – off-the-shelf wiki had weaknesses

• Moderator support in discussions

• Competency question clustering

• Gathering further experience from legal and other case studies


Architecture


High Level Requirements

• Judges should not be bothered with a complex user interface.
  • A simple natural language interface is probably appropriate.
• The decision as to whether a new question is similar to a stored question (with its corresponding answer) should be based on semantics rather than on simple word matching.
  • An ontology can be used to perform this semantic matching of questions.
• The questions included in the system should be of high quality.
  • They should be rather exhaustive and reflect the actual situation.
  • An extensive survey of more than 250 Spanish judges forms the basis for the questions.
• Justify the answer provided by the system with existing jurisprudence.
  • Jurisprudence databases.
  • Metadata and ontology processing of documents.
• Knowledge management at all levels

Example Question-Answer

• Question:
  • What problems can we foresee with the analysis of small amounts of drugs, where the identification test destroys the drugs?

• Answer:
  • This is an unrepeatable piece of evidence at the trial. In these cases, the Spanish Criminal Procedure Act states that the adversarial principle should be respected. While the trial proceedings are prepared, the judge must explain to all parties that they may choose an expert to perform these tests.


Example of judgment: parts

• Court and docket number
• Names of the magistrates
• Date and place
• Prefatory statement
• History of the Case
• Grounds of Decision


Relations between the Question/Answer & Judgment

[Diagram labels:]
• FAQ: Question, Answer
• Judgment: Summary, Case History, Decision Grounds, Ruling
• OPJK: Practical Knowledge Instances

Architecture

[Diagram components:]
• Web browser, Natural Language
• Questions-Answers (Expert Knowledge)
• Semantic Matching
• Decisions: DB 1 … DB N (Jurisprudence)
• Ontology Learning & feeding
• Ontology Merging
• Ontology Alignment

Expert Knowledge Retrieval

Design - Technological considerations

[Diagram components:]
• iFAQ System
  • Multistage Searching Subsystem: Ontology Domain Detection, Keyword Matching, Ontology Graph Path Matching
  • Ontology Technology
  • Natural Language Processing
  • Caching subsystem
  • Persistence subsystem
• Diagram axes: Efficiency, Accuracy


Expert Knowledge Retrieval

• Chain of Responsibility pattern

[Diagram: User Question → iFAQ Search Engine → FAQ Candidates (FAQ, FAQ, FAQ). A FAQ Search Factory supplies the plugged searching stages:]
• Ontology Domain Detection
• Keyword/synonym matching stage
• Ontology graph path matching
• Other search engines ...
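The Chain of Responsibility idea behind the plugged searching stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the iFAQ implementation: the class names, the early-stop policy, the word-overlap scoring and the sample FAQ data are all hypothetical, and only two of the stages from the diagram are shown.

```python
from abc import ABC, abstractmethod

# Hypothetical domain vocabulary and stored FAQs, for illustration only.
DOMAIN_KEYWORDS = {
    "family": {"divorce", "protection", "order"},
    "enforcement": {"seize", "embargo", "vehicle"},
}
FAQS = [
    {"domain": "family", "question": "when to issue a protection order"},
    {"domain": "family", "question": "how to set a pension in a divorce"},
    {"domain": "enforcement", "question": "how to seize a vehicle"},
]


class SearchStage(ABC):
    """One pluggable stage; narrows the candidates and hands them on."""

    def __init__(self, next_stage=None):
        self.next_stage = next_stage

    def handle(self, question, candidates):
        narrowed = self.filter(question, candidates)
        # Assumed policy: stop once the chain ends or one candidate is left.
        if self.next_stage is None or len(narrowed) <= 1:
            return narrowed
        return self.next_stage.handle(question, narrowed)

    @abstractmethod
    def filter(self, question, candidates):
        """Return the subset of candidates this stage keeps."""


class DomainDetection(SearchStage):
    def filter(self, question, candidates):
        words = set(question.lower().split())
        domains = {d for d, kws in DOMAIN_KEYWORDS.items() if kws & words}
        # Fall back to all candidates if no domain is detected.
        return [c for c in candidates if c["domain"] in domains] or candidates


class KeywordMatching(SearchStage):
    def filter(self, question, candidates):
        if not candidates:
            return []
        words = set(question.lower().split())
        scored = [(len(words & set(c["question"].split())), c) for c in candidates]
        best = max(score for score, _ in scored)
        return [c for score, c in scored if score == best]


# Cheap stage first, more selective stage second.
chain = DomainDetection(KeywordMatching())
result = chain.handle("can I issue a protection order", FAQS)
```

Adding a further stage, such as ontology graph path matching, would only require another `SearchStage` subclass passed as `next_stage`, which is the main attraction of the pattern for a multistage search.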


Expert Knowledge Retrieval

Semantic Similarity: Main steps

[Diagram, reading order inferred from the labels:]
NL query → NLP → POS list (lemmas) → Ontology Linking → Semantic Distance Calculation (semantic distance between queries) → Term Coverage Calculation between queries → Best match of stored queries

Supporting resource: Ontology

Expert Knowledge Retrieval

Semantic Similarity

• The semantic distance is based on the weighted navigation distance between terms in the ontology.
• Navigation through the ontology means that one moves from one concept to another concept via one of its relations or attributes:
  • Is a
  • Follows
  • Actor
  • Etc.
• The task of associating distance costs:
  • Is domain specific.
  • Needs to be performed by a legal expert.

[Diagram: ontology fragment with the concepts Actions, Accuse, Denounce, Mother and Son, and the relation Follow.]
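One way to read "weighted navigation distance" is as a cheapest path in a graph whose edges are ontology relations, each with a cost. The sketch below uses Dijkstra's algorithm for this; it is an assumption about how such a distance could be computed, not the system's actual method. The relation costs and the edges between concepts are hypothetical (the slides state that costs must be assigned by a legal expert), and only the concept names are borrowed from the diagram.

```python
import heapq

# Hypothetical per-relation costs; in practice a legal expert assigns them.
RELATION_COST = {"is_a": 1.0, "follows": 2.0, "actor": 3.0}

# Hypothetical edges between the concepts named in the diagram.
EDGES = [
    ("Accuse", "is_a", "Actions"),
    ("Denounce", "is_a", "Actions"),
    ("Accuse", "follows", "Denounce"),
    ("Mother", "actor", "Denounce"),
    ("Son", "actor", "Accuse"),
]


def build_graph(edges):
    """Build an adjacency list; navigation is treated as undirected."""
    graph = {}
    for source, relation, target in edges:
        cost = RELATION_COST[relation]
        graph.setdefault(source, []).append((target, cost))
        graph.setdefault(target, []).append((source, cost))
    return graph


def semantic_distance(graph, start, goal):
    """Dijkstra: cost of the cheapest navigation path between two concepts."""
    queue = [(0.0, start)]
    best = {start: 0.0}
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, cost in graph.get(node, []):
            new_dist = dist + cost
            if new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return float("inf")  # no path between the two concepts


graph = build_graph(EDGES)
accuse_denounce = semantic_distance(graph, "Accuse", "Denounce")
mother_son = semantic_distance(graph, "Mother", "Son")
```

With these made-up costs, closely related actions end up nearer to each other than the two actors, which is the kind of behaviour the expert-assigned weights are meant to control.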

Conclusions

• Decision support system for inexperienced judges

• Using Semantic Web technology for handling knowledge
  • Provide knowledge for the decision-making process
  • Capture knowledge from experts
  • Share knowledge among all users

• Extended understanding capacities
  • Background knowledge: Professional Legal Ontology
  • Decision explanation
  • Improved knowledge acquisition
