
SEMANTIC RETRIEVAL OF LEGAL CASE DOCUMENTS USING

RELATIONAL METHONTOLOGY BASED C_MAPPING TECHNIQUE

1R. Priyadarshini, 2N. Rajendran, 3Amit Alex

1,2,3Department of Information Technology, B.S. Abdur Rahman Crescent Institute of

Science and Technology, Chennai, India


Abstract: An Information Retrieval System (IRS) retrieves documents based on keyword search. Semantic-based information retrieval goes beyond standard information retrieval and uses related information to get the crime incident documents from the corpus. However, semantic retrieval of documents is not efficient enough in real time. In the proposed work, semantic retrieval is facilitated by building legal ontologies using methontology, and the construction of the legal ontology is supported by classification techniques. The grouping and matching of retrieved legal documents (LD) are also visualized using C-Maps. The proposed methodology aims to build a semantic model via a legal ontology in order to identify crime incidents and to increase the accuracy of the retrieved crime incident documents using classification techniques, especially c-mapping. A semantic model represents data in a specific logical way for the discovery of hidden semantic structures in documents. A C-Map, or conceptual diagram, depicts suggested relationships between concepts and builds upon previous knowledge by connecting new information back to it. Experimental analysis of methontology-based semantic retrieval of crime incident documents shows an improvement in recall and precision of up to 10%. Methontology is one of the methodologies for building legal ontologies for a specific legal domain.

Keywords: Topic Modeling, Semantic based LDA, C-Mapping, Legal Case Documents,

Methontology.

1. INTRODUCTION

Semantic search is a data searching technique in which a search query not only processes the meaning of the crime documents but also determines the relevant evidence, purpose and relationship of the crime documents with the key topic. Semantic search systems consider various factors to provide relevant search results, including the context of the search in a crime document, location, intent and conceptual matching. Semantic-based information retrieval goes beyond standard information retrieval and uses related information to get the documents from the corpus. However, semantic retrieval of documents is not efficient enough in real time. Semantic information retrieval can be implemented by means of an ontology. Methontology is a methodology for building ontologies on a domain basis, either from scratch, by reusing other ontologies, or by re-engineering them. This project proposes to redesign legal ontologies using methontology. The accuracy of retrieval in the model is determined by the quality and quantity of the documents collected and stored. The crime incident profile is based on complex information associated with an individual crime or a group of crimes that show similar interests or similar guidance behavior. The overall goal of the process is to combine semantic retrieval with classification and c-mapping.

ISSN No: 0130-7673

Page No: 118

NOVYI MIR Research Journal

Volume 5, Issue 4, 2020


2. RELATED WORKS

Probabilistic Topic Model for Learning Word Ontologies: this work captures the linguistic relationships between a word in a topic and a topic in a document, understood in terms of probability distributions, by using an LSI-based probabilistic topic model. The model is used for topic extraction in information retrieval and for automatically learning ontologies from a text corpus. The authors present algorithms for learning ontologies that use the principle of topic relationship and exploit information theory with the learned probabilistic topic models. Experiments with different model parameters were conducted, and the learned ontology statements were evaluated by domain experts. The precision of the learned ontology is sufficient for it to be deployed for navigation, browsing, search and retrieval of information in digital libraries. The proposed model can considerably reduce the time and effort involved in the process. However, learning many sub-trees of the topic hierarchy at the same time introduces ambiguity [1].

Tackling Topic General Words in Topic Modeling: this paper describes topic models as a popular tool for discovering latent topics in documents and for helping to complete natural language processing tasks. To obtain good topics for a corpus, a preprocessing step is usually essential to identify and remove topic general words (TGWs) from the corpus, and this is done manually. In existing approaches, the user needs to remove a set of common stop words, such as 'is' and 'a', which can be achieved using a standard stop word list. The proposed topic model automatically learns from previous results on corpora of various domains to help identify TGWs in the current domain. This model reduces time consumption and human intervention. However, automatic cleaning and extraction is not available for unstructured texts [2].

A Social Media Ontology on Text Analytics: real-world data comes in many forms and can be extremely large. This raises the need for algorithms that can deduce and extract true and valid information from a collection of raw facts. The complex process of text mining is used successfully for this purpose. Text mining, roughly equivalent to text analytics, can be defined as the process of extracting high-quality information from text. It involves structuring the input data, deriving patterns within the structured data, and finally evaluating and interpreting the results. This paper discusses text analytics and social media analytics. The proposed work is based on an ontology framework applied to large volumes of social media textual information. The ontology-based framework provides users with a good interface to interact with the system. However, the technique is capable of handling only one thousand requests at a time [3].

Semi-Automatic Terminology Ontology Learning Based on Topic Modeling: this work describes how ontologies provide features such as a shared vocabulary, reusability and machine-readable content, and how they enable semantic search, facilitate interaction, and allow ordering and structuring of information for Semantic Web applications. The challenge in ontology engineering is automatic learning: there is still no fully automatic approach that forms an ontology from a corpus of various topics using machine learning techniques. In this paper, two topic modeling algorithms are explored for learning a topic ontology, namely LSI with SVD and LDA. They determine the statistical relationship between documents and terms in order to create the topic ontology and reduce manual work. Building a topic ontology and semantically retrieving the topic ontology that corresponds to the user's query demonstrates the effectiveness of the proposed approach [4].




The Retrieval of Information Model Based on Individual User Interest: this work describes how the problem of semantic interpretation is often reduced to determining the semantic similarity of words. An architecture for personalized information retrieval based on user interest is presented. The architecture includes a user interface model, a user interest model, an interest detection model and an update model. It establishes a user model for personalized information retrieval based on a user interest keyword list in a client-server setting, which can supply a personalized information retrieval service to the user through the communication and cooperation of all modules of the architecture. However, retrieval of relevant documents based on personalized content is not efficient [5].

3. SYSTEM DESIGN AND METHODOLOGY



In the proposed work, semantic retrieval is facilitated by building legal ontologies using methontology, and the construction of the legal ontology is supported by classification techniques. The grouping and matching of retrieved legal documents (LD) are also visualized using C-Maps. The proposed methodology aims to build a semantic model via a legal ontology in order to identify crime incidents and to increase the accuracy of the retrieved crime incident documents using classification techniques, especially c-mapping. Semantic modeling is a methodology for structuring raw facts so as to represent them in a specific logical way for the discovery of hidden semantic structures in documents. A C-Map, or conceptual diagram, depicts suggested relationships between concepts and builds upon previous knowledge by connecting new information back to it. Experimental analysis of methontology-based semantic retrieval of crime incident documents shows improved results. Methontology is one of the methodologies for building legal ontologies for a specific legal domain.

Figure 1: Architecture for semantic modeling of legal text documents using classification

and methontology.




3.1 Document Extraction:

The legal case text documents are extracted from legal repository web pages using web scraping. Here, web scraping is a technique that detects and removes the surplus content around the main textual content of a web page.
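The extraction step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the scraper used in the paper; the list of tags treated as surplus (SKIP_TAGS) and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser

# Tags whose content is treated as page "surplus" (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "title"}


class MainTextExtractor(HTMLParser):
    """Collects visible text that is not nested inside any SKIP_TAGS element."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # how many SKIP_TAGS elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the main textual content of an HTML page as one string."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page standing in for a legal repository web page.
sample_page = """
<html><head><title>Case</title><style>p {color: red}</style></head>
<body>
<nav>Home | Search | Login</nav>
<p>The appellant filed a petition.</p>
<p>The court dismissed the appeal.</p>
<footer>Copyright notice</footer>
</body></html>
"""
main_text = extract_main_text(sample_page)
```

Only the two paragraph sentences survive; navigation, styling and footer text are dropped.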

3.2 Data Pre-processing:

Figure-3: Data Pre-processing

The unstructured raw data (legal documents) are pre-processed using NLP techniques such as stop word removal, lemmatization and tokenization.

Stop words are words that are filtered out before or after the processing of natural language text.

Tokenization is the process of demarcating and possibly classifying sections of a string of input characters.

Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word's lemma, or dictionary form.
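The three pre-processing steps can be chained as in the sketch below. The stop word list and the lemma lookup table are tiny illustrative stand-ins (assumptions, not the resources used in the paper); a real pipeline would rely on a full NLP library.

```python
import re

# Tiny illustrative stop word list and lemma table (assumptions for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "of", "and", "to", "in", "by"}
LEMMA_TABLE = {
    "filed": "file",
    "appeals": "appeal",
    "courts": "court",
    "dismissed": "dismiss",
    "petitions": "petition",
}


def tokenize(text):
    """Split lower-cased text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its lemma when the table knows it, else keep it."""
    return [LEMMA_TABLE.get(t, t) for t in tokens]


def preprocess(text):
    return lemmatize(remove_stop_words(tokenize(text)))


tokens = preprocess("The petitions were filed and the appeals dismissed by the courts.")
```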

3.3 Generation of sample data

Figure-4: Generation of sample data

Document Term Matrix (DTM): a matrix that lists all occurrences of words within the clean text, by document. In the DTM, the documents are represented by rows and the terms (or words) by columns. Structured data is generated along with the assigned weights.
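A document term matrix of this shape can be built as in the following sketch. Raw term counts are used as the weights here, which is an assumption; the paper does not state its weighting scheme. The two token lists are hypothetical.

```python
from collections import Counter


def build_dtm(docs):
    """Build a document term matrix from tokenised documents.

    docs is a list of token lists. Returns (vocabulary, matrix) where
    matrix[i][j] is the count of vocabulary[j] in document i.
    """
    vocabulary = sorted({term for doc in docs for term in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical pre-processed documents.
docs = [
    ["court", "appeal", "court"],
    ["bank", "loan", "court"],
]
vocabulary, dtm = build_dtm(docs)
```

Each row of dtm corresponds to one document and each column to one term of vocabulary.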

3.4 Topic Modeling using SLDA:

Documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. V1, V2, V3, V4 and V5, shown in Table 1, are the LDA documents-to-terms values obtained from the topic modelling.




3.5 Generative Process

Step 1: Using a Dirichlet distribution, choose a distribution over topics.

Example: 70% War, 30% Election

Step 2: For each word in the document:

a) Using a multinomial distribution, choose a topic from the distribution over topics. Example: War

b) Using a multinomial distribution, choose a word from the corresponding topic. Example: Bomb
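The two steps above can be simulated directly. The sketch below uses only the Python standard library; the topic-word probabilities, the Dirichlet parameter alpha and the document length are hypothetical values chosen to mirror the War/Election example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Generate (topic, word) pairs following the LDA generative process."""
    topic_names = list(topics)
    # Step 1: choose the document's distribution over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # Step 2a: choose a topic from the distribution over topics.
        topic = rng.choices(topic_names, weights=theta, k=1)[0]
        # Step 2b: choose a word from the chosen topic's word distribution.
        vocab = list(topics[topic])
        weights = [topics[topic][w] for w in vocab]
        word = rng.choices(vocab, weights=weights, k=1)[0]
        words.append((topic, word))
    return theta, words


# Hypothetical topic-word distributions.
TOPICS = {
    "War": {"bomb": 0.5, "army": 0.3, "treaty": 0.2},
    "Election": {"vote": 0.6, "ballot": 0.3, "party": 0.1},
}
rng = random.Random(0)
theta, document = generate_document(TOPICS, alpha=[1.0, 1.0], n_words=8, rng=rng)
```

Here theta plays the role of the "70% War, 30% Election" mixture, and each generated pair records which topic produced which word.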

3.6 Clustering of Legal Documents:

Figure-5: Clustering of Legal Documents

Table 1: LDA Docs to Terms

S.No  V1        V2        V3        V4        V5
1     0.078253  0.075837  0.054421  0.141923  0.526567
2     0.3       0.254852  0.255259  0.216815  0.077074
3     0.157843  0.239714  0.207103  0.111857  0.212143
4     0.113     0.296014  0.215474  0.181611  0.101763
5     0.218276  0.121938  0.117201  0.322138  0.122207
6     0.143935  0.274968  0.205452  0.245661  0.137484
7     0.249479  0.239394  0.197103  0.197181  0.126261
8     0.169354  0.199405  0.355234  0.134628  0.147279
9     0.168667  0.279778  0.260889  0.201189  0.020278
10    0.173233  0.212429  0.285671  0.205469  0.123278

Table 2: LDA Docs to Terms

Legal Case Documents | V1
CC-ANSHITA GUPTA versus DR. AMIT AGARWAL - LNINDORD 2017 SC 8680.txt | 5
CC-Dilip Pandey versus The State Of Madhya Pradesh - LNIND 2014 MP 1082.txt | 3
CC-Endolabs Limited Thru. Director Shri Dheeraj Lulla versus State Bank Of India Thru. Branch Manager And 2 Others - LNIND 2013 MP 1021.txt | 2
CC-Gopal Nagda versus State Of M.P. And 2 Ors. - LNIND 2014 MP 13588.txt | 2
CC-Kalabai versus Saleem And 2 Ors. - LNIND 2012 MP 178.txt | 4
CC-Khubchand versus Emirates Bank InternationalPetitioner Advocate - LNIND 2014 MP 22991.txt | 2
CC-Kshama GourThru.KshamaGour versus Madhya Pradesh PashchimKshetraVidhyutVitaran Company Ltd. - LNIND 2014 MP 17484.txt | 1
CC-Laxman versus The State Of Madhya Pradesh - LNIND 2014 MP 1176.txt | 3
CC-Madhulika versus Rajesh - LNIND 2014 MP 13113.txt | 2
CC-Omprakash Garg versus The State Of Madhya Pradesh - LNIND 2014 MP 14765.txt | 3
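As a rough illustration of how documents can be grouped from weight vectors such as the rows of Table 1, the sketch below runs a plain K-means (Lloyd's algorithm) on the first five rows. The number of clusters (k = 2), the fixed iteration count and the deterministic initialisation from the first k points are assumptions made for the example; the paper does not report these settings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain K-means with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Rows 1 to 5 of Table 1 (V1..V5).
rows = [
    (0.078253, 0.075837, 0.054421, 0.141923, 0.526567),
    (0.3, 0.254852, 0.255259, 0.216815, 0.077074),
    (0.157843, 0.239714, 0.207103, 0.111857, 0.212143),
    (0.113, 0.296014, 0.215474, 0.181611, 0.101763),
    (0.218276, 0.121938, 0.117201, 0.322138, 0.122207),
]
labels, centroids = k_means(rows, k=2)
```

With these settings the first row, which is dominated by V5, ends up in a cluster of its own, while the other four rows are grouped together.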


3.7 Analysis and Retrieval of matched documents:

Figure-6: Analysis and retrieval of matched documents

The input string is preprocessed and clustered using K-means. The document is retrieved from the database based on the user query. The document displayed is based on an exact match with the user input. The model iterates over the words of each sentence and uses the current word to predict its neighbors (its context); this method is called "Skip-Gram".
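The training pairs that a Skip-Gram model learns from can be enumerated as in the sketch below. It only generates (current word, context word) pairs and does not train any embedding; the window size and the sample tokens are assumptions for the example.

```python
def skip_gram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word and its neighbours.

    A context word is any token at most `window` positions away from
    the center word in the same sentence.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical pre-processed sentence.
pairs = skip_gram_pairs(["court", "dismiss", "appeal"], window=1)
```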




4. Semantic Retrieval




4.1 Advantages:

Retrieval of relevant crime documents based on personalization will be available.

Processing time is reduced.

Automatic extraction of topics from crime incident documents or from collections of documents will be available.

Relations between the documents are created exclusively using c-mapping.

Methontology supports building ontologies either from scratch, by reusing other ontologies as they are, or by re-engineering them.

4.2 Proposed Algorithm: Ontology based semantic retrieval using c-mapping relations.

Input: features extracted from the user profile

Variables: Document Class (DC), Individual (User) Class (IC), subject (S)

Step 1: For each subject S, documents are retrieved based on topic modeling. The documents are classified as DC1, DC2 and DC3 using categorization based on technical relevancy.

Step 2: Each feature of an individual person's profile is represented with a set of numerical attributes (e.g. Data Analyst-9).

Step 3: Each item of the training data consists of a set of features and a class label related to each vector: IC1, IC2 or IC3.

Step 4: Classification of the user is done by comparing the features of the k nearest points (k=4).

Step 5: Users are classified into categories IC1, IC2 and IC3 based on the features extracted from the individual user profile.

Step 6: Based on the query, the user search retrieves relevant and customized documents for the user. As an example, if an IC1 class user searches for the subject cloud computing, the system retrieves documents in the customized order DC1, DC2 and DC3.
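Steps 4 to 6 can be sketched as a k-nearest-neighbour vote followed by a class-specific ordering of the retrieved documents. The training vectors, the feature meaning and the orderings for IC2 and IC3 are hypothetical; only k = 4 and the IC1 order DC1, DC2, DC3 come from the algorithm description.

```python
from collections import Counter


def classify_user(features, training, k=4):
    """Assign a user class by majority vote among the k nearest training points.

    training is a list of (feature_vector, class_label) pairs.
    """
    ranked = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(features, item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Preferred document class order per user class. Only the IC1 order is taken
# from the text; the IC2 and IC3 orders are assumptions.
PREFERRED_ORDER = {
    "IC1": ["DC1", "DC2", "DC3"],
    "IC2": ["DC2", "DC1", "DC3"],
    "IC3": ["DC3", "DC2", "DC1"],
}


def personalised_results(user_class, documents):
    """Order (name, document_class) pairs by the user's class preference."""
    order = PREFERRED_ORDER[user_class]
    return sorted(documents, key=lambda doc: order.index(doc[1]))


# Hypothetical training data: two numerical profile attributes per user.
training = [
    ((9, 8), "IC1"), ((8, 9), "IC1"), ((9, 9), "IC1"),
    ((5, 5), "IC2"), ((4, 5), "IC2"),
    ((1, 2), "IC3"), ((2, 1), "IC3"),
]
user_class = classify_user((9, 7), training, k=4)
results = personalised_results(
    user_class, [("doc_a", "DC3"), ("doc_b", "DC1"), ("doc_c", "DC2")]
)
```

Three of the four nearest neighbours of the example user carry the label IC1, so the documents are returned in the order DC1, DC2, DC3.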




5. Graphical Analysis

ONTOLOGICAL GRAPH

Figure 7: Ontology Graph for relations in semantic retrieval




Table 3: Attribute Table Entity for Gene Ontology

iri                label                entity      entity label     quality       quality label
DEMOTRAIT_0000002  seed morphology      PO_0009010  seed             PATO_0000051  morphology
DEMOTRAIT_0000003  seed shape           PO_0009010  seed             PATO_0000052  shape
DEMOTRAIT_0000004  seed size            PO_0009010  seed             PATO_0000117  size
DEMOTRAIT_0000005  seed weight          PO_0009010  seed             PATO_0000128  weight
DEMOTRAIT_0000007  morphology           PO_0009011  plant structure  PATO_0000051  morphology
DEMOTRAIT_0000008  shape                PO_0009011  plant structure  PATO_0000052  shape
DEMOTRAIT_0000009  size                 PO_0009011  plant structure  PATO_0000117  size
DEMOTRAIT_0000010  weight               PO_0009011  plant structure  PATO_0000128  weight
DEMOTRAIT_0000012  leaf morphology      PO_0009025  vascular leaf    PATO_0000051  morphology
DEMOTRAIT_0000013  leaf shape           PO_0009025  vascular leaf    PATO_0000052  shape
DEMOTRAIT_0000014  leaf size            PO_0009025  vascular leaf    PATO_0000117  size
DEMOTRAIT_0000015  leaf weight          PO_0009025  vascular leaf    PATO_0000128  weight
DEMOTRAIT_0000017  flower morphology    PO_0009046  flower           PATO_0000051  morphology
DEMOTRAIT_0000018  flower shape         PO_0009046  flower           PATO_0000052  shape
DEMOTRAIT_0000019  flower size          PO_0009046  flower           PATO_0000117  size
DEMOTRAIT_0000020  flower weight        PO_0009046  flower           PATO_0000128  weight
DEMOTRAIT_0000022  perianth morphology  PO_0009058  perianth         PATO_0000051  morphology
DEMOTRAIT_0000023  perianth shape       PO_0009058  perianth         PATO_0000052  shape
DEMOTRAIT_0000024  perianth size        PO_0009058  perianth         PATO_0000117  size
DEMOTRAIT_0000025  perianth weight      PO_0009058  perianth         PATO_0000128  weight

The ontology relation figure and the table above show that the input is obtained automatically from the relation graph. The weights assigned in Table 6 are based on the relations in the ontology. The automatic assignment and relational support increase the accuracy compared to the other methods in semantic modelling.

Figure 8: Precision and Recall for five sets of 50 documents with LDA weightage and C-Mapping based relational weight.

The recall and precision of the existing LDA methodology and of the proposed relation-based LDA semantic modelling are analyzed, and the respective results are plotted in Figure 8 and Figure 9. The combined method shows a significant improvement in accuracy compared to the LDA weight method alone.
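For reference, precision and recall for a single query can be computed as in the sketch below, using the standard definitions (relevant retrieved over retrieved, and relevant retrieved over relevant). The document identifiers are hypothetical and are not taken from the experiments.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query result: four documents retrieved, three truly relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
```

In this example two of the four retrieved documents are relevant, and two of the three relevant documents were found.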




Figure 9: Precision and Recall for five sets of 50 documents with LDA weightage only

6. Conclusion and future work

The Latent Dirichlet Allocation model is used to increase the accuracy of the retrieved relevant documents. Time consumption is reduced by using this model, and relevant documents are retrieved efficiently. The system provides an in-depth ontological structure using methontology and C-mapping, providing accurate results for the given crime incident scenario. There is a 10% increase in recall and precision when the figures for the two methodologies are compared. Future work can target information retrieval for a specific domain, which will permit semantically accurate retrieval of documents without human intervention. The number of crime incident documents analyzed can be increased, and various experiments can be conducted on legal ontologies and c-mapping. Analytics can be applied to the obtained results to mine useful patterns from past crime incident searches.

References

[1]. Amit Kumar Dhar, Monika Rani, O.P. Vyas, “Semi-Automatic Terminology Ontology Learning Based on Topic Modeling”, Elsevier-Engineering Applications of Artificial Intelligence, Vol. 63, August 2017, pp. 108-125.

[2]. Nikhil Vohra, Pankajdeep Kaur, Pallavi Sharma, “An Ontology Based Text Analytics on Social Media”, Elsevier-International Journal of Database Theory and Application, Vol. 8, February 2015, pp. 233-240.

[3]. Kaisong Song, Shi Feng, Wei Gao, Daling Wang, Ge Yu, Kam-Fai Wong, “Personalized Sentiment Classification Based on Latent Individuality of Microblog Users”, Engineering Applications of Artificial Intelligence, Vol. 7, June 2017, pp. 124-133.




[4]. Mohammadreza Shams, Ahmad Baraani-Dastjerdi, “Enriched LDA (ELDA): Combination of latent Dirichlet allocation with word co-occurrence analysis for aspect extraction”, Elsevier-Expert Systems with Applications: An International Journal, Vol. 80, February 2017, pp. 83-93.

[5]. Miha Pavlinek, Vili Podgorelec, “Text classification method based on self-training and LDA topic models”, Elsevier-Expert Systems with Applications: An International Journal, Vol. 80, September 2017, pp. 83-93.

[6]. Nikhil Vohra, Pankajdeep Kaur, Pallavi Sharma, “An Ontology Based Text Analytics on Social Media”, Elsevier-International Journal of Database Theory and Application, Vol. 8, February 2015, pp. 233-240.

[7]. Suh, S., Choo, J., Lee, J., & Reddy, C. K., “L-EnsNMF: Boosted local topic discovery via ensemble of nonnegative matrix factorization”, In Proceedings of the IEEE 16th International Conference on Data Mining, 2016.

[8]. Xingwang Zhao, Jiye Liang and Chuangyin Dang, “Clustering ensemble selection for categorical data based on internal validity indices”, Pattern Recognition, vol. 69, pp. 150-168, 2017.

[9]. Tu Ding, Chen Ling, Lv Mingqi, Shi Hongyu and Chen Gencai, “Hierarchical online NMF for detecting and tracking topic hierarchies in a text stream”, Pattern Recognition, vol. 76, pp. 203-214, 2018.

[10]. Damir Korenčić, Strahil Ristov and Jan Šnajder, “Document-based topic coherence measures for news media text”, Expert Systems with Applications, vol. 114, pp. 357-373, 2018.

[11]. Yong Chen, Hui Zhang, Rui Liu, Zhiwen Ye, Jianying Lin, “Experimental explorations on short text topic mining between LDA and NMF based Schemes”, Knowledge-Based Systems, vol. 163, pp. 1-13, 2019.

[12]. Yan Liang, Ying Liu, Chong Chen and Zhigang Jiang, “Extracting topic-sensitive content from textual documents—A hybrid topic model approach”, Engineering Applications of Artificial Intelligence, vol. 70, pp. 81-91, 2018.

[13]. Yueshen Xu, Jianwei Yin, Jianbin Huang and Yuyu Yin, “Hierarchical topic modeling with automatic knowledge mining”, Expert Systems with Applications, vol. 103, pp. 106-117, 2018.

[14]. R., P., Tamilselvan, L. and N., R. (2019), “Semantic tracking and recommendation using fourfold similarity measure from large scale data using hadoop distributed framework in cloud”, International Journal of Intelligent Unmanned Systems, Vol. 7, No. 4, pp. 189-208.
