novyi mir research journal issn no: 0130-7673novyimir.net/gallery/nmrj2463 f.pdf · abstract:...
SEMANTIC RETRIEVAL OF LEGAL CASE DOCUMENTS USING
RELATIONAL METHONTOLOGY BASED C_MAPPING TECHNIQUE
1R. Priyadarshini, 2N. Rajendran, 3Amit Alex
1,2,3Department of Information Technology, B.S. Abdur Rahman Crescent Institute of
Science and Technology, Chennai, India
Abstract: An information retrieval system (IRS) retrieves documents based on keyword
search. Semantic-based information retrieval goes beyond standard information retrieval
and uses related information to retrieve crime incident documents from the corpus.
However, semantic document retrieval is not yet efficient enough for real-time use. In
the proposed work, semantic retrieval is facilitated by building legal ontologies using
Methontology, and the construction of the legal ontology is supported by classification
techniques. The grouping and matching of the retrieved legal documents (LD) are also
visualized using C-Maps. The proposed methodology aims to build a semantic model via a
legal ontology, to identify crime incidents, and to increase the accuracy of the
retrieved crime incident documents using classification techniques, especially
C-Mapping. A semantic model represents the data in a specific logical way for the
discovery of hidden semantic structures in documents. A C-Map, or concept map, is a
diagram that depicts suggested relationships between concepts; it builds upon previous
knowledge by connecting new information back to it. Experimental analysis of
Methontology-based semantic retrieval of crime incident documents shows improvements in
recall and precision of up to 10%. Methontology is one of the methodologies for building
legal ontologies for a specific legal domain.
Keywords: Topic Modeling, Semantic based LDA, C-Mapping, Legal Case Documents,
Methontology.
1. INTRODUCTION
Semantic search is a data searching technique in which the search not only processes
the meaning of the crime documents but also determines the relevant evidence, purpose
and relationship of the crime documents to the key topic. Semantic search systems
consider several factors to provide relevant search results, including the context of
the search in a crime document, location, intent and conceptual matching. Semantic-based
information retrieval goes beyond standard information retrieval and uses related
information to retrieve documents from the corpus. However, semantic document retrieval
is not yet efficient enough for real-time use. Semantic information retrieval can be
implemented by means of an ontology. Methontology is a methodology for building
ontologies on a domain basis, either from scratch, by reusing other ontologies, or by
re-engineering them. This project proposes to redesign the legal ontologies using
Methontology. The accuracy of retrieval in the model is determined by the quality and
quantity of the documents collected and stored. The crime incident profile is based on
complex information associated with an individual crime or a group of crimes that show
similar interests or similar guidance behavior. The overall goal of the process is to
combine semantic retrieval with classification and C-Mapping.
ISSN No: 0130-7673
Page No: 118
NOVYI MIR Research Journal
Volume 5, Issue 4, 2020
2. RELATED WORKS
Probabilistic Topic Model for Learning Word Ontologies: this work captures the
linguistic relationships between a word and a topic, and between a topic and a document,
in terms of probability distributions by using an LSI-based probabilistic topic model.
The model is used for topic extraction in information retrieval and for automatically
learning ontologies from a text corpus. The authors describe algorithms for learning
ontologies that use the principle of topic relationships and exploit information theory
together with the learned probabilistic topic models. Experiments with different model
parameters were conducted, and the learned ontology statements were evaluated by domain
experts. The precision of the learned ontology is sufficient for it to be deployed for
navigation, browsing of information, and search and retrieval in digital libraries. The
proposed model can considerably reduce the time and effort required by the process.
However, learning many sub-trees of the topic hierarchy at the same time introduces
ambiguity [1].
Tackling Topic General Words in Topic Modeling: this paper describes topic models as a
popular tool for discovering latent topics in documents and for supporting natural
language processing tasks. To obtain good topics for a corpus, a preprocessing step is
usually essential to identify and remove topic general words (TGWs) from the corpus,
which is done manually. In existing approaches, the user needs to remove a set of common
stop words, such as 'is' and 'a', which can be achieved using a standard stop word list.
The proposed topic model automatically learns from previous results on corpora from
various domains to help identify TGWs in the current domain. This model reduces time
consumption and human intervention. However, automatic cleaning and extraction are not
available for unstructured texts [2].
A Social Media Ontology on Text Analytics: real-world data come in many forms and can be
extremely large. This creates a need for algorithms that can derive true and valid
information from a collection of raw facts. The complex process of text mining is used
successfully for this purpose. Text mining, which is more or less equivalent to text
analytics, can be defined as the process of extracting high-quality information from
text. It involves structuring the input data, deriving patterns within the structured
data, and finally evaluating and interpreting the results. This paper provides an
overview of text analytics and social media analytics. The proposed work is based on an
ontology framework applied to large volumes of social media text. The ontology-based
framework provides users with a good interface for interacting with the system. However,
the technique can handle only one thousand requests at a time [3].
Semi-Automatic Terminology Ontology Learning Based on Topic Modeling: this work notes
that ontologies provide features such as a shared vocabulary, reusability and
machine-readable content, and that they enable semantic search, interaction, and the
ordering and structuring of information for Semantic Web applications. The challenge in
ontology engineering is automatic learning: there is still no fully automatic approach
that uses machine learning techniques to form an ontology from a corpus covering various
topics. In this paper, two topic modeling algorithms are explored for learning a topic
ontology, namely LSI with SVD, and LDA. They determine the statistical relationships
between documents and terms in order to create the topic ontology and reduce manual
work. Building a topic ontology and semantically retrieving the corresponding topic
ontology for a user's query demonstrates the effectiveness of the proposed approach [4].
The Retrieval of Information Model Based on Individual User Interest: this work observes
that the problem of semantic interpretation is often reduced to determining the semantic
similarity of words. An architecture for personalized information retrieval based on
user interest is presented. The architecture includes a user interface model, a user
interest model, an interest detection model and an update model. It establishes a user
model for personalized information retrieval based on a user-interest keyword list in a
client-server setting, which can supply a personalized information retrieval service to
the user through the communication and cooperation of all modules of the architecture.
However, retrieval of relevant documents based on personalized content is not
efficient [5].
3. SYSTEM DESIGN AND METHODOLOGY
In the proposed work, semantic retrieval is facilitated by building legal ontologies
using Methontology, and the construction of the legal ontology is supported by
classification techniques. The grouping and matching of the retrieved legal documents
(LD) are also visualized using C-Maps. The proposed methodology aims to build a semantic
model via a legal ontology, to identify crime incidents, and to increase the accuracy of
the retrieved crime incident documents using classification techniques, especially
C-Mapping. Semantic modeling is a methodology for structuring raw facts so that they are
represented in a specific logical way for the discovery of hidden semantic structures in
documents. A C-Map, or concept map, is a diagram that depicts suggested relationships
between concepts; it builds upon previous knowledge by connecting new information back
to it. Experimental analysis of Methontology-based semantic retrieval of crime incident
documents shows improved results. Methontology is one of the methodologies for building
legal ontologies for a specific legal domain.
Figure 1: Architecture for semantic modeling of legal text documents using classification
and methontology.
3.1 Document Extraction:
The legal case text documents are extracted from the web pages of the legal repository
using web scraping. Web scraping is a technique that detects and removes the surplus
content around the main textual content of a web page.
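The extraction step can be illustrated with a minimal sketch that keeps the visible text of a page and drops common clutter. It uses only the Python standard library; the list of tags treated as surplus, the class name `MainTextExtractor` and the sample page are assumptions made for illustration, not the tooling used by the authors.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collects visible text while skipping tags that usually hold page clutter."""

    # Assumed list of "surplus" containers; a real scraper would tune this per site.
    SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped container
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_main_text(html):
    """Return the text of an HTML page without navigation, scripts or footers."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical page standing in for a legal repository result.
sample_page = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<nav>Home | Search</nav>"
    "<p>The appellant was convicted under Section 302.</p>"
    "<footer>Terms of use</footer></body></html>"
)
print(extract_main_text(sample_page))
```

Fetching the pages themselves is left out of the sketch, since the repository and its access rules are not specified in the paper.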
3.2 Data Pre-processing:
Figure-3: Data Pre-processing
The unstructured raw data (legal documents) are pre-processed using NLP techniques
such as stop word removal, lemmatization and tokenization.
Stop words are words that are filtered out before or after the processing of natural
language text.
Tokenization is the process of demarcating and possibly classifying sections of a
string of input characters.
Lemmatization is the process of grouping together the inflected forms of a word so
that they can be analyzed as a single item, identified by the word's lemma, or
dictionary form.
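The three steps can be sketched in a few lines of Python. The stop word list and the lemma lexicon below are tiny hypothetical stand-ins; a real pipeline would rely on a full stop word list and a dictionary-based lemmatizer from an NLP library.

```python
import re

# Assumed, deliberately small stop word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "of", "in", "to", "and", "by"}

# Hypothetical mini-lexicon mapping inflected forms to their lemma.
LEMMA_LEXICON = {
    "convicted": "convict",
    "appeals": "appeal",
    "courts": "court",
    "filed": "file",
}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Replace each token by its lemma when the lexicon knows it."""
    return [LEMMA_LEXICON.get(t, t) for t in tokens]


def preprocess(text):
    """Tokenize, remove stop words, then lemmatize."""
    return lemmatize(remove_stop_words(tokenize(text)))


print(preprocess("The appeals were filed in the courts."))
# -> ['appeal', 'file', 'court']
```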
3.3 Generation of sample data
Figure-4:Generation of sample data
Document Term Matrix (DTM): a matrix that lists all occurrences of words in the cleaned
text, by document. In the DTM, the documents are represented by rows and the terms (or
words) by columns. Structured data is generated along with the assigned weights.
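As a minimal sketch of this structure, the function below builds a DTM from tokenized documents, using raw term counts as the weights. The choice of raw counts is an assumption, since the paper does not say which weighting scheme is applied, and the two sample documents are hypothetical.

```python
from collections import Counter


def build_dtm(tokenized_docs):
    """Return (vocabulary, matrix) with one row per document, one column per term."""
    vocabulary = sorted({term for doc in tokenized_docs for term in doc})
    matrix = []
    for doc in tokenized_docs:
        counts = Counter(doc)  # missing terms count as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical pre-processed documents.
docs = [["court", "appeal", "court"], ["bank", "appeal"]]
vocabulary, dtm = build_dtm(docs)
print(vocabulary)  # -> ['appeal', 'bank', 'court']
print(dtm)         # -> [[1, 0, 2], [1, 1, 0]]
```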
3.4 Topic Modeling using SLDA:
Documents are represented as random mixtures over latent topics, where each topic is
characterized by a distribution over words. V1, V2, V3, V4 and V5, shown in Table 1, are
the document-to-term values of LDA obtained from the topic modelling.
3.5 Generative Process
Step 1: Using a Dirichlet distribution, choose a distribution over topics.
Example: 70% War, 30% Election
Step 2: For each word in the document:
a) Using a multinomial distribution, choose a topic from the distribution over
topics. Example: War
b) Using a multinomial distribution, choose a word from the corresponding
topic. Example: Bomb
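The two steps can be simulated with the standard library alone. In this sketch the topic names follow the War/Election example, while the word probabilities per topic, the Dirichlet parameters and the document length are hypothetical values chosen for illustration. The topic-word distributions are held fixed here, whereas full LDA also treats them as random.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alphas, n_words, rng):
    """Generate a document: topic mixture first, then one topic and word per position."""
    topic_names = list(topics)
    theta = sample_dirichlet(alphas, rng)          # Step 1: distribution over topics
    words = []
    for _ in range(n_words):                       # Step 2: for each word
        topic = rng.choices(topic_names, weights=theta, k=1)[0]   # 2a: choose a topic
        vocab = list(topics[topic])
        probs = [topics[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])   # 2b: choose a word
    return theta, words


# Hypothetical topic-word distributions.
TOPICS = {
    "War": {"bomb": 0.5, "army": 0.3, "treaty": 0.2},
    "Election": {"vote": 0.6, "ballot": 0.3, "poll": 0.1},
}

rng = random.Random(0)  # fixed seed so the run is repeatable
theta, words = generate_document(TOPICS, alphas=[1.0, 1.0], n_words=8, rng=rng)
print(theta, words)
```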
3.6 Clustering of Legal Documents:
Figure-5: Clustering of Legal Documents
Table 1: LDA Docs to Terms
S.No V1 V2 V3 V4 V5
1 0.078253 0.075837 0.054421 0.141923 0.526567
2 0.3 0.254852 0.255259 0.216815 0.077074
3 0.157843 0.239714 0.207103 0.111857 0.212143
4 0.113 0.296014 0.215474 0.181611 0.101763
5 0.218276 0.121938 0.117201 0.322138 0.122207
6 0.143935 0.274968 0.205452 0.245661 0.137484
7 0.249479 0.239394 0.197103 0.197181 0.126261
8 0.169354 0.199405 0.355234 0.134628 0.147279
9 0.168667 0.279778 0.260889 0.201189 0.020278
10 0.173233 0.212429 0.285671 0.205469 0.123278
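Section 3.7 states that K-means is used for clustering. As a minimal sketch, the code below runs a plain K-means over the ten rows of Table 1, treating each row as a five-dimensional vector (V1 to V5). The number of clusters, the deterministic initialization from the first k rows and the iteration count are assumptions; the paper does not state which vectors or settings were actually used.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=20):
    """Plain K-means; centroids start at the first k points (an assumed choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Rows of Table 1 (V1..V5).
TABLE_1 = [
    [0.078253, 0.075837, 0.054421, 0.141923, 0.526567],
    [0.3, 0.254852, 0.255259, 0.216815, 0.077074],
    [0.157843, 0.239714, 0.207103, 0.111857, 0.212143],
    [0.113, 0.296014, 0.215474, 0.181611, 0.101763],
    [0.218276, 0.121938, 0.117201, 0.322138, 0.122207],
    [0.143935, 0.274968, 0.205452, 0.245661, 0.137484],
    [0.249479, 0.239394, 0.197103, 0.197181, 0.126261],
    [0.169354, 0.199405, 0.355234, 0.134628, 0.147279],
    [0.168667, 0.279778, 0.260889, 0.201189, 0.020278],
    [0.173233, 0.212429, 0.285671, 0.205469, 0.123278],
]

labels, centroids = k_means(TABLE_1, k=3)
print(labels)
```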
Table 2: LDA Docs to Terms
Legal Case Documents    V1
CC-ANSHITA GUPTA versus DR. AMIT AGARWAL - LNINDORD 2017 SC 8680.txt    5
CC-Dilip Pandey versus The State Of Madhya Pradesh - LNIND 2014 MP 1082.txt    3
CC-Endolabs Limited Thru. Director Shri Dheeraj Lulla versus State Bank Of India Thru. Branch Manager And 2 Others - LNIND 2013 MP 1021.txt    2
CC-Gopal Nagda versus State Of M.P. And 2 Ors. - LNIND 2014 MP 13588.txt    2
CC-Kalabai versus Saleem And 2 Ors. - LNIND 2012 MP 178.txt    4
CC-Khubchand versus Emirates Bank InternationalPetitioner Advocate - LNIND 2014 MP 22991.txt    2
CC-Kshama GourThru.KshamaGour versus Madhya Pradesh PashchimKshetraVidhyutVitaran Company Ltd. - LNIND 2014 MP 17484.txt    1
CC-Laxman versus The State Of Madhya Pradesh - LNIND 2014 MP 1176.txt    3
CC-Madhulika versus Rajesh - LNIND 2014 MP 13113.txt    2
CC-Omprakash Garg versus The State Of Madhya Pradesh - LNIND 2014 MP 14765.txt    3
3.7 Analysis and Retrieval of matched documents:
Figure-6: Analysis and retrieval of matched documents
The input string is pre-processed and clustered using K-means. The document is retrieved
from the database based on the user query, and the document displayed is the one that
exactly matches the user input. The model loops over the words of each sentence and uses
the current word to predict its neighbors (its context); this method is called
"Skip-Gram".
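To make the Skip-Gram idea concrete, the sketch below generates the (center word, context word) training pairs that such a model learns from. The window size and the sample sentence are assumptions, and the neural model that would be trained on these pairs is not shown.

```python
def skip_gram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word and its neighbors in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical pre-processed sentence.
print(skip_gram_pairs(["court", "dismiss", "appeal"], window=1))
# -> [('court', 'dismiss'), ('dismiss', 'court'), ('dismiss', 'appeal'), ('appeal', 'dismiss')]
```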
4. Semantic Retrieval
In the proposed work, semantic retrieval is facilitated by building legal ontologies
using Methontology, and the construction of the legal ontology is supported by
classification techniques. The grouping and matching of the retrieved legal documents
(LD) are also visualized using C-Maps. The proposed methodology aims to build a semantic
model via a legal ontology, to identify crime incidents, and to increase the accuracy of
the retrieved crime incident documents using classification techniques, especially
C-Mapping. Semantic modeling is a methodology for structuring raw facts so that they are
represented in a specific logical way for the discovery of hidden semantic structures in
documents. A C-Map, or concept map, is a diagram that depicts suggested relationships
between concepts; it builds upon previous knowledge by connecting new information back
to it. Experimental analysis of Methontology-based semantic retrieval of crime incident
documents shows improved results. Methontology is one of the methodologies for building
legal ontologies for a specific legal domain.
4.1 Advantages:
Retrieval of relevant crime documents based on personalization becomes available.
Processing time is reduced.
Automatic extraction of topics from crime incident documents, or from collections
of documents, becomes available.
Relations between the documents are created exclusively using C-Mapping.
Methontology supports building ontologies from scratch, by reusing existing
ontologies as they are, or by re-engineering them.
4.2 Proposed Algorithm: Ontology based semantic retrieval using C-Mapping
relations.
Input: features extracted from the user profile
Variables: Document Class (DC), Individual (User) Class (IC), subject (S)
Step 1: For each subject S, documents are retrieved based on topic modeling. The
documents are classified as DC1, DC2 and DC3 using categorization based on
technical relevancy.
Step 2: Each feature of an individual person's profile is represented by a set of
numerical attributes (e.g. Data Analyst-9).
Step 3: Each item of the training data consists of a set of features and a class
label (IC1, IC2 or IC3) associated with that feature vector.
Step 4: The user is classified by comparing the user's features with those of the
k nearest neighboring points (k = 4).
Step 5: Users are classified into categories IC1, IC2 and IC3 based on the features
extracted from the individual user profile.
Step 6: Based on the query, the user search retrieves relevant and customized
documents for the user. For example, if an IC1 class user searches for the subject
cloud computing, the system retrieves documents in the customized order DC1,
DC2 and DC3.
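Steps 3 to 6 can be sketched as a k-nearest-neighbor classifier followed by a class-specific ordering of the document classes. Only k = 4 and the DC1, DC2, DC3 ordering for the first user class come from the algorithm above; the two-feature training profiles, the orderings for IC2 and IC3, and the function names are hypothetical.

```python
from collections import Counter


def classify_user(profile, training_data, k=4):
    """Label a profile by majority vote among its k nearest training profiles."""
    by_distance = sorted(
        training_data,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(profile, item[0])),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (numeric feature vector, user class).
TRAINING = [
    ((9, 2), "IC1"), ((8, 3), "IC1"), ((9, 1), "IC1"),
    ((5, 5), "IC2"), ((4, 6), "IC2"),
    ((1, 9), "IC3"), ((2, 8), "IC3"),
]

# Order in which document classes are shown to each user class.
# Only the IC1 order is taken from the text; the other two are assumed.
CLASS_ORDER = {
    "IC1": ["DC1", "DC2", "DC3"],
    "IC2": ["DC2", "DC1", "DC3"],
    "IC3": ["DC3", "DC2", "DC1"],
}


def retrieve(profile, documents_by_class):
    """Classify the user, then return documents in that class's customized order."""
    user_class = classify_user(profile, TRAINING, k=4)
    ordered = []
    for doc_class in CLASS_ORDER[user_class]:
        ordered.extend(documents_by_class.get(doc_class, []))
    return user_class, ordered


docs = {"DC1": ["a.txt"], "DC2": ["b.txt"], "DC3": ["c.txt"]}
print(retrieve((9, 2), docs))
# -> ('IC1', ['a.txt', 'b.txt', 'c.txt'])
```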
5. Graphical Analysis
ONTOLOGICAL GRAPH
Figure 7: Ontology Graph for relations in semantic retrieval
Table 3: Attribute Table Entity for Gene Ontology
iri                label                entity      entity label     quality       quality label
DEMOTRAIT_0000002  seed morphology      PO_0009010  seed             PATO_0000051  morphology
DEMOTRAIT_0000003  seed shape           PO_0009010  seed             PATO_0000052  shape
DEMOTRAIT_0000004  seed size            PO_0009010  seed             PATO_0000117  size
DEMOTRAIT_0000005  seed weight          PO_0009010  seed             PATO_0000128  weight
DEMOTRAIT_0000007  morphology           PO_0009011  plant structure  PATO_0000051  morphology
DEMOTRAIT_0000008  shape                PO_0009011  plant structure  PATO_0000052  shape
DEMOTRAIT_0000009  size                 PO_0009011  plant structure  PATO_0000117  size
DEMOTRAIT_0000010  weight               PO_0009011  plant structure  PATO_0000128  weight
DEMOTRAIT_0000012  leaf morphology      PO_0009025  vascular leaf    PATO_0000051  morphology
DEMOTRAIT_0000013  leaf shape           PO_0009025  vascular leaf    PATO_0000052  shape
DEMOTRAIT_0000014  leaf size            PO_0009025  vascular leaf    PATO_0000117  size
DEMOTRAIT_0000015  leaf weight          PO_0009025  vascular leaf    PATO_0000128  weight
DEMOTRAIT_0000017  flower morphology    PO_0009046  flower           PATO_0000051  morphology
DEMOTRAIT_0000018  flower shape         PO_0009046  flower           PATO_0000052  shape
DEMOTRAIT_0000019  flower size          PO_0009046  flower           PATO_0000117  size
DEMOTRAIT_0000020  flower weight        PO_0009046  flower           PATO_0000128  weight
DEMOTRAIT_0000022  perianth morphology  PO_0009058  perianth         PATO_0000051  morphology
DEMOTRAIT_0000023  perianth shape       PO_0009058  perianth         PATO_0000052  shape
DEMOTRAIT_0000024  perianth size        PO_0009058  perianth         PATO_0000117  size
DEMOTRAIT_0000025  perianth weight      PO_0009058  perianth         PATO_0000128  weight
The ontology relation figure and the table above show that the input is obtained
automatically from the relation graph. The weights assigned in the table are based on
the relations in the ontology. The automatic assignment and the relational support
increase the accuracy compared with other methods of semantic modelling.
Figure 8: Precision and Recall for five sets of 50 documents with LDA weightage and
C-Mapping based relational weight.
The recall and precision of the earlier LDA methodology and of the proposed
relation-based LDA semantic modelling are analyzed, and the respective results are
plotted in Figure 8 and Figure 9. The plots show that the combined method achieves a
significant accuracy improvement over the LDA weight method alone.
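For reference, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below computes both for a single query; the document identifiers are hypothetical and do not reproduce the experiments reported here.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 documents retrieved, 3 documents actually relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(precision, recall)  # precision is 2/4, recall is 2/3
```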
Figure 9: Precision and Recall for five sets of 50 documents with LDA weightage only
6. Conclusion and future work
The Latent Dirichlet Allocation model is used to increase the accuracy of the retrieved
relevant documents. Time consumption is reduced by using these models, and a sufficient
number of relevant documents are retrieved. The system provides an in-depth ontological
structure using Methontology and C-Mapping, which yields accurate results for the given
crime incident scenario. Recall and precision increase by 10% when both methodologies
are implemented together, compared with the figures for LDA weighting alone. Future work
can target retrieval for a specific domain, which would permit semantically accurate
retrieval of documents without human intervention. The number of crime incident
documents analyzed can be increased, and further experiments can be conducted on legal
ontologies and C-Mapping. Analytics can be applied to the obtained results to mine
useful patterns from past crime incident searches.
References
[1]. Amit Kumar Dhar, Monika Rani, O.P. Vyas, "Semi-Automatic Terminology
Ontology Learning Based on Topic Modeling", Elsevier-Engineering
Applications of Artificial Intelligence, Vol. 63, August 2017, pp. 108-125.
[2]. Nikhil Vohra, Pankajdeep Kaur, Pallavi Sharma, "An Ontology Based Text
Analytics on Social Media", Elsevier-International Journal of Database Theory
and Application, Vol. 8, February 2015, pp. 233-240.
[3]. Kaisong Song, Shi Feng, Wei Gao, Daling Wang, Ge Yu, Kam-Fai Wong,
"Personalized Sentiment Classification Based on Latent Individuality of
Microblog Users", Engineering Applications of Artificial Intelligence, Vol. 7,
June 2017, pp. 124-133.
[4]. Mohammadreza Shams, Ahmad Baraani-Dastjerdi, "Enriched LDA (ELDA):
Combination of latent Dirichlet allocation with word co-occurrence analysis for
aspect extraction", Elsevier-Expert Systems with Applications: An International
Journal, Vol. 80, February 2017, pp. 83-93.
[5]. Miha Pavlinek, Vili Podgorelec, "Text classification method based on self-training
and LDA topic models", Elsevier-Expert Systems with Applications: An
International Journal, Vol. 80, September 2017, pp. 83-93.
[6]. Nikhil Vohra, Pankajdeep Kaur, Pallavi Sharma, "An Ontology Based Text
Analytics on Social Media", Elsevier-International Journal of Database Theory
and Application, Vol. 8, February 2015, pp. 233-240.
[7]. Suh, S., Choo, J., Lee, J., & Reddy, C. K., "L-EnsNMF: Boosted local topic
discovery via ensemble of nonnegative matrix factorization", In Proceedings of
the IEEE 16th International Conference on Data Mining, 2016.
[8]. Xingwang Zhao, Jiye Liang, and Chuangyin Dang, "Clustering ensemble
selection for categorical data based on internal validity indices", Pattern
Recognition, vol. 69, pp. 150-168, 2017.
[9]. Tu Ding, Chen Ling, Lv Mingqi, Shi Hongyu, and Chen Gencai, "Hierarchical
online NMF for detecting and tracking topic hierarchies in a text
stream", Pattern Recognition, vol. 76, pp. 203-214, 2018.
[10]. Damir Korenčić, Strahil Ristov, and Jan Šnajder, "Document-based topic
coherence measures for news media text", Expert Systems with Applications,
vol. 114, pp. 357-373, 2018.
[11]. Yong Chen, Hui Zhang, Rui Liu, Zhiwen Ye, Jianying Lin, "Experimental
explorations on short text topic mining between LDA and NMF based Schemes",
Knowledge-Based Systems, vol. 163, pp. 1-13, 2019.
[12]. Yan Liang, Ying Liu, Chong Chen and Zhigang Jiang, "Extracting
topic-sensitive content from textual documents—A hybrid topic model
approach", Engineering Applications of Artificial Intelligence, vol. 70, pp. 81-91, 2018.
[13]. Yueshen Xu, Jianwei Yin, Jianbin Huang and Yuyu Yin, "Hierarchical topic
modeling with automatic knowledge mining", Expert Systems with Applications,
vol. 103, pp. 106-117, 2018.
[14]. R., P., Tamilselvan, L. and N., R. (2019), "Semantic tracking and
recommendation using fourfold similarity measure from large scale data using
hadoop distributed framework in cloud", International Journal of Intelligent
Unmanned Systems, Vol. 7, No. 4, pp. 189-208.