comprehensive study of semantic annotation: variant and …

Comprehensive Study of Semantic Annotation: Variantand PraxisSumit Sharma, Sarika Jain

Department of Computer Applications, National Institute of Technology, Kurukshetra, Haryana, India

AbstractThe proliferation of web content on the Internet has increased the demand for efficient information retrieval independent ofcontent. The concept of the semantic web has revolutionized the way of searching, analyzing, and storage. Besides, semanticannotations provide esteemed solutions to enrich target information. There is a large amount of research available in thearea of semantic annotations, which highlights the significance of annotation (such as sharing, integration, creation, andreuse, so forth) in various domains using annotation tools, be that as it may, none of these tools gives the earlier practiceof the annotation research questions. Besides, no unified system exists that combines all the different kinds of annotations.This work presents a way to address the research questions given in the paper. We have combined isoforms of various typesof annotations which have not been done to our knowledge till now. Furthermore, we have highlighted some prominentsemantic annotation tools with their real-life applications, which depend on the type of annotation we classify.

KeywordsSemantic Annotation, Challenges, Applications, Ontology

1. IntroductionInformation sharing and searching is a more usefultask for Internet users but they are facing difficultiesdue to different representations of different data sources.Semantic annotation modeling can fill this gap of vari-ous knowledge representations. It establishes the rela-tionship between the data entities and joins the termor mentions to entities. The objective of the seman-tic annotation measure is to survey what parts of thereport compare the ideas portrayed in the ontology,and along these lines, the outcome is a bunch of map-pings between record sections and ontology conceptsas defined in [1]. Natural language technologies areone of the emerging trends of their use for the sciencesand humanities. Experts are facing problems such asthe explosion of information due to the continuous in-crease in the production of scientific content on theweb, which makes it difficult to observe the state of theart in a given domain [2]. Semantic annotation appli-cations have been used in different domains in differ-ent ways, but all of these have a common goal. Authorshave applied the semantic annotation for the Arabicweb document by deep learning methods [3]. Anno-tations can also contribute to manage natural historycollections using semantic annotation [4]. Authors ap-

ACI’21: Workshop on Advances in Computational Intelligence at ISIC2021, February 25-27, 2021, Delhi, India" [email protected] (S. Sharma); [email protected] (S.Jain)~ https://sites.google.com/view/nitkkrsarikajain/home (S. Jain)� 0000-0001-5054-8670 (S. Sharma); 0000-0002-7432-8506 (S. Jain)

© 2021 Copyright for this paper by its authors. Use permitted under CreativeCommons License Attribution 4.0 International (CC BY 4.0).

CEURWorkshopProceedings

http://ceur-ws.orgISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org)

plied semantic annotation on digital music to improvethe trend of searching music [5]. Thus, annotation canbe termed as to reduce the mental effort when a docu-ment is read for the purpose of research and analysis.Therefore, the process of embedding additional infor-mation to the already available information helps tointerpret the information, remembering things, trace-ability, machine understanding capability, and manymore.

Another assumption about semantic annotation isto use a machine to understand the relationship be-tween the URI and the network of data. If the textis semantically marked, then it becomes a source oflearning which is easy to understand, consolidate andreuse by machines. Semantic annotation helps ma-chines to use data on the web to self-interpret, com-bine results, and manage digital information from in-formation available on the internet. Such informationcan be generated by interpreting sources from meta-data that can result in "annotations" about all resources.In this paper, we shall examine semantic annotationby defining the annotation and metadata, and then weshall discuss various aspects of semantic annotationapproaches and review the current generation of se-mantic annotation systems.

Here in this paper, we are preparing and address-ing some research questions, which are benignant andsignificant for the research development of meaningsand annotations. We have described isoforms of vari-ous kinds of annotations with a formal description ofthe semantic annotation to a nexus between researchquestions. We are also going to explain essential as-

mailto:[email protected]

mailto:[email protected]

https://sites.google.com/view/nitkkrsarikajain/home

https://orcid.org/0000-0001-5054-8670

https://orcid.org/0000-0002-7432-8506

https://creativecommons.org/licenses/by/4.0

http://ceur-ws.org

http://ceur-ws.org

pects of semantic annotations that are being used fordiversity of semantic annotations related to differentdomains. Furthermore, we have highlighted some exi-gent semantic annotation tools alongside their real-lifeapplications, which depend on the type of annotationwe classify.

2. Research QuestionsHere in this section, we provide a brief study on se-mantic annotation to elaborate major research ques-tions "what, where, why, and how" to use the semanticannotation. "What?", describe the definition of anno-tation, "Where?", examine where to apply, "Why?", de-fine the importance of annotation and "How?", definevarious ways to represent annotation.

2.1. What? (Definition)According to Oxford Dictionary Online, the sound “an-notation” is defined as “a note by way of explanationor comment added to a text or diagram”[6]. Semanticannotation contributes to mark-up the existing textsto justify their senses so that a machine can automat-ically identify and process information, thus makingthem more valuable. In literature, the definitions asemployed by different authors for semantic annota-tions were quite different. In any case, Semantic Webachievement relies upon the accomplishment of an ex-traordinary number of clients semantic substance. Thisaccomplishment requires apparatuses that decrease themultifaceted nature of semantic innovations. Seman-tic annotation is the fitting procedure for searching aword, sentence, and paragraph semantically in the Se-mantic Web. Annotations are also used to transformsyntactic structures into knowledge structures.

All the more succinctly, annotation or tagging is aprocess that allows to draft a section, statement, com-ment, or attributes to a document or segment in a re-port. When all is done, the annotation can be viewedas additional data related to a specific point in onerecord or another snippet of data [7]. The authors [8]present an overall meaning of annotation as includ-ing some other bit of information and further expand-ing the definition of annotation in various domains.Generally, domain annotations are typically labelinga concept (record, part of an archive, or word) legit-imately to perceive the essential concept or principlethought in the information. The tagging helps usersto recognize or classify a document based on the con-cepts required and also helps to target the outcome ofthe document [7].

Semantic annotations represent transitional formu-lation of connections between unstructured documents,semi-structured documents, and ontologies in both di-rections [9]. Embedding metadata with the documentsto assign semantics on the web assets is a semantic an-notation by innovative judgment [10]. All the abovedefinitions provided by various authors have one thingin common: linking resources with domain ontology.

2.2. Why? (Purpose)This research question is the most important researchquestion to solve the significance of the developmentof annotation. The textual data’s growing phenomenonrequires Natural language processing and text min-ing procedures to arrange and recognize patterns andknowledge from the texts. The need for semantic an-notation is becoming important because the informa-tion is represented as a knowledge graph [1]. Data isregularly traded in an electronic arrangement (like pa-pers, letters, note amalgamation, mail, data set, report,laws, proposals, articles, and declarations). The pur-pose of semantic annotation encourages the semanticweb-enabled machines to self-interpret, consolidate theresults, and practice it on the web. We can create suchinformation by annotating sources using metadata, out-coming in "annotations" concerning that source.

Probing, searching, mining, and classifying are grow-ing significant and challenging jobs with extensive mas-sive data. This job grows even more complicated ifthe data explode, and the data are undefined. It is notstraightforward to manually read all the documentsand find a particular concept (person, event, place, soon) in the full document. Annotation provides a sig-nificant role in the search for any key idea in the docu-ments. It is challenging and essential to discover all thekey concepts and relationships in the documents dur-ing annotation. Exploring the relationship betweendata concepts and rendering it in a new form is againthe discovery subject. Ontology is the right way todefine the relationship between data concepts, and si-multaneously, it provides advantages to data in theform of machine understanding.

2.3. Where? (Place)In the last few decades, the experimental form of an-notation has grown a lot. We have found the usabil-ity of annotation in various fields. There are extraor-dinary implications and uses in various areas of an-notation. Programming languages use annotation onclass, method, parameters, or variables for their clar-ity and definition. On the other hand, mechanical en-

gineering uses annotations to understand the specificmeanings of text or symbols. Before the utilization ofannotation, it is valuable to think about the scope ofannotation that exists so that anyone can pick the cor-rect type for their use case. Annotations incorporate abroad scope of data types on which it tends to be ap-plied and reuse. Some essential ranges of annotationinclude text, image, audio, video, graphics [2]. Someannotation tools have evolved to show the use casesof annotations that provide a lightweight frameworkto annotate textual data [11]. The authors [12, 13, 14]applied semantic annotation on image data to improvethe searching. Likewise,[15] provided a methodologyto add an annotation to XML schemas same as[5] ap-ply annotation on digital music. Semantic annotationsare useful for digital document classification (newspa-pers, blogs, media content filtering). It is possible tosearch for a particular concept (named entity recog-nition) in large amounts of data. Annotation has anessential role in the biomedical field to identify essen-tial terms used in medicine [16]. Currently, IoT sen-sor data are being stored by meaningful annotation toclearly express the powerful potential and impact ofthe data [17, 18].

2.4. How? (Implement)This is the most important research question that playsan imperative role in the success of annotations. Also,the applicability of the semantic annotation dependson the nature of the data type. It can be text, image, au-dio, video. For the text annotation, it could be Seman-tic Annotation, Intent Annotation, and Named EntityAnnotation. Finding the essential concept in the textis the main work for a text document. Image annota-tion is essential for an extensive scope of utilizations,including PC vision, automated vision, facial acknowl-edgment, and arrangements that depend on AI to de-cipher pictures. To prepare these arrangements, meta-data should be doled out to the pictures as identifiers,inscriptions, or catchphrases.

In the last few decades, several techniques were de-veloped for semantic annotation. The part of speech(POS) annotations depends on the specific design andmodel demanded. One may be interested in a limitedPOS annotation scheme if one wishes to do text min-ing or text processing. Semantic comment stages offerhelp for data extraction advancement, knowledgebaseand ontology executives, warehouse, access APIs (e.g.,RDF repositories), and UIs for knowledgebase editorsand ontology [19]. The semantic annotation is like-wise helpful for a legitimate grouping of e-reports, on-line news, web journals, messages, and computerized

Figure 1: Various aspects of Semantic Annotation

libraries that need text mining[20], AI, and natural lan-guage handling techniques to get meaningful informa-tion. As per our knowledge, the most popular machineunderstandable format nowadays is RDF (W3C, Re-source Description Framework (RDF) http://www.w3.org/RDF/.Last accessed January 25, 2021.).

3. Preliminaries for SemanticAnnotation

In our studies, various aspects of semantic annotationsare shown in Figure 1, which completes the surveyof semantic annotations. This section is important toknow the structure of annotation, here we shall be-gin with the basics to describe the complete method ofpracticing annotation. Then, based on the structure ofdata types in which semantic annotations addressingthe research question and then provide a formal defi-nition of semantic annotations to serve the purpose ofannotations.

3.1. Semantic AnnotationAnnotation is the process of allocating some labels tothe data for data interpretation and automatic descrip-tion. Semantic annotation is the annotation in whichsome necessary additional information is added to atext document to reflect the relationship between on-tology class concepts or instances and text documententities. This brief description of the object definedconsists of the main body of the paper. It describessemantic for a document (such as label, title, author,date of publication, etc.). Therefore, semantic annota-tion collects semantic information from intuitive andmore essential records so that target information canbe easily searched and classified by the machine.

The annotation output of a document can be in dif-ferent forms and depends upon the tools or methodsthat produce annotation. The goal of the annotationproject may differ according to the design and require-ment of the project model. Figure 2 shows an example

Figure 2: Example of semantic annotation

annotation of email text data annotated by ontology.

3.2. Types of dataIn the present scenario, annotation is one of the mostchallenging tasks as data on the web is not uniform(different structures). Semantic annotations can be ap-plied keeping in mind the nature of the data. There-fore, it is essential to provide a unique description ofthe data to make the data different and to avoid anno-tation problems. Motivated from this, in this section,we will throw light on various types of data (on theinternet) which is significant to semantic annotation.There are three kinds of data namely; structured, un-structured, and semi-structured, which are explainedbelow.

3.2.1. Structured Data

A well-Organized form of data is known as structureddata, which is easy to explore and generally arrangedin rows and columns (e.g., excel spreadsheet). In thestructured data, a portion of the information alwaysperiodically outlines into fixed predefined attributes,which is occurred in form of the columns. For instance,Table 1 shows the structured form of the transactionlog in which the excel spreadsheet, database designerdesigns a data model that is followed to store the struc-ture data. This is the best example of structured data.This data model saves all records into a table. Theserecords are collected with the help of a relationshipthat exists between the entities of data. Structureddata utilize the storage space and make informationretrieval as easy as possible.

SQL, MySQL, and SPARQL are the query languagesused to retrieve, manipulating, and storing the struc-ture data. These query language groups the database

Table 1Example of structured data

S.N Account No Name Transaction Logs

1. XXX445050 Ram 4393949 5:5:192. PPP304039 Laxman 2932734 3:4:19

Figure 3: Example of unstructured email message data

on the relationship defined in the data model. Struc-tured data can be handled by humans as well as bymachine. However, human has less role in the annota-tion and structured data are easy to annotate by somepredefined rule [21]. These rules are created based onthe relationship between the entities.

3.2.2. Unstructured Data

Several authors have worked on the other form of un-structured data like (images, audio, video, news, socialmedia data, blogs, open-ended survey, web content,transcripts, etc.). Various AI and Machine learning-based algorithms have been applied to recognize thecontent and then annotate them accordingly. It alsoprovides a hidden association between the entities withthe help of links. The wide range of data on the In-ternet is unstructured data. Generally, heterogeneousdata cannot be stored as a row and column and doesnot have an associated data model. The general ex-amples of unstructured data are web email, blogs, andHTML pages. Since there is no underlying relationshipbetween the data concepts, therefore, finding, analyz-ing, accessing, and managing a piece of informationin this kind of data is more complicated. According tosome machine learning algorithms [22, 23, 24], theseprocesses are erroneous and time-consuming tasks. Fig-ure 3 shows an example of unstructured data.

3.2.3. Semi-structured Data

Semi-structured data is another variety of data thatmix the structured and unstructured data. It has re-

Figure 4: Example of semi-structured data on web searchof paper

markable properties to organized information but doesnot relate to the fixed structure of the data model. Webforums, web pages, and email messages are the popu-lar examples of semi-structured data in which, the ac-tual content is unstructured, and this form of data alsocontain some structured information such as name andtitle, log information, time, etc.

Figure 4 shows an example of semi-structured dataabout a web page. That also contains some structuredinformation about the web page like title, journal name,journal log, etc. This semi-structured data providesa little help to the designer to build the data model.These small pieces of information involve extractingdata from the unstructured repositories.

3.3. Level of Automation of AnnotationSuccessful use of the Semantic Web requires far reach-ing accessibility of semantic annotations for existingand new records on the Web. The level of automationshows how we can get the right data and how to useit correctly. It defines the automaticity of the machinefrom manual to automatic. The level of automationin any systems can be assessed, measured as manual,automatic, and semi automatic described in [7, 9, 25]with their framework and requirements.

3.3.1. Manual Annotation:

Manual annotation is a process of reading an inputdocument and extracting a piece of new informationwith human participation. Manual annotation is also

Figure 5: Manual annotation of document with semanticdata

a type of formal annotation with human computer in-teraction. It tracks many NLP tasks and has lots ofactivities [26] such as writing comprehensive anno-tation guidelines and defining an annotation schema,etc. Manual annotation is even more conveniently de-veloped today, utilizing writing tools, such as Seman-tic Word [27], which give an incorporated atmosphereto authoring and annotating text. Notwithstanding,human annotators’ utilization is as often as possibledue to components, for example, annotator knowledgeof the domain, a measured amount of training, per-sonal inspiration, and complex patterns. Manual an-notation cannot be applied to a massive portion of data.The semantic annotation of archives concerning an on-tology and an entity knowledge base is examined in[15]. Even though introducing intriguing and yearn-ing draws near, these do not talk about the utiliza-tion of robotic strategies. The center is the manualsemantic annotation for the enrichment of web con-tent, while few cutting-edge manual annotation ap-proaches are examined regarding difficulties of sup-porting multiple formats (HTML toward PDF, XML,images (e.g., PNG, JPEG), and video. For a depictionof some more established tools or frameworks, pleaseallude [28]. The authors also provide a classification ofsemantic annotation system detailed analysis of end-user tools, pros, and their cons.

The manual annotation tools allow humans to addsome description of text to web contents or the othersources of data. However manual annotation has be-come very complicated because of its usability and fea-ture [29]. Protégé [29], SMOR[30]E, and OntoMat [31].The author[26], have provided the list of annotationtools based on the detailed evaluation of annotationfeature Besides this, manual annotation is time con-

suming and often full of errors. As shown in the Fig-ure 5, it requires expert knowledge for being domain-specific. For manual annotation, a large volume oftraining is needed. Due to the complex schemas, it isalso not easy to handle large-scale data, and there is noreuse of output data. Human annotation is too costlyand time consuming and cannot be applied to controlthe massive amount of records available on the Web.Manual annotation requires qualified annotators, thishas been explained with the help of an example insection 4, and first, an annotator would map the text“Ram” to domain ontology and recognize it as a Personand further would recognize the company, where Ramis working. Based on tagging of the data, manual an-notation is further categorized as formal and descrip-tive annotations.

• Formal AnnotationFormal annotation is the simplest and fastest wayto annotate documents by the human. In the for-mal annotation, some scripts are added to therecord such as (title, author, publishing date, etc.).To do such a task, experts do not require detailedknowledge about the domain, only conceptualunderstanding is needed.

• Descriptive AnnotationA descriptive annotation or summative annota-tion can describe the main goal of the work. De-scriptive annotation provides a summary as wellas a complete citation of the job without eval-uating the quality of work. Descriptive anno-tations include an overall description of objectsthat may be enough for the machine to under-stand the full semantics of the material and pro-cess the information. For example, it means toconvey a book, hypothesis, methodology, arti-cle, conclusion, or any other source.

3.3.2. Semi-automatic annotation

In a semi-automatic semantic annotation, the frame-work creates an annotation and these few are thenpost-edited and amended by human annotators [32].Many manual annotation tools transferred to the semi-automatic framework by providing manual training.Researches on semantic annotation methods investi-gate the benefits of a state-of-the-art tools for semi-automatic to help the semantic annotation of a largeset of biomedical queries [16]. There are numeroussemi automatic semantic frameworks, MnM [33]. Un-like manual and automatic ones, don’t consolidate pro-grammed into the semantic investigation, however, ei-ther use them as an extension between models ele-

Figure 6: Semi-automatic annotation process of documentswith semantic data

ments and annotation. Semi-automatic annotation re-quires a mixed structure in the annotation model thathas increased the structure complexity [25]. This kindof annotation model is fit for supporting labels or tagsthat are not related to a specific property but on theother hand are portrayed to depict a particular connec-tion among metadata assets for navigation purposesseen at [9].

The semi-automatic annotation is shown in Figure 6,in which both human and machine become the anno-tators. Semi-automatic is fast and robust to find the se-mantic relationship between the annotating data andthe targeted annotated document. Human enrollmentprovides a significant advantage to semi-automatic an-notation to adopt the new feature and new domain.Morphological analysis, part-of-speech tagging, retrievalof domain-specific information, and recognition of nameentities are the significant component of semi-automaticannotation.

3.3.3. Automatic Annotation

Automatic annotation is a high level of semantic an-notation. Systems falling into this category are highlytrained and have high accuracy. To train this type ofsystem, a large amount of quality data and rule sets arerequired. To deal with these issues, unsupervised sys-tems tried the many methodologies and experiment tolearn how to annotate data without human oversight,but precision is as yet restricted. The automatic mean-ing of lexical data allows both annotations to add im-portant information to the production search and in-dex the document [16]. Article [26, 24] proposed a sci-entific classification for information extraction toolsdependent on the principle strategy adopted on a largerscale by the community. Some other techniques usemachine learning methods [22] to automate the se-mantic annotation using some training data.

Automatic semantic annotation is controlled by amachine, so this annotation is efficient and is fast as

Figure 7: Automatic annotation process of documents withsemantic data

compared to manual annotation. The main key fea-ture of an automatic semantic annotation is that it canhandle massive data, which is the limitation of man-ual annotation. In automatic annotation, absolute ruleor standard schema must be defined to work machinesefficiently. Based on fascinating predefined standards,the automatic annotation performs the task. Automaticannotation is useful for dynamic web content that maybe transient. Automatic annotation entirely dependsupon the training module and failed to adopt new ter-minology. However, the complete automatic semanticannotation for global data still is an unsolved prob-lem. Hence, semi-automatic annotation methods arebeing used widely in current scenarios. The compo-nent of the automatic semantic annotation is shownin Figure 7, in which no interaction of humans at therunning state.

3.4. Degree of Semantic AnnotationWith the development of semantic annotation in themost recent couple of years, the semantic annotationscan be applied in various spaces to extend convenience.In the absence of the structure of web data, automaticdiscovery of targeted or unexpected knowledge actu-ally develops various research issues outlined in [22].Heterogeneous data could be text, picture, sound, video,illustrations. The authors in [2] applied semantic an-notation textual objects and provide the practical im-pact of semantic annotation on the search. And in [12]applied semantic annotation on image objects to im-prove the searching and indexing. On the other hand,[15] gave a procedure to add an explanation to XMLcompositions. To the best of our knowledge, no suchannotation technique exists that can be successfullyapplied to all content (text, image, audio, video) simul-taneously. To keep in mind that diverse strategies areused for different content, we can be classifying theannotation as a degree of annotation to use the com-mon framework of the semantic web. Semantic anno-tators take input in a variety of forms, which is knownas the degree of semantic annotation. It makes the sys-

tem flexible. The degree of annotation defines the clas-sification of annotation based on the input structure ofdata as shown in Figure 1.

3.4.1. Text:

Most of the information on the web is in the form oftext. Data is extracted from the web directory througha user query. This user query is also written in form oftext only. The input query could be mapped on struc-tured, semi-structured, or unstructured data. Annota-tion of a text document is important to search, analyze,and classify the documents correctly.

3.4.2. Image:

Increasing digital capturing techniques have led to afantastic evaluation of images on the web. A text queryis used to access a huge amount of image data sources.To achieve this, a query is written which produces avisually similar description to the image. This featureof the image becomes a key to represent it. Several an-notation techniques have been used to make and de-scribe the main feature of the image. Some of the re-searchers have focused only on the feature extractionmethod and have developed an image semantic anno-tation method based on an image concept distributionmodel.

3.4.3. Audio:

Universal mobile interface makes digital cities portablewith audio Earth annotations. The best example iscities carriable with audio semantic annotation and thataimed to provide a comprehensive mobile interface tothe mobile user on demand.

3.4.4. Video:

Video annotations are equally important as image andaudio on the web. Video lectures, social media con-tent, news video, sports, etc. are the data that is moni-tored by semantic annotation. In the semantic contextof the examined domain, the concept, instances, withtheir visual descriptors, enrich the video semantic an-notation.

3.4.5. Hybrid:

Multimedia content base semantic annotation is morechallenging and based on high-level ontologies. Theseapproaches are on demand.

4. Approaches for SemanticAnnotation

Several approaches have been proposed to explain theneed for semantic annotations in user information andvast knowledge spaces. Some of the strategies focusedon semantic tagging (i.e., title, author, explanations,etc.) in the document to annotate. It has reduced largevolume of search that need to find supplementary in-formation in external sources [34, 35]. It categorizesthe annotation tools according to the media content,which can be annotated by annotators (for example,text, audio, video, images, etc.). Furthermore, in view-point to know how to achieve semantic annotation,there are various approaches and techniques used toachieve annotation. [22] investigate the machine learn-ing approach to automate the annotation process. Au-tomatic semantic annotation is more effectively fin-ished nowadays, utilizing machine learning techniques.We can further categorize the semantic annotation intovarious automatic approaches, including Supervisedmachine learning based method, Unsupervised machinelearning based method, Rule based methods, and On-tology based Machine Learning. Supervised approachis completed in two stages, training and annotation. Inthe training provide the plain text with some labeledand in the annotation, the machine has to recognizedentity and semantic relation based on the training la-beled data. [12] apply supervised machine learningtechniques to annotate image data. In an unsupervisedapproach, make an annotation with unlabeled data.For instance, [12] proposed a strategy for automati-cally summing up the extraction designs from the web-site pages. The ontological annotation approach uti-lizes other information sources like Wikipedia, Vocab-ulary, thesaurus ontology, etc. Rule-based semanticannotation is based on some pre-defined rules. Rule-based algorithms for semantic annotation, various ex-traction frameworks have been created based on thestrategy, for instance: Crystal [36], AutoSlog [37], MnM[33], Rapier [38], SRV [39], Whisk [40], Stalker [41],and BWI[42]. The rule-based approach [43] is onlyapplicable if the streaming pattern is well known. It isdifficult to apply to the heterogeneous unknown struc-ture.

4.1. Machine Learning MethodsThe dynamic environment and a wide range of domaininfluence the system to perform automatic annotation.The automatic annotation process is one of the criticaland challenging tasks for a semantic annotation sys-

tem.

4.1.1. Supervised Machine Learning

In a supervised learning method, an expert assigns thekey to annotating data. To deal with annotating data,several supervised machine learning models such asSVM, Hidden Markov Model (HMM), Markov RandomField Model to be implemented to optimize labelingcosts. In this approach, firstly a pair of entities aremapped with the web as a corpus, then it finds a bi-nary relationship between the entities and if a relationis found, then it labels as a favorable otherwise markedas unfavorable.

Furthermore, some authors have extended an exist-ing approach with the help of the SVM machine learn-ing technique but the main drawback of this method isthat it cannot handle the multiple instances of learn-ing and during process, many bugs are found. Otherchallenges in semi-supervised and unsupervised tech-niques to retrieve relation between the entities are dis-cussed in [44].• Limitations of supervised machine learning ap-proaches

• Large Training Corpus: The efficient machinelearning model requires significant expert anno-tated corpus for training purpose and which arevery expensive to develop.

• Limited Entities Extraction: This machine learn-ing models have only identified entities on whichmodels were trained. Other remaining categoriesof entities which are not recognized generate afalse result, which affects the accuracy of themodel.

• Lack of entity relation: Due to large data cor-pus, it only explores the surface of the graph forevery instance of knowledgebase.

Supervised machine learning methods are expensiveand require a lot of effort. So, most of the researchhas moved towards unsupervised or semi-supervisedmachine learning methods. These methods have beendiscussed in the next section.

4.1.2. Unsupervised Machine Learning

Unsupervised machine learning is the process of au-tomatically identifying possible relationships betweenobjects of massive text corpora. Unsupervised machinelearning methods do not require manually labeled data.Pairing deep learning with unsupervised learning crosses

the boundaries of supervised learning. This machinelearning method clusters similar entities concepts. Theseclusters are commonly used to describe relationshipsof sets that occur in such a way that the elements ofsets refer to the same group. Researches examine someclustering techniques with some of the novel approachesdiscussed in [45]. They have created a simplified andgeneralized grammatical clause representation that uti-lizes information-based clustering and inter-sentencedependencies to extract high-level semantic relations.[46] discovered and enhanced concept specific rela-tions other than global connections by web mining.

• Limitations of unsupervised machine learn-ing technique

• Due to automatic nature, sometimes it generatesunnecessary clusters that were not an area of in-terest.

• The output is less accurate because one inputdata is not known, and the data expert does notlabel dynamically.

• It does not extract the hidden relationship be-tween the entities and does not provide the linkto relation.

4.1.3. Deep Learning Method

Due to large interlinked datasets on the internet, ma-chine learning aims to provide a method that processesdata automatically. The idea could be achieved in thepresent text using deep learning semantic annotationbased on public and common ontologies. Due to thegradual growth and the large size of the resources, thereis a need to have an active and quick semantic an-notation of resources. For example, Neural Network,CBOW, and Skip-gram have become the state-of-the-art for generating word embedding. The authors [21]have presented a deep learning and rule-based learn-ing technique for the Arabic language which involvesdiscovering a document and used to enhance the se-mantic indexing.

4.2. Rule-Based Annotation MethodsRule-based annotation is the simplest and most straight-forward approach, which depends upon a predefinedrule created by one or more experts. The rule baseannotation can be applied only when either the data isfully known or have some specific notation. For exam-ple, the rule base annotation is perfect for structureddatasets such as RDBMS data. Rule base annotations

cannot be applied to other types of data or unstruc-tured data. In this type of annotations, experts writesome rules with the help of logical arguments, so thatthe relationship can be extracted by carefully observ-ing the correct logic. The rules follow some specificIF-THEN-ELSE formats that elicit information from ahigh-level reference using a low-level reference. Ac-cording to our survey of the literature, rules have beenapplied when it combines ontological reasoning [21].Author [47] have provided a minimal rule engine, MiRE,for a context-aware mobile device. The rule is signifi-cant and can be applied in various tasks like event de-tection, IoT data representation.

• Limitations of rule based approach

• It is applicable only to recognize regular pattern.

• Dynamic changes cannot be easily handled bythis approach.

• Need expert to generate a rule with complete do-main knowledge.

• Need large and complex rule to deal with un-known vast data set.

4.3. Ontology-Based MethodsOntology-based, dictionary-based, or knowledge-basedsemantic annotation is the most robust annotation ap-proach to represent a relationship between data ob-jects. Ontology-based semantic annotations can be ap-plied with any automation category (manual, semi-automatic and automatic annotations). As we havediscussed in Section 5, this annotation approach in-troduces the process of generating metadata using on-tology as their knowledge base. The ontology-basedapproach relies entirely on description logic, which re-lates to a family of logic-based knowledge representa-tions of formalism. All ontological reasoning approacheshave been supported by two general illustrations ofsemantic web languages. i.e., RDF (S) [48] and OWL[49, 50].

Several frameworks support manual annotation, forexample, Protégé-2000, CREAM , SMORE, Artequaktare the semantic annotation framework that supportsvarious semantic annotation task (like create an anno-tation, add a tag, validate, etc.). Knowledgebase toolshelp to manage and store complex information. Someannotation tools have been used to develop and main-tain the dictionary of the document. ERASMUS andSIBM (CISMeF), NCBO Annotator, are some conceptsMapper used to map the concept of a word to the in-stance of the dictionary.

Many semantic query languages (such as Triple, RQL,SPARQL, RDQL, etc.) and various reasoning engines(RACER, Pellet, and FACT, etc.) connect the semanticweb languages. Some techniques such as the SWRLrule provide popularity to ontological reasoning. On-tological modeling represents the knowledge in a hi-erarchical form and establishes the link between therelated entities.

• Limitations of Ontological approach

• Ontological modeling is domain specific.

• Expert knowledge is required to genereate a query.

• Ontology-based query engine required to retrieveinformation.

5. Semantic Annotation ToolsWe can arrange annotation tools in a two-dimensionalspace, Ontology Support Semantic Annotation toolsand Non-Ontology Support Semantic Annotation tools.Describing these tools based on the various aspects ofsemantic annotation.

5.1. Non-Ontology Support SemanticAnnotation Tools

We are highlighting the most frequently referred non-ontology-based tools found in the literature study ofcurrent semantic annotation. These tools annotate man-ually and some use different strategies to reduce the ef-fort of annotating. Some tools have the option to per-form annotation manually as well as automatically andsome have option both (semi-automatically). Some im-portant semantic annotation tools are shown in Ta-ble 2.

5.2. Ontology Support SemanticAnnotation Tools

Current semantic annotations, based on the literature,aim to support the development of inter language re-sources. Many researchers are working in this areaand several authors have contributed in multiple waysto make it successful. They have defined semantic an-notations in a different appearance but have the samesemantics. Some ontology-based semantic annotationtools and their aspects are shown in Table 3.

6. Advantages and Applicationsof Semantic Annotation

The advantages of annotation include searching, stor-ing, analyzing, and automation. In this section, weshall discuss the various benefits of semantic annota-tion and its real-life application.

6.1. Benefits of semantic annotationThe semantic annotation helps to formulate logic fora more profound understanding by the machine. Se-mantic annotation is encouraging the researcher to makeinferences and draw conclusions about web resources.Some of the benefits of semantic annotation are givenbelow.

6.1.1. Improves searching:

Searching the vast and distributed structure of the webrequires efficient search schemes. Searching becomesefficient when the available information is meaning-ful and contains meta-data to support the informationavailable on the internet. The semantic search will bedefined as a search that is based on semantics ratherthan just depending on text similarity[51]. Semanticannotations are also used to correlate significant tagsamong reports to perform a semantic search.

6.1.2. Better utilizes the available webresources:

Now a days, when almost everything is well defined,organized, and adequately classified on the Web, thenthe resources can be efficiently utilized. The informa-tion is available on the Web in various forms such asdocument, knowledge base and dictionary, etc. con-tains information in the form of text or image or bothcan be linked appropriately through annotation. Thesemantic annotations of web resources are connectedconcepts with meaningful representation in which theretrieved information could be utilized according touser interest instead of just a text matching.

6.1.3. Improves the decision making:

It has been found that when all the related and signif-icant data has been shared with the clients (throughsemantic look), at that point, the client is capable ofmaking a few choices and can perform it successfullysince he/she will be mindful of all the things. The se-mantic search will be helped by semantic annotation

Table 2Non-Ontology Support Semantic Annotation Tools

SA System Approach Description Application domain Automation

BroMo Unsupervised Using clustering forblogs and article se-mantic annotation

Proteins (biomedical) Semi-automatic

Sozekamm Supervised Annotate data using asupervised categoricalclustering algorithmLIMBO

Gemeral Semi-Automatic

Ontea Unsupervised Process email or textdocument find the pat-tern

Text and Email Manual/ Automatic

Doccano Unsupervised Open source text anno-tation tool for human

General Manual

Yawas Unsupervised Java based web-basedannotation system

General Manual

Briefing Associate Unsupervised Used for Microsoftpower point presenta-tion

MS Power point Manual

Zemanta Rule based Algorithm for naturallanguage and seman-tic processing is propri-etary

General Semi-Automatic

Thresher Unsupervised Aimed to Web pageswith similar content

Web page Automatic/Manual

RCSSAT Supervised Classify the using anew lexicon

General Manual

since the semantic search is concerned with the mean-ing of the substance accessible. The semantic annota-tor has given a semantic search with proper explana-tions to empower it to make an appropriate sense inthe document, picture, etc.

6.1.4. Unambiguous description ofabbreviations:

Many words/concepts have been expressed using thesame abbreviation. This leads to a critical problem ofambiguity. The use of annotation is an effective wayto troubleshoot this problem.

6.1.5. Automatically classifies the webresources:

If the resources available on the web are annotatedproperly, then the classification process will be unin-terrupted because all classification algorithms only askfor the references of annotated metadata to classify theresources. This makes semantic web search efficient asa process of classification.

6.2. Applications of semanticannotation

After specifying the structure model of semantic anno-tations, annotation creators can apply the annotationto serve their purpose such as (search, sharing, inte-gration, reuse, etc.). Here, in this section, several ap-plications of semantic annotation are listed with somereal-life applications.

6.2.1. Bibliographies:

Semantic annotation plays a vital role in the field ofbibliography annotation to describe the source. Thewhole information of the source is essential for theauthors while writing a paper. Bibliography seman-tic annotation helps in linguistic data to analyze andis used for any language data.

6.2.2. Extraction of open information:

Semantic annotation has been practiced in diverse fieldsof knowledge. For instance, It has an application in anews analysis for the naming of places, organizations,and people. it has application in biological systems for

the identification of biomedical entities such as genes,proteins, and their relationships.

6.2.3. Alignment of ontologies:

This is one of the important applications for the align-ment of ontologies for knowledge management. On-tology alignment is quite useful to differentiate the het-erogeneous models and it relates the difference to de-termine various interoperability concerns that synchro-nize in semantic image annotation and retrieval.

6.2.4. Semantic search:

Search engines can retrieve the required documentsmore accurately with the help of metadata informa-tion. Scientists and librarians put lots of efforts andtime to create metadata for the documents. However,to alleviate the hard labor, many attempts have beenmade towards generating the automatic metadata, basedon the techniques of information extraction.

6.2.5. Classification:

Semantic annotation helps to classify the data whichspeeds up the task and makes data secure. The infor-mation retrieval-based system on semantic annotationhelps to manage the data according to the search in-terest of the user.

7. ConclusionThe purpose of this paper is to distinguish integratedreview methodology from other review methods andto propose research questions for integrated reviewmethodology to increase the rigor of the process. Inthis paper, we have presented an extensive study ofimportant approaches used for semantic annotation ofa text document and a wide variety of approaches toexplore the prominent historical semantic annotationmodels applicable for text document annotation. Wehave also provided the various aspects of semantic an-notations based on which annotation can be classified.We have also highlighted the importance of semanticannotation in real-life practices.

The comparison of semantic annotation tools hasbeen done through a level of automation, degree of an-notation, and type of annotation. As a future scope, weare also trying to implement the semantic annotationmodels which will map with the global corpus and canbe applied to any domain. Finally, this approach alsohas been compared relatively with other ones availablein the literature.

References[1] F. Pech, A. Martinez, H. Estrada, Y. Hernandez,

Semantic annotation of unstructured documentsusing concepts similarity, Scientific Program-ming 2017 (2017).

[2] H. Agt, G. Bauhoff, R.-D. Kutsche, N. Milanovic,J. Widiker, Semantic annotation and conflictanalysis for information system integration, Pro-ceedings of the MDTPI at ECMFA 2010 (2010).

[3] S. Albukhitan, A. Alnazer, T. Helmy, Semanticannotation of arabic web documents using deeplearning, Procedia computer science 130 (2018)589–596.

[4] L. Stork, A. Weber, E. G. Miracle, F. Verbeek,A. Plaat, J. van den Herik, K. Wolstencroft, Se-mantic annotation of natural history collection,Journal of Web Semantics 59 (2019) 100462.

[5] F. Rahman, J. Siddiqi, Semantic annotation of dig-ital music, Journal of Computer and System Sci-ences 78 (2012) 1219–1231.

[6] C. Soanes, Oxford dictionary of English, OxfordUniversity Press, 2005.

[7] E. Oren, K. Möller, S. Scerri, S. Handschuh,M. Sintek, What are semantic annotations, Re-latório técnico. DERI Galway 9 (2006) 62.

[8] V. Batanović, D. Bojić, Using part-of-speech tagsas deep-syntax indicators in determining short-text semantic similarity, Computer Science andInformation Systems 12 (2015) 1–31.

[9] K. Bontcheva, H. Cunningham, Semantic annota-tions and retrieval: Manual, semiautomatic, andautomatic generation, in: Handbook of semanticweb technologies, 2011.

[10] W.-f. WANG, L. ZHAO, Research and applicationof ontology on semantic web [j], Journal of Za-ozhuang University 2 (2007).

[11] M. Kogalovskii, Semantic annotating of textdocuments: Basic concepts and taxonomic ap-proach, Automatic Documentation and Mathe-matical Linguistics 52 (2018) 134–141.

[12] G. Carneiro, A. B. Chan, P. J. Moreno, N. Vascon-celos, Supervised learning of semantic classesfor image annotation and retrieval, IEEE trans-actions on pattern analysis and machine intelli-gence 29 (2007) 394–410.

[13] S. Prasad, A. K. Lodhi, S. Jain, Helpi viz: A seman-tic image annotation and visualization platformfor visually impaired, in: International Confer-ence On Computational Vision and Bio InspiredComputing, Springer, 2018, pp. 881–888.

[14] S. Jain, S. Prasad, A. K. Lodhi, Semantic annota-tion of images with text and sound for visually

impaired, Journal of Open Source Developments5 (2018) 20–27.

[15] H. N. Talantikite, D. Aissani, N. Boudjlida, Se-mantic annotations for web services discoveryand composition, Computer Standards & Inter-faces 31 (2009) 1108–1117.

[16] A. Névéol, R. I. Doğan, Z. Lu, Semi-automaticsemantic annotation of pubmed queries: a studyon quality, efficiency, satisfaction, Journal ofbiomedical informatics 44 (2011) 310–318.

[17] S. Balakrishna, M. Thirumaran, V. K. Solanki, Iotsensor data integration in healthcare using se-mantics and machine learning approaches, in:A Handbook of Internet of Things in Biomedicaland Cyber Physical System, Springer, 2020, pp.275–300.

[18] S. Jabbar, F. Ullah, S. Khalid, M. Khan, K. Han, Se-mantic interoperability in heterogeneous iot in-frastructure for healthcare, Wireless Communi-cations and Mobile Computing 2017 (2017).

[19] B. Popov, A. Kiryakov, A. Kirilov, D. Manov,D. Ognyanoff, M. Goranov, Kim–semantic anno-tation platform, in: International Semantic WebConference, Springer, 2003, pp. 834–849.

[20] N. Kiyavitskaya, N. Zeni, L. Mich, J. R. Cordy,J. Mylopoulos, Text mining through semi au-tomatic semantic annotation, in: InternationalConference on Practical Aspects of KnowledgeManagement, Springer, 2006, pp. 143–154.

[21] C. Lhioui, A. Zouaghi, M. Zrigui, A rule-basedsemantic frame annotation of arabic speech turnsfor automatic dialogue analysis, Procedia Com-puter Science 117 (2017) 46–54.

[22] J. Tang, D. Zhang, L. Yao, Y. Li, Automatic seman-tic annotation using machine learning, in: Ma-chine Learning: Concepts, Methodologies, Toolsand Applications, IGI Global, 2012, pp. 535–578.

[23] H. Hassanzadeh, M. Keyvanpour, A machinelearning based analytical framework for seman-tic annotation requirements, arXiv preprintarXiv:1104.4950 (2011).

[24] S. Mesbah, K. Fragkeskos, C. Lofi, A. Bozzon, G.-J. Houben, Semantic annotation of data process-ing pipelines in scientific publications, in: Eu-ropean semantic web conference, Springer, 2017,pp. 321–336.

[25] V. Uren, P. Cimiano, J. Iria, S. Handschuh,M. Vargas-Vera, E. Motta, F. Ciravegna, Seman-tic annotation for knowledge management: Re-quirements and a survey of the state of the art,Journal of Web Semantics 4 (2006) 14–28.

[26] M. Neves, J. Ševa, An extensive review of toolsfor manual annotation of documents, Briefings

in bioinformatics 22 (2021) 146–163.[27] M. Tallis, Semantic word processing for con-

tent authors, in: Proceedings of the Knowl-edge Markup & Semantic Annotation Workshop,Florida, USA, 2003.

[28] P. Andrews, I. Zaihrayeu, J. Pane, A classificationof semantic annotation systems, Semantic Web 3(2012) 223–248.

[29] P. Ogren, Knowtator: a protégé plug-in for anno-tated corpus construction, in: Proceedings of theHuman Language Technology Conference of theNAACL, Companion Volume: Demonstrations,2006, pp. 273–275.

[30] A. Kalyanpur, J. Hendler, B. Parsia, J. Golbeck,SMORE-semantic markup, ontology, and RDFeditor, Technical Report, Maryland Univ CollegePark Dept of Computer Science, 2006.

[31] K. Petridis, D. Anastasopoulos, C. Saathoff,N. Timmermann, Y. Kompatsiaris, S. Staab, M-ontomat-annotizer: Image annotation linkingontologies and multimedia low-level features, in:International Conference on Knowledge-Basedand Intelligent Information and Engineering Sys-tems, Springer, 2006, pp. 633–640.

[32] M. Erdmann, A. Maedche, H.-P. Schnurr, S. Staab,From manual to semi-automatic semantic anno-tation: About ontology-based text annotationtools, in: Proceedings of the COLING-2000Workshop on Semantic Annotation and Intelli-gent Content, 2000, pp. 79–85.

[33] M. Vargas-Vera, E. Motta, J. Domingue, M. Lan-zoni, A. Stutt, F. Ciravegna, Mnm: Ontologydriven semi-automatic and automatic support forsemantic markup, in: International Conferenceon Knowledge Engineering and Knowledge Man-agement, Springer, 2002, pp. 379–391.

[34] R. Schroeter, J. Hunter, D. Kosovic, Filmed-collaborative video indexing, annotation, anddiscussion tools over broadband networks, in:10th International Multimedia Modelling Con-ference, 2004. Proceedings., IEEE, 2004, pp. 346–353.

[35] S. Jain, Understanding Semantics-Based DecisionSupport, CRC Press, 2020.

[36] S. Soderland, D. Fisher, J. Aseltine, W. Lehnert,Crystal: Inducing a conceptual dictionary, arXivpreprint cmp-lg/9505020 (1995).

[37] E. Riloff, et al., Automatically constructing adictionary for information extraction tasks, in:AAAI, volume 1, Citeseer, 1993, pp. 2–1.

[38] M. E. Cali, Relational learning techniques fornatural language information extraction, ReportAI98276 (1998).

[39] D. Freitag, Information extraction from html:Application of a general machine learning ap-proach, in: AAAI/IAAI, 1998, pp. 517–523.

[40] S. Soderland, Learning information extractionrules for semi-structured and free text, Machinelearning 34 (1999) 233–272.

[41] I. Muslea, S. Minton, C. Knoblock, Stalker: Learn-ing extraction rules for semistructured, web-based information sources, in: Proceedings ofAAAI-98 Workshop on AI and Information Inte-gration, AAAI Press, 1998, pp. 74–81.

[42] D. Freitag, N. Kushmerick, Boosted wrapper in-duction, AAAI/IAAI 583 (2000).

[43] S. Jain, S. Sharma, J. M. Natterbrede, M. Hamada,Rule-based actionable intelligence for disastersituation management, International Journal ofKnowledge and Systems Science (IJKSS) 11 (2020)17–32.

[44] B. Rosenfeld, R. Feldman, Using corpus statisticson entities to improve semi-supervised relationextraction from the web, in: Proceedings of the45th Annual Meeting of the Association of Com-putational Linguistics, 2007, pp. 600–607.

[45] S. Brody, Clustering clauses for high-level re-lation detection: An information-theoretic ap-proach, in: Proceedings of the 45th Annual Meet-ing of the Association of Computational Linguis-tics, 2007, pp. 448–455.

[46] R. Bunescu, R. Mooney, Learning to extractrelations from the web using minimal supervi-sion, in: Proceedings of the 45th Annual Meetingof the Association of Computational Linguistics,2007, pp. 576–583.

[47] C. Choi, I. Park, S. J. Hyun, D. Lee, D. H. Sim,Mire: A minimal rule engine for context-awaremobile devices, in: 2008 Third InternationalConference on Digital Information Management,IEEE, 2008, pp. 172–177.

[48] B. McBride, The resource description framework(rdf) and its vocabulary description languagerdfs, in: Handbook on ontologies, Springer, 2004,pp. 51–65.

[49] G. Antoniou, F. Van Harmelen, Web ontologylanguage: Owl, in: Handbook on ontologies,Springer, 2004, pp. 67–92.

[50] S. Jain, C. Gupta, A. Bhardwaj, Research di-rections under the parasol of ontology based se-mantic web structure, in: International Confer-ence on Soft Computing and Pattern Recogni-tion, Springer, 2016, pp. 644–655.

[51] N. Limbasiya, P. Agrawal, Semantic textual sim-ilarity and factorization machine model for re-trieval of question-answering, in: InternationalConference on Advances in Computing and DataSciences, Springer, 2019, pp. 195–206.

Table 3Ontology Support Semantic Annotation tools

SA System Approach Description Application domain Automation

AnnotEx Supervised Learning Annotating based onclassifying documentsby means of semanticsimilarities

General Manual

S-CREAM Supervised Learning Annotate Dynamicweb pages and trackthe activities usinghyperlink

Domain Dependent Semi-Automatic

NavEx Supervised learning Extends traditionalperformance-basedannotation

Service Oriented Envi-ronments

Automatic

Knowledge and Infor-mation Management(KIM)

Supervised Learning Keyword-based Inter-domain knowl-edgebase

Manual

Armadillo Supervised learning Gene annotation sys-tem, Pattern Discovery

Gene (Biomedical) Automatic

CREAM Rule-Based/wrappers Framework for highstructure web page

General Automatic / Manual

GoNTogle Unsupervised Annotation and searchfacilities based on tex-tual similarity

General Automatic

C-PANKOW Unsupervised Pattern based annota-tion task

General Automatic

BIMTag Unsupervised Semantic annotation ofonline BIM product re-sources

BIM product Automatic

OEAKM Unsupervised Built ontology enabledannotation KMS thatprovides clustering andreal-time discussion forcollaborative learning

General Semi-automatic

Melita Supervised Follow to two phasecycle (Turning andscheduling text) basedon the training andactive learning

General Manual / Automatic

OntoMat-Annotizer Unsupervised Web based annotationtool that is able tocreate owl instance,attribute and relation-ship

Image, MultimediaManual

Automatic

AeroDAML Rule based Scalable with diverseontologies

Webpage Semi-Automatically

PARMENIDES Unsupervised create a domain ontol-ogy using cluster

General Automatic

MnM Unsupervised Learns extraction rulesfrom training corpus

webpage Semi-Automatic

SemTag Rule based Performs structuralanalysis

General Automatic

comprehensive study of semantic annotation: variant and …

Documents