
Extracting Domain Models from Natural-Language Requirements: Approach and Industrial Evaluation

Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand

SnT Centre for Security, Reliability and Trust, University of Luxembourg, Luxembourg

{firstname.lastname}@uni.lu

Frank Zimmer, SES Techcom

9 rue Pierre Werner, Betzdorf, Luxembourg

[email protected]

ABSTRACT
Domain modeling is an important step in the transition from natural-language requirements to precise specifications. For large systems, building a domain model manually is a laborious task. Several approaches exist to assist engineers with this task, whereby candidate domain model elements are automatically extracted using Natural Language Processing (NLP). Despite the existing work on domain model extraction, important facets remain under-explored: (1) there is limited empirical evidence about the usefulness of existing extraction rules (heuristics) when applied in industrial settings; (2) existing extraction rules do not adequately exploit the natural-language dependencies detected by modern NLP technologies; and (3) an important class of rules developed by the information retrieval community for information extraction remains unutilized for building domain models.

Motivated by addressing the above limitations, we develop a domain model extractor by bringing together existing extraction rules in the software engineering literature, extending these rules with complementary rules from the information retrieval literature, and proposing new rules to better exploit results obtained from modern NLP dependency parsers. We apply our model extractor to four industrial requirements documents, reporting on the frequency of different extraction rules being applied. We conduct an expert study over one of these documents, investigating the accuracy and overall effectiveness of our domain model extractor.

Keywords
Model Extraction; Natural-Language Requirements; Natural Language Processing; Case Study Research.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

MODELS '16, October 02-07, 2016, Saint-Malo, France
© 2016 ACM. ISBN 978-1-4503-4321-3/16/10...$15.00
DOI: http://dx.doi.org/10.1145/2976767.2976769

1. INTRODUCTION
Natural language (NL) is used prevalently for expressing systems and software requirements [25]. Building a domain model is an important step for transitioning from informal requirements expressed in NL to precise and analyzable specifications [31]. By capturing in an explicit manner the key concepts of an application domain and the relations between these concepts, a domain model serves both as an effective tool for improving communication between the stakeholders of a proposed application, and further as a basis for detailed requirements and design elaboration [19, 27].

Depending on the development methodology being followed, requirements may be written in different formats, e.g., declarative "shall" statements, use case scenarios, user stories, and feature lists [25]. Certain restrictions, e.g., templates [5, 32], may be enforced over NL requirements to mitigate ambiguity and vagueness, and to make the requirements more amenable to analysis. In a similar vein, and based on the application context, the engineers may choose among several alternative notations for domain modeling. These notations include, among others, ontology languages such as OWL, entity-relationship (ER) diagrams, UML class diagrams, and SysML block definition diagrams [3, 16].

Irrespective of the format in which the requirements are expressed and the notation used for domain modeling, the engineers need to make sure that the requirements and the domain model are properly aligned. To this end, it is beneficial to build the domain model before or in tandem with documenting the requirements. Doing so, however, may not be possible due to time and resource constraints. Particularly, in many industry domains, e.g., aerospace, which motivates our work in this paper, preparing the requirements presents a more immediate priority for the engineers. This is because the requirements are a direct prerequisite for the contractual aspects of development, e.g., tendering and commissioning. Consequently, the engineers may postpone domain modeling to later stages, when the requirements have sufficiently stabilized and met the early contractual demands of a project. Another obstacle to building a domain model early on in a complex project is the large number of stakeholders that may be contributing to the requirements, and often the involvement of different companies with different practices.

Building a domain model that is aligned with a given set of requirements necessitates that the engineers examine the requirements and ensure that all the concepts and relationships relevant to the requirements are included in the domain model. This is a laborious task for large applications, where the requirements may constitute tens or hundreds of pages of text. Automated assistance for domain model construction based on NL requirements is therefore important.

This paper is concerned with developing an automated solution for extracting domain models from unrestricted NL requirements, focusing on the situation where one cannot make strong assumptions about either the requirements' syntax and structure, or the process by which the requirements were elicited.

R1: The simulator shall maintain the scheduled sessions, the active session and also the list of sessions that have been already handled.

(b) Simulator --maintain--> Session; Scheduled Session, Active Session, and Archived Session specialize Session.

Figure 1: (a) Example requirements statement and (b) corresponding domain model fragment.

R2: The simulator shall connect only to those networks for which the IP addresses have been specified.

(b) IP Address --specified for--> Network

Figure 2: (a) Example relative clause modifier (rcmod) dependency and (b) corresponding relation.

We use UML class diagrams for representing the extracted models. To illustrate, consider requirements statement R1 in Fig. 1(a). This requirement originates from the requirements document of a real simulator application in the satellite domain. Upon manually examining R1, a domain expert sketched the domain model fragment shown in Fig. 1(b). Using heuristic rules implemented via Natural Language Processing (NLP) [18], a tool could automatically identify several of the elements in this model fragment. Indeed, generalizable rules can be provided to extract all the elements, except for Archived Session and its relation to Session, whose identification would require human input.

Automated extraction of models from NL requirements has been studied for a long time, with a large body of literature already existing in the area, e.g., [10, 12, 17, 26, 29, 32], to note some. Nevertheless, important aspects and opportunities that are crucial for the application of model extraction in industry remain under-explored. Notably:

• There is limited empirical evidence about how well existing model extraction approaches perform when applied over industrial requirements. Existing approaches often assume restrictions on the syntax and structure of NL requirements [31]. In many industrial situations, e.g., when there are time pressures or little control over the requirements authoring process, these restrictions may not be met [5]. There is therefore a need to empirically study the usefulness of model extraction over unrestricted NL requirements.

• Modern NLP parsers provide detailed information about the dependencies between different segments of sentences. Our examination of existing model extraction rules indicates that there are important dependency types which are detectable via NLP, but which are not currently being exploited for model extraction. To illustrate, consider requirements statement R2, shown in Fig. 2(a), from the simulator application mentioned earlier. In R2, there is a dependency, called a relative clause modifier (rcmod) dependency [9], between the phrases "network" and "specified". Based on this dependency, which is detected by parsers such as the Stanford Parser [23], one can extract the relation in Fig. 2(b). Existing model extraction approaches do not utilize rcmod and thus do not find this relation. Similar gaps exist for some other dependency types.

• An important generic class of information extraction rules from the information retrieval literature is yet to be explored for model extraction. This class of rules, referred to as link paths [2] (or syntactic constraints [13]), enables extracting relations between concepts that are only indirectly related. To illustrate, consider requirements statement R3, shown in Fig. 3(a), again from the simulator application mentioned earlier.

R3: The simulator shall send log messages to the database via the monitoring interface.

(b) Simulator [1] --send--> [*] Log Message
(c) Simulator [1] --send log message to--> [1] Database
(d) Simulator [1] --send log message to database via--> [1] Monitoring Interface

Figure 3: (a) Example requirements statement, (b) direct relation, (c-d) link path (indirect) relations.

Existing model extraction approaches can detect the relation shown in Fig. 3(b), as "simulator" and "log message" are directly related to each other by being the subject and the object of the verb "send", respectively. Nevertheless, existing approaches miss the indirect relations of Figs. 3(c),(d), which are induced by link paths.

Link path relations can have different depths, where the depth represents the number of additional concepts linked to the direct relation. For example, in the relation of Fig. 3(c), one additional concept, namely "database", has been linked to the direct relation, i.e., Fig. 3(b). The depth of this link path relation is therefore one. By similar reasoning, the depth of the link path relation in Fig. 3(d) is two.
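To make the notion of depth concrete, here is a minimal Python sketch (ours, purely illustrative; the function and variable names are hypothetical) that enumerates link-path relations from a direct relation and a chain of prepositional attachments, reproducing the relations of Figs. 3(b)-(d):

def link_paths(direct, prep_chain):
    """Enumerate link-path relations of depth 1..n from a direct
    (source, verb, target) relation and a chain of (preposition, NP)
    attachments hanging off the target."""
    source, verb, target = direct
    label, current = verb, target
    paths = []
    for depth, (prep, np) in enumerate(prep_chain, start=1):
        label = f"{label} {current} {prep}"  # grow the association label
        current = np                         # link one more concept
        paths.append((depth, source, label, current))
    return paths

# R3 of Fig. 3: direct relation plus two prepositional attachments.
direct = ("simulator", "send", "log message")
chain = [("to", "database"), ("via", "monitoring interface")]
for depth, src, label, tgt in link_paths(direct, chain):
    print(f"depth {depth}: {src} --{label}--> {tgt}")
# depth 1: simulator --send log message to--> database
# depth 2: simulator --send log message to database via--> monitoring interface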

As suggested by our example, the direct relation of Fig. 3(b) is not the only plausible choice to consider for inclusion in the domain model; the indirect relations of Figs. 3(c),(d) present meaningful alternative (or complementary) relations. Indeed, among the three relations in Figs. 3(b)-(d), the domain expert found the one in Fig. 3(c), i.e., the link path of depth one, useful for the domain model and excluded the other two relations. Using link paths in model extraction is therefore an important avenue to explore.

Contributions. Motivated by addressing the limitations outlined above, we make the following contributions:

(1) We develop an automated domain model extractor for unrestricted NL requirements. We use UML class diagrams for representing the results of extraction. Our model extractor combines existing extraction rules from the software engineering literature with link paths from the information retrieval literature. We further propose new rules aimed at better exploiting the dependency information obtained from NLP dependency parsers.

(2) Using four industrial requirements documents, we examine the number of times each of the extraction rules implemented by our model extractor is triggered, providing insights about whether and to what extent each rule is capable of deriving structured information from NL requirements in realistic settings. The four documents that we consider collectively contain 786 "shall" requirements statements.

(3) We report on an expert review of the output of our model extractor over 50 randomly-selected requirements statements from one of the four industrial documents mentioned above. The results of this review suggest that ≈90% of the conceptual relations identified by our model extractor are either correct or partially correct, i.e., have only minor inaccuracies. Such a level of correctness, noting that no particular assumptions were made about the syntax and structure of the requirements statements, is promising. At the same time, we observe that, of the relations identified, only ≈36% are relevant, i.e., deemed useful for inclusion in the domain model. Our results imply that low relevance is not a shortcoming of our model extractor per se, but rather a broader challenge to which other rule-based model extractors are also prone. In this sense, our expert review reveals an important area for future improvement in automated model extraction.

Structure. Sections 2 and 3 review the state of the art and provide background. Section 4 presents our domain model extraction approach. Section 5 reports on an empirical evaluation of the approach. Section 6 concludes the paper.

2. STATE OF THE ART
We synthesize the literature on domain model extraction and compile a set of extraction rules (heuristics) in order to establish the state of the art. The rules identified through our synthesis are shown in Table 1. These rules are organized into four categories, based on the nature of the information they extract (concepts, associations and generalizations, cardinalities, and attributes). We illustrate each rule in the table with an example. We note that the literature provides rules for extracting (class) operations as well. However, and in line with best practice [19], we deem operations to be outside the scope of domain models. Furthermore, since operations typically become known only during the design stages, there is usually little information to be found about operations in requirements documents.

Our focus being on unrestricted NL requirements, we have excluded from Table 1 rules that rely on specific sentence patterns. We do nevertheless include in the table five pattern-based rules (B3 to B5 and D1 to D2) due to the generic nature of these rules. The criterion we applied for the inclusion of a pattern-based rule was that the rule must have been considered in at least two distinct previous publications. We further exclude from Table 1 rules rooted in programming conventions, e.g., the convention of separating concepts and attributes by an underscore, as in Bank_Id.

Next, we describe the sources from which the rules of Table 1 originate: The pioneering studies by Abbott [1] and Chen [8] laid the foundation for the subsequent work on model extraction from textual descriptions. Yue et al. [31] survey 20 approaches aimed at transforming textual requirements into early analysis models. Of these, five [4, 15, 20, 21, 24] provide automated support for extracting domain models, or models closely related to domain models, e.g., object models. Yue et al. [31] bring together the rules from the above approaches, further accounting for the extensions proposed to Abbott's original set of rules [1] in other related studies. Rules A1 to A4, B1, B4, B5, C1 to C4, and D1 to D3 in Table 1 come from Yue et al. [31].

To identify more recent strands of related research, we examined all the citations to Yue et al. [31] on Google Scholar. Our objective was to identify any new extraction rules in the recent literature not already covered by Yue et al. [31]. We found two publications containing new rules: Vidya Sagar and Abirami [29], and Ben Abdessalem Karaa et al. [7]. Our study of Vidya Sagar and Abirami [29], and of a closely-related publication by Elbendak et al. [12] upon which Vidya Sagar and Abirami build, yielded four new rules. These are A5, B2, B3, and D4 in Table 1. As for Ben Abdessalem Karaa et al. [7], all the new rules proposed therein are pattern-based. These rules do not match our inclusion criterion mentioned above, as no other publication we know of has used these (pattern-based) rules.

A limitation of the rules in Table 1 is that they do not cover link paths, as we already explained in Section 1. Link-path rules have been used in the information retrieval domain for mining structured information from various natural-language sources, e.g., Wikipedia pages [2, 13] and the biomedical literature [30].

Table 1: Existing domain model extraction rules.

Concepts:
A1. All NPs* in the requirements are candidate concepts. Example: R3 in Fig. 3 :: Simulator, Log Message, Database, and Monitoring Interface.
A2. Subjects in the requirements are concepts. Example: R3 in Fig. 3 :: Simulator.
A3. Recurring NPs are concepts. Example: R3 in Fig. 3 :: Simulator (if it is recurring).
A4. Objects in the requirements are concepts. Example: R3 in Fig. 3 :: Log Message.
A5. Gerunds in the requirements are concepts. Example: "Borrowing is processed by the staff." :: Borrowing.

Associations and Generalizations:
B1. Transitive verbs are associations. Example: R3 in Fig. 3 :: Simulator --send--> Log Message.
B2. A verb with a preposition is an association. Example: "The cheque is sent to the bank." :: Cheque --sent to--> Bank.
B3. <R> in a requirement of the form "<R> of <A> is <B>" is likely to be an association. Example: "The bank of the customer is BLUX." :: Customer --bank--> BLUX.
B4. "contain", "is made up of", "include", [...] suggest aggregations / compositions. Example: "The library contains books." :: Book is part of Library.
B5. "is a", "type of", "kind of", "may be", [...] suggest generalizations. Example: "Service may be premium service or normal service." :: Premium Service and Normal Service specialize Service.

Cardinalities:
C1. If the source concept of an association is plural / has a universal quantifier and the target concept has a unique existential quantifier, then the association is many-to-one. Example: "All arriving airplanes shall contact the control tower." :: Arriving Airplane [*] --contact--> [1] Control Tower.
C2. If the source concept of an association is singular and the target concept is plural / quantified by a definite article, then the association is one-to-many. Example: "The student passed the exams." :: Student [1] --pass--> [*] Exam.
C3. If the source concept of an association is singular and the target concept is singular as well, then the association is one-to-one. Example: "The student passed the exam." :: Student [1] --pass--> [1] Exam.
C4. An explicit number before a concept suggests a cardinality. Example: "The student passed 3 exams." :: Student [1] --pass--> [3] Exam.

Attributes:
D1. "identified by", "recognized by", "has" [...] suggest attributes. Example: "An employee is identified by the employee id." :: Employee Id is an attribute of Employee.
D2. Genitive cases, e.g., NP's NP, suggest attributes. Example: "Book's title" :: Title is an attribute of Book.
D3. The adjective of an adjectivally modified NP suggests an attribute. Example: "large library" :: Size is an attribute of Library.
D4. An intransitive verb with an adverb suggests an attribute. Example: "The train arrives in the morning at 10 AM." :: Arrival time is an attribute of Train.

* NP stands for noun phrase; a definition is provided in Section 3.

However, this class of rules has not been used for model extraction to date. Another limitation, again explained in Section 1, is that existing model extraction rules do not fully exploit the results from NLP tools. Our approach, described in Section 4, proposes extensions in order to address these limitations. Our empirical evaluation, described in Section 5, demonstrates that our extensions are of practical significance.

Further, the large majority of existing work on model extraction is evaluated over exemplars and in artificial settings. Empirical studies on model extraction in real settings remain scarce. Our empirical evaluation, which is conducted in an industrial context, takes a step towards addressing this gap.

R4: The simulator shall continuously monitor its connection to the SNMP manager and any linked devices.

Figure 5: Results of dependency parsing for requirement R4 of Fig. 4(a). [The figure shows the word-level dependencies produced by the parser (nsubj, aux, advmod, dobj, det, poss, nn, amod, conj_and, prep_to) together with the ref_to dependency identified directly by coreference resolution, and, beneath them, the dependencies derived by the algorithm of Fig. 8 (Section 4.2) between the atomic NPs (NP1 "the simulator", NP2 "its connection", NP3 "the SNMP manager", NP4 "any linked devices") and the verb "monitor".]

R4: The simulator shall continuously monitor its connection to the SNMP manager and any linked devices.

(ROOT (S (NP (DT The) (NN simulator)) (VP (MD shall) (ADVP (RB continuously)) (VP (VB monitor) (NP (PRP$ its) (NN connection)) (PP (TO to) (NP (NP (DT the) (NNP SNMP) (NN manager)) (CC and) (NP (DT any) (VBN linked) (NNS devices)))))) (. .)))

Figure 4: (a) A requirement and (b) its parse tree.

3. SYNTACTIC PARSING
In this section, we provide background on syntactic parsing, also known as syntactic analysis, which is the key enabling NLP technology for our model extraction approach. Syntactic parsing encompasses two tasks: phrase structure parsing and dependency parsing. Our model extractor uses both. We briefly introduce these tasks below.

Phrase structure parsing [18] is aimed at inferring the structural units of sentences. In our work, the units of interest are noun phrases and verbs. A noun phrase (NP) is a unit that can be the subject or the object of a verb. A verb (VB) appears in a verb phrase (VP) alongside any direct or indirect objects, but not the subject. Verbs can have auxiliaries and modifiers (typically adverbs) associated with them. To illustrate, consider requirements statement R4 in Fig. 4(a). The structure of R4 is depicted in Fig. 4(b) using what is known as a parse tree. To save space, we do not visualize the tree, and instead show it in a nested-list representation commonly used for parse trees.

Dependency parsing [28] is aimed at finding grammatical dependencies between the individual words in a sentence. In contrast to phrase structure parsing, which identifies the structural constituents of a sentence, dependency parsing identifies the functional constituents, e.g., the subject and the object. The output of dependency parsing is represented as a directed acyclic graph, with labeled (typed) dependency relations between words. The top part of the graph of Fig. 5 shows the output of dependency parsing over requirements statement R4. An example typed dependency here is nsubj(monitor{5}, simulator{2}), stating that "simulator" is the subject of the verb "monitor".
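For illustration, a typed dependency can be represented as a labeled edge between two indexed words. The following minimal Python sketch (our own representation, not the Stanford Parser's API) encodes a few of R4's word-level dependencies:

from collections import namedtuple

# A typed dependency: a labeled edge between two (word, index) pairs.
Dep = namedtuple("Dep", ["dtype", "source", "target"])

# Some of the word-level dependencies for R4 (cf. Fig. 5).
deps = [
    Dep("nsubj", ("monitor", 5), ("simulator", 2)),
    Dep("aux", ("monitor", 5), ("shall", 3)),
    Dep("advmod", ("monitor", 5), ("continuously", 4)),
    Dep("dobj", ("monitor", 5), ("connection", 7)),
    Dep("prep_to", ("connection", 7), ("manager", 11)),
]
print(deps[0])  # Dep(dtype='nsubj', source=('monitor', 5), target=('simulator', 2))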

Syntactic parsing is commonly done using the pipeline of NLP modules shown in Fig. 6. In our work, we use the pipeline implementation provided by the Stanford Parser [23]. The first module in the pipeline is the Tokenizer, which splits the input text, in our context a requirements document, into tokens. A token can be a word, a number, or a symbol.

Figure 6: Parsing pipeline. [NL requirements feed into (1) the Tokenizer, (2) the Sentence Splitter, (3) the POS Tagger, (4) the Named Entity Recognizer, (5) the Parser (phrase structure + dependency), and (6) the Coreference Resolver, producing annotations such as the parse tree and the typed dependencies.]

The second module, the Sentence Splitter, breaks the text into sentences. The third module, the POS Tagger, attaches a part-of-speech (POS) tag to each token. POS tags represent the syntactic categories of tokens, e.g., nouns, adjectives, and verbs. The fourth module, the Named-Entity Recognizer, identifies entities belonging to pre-defined categories, e.g., proper nouns, dates, and locations. The fifth and main module is the Parser, encompassing both phrase structure parsing and dependency parsing. The final module is the Coreference Resolver. This (optional) module finds expressions that refer to the same entity. We concern ourselves with pronominal coreference resolution only, which is the task of identifying, for a given pronoun such as "its" and "their", the NP that the pronoun refers to. Fig. 5 shows an example of pronominal coreference resolution, where the pronoun "its" is linked to the referenced NP, namely "the simulator", via a ref_to dependency.

4. APPROACH

Figure 7: Approach overview. [NL requirements are (1) processed into NPs, VBs, and dependencies; the dependencies are (2) lifted to semantic units; and the results are (3) used, together with the extraction rules, to construct the domain model.]

Fig. 7 presents an overview of our domain model extraction approach. The input to the approach is an NL requirements document and the output is a UML class diagram. Below, we elaborate the three main steps of our approach, marked 1-3 in Fig. 7.

4.1 Processing the Requirements Statements
The requirements processing step includes the following activities: (1) detecting the phrasal structure of the requirements, (2) identifying the dependencies between the words in the requirements, (3) resolving pronominal coreferences, and (4) performing stopword removal and lemmatization; these are common NLP tasks, respectively for pruning words that are unlikely to contribute to text analysis, and for transforming words into their base morphological form.


Activities (1), (2), and (3) are carried out by the pipeline of Fig. 6. From the parse tree generated by this pipeline for each requirements statement, we extract the atomic NPs and the VBs. Atomic NPs are those that cannot be further decomposed. For example, from the parse tree of Fig. 4(b), we extract: "The simulator" (NP), "monitor" (VB), "its connection" (NP), "the SNMP manager" (NP), and "any linked devices" (NP). We do not extract "the SNMP manager and any linked devices" because this NP is not atomic.
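The following self-contained Python sketch (ours; real toolkits provide this functionality, but the sketch shows the underlying logic) parses a tree given in the nested-list notation of Fig. 4(b) and extracts exactly the atomic NPs and VBs listed above:

def parse_sexpr(s):
    """Parse a bracketed parse-tree string into nested lists."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def walk(i):
        node = []
        while i < len(tokens):
            if tokens[i] == "(":
                child, i = walk(i + 1)
                node.append(child)
            elif tokens[i] == ")":
                return node, i + 1
            else:
                node.append(tokens[i])
                i += 1
        return node, i
    return walk(1)[0]  # skip the outermost "("

def leaves(node):
    """Collect the words under a (sub)tree."""
    if all(isinstance(c, str) for c in node):  # pre-terminal, e.g. (NN simulator)
        return [node[1]]
    return [w for c in node[1:] if isinstance(c, list) for w in leaves(c)]

def atomic_units(node, out):
    """Collect atomic NPs (NPs with no nested NP) and VB leaves."""
    if isinstance(node, str):
        return
    label, children = node[0], node[1:]
    has_np_child = any(isinstance(c, list) and c[0] == "NP" for c in children)
    if label == "NP" and not has_np_child:
        out.append(("NP", " ".join(leaves(node))))
    elif label == "VB" and len(children) == 1:
        out.append(("VB", children[0]))
    for c in children:
        atomic_units(c, out)

tree = parse_sexpr(
    "(ROOT (S (NP (DT The) (NN simulator)) (VP (MD shall) (ADVP (RB continuously)) "
    "(VP (VB monitor) (NP (PRP$ its) (NN connection)) (PP (TO to) "
    "(NP (NP (DT the) (NNP SNMP) (NN manager)) (CC and) "
    "(NP (DT any) (VBN linked) (NNS devices)))))) (. .)))")
units = []
atomic_units(tree, units)
print(units)
# [('NP', 'The simulator'), ('VB', 'monitor'), ('NP', 'its connection'),
#  ('NP', 'the SNMP manager'), ('NP', 'any linked devices')]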

We then subject the NPs to stopword removal. Stopwords are words that appear so frequently in the text that they no longer serve an analytical purpose [22]. Stopwords include, among other things, determiners and predeterminers. In our example, stopword removal strips the extracted NPs of the determiners "the" and "any". The VBs and the tail words of the NPs are further subject to lemmatization. In our example, "linked devices" is transformed into "linked device". Had the VB been, say, "monitoring", it would have been transformed into "monitor". VBs in passive form are an exception and are not lemmatized, e.g., see the example of Fig. 2.
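As a rough illustration of this post-processing, the sketch below strips determiners as stopwords and lemmatizes the tail word with a toy plural rule (the actual implementation relies on GATE's stopword list and lemmatizer; the list here is illustrative):

# Illustrative stopword list; the real list is GATE's and is larger.
STOPWORDS = {"the", "a", "an", "any", "all", "this", "that", "these", "those"}

def lemmatize(word):
    """Toy lemmatizer for regular plurals (GATE is used in practice)."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def clean_np(np):
    words = [w for w in np.split() if w.lower() not in STOPWORDS]
    if words:
        words[-1] = lemmatize(words[-1])  # only the tail word is lemmatized
    return " ".join(words)

print(clean_np("any linked devices"))  # linked device
print(clean_np("The simulator"))       # simulator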

The NPs and VBs obtained by following the process above provide the initial basis for labeling the concepts, attributes, and associations of the domain model that will be constructed in Step 3 of our approach (see Fig. 7). The dependencies obtained from executing the pipeline of Fig. 6 need to undergo further processing and be combined with the results of coreference resolution before they can be used for domain model construction. This additional processing is addressed by Step 2 of our approach, as we explain next in Section 4.2.

4.2 Deriving Dependencies at a Semantic Level
As seen from Fig. 5, the results of dependency parsing (top of the figure) are at the level of words. Many of these dependencies are meaningful for model extraction only at the level of NPs, which, along with verbs, are the main semantic (meaning-bearing) units of sentences. For example, consider the dependency prep_to(connection{7}, manager{11}), stating that "manager" is a prepositional complement to "connection". To derive from this dependency a meaningful relation for the domain model, we need to raise the dependency to the level of the involved NPs, i.e., prep_to(NP2, NP3) in Fig. 5. We do so using the algorithm of Fig. 8.

The algorithm takes as input a set P composed of the atomic NPs and the VBs, and the results of dependency parsing and coreference resolution, all from Step 1 of our approach (Section 4.1). The algorithm initializes the output (i.e., the semantic-unit dependencies) with the ref_to dependencies (L.1), noting that the targets of ref_to dependencies are already at the level of NPs. Next, the algorithm identifies, for each word dependency, the element(s) in P to which the source and the target of the dependency belong (L.3-12). If either the source or target words fall outside the boundaries of the elements in P, the words themselves are treated as being members of P (L.6,11). This behavior serves two purposes: (1) to link the VBs to their adverbial modifiers, illustrated in the example of Fig. 5, and (2) to partially compensate for mistakes made by phrase structure parsers, which are typically only ≈90% accurate in phrase detection [6, 33]. Dependencies between the constituent words of the same NP are ignored (L.13), except for the adjectival modifier (amod) dependency (L.16-18), which is used by rule D3 of Table 1. When the algorithm of Fig. 8 is executed over our example requirements statement R4, it yields the dependencies shown at the bottom of Fig. 5.

Input: A set P of all (atomic) NPs and VBs in a sentence S
Input: A set DWord of word dependencies in S
Input: A set R of ref_to dependencies for the pronouns in S
Output: A set DSem of semantic-unit dependencies for S

1: DSem ← R; /* Initialize DSem with the results of coref resolution. */
2: for all dep ∈ DWord do
3:   if (there exists some p ∈ P to which dep.source belongs) then
4:     psource ← p;
5:   else
6:     psource ← dep.source;
7:   end if
8:   if (there exists some p ∈ P to which dep.target belongs) then
9:     ptarget ← p;
10:  else
11:    ptarget ← dep.target;
12:  end if
13:  if (psource ≠ ptarget) then
14:    Add to DSem a new dependency with source psource, target ptarget, and type dep.type;
15:  else
16:    if (dep.type is amod) then
17:      Add dep to DSem;
18:    end if
19:  end if
20: end for
21: return DSem

Figure 8: Algorithm for lifting word dependencies to semantic-unit dependencies.
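For concreteness, the algorithm of Fig. 8 can be transcribed into Python roughly as follows (a sketch: dependencies are modeled as (type, source, target) triples, word positions are ignored, and belongs_to is our own simplification of the membership tests on lines 3 and 8):

def belongs_to(word, P):
    """Return the element of P (an NP or VB) containing word, or None."""
    for p in P:
        if word in p.split():
            return p
    return None

def lift_dependencies(P, d_word, ref_to):
    d_sem = list(ref_to)  # L.1: initialize with the coreference results
    for dtype, source, target in d_word:
        p_source = belongs_to(source, P) or source  # L.3-7
        p_target = belongs_to(target, P) or target  # L.8-12
        if p_source != p_target:                    # L.13-14
            d_sem.append((dtype, p_source, p_target))
        elif dtype == "amod":                       # L.16-18: keep amod within
            d_sem.append((dtype, source, target))   # the same NP (rule D3)
    return d_sem                                    # L.21

# Fragment of R4 after Step 1 (stopwords removed, tail words lemmatized):
P = ["simulator", "monitor", "connection", "SNMP manager", "linked device"]
d_word = [("nsubj", "monitor", "simulator"),
          ("prep_to", "connection", "manager"),
          ("amod", "device", "linked")]
ref_to = [("ref_to", "its", "simulator")]
print(lift_dependencies(P, d_word, ref_to))
# [('ref_to', 'its', 'simulator'), ('nsubj', 'monitor', 'simulator'),
#  ('prep_to', 'connection', 'SNMP manager'), ('amod', 'device', 'linked')]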

Table 2: New extraction rules in our approach.

N1. Relative clause modifiers of nouns (rcmod dependency) suggest associations. Example: "The system operator shall display the system configuration, to which the latest warning message belongs." :: Latest Warning Message --belong to--> System Configuration. (Another example for rcmod was given in Fig. 2.)

N2. Verbal clausal complements (ccomp/xcomp dependencies) suggest associations. Example: "The system operator shall be able to initialize the system configuration, and to edit the existing system configuration." :: System Operator --initialize--> System Configuration; System Operator --edit--> Existing System Configuration.

N3. Non-finite verbal modifiers (vmod dependencies) suggest associations. Example: "The simulator shall provide a function to edit the existing system configuration." :: Simulator --provide function to edit--> Existing System Configuration.

These dependencies are used in Step 3 of our approach, described next, for extracting associations, aggregations, and generalizations, and also for linking attributes to concepts.

4.3 Domain Model Construction
The third and final step of our approach is constructing a domain model. This step uses the NPs and VBs identified in Step 1 (after stopword removal and lemmatization), along with the semantic-unit dependencies derived in Step 2. The extraction rules we apply for model construction are: (1) the rules of Table 1, gleaned from the state of the art, (2) three new rules, described and exemplified in Table 2, which we propose in order to exploit dependency types that have not been used for model extraction before, and (3) link paths [2], which we elaborate further later in this section. In Table 3, we show all the model elements that our approach extracts from our example requirements statement R4. In the rest of this section, we outline the main technical factors in our domain model construction process. We organize our discussion under five headings: domain concepts, associations, generalizations, cardinalities, and attributes.

Domain concepts. All the extracted NPs (from Step 1) are initially considered as candidate concepts. If a candidate concept appears as either the source or the target of some dependency (from Step 2), it is marked as a domain concept.


Table 3: Extraction results for R4 of Fig. 4(a).

# | Type of element(s) extracted | Extracted element(s) | Extraction rule(s) triggered
1 | Candidate Concept | Simulator, Connection, SNMP Manager, Linked Device, Device | A1*
2 | Association | Simulator [1] --continuously monitor--> [1] Connection | B1, C3 (for cardinalities)
3 | Association | Simulator [1] --continuously monitor connection to--> [1] SNMP Manager | Link Path Depth 1, C3 (for cardinalities)
4 | Association | Simulator [1] --continuously monitor connection to--> [*] Linked Device | Link Path Depth 1, C2 (for cardinalities)
5 | Aggregation | Connection is part of Simulator | D2† (+ coreference resolution)
6 | Generalization | Linked Device specializes Device | D3

* A1 in this table is an enhanced version of A1 in Table 1, as discussed in Section 4.3.
† As we explain in Section 4.3, in contrast to some existing approaches, we use D2 and D3 (from Table 1) for extracting aggregations and generalizations, respectively, rather than attributes.

Table 4: Different subject types.

Subject Type | Example | Subject
Simple Subject | "The operator shall initialize the system configuration." | operator
Genitive Subject | "The operator of the ground station shall initialize the system configuration." | operator
Passive Subject | "The system configuration shall be initialized by the operator." | operator

If either the source or the target end of a dependency is a pronoun, that end is treated as being the concept to which the pronoun refers. Table 1 lists a total of five rules, A1-A5, for identifying domain concepts. Our approach implements A1, which subsumes A2-A5. We enhance A1 with the following procedure for NPs that have an adjectival modifier (amod dependency), as long as the adjective appears at the beginning of the NP after stopword removal: we remove the adjective from the NP and add the remaining segment of the NP as a domain concept. For example, consider row 1 of Table 3. The concept Device here was derived from Linked Device by removing the adjectival modifier. The relation between Device and Linked Device is established via rule D3, discussed later (see Generalizations).

Associations. The VBs (from Step 1) that have subjects or objects or both give rise to associations. The manifestation of the subject part may vary across sentences. Table 4 lists and exemplifies the subject types that we handle in our approach. Our treatment unifies and generalizes rules B1 and B2 of Table 1. We further implement rule B3, but as we observe in our evaluation (Section 5), this rule is not useful for the requirements in our case studies.
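As an illustration, the following Python sketch (ours; the dependency types follow the Stanford scheme, over the triple representation of the earlier sketches) reduces simple and passive subjects to the same (subject, verb, object) association; genitive subjects reduce to the simple case, since the nsubj dependency already points at the head noun ("operator"):

def associations(deps):
    """Pair each verb's subject with its object into an association."""
    subj, obj = {}, {}
    for dtype, source, target in deps:
        if dtype in ("nsubj", "agent"):       # simple subject / passive "by" phrase
            subj[source] = target
        elif dtype in ("dobj", "nsubjpass"):  # object / passive subject acts as object
            obj[source] = target
    return [(subj[v], v, obj[v]) for v in subj if v in obj]

# "The system configuration shall be initialized by the operator."
passive = [("nsubjpass", "initialize", "system configuration"),
           ("agent", "initialize", "operator")]
print(associations(passive))  # [('operator', 'initialize', 'system configuration')]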

For extracting associations, we further propose three new rules, N1-N3, shown in Table 2. Rule N1 utilizes relative clause modifier (rcmod) dependencies. In the example provided for this rule in Table 2, "system configuration" is modified by a relative clause, "to which the latest warning message belongs". From this, N1 extracts an association between System Configuration and Latest Warning Message.
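A minimal sketch of N1, under the same assumptions as the earlier sketches (the dependencies are given at the level of semantic units, and the lifted relative-clause verb is abbreviated as "belong to"):

def rule_n1(deps):
    """rcmod(noun, verb) plus nsubj(verb, subj) yields subj --verb--> noun."""
    subjects = {s: t for dtype, s, t in deps if dtype == "nsubj"}
    return [(subjects[verb], verb, noun)
            for dtype, noun, verb in deps
            if dtype == "rcmod" and verb in subjects]

# "... the system configuration, to which the latest warning message belongs."
deps = [("rcmod", "system configuration", "belong to"),
        ("nsubj", "belong to", "latest warning message")]
print(rule_n1(deps))
# [('latest warning message', 'belong to', 'system configuration')]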

Rule N2 utilizes verbal clausal complement (ccomp and xcomp) dependencies. In the example given in Table 2, "initialize" and "edit" are clausal complements to "able". Here, the subject, "system operator", is linked to "able", and the two objects, "system configuration" and "existing system configuration", are linked to "initialize" and "edit", respectively. What N2 does here is to infer that "system operator" (conceptually) serves as a subject to "initialize" and "edit", extracting the two associations shown in Table 2.

As for rule N3, the purpose is to utilize non-finite verbal modifier (vmod) dependencies. In the example we show in Table 2, "edit" is a verbal modifier of "function". We use this information to enhance the direct subject-object relation between "simulator" and "function". Specifically, we link the object of the verbal modifier, "existing system configuration", to the subject, "simulator", extracting an association between Simulator and Existing System Configuration.

The associations resulting from our generalization of B1 and B2, explained earlier, and from our new rules, N1 to N3, are all subject to a secondary process aimed at identifying link paths [2]. Intuitively, a link path is a combination of two or more direct links. To illustrate, consider rows 3 and 4 in Table 3. The basis for both of the associations shown in these rows is the direct association in row 2 of the table. The direct association comes from the subject-object relation between NP1 and NP2 in the dependency graph of Fig. 5. The association in row 3 of Table 3 is induced by combining this direct association with the dependency prep_to(NP2, NP3) from the dependency graph. The association of row 4 is the result of combining the direct association with another dependency, prep_to(NP2, NP4). In our approach, we consider all possible ways in which a direct association can be combined with paths of prepositional dependencies (prep_* dependencies in the dependency graph).

For extracting aggregations, which are special associations denoting containment relationships, we use rules B4 and D2 from Table 1. With regard to D2, we point out that a number of previous approaches, e.g., [12, 31, 29], have used this rule for identifying attributes. Larman [19] notes the difficulty of choosing between aggregations and attributes, recommending that any entity that represents in the real world something other than a number or a string of text should be treated as a domain concept, rather than an attribute. Following this recommendation, we elect to use D2 for extracting aggregations. Ultimately, the user will need to decide which representation, an aggregation or an attribute, is most suitable on a case-by-case basis.

An important remark about D2 is that this rule can be combined with coreference resolution, which, to our knowledge, has not been done before for model extraction. An example of this combination is given in row 5 of Table 3. Specifically, the aggregation in this row is induced by the possessive pronoun "its", which is linked to "simulator" via coreference resolution (see the ref_to dependency in Fig. 5).
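A minimal sketch of this combination (D2 plus coreference resolution), again over (type, source, target) triples:

def aggregations(deps):
    """poss(head, owner) yields an aggregation; pronoun owners are
    first resolved through the ref_to dependencies."""
    ref = {s: t for dtype, s, t in deps if dtype == "ref_to"}
    return [(ref.get(owner, owner), "has part", head)
            for dtype, head, owner in deps if dtype == "poss"]

# "... monitor its connection ...": poss(connection, its), ref_to(its, simulator)
deps = [("poss", "connection", "its"), ("ref_to", "its", "simulator")]
print(aggregations(deps))  # [('simulator', 'has part', 'connection')]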

Generalizations. From our experience, we surmise that generalizations are typically left tacit in NL requirements and are thus hard to identify automatically. The main rule targeted at generalizations is B5 in Table 1. This rule, as evidenced by our evaluation (Section 5), has limited usefulness when no conscious attempt has been made by the requirements authors to use the patterns in the rule.

We nevertheless observe that certain generalizations manifest through adjectival modifiers. These generalizations can be detected by rule D3 in Table 1. For example, row 6 of Table 3 is extracted by D3. Like rule D2 discussed earlier, D3 has been used previously for attributes. However, without user intervention, identifying attributes using D3 poses a challenge. To illustrate, consider the example we gave in Table 1 for D3. There, the user would need to provide the attribute name, size. For simple cases, e.g., sizes, colors, shapes, and quantities, one can come up with a taxonomy of adjective types and use the type names as attribute names [29]. We observed, though, that generic adjective types are unlikely to be helpful for real and complex requirements. We therefore elect to use D3 for extracting generalizations. Similar to the argument we gave for D2, we leave it to the user to decide when an attribute is more suitable and to provide an attribute name when this is the case.

Cardinalities. We use rules C1 to C4 of Table 1 for determining the cardinalities of associations. These rules are based on the quantifiers appearing alongside the terms that represent domain concepts, and on the singular versus plural usage of these terms. For example, the cardinalities shown in rows 2 to 4 of Table 3 were determined using these rules.
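An illustrative sketch of these heuristics, with toy morphology standing in for the POS tags and quantifier checks the real rules rely on:

def end_cardinality(np):
    """Guess the cardinality contributed by one end of an association."""
    tokens = np.lower().split()
    for t in tokens:
        if t.isdigit():
            return t          # C4: an explicit number
    if tokens and tokens[-1].endswith("s"):
        return "*"            # plural end (C1/C2)
    return "1"                # singular end (C3)

def cardinalities(source_np, target_np):
    return end_cardinality(source_np), end_cardinality(target_np)

print(cardinalities("all arriving airplanes", "the control tower"))  # ('*', '1')  C1
print(cardinalities("the student", "the exams"))                     # ('1', '*')  C2
print(cardinalities("the student", "3 exams"))                       # ('1', '3')  C4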

Attributes. We use rules D1 and D4 of Table 1 for extracting attributes. As we discussed above, we have chosen to use rules D2 and D3 of Table 1 for extracting aggregations and generalizations, respectively. With regard to rule D4, we note that one cannot exactly pinpoint the name of the attribute using this rule. Nevertheless, unlike rule D3, which is not applicable without user intervention or an adjective taxonomy, D4 can reasonably guess the attribute name. For instance, if we apply our implementation of D4 to the requirement exemplifying this rule in Table 1, we obtain arrive (instead of arrival time) as the attribute name.

5. EMPIRICAL EVALUATION
In this section, we evaluate our approach by addressing the following Research Questions (RQs):

RQ1. How frequently are different extraction rules triggered? One cannot expect large gains from rules that are triggered only rarely. A rule being triggered frequently is thus an important prerequisite for the rule being useful. RQ1 aims to measure the number of times different extraction rules are triggered over industrial requirements.

RQ2. How useful is our approach? The usefulness of our approach ultimately depends on whether practitioners find the approach helpful in real settings. RQ2 aims to assess, through a user study, the correctness and relevance of the results produced by our approach.

RQ3. Does our approach run in practical time? Requirements documents may be large. One should thus be able to perform model extraction quickly. RQ3 aims to study whether our approach has a practical running time.

5.1 Implementation
For syntactic parsing and coreference resolution, we use the Stanford Parser [23]. For lemmatization and stopword removal, we use existing modules in the GATE NLP Workbench [14]. We implemented the model extraction rules using GATE's scripting language, JAPE, and GATE's embedded Java environment. The extracted class diagrams are represented using logical predicates (Prolog-style facts). Our implementation is approximately 3,500 lines of code, excluding comments and third-party libraries. It is available at: https://bitbucket.org/carora03/redomex.
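As an illustration of this fact-based representation (a Python sketch; the predicate names below are hypothetical, not the tool's actual schema):

def to_facts(model):
    """Serialize an extracted class diagram as Prolog-style facts."""
    facts = [f"concept('{c}')." for c in model["concepts"]]
    facts += [f"association('{s}', '{label}', '{t}', '{cs}', '{ct}')."
              for s, label, t, cs, ct in model["associations"]]
    facts += [f"generalization('{child}', '{parent}')."
              for child, parent in model["generalizations"]]
    return "\n".join(facts)

model = {"concepts": ["Simulator", "Connection"],
         "associations": [("Simulator", "continuously monitor", "Connection", "1", "1")],
         "generalizations": [("Linked Device", "Device")]}
print(to_facts(model))
# concept('Simulator').
# concept('Connection').
# association('Simulator', 'continuously monitor', 'Connection', '1', '1').
# generalization('Linked Device', 'Device').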

5.2 Results and Discussion
Our evaluation is based on four industrial requirements documents, all of which are collections of "shall" requirements written in English. Table 5 briefly describes these documents, denoted Cases A-D, and summarizes their main characteristics. We use all four documents for RQ1 and RQ3. For RQ2, we use selected requirements from Case A.

Table 5: Description of case study documents.

Case | Description | # of reqs. | Template used? | % of reqs. complying with template
Case A | Simulator application for satellite systems | 158 | No | N.A.
Case B | Satellite ground station control system | 380 | Yes (Rupp's) | 89%
Case C | Satellite data dissemination network | 138 | No | N.A.
Case D | Safety evidence information management system for safety certification | 110 | Yes (Rupp's) | 64%

Cases A-C concern software systems in the satellite domain. These three documents were written by different teams in different projects. In Cases B and D, the requirements authors had made an effort to comply with Rupp's template [25], which organizes the structure of requirements sentences into certain pre-defined slots. The number of template-compliant requirements in these two documents is presented in Table 5 as a percentage of the total number of requirements. These percentages were computed in our previous work [5], where Cases B and D were also used as case studies. Our motivation to use template requirements in our evaluation of model extraction is to investigate whether restricting the structure of requirements would impact the applicability of generic extraction rules, which assume no particular structure for the requirements.

RQ1. Table 6 presents the results of executing our model extractor on Cases A-D. Specifically, the table shows the number of times each of the rules employed in our approach has been triggered over our case study documents. The rule IDs correspond to those in Tables 1 and 2; LP denotes link paths. The table further shows the number of extracted elements for each case study, organized by element type.

As indicated by Table 6, rules B1, C1 to C4, D2, D3, and LP are the most frequently triggered. B1 is a generic rule that applies to all transitive verbs. D2 and D3 address genitive cases and the use of adjectives in noun phrases. These constructs are common in English, thus explaining the frequent application of D2 and D3. We note that, as explained in Section 4.3, we use D2 and D3 for identifying aggregations and generalizations, respectively.

Link paths, as stated earlier, identify indirect associations. Specifically, link paths build upon the direct associations identified by B1, B2, and N1 to N3. To illustrate, consider the example in Fig. 3. Here, B1 retrieves the direct association in Fig. 3(b), and link paths retrieve the indirect ones in Figs. 3(c),(d). In this example, we count B1 as being triggered once and link paths as being triggered twice.

Rules C1 to C4 apply to all associations, except aggregations. More precisely, C1 to C4 are considered only alongside B1 to B3, N1 to N3, and link paths. For instance, C2 is triggered once and C3 twice for the associations of Figs. 3(b-d).

Our results in Table 6 indicate that B3, B5, and D1 were triggered either rarely or not at all in our case study documents. These rules are based on fixed textual patterns. While the patterns underlying these rules seem intuitive, our results suggest that, unless the requirements authors have been trained a priori to use the patterns (not the case for the documents in our evaluation), such patterns are unlikely to contribute significantly to model extraction.

With regard to link paths and our proposed rules, N1 to N3, in Table 2, we make the following observations: Link paths extracted 47% of the (non-aggregation) associations in Case A, 45% in Case B, 50% in Case C, and 46% in Case D. And rules N1 to N3 extracted 23% of the associations in Case A, 24% in Case B, 29% in Case C, and 37% in Case D. These percentages indicate that link paths and our new rules contribute significantly to model extraction.

Table 6: Number of times extraction rules were triggered and number of extracted elements (per document). [For each of Cases A-D, the table reports the number of times each rule (A1*, B1, B2, B3, B4, B5, C1-4, D1, D2, D3, D4, N1, N2, N3, and LP) was triggered, along with the number of extracted concepts, (regular) associations, aggregations, generalizations, and attributes†.]

* A1 in this table is an enhanced version of A1 in Table 1, as discussed in Section 4.3. A1 subsumes A2 to A5 (of Table 1), as noted in the same section.
† The small number of attributes is explained by our decision to use D2 and D3 (resp.) for extracting aggregations and generalizations instead of attributes, as noted in Section 4.3.

Assessing the quality of the extracted results is the subject of RQ2.

With regard to whether the use of templates has an impact on the applicability of the generic rules considered in this paper, the results in Table 6 do not suggest an advantage or a disadvantage for templates, as far as the frequency of the application of the extraction rules is concerned. We therefore anticipate that the generic rules considered in this paper should remain useful for restricted requirements too. Placing restrictions on requirements may nevertheless provide opportunities for developing additional extraction rules [32]. Such rules would naturally be tied to the specific restrictions enforced and are thus outside the scope of this paper.

RQ2. RQ2 aims at assessing practitioners' perceptions about the correctness and relevance of the results produced by our approach. Our basis for answering RQ2 is an interview survey conducted with the lead requirements analyst in Case A. Specifically, we selected at random 50 requirements statements (out of a total of 158) in Case A and solicited the expert's feedback about the model elements that were automatically extracted from the selected requirements. Choosing Case A was dictated by the criteria we had to meet: to conduct the interview, we needed expert(s) who had UML domain modeling experience and who were further fully familiar with the requirements. This restricted our choice to Cases A and B. Our interview would further require a significant time commitment from the expert(s). Making such a commitment was justified by the expert for Case A only, due to the project still being ongoing.

We collected the expert's feedback using the questionnaire shown in Fig. 9. This questionnaire has three questions, Q1 to Q3, all oriented around the notion of "relation". We define relations to include (regular) associations, aggregations, generalizations, and attributes. The rationale for treating attributes as relations is the conceptual link that exists between an attribute and the concept to which the attribute belongs. The notion of relation was clearly conveyed to the expert using a series of examples prior to the interview. Our questionnaire does not include questions dedicated to domain concepts, since, as we explain below, the correctness and relevance of the domain concepts at either end of a given relation are considered while that relation is being examined. During the interview, the expert was asked to evaluate, through Q1 and Q2 in the questionnaire, the individual relations extracted from a given requirements statement. The expert was further asked to verbalize his rationale for the responses he gave to these questions. Once all the relations extracted from a requirements statement had been examined, the expert was asked, through Q3, whether there were any other relations implied by that requirements statement which were missing from the extracted results.

The relations extracted from each requirements statement were presented to the expert in the same visual format as depicted by the third column of Table 3. The extraction rules involved were not shown to the expert. To avoid decontextualizing the relations, we did not present the extracted relations to the expert in isolation. Instead, a given requirements statement and all the relations extracted from it were visible to the expert on a single sheet as we traversed the relations one by one, asking Q1 and Q2 for each of them.

Q1 (asked per relation). Is this relation correct?
❑ Yes  ❑ Partially  ❑ No

Q2 (asked per relation). Should this relation be in the domain model?
❑ Yes  ❑ Maybe  ❑ No

Q3 (asked per requirements statement). Are there any other relations that this requirements statement implies? If yes, please elaborate.

Figure 9: Interview survey questionnaire.

Q1 addresses correctness. A relation is deemed correct if the expert can infer the relation by reading the underlying requirements statement. We instructed the expert to respond to Q1 by "Yes" for a given relation if all the following criteria were met: (1) the concept (or attribute) at each end of the relation is correct, (2) the type assigned to the extracted relation (e.g., association or generalization) is correct, and (3) if the relation represents an association, the label and the cardinalities of the association are correct. The expert was instructed to answer by "Partially" when he saw some inaccuracy with respect to the correctness criteria above, but found the inaccuracy to be minor and not compromising the meaningfulness of the relation; otherwise, the expert was asked to respond by "No".

The correctness of a relation per se does not automatically warrant its inclusion in the domain model. Among other reasons, the relation might be too obvious or too detailed for the domain model. Q2 addresses relevance, i.e., whether a relation is appropriate for inclusion in the domain model. The expert was asked Q2 for a given relation only upon a "Yes" or "Partially" response to Q1. If the expert's answer to Q1 was "No", the answer to Q2 was an automatic "No". If the expert had answered Q1 by "Partially", we asked him to answer Q2 assuming that the inaccuracy in the relation had already been resolved.
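The gating between Q1 and Q2 can be summarized as a small decision procedure. The following is a minimal sketch of ours (not the authors' tooling); the function names and the callback are hypothetical:

def evaluate_relation(q1_answer: str, ask_q2) -> tuple:
    """Return the (Q1, Q2) verdicts for one extracted relation.

    q1_answer: "Yes", "Partially", or "No" (the expert's correctness verdict).
    ask_q2:    callback that poses Q2 to the expert and returns
               "Yes", "Maybe", or "No" (the relevance verdict).
    """
    if q1_answer == "No":
        return q1_answer, "No"  # a "No" to Q1 makes Q2 an automatic "No"
    # For "Partially", the expert answers Q2 assuming the minor
    # inaccuracy has already been resolved.
    return q1_answer, ask_q2()

# Example: evaluate_relation("Partially", lambda: "Yes") -> ("Partially", "Yes")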

Finally, Q3 addresses missing relations. A relation is missing if it is identifiable by a domain expert upon manually examining a given requirements statement R, but absent from the relations that have been automatically extracted from R. A missing relation indicates one or a combination of the following situations: (1) information that is not extracted due to technical limitations in automation, (2) information that is tacit in a requirements statement and thus inferable only by a human expert, and (3) information that is implied by the extracted relations, but which the expert decides to represent differently, i.e., using modeling constructs different from the extracted relations. The expert answered Q3 after having reviewed all the relations extracted from a given requirements statement.

Our interview was split into three sessions, with a period of at least one week in between the sessions. The duration of each session was limited to a maximum of two hours to avoid fatigue effects. At the beginning of each session, we explained and exemplified to the expert the interview procedure, including the questionnaire.


Table 7: Correctness and relevance results obtained from our expert interview, organized by extraction rules.

(Relation)        Q1 (Correctness)        Q2 (Relevance)
Extraction Rule    Y    P    N    C%       Y    M    N    R%
B1                18   11    0   100%     12    0   17    41%
B2                 3    1    0   100%      1    0    3    25%
B4                13    0    4    77%      0    0   17     0%
D2                16    4    2    91%      8    0   14    36%
D3                17    0    6    74%      7    4   12    48%
D4                 0    0    2     0%      0    0    2     0%
N1                 2    4    1    86%      3    0    4    43%
N2                10   10    0   100%      7    0   13    35%
N3                 5    9    1    93%      7    1    7    53%
Link Paths        42   26    6    92%     26    0   48    35%

Legend — Q1 (Correctness): Y = Yes, P = Partially, N = No, C% = Correctness (%). Q2 (Relevance): Y = Yes, M = Maybe, N = No, R% = Relevance (%).

Figure 10: (a) Raw and (b) bootstrapping results for Q1 and Q2. [Raw results (a): Q1 — Yes: 126 (59.2%), Partially: 65 (30.5%), No: 22 (10.3%); Q2 — Yes: 71 (33.3%), Maybe: 5 (2.4%), No: 137 (64.3%). Panel (b): box plots of the bootstrapped correctness and relevance percentages; the box plots themselves are not reproducible in text, and the resulting confidence intervals are reported below.]
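The C% and R% columns of Table 7 follow directly from the raw counts: C% = (Y + P) / (Y + P + N) and R% = (Y + M) / (Y + M + N). The following small sketch of ours (not the authors' tooling) recomputes these columns from the counts in Table 7 and reproduces the reported percentages up to rounding:

table7 = {
    #  rule:        (Q1: Y, P, N),  (Q2: Y, M, N)
    "B1":         ((18, 11, 0), (12, 0, 17)),
    "B2":         ((3, 1, 0),   (1, 0, 3)),
    "B4":         ((13, 0, 4),  (0, 0, 17)),
    "D2":         ((16, 4, 2),  (8, 0, 14)),
    "D3":         ((17, 0, 6),  (7, 4, 12)),
    "D4":         ((0, 0, 2),   (0, 0, 2)),
    "N1":         ((2, 4, 1),   (3, 0, 4)),
    "N2":         ((10, 10, 0), (7, 0, 13)),
    "N3":         ((5, 9, 1),   (7, 1, 7)),
    "Link Paths": ((42, 26, 6), (26, 0, 48)),
}

for rule, ((y1, p, n1), (y2, m, n2)) in table7.items():
    # Correctness counts Yes and Partially; relevance counts Yes and Maybe.
    correctness = 100 * (y1 + p) / (y1 + p + n1)
    relevance = 100 * (y2 + m) / (y2 + m + n2)
    print(f"{rule:<10}  C% = {correctness:3.0f}  R% = {relevance:3.0f}")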


Our approach extracted a total of 213 relations from the 50 randomly-selected requirements of Case A. All these 213 relations were examined by the expert. Fig. 10(a) shows the interview results for Q1 and Q2. As shown by the figure: first, ≈90% of the relations were deemed correct or partially correct, and the remaining 10% incorrect; second, ≈36% of the relations were deemed relevant or maybe relevant for inclusion in the domain model. The remaining 64% of the relations were deemed not relevant (inclusive of the 10% of the relations that were deemed incorrect).

Due to the expert's limited availability, we covered only ≈32% (50/158) of the requirements statements in Case A. The 213 relations extracted from these requirements constitute ≈31% (213/678) of the total number of relations obtained from Case A by our model extractor. To provide a measure of correctness and relevance which further accounts for the uncertainty that results from our random sampling of the requirements statements, we provide confidence intervals for correctness and relevance using a statistical technique known as bootstrapping [11]. Specifically, we built 1000 resamples with replacement of the 50 requirements that were examined in our interview. We then computed as a percentage the correctness and relevance of the relations extracted from each resample. For a given resample, the correctness percentage is the ratio of correct and partially correct relations over the total number of relations. The relevance percentage is the ratio of relevant and maybe relevant relations over the total number of relations. Fig. 10(b) shows, using box plots, the distributions of the correctness and relevance percentages for the 1000 resamples. These results yield a 95% confidence interval of 83%–96% for correctness and a 95% confidence interval of 29%–43% for relevance.
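A minimal sketch of this bootstrapping procedure, under the assumption that each examined requirement is represented as a list of (Q1, Q2) verdict pairs for its extracted relations (the variable and function names are ours, not the authors'):

import random

def percentage(relations, accepted, index):
    """% of relations whose verdict at `index` (0 = Q1, 1 = Q2) is in `accepted`."""
    return 100 * sum(r[index] in accepted for r in relations) / len(relations)

def bootstrap_ci(requirements, accepted, index, resamples=1000, alpha=0.05):
    """95% CI (for alpha=0.05) via resampling the requirements with replacement."""
    stats = []
    for _ in range(resamples):
        sample = random.choices(requirements, k=len(requirements))
        # Pool the relations of all requirements in this resample.
        relations = [rel for req in sample for rel in req]
        stats.append(percentage(relations, accepted, index))
    stats.sort()
    lo = stats[int(resamples * alpha / 2)]
    hi = stats[int(resamples * (1 - alpha / 2)) - 1]
    return lo, hi

# Correctness counts "Yes"/"Partially" on Q1; relevance counts "Yes"/"Maybe" on Q2:
# correctness_ci = bootstrap_ci(requirements, {"Yes", "Partially"}, index=0)
# relevance_ci   = bootstrap_ci(requirements, {"Yes", "Maybe"}, index=1)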

The practical implication of the above findings is that, when reviewing the extracted relations, analysts will have to filter 57%–71% of the relations (the complement of the 29%–43% relevance interval), despite the large majority of them being correct or partially correct. While we anticipate that filtering the unwanted relations would be more cost-effective than forgoing automation and manually extracting the desired relations from scratch, the required level of filtering needs to be reduced. As we discuss in Section 6, improving relevance and minimizing such filtering is an important direction for future work.

In Table 7, we provide a breakdown of our interview survey results, organized according to the rules that were triggered over our requirements sample and showing the correctness and relevance percentages for each rule. As seen from these percentages, all triggered rules except B4 and D4 proved useful in our study. In particular, the results of Table 7 indicate that our proposed extensions, i.e., rules N1 to N3 and link paths, are useful in practice.

An observation emerging from the relevance percentages (R%) in Table 7 is that relevance is low across all the extraction rules and not only for our proposed extensions (N1 to N3 and link paths). This implies that other existing rule-based approaches for domain model extraction are also susceptible to the relevance challenge. This observation further underscores the need for addressing the relevance challenge in future work.

As noted earlier, we asked the expert to verbalize his rationale for his responses to Q1 and Q2. This rationale contained a wealth of information as to what made a relation incorrect or only partially correct, and what made a relation not relevant. In Table 8, we provide a classification of the reasons the expert gave for partial correctness and for incorrectness (Q1) and for non-relevance (Q2). The number of relations falling under each category in the classification is provided in the column labeled "Count". For each category, we provide an example of a problematic relation and, where applicable, the relation desired by the expert. The table is self-explanatory. The only remark to be made is that the reasons given by the expert for partial correctness and for incorrectness have one area of overlap, namely wrong relation type, as seen from rows 3 and 5 of Table 8. For instance, in the example of row 3, an aggregation was extracted, but the desired relation was an attribute. The expert viewed this inaccuracy as minor. In contrast, in the example of row 5, the expert found the extracted aggregation conceptually wrong, since the desired relation was a generalization.

In response to Q3 (from the questionnaire of Fig. 9), the expert identified 13 missing relations. In six out of these 13 cases, we could automatically extract the missing relation from other requirements statements in our random sample, leaving seven relations that were genuinely missed. From the column chart provided for relevance in Fig. 10(a), we see that we have a total of 76 (71+5) relevant and maybe relevant relations that are automatically extracted. This means that our approach automatically retrieved 76/(76+7) ≈ 92% of the relevant relations.
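Written as a recall-style ratio (our notation, not the paper's), the denominator combines the extracted relevant relations with the seven that remained missing:

\[
  \textit{retrieved relevant} \;=\; \frac{76}{76 + (13 - 6)} \;=\; \frac{76}{83} \;\approx\; 92\%.
\]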

Using bootstrapping, similar to that done for Q1 and Q2 in Fig. 10(b), we obtain the percentage distribution of Fig. 11 for the retrieved relevant relations. From this distribution, we obtain a 95% confidence interval of 82%–100% for the percentage of relevant relations that are automatically extracted by our approach.

Figure 11: % of relevant relations retrieved. [Box plot not reproducible in text; axis range 80%–100%.]


Table 8: Reasons for inaccuracies and non-relevance.

Q1 — Partially Correct:

#1. Imprecise label for relation or concept (Count: 32)
Example: "The simulator shall support the generation of error messages and their storage to the database."
Extracted: Simulator --(support generation of)--> Message
Desired: Simulator --(support generation of)--> Error Message

#2. Wrong cardinality (Count: 29)
Example: "The simulator shall connect only to those networks for which IP addresses have been specified."
Extracted: IP Address [1] --(specified for)--> [1] Network
Desired: IP Address [*] --(specified for)--> [1] Network

#3. Wrong relation type (Count: 4)
Example: "The system operator shall update the status of the network connection in the database."
Extracted: Status as a part (aggregation) of Network Connection
Desired: status as an attribute of Network Connection

Q1 — Incorrect:

#4. Non-existent relation detected (Count: 18)
Example: "The simulator shall support the generation of error messages and their storage to the database."
Extracted: Simulator --(support generation of)--> Storage
Desired: Simulator --(support generation of)--> Error Message

#5. Wrong relation type (Count: 4)
Example: "The simulator shall support the simulation of ground stations including Ground-Station A and Ground-Station B."
Extracted: Ground-Station A and Ground-Station B as parts (aggregation) of Ground Station
Desired: Ground-Station A and Ground-Station B as specializations (generalization) of Ground Station

Q2 — Maybe Relevant:

#6. Future contingency (Count: 5)
Example: "The simulator shall maintain an internal repository with system variables and their values."
Extracted: Internal Repository as a specialization of Repository
Expert feedback: All repositories are currently internal. This may change, in which case this relation will be relevant.

Q2 — Not Relevant:

#7. Obvious / common knowledge (Count: 21)
Example: "The simulator shall maintain an internal repository with system variables and their values."
Extracted: a relation between Value and System Variable

#8. Relation too detailed (Count: 88)
Example: "The simulator shall send log messages to the database via the monitoring interface."
Extracted: Simulator --(send log message to database via)--> Monitoring Interface
Desired✪: Simulator --(send log message to)--> Database

#9. Incomplete constraint (Count: 6)
Example: "The simulator shall send log messages to the database via the monitoring interface."
Extracted: Simulator --(send)--> Log Message
Desired✪: Simulator --(send log message to)--> Database

✪ The "desired" relation is indeed also extracted by our model extractor via a different rule. The goal here is to illustrate what the expert considered to be not relevant.


RQ3. The execution times for our model extraction approach are in the order of minutes over our case study documents (a maximum of ≈4 min for Case B). Given the small execution times observed, we expect our approach to scale to larger requirements documents. Execution times were measured on a laptop with a 2.3 GHz CPU and 8 GB of memory.

5.3 Limitations and Validity Considerations

Internal, construct, and external validity are the validity factors most pertinent to our empirical evaluation. With regard to internal validity, we note that our interview considered the correctness and relevance of extracted relations only in the context of individual requirements statements. We did not present to the expert the entire extracted model during the interview. This raises the possibility that the expert might have made different decisions, e.g., regarding the level of abstraction of the domain model, had he been presented with the entire extracted model. We chose to base our evaluation on individual requirements statements, because using the entire extracted model would have introduced confounding factors, primarily due to layout and information overload issues. Addressing these issues, while important, is outside the scope of our current evaluation, whose primary goal was to develop insights about the effectiveness of NLP for domain model extraction. To ascertain the quality of the feedback obtained from the expert, we covered a reasonably large number of requirements (50 requirements, representing nearly a third of Case A) in our interview, and cross-checked the expert's responses for consistency based on the similarities and analogies that existed between the different relations examined.

With regard to construct validity, we note that our evaluation did not include metrics for measuring the amount of tacit expert knowledge which is necessary for building a domain model, but which is absent from the textual content of the requirements. This limitation does not pose a threat to construct validity, but is important to point out in order to clarify the scope of our current evaluation. Building insights about the amount of tacit information that needs to be manually added to the domain model, and that is inherently impossible to obtain automatically, requires further studies.

Finally, with regard to external validity, while our evaluation was performed in a representative industrial setting, additional case studies will be essential in the future.

6. CONCLUSION

We presented an automated approach based on Natural Language Processing for extracting domain models from unrestricted requirements. The main technical contribution of our approach is in extending the existing set of model extraction rules. We provided an evaluation of our approach, contributing insights to the as yet limited knowledge about the effectiveness of model extraction in industrial settings.

A key finding from our evaluation is that a sizable fraction of automatically-extracted relations are not relevant to the domain model, although most of these relations are meaningful. Improving relevance is a challenge that needs to be tackled in future work. In particular, additional studies are necessary to examine whether our observations about relevance are replicable. If so, technical improvements need to be made for increasing relevance. To this end, a key factor to consider is that what is relevant and what is not ultimately depends on the context, e.g., the intended level of abstraction, and on the working assumptions, e.g., what is considered to be in the scope of a system and what is not. This information is often tacit and not automatically inferable. Increasing relevance therefore requires a human-in-the-loop strategy, enabling experts to explicate their tacit knowledge. We believe that such a strategy would work best if it is incremental, meaning that the experts can provide their input in a series of steps and in tandem with reviewing the automatically-extracted results. In this way, once a piece of tacit knowledge has been made explicit, it can be used not only for resolving incompleteness in the domain model but also for guiding, e.g., through machine learning, the future actions of the model extractor.

Acknowledgment. We gratefully acknowledge funding from SES and FNR under grants FNR/P10/03 and FNR-6911386.


7. REFERENCES

[1] R. J. Abbott. Program design by informal English descriptions. Communications of the ACM, 26(11), 1983.
[2] A. Akbik and J. Broß. Wanderlust: Extracting semantic relations from natural language text using dependency grammar patterns. In Workshop on Semantic Search at the 18th International World Wide Web Conference (WWW'09), 2009.
[3] S. Ambler. The Object Primer: Agile Model-Driven Development with UML 2.0. Cambridge University Press, 2004.
[4] V. Ambriola and V. Gervasi. On the systematic analysis of natural language requirements with CIRCE. Automated Software Engineering, 13(1), 2006.
[5] C. Arora, M. Sabetzadeh, L. Briand, and F. Zimmer. Automated checking of conformance to requirements templates using natural language processing. IEEE Transactions on Software Engineering, 41(10), 2015.
[6] G. Attardi and F. Dell'Orletta. Chunking and dependency parsing. In Workshop on Partial Parsing: Between Chunking and Deep Parsing at the 6th International Conference on Language Resources and Evaluation (LREC'08), 2008.
[7] W. Ben Abdessalem Karaa, Z. Ben Azzouz, A. Singh, N. Dey, A. S. Ashour, and H. Ben Ghazala. Automatic builder of class diagram (ABCD): an application of UML generation from functional requirements. Software: Practice and Experience, 2015.
[8] P. P. Chen. English sentence structure and entity-relationship diagrams. Information Sciences, 29(2), 1983.
[9] M. C. De Marneffe and C. D. Manning. Stanford typed dependencies manual. Technical report, Stanford University, 2008.
[10] D. K. Deeptimahanti and R. Sanyal. Semi-automatic generation of UML models from natural language requirements. In 4th India Software Engineering Conference (ISEC'11), 2011.
[11] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. CRC Press, 1994.
[12] M. Elbendak, P. Vickers, and N. Rossiter. Parsed use case descriptions as a basis for object-oriented class model generation. Journal of Systems and Software, 84(7), 2011.
[13] A. Fader, S. Soderland, and O. Etzioni. Identifying relations for open information extraction. In Conference on Empirical Methods in Natural Language Processing, 2011.
[14] GATE NLP Workbench. http://gate.ac.uk/.
[15] H. Harmain and R. Gaizauskas. CM-Builder: A natural language-based CASE tool for object-oriented analysis. Automated Software Engineering, 10(2), 2003.
[16] J. Holt, S. Perry, and M. Brownsword. Model-Based Requirements Engineering. IET, 2011.
[17] M. Ibrahim and R. Ahmad. Class diagram extraction from textual requirements using natural language processing (NLP) techniques. In 2nd International Conference on Computer Research and Development (ICCRD'10), 2010.
[18] N. Indurkhya and F. J. Damerau. Handbook of Natural Language Processing. CRC Press, 2010.
[19] C. Larman. Applying UML and Patterns. Prentice Hall, 2004.
[20] D. Liu, K. Subramaniam, A. Eberlein, and B. H. Far. Natural language requirements analysis and class model generation using UCDA. In Innovations in Applied Artificial Intelligence. Springer, 2004.
[21] D. Liu, K. Subramaniam, B. H. Far, and A. Eberlein. Automating transition from use-cases to class model. In Canadian Conference on Electrical and Computer Engineering (CCECE'03), 2003.
[22] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[23] M. Marneffe, B. Maccartney, and C. Manning. Generating typed dependency parses from phrase structure parses. In 5th International Conference on Language Resources and Evaluation (LREC'06), 2006.
[24] L. Mich. NL-OOPS: from natural language to object oriented requirements using the natural language processing system LOLITA. Natural Language Engineering, 2(2), 1996.
[25] K. Pohl and C. Rupp. Requirements Engineering Fundamentals. Rocky Nook, 2011.
[26] D. Popescu, S. Rugaber, N. Medvidovic, and D. M. Berry. Innovations for Requirement Analysis: From Stakeholders' Needs to Formal Designs, chapter Reducing Ambiguities in Requirements Specifications via Automatically Created Object-Oriented Models. Springer, 2008.
[27] K. Schneider. Experience and Knowledge Management in Software Engineering, chapter Structuring Knowledge for Reuse. Springer, 2009.
[28] N. A. Smith. Linguistic Structure Prediction. Synthesis Lectures on Human Language Technologies. Morgan and Claypool, 2011.
[29] V. B. Vidya Sagar and S. Abirami. Conceptual modeling of natural language functional requirements. Journal of Systems and Software, 88, 2014.
[30] Z. Yang, H. Lin, and Y. Li. BioPPISVMExtractor: A protein–protein interaction extractor for biomedical literature using SVM and rich feature sets. Journal of Biomedical Informatics, 43(1), 2010.
[31] T. Yue, L. Briand, and Y. Labiche. A systematic review of transformation approaches between user requirements and analysis models. Requirements Engineering, 16(2), 2011.
[32] T. Yue, L. C. Briand, and Y. Labiche. aToucan: An automated framework to derive UML analysis models from use case models. ACM Transactions on Software Engineering and Methodology, 24(3), 2015.
[33] M. Zhu, Y. Zhang, W. Chen, M. Zhang, and J. Zhu. Fast and accurate shift-reduce constituent parsing. In 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), 2013.