Extracting Domain Models from Natural-Language Requirements: Approach and Industrial Evaluation
Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, Frank Zimmer
Interdisciplinary Centre for Security, Reliability and Trust Luxembourg
October 6, 2016
Domain Models
A domain model is a visual representation of conceptual entities or real-world objects in a domain of interest.
[Figure: example domain model from the satellite domain, with classes Satellite, Ground Station, S&T Station, Feeder Link Ground Station, Data Ground Station, and Satellite Control Centre; an association "transfers user requests to" (multiplicity 1 to 1); an attribute "location"]
Elements of a domain model:
• Concepts
• Associations
• Aggregations
• Generalizations
• Attributes
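The elements listed above can be captured in a small data structure. The Python sketch below is an illustrative assumption: the names `Concept`, `Relation`, and `DomainModel` are hypothetical. The usage lines borrow the library/book example from the later slides rather than the satellite figure, whose exact links cannot be recovered.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A domain concept (class) together with its attribute names."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Relation:
    """A typed link between two concepts."""
    kind: str                # "association", "aggregation" or "generalization"
    source: str
    target: str
    label: str = ""          # verb phrase for associations, e.g. "maintain"
    multiplicity: Optional[tuple[str, str]] = None  # e.g. ("1", "*"); None if unknown


@dataclass
class DomainModel:
    concepts: dict[str, Concept] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_concept(self, name: str) -> Concept:
        # Repeated mentions of the same name map to a single concept.
        return self.concepts.setdefault(name, Concept(name))

    def add_relation(self, relation: Relation) -> None:
        # Both ends of a relation are registered as concepts.
        self.add_concept(relation.source)
        self.add_concept(relation.target)
        self.relations.append(relation)


model = DomainModel()
model.add_relation(Relation("aggregation", "Library", "Book"))
model.add_concept("Book").attributes.append("title")
print(sorted(model.concepts))  # ['Book', 'Library']
```

Keeping relations as plain records makes it easy for each extraction rule to contribute to one shared model.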
Context
[Figure (Ideal): requirements analysts specify requirements in an NL requirements document and build a domain model from it (placeholder classes A to D connected by relations, e.g., 1 to *)]
[Figure (Practice): the same elements: requirements analysts, Specify Requirements, NL requirements document, Build Domain Model, domain model]
Problem Definition
• Manually building domain models is laborious
• Automated support is required for building domain models
State of the Art
• Multiple approaches exist for extracting domain models, or similar variants, from requirements using extraction rules
• The majority assume a specific structure, e.g., restricted NL
• Extracted relations are (mostly) direct and at the level of individual words
• Few empirical insights on real requirements
Concepts
The simulator shall maintain the scheduled sessions, the active session and also the list of sessions that have been already handled.
• All subjects in the requirements are concepts
• All recurring nouns in the requirements are concepts
[Figure labels: Noun Phrases → Filtering]
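The two concept rules can be sketched over pre-chunked noun phrases. This is a minimal illustration, not the authors' implementation. The `NounPhrase` annotations for the example requirement are written by hand (no parser is called), the recurrence threshold `min_occurrences=2` is a hypothetical parameter, and the filtering step is not modelled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPhrase:
    text: str  # lemmatized phrase without determiners, e.g. "scheduled session"
    head: str  # lemma of the head noun, e.g. "session"
    dep: str   # dependency label of the head noun, e.g. "nsubj", "dobj", "pobj"


def extract_concepts(requirements: list[list[NounPhrase]], min_occurrences: int = 2) -> set[str]:
    """Apply two concept rules: subjects are concepts; recurring head nouns are concepts."""
    head_counts = Counter(p.head for req in requirements for p in req)
    concepts: set[str] = set()
    for req in requirements:
        for phrase in req:
            if phrase.dep in ("nsubj", "nsubjpass"):
                concepts.add(phrase.text)      # rule: every subject is a concept
            if head_counts[phrase.head] >= min_occurrences:
                concepts.add(phrase.head)      # rule: a noun that recurs is a concept
    return concepts


# Hand-annotated noun phrases (assumption) for the example requirement above.
example = [[
    NounPhrase("simulator", "simulator", "nsubj"),
    NounPhrase("scheduled session", "session", "dobj"),
    NounPhrase("active session", "session", "conj"),
    NounPhrase("list", "list", "conj"),
    NounPhrase("session", "session", "pobj"),
]]
print(sorted(extract_concepts(example)))  # ['session', 'simulator']
```

Counting recurrence over the whole document, rather than per sentence, is what lets a noun mentioned once per requirement still qualify.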
Associations
The simulator shall maintain the scheduled sessions, the active session and also the list of sessions that have been already handled.
[Figure: Simulator --maintain--> Scheduled Session, multiplicity 1 to *]
Direct Relations
All transitive verbs are associations.
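The transitive-verb rule amounts to pairing each verb's subject with its direct object. The sketch below assumes the sentence is already available as (governor, label, dependent) triples with Stanford-style labels and noun phrases as dependents. The function name is hypothetical, and multiplicities such as 1 to * are not derived.

```python
from collections import defaultdict


def transitive_verb_associations(deps: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """deps: (governor, label, dependent) triples of one sentence.

    Every verb that governs both a subject (nsubj) and a direct object (dobj)
    yields an association subject --verb--> object.
    """
    subjects: dict[str, list[str]] = defaultdict(list)
    objects: dict[str, list[str]] = defaultdict(list)
    for governor, label, dependent in deps:
        if label == "nsubj":
            subjects[governor].append(dependent)
        elif label == "dobj":
            objects[governor].append(dependent)
    # Verbs without a direct object (intransitive uses) produce nothing.
    return [
        (subj, verb, obj)
        for verb, subjs in subjects.items()
        for subj in subjs
        for obj in objects.get(verb, [])
    ]


print(transitive_verb_associations([
    ("maintain", "nsubj", "simulator"),
    ("maintain", "dobj", "scheduled session"),
]))  # [('simulator', 'maintain', 'scheduled session')]
```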
Aggregations
The library contains books.
[Figure: aggregation between Library (whole) and Book (part)]
Specific Patterns
Requirements with patterns such as “contains”, “made up of”, “include”, […]
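A pattern-based aggregation rule can be approximated with a regular expression over simple requirement sentences. This is a deliberately crude stand-in: it assumes "whole, verb, part" sentences and covers only the three patterns quoted on the slide. The slide's list continues with an elided "[…]", which is not filled in here, and `aggregation_from_pattern` is a hypothetical name.

```python
import re
from typing import Optional

# Only the three patterns named on the slide ("contains", "made up of", "include").
AGGREGATION_PATTERN = re.compile(
    r"^(?:the\s+)?(?P<whole>[\w ]+?)\s+(?:shall\s+)?"
    r"(?:contains?|includes?|(?:is|are)\s+made\s+up\s+of)\s+"
    r"(?P<part>[\w ]+?)\.?$",
    re.IGNORECASE,
)


def aggregation_from_pattern(sentence: str) -> Optional[tuple[str, str]]:
    """Return (whole, part) if the sentence matches an aggregation pattern, else None."""
    match = AGGREGATION_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("whole"), match.group("part")


print(aggregation_from_pattern("The library contains books."))  # ('library', 'books')
```

Because it only fires on a few fixed phrasings, a rule of this kind triggers rarely on free-form requirements.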
Attributes
Book’s title
• Genitive cases: NP’s NP, NP of NP
[Figure: class Book with attribute title]
Or an aggregation?
system’s component
[Figure: classes System and Component]
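The genitive rule can be sketched with two regular expressions, one per form (NP’s NP and NP of NP). The tie-breaker that turns a genitive into an aggregation when the possessed noun is already a known concept is a hypothetical heuristic. It is added only to illustrate the ambiguity raised by “system’s component” and is not claimed to be part of the approach.

```python
import re
from typing import Optional

GENITIVE_PATTERNS = [
    re.compile(r"(?P<owner>\w+)['’]s\s+(?P<owned>\w+)"),             # NP's NP
    re.compile(r"(?P<owned>\w+)\s+of\s+(?:the\s+)?(?P<owner>\w+)"),  # NP of NP
]


def genitive_relation(phrase: str, known_concepts: set[str]) -> Optional[tuple[str, str, str]]:
    """Return (kind, owner, owned) for a genitive phrase, or None if none is found."""
    for pattern in GENITIVE_PATTERNS:
        match = pattern.search(phrase)
        if match:
            owner, owned = match.group("owner"), match.group("owned")
            # Hypothetical tie-breaker: if the owned noun is already a concept,
            # read the genitive as an aggregation rather than an attribute.
            kind = "aggregation" if owned.lower() in known_concepts else "attribute"
            return kind, owner, owned
    return None


print(genitive_relation("Book’s title", set()))                 # ('attribute', 'Book', 'title')
print(genitive_relation("system’s component", {"component"}))  # ('aggregation', 'system', 'component')
```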
Approach
[Figure: pipeline from NL Requirements to a Domain Model in three steps: (1) Process Requirements Statements, producing phrasal structure and dependencies; (2) Lift Dependencies to Semantic Units, producing phrase-level dependencies; (3) Construct Domain Model, applying the extraction rules]
Grammatical Dependencies
The system operator shall initialize the simulator configuration.
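[Figure: nsubj(initialize, operator) and dobj(initialize, configuration) yield the word-level relation Operator --initialize--> Configuration]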
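To see why word-level extraction loses information, the example can be written as a dependency table. The parse below is hand-written in a CoNLL-like layout with Stanford-style labels (an assumption, not the output of any particular parser), and `word_level_relations` is a hypothetical helper that pairs the nsubj and dobj dependents of each verb.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str
    head: int    # index of the governing token; 0 marks the root
    deprel: str  # Stanford-style dependency label


# Hand-written parse (assumption, not parser output) of
# "The system operator shall initialize the simulator configuration."
SENTENCE = [
    Token(1, "The", 3, "det"),
    Token(2, "system", 3, "compound"),
    Token(3, "operator", 5, "nsubj"),
    Token(4, "shall", 5, "aux"),
    Token(5, "initialize", 0, "root"),
    Token(6, "the", 8, "det"),
    Token(7, "simulator", 8, "compound"),
    Token(8, "configuration", 5, "dobj"),
    Token(9, ".", 5, "punct"),
]


def word_level_relations(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Pair the nsubj and dobj dependents of each verb, using single words only."""
    relations = []
    for verb in tokens:
        subjects = [t.form for t in tokens if t.head == verb.idx and t.deprel == "nsubj"]
        objects = [t.form for t in tokens if t.head == verb.idx and t.deprel == "dobj"]
        relations += [(s, verb.form, o) for s in subjects for o in objects]
    return relations


print(word_level_relations(SENTENCE))  # [('operator', 'initialize', 'configuration')]
```

The modifiers "system" and "simulator" are attached by `compound` arcs that a word-level rule never looks at.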
Lift Dependencies to Semantic Units
The system operator shall initialize the simulator configuration.
[Figure: word level: Operator --initialize--> Configuration (nsubj, dobj); lifted to semantic units: System Operator --initialize--> Simulator Configuration (nsubj, dobj)]
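Lifting a dependency to semantic units can be sketched by expanding each head word into its phrase. For illustration only, this assumes a semantic unit is the head noun plus its `compound` and `amod` modifiers. The parse is hand-written, and the helper names `semantic_unit` and `phrase_level_relations` are hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str
    head: int    # index of the governing token; 0 marks the root
    deprel: str  # Stanford-style dependency label


# Hand-written parse (assumption) of
# "The system operator shall initialize the simulator configuration."
SENTENCE = [
    Token(1, "The", 3, "det"),
    Token(2, "system", 3, "compound"),
    Token(3, "operator", 5, "nsubj"),
    Token(4, "shall", 5, "aux"),
    Token(5, "initialize", 0, "root"),
    Token(6, "the", 8, "det"),
    Token(7, "simulator", 8, "compound"),
    Token(8, "configuration", 5, "dobj"),
    Token(9, ".", 5, "punct"),
]


def semantic_unit(tokens: list[Token], head: Token) -> str:
    """Lift a head word to its phrase: the head plus compound/amod modifiers, in order."""
    parts = [t for t in tokens if t.head == head.idx and t.deprel in ("compound", "amod")]
    return " ".join(t.form for t in sorted(parts + [head], key=lambda t: t.idx))


def phrase_level_relations(tokens: list[Token]) -> list[tuple[str, str, str]]:
    """Same nsubj/dobj pairing as at word level, but over lifted phrases."""
    relations = []
    for verb in tokens:
        subjects = [t for t in tokens if t.head == verb.idx and t.deprel == "nsubj"]
        objects = [t for t in tokens if t.head == verb.idx and t.deprel == "dobj"]
        relations += [
            (semantic_unit(tokens, s), verb.form, semantic_unit(tokens, o))
            for s in subjects
            for o in objects
        ]
    return relations


print(phrase_level_relations(SENTENCE))
# [('system operator', 'initialize', 'simulator configuration')]
```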
New Rule - N1
The system operator shall be able to initialize the simulator configuration, and to edit the existing configuration.
[Figure: System Operator --able--> ????; System Operator --initialize--> Simulator Configuration; System Operator --edit--> Existing Configuration]
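The behaviour shown for rule N1 can be sketched on a hand-written parse of the example. Two simplifications are assumptions of this sketch, not a statement of how N1 is defined. First, the subject of "be able to" is attached to the verb in its `xcomp` clause. Second, that subject is propagated to every verb conjoined with it via `conj`. The name `rule_n1` is hypothetical.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str
    head: int    # index of the governing token; 0 marks the root
    deprel: str  # Stanford-style dependency label


# Hand-written parse (assumption) of "The system operator shall be able to
# initialize the simulator configuration, and to edit the existing configuration."
SENTENCE = [
    Token(1, "The", 3, "det"), Token(2, "system", 3, "compound"),
    Token(3, "operator", 6, "nsubj"), Token(4, "shall", 6, "aux"),
    Token(5, "be", 6, "cop"), Token(6, "able", 0, "root"),
    Token(7, "to", 8, "mark"), Token(8, "initialize", 6, "xcomp"),
    Token(9, "the", 11, "det"), Token(10, "simulator", 11, "compound"),
    Token(11, "configuration", 8, "dobj"), Token(12, ",", 8, "punct"),
    Token(13, "and", 8, "cc"), Token(14, "to", 15, "mark"),
    Token(15, "edit", 8, "conj"), Token(16, "the", 18, "det"),
    Token(17, "existing", 18, "amod"), Token(18, "configuration", 15, "dobj"),
    Token(19, ".", 6, "punct"),
]


def rule_n1(tokens: list[Token]) -> list[tuple[str, str, str]]:
    def children(tok: Token, *labels: str) -> list[Token]:
        return [t for t in tokens if t.head == tok.idx and t.deprel in labels]

    def phrase(head: Token) -> str:
        # Head noun plus compound/adjectival modifiers, in sentence order.
        parts = children(head, "compound", "amod") + [head]
        return " ".join(t.form for t in sorted(parts, key=lambda t: t.idx))

    relations = []
    for gov in tokens:
        if gov.form.lower() != "able":
            continue  # simplification: only "be able to" is handled here
        for subj in children(gov, "nsubj"):
            for first_verb in children(gov, "xcomp"):
                # "and to edit ..." is conjoined to the first verb and shares its subject.
                for verb in [first_verb] + children(first_verb, "conj"):
                    for obj in children(verb, "dobj"):
                        relations.append((phrase(subj), verb.form, phrase(obj)))
    return relations


for relation in rule_n1(SENTENCE):
    print(relation)
```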
Link Paths
The simulator shall send log messages to the database via the monitoring interface.
[Figure: Simulator --send--> Log Message; Simulator --send log message to--> Database; Simulator --send log message to database via--> Monitoring Interface]
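The link-path labels on the slide can be reproduced by accumulating the path from the verb through each preposition. The input format (subject, verb, direct object, then ordered preposition/object pairs) is a simplifying assumption standing in for a real dependency walk, and `link_path_relations` is a hypothetical name.

```python
def link_path_relations(
    subject: str,
    verb: str,
    direct_object: str,
    attachments: list[tuple[str, str]],
) -> list[tuple[str, str, str]]:
    """Build one relation per link on the path verb -> object -> prepositions.

    attachments: ordered (preposition, object) pairs that follow the direct object.
    """
    relations = [(subject, verb, direct_object)]
    path = f"{verb} {direct_object}"
    for preposition, obj in attachments:
        path = f"{path} {preposition}"
        relations.append((subject, path, obj))  # label = path up to this preposition
        path = f"{path} {obj}"                  # extend the path past this object
    return relations


for relation in link_path_relations(
    "simulator", "send", "log message",
    [("to", "database"), ("via", "monitoring interface")],
):
    print(relation)
```

Each relation keeps the full path as its label, so "database" and "monitoring interface" stay linked to the simulator instead of being dropped.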
IR-Domain
RQ1: How frequently are different extraction rules triggered?
[Figure: 3 case studies with 380, 158, and 138 requirements; 1 case study with 110 requirements]
• Pattern-based rules were seldom or never triggered
• Generic rules were triggered most often, e.g., transitive verbs, genitive cases, and all the new rules (including link paths)
RQ2: How useful is our approach?
1 case study: 50 requirements, 213 relations
• Interview Survey
• Correctness and Relevance of each relation
• Missing relations in each requirement
Correctness: 90% on average
[Chart: correctness (%) per extraction rule, grouped into existing rules (Specific Pattern, E1 to E6) and new rules (N1, N2, N3, LP)]
Relevance: 36% on average
[Chart: relevance (%) per extraction rule, with the same grouping]
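Per-rule percentages like those in the charts can be computed from per-relation judgments. The sample judgments at the end of the block are invented for illustration and are not the study's data. The sketch also assumes that correctness and relevance are both computed over all relations a rule produced, and that the overall figure averages over relations rather than over rules.

```python
from collections import defaultdict

Judgment = tuple[str, bool, bool]  # (rule, judged correct, judged relevant)


def per_rule_scores(judgments: list[Judgment]) -> dict[str, tuple[float, float]]:
    """Return {rule: (correctness %, relevance %)} over the relations each rule produced."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0])  # total, correct, relevant
    for rule, correct, relevant in judgments:
        c = counts[rule]
        c[0] += 1
        c[1] += int(correct)
        c[2] += int(relevant)
    return {rule: (100 * ok / n, 100 * rel / n) for rule, (n, ok, rel) in counts.items()}


def overall_scores(judgments: list[Judgment]) -> tuple[float, float]:
    """Correctness and relevance percentages over all judged relations."""
    n = len(judgments)
    return (100 * sum(c for _, c, _ in judgments) / n,
            100 * sum(r for _, _, r in judgments) / n)


# Invented sample judgments, for illustration only.
JUDGMENTS = [("E1", True, True), ("E1", True, False), ("LP", False, False), ("LP", True, True)]
print(per_rule_scores(JUDGMENTS))  # {'E1': (100.0, 50.0), 'LP': (50.0, 50.0)}
print(overall_scores(JUDGMENTS))   # (75.0, 50.0)
```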
Conclusion
• Our extensions are of practical significance for domain model extraction
• Empirical evaluation in industrial settings provides insights into the usefulness of existing rules
• An important observation about automated model extractors concerns relevance
• Future work: look into ways to improve relevance