Extracting Domain Models from Natural-Language Requirements: Approach and Industrial Evaluation

Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, Frank Zimmer

SVV.lu: Software Verification & Validation

Interdisciplinary Centre for Security, Reliability and Trust, Luxembourg

October 6, 2016


Domain Models

A domain model is a visual representation of conceptual entities or real-world objects in a domain of interest.

[Figure: example domain model with the concepts Satellite, Satellite Ground Station, Satellite S&T Station, Feeder Link Ground Station, Data Ground Station and Satellite Control Centre; the association "transfers user requests to" (multiplicity 1 to 1); and the attribute "location". Callouts label the element kinds: Concepts, Aggregations, Associations, Generalizations, Attributes.]

Context

[Figure, shown on two slides labelled "Ideal" and "Practice": Requirements Analysts perform "Specify Requirements", producing an NL Requirements Document, and "Build Domain Model", producing a Domain Model (classes A to D linked by relations with multiplicities 1 and *).]

Problem Definition

• Manually building domain models is laborious

• Automated support is required for building domain models


State of the Art

• Multiple approaches exist for extracting domain models, or similar variants, from requirements using extraction rules

• The majority assume a specific structure, e.g., restricted NL

• Extraction of direct relations, (mostly) at the level of words

• Few empirical insights on real requirements

Existing Extraction Rules

Concepts

The simulator shall maintain the scheduled sessions, the active session and also the list of sessions that have been already handled.

• All subjects in the requirements are concepts

• All recurring nouns in the requirements are concepts

Noun Phrases

Filtering
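The "recurring nouns" rule can be illustrated with a small sketch. It assumes the nouns of each requirement have already been extracted and lemmatized by an NLP tagger; the noun lists below are hand-written for illustration, and the threshold of two occurrences is an assumption, not a value from the slides.

```python
from collections import Counter
from typing import List


def recurring_noun_concepts(nouns_per_requirement: List[List[str]],
                            min_count: int = 2) -> List[str]:
    """Return nouns that occur at least `min_count` times overall.

    `min_count` is a hypothetical threshold for "recurring".
    """
    counts = Counter(noun
                     for nouns in nouns_per_requirement
                     for noun in nouns)
    return sorted(noun for noun, count in counts.items() if count >= min_count)


# Hand-annotated, lemmatized nouns for two example requirements.
requirement_nouns = [
    ["simulator", "session", "session", "list", "session"],
    ["system", "operator", "simulator", "configuration"],
]

concepts = recurring_noun_concepts(requirement_nouns)
print(concepts)
```

A real extractor would then apply filtering to the candidates, as the slide indicates.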

Associations

The simulator shall maintain the scheduled sessions, the active session and also the list of sessions that have been already handled.

Direct Relations

All transitive verbs are associations.

Simulator (1) --maintain--> (*) Scheduled Session

Aggregations

The library contains books.

Specific Patterns

Requirements with patterns “contains”, “made up of”, “include”, […]

[Figure: Library aggregates Book]
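A minimal sketch of such a pattern-based rule follows. The regular expression, the helper names and the naive plural stripping are illustrative assumptions; the slides list the trigger phrases but not how the rule is implemented.

```python
import re
from typing import Optional, Tuple

# Trigger phrases taken from the slide: "contains", "made up of", "include".
AGGREGATION = re.compile(
    r"^(?:(?:the|a|an)\s+)?(?P<whole>\w+)\s+"
    r"(?:contains?|includes?|(?:is\s+|are\s+)?made\s+up\s+of)\s+"
    r"(?:(?:the|a|an)\s+)?(?P<part>\w+)",
    re.IGNORECASE,
)


def naive_singular(noun: str) -> str:
    """Strip a trailing 's' (a crude stand-in for real lemmatization)."""
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def aggregation(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (whole, part) if the sentence matches a trigger pattern."""
    match = AGGREGATION.match(sentence.strip())
    if match is None:
        return None
    whole = match.group("whole").capitalize()
    part = naive_singular(match.group("part")).capitalize()
    return (whole, part)


print(aggregation("The library contains books."))
print(aggregation("The simulator shall send log messages."))
```

Rules of this kind only fire when a requirement is phrased exactly as one of the patterns.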

Attributes

Book’s title

• Genitive cases: NP’s NP, NP of NP

Book

- title
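The genitive rule can be sketched with two regular expressions, one per surface form. This is a word-level approximation under stated assumptions: the patterns and the function name are hypothetical, and a real extractor would match noun phrases from a parser rather than single words.

```python
import re
from typing import List, Tuple

# "NP's NP": owner first, attribute second (straight or curly apostrophe).
GENITIVE = re.compile(r"\b(\w+)['’]s\s+(\w+)")
# "NP of NP": attribute first, owner second.
OF_FORM = re.compile(r"\b(\w+)\s+of\s+(?:the\s+)?(\w+)", re.IGNORECASE)


def attribute_candidates(text: str) -> List[Tuple[str, str]]:
    """Return (concept, attribute) candidates found in the text."""
    candidates = []
    for owner, attribute in GENITIVE.findall(text):
        candidates.append((owner.capitalize(), attribute.lower()))
    for attribute, owner in OF_FORM.findall(text):
        candidates.append((owner.capitalize(), attribute.lower()))
    return candidates


print(attribute_candidates("Book’s title"))
print(attribute_candidates("the title of the book"))
print(attribute_candidates("system's component"))
```

The last call also returns a candidate, although "component" is arguably a part of the system rather than an attribute of it.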

Aggregations???

System Component

system’s component


Are they even useful?

Our Contributions

Approach

[Figure: approach pipeline. NL Requirements -> Process Requirements Statements (yields Phrasal Structure and Dependencies) -> Lift Dependencies to Semantic Units (yields Phrase-level Dependencies) -> Construct Domain Model (applies Extraction Rules) -> Domain Model (classes A to D, relations with multiplicities 1 and *).]

Grammatical Dependencies

The system operator shall initialize the simulator configuration.

nsubj(initialize, operator), dobj(initialize, configuration)

Operator --initialize--> Configuration

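Lift Dependencies to Semantic Units

The system operator shall initialize the simulator configuration.

Word level: nsubj(initialize, operator), dobj(initialize, configuration), giving Operator --initialize--> Configuration

Phrase level: nsubj(initialize, system operator), dobj(initialize, simulator configuration), giving System Operator --initialize--> Simulator Configuration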
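The lifting step can be sketched as follows. The token list, dependency triples and noun-phrase spans are hand-written stand-ins for parser and chunker output, and the function names are hypothetical; the sketch only shows the idea of replacing each word in a dependency with the phrase that contains it.

```python
from typing import List, Tuple

# Hypothetical, hand-written NLP output for the example sentence.
tokens = ["The", "system", "operator", "shall", "initialize",
          "the", "simulator", "configuration"]
# Word-level dependencies as (label, head index, dependent index).
word_deps = [("nsubj", 4, 2), ("dobj", 4, 7)]
# Noun-phrase spans as (start, end), end exclusive, determiners dropped.
noun_phrases = [(1, 3), (6, 8)]


def lift(index: int, tokens: List[str],
         noun_phrases: List[Tuple[int, int]]) -> str:
    """Return the noun phrase covering a token, or the token itself."""
    for start, end in noun_phrases:
        if start <= index < end:
            return " ".join(tokens[start:end]).title()
    return tokens[index]


def extract_relations(tokens, word_deps, noun_phrases):
    """Pair each verb's subject and direct object into a relation."""
    subjects, objects = {}, {}
    for label, head, dependent in word_deps:
        if label == "nsubj":
            subjects[head] = dependent
        elif label == "dobj":
            objects[head] = dependent
    relations = []
    for verb, subject in subjects.items():
        if verb in objects:
            relations.append((lift(subject, tokens, noun_phrases),
                              tokens[verb],
                              lift(objects[verb], tokens, noun_phrases)))
    return relations


print(extract_relations(tokens, word_deps, noun_phrases))
```

Passing an empty list of noun phrases reproduces the word-level relation between "operator" and "configuration".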


New Rule - N1

The system operator shall be able to initialize the simulator configuration, and to edit the existing configuration.

[Figure: a relation from System Operator to "able" is marked "????"; the slide then shows the two relations below.]

System Operator --initialize--> Simulator Configuration

System Operator --edit--> Existing Configuration

Link Paths

The simulator shall send log messages to the database via the monitoring interface.

Simulator --send--> Log Message

Simulator --send log message to--> Database

Simulator --send log message to database via--> Monitoring Interface
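One way to picture link paths is as a chain that keeps extending the relation label. The sketch below assumes the subject, verb, direct object and prepositional phrases have already been identified by a parser; the function name and input shape are hypothetical.

```python
from typing import List, Tuple


def link_path_relations(subject: str, verb: str, direct_object: str,
                        prep_phrases: List[Tuple[str, str]]
                        ) -> List[Tuple[str, str, str]]:
    """Build one relation per object along the chain of prepositions."""
    relations = [(subject, verb, direct_object)]
    label = f"{verb} {direct_object.lower()}"
    for preposition, noun_phrase in prep_phrases:
        # Extend the label with the preposition and link to the next phrase.
        label = f"{label} {preposition}"
        relations.append((subject, label, noun_phrase))
        # Carry the phrase into the label for any following preposition.
        label = f"{label} {noun_phrase.lower()}"
    return relations


relations = link_path_relations(
    "Simulator", "send", "Log Message",
    [("to", "Database"), ("via", "Monitoring Interface")],
)
for relation in relations:
    print(relation)
```

For the example requirement this yields the three relations listed on the slide.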

IR-Domain


RQ1: How frequently are different extraction rules triggered?

• 3 case studies + 1 case study

• Requirements per document: 380, 158, 138, 110

RQ1: How frequently are different extraction rules triggered?

• Pattern-based rules were never or seldom triggered

• Generic rules were triggered most often, e.g., transitive verbs, genitive cases, and all new rules (including link paths)

RQ2: How useful is our approach?

• 1 case study: 50 requirements, 213 relations

• Interview Survey

• Correctness and Relevance of each relation

• Missing relations in each requirement

Correctness (%) - 90% (avg.)

[Figure: correctness (%) per extraction rule. Existing rules: E1 to E6 (some marked "Specific Pattern"); new rules: N1 to N3 and LP (link paths).]

Observed Reasons for Incorrectness

• NLP Mistakes

• Wrong Relation Extracted

Relevance (%) - 36% (avg.)

[Figure: relevance (%) per extraction rule. Existing rules: E1 to E6 (some marked "Specific Pattern"); new rules: N1 to N3 and LP (link paths).]

Observed Reasons for Irrelevance

• Common Knowledge, e.g., relations involving TCP/IP, SNMP and Protocol

Missing Relations

• 8% of relations missed

• 92% of relevant relations extracted

Conclusion

• Our extensions are of practical significance for domain model extraction

• Empirical evaluation (in industrial settings) provides insights into the usefulness of existing rules

• An important observation about automated model extractors: relevance

• Future work: look into ways to improve relevance