© Generative Software Technologies Corp. 1
Knowledge Extraction from Technical Documents
*With first-class support for Feature Modeling
Rehan Rauf, Michal Antkiewicz, and Krzysztof Czarnecki
Generative Software Technologies Corp., Waterloo, Canada
http://gensoftech.com
The Idea
Specification Documents
Physical structures: sections, tables, paragraphs
Logical structures (specification elements): functional reqs, business rules, use cases
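A small data model makes the physical/logical distinction concrete. The sketch below is hypothetical (the classes `Section`, `Table`, `Paragraph` and the helper `walk` are illustrative names, not ET's API): the tree records only physical structure, and nothing in it says that "Actor: Customer" is the actor of a use case. Recovering that logical reading is the extraction tool's job.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Paragraph:
    text: str


@dataclass
class Table:
    rows: List[List[str]]  # each row is a list of cell texts


@dataclass
class Section:
    heading: str
    # Children may be nested sections, paragraphs, or tables.
    children: List[Union["Section", Paragraph, Table]] = field(default_factory=list)


def walk(node, depth=0):
    """Yield (depth, kind, label) for every physical component, depth-first."""
    if isinstance(node, Section):
        yield depth, "Section", node.heading
        for child in node.children:
            yield from walk(child, depth + 1)
    elif isinstance(node, Table):
        yield depth, "Table", f"{len(node.rows)} rows"
    else:
        yield depth, "Paragraph", node.text


doc = Section("Use Case 1: Withdraw Cash", [
    Paragraph("Actor: Customer"),
    Table([["Step", "Action"], ["1", "Customer inserts card"]]),
])

for depth, kind, label in walk(doc):
    print("  " * depth + f"{kind}: {label}")
```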
Recognize and extract specification elements based on physical document structure
ET (Extraction Tool) searches for template instances
[Figure: ET applies a UC template to a spec doc and finds the use case instances UC 1 and UC 2.]
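A minimal sketch of searching for template instances, assuming the document has already been flattened into `(style, text)` paragraphs and that each use case starts at a heading such as "UC 1: ..." (both assumptions; `find_instances` and `UC_START` are hypothetical names). A real template describes much more structure than a heading pattern.

```python
import re
from typing import List, Tuple

# Hypothetical input format: (style, text) pairs from the physical structure.
Para = Tuple[str, str]

# Assumed identifier for the start of a use case instance, e.g. "UC 1: ...".
UC_START = re.compile(r"^UC\s*\d+\b")


def find_instances(paras: List[Para]) -> List[List[Para]]:
    """Group paragraphs into candidate use case instances."""
    instances: List[List[Para]] = []
    current: List[Para] = []
    for style, text in paras:
        if style == "heading":
            if current:  # a new heading closes the previous instance
                instances.append(current)
            # Start a new instance only if the heading looks like a use case.
            current = [(style, text)] if UC_START.match(text) else []
        elif current:
            current.append((style, text))
    if current:
        instances.append(current)
    return instances


doc = [
    ("heading", "Introduction"),
    ("body", "This document describes the ATM system."),
    ("heading", "UC 1: Withdraw Cash"),
    ("body", "Actor: Customer"),
    ("heading", "UC 2: Deposit Cash"),
    ("body", "Actor: Customer"),
    ("body", "Main flow: ..."),
]

for inst in find_instances(doc):
    print(inst[0][1], len(inst))
```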
Precondition: documents have been authored with some template in mind
Application scenarios
Import to Requirements Mgmt Tools
[Figure: ET extracts functional reqs, business rules, and use cases from a spec doc and imports them into tools such as DOORS, HP Quality Center, RequisitePro, …]
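One hedged way to hand extracted elements to a requirements management tool is a flat CSV file, assuming the target tool offers a CSV import. The column names and the `to_csv` helper are hypothetical; each tool defines its own import format and field mapping.

```python
import csv
import io

# Hypothetical extracted elements; field names are illustrative only.
elements = [
    {"id": "UC1", "type": "Use Case", "name": "Withdraw Cash", "actor": "Customer"},
    {"id": "BR7", "type": "Business Rule", "name": "Daily limit", "actor": ""},
]


def to_csv(rows):
    """Serialize extracted elements as CSV text for a tool's import wizard."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "type", "name", "actor"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(elements))
```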
Structured Query
[Figure: QT runs a structured query, e.g., “All use cases with actor = ‘customer’”, over the functional reqs, business rules, and use cases extracted from several spec docs.]
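Once use cases exist as records rather than prose, the example query becomes a simple filter. This is a minimal sketch with a hypothetical `UseCase` record and `query` helper, not QT's query language:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UseCase:
    uc_id: str
    name: str
    actor: str
    source: str  # document the instance was extracted from


def query(use_cases: List[UseCase], **criteria) -> List[UseCase]:
    """Return use cases whose fields equal all criteria (case-insensitive)."""
    return [
        uc for uc in use_cases
        if all(getattr(uc, k).lower() == v.lower() for k, v in criteria.items())
    ]


ucs = [
    UseCase("UC1", "Withdraw Cash", "Customer", "atm-spec.docx"),
    UseCase("UC2", "Refill ATM", "Operator", "atm-spec.docx"),
    UseCase("UC9", "Open Account", "customer", "bank-spec.docx"),
]
print([uc.uc_id for uc in query(ucs, actor="customer")])  # ['UC1', 'UC9']
```

Comparing case-insensitively is a design choice here: it lets “Customer” and “customer” from different documents match, a small concession to inconsistent writing across sources.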
Tracing
[Figure: trace links between business rules and use cases extracted from different spec docs.]
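The slides do not say how trace links are established. A minimal sketch, assuming use case text cites business rules by identifiers of the form `BR-<n>` (an assumed convention; `trace` and `BR_REF` are hypothetical), links each use case to the known rules it mentions:

```python
import re
from typing import Dict, List, Set

BR_REF = re.compile(r"\bBR-\d+\b")  # assumed business-rule id format


def trace(use_cases: Dict[str, str], rule_ids: Set[str]) -> Dict[str, List[str]]:
    """Map each use case id to the known business rules its text references."""
    links = {}
    for uc_id, text in use_cases.items():
        # Keep only references to rules that were actually extracted.
        refs = sorted(set(BR_REF.findall(text)) & rule_ids)
        if refs:
            links[uc_id] = refs
    return links


use_cases = {
    "UC1": "Withdraw Cash. The amount must respect BR-3 and BR-12.",
    "UC2": "Check Balance. No special rules.",
}
rules = {"BR-3", "BR-12"}
print(trace(use_cases, rules))
```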
Template Conformance Checking
[Figure: use case instances in a spec doc checked against the template.]
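A conformance check can report what an instance lacks or adds relative to its template. This sketch assumes a template reduced to a hypothetical mapping from component name to "required?"; ET's templates are richer.

```python
from typing import Dict, List

# Hypothetical template: component name -> required?
UC_TEMPLATE = {"name": True, "actor": True, "main flow": True, "extensions": False}


def check_conformance(instance: Dict[str, str]) -> List[str]:
    """List problems: missing required components and unexpected ones."""
    problems = [f"missing required component: {c}"
                for c, required in UC_TEMPLATE.items()
                if required and c not in instance]
    problems += [f"unexpected component: {c}"
                 for c in instance if c not in UC_TEMPLATE]
    return problems


uc = {"name": "Withdraw Cash", "main flow": "1. Insert card ...", "notes": "draft"}
for p in check_conformance(uc):
    print(p)
```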
Main Challenge: Logical and Physical Variation
Challenge – Variation
Instances of Use Case
[Figure: the same use case instances, annotated with their logical components and component identifiers.]
Variation Types
Designed vs. accidental variation, each of which can be logical or physical
Designed Logical Variation
Optional component
Designed Logical Alternatives
Deeper decomposition
Different methodologies lead to logical variation
Designed Physical Variation
Different formatting
Accidental Variation
Logical: missing components, e.g., actor
Physical: spelling mistakes, e.g., “Actar”; style inconsistency, e.g., italics instead of bold
Solution
ET – Extraction Tool
Docs → PSE → physical components: sections, lists, table cells
Physical components → LSE (with the UC template) → logical components: actor, flow, extensions
Accidental variation: handled via match threshold
Designed variation: handled via template
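Handling accidental variation via a match threshold can be illustrated with fuzzy string matching. This sketch uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure; the label list, the 0.75 threshold, and `match_label` are assumptions, not ET's actual matching algorithm.

```python
from difflib import SequenceMatcher

# Assumed component labels and threshold (illustrative values only).
LABELS = ["Actor", "Precondition", "Main Flow", "Extensions"]
THRESHOLD = 0.75


def match_label(text: str):
    """Return (best_label, score), or (None, score) if the best score is below THRESHOLD."""
    best, score = None, 0.0
    for label in LABELS:
        # ratio() is 2*M/T: M matched characters, T total length of both strings.
        s = SequenceMatcher(None, text.lower(), label.lower()).ratio()
        if s > score:
            best, score = label, s
    return (best, score) if score >= THRESHOLD else (None, score)


print(match_label("Actar"))     # ('Actor', 0.8): the misspelling is tolerated
print(match_label("Priority"))  # best score is below the threshold, so no label
```

A lower threshold tolerates more accidental variation but risks matching unrelated labels; a higher one does the reverse.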
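UC Template
[Figure: the UC template as a metamodel, in which a UC has a Name : String and one Flow, and a Flow has any number (*) of Action : String, plus a mapping of these logical components to the physical components Section, Heading, List, and Paragraph.]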
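The metamodel and its mapping can be sketched as two small type layers and a function between them. The pairing used here (Section→UC, Heading→Name, List→Flow, Paragraph→Action) is an assumption read from the order of the figure labels, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


# Physical side (hypothetical minimal model).
@dataclass
class PhysSection:
    heading: str
    list_items: List[str] = field(default_factory=list)  # paragraphs in a list


# Logical side, following the metamodel: a UC has a Name and one Flow;
# a Flow has zero or more Actions.
@dataclass
class Flow:
    actions: List[str]


@dataclass
class UC:
    name: str
    flow: Flow


def map_section(sec: PhysSection) -> UC:
    """Assumed mapping: Section→UC, Heading→Name, List→Flow, Paragraph→Action."""
    return UC(name=sec.heading, flow=Flow(actions=list(sec.list_items)))


uc = map_section(PhysSection("Withdraw Cash", ["Insert card", "Enter PIN", "Take cash"]))
print(uc.name, len(uc.flow.actions))  # Withdraw Cash 3
```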
Example Template
Logical Structure
Mapping
Regular Expressions
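The template language's own regular-expression syntax is not reproduced in this transcript, so here is a Python `re` sketch with a hypothetical pattern for the Actor identifier that tolerates a few variants ("Actor", "Actors", "Primary Actor", followed by `:` or `-`):

```python
import re

# Hypothetical identifier pattern for the Actor component.
ACTOR = re.compile(
    r"^\s*(?:primary\s+)?actors?\s*[:\-]\s*(?P<value>.+?)\s*$",
    re.IGNORECASE,
)


def extract_actor(paragraph: str):
    """Return the actor value if the paragraph starts with an Actor identifier."""
    m = ACTOR.match(paragraph)
    return m.group("value") if m else None


for p in ["Actor: Customer", "Primary Actor - Bank Clerk",
          "Actors:  ATM operator ", "Goal: withdraw cash"]:
    print(repr(extract_actor(p)))
```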
Lists
Component Nesting
Optional Components
Physical Alternatives
Templates with Tables
Logical Alternatives
ET – Extraction Tool
Docs → PSE → physical components
Basic: paragraph, cell, graphic
Composite: sections, lists, tables, …
Physical components → LSE (with the UC template) → logical components: actor, flow, extensions
Physical Structure Extraction
PSE (docs → physical components) is the only part dependent on the document format; LSE and the UC template work on format-independent physical components.
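Keeping PSE as the only format-dependent part suggests an interface with one implementation per input format. A minimal sketch, assuming a toy Markdown-like input and `(kind, text)` components (the class names and component kinds are hypothetical); supporting another format would mean writing another subclass, while downstream code stays unchanged.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

# Format-neutral physical components as (kind, text) pairs (hypothetical).
Component = Tuple[str, str]


class PhysicalStructureExtractor(ABC):
    """Only this layer knows the input format; later stages see Components only."""

    @abstractmethod
    def extract(self, raw: str) -> List[Component]: ...


class MarkdownPSE(PhysicalStructureExtractor):
    """Toy PSE: '#' lines are headings, '- ' lines are list items, the rest paragraphs."""

    def extract(self, raw: str) -> List[Component]:
        out = []
        for line in raw.splitlines():
            s = line.strip()
            if not s:
                continue
            if s.startswith("#"):
                out.append(("heading", s.lstrip("#").strip()))
            elif s.startswith("- "):
                out.append(("list_item", s[2:].strip()))
            else:
                out.append(("paragraph", s))
        return out


def count_kinds(components: List[Component]) -> Dict[str, int]:
    """Stand-in for format-independent downstream processing."""
    counts: Dict[str, int] = {}
    for kind, _ in components:
        counts[kind] = counts.get(kind, 0) + 1
    return counts


raw = "# Withdraw Cash\nActor: Customer\n- Insert card\n- Take cash\n"
comps = MarkdownPSE().extract(raw)
print(comps)
print(count_kinds(comps))  # {'heading': 1, 'paragraph': 1, 'list_item': 2}
```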
Performance
Can we extract logical structures from real-world documents?
Document Set
43 documents: 24 from 3 companies, 11 from public sources, 6 student projects
Length: 2,000 to 23,000 words
Content: use cases, data objects, business rules, functional reqs, non-functional reqs, …
Template Development
1) Write template manually
2) Verify extraction with ET
3) Refine template
[Figure: a UC template written from instance UC1, verified with ET, and refined when instance UC2 is not recognized.]
Results
36 logical structures (use cases, data objects, business rules, …); template sizes from 3 to 52 LOC; 942 instances in total
Nearly all instances perfectly recognized:
100% recall for 33 templates; over 80% for the remaining 3
100% precision for 35 templates; 87% for the remaining 1
Error causes: severe formatting problems (e.g., manual line breaks) and forgotten ids
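For reference, precision is the share of extracted instances that are correct, and recall is the share of actual instances that were extracted. A minimal helper, run on hypothetical counts rather than figures from the evaluation:

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    extracted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Convention: with nothing extracted (or nothing to find), report 1.0.
    precision = true_positives / extracted if extracted else 1.0
    recall = true_positives / actual if actual else 1.0
    return precision, recall


# Hypothetical counts for one template, not data from the study.
print(precision_recall(true_positives=8, false_positives=2, false_negatives=1))
# precision 0.8, recall about 0.89
```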
Other Questions
Amount and kind of template change in refinement: 1% to 25% of LOC affected; 81% of changes concern optionality (add ‘?’ or component)
Number of iterations: from 1 instance (11 cases) to 50% of all instances (6 cases), e.g., 10 out of 20 (2 cases); mostly simple edits, such as adding ‘?’
Implication: start with a few examples, then edit the template based on expert knowledge (e.g., add ‘?’)
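Why a single ‘?’ matters can be shown with a toy template. The syntax here is an assumption (a trailing ‘?’ marks a component optional), and `parse_template` and `matches` are hypothetical helpers, not ET's template language:

```python
from typing import Dict, List, Set


def parse_template(spec: List[str]) -> Dict[str, bool]:
    """Toy syntax (assumed): a trailing '?' marks a component optional.
    Returns component name -> required?"""
    return {c.rstrip("?"): not c.endswith("?") for c in spec}


def matches(template: Dict[str, bool], instance_components: Set[str]) -> bool:
    """An instance matches if it has every required component."""
    return all(name in instance_components for name, req in template.items() if req)


instance = {"name", "main flow"}  # this instance has no actor
strict = parse_template(["name", "actor", "main flow"])
relaxed = parse_template(["name", "actor?", "main flow"])
print(matches(strict, instance), matches(relaxed, instance))  # False True
```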
Related Work
Import to req mgmt tools: tools prescribe document structure; fine-grained extraction requires manual markup
Wrapper induction: targets machine-generated docs (web pages); induced regexes are not human-readable (no modeling language)
Natural language processing: can benefit from structure-induced semantic tags
Future: Template by Example
1) Mark up sample document (UC1)
2) Extract template with TE
3) Verify extraction with ET (UC2) and refine template
Summary
ET – Design
[Figure: spec docs → PSE → physical components → LSE with the UC template → logical components (functional reqs, business rules, use cases), which are then imported, traced, checked for conformance, or queried with QT.]
Application scenarios: import, tracing, conformance, query
Template development
Evaluation results: nearly all instances perfectly recognized in 43 real-world documents
Technology available at http://gensoftech.com/IntelligentET