Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns
David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and K. Vijay-Shanker
Presented By: Paul Heintzelman

Post on 19-Dec-2015


Global Concepts

• Concept assignment problem

• Hybrid of structural and natural language information

• Concern Comprehension

• Action-oriented relations between identifiers
  – Represented by the action-oriented identifier graph model (AOIG)

Why Action-Oriented Concerns

• In OOP
  – Code is organized by objects
    • Objects are nouns
• Objects and actions conflict
  – Code organized by objects causes actions to be scattered

• Therefore in OOP action-oriented concerns tend to be scattered and more difficult to locate

Paper Contributions

• AOIG
  – Interactive query expansion algorithm
  – A result graph construction algorithm
  – An Eclipse plug-in
• Evaluation
  – Comparison of search effectiveness of tools
  – Per-task analysis
  – Comparison of user effort

AOIG

State of the Art

• Search-based approaches
  – Lexical searches
    • Lead to over-generalized searches
  – Information retrieval
    • Does not separate verbs and objects
    • Uses word frequency
• Program navigation
  – Uses structural information, e.g. call and inheritance graphs
  – Accurate but difficult to seed
• Dynamic approaches
  – Require a test case to enact the concept

Challenges

• Map high-level concepts to queries
  – Aid user in mapping concepts
• Inability to search with high precision and recall
  – Search NLP representation of concern
• Understanding large result sets
  – Return results in an explorable graph

Overview of Approach

• User formulates a query
  – Query must include verb-direct object pairings
• User expands query
  – Recommendations based on query words and source code
• Searches the AOIG
  – Interact with result graph

Independent Variables

• Search Tools
  – Find-Concept
  – ELex: built-in Eclipse search
  – GES: Google Eclipse Search (modified)
• Search Tasks
  – Application-concept pairing
• Human Subjects
  – 13 professional programmers
  – 5 grad students

Application Concept Pairing

• Applications
  – 4 large open source Java projects
    • 9 concepts taken from bug reports
  – 1 training application
    • 2 concepts

Forming the Initial Query

• User generates abstract initial query
  – e.g. “automatically finish the word”
• User decomposes abstract query into verb-direct object pairs
  – e.g. “finish” and “word”
• Find-Concept maintains both a verb query and a direct object query
• Initial query expansion
  – User is presented with alternative forms of words in both queries

Query Expansion

• Iterative steps
  – Generate recommended list
    • Similar semantics is weighted more heavily than similar use
    • 10 ranked recommendations
  – User examines recommendations
    • User selects words to add to queries
    • User can view a list of methods fitting the current queries
• Stop when user is satisfied
  – Augment user query with get, set, execute, construct
• Use AOIG to map verb-direct object pairs to source code
  – Generate result graph
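The final two steps (augmenting the query, then mapping verb-direct object pairs to source code and building a result graph) can be sketched roughly as follows. This is a minimal illustration and not the Find-Concept implementation: the AOIG fragment, the method names, and the call edges are all hypothetical, and it assumes the get/set/execute/construct augmentation applies to the verb query and that result-graph edges come from call relationships between matched methods.

```python
# Hypothetical AOIG fragment: (verb, direct object) -> methods that use the pair.
AOIG = {
    ("complete", "word"): ["Editor.completeWord"],
    ("finish", "word"): ["AutoComplete.finishWord"],
    ("get", "word"): ["Buffer.getWord"],
    ("save", "file"): ["Buffer.saveFile"],
}

# Hypothetical call edges, used only to connect matched methods.
CALLS = {
    "Editor.completeWord": ["AutoComplete.finishWord", "Buffer.saveFile"],
    "AutoComplete.finishWord": ["Buffer.getWord"],
}

# Verbs named on the slide; applying them to the verb query is an assumption.
AUGMENT_VERBS = {"get", "set", "execute", "construct"}


def search(verb_query, object_query):
    """Return methods whose verb and direct object both match the queries."""
    verbs = set(verb_query) | AUGMENT_VERBS
    objects = set(object_query)
    matches = set()
    for (verb, obj), methods in AOIG.items():
        if verb in verbs and obj in objects:
            matches.update(methods)
    return matches


def result_graph(matches):
    """Keep only call edges whose two endpoints are both matched methods."""
    return {
        caller: sorted(c for c in CALLS.get(caller, []) if c in matches)
        for caller in sorted(matches)
    }


matches = search({"finish", "complete"}, {"word"})
graph = result_graph(matches)
```

Here `Buffer.getWord` is matched only because of the augmented verb "get", and `Buffer.saveFile` is excluded because "file" is not in the direct object query.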

Word Recommendation

• Similar semantics
  – Stemming
    • Recommends different forms of words in either list
    • e.g. If “finish” is in the verb query, “finished” will be recommended
  – Synonyms
    • Recommend a word if a synonym exists in either list
    • e.g. Recommend “complete” if “finish” is in list
• Similar use
  – Recommend words that occur near words in either query
  – e.g. Recommend “word” if “complete” is in the verb query and “complete word” is in the AOIG
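The recommendation rules above can be sketched as a small scoring function. This is an illustrative approximation only: the word-form table, synonym table, and verb-direct object pairs are hypothetical toy data (a real tool would use a stemmer, a thesaurus, and the AOIG built from the source code), and the weights 2.0 and 1.0 are assumed values chosen solely so that similar semantics outranks similar use.

```python
from collections import defaultdict

# Hypothetical toy data standing in for a stemmer and a thesaurus.
WORD_FORMS = {"finish": {"finished", "finishing", "finishes"}}
SYNONYMS = {"finish": {"complete", "end"}, "complete": {"finish"}}

# Hypothetical verb-direct object pairs standing in for the AOIG.
AOIG_PAIRS = [("complete", "word"), ("complete", "entry"), ("finish", "task")]

SEMANTIC_WEIGHT = 2.0  # assumed: similar semantics outweighs similar use
USE_WEIGHT = 1.0       # assumed


def recommend(verb_query, object_query, limit=10):
    """Return up to `limit` ranked words not already in either query."""
    known = set(verb_query) | set(object_query)
    scores = defaultdict(float)

    # Similar semantics: other forms of a query word and its synonyms.
    for word in known:
        related = WORD_FORMS.get(word, set()) | SYNONYMS.get(word, set())
        for candidate in related:
            scores[candidate] += SEMANTIC_WEIGHT

    # Similar use: words paired with a query word in the AOIG.
    for verb, obj in AOIG_PAIRS:
        if verb in known:
            scores[obj] += USE_WEIGHT
        if obj in known:
            scores[verb] += USE_WEIGHT

    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(
        (w for w in scores if w not in known),
        key=lambda w: (-scores[w], w),
    )
    return ranked[:limit]


suggestions = recommend(["complete"], [])
```

With this toy data, a verb query containing only “complete” yields the synonym “finish” first, followed by the co-occurring direct objects “entry” and “word”.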

Evolution of a Query

Result Graph

Find-Concept Process

Research Questions

• Which search tool is most effective at locating concerns by forming and executing a query?

• Which search tool requires the least amount of human effort to form an effective query?

Evaluation

• Effectiveness
  – Use the harmonic mean of precision and recall (f-measure)
    • (2 * precision * recall)/(precision + recall)
  – Result set is compared to evaluation set
    • Evaluation set is 90% generated by a member unfamiliar with the work of this paper
• Effort
  – Measured amount of time required to form each query
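As a worked illustration of the effectiveness metric, the f-measure of a result set against an evaluation set can be computed as below. The function name and the example method names are hypothetical; only the formula comes from the slide.

```python
def f_measure(result_set, evaluation_set):
    """Harmonic mean of precision and recall for a set of search results."""
    results, expected = set(result_set), set(evaluation_set)
    hits = len(results & expected)
    if hits == 0:
        # No overlap (or an empty set): define the score as 0 to avoid 0/0.
        return 0.0
    precision = hits / len(results)
    recall = hits / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 methods returned, 3 expected, 2 in common.
score = f_measure(
    {"completeWord", "finishWord", "saveFile", "openFile"},
    {"completeWord", "finishWord", "getWord"},
)
```

In this example precision is 2/4 and recall is 2/3, so the f-measure is 4/7, roughly 0.57.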

Experimental Setup

• Training
  – Subjects are guided through the use of each tool on the two training tasks
• Task setup
  – Users are presented concepts in a visual form
  – Users confirm that they understood each task

Experimental Procedure

• 9 tasks
• 18 programmers
• 6 groups
• 6 of every task-tool combination
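These counts are consistent if every programmer performs each task once: 18 programmers × 9 tasks gives 162 queries, and spreading them over 9 × 3 = 27 task-tool combinations gives 6 per combination. The sketch below checks that arithmetic under two assumptions not stated on the slide: groups are of equal size (3 programmers each), and tools rotate across groups in a simple round-robin. The actual assignment used in the study is not given here.

```python
from collections import Counter
from itertools import product

TOOLS = ["Find-Concept", "ELex", "GES"]
N_TASKS, N_GROUPS, N_SUBJECTS = 9, 6, 18
GROUP_SIZE = N_SUBJECTS // N_GROUPS  # assumed equal groups: 3 programmers each


def tool_for(group, task):
    """Hypothetical rotation: shift the tool by one for each group and task."""
    return TOOLS[(group + task) % len(TOOLS)]


# Count how many programmers perform each (task, tool) combination.
counts = Counter()
for group, task in product(range(N_GROUPS), range(N_TASKS)):
    counts[(task, tool_for(group, task))] += GROUP_SIZE
```

Under this rotation every one of the 27 combinations is covered by exactly two groups, that is, 6 programmers.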

Results

• Find-Concept vs. ELex
  – Consistently outperformed ELex
• Find-Concept vs. GES
  – Outperformed GES on 4 tasks
  – Outperformed by GES on 2 tasks
    • AOIG to blame?
  – Performed equally to GES on 3 tasks

Effectiveness

Effort

• Human Effort was very similar with all tools

Threats to Validity

• The selected tasks favored one tool
  – Concerns selected from bug reports
• Evaluation sets created for evaluation
  – 90% generated by member unfamiliar with work
• Results may not generalize to all Java applications
  – Tested on reasonably-sized applications

• Results may not generalize to all types of concepts

Conclusion

• Interactive query expansion algorithm

• Graph construction algorithm

• Find-Concept performs well against state of the art tools

• All evaluated tools required similar human effort

Future Work

• Create a more effective AOIGBuilder

• Evaluate the effect of application’s quality and size on results

• Evaluate the effect of incorporating naming conventions

• Perform a study on how many tasks focus on actions

• Automate query expansion

Additional threats to validity

• Effort and Effectiveness are not really independent

• Relies heavily on unjustified heuristic
  – Augmenting query

• Search tools are often used in conjunction with structural tools