
Software Engineering in Robotics
Systems Evaluation and Benchmarking

Henrik I. Christensen – [email protected]

Outline
- Introduction
- Design of a system model
- Definition of performance / metrics
- Systems evaluation
- Tools to assist in testing
- Summary

Introduction
- It is well known that systems verification is hard
- What are ways through which we can ease the task?
- Structured testing is essential, not a post-hoc task:
  - Unit testing
  - System testing
  - Systems integration tests
  - Systems delivery tests
  - Acceptance/reference-based benchmarking
- Testing should be considered from the start:
  - Can you do early verification in simulation?
  - How can each module be tested independently?
  - Is it possible to generate reference data for verification?

Unit testing
- Testing of the smallest possible unit to verify function
- In procedural programming this could be a function
- Define a "contract" for units and embed code to ensure performance
- Enables verification of code after refactoring
- Visual Studio has special classes/namespaces for unit testing

Small Example

interface Adder {
    int add(int a, int b);
}

class AdderImpl : Adder {
    int Adder.add(int a, int b) {
        return a + b;
    }
}

public class TestAdder {
    public void testSum() {
        Adder adder = new AdderImpl();
        Assert(adder.add(1, 1) == 2);
        Assert(adder.add(1, 2) == 3);
        Assert(adder.add(2, 2) == 4);
    }
}
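The "contract" idea from the previous slide can also be embedded in the unit itself. A minimal sketch in Python (the function and the assertion messages are invented for illustration, not from the slides): preconditions and a postcondition are checked with assertions, so a refactoring that breaks the contract fails immediately.

```python
def add(a: int, b: int) -> int:
    """Adder with an embedded run-time contract."""
    # Precondition: the contract only covers integer operands.
    assert isinstance(a, int) and isinstance(b, int), "precondition violated"
    result = a + b
    # Postcondition: subtracting one operand must recover the other.
    assert result - b == a, "postcondition violated"
    return result
```

Contract checks of this kind complement, rather than replace, an external test class such as TestAdder above.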

nUnit
- There are a number of unit testing tools available; jUnit is used for Java and nUnit for .NET frameworks.
- Example based on .NET/C#:

using NUnit.Framework;

[TestFixture]
public class ExampleTestOfNUnit {
    [Test]
    public void TestMultiplication() {
        Assert.AreEqual(4, 2*2, "Multiplication");
        // Equivalently, since version 2.4 NUnit offers a new and
        // more intuitive assertion syntax based on constraint objects
        // [http://www.nunit.org/index.php?p=constraintModel&r=2.4.7]:
        Assert.That(2*2, Is.EqualTo(4), "Multiplication constraint-based");
    }
}
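jUnit and nUnit have counterparts in most languages. As a hedged illustration, the same adder test expressed with Python's standard-library unittest module (class and method names are invented here to mirror the C# example):

```python
import unittest

class Adder:
    """Unit under test, mirroring the AdderImpl example above."""
    def add(self, a, b):
        return a + b

class TestAdder(unittest.TestCase):
    def test_sum(self):
        adder = Adder()
        # Each assertion pins down one point of the unit's contract.
        self.assertEqual(adder.add(1, 1), 2)
        self.assertEqual(adder.add(1, 2), 3)
        self.assertEqual(adder.add(2, 2), 4)
```

Running `python -m unittest` in the containing package discovers and executes such tests, much as the NUnit console runner does for .NET assemblies.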


Integration Testing
- Integration testing is about evaluating interfaces to make sure the integrated system has the desired functionality
- Several different models:
  - Big-Bang testing: put it all together and see what happens
  - Use-case modelling:
    - Verify the different use cases for module interaction
    - What interactions / interfaces generate what actions?
    - Consider full coverage of module/class interaction states
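One way to make the use-case model concrete is a test that exercises a single interaction across a module boundary. The sketch below is invented for illustration (Planner, FakeRangeSensor, and the 0.5 m threshold are not from the slides): the sensor module is replaced by a scripted stand-in so the planner/sensor interface can be verified for one use case in isolation.

```python
class FakeRangeSensor:
    """Stand-in for the real sensor module, replaying scripted readings."""
    def __init__(self, readings):
        self._readings = list(readings)

    def read(self):
        return self._readings.pop(0)

class Planner:
    """Module under test: commands a stop when an obstacle is too close."""
    def __init__(self, sensor, stop_distance=0.5):
        self.sensor = sensor
        self.stop_distance = stop_distance

    def step(self):
        distance = self.sensor.read()
        return "stop" if distance < self.stop_distance else "go"

def test_planner_stops_near_obstacle():
    # Use case: far obstacle keeps the robot moving, close obstacle stops it.
    planner = Planner(FakeRangeSensor([2.0, 0.3]))
    assert planner.step() == "go"
    assert planner.step() == "stop"
```

Each use case in the interaction model gets one such test; full coverage of the module interfaces is the sum of these tests.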

System Testing
- A widely studied problem: ensuring that a system to be delivered to a "customer" satisfies a broad set of requirements
- IEEE Standard 829 focuses exclusively on software test documentation; initial version 1998, revised in 2008

Tests to consider:
- GUI software testing
- Usability testing
- Performance testing
- Compatibility testing
- Error handling testing
- Load testing
- Volume testing
- Stress testing
- Security testing
- Scalability testing
- Sanity testing
- Smoke testing
- Exploratory testing
- Ad hoc testing
- Regression testing
- Reliability testing
- Installation testing
- Maintenance testing
- Recovery testing and failover testing
- Accessibility testing

In testing, think about engagement:
- What are the critical components?
- What are the support components?
- Focus resources correspondingly

[www.implementingscrum.com]

Simulation
- Given the ease of moving between simulation and real systems, consider use of simulation for early testing
- Use abstract interfaces to ease use of models:
  - Differential drive robots
  - Web cams
  - Range scanners
  - Odometry
  - …
- The risk is minimal, and it allows early verification at minimum cost
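An abstract interface of the kind listed above might be sketched as follows (class names and the canned readings are invented for this sketch). Application code written against `RangeScanner` runs unchanged whether the backend is a simulator or real hardware.

```python
from abc import ABC, abstractmethod

class RangeScanner(ABC):
    """Abstract interface: callers cannot tell simulation from hardware."""
    @abstractmethod
    def scan(self):
        """Return a list of range readings in metres."""

class SimulatedScanner(RangeScanner):
    """Simulation backend: returns canned data for early testing."""
    def scan(self):
        return [1.0, 1.2, 0.8]

def nearest_obstacle(scanner: RangeScanner):
    # Application code depends only on the abstract interface.
    return min(scanner.scan())
```

Swapping in a hardware-backed implementation of `RangeScanner` later requires no change to `nearest_obstacle` or any other caller.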


Simulation systems
- RDS Simulation Engine with PhysX
- USARSim with Unreal Tournament
- ROS Gazebo, derived from the Player/Stage/Gazebo project
- GraspIt! for grasp simulation
- There are also a number of commercial systems:
  - KUKASim
  - V-REP (Virtual Reality Experimentation Platform)
  - …

Think about how you can design a system:
- What is a good underlying model for your system?
- Can you provide a performance model?
- What are good parametric tests to verify performance?

System / Project Objectives
- Hypothesis formulation
- Construction of a system
- Verification of work
- Reporting

Research hypothesis
- Definition of a well-formed objective that can be subjected to testing using standard scientific methods (optimality, existence, …)
- Examples:
  - "Integration of behaviours using multi-objective decision making is Pareto optimal"
  - "Integration of multiple cues improves robustness"

Research Approach
Inspired by Marr (1981):
1) Formulation of theory for the problem
   - Definition of the mathematical basis
2) Formulation of algorithm for the theory
   - Design of algorithm & data structures using standard methods (space/time efficiency)
3) Implementation of the algorithm
   - Transfer to a computational platform (data types, …)

Verification
- Benchmarking of systems
- Use of standard datasets or definition of reference data/scenarios
- Use of standard methods for hypothesis testing, e.g. a χ² test or similar
- Empirical testing using real-world data
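As a toy sketch of such a hypothesis test (all numbers invented, not from the slides): compare a detector's measured success counts against the rate the hypothesis predicts, using Pearson's χ² goodness-of-fit statistic. Only the standard library is used, with the critical value hard-coded from a χ² table.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic over matching category counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented example: 100 trials, hypothesis predicts a 90% success rate.
observed = [87, 13]        # [successes, failures] actually measured
expected = [90.0, 10.0]    # counts predicted by the hypothesis

stat = chi_square_statistic(observed, expected)

# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_1DF_5PCT = 3.841
reject = stat > CRITICAL_1DF_5PCT
```

Here the statistic stays below the critical value, so the hypothesis is not rejected at the 5% level; with real data the counts would come from the benchmark runs.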

Example: Estimation of structure
- Estimation of the size of a junction of an object
- Hypothesis: "The size of a junction can be estimated without any calibration of the camera and through use of qualitative control"

Observation
- Line length is unstable (end-points are uncertain)
- Orientation is a line property, not a point property
- Use of orientation is therefore preferable

Theory
- Fixated camera
- [Trigonometric expressions relating the apparent junction angle to the geometry of the fixated camera; not recoverable from this transcript]

Structure of apparent angle


Implementation
- Camera system to look at objects
- Regular experimental C code with simple image processing
- Not quality software by any measure

Algorithm - qualitative control


Images


Evaluation
- Experiments carried out on >100 test objects
- About 40,000 images processed
- Accuracy of estimation ~1 deg
- Hypothesis verified using theory and empirical tests

Benchmarking: 10 Major Objections
1. Evaluation is task dependent
2. The module is part of a 'system'
3. Vision/robotics is too complex
4. The models/assumptions used are wrong or incorrect
5. Metrics are not comparable
6. Theory is not available for many well-known methods
7. There are too many parameters
8. Ground truth is expensive to obtain
9. Simulations cannot replace real experiments
10. Benchmarking is not acknowledged

Cultural differences
- Cowboy research: it 'works', why bother with a theoretical analysis?
- Puritan research: the proof is in the theory!
- Scientific research: validation across laboratories/researchers

Benchmarking Example
- The Virtual Manufacturing Challenge (www.cma-competition.com)
  - AGV navigation
  - Mixed palletizing
- Datasets:
  - Reference factory layout for navigation and control
  - Reference order data from a bottling plant and distribution centers
- The challenge: can you beat the industry standards?
- Executed each year at ICRA, with competition from across the world

Virtual Manufacturing Challenge

Consider use of reference data sets
- There are a number of reference datasets out there for comparative performance
- Navigation:
  - Radish: Robot Data Repository, http://radish.sourceforge.net/
  - University of Freiburg repository, http://kaspar.informatik.uni-freiburg.de/~slamEvaluation/datasets.php
  - Amsterdam navigation dataset (annotated with ground truth), http://www2.science.uva.nl/sites/cogniron

- Computer Vision:
  - Caltech 101 (object recognition)
  - LabelMe, a dataset from MIT CSAIL
  - Indoor Scene Recognition Database (CSAIL)
  - Scene Categorization Dataset, http://categorizingplaces.com/dataset.html
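For the navigation datasets annotated with ground truth, comparative performance often reduces to a trajectory error metric. A minimal sketch (the poses below are invented; real logs first need timestamp association and frame alignment): translational RMSE between an estimated trajectory and the reference.

```python
import math

def translational_rmse(estimated, ground_truth):
    """Root-mean-square Euclidean error between paired 2-D positions."""
    errors = [
        (ex - gx) ** 2 + (ey - gy) ** 2
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(errors) / len(errors))

# Invented three-pose example: estimate drifts slightly off the reference.
estimate = [(0.0, 0.0), (1.1, 0.0), (2.0, 0.1)]
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
rmse = translational_rmse(estimate, reference)
```

Because the same metric can be computed for every system against the same reference data, results become comparable across laboratories, which is the point of the benchmarking argument above.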

Summary
- Testing should be pervasive, not an afterthought
- Think about testing from units to systems
- What are good models for your system?
- Use cases for interaction between modules
- Well-defined module interfaces / performance metrics
- Can you characterize performance quantitatively?
- Consider using SW tools, from support libraries to simulators
- Consider use of gold-standard datasets
- Testing requires a serious amount of resources

Acknowledgement
This series of lectures has been developed with generous support from the Microsoft Corporation as part of the project "Software Engineering in Robotics", Contract #113873. The support is gratefully acknowledged.