7. Evaluation of interactive systems

HUMAN COMPUTER INTERACTION

(INTERACTION HOMME-MACHINE)

Sethserey SAM

CHAPTER 7: EVALUATION OF INTERACTIVE SYSTEMS

The software life cycle also concerns user interaction

EVALUATION OF INTERACTION

Why evaluate?
o The designer's intuition is not sufficient
o Formal modelling of the system and of the interaction does not cover all the design choices
o Recommendations (guidelines) and best practices remain safeguards, but they are too general to cover every aspect of a specific interaction

The software life cycle applies equally to the interaction
o Spiral life cycle with prototyping
o Evaluation at every step of development


How to evaluate?
o With users: experiments
o Without users: a priori evaluation


Evaluation paradigms

o A priori / heuristic evaluation
  o A priori evaluation: expert review, cognitive walkthrough, …
  o Predictive models: Fitts's Law, Keystroke model, etc.

o Experimental evaluation
  o Subjective evaluation
  o Usability testing with potential users
  o Acceptability testing with a sample population
  o Post-release (or test version) evaluation
  o Cognitive experiments

A PRIORI EVALUATION

Predictive models
o GOMS, Keystroke, Fitts, … (cf. chapter VI)

Heuristic evaluation (Nielsen & Mack, 1994)
o One or several experts review the system by simulating its usage
o Validation against ergonomic heuristics (cf. Nielsen's heuristics, chapter II)
o Applied to screen and interaction specifications (a priori evaluation), or to an existing system or prototype

Cognitive walkthrough
o Usability inspection method used to identify usability problems
o Focuses on how easy it is for new users to accomplish tasks with the system
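As a reminder of what a predictive model computes, Fitts's law (cf. chapter VI) estimates pointing time from the distance to a target and its width. The sketch below assumes the common Shannon formulation MT = a + b · log2(D/W + 1). The function name and the coefficients a and b are illustrative assumptions: in practice they are fitted to measurements for a given device and user population, and the course may use a different formulation.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds (Shannon formulation of Fitts's law).

    a (seconds) and b (seconds per bit) are placeholder coefficients,
    not values taken from the course.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# Same distance, two target widths: the narrower target is predicted slower.
for width in (20, 10):
    t = fitts_movement_time(distance=300, width=width)
    print(f"width={width:>2} px  predicted time={t:.3f} s")
```

Such a prediction needs no running system and no users, which is why it belongs to a priori evaluation.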

Cognitive walkthrough

1. Specify the intended users and the system to be developed, using screen flows (business flows)

2. A priori evaluation by experts, in the presence of the designer

3. The evaluators walk through the screens, simulating the execution of tasks according to a credible scenario. They evaluate:
   o whether the action to perform is evident to the user
   o whether users can easily perceive that the action to perform is available
   o whether users can see the result of the action and interpret it correctly

4. Critical review of the evaluation with the designer

5. Synthesis document

(Nielsen & Mack, 1994) (Spencer, 2000)

EXPERIMENTAL EVALUATION

Laboratory usability testing
o A room fitted with all the equipment needed to observe a user working or interacting with the system
o Observers sit close to the subject, or are hidden (in an adjoining room)
o Video, sound and log files are recorded
o The subject describes their experience live (think aloud, or cooperative evaluation with the observers) or after carrying out the task

Example: IBM (Boca Raton, Florida), Microsoft, Sun, …

Field studies
o More ecologically valid conditions than laboratory usability testing

Limitations
o Evaluation most often takes place during early deployment: no tracking of learning over time
o Does not allow broad coverage of functionalities

SUBJECTIVE EVALUATION

Principle: opinion after use

1. A session in which a subject uses the system, following a clearly defined task or scenario

2. Questioning of the subjects to collect their opinions

Different techniques

o Open or directed (guided) interview (answers to predefined questions)

o Questionnaire: rating scales on specific points/issues


Open interview
o The subject can raise points that the designer has not noticed or has not yet considered
o Lack of homogeneity of opinions, variable precision: synthesis is difficult

Directed (guided) interview
o Open or closed questions probing precise opinions
o Structures the evaluation: easier to analyze

Have you ever reserved a hotel online? □ yes □ no

Does this functionality seem interesting to you? □ yes □ no

Can you easily complete the hotel reservation? □ yes □ no

Does this take you much time? □ yes □ no


Semi-structured interview (Nielsen et al., 1986)

o Why do you do this?
  to learn the user's objective
o How do you do it?
  to retrieve the sub-tasks, to which the questions are applied recursively
o Why do you do it in this manner?
  to learn the user's choices
o What are the preconditions for doing this?
  to evaluate whether the user understands the conditions for starting the action
o What are the results of doing this?
o Do errors ever occur when doing this?
o How do you discover and correct these errors?


Questionnaire
o Users sometimes have difficulty giving a clear-cut opinion
o Subjective evaluation on a multi-valued interval/scale: a Likert scale or a preference scale

Examples
o QUIS (Chin et al., 1988)
o IBM Post-Study System Usability Questionnaire (Lewis, 1995)
o Software Usability Measurement Inventory (Kirakowski & Corbett, 1993)

Rate your agreement with the following statements from 1 (poor) to 4 (excellent)

This functionality is interesting □ 1 □ 2 □ 3 □ 4

It is easy to reserve with the system □ 1 □ 2 □ 3 □ 4

The reservation time is acceptable □ 1 □ 2 □ 3 □ 4
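Answers to a scale questionnaire like the one above are usually summarized per statement. The following is a minimal sketch of such a summary; the response values are invented for illustration (they are not data from the course), and the helper name `summarize` is hypothetical.

```python
from collections import Counter
from statistics import fmean, median

# Hypothetical answers on the 1 (poor) .. 4 (excellent) scale, one list per
# statement. These numbers are invented for illustration only.
responses = {
    "This functionality is interesting": [4, 3, 3, 2, 4],
    "It is easy to reserve with the system": [2, 2, 3, 1, 2],
    "The reservation time is acceptable": [3, 4, 3, 3, 2],
}


def summarize(scores, low=1, high=4):
    """Return count, median, mean and answer distribution for one statement."""
    if any(s < low or s > high for s in scores):
        raise ValueError("score outside the questionnaire scale")
    counts = Counter(scores)
    return {
        "n": len(scores),
        "median": median(scores),
        "mean": fmean(scores),
        "distribution": {v: counts.get(v, 0) for v in range(low, high + 1)},
    }


summary = {statement: summarize(scores) for statement, scores in responses.items()}
for statement, s in summary.items():
    print(f"{statement}: median={s['median']}, mean={s['mean']:.1f}, {s['distribution']}")
```

The median and the distribution are reported alongside the mean because scale answers are ordinal, so the mean alone can hide a split opinion.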


QUIS (Questionnaire for User Interaction Satisfaction)

www.lap.umd.edu/QUIS/

o Past experience with the tested system
o Past experience with other systems
o Users' general opinion of the system
o Display
o Terminology used and information provided by the system
o Learnability
o Paper documentation and online help
o Online documentation
o Multimedia
o Teleconferencing and collaborative work
o System installation


Subjective evaluation: which quality criteria? Example: ISO 9241 standard

o Reliability / adequacy for the task: satisfaction scale
o Adaptation to trained users: satisfaction scale for advanced functionalities
o Learnability: scale of perceived ease of learning
o Robustness / error tolerance: satisfaction scale for error handling

OBJECTIVE EVALUATION

Principle: observation after use

1. A session in which a subject uses the system, following a clearly defined task or scenario

2. Observation and/or recording of the session, and examination of the data

3. Data analysis

Different approaches
o Qualitative evaluation
o Quantitative evaluation

EXAMINING THE OBSERVATIONS

Qualitative evaluation
o Search for the most flagrant usage problems: sample cases

Quantitative evaluation
o Calculation of metrics (e.g. % of errors, …) from the observed data
o Analysis of videos
o Transcription and analysis of the subjects' verbalizations
o Analysis of the observer's notes
o Examination of data logs: key press counts, WWW server log files

OBJECTIVE EVALUATION: USABILITY TESTING

Quantitative metrics characterizing the quality of interaction

Examples (Whiteside, Bennett and Holtzblatt, 1988)
o Execution time of a task
o % of tasks completely executed
o Ratio of successful to failed sessions
o Number of errors
o Distribution of the number of errors across subjects
o Time lost on errors
o Number of commands used to accomplish the task
o Frequency of use of help and documentation
o % of positive / negative comments (think aloud)
o Number of repetitions of an erroneous command
o Number of commands invoked but not used
o Number of times the subject was distracted from the task
o Number of times the subject lost control of the system
o Number of times the subject expressed frustration
o …
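Several of the metrics listed above can be computed directly from per-session observation records. The sketch below is one possible way to do it; the `Session` record layout and the sample values are assumptions made for illustration, not a format prescribed by the course.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Session:
    """One observed session (hypothetical record layout)."""
    subject: str
    seconds: float        # time spent on the task
    completed: bool       # task completely executed?
    errors: int           # number of errors observed
    help_requests: int    # uses of help or documentation


# Invented observations, for illustration only.
sessions = [
    Session("S1", 95.0, True, 1, 0),
    Session("S2", 140.0, True, 3, 2),
    Session("S3", 210.0, False, 5, 1),
    Session("S4", 80.0, True, 0, 0),
]


def usability_metrics(sessions):
    """Compute a few of the quantitative metrics from the observed sessions."""
    n = len(sessions)
    completed = [s for s in sessions if s.completed]
    failures = n - len(completed)
    return {
        "mean_time_completed_s": fmean([s.seconds for s in completed]) if completed else None,
        "completion_rate_pct": 100 * len(completed) / n,
        "success_failure_ratio": len(completed) / failures if failures else float("inf"),
        "total_errors": sum(s.errors for s in sessions),
        "errors_by_subject": {s.subject: s.errors for s in sessions},
        "help_requests_per_session": sum(s.help_requests for s in sessions) / n,
    }


metrics = usability_metrics(sessions)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Metrics that depend on interpretation (frustration, loss of control, distraction) still have to be coded by an observer before they can be counted this way.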


Usability testing according to Nielsen (1993)

o Effectiveness: verify that the objectives set by the users are achieved

o Efficiency: evaluate the resources used to achieve these objectives (e.g. time to complete a task)

o Satisfaction: quantify the level of user satisfaction

o Effectiveness: OK if 90% of users pass the test

o Efficiency: OK if 90% of users take less than 3 minutes to accomplish a task

o Satisfaction: OK if less than 10% of users report a problem with the function

ISO 9241-11 standard
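The three example thresholds above (90% pass, 90% under 3 minutes, fewer than 10% reporting a problem) can be checked mechanically once the test results are recorded. A minimal sketch follows; the result layout, the function name and the sample values are assumptions, and counting the time criterion over all users (including those who failed) is one possible interpretation.

```python
def check_thresholds(results, time_limit_s=180, required_pct=90, max_problem_pct=10):
    """Check the three example criteria against per-user test results.

    Each result is a dict with the hypothetical keys
    'passed' (bool), 'seconds' (float) and 'problem' (bool).
    """
    n = len(results)
    if n == 0:
        raise ValueError("no results to evaluate")
    pct_passed = 100 * sum(r["passed"] for r in results) / n
    pct_fast = 100 * sum(r["seconds"] < time_limit_s for r in results) / n
    pct_problem = 100 * sum(r["problem"] for r in results) / n
    return {
        "effectiveness": pct_passed >= required_pct,
        "efficiency": pct_fast >= required_pct,
        "satisfaction": pct_problem < max_problem_pct,  # strictly less than 10%
    }


# Invented results for ten users, for illustration only.
seconds = [120, 150, 95, 170, 160, 130, 175, 110, 200, 140]
results = [
    {"passed": i != 8, "seconds": s, "problem": i == 8}
    for i, s in enumerate(seconds)
]

verdict = check_thresholds(results)
print(verdict)
```

With these invented values one user out of ten reports a problem, so the satisfaction criterion is not met: "less than 10%" is a strict bound.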


Example: ISO 9241-11 standard

o Reliability / adequacy for the task: % of goals achieved; time to complete the task
o Adaptation to trained users: number of advanced features used; efficiency relative to an expert
o Learnability: % of functions learned after practice; time to learn a function
o Robustness / error tolerance: % of errors corrected; time lost on error recovery

OBJECTIVE EVALUATION: ACCEPTANCE TESTING

Principles
o Same principle as the usability test, but the metrics are fixed with intervals of expected success (acceptability)
o Used more often with the final system: requirements specification

Example metrics
o Time (or number of sessions) needed to learn a specific function
o Execution time of a task
o Error rate while performing a task
o Proportion of subjects who succeed within a given time
o Retention time of a learned command
o Results of the subjective evaluation
o …

Example: after 5 hours of use by novices and 15 days of waiting (learning), 50% of the test population must be able to accomplish 75% of the test tasks correctly
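The example criterion above (50% of the test population accomplishing 75% of the tasks correctly) translates into a simple check. This is a sketch under stated assumptions: the function name, the parameters and the sample counts are hypothetical, and "accomplish 75% of tasks" is read as "at least 75%".

```python
def acceptance_met(correct_counts, total_tasks, task_share=0.75, population_share=0.50):
    """Return True if enough subjects completed enough tasks correctly.

    correct_counts: number of tasks done correctly, one entry per subject.
    """
    if not correct_counts or total_tasks <= 0:
        raise ValueError("need at least one subject and one task")
    # Subjects who reached the required share of correct tasks.
    successful = [c for c in correct_counts if c >= task_share * total_tasks]
    return len(successful) >= population_share * len(correct_counts)


# Invented results for a test with 8 tasks, for illustration only.
group_a = [8, 6, 5, 7, 3, 6]   # 4 of 6 subjects reach 6 correct tasks
group_b = [5, 5, 6, 2]         # 1 of 4 subjects reaches 6 correct tasks

print(acceptance_met(group_a, total_tasks=8))
print(acceptance_met(group_b, total_tasks=8))
```

Fixing both thresholds before the test is what distinguishes an acceptance test from an exploratory usability test.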

Define the tasks for the test
o A list of tasks to be executed is given to the subject at the beginning of the experiment
o Choose the proposed tasks carefully, based on what is to be evaluated

The tasks must lead the user to concentrate on the parts of the interface under evaluation

Size the time allotted to each task carefully (objectives derived from requirements analysis, comparison with other existing software, …)

Estimate the average time needed and define an acceptable average proportion of overruns (cf. metrics): inter-individual variability

Ensure that the task statements are clear enough to be understood by a novice or first-time user

EVALUATION PLAN

An evaluation will not provide any results unless it is well prepared

[Basili et al. 1994]
o What are the general objectives of the evaluation?
o What are the specific questions to which we want answers?
o Which evaluation paradigm and test techniques are needed to achieve these objectives?
o How should the evaluation be organized in practice: user recruitment, user preparation, collection tools/devices, …?
o Ensure compliance with the ethical (deontological) rules in force
o How should the collected information be examined, interpreted and presented?

What paradigm of evaluation to use?

o Objective (observation)
  Strengths: problem detection, broad range
  Weaknesses: modifies behavior

o Subjective
  Strengths: less expensive, usage opinion
  Weaknesses: precision, response rate

o Predictive (model)
  Strengths: no system necessary, less expensive
  Weaknesses: limited range

o Predictive (expert)
  Strengths: expertise
  Weaknesses: may miss problems

When do we use a particular paradigm of evaluation?

o Field studies
o Predictive
o Laboratory usability
o Quick and dirty

EVALUATION AND DIVERSITY OF USERS

Sampling the population
o Important for both objective and subjective evaluation
o Characterize the communities of intended users
o Sample the population according to criteria that reflect this characterization (men/women, expert/novice, familiarity with computers, age, socio-professional category, …)

Sample size: 5, 12, 20, 100? (Dumas & Redish, 1999)

Remark: experimental studies vs. “quick and dirty” evaluation

Analyzing the tests
o Multi-criteria analysis: break down the results according to the different characteristics
o Statistical relevance of the results
o A separate discipline: statistics (protocols and tests)
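Sampling the population according to several criteria, as described above, amounts to stratified sampling: candidates are grouped by their characteristics and a few subjects are drawn from each group. The sketch below illustrates the idea; the candidate pool, the chosen criteria (sex and expertise) and the function name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(people, key, per_stratum, seed=0):
    """Draw up to `per_stratum` people at random from each stratum.

    `key` maps a person to the stratum they belong to; the seed makes
    the draw reproducible.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[key(person)].append(person)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical pool of 12 candidates: 3 per (sex, expertise) combination.
pool = [
    {
        "id": i,
        "sex": "F" if i % 2 == 0 else "M",
        "expertise": "novice" if i % 4 < 2 else "expert",
    }
    for i in range(12)
]

subjects = stratified_sample(pool, key=lambda p: (p["sex"], p["expertise"]), per_stratum=2)
for s in subjects:
    print(s)
```

Drawing the same number of subjects per stratum keeps small groups visible in the analysis; proportional allocation is the alternative when the sample must mirror the real user population.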


Statistical analysis of the results

Subject           1    2    3    4    5    6    7    Average
Age               37   41   43   54   46   44   21   40.9
Sex               F    F    M    M    F    F    M    4F, 3M
Education level   4    2    4    4    4    1    2    3.0
PC years          5    2    0    2    6    4    9    4.0
Usage facility    1    2    2    1    2    3    1    1.7
Help quality      1    3    3    1    3    2    2    2.1
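The averages in the table above can be recomputed from the seven subjects' values, and the same data can be broken down by a characteristic (multi-criteria analysis). The sketch below uses the figures from the table; the variable and function names are illustrative, and with only seven subjects the group means are descriptive, not statistically conclusive.

```python
from collections import Counter
from statistics import fmean

# Values copied from the table (subjects 1 to 7).
numeric = {
    "Age": [37, 41, 43, 54, 46, 44, 21],
    "Education level": [4, 2, 4, 4, 4, 1, 2],
    "PC years": [5, 2, 0, 2, 6, 4, 9],
    "Usage facility": [1, 2, 2, 1, 2, 3, 1],
    "Help quality": [1, 3, 3, 1, 3, 2, 2],
}
sex = ["F", "F", "M", "M", "F", "F", "M"]

# Column "Average" of the table, rounded to one decimal.
averages = {name: round(fmean(values), 1) for name, values in numeric.items()}
sex_counts = Counter(sex)


def mean_by_group(values, groups):
    """Mean of `values` within each group label (e.g. per sex)."""
    grouped = {}
    for value, group in zip(values, groups):
        grouped.setdefault(group, []).append(value)
    return {group: fmean(vals) for group, vals in grouped.items()}


usage_by_sex = mean_by_group(numeric["Usage facility"], sex)

print(averages)
print(dict(sex_counts))
print(usage_by_sex)
```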


Example: measuring the quality of an interface
o Learning time and retention of learning
o Speed of execution of a task (benchmark)
o Error rate and error types
o (Subjective) user satisfaction

HCI design = weighing different factors
o Experts: speed of execution takes priority over learning time
o Novices: learning time and error-rate reduction take priority over speed of execution
o Critical systems: reducing errors is the most important
o Industrial systems: learning and execution costs
o …

EVALUATION: BEYOND USERS

o The user is not everything … and is often not the buyer
o Typology of stakeholders in the choice of software

o But the user is still the alpha and omega!

[SESL: Ramage, 1997]
o Users of the software
o Their colleagues and superiors (managers)
o Developers and software resellers
o The IT department of the organization (if applicable)
o The clients of the organization
o Trade unions and employee associations
o The parent company
o Employee associations
o The shareholders
o The government

EVALUATION PLAN: ETHICS (DEONTOLOGY)

o Consent: consent form

o Problem: evaluation on the WWW

Before the session, explain to the subjects:

o The objective of the evaluation and what is expected of the subject

o Which personal information will be requested and recorded: promise anonymity

o That they can stop whenever they want during the session

o The financial conditions of the evaluation, if any (whether or not the subject is paid)

At the end (and only at that moment), confirm the agreement by having the user sign a consent form

BIBLIOGRAPHY

References

Nielsen J. (1993) Usability engineering. Academic Press.

Publications

Chin J., Diehl V., Norman K. (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proceedings of ACM CHI’88 Human Factors in Computing Systems. 213-218.

Dumas J., Redish J. (1999) A practical guide to usability testing. Intellect, Exeter, UK.

Lewis J. (1995) IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1), 57-78.

Kirakowski J., Corbett M. (1993) SUMI: the Software Usability Measurement Inventory. British Journal of Educational Technology, 24(3), 210-212.

Nielsen J., Mack R., Bergendorff K., Grischkowsky N. (1986) Integrated software usage in the professional work environment: evidence from questionnaires and interviews. Proceedings of CHI’86, New York, NY, ACM Press. 162-167.

Nielsen J., Mack R. (Eds.) (1994) Usability inspection methods. John Wiley & Sons, New York, NY.

Whiteside J., Bennett J., Holtzblatt K. (1988) Usability engineering: our experience and evolution. In Helander M. (Ed.) Handbook of Human-Computer Interaction. North-Holland, Amsterdam.