7. Evaluation of interactive systems
HUMAN COMPUTER INTERACTION
(INTERACTION HOMME-MACHINE)
Sethserey SAM
1
CHAPTER 7: EVALUATION OF INTERACTIVE SYSTEMS
2
The software life cycle also concerns user interaction
EVALUATION OF INTERACTION
3
Why evaluate?
o The intuition of the system designer is not sufficient
o Formal modelling of the system and of the interaction does not cover all the design choices
o Recommendations (guidelines) remain safeguards, and best practices are too general to cover all aspects of a specific interaction
The software life cycle equally concerns the interaction
o Spiral life cycle with prototyping
o Evaluation at every step of development
EVALUATION OF INTERACTION
4
How to evaluate?
o With users: experiments
o Without users: a priori
EVALUATION OF INTERACTION
5
Evaluation paradigms
o A priori / heuristic evaluation
  - A priori evaluation: expert review, cognitive walkthrough, …
  - Predictive models: Fitts's Law, Keystroke Model, etc.
o Experimental evaluation
  - Subjective evaluation
  - Usability testing with potential users
  - Acceptability testing with a sample population
  - Post-release (or test-version) evaluation
  - Cognitive experiments
A PRIORI EVALUATION
6
Predictive models
o GOMS, Keystroke, Fitts, … (cf. chapter VI)
Heuristic evaluation (Nielsen & Mack, 1994)
o Review of the system by one or more experts: simulation of usage
o Validation against ergonomic heuristics (cf. Nielsen's heuristics, chapter II)
o Performed on screen and interaction specifications (a priori evaluation), or on an existing system or prototype
Cognitive walkthrough
o Usability inspection method used to identify usability problems
o Focuses on how easy it is for new users to accomplish tasks with the system
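As a rough illustration of the predictive models named on this slide, the sketch below computes a Fitts's Law movement time (Shannon formulation, MT = a + b log2(D/W + 1)) and a Keystroke-Level Model estimate. The coefficients a and b and the operator times are assumed, illustrative values (the operator times are commonly cited approximations), not figures from this chapter; a real study would calibrate them experimentally.

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds: MT = a + b * log2(D/W + 1).

    a and b are device-dependent constants; the defaults here are
    assumptions chosen only for illustration.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return a + b * math.log2(distance / width + 1)

# Assumed operator times in seconds (illustrative approximations):
# K = keystroke, P = point with mouse, H = home hands, M = mental preparation.
KLM_OPERATOR_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35}

def klm_estimate(operators):
    """Sum the operator times for a sequence such as "MPK"."""
    return sum(KLM_OPERATOR_SECONDS[op] for op in operators)

if __name__ == "__main__":
    # A target 70 units away and 10 units wide has an index of difficulty of 3 bits.
    print(fitts_movement_time(70, 10))
    # Think, point at a button, then press one key.
    print(klm_estimate("MPK"))
```

Such estimates allow two candidate designs to be compared before any user is involved, which is the point of a priori evaluation.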
A PRIORI EVALUATION
7
Cognitive walkthrough
1. Specification of the intended users and of the system to develop, using screen flows (business flows)
2. A priori evaluation by experts in the presence of the designer
3. The evaluators walk through the screens, simulating the execution of tasks according to a credible scenario. They evaluate:
  - whether the action to perform is evident to the user
  - whether users can easily perceive that the action is available
  - whether users can see the result of the action and interpret it correctly
4. Critical review of the evaluation with the designer
5. Synthesis document
(Nielsen & Mack, 1994) (Spencer, 2000)
EXPERIMENTAL EVALUATION
8
Usability laboratory
o Room equipped with everything needed to observe a user working or interacting with the system
o Observers close to the subject, or hidden (in an adjoining room)
o Video, sound and log-file recording
o The subject describes the experience live (think aloud, or cooperative evaluation with the observers) or after completing the task
o Examples: IBM (Boca Raton, Florida), Microsoft, Sun, …
Field studies
o More ecologically valid conditions than in a usability laboratory
Limitations
o Evaluation most often takes place during the first period of deployment: no tracking of learning over time
o Does not allow broad coverage of functionalities
SUBJECTIVE EVALUATION
9
Principle: opinion after use
1. A session in which a subject uses the system, following a clearly defined task or scenario
2. Questioning of the subjects to gather their opinions
Different techniques
o Open or guided interview (answers to predefined questions)
o Questionnaire: rating scales on specific points
SUBJECTIVE EVALUATION
10
Open interview
o The subject raises points that the designer had not noticed or had not yet considered
o Lack of homogeneity of opinions, variable precision: difficult to synthesize
Guided interview
o Open or closed questions probing precise opinions
o Structured evaluation: easier to analyze
Example:
Have you ever reserved a hotel online? □ yes □ no
Does this functionality seem interesting to you? □ yes □ no
Can you easily complete the hotel reservation? □ yes □ no
Does this take you much time? □ yes □ no
SUBJECTIVE EVALUATION
11
Semi-structured interview (Nielsen et al., 1986)
o Why do you do this?
  - to learn the user's objective
o How do you do it?
  - to retrieve the sub-tasks, to which the questions are then applied recursively
o Why do you do it in this manner?
  - to learn the user's choices
o What are the preconditions for doing this?
  - to evaluate whether the user understands the conditions for starting the action
o What are the results of doing this?
o Do errors ever occur when doing this?
o How do you discover and correct these errors?
SUBJECTIVE EVALUATION
12
Questionnaire
o Users sometimes have difficulty giving a clear-cut opinion
o Subjective evaluation on a multi-valued scale: a Likert scale or a preference scale
Examples
o QUIS (Chin et al., 1988)
o IBM Post-Study System Usability Questionnaire (Lewis, 1995)
o Software Usability Measurement Inventory (Kirakowski & Corbett, 1993)
Rate from 1 (poor) to 4 (excellent) your agreement with the following statements:
This functionality is interesting □ 1 □ 2 □ 3 □ 4
It is easy to reserve with the system □ 1 □ 2 □ 3 □ 4
The reservation time is acceptable □ 1 □ 2 □ 3 □ 4
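Answers to a scale questionnaire such as the one above are usually summarized per statement. The following is a minimal sketch, assuming a 1 to 4 scale as in the example; the function name, the data layout, and the convention that ratings above the scale midpoint count as "positive" are illustrative assumptions, not part of the slides.

```python
from statistics import mean

def summarize_ratings(responses, scale_min=1, scale_max=4):
    """Summarize scale answers per statement.

    responses maps each statement to the list of ratings given by the
    subjects (hypothetical layout). Returns, for each statement, the
    mean rating and the share of ratings above the scale midpoint.
    """
    midpoint = (scale_min + scale_max) / 2
    summary = {}
    for statement, ratings in responses.items():
        if not ratings:
            raise ValueError(f"no ratings for {statement!r}")
        if any(r < scale_min or r > scale_max for r in ratings):
            raise ValueError(f"rating out of range for {statement!r}")
        positive = sum(1 for r in ratings if r > midpoint)
        summary[statement] = {
            "mean": mean(ratings),
            "share_positive": positive / len(ratings),
        }
    return summary

if __name__ == "__main__":
    # Invented answers from five subjects, for illustration only.
    answers = {
        "This functionality is interesting": [4, 3, 3, 2, 4],
        "It is easy to reserve with the system": [2, 2, 3, 1, 2],
    }
    for statement, stats in summarize_ratings(answers).items():
        print(statement, stats)
```

Reporting the share of positive answers next to the mean helps when opinions are split, since a mean alone hides the distribution.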
SUBJECTIVE EVALUATION
13
QUIS (Questionnaire for User Interaction Satisfaction)
www.lap.umd.edu/QUIS/
o Past experience with the tested system
o Past experience with other systems
o Users' general opinion of the system
o Display
o Terminology used and information provided by the system
o Learnability
o Paper documentation and online help
o Online documentation
o Multimedia
o Teleconferencing and collaborative work
o System installation
SUBJECTIVE EVALUATION
14
Subjective evaluation: which quality criteria? Example: ISO 9241 standard
o Reliability (adequacy to the task): satisfaction scale
o Adaptation to trained users: satisfaction scale for advanced functionalities
o Learnability: scale of perceived ease of learning
o Robustness (tolerance to errors): satisfaction scale for error management
OBJECTIVE EVALUATION
15
Principle: observation after use
1. A session in which a subject uses the system, following a clearly defined task or scenario
2. Observation and/or recording of the session, and examination of the data
3. Data analysis
Different approaches
o Qualitative evaluation
o Quantitative evaluation
EXAMINE THE OBSERVATION
16
Qualitative evaluation
o Search for the most flagrant usage problems: sample cases
Quantitative evaluation
o Calculation of metrics (e.g. % of errors, …) from the observed data
o Analysis of videos
o Transcription and analysis of the subjects' verbalizations
o Analysis of the observer's notes
o Examination of data logs: key-press counts, WWW server log files
OBJECTIVE EVALUATION: TEST OF USABILITY
17
Quantitative metrics characterizing the quality of interaction
Examples (Whiteside, Bennett and Holtzblatt, 1988)
o Execution time of a task
o % of tasks completely executed
o Ratio of successful to failed sessions
o Number of errors
o Distribution of the number of errors across subjects
o Time lost on errors
o Number of commands used to accomplish the task
o Frequency of use of help and documentation
o % of positive / negative comments (think aloud)
o Number of repetitions of an erroneous command
o Number of commands invoked but not used
o Number of times the subject was distracted from the task
o Number of times the subject lost control of the system
o Number of times the subject expressed frustration
o …
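Several of the metrics listed above can be computed directly from per-session observations. The sketch below assumes a hypothetical `Session` record filled in by the observer or extracted from a log file; the field names are illustrative and only a few of the listed metrics are covered.

```python
from dataclasses import dataclass

@dataclass
class Session:
    """One observed test session (hypothetical record layout)."""
    subject: str
    duration_s: float    # execution time of the task, in seconds
    completed: bool      # was the task completely executed?
    errors: int          # number of errors observed
    help_requests: int   # uses of help or documentation

def usability_metrics(sessions):
    """Aggregate a few quantitative metrics over all observed sessions."""
    n = len(sessions)
    if n == 0:
        raise ValueError("at least one session is required")
    return {
        "mean_duration_s": sum(s.duration_s for s in sessions) / n,
        "completion_rate": sum(1 for s in sessions if s.completed) / n,
        "total_errors": sum(s.errors for s in sessions),
        "errors_per_subject": {s.subject: s.errors for s in sessions},
        "help_requests_per_session": sum(s.help_requests for s in sessions) / n,
    }

if __name__ == "__main__":
    # Invented observations, for illustration only.
    observed = [
        Session("s1", 120.0, True, 1, 0),
        Session("s2", 200.0, False, 3, 2),
    ]
    print(usability_metrics(observed))
```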
OBJECTIVE EVALUATION: TEST OF USABILITY
18
Usability testing according to Nielsen (1993)
o Effectiveness: verify that the objectives set by users are achieved
o Efficiency: evaluate the resources used to achieve these objectives (e.g. time to complete a task)
o Satisfaction: quantify the level of user satisfaction
Example targets
o Effectiveness: OK if 90% of users pass the test
o Efficiency: OK if 90% of users take less than 3 minutes to accomplish a task
o Satisfaction: OK if fewer than 10% of users report a problem with the function
Standard: ISO 9241-11
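The three example targets on this slide (90% of users pass, 90% finish in under 3 minutes, fewer than 10% report a problem) can be checked mechanically once the test results are recorded. This is a minimal sketch; the record keys `passed`, `time_s` and `reported_problem` are hypothetical names, and the thresholds are simply the slide's example values exposed as parameters.

```python
def meets_usability_targets(results, time_limit_s=180,
                            min_pass_share=0.90,
                            min_fast_share=0.90,
                            max_problem_share=0.10):
    """Check effectiveness, efficiency and satisfaction targets.

    results is a list of dicts with the (assumed) keys
    "passed" (bool), "time_s" (float) and "reported_problem" (bool).
    """
    n = len(results)
    if n == 0:
        raise ValueError("at least one test result is required")
    pass_share = sum(1 for r in results if r["passed"]) / n
    fast_share = sum(1 for r in results if r["time_s"] < time_limit_s) / n
    problem_share = sum(1 for r in results if r["reported_problem"]) / n
    return {
        "effectiveness": pass_share >= min_pass_share,
        "efficiency": fast_share >= min_fast_share,
        "satisfaction": problem_share < max_problem_share,
    }

if __name__ == "__main__":
    # Ten invented users: nine pass quickly, one fails slowly and complains.
    users = [{"passed": True, "time_s": 100.0, "reported_problem": False}] * 9
    users.append({"passed": False, "time_s": 240.0, "reported_problem": True})
    print(meets_usability_targets(users))
```

Note that the satisfaction target is a strict "fewer than", so exactly 10% of users reporting a problem does not meet it.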
OBJECTIVE EVALUATION: TEST OF USABILITY
19
Example: ISO 9241-11 standard
o Reliability (adequacy to the task)
  - % of goals achieved
  - Time to complete the task
o Adaptation to trained users
  - Number of advanced features used
  - Efficiency relative to an expert
o Learnability
  - % of functions learned after practice
  - Time to learn a function
o Robustness (tolerance to errors)
  - % of errors corrected
  - Time lost on error recovery
OBJECTIVE EVALUATION: ACCEPTANCE TEST
20
Principles
o Same principle as the usability test, but the metrics are fixed with intervals of expected success (acceptability)
o Used more often with the final system: requirement specification
Example metrics
o Time (or number of sessions) needed to learn a specific function
o Execution time of a task
o Error rate while performing a task
o Proportion of subjects succeeding within a given time
o Retention time of a learned command
o Results of the subjective evaluation
o …
Example: after 5 hours of use by novices and 15 days of waiting (learning), 50% of the test population must be able to accomplish 75% of the test tasks correctly
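An acceptance criterion of this form (a given share of the population must complete a given share of the tasks) reduces to two nested proportions. The sketch below is one possible reading of the example; the function name and data layout are assumptions, and the default thresholds are the 75% and 50% quoted above.

```python
def acceptance_test_passed(task_results, task_threshold=0.75,
                           population_threshold=0.50):
    """Return True if enough subjects completed enough tasks correctly.

    task_results maps each subject to a list of booleans, one per test
    task (True = accomplished correctly). Layout is hypothetical.
    """
    if not task_results:
        raise ValueError("at least one subject is required")
    qualifying = 0
    for subject, outcomes in task_results.items():
        if not outcomes:
            raise ValueError(f"no task outcomes for {subject!r}")
        # Share of tasks this subject accomplished correctly.
        if sum(outcomes) / len(outcomes) >= task_threshold:
            qualifying += 1
    return qualifying / len(task_results) >= population_threshold

if __name__ == "__main__":
    # Invented outcomes for four novices on four tasks.
    outcomes = {
        "n1": [True, True, True, False],
        "n2": [True, True, True, True],
        "n3": [True, False, False, True],
        "n4": [False, False, True, False],
    }
    print(acceptance_test_passed(outcomes))
```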
OBJECTIVE EVALUATION: ACCEPTANCE TEST
21
Define the tasks for the test
o A list of tasks to be executed is given to the subject at the beginning of the experiment
o Choose the proposed tasks according to what we want to evaluate
  - The tasks lead the user to concentrate on the parts of the interface under evaluation
o Size the time allotted to each task carefully (objectives derived from requirements analysis, comparison with other existing software, …)
  - Estimate the average time needed and define an acceptable proportion of overruns (cf. metrics): inter-individual variability
o Ensure that the task statement is clear enough to be understood by a novice or first-time user
EVALUATION PLAN
22
An evaluation will not provide any result unless it is well prepared [Basili et al. 1994]
o What are the general objectives of the evaluation?
o What are the specific questions for which we want answers?
o Which test paradigm and techniques are needed to achieve these objectives?
o How to organize the evaluation in practice: user recruitment, user preparation, collection tools/devices, …
o Ensure compliance with the ethical rules in force
o How to examine, interpret and present the collected information?
EVALUATION PLAN
23
What paradigm of evaluation to use?
o Objective (observation): problem detection; broad range; modifies behavior
o Subjective: less expensive; usage opinions; precision; response rate
o Predictive (model): no system needed; less expensive; limited range
o Predictive (expert): expertise; may miss problems
EVALUATION PLAN
24
When do we use a particular paradigm of evaluation?
o Field studies
o Predictive
o Usability laboratory
o Quick and dirty
EVALUATION AND DIVERSITY OF USERS
25
Sampling the population
o Important for both objective and subjective evaluation
o Characterize the communities of intended users
o Sample the population according to criteria that reflect this characterization (men/women, expert/novice, familiarity with computers, age, socio-professional category, …)
o Sample size: 5, 12, 20, 100? (Dumas & Redish, 1999)
o Remark: experimental studies / "quick and dirty" evaluation
Analyzing the tests
o Multi-criteria analysis: break down the results according to the different characteristics
o Statistical relevance of the results
o A separate discipline: statistics (protocols and tests)
EVALUATION AND DIVERSITY OF USERS
26
Statistical analysis of the results

Subject            1    2    3    4    5    6    7   Average
Age               37   41   43   54   46   44   21   40.9
Sex                F    F    M    M    F    F    M   4F, 3M
Education level    4    2    4    4    4    1    2   3.0
PC years           5    2    0    2    6    4    9   4.0
Ease of use        1    2    2    1    2    3    1   1.7
Help quality       1    3    3    1    3    2    2   2.1
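The "Average" column of this table can be reproduced with a few lines of code, which is also a convenient starting point for further statistical analysis. The sketch below uses the seven subjects' values from the table; the variable names are illustrative, and the standard deviation is added only as an example of a spread measure, not something shown on the slide.

```python
from collections import Counter
from statistics import mean, stdev

# Values for subjects 1 to 7, copied from the table.
numeric = {
    "age": [37, 41, 43, 54, 46, 44, 21],
    "education_level": [4, 2, 4, 4, 4, 1, 2],
    "pc_years": [5, 2, 0, 2, 6, 4, 9],
    "usage_facility": [1, 2, 2, 1, 2, 3, 1],
    "help_quality": [1, 3, 3, 1, 3, 2, 2],
}
sex = ["F", "F", "M", "M", "F", "F", "M"]

# Averages rounded to one decimal, as in the table.
averages = {name: round(mean(values), 1) for name, values in numeric.items()}

# Sample standard deviation: an added indicator of inter-individual variability.
spread = {name: stdev(values) for name, values in numeric.items()}

# Categorical variables are summarized by counts rather than by a mean.
sex_counts = Counter(sex)

if __name__ == "__main__":
    print(averages)
    print(sex_counts)
```

With only seven subjects the averages are indicative at best, which is why the choice of sample size and of statistical tests matters.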
EVALUATION AND DIVERSITY OF USERS
27
Example: measuring the quality of an interface
o Learning time and retention time
o Speed of execution of a task (benchmark)
o Error rate and error types
o User satisfaction (subjective)
HCI design = a trade-off between different factors
o Experts: execution speed takes precedence over learning time
o Novices: learning time and error-rate reduction take precedence over execution speed
o Critical systems: reducing errors is the most important
o Industrial systems: learning and execution costs
o …
EVALUATION: OTHER THAN USERS
28
o The user is not everything … and is often not the buyer
o Typology of the parties with an interest in the choice of software [SESL: Ramage, 1997]
  - Users of the software
  - Their colleagues and superiors (managers)
  - Developer and software reseller
  - IT department of the organization (if any)
  - The clients of the organization
  - Trade unions and employee associations
  - The parent company
  - Associations of employees
  - The shareholders
  - The government
o But the user is still the alpha and omega!
EVALUATION PLAN: ETHICS
29
o Consent: consent form
o Problem: evaluation on the WWW
Before the session, explain to the subjects:
  - the objective of the evaluation and what is expected from the subject
  - what personal information will be requested and collected: promise anonymity
  - that they can stop whenever they want during the session
  - the financial terms of the evaluation, if any (whether the subject is paid or not)
At the end (and only then), confirm the agreement by having the user sign a consent form
BIBLIOGRAPHIES
30
References
Nielsen J. (1993) Usability engineering. Academic Press.
Publications
Chin J., Diehl V., Norman K. (1988) Development of an instrument measuring user satisfaction of the human-computer interface. Proceedings of ACM CHI'88 Human Factors in Computing Systems. 213-218.
Dumas J., Redish J. (1999) A practical guide to usability testing. Intellect, Exeter, UK.
Lewis J. (1995) IBM computer usability satisfaction questionnaires: psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1), 57-78.
Kirakowski J., Corbett M. (1993) SUMI: the Software Usability Measurement Inventory. British Journal of Educational Technology, 24(3), 210-212.
Nielsen J., Mack R., Bergendorff K., Grischkowsky N. (1986) Integrated software usage in the professional work environment: evidence from questionnaires and interviews. Proceedings of CHI'86, ACM Press, New York, NY. 162-167.
Nielsen J., Mack R. (Eds.) (1994) Usability inspection methods. John Wiley & Sons, New York, NY.
Whiteside J., Bennett J., Holtzblatt K. (1988) Usability engineering: our experience and evolution. In Helander M. (Ed.) Handbook of Human-Computer Interaction. North-Holland, Amsterdam.