evaluating usability

EVALUATING USABILITY Data & Analysis

Upload: ghalib

Post on 24-Feb-2016



Page 1: Evaluating Usability

EVALUATING USABILITY Data & Analysis

Page 2: Evaluating Usability

TYPES OF DATA And how to read them

Page 3: Evaluating Usability

WHAT TYPES OF DATA ARE RELEVANT TO OUR INTERESTS? When evaluating how usable a design is, there are many kinds of data you may want to take into account: whether a user can complete a task, how long it takes them to complete it, survey responses, etc.

But before thinking about what data you will collect, you must understand the basic types of data that exist and what they can tell you about your interface.

Page 4: Evaluating Usability

NOMINAL DATA Unordered groups or categories (e.g. apples and oranges; no fruit is inherently better)

Examples:

Binary success (whether a user was or was not able to complete a task)

Some demographic information, such as gender or whether or not the participant owns a smartphone

But not other demographic information, such as age or annual household income

Page 5: Evaluating Usability

ORDINAL DATA Ordered groups or categories

Examples:

Survey rankings: “Would you describe this website as excellent, good, fair, or poor?”

Levels of task completion (for non-binary success): if I tell you to draw a circle and you draw an oval, shouldn’t you get partial credit?

Page 6: Evaluating Usability

CAVEAT FOR ORDINAL DATA “Would you describe this website as excellent, good, fair, or poor?”

Although good is better than fair and fair is better than poor, we have no way of knowing whether the “distance” between good and fair is greater than the “distance” between fair and poor.

You cannot do arithmetic with ordinal data, but you can summarize it with histograms.

[Figure: GOOD, FAIR, and POOR placed at one spacing, OR GOOD, FAIR, and POOR placed at a different spacing?]

Page 7: Evaluating Usability

INTERVAL DATA Data points measured along a scale on which adjacent points are equally far apart

Examples:

5-star ratings such as those used by Yelp, Google Local, etc.

Semantic differentials: “Would you describe these slides as… Ugly □ □ □ □ □ Beautiful”

Page 8: Evaluating Usability

CAVEAT FOR INTERVAL DATA You cannot multiply or divide interval data.

There is no way something can be “twice as beautiful” or “three times as ugly” because there is no meaningful zero point.

Page 9: Evaluating Usability

ORDINAL VS. INTERVAL Interval data offers more opportunity for analysis than nominal or ordinal data, but the scales used often look the same:

□ Poor □ Fair □ Good □ Excellent

Vs.

Poor □ □ □ □ Excellent

How do these different formats affect your participants’ responses? How do they affect what you can and cannot do with the data?

Page 10: Evaluating Usability

WORKING WITH ORDINAL AND INTERVAL DATA Because the categories of an ordinal scale are not necessarily evenly spaced, we cannot treat ordinal data like interval data

Remember, you cannot do any arithmetic on ordinal data. This means no averaging!

It may seem pedantic, but you cannot treat the response “fair” as if it meant “1/3 of the way between poor and excellent”
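A minimal sketch of what you can still do with ordinal responses: count them per category (a text histogram) and report the median category, neither of which needs arithmetic on the labels. The `SCALE` list and the sample `responses` are made up for illustration.

```python
from collections import Counter

# Ordered categories, worst to best. Only the ORDER is used, never a distance.
SCALE = ["poor", "fair", "good", "excellent"]

def summarize_ordinal(responses):
    """Return (frequency table, median category) for ordinal responses."""
    counts = Counter(responses)
    table = {level: counts.get(level, 0) for level in SCALE}
    # Sort by rank and take the middle response (the lower middle if even).
    ranked = sorted(responses, key=SCALE.index)
    median_level = ranked[(len(ranked) - 1) // 2]
    return table, median_level

# Hypothetical survey answers, for illustration only.
responses = ["good", "fair", "excellent", "good", "poor", "good", "fair"]
table, median_level = summarize_ordinal(responses)

for level in SCALE:
    print(f"{level:<10}{'#' * table[level]}")
print("median:", median_level)
```

Note that no mean is computed anywhere: doing so would require assigning numbers to the labels, which is exactly the assumption ordinal data does not support.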

Page 11: Evaluating Usability

RATIO DATA Like interval data, but with the addition of an inherently meaningful absolute zero

Examples:

Time to task completion

Number of page views, mouse clicks, etc.

Page 12: Evaluating Usability

INTERVAL VS. RATIO DATA The concept of an absolute zero means you can do any type of arithmetic you like with ratio data

You can make relative statements: you can say that one participant took twice as much time to complete a task as another, but you can’t say one movie is twice as good as another

You can take the geometric mean. Jakob Nielsen believes this is important because it prevents a single big number from skewing the result and it accounts fairly for cases in which some of the metrics are negative (that is, one design scores worse than another on some metrics)
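To see why a single big number matters, here is a small sketch comparing the arithmetic mean, the median, and the geometric mean on ratio data. The task times are invented for illustration; `statistics.geometric_mean` requires Python 3.8 or later and only accepts positive values.

```python
from statistics import geometric_mean, mean, median

# Hypothetical times to task completion in seconds; one participant took far longer.
task_times = [20.0, 25.0, 30.0, 35.0, 300.0]

arithmetic = mean(task_times)           # pulled strongly toward the 300 s value
middle = median(task_times)             # ignores how extreme the largest value is
geometric = geometric_mean(task_times)  # n-th root of the product of the n values

print(f"arithmetic mean: {arithmetic:.1f} s")
print(f"median:          {middle:.1f} s")
print(f"geometric mean:  {geometric:.1f} s")
```

With these numbers the geometric mean lands between the median and the arithmetic mean, so the one slow participant influences the summary without dominating it.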

Page 13: Evaluating Usability

EVALUATING A UI There are two qualities you may wish to gather user data for: performance and satisfaction

Performance data is typically “hard data” relating to how easily a UI can be used to accomplish a given set of tasks; e.g.

Percentage of successfully completed tasks

Average time to task completion

Number of click-throughs

Satisfaction data covers emotional response to a UI and is generally self-reported; e.g.

How aesthetically pleasing the UI was

How easy to use the participant found the UI vs. how easy they thought the task was

Page 14: Evaluating Usability

PERFORMANCE VS. SATISFACTION How much data you gather about each of those qualities will depend on what type of system you are building

If you are building a stock-trading application to be used in-house by Goldman Sachs, chances are you care about performance more than satisfaction

If you are building a social game for Zynga, chances are you will care more about satisfaction

Does performance imply satisfaction? Sometimes, a bit of speed can be sacrificed for better user experience (e.g. iPhone animations)

Page 15: Evaluating Usability

PERFORMANCE METRICS Measuring success and efficiency

Page 16: Evaluating Usability

TASK SUCCESS To measure success, you must first define a clear desired end-state for your task

“Find the current price of a share of GOOG stock” (clear end-state)

“Research ways to save for retirement” (unclear end-state)

Binary success – tasks which are necessarily pass/fail

Levels of success – tasks which may be partially completed or completed in less-than-optimal ways
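A rough sketch of how the two kinds of success data might be summarized. Binary success is nominal, so it is reported as a rate; levels of success are ordinal, so they are reported as counts per level. The level names in `LEVELS` and the sample data are assumptions made for illustration.

```python
from collections import Counter

def success_rate(outcomes):
    """Fraction of attempts that ended in the defined end-state (binary success)."""
    return sum(1 for ok in outcomes if ok) / len(outcomes)

# Assumed ordering of success levels, worst to best; define your own per task.
LEVELS = ["failure", "partial success", "complete success"]

def level_counts(observed):
    """Number of attempts at each level of success, in scale order."""
    counts = Counter(observed)
    return {level: counts.get(level, 0) for level in LEVELS}

# Hypothetical results for one task.
binary = [True, True, False, True]
graded = ["complete success", "partial success", "failure", "partial success"]

print(f"success rate: {success_rate(binary):.0%}")
print(level_counts(graded))
```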

Page 17: Evaluating Usability

LOOKING AT LEVELS OF SUCCESS Let’s say you are evaluating the GIMP interface and one of your tasks is having participants draw a circle

What are the different possible levels of success for this task? Where is the cutoff for failure?

Page 18: Evaluating Usability

ISSUES IN MEASURING SUCCESS Deciding what constitutes success

Deciding when to end a task if the participant is not successful:

Tell participants to stop trying at the point where, in the real world, they would give up or seek assistance

Allow participants a certain number of attempts to complete the task (issue: what constitutes an “attempt”?)

Set a time limit

Page 19: Evaluating Usability

TIME-ON-TASK This data can be analyzed in a number of ways:

Looking at the median or geometric mean (typically less skewed than the mean)

Creating ranges to report the frequency of users falling into each interval

Creating a threshold which models the “acceptable” amount of time to complete a particular task

Looking at the distribution to identify outliers. This is especially important for remote testing, which often yields “noisy” data (e.g., a participant goes and gets a sandwich halfway through a task)
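The four analyses above can be sketched in a few lines. This is only an illustration: the sample times, the 60-second threshold, the 20-second ranges, and the 1.5 × IQR outlier rule are all assumptions, not values from the slides.

```python
from statistics import geometric_mean, median, quantiles

def summarize_times(times, threshold, bin_width):
    """Summarize time-on-task values (in seconds) in four ways."""
    # 1. Central tendency that is less sensitive to a few very long times.
    center = {"median": median(times), "geometric_mean": geometric_mean(times)}

    # 2. Frequency of participants per range, keyed by each range's lower edge.
    bins = {}
    for t in times:
        lower = int(t // bin_width) * bin_width
        bins[lower] = bins.get(lower, 0) + 1

    # 3. Share of participants at or under the "acceptable" threshold.
    within = sum(1 for t in times if t <= threshold) / len(times)

    # 4. Outliers by the 1.5 * IQR rule (an assumed convention).
    q1, _, q3 = quantiles(times, n=4)
    spread = q3 - q1
    outliers = [t for t in times if t < q1 - 1.5 * spread or t > q3 + 1.5 * spread]

    return center, dict(sorted(bins.items())), within, outliers

# Hypothetical remote-test data; 620 s might be the sandwich break.
times = [42, 55, 48, 61, 39, 50, 620, 58]
center, bins, within, outliers = summarize_times(times, threshold=60, bin_width=20)
print(center, bins, within, outliers)
```

Flagging a value as an outlier is only the first step; whether to drop it is a judgment call you should report alongside the results.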

Page 20: Evaluating Usability

ISSUES IN MEASURING TIME-ON-TASK Should you include time on unsuccessful tasks? How will including or throwing out this data affect your results?

Will asking users to voice their thoughts while completing the task alter their time to completion? Will voicing thoughts aloud cause users to complete tasks more quickly or more slowly than they would otherwise?

Quantitative methods are only part of the goal for usability testing. The think-aloud method can provide you with useful qualitative data.

Should you tell participants that their time until completion is being measured?

Page 21: Evaluating Usability

EFFICIENCY Most people think of efficiency as equivalent to time-on-task

Efficiency can also be considered a measurement of the amount of effort required to complete a task.

Effort is a quantification of the number of actions a user takes (e.g. mouse clicks, page views, keystrokes, etc.)

To measure efficiency, you must define what your units of meaningful action are and what the precise start and endpoints are for your task

Typically, effort is only calculated for successfully completed tasks
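As an illustration, effort could be computed from an event log like this. The set of meaningful actions, the log format, and the sample attempts are all hypothetical; each log is assumed to cover exactly the span between the task's defined start and end points.

```python
from statistics import median

# Assumed units of meaningful action; define your own for each study.
ACTIONS = {"click", "keystroke", "page_view"}

def effort(events):
    """Count the meaningful actions logged between task start and task end."""
    return sum(1 for event in events if event in ACTIONS)

def median_effort(attempts):
    """Median effort over successfully completed attempts only."""
    return median(effort(a["events"]) for a in attempts if a["success"])

# Hypothetical logs: "scroll" is not a counted action, and the failed
# attempt is excluded from the summary.
attempts = [
    {"success": True, "events": ["click", "page_view", "click", "scroll"]},
    {"success": True, "events": ["click", "keystroke", "keystroke", "page_view", "click"]},
    {"success": False, "events": ["click"] * 12},
]

print("median effort:", median_effort(attempts))
```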

Page 22: Evaluating Usability

LOSTNESS Another measure of efficiency, especially important for websites, is that of “lostness”, which can be modeled using the following formula:

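$$\text{lostness} = \sqrt{\left(\frac{\text{\# of unique states visited}}{\text{total \# of states visited}} - 1\right)^{2} + \left(\frac{\text{min. \# of states required}}{\text{\# of unique states visited}} - 1\right)^{2}}$$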
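A direct translation of the lostness formula into code, with made-up page counts. The function takes the number of unique states visited, the total number of states visited (revisits included), and the minimum number of states required; a score of 0 means the participant followed an optimal path.

```python
from math import sqrt

def lostness(unique_visited, total_visited, min_required):
    """Lostness score: 0 for an optimal path, larger the more a user wanders."""
    revisiting = unique_visited / total_visited - 1   # 0 when no state is revisited
    detouring = min_required / unique_visited - 1     # 0 when no extra state is visited
    return sqrt(revisiting ** 2 + detouring ** 2)

# Hypothetical task that needs at least 4 pages.
print(lostness(unique_visited=4, total_visited=4, min_required=4))   # optimal path
print(round(lostness(unique_visited=8, total_visited=12, min_required=4), 3))
```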

Page 23: Evaluating Usability

LEARNABILITY We have already discussed learning curves in class

You hope that, the more times a user has used your UI, the less time it will take them to complete a task and the less effort it will require

In this sense, we can model learnability as time-on-task, success rate, and/or efficiency over time
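One simple way to look at learnability is to summarize time-on-task per repetition across participants. The sketch below uses the median per trial; the data and the assumption that every participant completed the same number of trials are illustrative only.

```python
from statistics import median

def learning_curve(times_by_participant):
    """Median time-on-task for each repetition, across participants.

    Each inner list holds one participant's times, ordered by trial number,
    and all participants are assumed to have the same number of trials.
    """
    return [median(trial) for trial in zip(*times_by_participant)]

# Hypothetical times in seconds for three participants over three trials.
times_by_participant = [
    [90, 60, 45],
    [120, 80, 50],
    [100, 70, 55],
]

curve = learning_curve(times_by_participant)
print(curve)  # a falling curve suggests the task gets easier with practice
```

The same shape works for success rate or effort: swap the per-trial values and choose a summary that suits the data type.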

Page 24: Evaluating Usability

SELF-REPORTED METRICS “These slides are AWESOME!!!11!1”

Page 25: Evaluating Usability

THE IMPORTANCE OF SELF-REPORTED DATA So far, we have focused mostly on data which is gathered by observing the user

However, it is sometimes necessary to gather data directly from the user

This is especially common when you are looking to evaluate user satisfaction rather than performance

Page 26: Evaluating Usability

LIKERT SCALES Likert scales consist of a statement, either positive or negative, followed by a 5-point scale of agreement

Example: “I found this website easy to use.”

□ Strongly disagree □ Disagree □ Neither agree nor disagree □ Agree □ Strongly agree

This is an example of ordinal data!

Page 27: Evaluating Usability

SEMANTIC DIFFERENTIAL SCALES These are similar to Likert scales, but run on scales anchored by opposing adjectives

Example: “I would describe this website as…” Ugly □ □ □ □ □ Beautiful

This is an example of interval data!

Page 28: Evaluating Usability

EXPECTATION MEASURE Some experts (Albert and Dixon 2003) believe that the best way to assess the ease or difficulty of a given task is relative to how easy or difficult the participant thought it was going to be

Thus, for each task you might ask participants to rate both how easy/difficult they thought it would be and how easy/difficult it actually was

Doing so allows you to calibrate what is otherwise essentially an arbitrary measure

Remember, individuals may have different ideas of how “easy” and “very easy” compare!
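A hedged sketch of one way to compare expected and actual ratings without doing arithmetic on them: look only at whether each participant's actual rating sits above, below, or at their own expected rating. The five labels in `SCALE` and the sample ratings are assumptions; this is not necessarily the analysis Albert and Dixon used.

```python
from collections import Counter

# Assumed rating scale, hardest to easiest.
SCALE = ["very difficult", "difficult", "neutral", "easy", "very easy"]

def compare(expected, actual):
    """Classify one participant's task experience relative to their own expectation."""
    before, after = SCALE.index(expected), SCALE.index(actual)
    if after > before:
        return "easier than expected"
    if after < before:
        return "harder than expected"
    return "as expected"

# Hypothetical (expected, actual) ratings for one task.
ratings = [
    ("easy", "difficult"),
    ("easy", "easy"),
    ("very easy", "neutral"),
    ("difficult", "easy"),
]

summary = Counter(compare(expected, actual) for expected, actual in ratings)
print(dict(summary))
```

Because each participant is compared with their own earlier answer, differing personal ideas of “easy” matter less than they would in a raw rating.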

Page 29: Evaluating Usability

POST-TASK VS. POST-SESSION EVALUATION It is generally helpful to chunk your lab time into a number of tasks

You can gather self-reported data either after each task, after the entirety of the session, or both

The previously described methods can be used either post-task or post-session

Page 30: Evaluating Usability

SYSTEM USABILITY SCALE A System Usability Scale can be used for post-session self-reporting

There are many variants on this test. Here is a classic example:
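For reference, a minimal sketch of the scoring rule usually given for the standard ten-item SUS: each item is answered from 1 (strongly disagree) to 5 (strongly agree), odd-numbered items are positively worded, and even-numbered items are negatively worded. Whether a particular variant follows this layout is an assumption you should check before using it.

```python
def sus_score(responses):
    """Score one participant's standard 10-item SUS questionnaire (0 to 100)."""
    if len(responses) != 10:
        raise ValueError("the standard SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item, response in enumerate(responses, start=1):
        if item % 2 == 1:
            total += response - 1   # positively worded item: agreement is good
        else:
            total += 5 - response   # negatively worded item: disagreement is good
    return total * 2.5              # rescale the 0-40 sum to 0-100

# Hypothetical answers from one participant.
print(sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2]))
```

The resulting number is a score on a 0 to 100 scale, not a percentage of anything.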

Page 31: Evaluating Usability

OTHER TYPES OF SELF-REPORTING Self-reported metrics can also be used to evaluate specific elements (e.g. navigation bar) or overall attributes (e.g. visual appeal) of a UI

When evaluating elements, it is helpful to examine gaps between awareness and usefulness

To do this, you might ask, “Were you aware of this functionality prior to this study? (Yes/No)” followed by “On a scale of 1 to 5, how useful is this functionality to you? (1=Not at all useful; 5=Very useful)”
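A small sketch of how answers to those two questions could be combined. The sample answers and the cutoff of 4 for “useful” are assumptions made for illustration; the interesting number is how many participants rated the feature useful but had not known it existed.

```python
def awareness_gap(answers, useful_at_least=4):
    """Summarize (aware, usefulness) answers for one UI element.

    Returns the share of participants who were aware of the element, the share
    who rated it useful, and the count who found it useful but were unaware of it.
    """
    n = len(answers)
    aware_share = sum(1 for aware, _ in answers if aware) / n
    useful_share = sum(1 for _, rating in answers if rating >= useful_at_least) / n
    unaware_but_useful = sum(
        1 for aware, rating in answers if not aware and rating >= useful_at_least
    )
    return aware_share, useful_share, unaware_but_useful

# Hypothetical answers: (was aware before the study?, usefulness rating 1-5).
answers = [(True, 5), (True, 4), (False, 4), (False, 5), (True, 2), (False, 1)]

aware_share, useful_share, unaware_but_useful = awareness_gap(answers)
print(f"aware: {aware_share:.0%}, useful: {useful_share:.0%}, "
      f"useful but previously unknown: {unaware_but_useful}")
```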