Transcript: Evaluation
J T Burns, May 2004
Read Preece 10

Evaluation

• There are many times throughout the software development
  lifecycle when a designer needs answers to questions. They
  will want to, for example:
   Check whether his or her ideas match those of the user(s)
   Identify problems – can the user perform the task efficiently?
   Check if the functionality is apparent
• Such evaluation is known as formative evaluation because it
  (hopefully) helps shape the product. User-centred design
  places a premium on formative evaluation methods.
• Summative evaluation, in contrast, takes place after the
  product has been developed.

Context of Formative Evaluation

• Evaluation is concerned with gathering data about the
  usability of a design or product by a specific group of
  users for a particular activity within a specified
  environment or work context.
• Regardless of the type of evaluation it is important to
  consider:
   characteristics of the users
   types of activities they will carry out
   environment of the study (controlled laboratory? field study?)
   nature of the artefact or system being evaluated
    (sketches? prototype? full system?)

Reasons for Evaluation

• Understanding the real world
   particularly important during requirements gathering
• Comparing designs
   rarely are there options without alternatives
   valuable throughout the development process
• Engineering towards a target
   often expressed in the form of a metric
• Checking conformance to a standard

Approaches to evaluating usability

• Measurements of usability can be conducted in either of
  two ways:
   Analytically – by performing a simulation of how the
    user's activities will be performed; real users are not
    involved
   Empirically – by building a prototype and testing it
    with users
• These are two quite different approaches to answering
  questions about usability

Analytic Evaluation

• Analytic approaches include:
   The cognitive walkthrough
   Heuristic evaluation
   Review-based evaluation
   Model-based evaluation
• We shall look at each of these

Cognitive walkthrough

• This technique enables the analysis of designs through
  exploratory learning
• It is particularly useful for evaluating systems that
  users 'walk up and use'
• It enables designers to analyse and predict performance in
  terms of the physical and cognitive operations that must
  be carried out
• Cognitive walkthroughs help to answer questions such as:
  does this design guide the unfamiliar user through the
  successful completion of the task?

Cognitive walkthrough

• To do this type of evaluation we require:
   A detailed description of the user interface – this may
    include sketches of the design
   A task analysis
   An indication of who the users are and what experience,
    knowledge etc. we can assume they have
• Evaluators then try to answer questions about the design

Cognitive walkthrough questions

• Typically evaluators might ask:
   Are the assumptions about what task the design is
    supporting correct, e.g. the meaning of an icon or a label?
   Will users notice what actions are available? Will they
    see a menu option or a particular button?
   When they see a particular button, will they recognise
    that it is the one required for the task?
   Will users understand the feedback that they get?
• One way of recording the answers for each task step is
  sketched below

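A minimal sketch, in Python, of how answers to these questions might be recorded for each step of a task; the structure and field names are illustrative assumptions rather than a prescribed format.

```python
# Illustrative only: recording the standard walkthrough questions for
# each step of a task. Field names are assumptions, not a standard.
from dataclasses import dataclass

@dataclass
class WalkthroughStep:
    action: str                     # e.g. "Choose 'Print...' from the File menu"
    correct_task_assumption: bool   # is the design supporting the right task?
    action_noticeable: bool         # will users notice the action is available?
    action_recognisable: bool       # will they see it is the one required?
    feedback_understandable: bool   # will they understand the feedback?
    notes: str = ""

steps = [
    WalkthroughStep("Open the File menu", True, True, True, True),
    WalkthroughStep("Choose 'Print...'", True, True, False, True,
                    notes="Printer icon unclear to a first-time user"),
]

# Any step with a 'no' answer is a potential usability problem.
problems = [s for s in steps if not (s.correct_task_assumption and
            s.action_noticeable and s.action_recognisable and
            s.feedback_understandable)]
print(f"{len(problems)} potential problem step(s)")
```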
Heuristic Evaluation

• Useful where the method of operation is not fully
  predictable and where the user might not be a complete
  novice
• Relies on a 'team' of evaluators to evaluate the design
   Each individually critiques the design – 4 or 5
    evaluators discover around 75% of the problems (see the
    sketch below)
   A set of design heuristics (general guidelines) is used
    to guide the evaluators

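A rough sketch of the problem-discovery curve behind the "4 or 5 evaluators find about 75% of problems" figure, assuming the commonly cited average single-evaluator discovery rate of roughly 0.31; the exact rate varies from study to study.

```python
# Expected proportion of usability problems found by n independent
# evaluators, assuming each finds any given problem with probability
# p_single (~0.31 is a commonly cited average; treat as illustrative).
def proportion_found(n_evaluators: int, p_single: float = 0.31) -> float:
    return 1 - (1 - p_single) ** n_evaluators

for n in (1, 3, 4, 5, 10):
    print(f"{n} evaluator(s): ~{proportion_found(n):.0%} of problems")
# 4-5 evaluators land at roughly 75-85%, matching the slide's figure.
```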
Heuristic Evaluation

• The original list of 9 heuristics used to generate ideas:
   Simple and natural dialogue
   Speak the users' language
   Minimise user memory load
   Be consistent
   Provide feedback
   Provide clearly marked exits
   Provide shortcuts
   Present good error messages
   Prevent errors
  (Nielsen & Molich, 1989)
• See Dix (1998) for a more comprehensive set

Model-based evaluation

• These are based on theories and knowledge of user
  behaviour, e.g. the model of the Human Information
  Processor
• This particular model has led to a number of tools and
  techniques known as GOMS analysis
• GOMS predicts user performance for a known sequence of
  operations, with a particular interface and an experienced
  operator
• A second model is the Keystroke-Level Model, which can be
  used to predict the user's speed of execution for a task
  when the method is known and 'closed', e.g. dialling a
  telephone number (see the sketch below)

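A minimal Keystroke-Level Model sketch for the telephone-dialling example; the operator times are the commonly quoted Card, Moran and Newell estimates and should be treated as illustrative rather than definitive.

```python
# Keystroke-Level Model (KLM) sketch. Operator times in seconds are the
# commonly quoted estimates; real analyses calibrate them per device.
OPERATOR_TIMES = {
    "K": 0.28,  # press a key or keypad button (average typist)
    "P": 1.10,  # point at a target with a mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation for an action
}

def klm_time(operators: str) -> float:
    """Predicted execution time for a known, 'closed' method."""
    return sum(OPERATOR_TIMES[op] for op in operators)

# Dialling a seven-digit telephone number: one mental preparation,
# then seven keypad presses.
print(f"{klm_time('M' + 'K' * 7):.2f} s")  # about 3.3 seconds
```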
Review-Based Evaluation

• Makes use of published experimental results and empirical
  evidence, e.g.:
   Fitts' Law, which predicts the time to select an object
    based on its distance and size (see the sketch below)
   studies of the speed and accuracy of pointing devices
• Must recognise the context in which the results were
  obtained
• Must be careful about the subjects and the conditions
  under which the experiments were carried out

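A minimal sketch of a Fitts' Law prediction using the widely used Shannon formulation MT = a + b log2(D/W + 1); the constants a and b depend on the device and the study, so the values below are placeholders only.

```python
import math

def fitts_movement_time(distance: float, width: float,
                        a: float = 0.1, b: float = 0.1) -> float:
    """Predicted time (s) to select a target of the given width at the
    given distance; a and b are device-dependent placeholder constants."""
    index_of_difficulty = math.log2(distance / width + 1)  # bits
    return a + b * index_of_difficulty

# A small, distant target takes longer to hit than a large, nearby one.
print(fitts_movement_time(distance=600, width=20))   # harder
print(fitts_movement_time(distance=100, width=100))  # easier
```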
Analytic Evaluation Summary

• Advantages
   Do not use costly prototypes
   Do not need user testing
   Usable early in the design process
   Use few resources
• Disadvantages
   Too narrow a focus
   Lack of diagnostic output for redesign
   Broad assumptions about users' cognitive operations
   Limited guidance on how to use the methods, so they can
    be difficult for the evaluator to apply

Classification of Evaluation Methods

• Non-analytic evaluations:
   Involve the use of prototypes
   Involve users
   May be informal, or experimental under controlled
    conditions
   Time taken can vary from days to weeks or even months!
   The method chosen will often depend on a number of
    factors including time, cost and criticality

Classification of Evaluation Methods

• Observation and monitoring
   data collection by note-taking, keyboard logging, video
    capture
• Experimentation
   statement of hypothesis, control of variables
• Collecting users' opinions
   surveys, questionnaires, interviews

Observation and Monitoring – Direct Observation Protocol

• Usually informal in a field study, more formal in a
  controlled laboratory
• Data collection by direct observation and note-taking
   users in "natural" surroundings
   quickly highlights difficulties
   Good for tasks that are safety-critical
   "objectivity" may be compromised by the point of view of
    the observer
   users may behave differently while being watched (the
    Hawthorne effect)

Data gathering techniques
•Naturalistic observation:
—Spend time with stakeholders in their
day-to-day tasks, observing work as it
happens
—Gain insights into stakeholders’ tasks
—Good for understanding the nature and
context of the tasks
—But, it requires time and commitment
from a member of the design team, and
it can result in a huge amount of data
—Ethnography is one form
Observation and Monitoring – Indirect Observation Protocol

• Data collection by remote note-taking, keyboard logging,
  video capture (see the logging sketch below)
   Users need to be briefed fully; a policy must be decided
    upon and agreed about what to do if they get "stuck";
    tasks must be justified and prioritised (easiest first)
   Video capture permits post-event "debriefing" and avoids
    the Hawthorne effect (however, users may behave
    differently in an unnatural environment)
   With post-event analysis users may also attempt to
    rationalise their actions
   Data logging is rich, but vast amounts of low-level data
    are collected, which is difficult and expensive to analyse
   The interaction of variables may be more relevant than a
    single one (lack of context)

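A minimal sketch of the kind of timestamped data logging mentioned above; the event names and CSV format are illustrative assumptions, and real logging tools capture far more detail.

```python
# Illustrative sketch: timestamped low-level interaction logging to a
# CSV file for later (expensive!) analysis. Event names are invented.
import csv
import time

class InteractionLogger:
    def __init__(self, path: str):
        self._file = open(path, "a", newline="")
        self._writer = csv.writer(self._file)

    def log(self, event: str, detail: str = "") -> None:
        # One row per low-level event: timestamp, event type, detail.
        self._writer.writerow([time.time(), event, detail])
        self._file.flush()

logger = InteractionLogger("session01.csv")
logger.log("key_press", "Ctrl+S")
logger.log("menu_open", "File")
```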
Experimental Evaluation

• A "scientific" and "engineering" approach
• Utilises standard scientific investigation techniques
• Aims to evaluate a particular aspect of the interface
• Control of variables, especially user groups, may lead to
  an "artificial" experimental basis
• The number of factors studied is limited so that causal
  relationships can be clearly identified
• Detailed attention must be paid to the design of the
  experiment, e.g. it must be reliable and the hypothesis
  must be testable (see the sketch below)

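A minimal sketch of how data from such an experiment might be analysed, comparing task-completion times for two interface designs with an independent-samples t-test; the figures are invented purely for illustration.

```python
# Comparing task-completion times (seconds) for two interface designs
# with an independent-samples t-test. The data below are invented.
from scipy import stats

design_a = [41.2, 38.5, 44.0, 39.9, 42.7, 40.1]
design_b = [35.8, 33.2, 36.9, 34.5, 37.0, 33.9]

t_statistic, p_value = stats.ttest_ind(design_a, design_b)

# A small p-value suggests the difference in mean completion time is
# unlikely to be due to chance alone (for this single factor).
print(f"t = {t_statistic:.2f}, p = {p_value:.4f}")
```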
Experimental Evaluation

• Advantages
   Powerful method
   Quantitative data obtained
   Can compare different groups and types of users
   Reliability and validity can be very high
• Disadvantages
   High resource demands
   Requires knowledge of experimental methods
   Time spent on experiments can mean that evaluation is
    difficult to integrate into the design cycle
   Tasks can be artificial and restricted
   Cannot always generalise to the full system in a typical
    working environment

Query Techniques

• Are less formal than controlled experimentation
• Include the use of questionnaires and interviews
• Embody the principle and philosophy of 'ask the user'
• Are relatively simple and cheap to administer
• Provide information about user attitudes and opinions

Collecting Users' Opinions

• Surveys
   Critical mass and breadth of the survey are critical for
    statistical reliability (see the sketch below)
   Sampling techniques need to be well grounded in theory
    and practice
   Questions must be consistently formulated, clear, and
    must not "lead" to specific answers

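A minimal sketch of why "critical mass" matters, using the standard large-sample margin of error for a surveyed proportion (roughly 1.96 times the square root of p(1 - p)/n at 95% confidence); the figures are illustrative.

```python
# How sample size affects the reliability of a surveyed proportion.
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p from n replies."""
    return z * math.sqrt(p * (1 - p) / n)

# e.g. 60% of respondents report a particular usability problem:
for n in (20, 100, 400):
    print(f"n = {n}: +/- {margin_of_error(0.6, n):.1%}")
# Larger samples shrink the uncertainty; very small surveys say little.
```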
Data gathering techniques
•Interviews:
—Forum for talking to people
—Props, e.g. sample scenarios of use,
prototypes, can be used in interviews
—Good for exploring issues
—But they are time-consuming, and it may be infeasible
to visit everyone

Collecting Users' Opinions – Interviews

• (Individual) interviews can take place during or after
  user interaction
   during: immediate impressions are recorded
   during: may be distracting during complex tasks
   after: no distraction from the task at hand
   after: may lead to misleading results (short-term memory
    loss, "history rewritten", etc.)
• Interviews can be structured, semi-structured or
  unstructured
   a structured interview is like a personal questionnaire,
    with prepared questions

Collecting Users' Opinions

• Questionnaires
   "open" (free-form reply) or "closed" (answers are
    "yes/no" or chosen from a wider range of possible answers)
    – the latter is better for quantitative analysis (see
      the sketch below)
   Important to use clear, comprehensive and unambiguous
    terminology, quantified where possible
    – e.g. "daily?", "weekly?", "monthly?" rather than
      "seldom" and "often", and there should always be a
      "never"
   Needs to allow for "negative" feedback
   All form fill-in guidelines apply!

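A minimal sketch of the point about closed, quantified answers: coding a frequency scale (including "never") so responses can be analysed quantitatively; the scale and responses are invented.

```python
# Closed-question responses on a quantified frequency scale, coded as
# ordinal values for simple quantitative analysis. Data are invented.
FREQUENCY_SCALE = {"never": 0, "monthly": 1, "weekly": 2, "daily": 3}

responses = ["daily", "weekly", "never", "daily", "monthly"]
codes = [FREQUENCY_SCALE[r] for r in responses]

mean_code = sum(codes) / len(codes)
print(f"mean frequency code: {mean_code:.2f}  (0 = never ... 3 = daily)")
```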
Data gathering techniques
•Workshops or focus groups:
—Group interviews
—Good at gaining a consensus view and/or
highlighting areas of conflict

Data gathering techniques
•Studying documentation:
—Procedures and rules are often written
down in manuals
—Good source of data about the steps
involved in an activity, and any
regulations governing a task
—Not to be used in isolation
—Good for understanding legislation, and
getting background information
—Requires no stakeholder time (stakeholder time is
a limiting factor for the other techniques)

Choosing between techniques
•Data gathering techniques differ in two ways:
1. Amount of time, level of detail and
risk associated with the findings
2. Knowledge the analyst requires
•The choice of technique is also affected by the kind of task to
be studied:
—Sequential steps or overlapping series of subtasks?
—High or low, complex or simple information?
—Task for a layman or a skilled practitioner?
Problems with data gathering (1)
•Identifying and involving stakeholders:
users, managers, developers, customer reps?, union reps?,
shareholders?
•Involving stakeholders: workshops, interviews, workplace
studies, co-opt stakeholders onto the development team
•‘Real’ users, not managers:
traditionally a problem in software engineering, but better
now
Problems with data gathering (2)
•Requirements management: version control, ownership
•Communication between parties:
—within development team
—with customer/user
—between users… different parts of an organisation use
different terminology
•Domain knowledge distributed and implicit:
—difficult to dig up and understand
—knowledge articulation: how do you walk?
•Availability of key people
Problems with data gathering (3)
•Political problems within the organisation
•Dominance of certain stakeholders
•Economic and business environment changes
•Balancing functional and usability demands
Some basic guidelines
•Focus on identifying the stakeholders’ needs
•Involve all the stakeholder groups
•Involve more than one representative from each
stakeholder group
•Use a combination of data gathering techniques
Some basic guidelines
•Support the process with props such as prototypes and
task descriptions
•Run a pilot session
•You will need to compromise on the data you collect and
the analysis to be done, but before you can make sensible
compromises, you need to know what you really want to
find out
•Consider carefully how to record the data