Transcript Document
SIMS 213:
User Interface Design & Development
Marti Hearst
March 9 & 11, 2004
Formal Usability Studies
Outline
Experiment Design
– Factoring Variables
– Interactions
Special considerations when involving human
participants
Example: Marking Menus
–
–
–
–
Motivation
Hypotheses
Design
Analysis
Formal Usability Studies
When useful
– to determine time requirements for task completion
– to compare two designs on measurable aspects
• time required
• number of errors
• effectiveness for achieving very specific tasks
Require Experiment Design
Adapted from slide by James Landay
Experiment Design
Experiment design involves determining how
many experiments to run and which attributes to
vary in each experiment
Goal: isolate which aspects of the interface really
make a difference
Experiment Design
Decide on
– Response variables
• the outcome of the experiment
• usually the system performance
• aka dependent variable(s)
– Factors (aka attributes))
• aka independent variables
– Levels (aka values for attributes)
– Replication
• how often to repeat each combination of choices
Experiment Design
Example:
– Studying a system (ignoring users)
Say we want to determine how to configure the
hardware for a personal workstation
– Hardware choices
• which CPU (three types)
• how much memory (four amounts)
• how many disk drives (from 1 to 3)
– Workload characteristics
• administration, management, scientific
Experiment Design
We want to isolate the effect of each component for the
given workload type.
How do we do this?
–
–
–
–
–
–
WL1
WL1
WL1
WL1
WL1
…
CPU1
CPU1
CPU1
CPU1
CPU1
Mem1
Mem1
Mem1
Mem2
Mem2
Disk1
Disk2
Disk3
Disk1
Disk2
There are (3 CPUs)*(4 memory sizes)*(3 disk sizes)*(3 workload types) = 108
combinations!
Experiment Design
One strategy to reduce the number of
comparisons needed:
– pick just one attribute
– vary it
– hold the rest constant
Problems:
– inefficient
– might miss effects of interactions
Interactions among Attributes
Interacting
Non-interacting
B1
B2
A1
3
6
A2
5
8
A1
3
6
B1
B2
B2
B2
B1
A1
B1
A2
B2
A2
5
9
B1
A1
A2
A2
A2
A1
A1
B1
B2
Experiment Design
Another strategy: figure out which attributes are
important first
Do this by just comparing a few major attributes
at a time
– if an attribute has a strong effect, include it in future
studies
– otherwise assume it is safe to drop it
This strategy also allows you to find interactions
between attributes
Experiment Design
Common practice: Fractional Factorial Design
– Just compare important subsets
– Use experiment design to partially vary the
combinations of attributes
Blocking
– Group factors or levels together
– Use a Latin Square design to arrange the blocks
Between-Groups Design
Wilma and Betty use one
interface
Dino and Fred use the other
Within-Groups Design
Everyone uses both interfaces
Between-Groups vs. Within-Groups
Between groups
– 2 or more groups of test participants
– each group uses only 1 of the systems
Within groups
– one group of test participants
– each person uses all systems
• can’t use the same tasks on different systems
Adapted from slide by James Landay
Between-Groups vs. Within-Groups
Within groups design
– Pros:
• Is more powerful statistically (can compare the same
person across different conditions, thus isolating effects of
individual differences)
• Requires fewer participants than between-groups
– Cons:
• Learning effects
• Fatigue effects
Special Considerations for Formal
Studies with Human Participants
Studies involving human participants vs. measuring
automated systems
–
–
–
–
people get tired
people get bored
people (may) get upset by some tasks
learning effects
• people will learn how to do the tasks (or the answers to questions) if
repeated
• people will (usually) learn how to use the system over time
More Special Considerations
High variability among people
– especially when involved in reading/comprehension
tasks
– especially when following hyperlinks! (can go all over
the place)
Experiment Design Example:
Marking Menus
Based on
Kurtenbach, Sellen, and Buxton, Some Articulartory and Cognitive Aspects of
“Marking Menus”, Graphics Interface ‘94, http://reality.sgi.com/gordo_tor/papers
Experiment Design Example:
Marking Menus
Pie marking menus can reveal
– the available options
– the relationship between mark and command
1. User presses down with stylus
2. Menu appears
3. User marks the choice, an ink trail follows
Why Marking Menus?
Supporting markings with pie menus should help
transition between novice and expert
Useful for keyboardless devices
Useful for large screens
Pie menus have been shown to be faster than
linear menus in certain situations
What do we want to know?
Are marking menus better than pie menus?
– Do users have to see the menu?
– Does leaving an “ink trail” make a difference?
– Do people improve on these new menus as they
practice?
Related questions:
– What, if any, are the effects of different input
devices?
– What, if any, are the effects of different size menus?
Experiment Factors
Isolate the following factors (independent variables):
– Menu condition
• exposed, hidden, hidden w/marks (E,H,M)
– Input device
• mouse, stylus, track ball (M,S,T)
– Number of items in menu
• 4,5,7,8,11,12 (note: both odd and even)
Response variables (dependent variables):
– Response Time
– Number of Errors
Experiment Hypotheses
Note these are stated in terms of the factors
(independent variables)
– Exposed menus will yield faster response times and
lower error rates, but not when menu size is small
– Response variables will monotonically increase with
menu size for exposed menus
– Response time will be sensitive to number of menu
choices for hidden menus (familiar ones will be easier,
e.g., 8 and 12)
– Stylus better than Mouse better than Track ball
Experiment Hypotheses
Device performance is independent of menu type
Performance on hidden menus (both marking and
hidden) will improve steadily across trials.
Performance on exposed menus will remain
constant.
Experiment Design
Participants
– 36 right-handed people
• usually gender distribution is stated
– considerable mouse experience
– (almost) no trackball, stylus experience
Task
– Select target “slices” from a series of different pie menus as
quickly and accurately as possible
– Menus were simply numbered segments
• meaningful items would have longer learning times
– Participants saw running scores
• lose points for wrong selection
Experiment Design
One between-subjects factor
– Menu Type
• Three levels: E, H, or M
Two within-subjects factors
– Device Type
• Three levels: M, T, or S
– Number of Menu Items
• Six levels: 4, 5, 7, 8, 11, 12
How should we arrange these?
Experiment Design
E
H
M
Between
subjects
design
How to
arrange
the
devices?
12
12
12
Experiment Design
A Latin
Square
E
H
M
M
T
S
T
S
M
S
M
T
12
12
12
No row
or
column
share
labels
Experiment Design
How to
arrange
the
menu
sizes?
E
H
M
M
T
S
T
S
M
S
M
T
Block by size
then
randomize
the
blocks.
Experiment Design
Block by size
then
randomize
the
blocks.
E
H
M
M
T
S
T
S
M
S
M
T
(Note: the order of each set of blocks
will differ for each participant in each square)
5
11
12
8
7
4
Experiment Design
40 trials
per block
E
H
M
M
T
S
T
S
M
S
M
T
7
8
5
11
12
5
12
8
4
11
7
4
(Note: these blocks will look
different for each participant.)
Experiment Overall Results
Group
Mean RT
(s.d)
Mean Errors Mean %
(s.d.)
Errors
Exposed
0.98 (.23)
0.64 (1.0)
1.6%
Hidden
1.10 (.31)
3.27 (3.57)
8.2%
Marking
1.10 (.31)
3.76 (3.67)
9.4%
So exposing menus is faster … or is it?
Let’s factor things out more.
A Learning Effect
When we graph over the number of trials, we find
a difference between exposed and hidden menus.
This suggests that participants may eventually become
faster using marking menus (was hypothesized).
A later study verified this.
Factoring to Expose Interactions
Increasing menu size increases selection time and number of errors
(was hypothesized).
No differences across menu groups in terms of response time.
That is, until we factor by menu size AND group
– Then we see that menu size has effects on hidden groups not seen
on exposed group
– This was hypothesized (12 easier than 11)
Factoring to Expose Interactions
Stylus and mouse outperformed trackball (hypothesized)
Stylus and mouse the same (not hypothesized)
Initially, effect of input device did not interact with menu type
– this is when comparing globally
– BUT ...
More detailed analysis:
–
–
–
–
Compare both by menu type and device type
Stylus significantly faster with Marking group
Trackball significantly slower with Exposed group
Not hypothesized!
Average response time
and errors as a function of
device, menu size, and
menu type.
Potential explanations:
Markings provide feedback
for when stylus is pressed
properly.
Ink trail is consistent with
the metaphor of using a pen.
Experiment Design
How can we tell if order in which the device
appears has an effect on the final outcome?
E
H
M
M
T
S
T
S
M
S
M
T
Some evidence:
There is no significant difference among devices in the
Hidden group.
Trackball was slowest and most error prone in all three cases.
Still, there may be some hidden interactions, but unlikely
to be strong given the previous graph.
Statistical Tests
Need to test for statistical significance
– This is a big area
– Assuming a normal distribution:
• Students t-test to compare two variables
• ANOVA to compare more than two variables
Analyzing the Numbers
Example: trying to get task time <=30 min.
–
–
–
–
–
test gives: 20, 15, 40, 90, 10, 5
mean (average) = 30
median (middle) = 17.5
looks good!
wrong answer, not certain of anything
Factors contributing to our uncertainty
– small number of test users (n = 6)
– results are very variable (standard deviation = 32)
• std. dev. measures dispersal from the mean
Adapted from slide by James Landay
Analyzing the Numbers (cont.)
This is what statistics are for
Crank through the procedures and you find
– 95% certain that typical value is between 5 & 55
Usability test data is quite variable
– need lots to get good estimates of typical values
– 4 times as many tests will only narrow range by 2x
Adapted from slide by James Landay
Followup Work
Hierarchical Markup Menu study
Followup Work
Results of use of marking menus over an extended
period of time
– two person extended study
– participants became much faster using gestures without
viewing the menus
Followup Work
Results of use of marking menus over an extended
period of time
– participants temporarily returned to “novice” mode when they
had been away from the system for a while
Summary
Formal studies can reveal detailed information but take
extensive time/effort
Human participants entail special requirements
Experiment design involves
– Factors, levels, participants, tasks, hypotheses
– Important to consider which factors are likely to have real effects
on the results, and isolate these
Analysis
– Often need to involve a statistician to do it right
– Need to determine statistical significance
– Important to make plots and explore the data
References
Kurtenbach, Sellen, and Buxton, Some Articulartory and
Cognitive Aspects of “Marking Menus”, Graphics Interface ‘94,
http://reality.sgi.com/gordo_tor/papers
Kurtenbach and Buxton, User Learning and Performance with
Marking Menus, Graphics Interface ‘94,
http://reality.sgi.com/gordo_tor/papers
Jain, The art of computer systems performance analysis, Wiley,
1991
http://www.statsoft.com/textbook/stanman.html
Gonick and Smith, The Cartoon Guide to Statistics,
HarperPerennial, 1993
Dix et al. textbook