Wizard of Oz studies


Agents

Today’s agenda:
- a word about screens
- discuss agents
- discuss dialogue and language interfaces
- introduce the Wizard of Oz evaluation technique
- go over the last quiz
- get started on Assignment 5

Screen Design, revisited:

Really bad Web sites:
- Jacob Nielsen on bad design
- Websitesthatsuck
- Interface Hall of Shame
- Test the visual accessibility of a web page

Agents

Review: AGENT style of interaction

- “Intelligent” processes that take some initiative and perform tasks on the user’s behalf
- The user delegates responsibility to the agent; the agent takes initiative.
- This feels very different from direct manipulation!

- At best, the agent anticipates what you want.
- At worst, the agent is an obstacle.
- Agents can be amplifiers or, alternatively, prosthetics (two distinct metaphors for kinds of agents).

Tasks for Agents

- Information retrieval (e.g., Web bots, travel scheduling)
- Sorting, organizing, filtering (e.g., spam filters)
- Coaching, tutoring, providing help
- Reminding
- Programming, doing repetitive things (macros, automatic form-filling)
- Advising
- Entertaining
- Navigation (GPS)

(adapted from Brenda Laurel; Patti Maes)

Tasks for Agents - adaptive functionality

After observing its user performing the same set of actions over and over again, a computer system offers to produce a system-generated program to complete the task (Cypher 1991).

An adaptive phone book keeps track of which numbers are retrieved; it then uses that information to increase the accessibility of frequently retrieved numbers (Greenberg & Whitten 1985).

A "learning personal assistant" fits new appointments into the busy calendar of its user, according to rules inferred by observing previous scheduling behavior (Mitchell, et al. 1995).

A multi-user database notices that over time certain seemingly unrelated bibliographic records--call them X and Y--are frequently retrieved in the same search session. It uses that information to increase the probability that Y is retrieved whenever X is specified, and vice versa (Belew 1989).

A full text database allows its users to type in questions in plain English. It interprets the input, and returns a list of results ordered in terms of their relevance. Users can select an item, and tell it to 'find more like that one' (Dow Jones & Co. 1989).

A variety of recognition systems transform handwriting, speech, gestures, drawings, or other forms of human communication from fuzzy, analog representations into structured, digital representations.
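To make the flavor of this adaptive functionality concrete, here is a minimal sketch of a frequency-tracking phone book in the spirit of the Greenberg & Whitten example above. The class name, methods, and sample entries are invented for illustration; this is not the original system.

```python
# Minimal sketch of an adaptive phone book: entries that are retrieved more
# often float to the top of the listing. Class/method names and sample data
# are invented for illustration.
from collections import Counter

class AdaptivePhoneBook:
    def __init__(self, entries):
        self.entries = dict(entries)     # name -> phone number
        self.retrievals = Counter()      # name -> how often it has been looked up

    def lookup(self, name):
        self.retrievals[name] += 1       # observe the user's behavior
        return self.entries[name]

    def listing(self):
        # Frequently retrieved names become more accessible (shown first).
        return sorted(self.entries, key=lambda n: self.retrievals[n], reverse=True)

book = AdaptivePhoneBook({"Aida": "555-0101", "Igor": "555-0102", "Fred": "555-0103"})
book.lookup("Igor"); book.lookup("Igor"); book.lookup("Aida")
print(book.listing())                    # ['Igor', 'Aida', 'Fred']
```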

Tom Erickson: “Designing agents as if people mattered”

What other agent interfaces have you used?

Agents (anthropomorphic or not)

The most obvious metaphor for an interactive system that seems intelligent or takes initiative may be a human-like character. But agents can be non-anthropomorphic as well… even a charge-card form can take initiative and act “intelligent” by filling in the right values automatically.

Drawbacks of delegation

- How can you instruct/program your agent?
- “Intelligent” interfaces can be unpredictable.
- Sometimes the “agent” may be more of an obstacle (if you’d rather do it yourself - direct manipulation).
- Whose interests is the agent serving? (institutional? advertisers?)
- How do you come to “trust” your agent? Do you have a mental model of how it works? What’s it doing with your data? Spying on you?

Dialogue systems

- A common metaphor for agent-style interaction: conversation or dialogue.
- The underlying metaphor is having a conversation with another person.
- These range from simple voice command recognition systems to more complex “natural language” dialogues.
- To discover what representations work best, we need to observe people interacting with systems.
- Dialogue systems may or may not have personas…


Personas

- Role, character, coherent set of personality traits/behaviors
- Cohen et al. (Ch. 6) suggest not leaving personas to chance, but specifically defining them
- But: Are “personas” annoying?
- Anthropomorphic cartoon agents such as Microsoft’s Bob, etc.

 Microsoft’s bizarre character patents

Microsoft’s Rover

“That dog is Rover, the cursed canine who lived on to serve as the “Search Assistant” in Windows XP. (An article on Microsoft’s own site published when Windows XP was released says that some people “loathe” Rover--can you name another instance of any company anywhere using that word in conjunction with customer response to a new product?) One enduring mystery about this patent: Why did Microsoft call a program involving a talking mutt a “real world interface”?” by Harry McCracken, http://technologizer.com/2009/01/02/microsoft-clippy-patents/2/

Microsoft’s Earl

“Meet Earl, a surfer dude (get it?) who’s mostly a giant pair of lips and who seems to think he knows more about the Internet than you do. (Once again, I don’t know if Microsoft unleashed this exact idea on its customers, but it’s a variant of the Search Assistant from Windows XP.) Two scary things about this patent: 1) its title pitches it as a search improvement rather than a perverse joke; and B) this is apparently what Microsoft was doing to enhance Web search at the same time that Larry Page and Sergey Brin were founding Google.” by Harry McCracken, http://technologizer.com/2009/01/02/microsoft-clippy-patents/12/

Microsoft’s Will (Shakespeare)

“For most of us, the Office Assistant is synonymous with the talking paperclip character, but you could actually choose between multiple helpers. Including William Shakespeare or an unreasonable facsimile thereof (shown in a fuzzy patent drawing). “Will” is offering to help the user perform regression analysis here--I would have thought it intuitively obvious that anyone who knows what regression analysis is probably doesn’t want an animated version of the Bard of Avon getting involved in the process.” by Harry McCracken, http://technologizer.com/2009/01/02/microsoft-clippy-patents/5/

Computer Interface for Illiterate and Near-Illiterate Users (filed 1/26/2006)

“This patent… pitches on-screen assistants as an interface for people who can’t read or write. The garb of the lady shown here is explained by the fact that the patent is the result of work by researchers in India; the patent says that the assistant could also be a dog, an elephant, or an airplane. It’s scary to think that the idea behind Bob and Clippy hasn’t completely fizzled out, but look on the bright side--if you’re reading this story, Microsoft isn’t going to try and get you to use any product based on this patent.” by Harry McCracken, http://technologizer.com/2009/01/02/microsoft-clippy-patents/15/

Sociable robots

- Robots are by definition “embodied”; they may or may not be anthropomorphic.
- Robots and agents in science fiction:
  - Harrison Ford’s voice interface from Blade Runner
  - Total Recall: Arnold and JohnnyCab (http://www.youtube.com/watch?v=0H5k--n7sFI&NR=1)
  - HAL from 2001

Drawbacks of anthropomorphization

- Review Abby Don’s panel and position papers on whether agents should be anthropomorphic.
- Shneiderman’s arguments: problems with anthropomorphizing an interface
- Are anthropomorphic interfaces honest?
  - Shneiderman, Weizenbaum: No, they’re misleading

Pros and cons of conversational metaphor

- The conversational metaphor need not be embodied in an anthropomorphic character.
- Conversational interaction allows users, especially novices and technophobes, to interact with the system in a way that is familiar:
  - it makes them feel comfortable, at ease, not uneasy or uncertain
  - they transfer some of their expectations from dialogues with people to dialogues with computers
- But misunderstandings can arise when the system lacks knowledge, cannot repair the misunderstandings, doesn’t keep track of conversational context, or does not know how to parse what the user says.

Food for thought:

To what extent is dialog with a person similar to interacting with a computer?

(Shneiderman: not at all!) (Others: If the system can take the initiative and act “intelligently”, then conversation may be a good metaphor for interacting with an agent.)

What’s wrong with this dialog?

Q: Who has a terminal?

A: Shall I do the following? Create a report showing the full name and the manager and the equipment from the forms on which the equipment includes “TERMINAL”?

(from the Q&A program)

Q: Who has a terminal?

A: Dan Flickinger has a terminal.

(from the HPNL program)

Initiative

(REVIEW - metaphors & interaction styles)

- Some kinds of human conversations are scripted - who takes the initiative is relatively fixed.
- In other kinds of conversation, neither person takes all the initiative.
- Ideally, initiative should be flexible.

Initiative in Human-Computer Dialog

- A dialog can be under-determined when the user is mystified about what to do next and is forced to take the initiative (e.g., blank screen, cryptic commands).
- A dialog can be over-determined when a system is authoritarian and takes too much initiative (unnatural constraints).

(Harold Thimbleby)

Different users are different!

- Interfaces that take the initiative are typically better for novices, or when errors would be catastrophic.
- Interfaces that let the user take the initiative are better for experts.
- An interface is over- or under-determined with respect to a particular user.

Examples from spoken dialogue systems

Examples from Voice User Interface Design, by Cohen, Giangola, & Balogh

Minimizing cognitive load

(Cohen et al., 2004)

System: United Airlines flight 47 leaves New York Kennedy Airport at 1 p.m. from gate 36 and arrives at Boston Logan at 1:45 p.m. at gate 22. American Airlines flight 243 leaves New York Kennedy Airport at 2:15 p.m. from gate 12 and arrives at Boston Logan at 3 p.m. at gate 47. American Airlines flight 260 leaves New York Kennedy Airport at 3:45 p.m. from gate 15 and arrives at Boston Logan at 4:30 p.m. at gate 42. United Airlines flight 52 leaves New York Kennedy Airport at 5 p.m. from gate 38 and arrives at Boston Logan at 5:45 p.m. at gate 31. Which would you like?

(Broad, shallow dialog)

Minimizing cognitive load

(Cohen et al., 2004) System: There are flights at 1 p.m., 2:15, 3:45, and 5. Which would you like?

Caller: How about the 2:15 flight.

System: American Airlines Flight 243 leaves New York Kennedy Airport at 2:15 p.m., arriving at Boston Logan at 3 p.m. Would you like to book this flight?

(Much less broad, slightly deeper dialog)
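A rough sketch of the design difference between the two dialogs above: summarize the option set first, and give full details only for the flight the caller selects. The flight data comes from the slide, but the data structure and function names are illustrative assumptions, not code from Cohen et al.

```python
# Illustrative sketch of the "narrower, deeper" design: summarize departure
# times first, then give full details only for the flight the caller picks.
FLIGHTS = [
    {"airline": "United",   "number": 47,  "departs": "1 p.m.",    "arrives": "1:45 p.m."},
    {"airline": "American", "number": 243, "departs": "2:15 p.m.", "arrives": "3 p.m."},
    {"airline": "American", "number": 260, "departs": "3:45 p.m.", "arrives": "4:30 p.m."},
    {"airline": "United",   "number": 52,  "departs": "5 p.m.",    "arrives": "5:45 p.m."},
]

def summarize_options():
    # Broad question, minimal detail: keeps the caller's memory load low.
    times = ", ".join(f["departs"] for f in FLIGHTS)
    return f"There are flights at {times}. Which would you like?"

def describe_flight(departs):
    # Full detail only after the caller has narrowed the choice to one flight.
    f = next(f for f in FLIGHTS if f["departs"] == departs)
    return (f"{f['airline']} flight {f['number']} leaves at {f['departs']}, "
            f"arriving at {f['arrives']}. Would you like to book this flight?")

print(summarize_options())
print(describe_flight("2:15 p.m."))
```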

Keeping track of previous information

Caller: Bill payment.

System: What bill would you like to pay?

Caller: MasterCard.

System: How much would you like to pay?

Caller: Two hundred and fifteen dollars.

System: What date would you like the bill to be paid?

Caller: August first.

System: Confirming: You'd like to pay MasterCard in the amount of two hundred and fifty dollars on August first. Is this correct?

Caller: No.

System: Which part would you like to change?

Caller: The amount.

System: How much would you like to pay?

[same recording as first-time request] (Cohen et al., 2004)

Keeping track of previous information

Caller: Bill payment.

System: Sure, "bill payment." What bill would you like to pay?

Caller: MasterCard.

System: How much would you like to pay?

Caller: Two hundred and fifteen dollars.

System: And on what date would you like it paid?

Caller: August first.

System: Let me confirm that: You'd like to pay MasterCard in the amount of two hundred and fifty dollars on August first. Did I get that right?

Caller: No.

System: Which part did I get wrong?

Caller: The amount.

System: Sorry about that. What's the correct amount? (Cohen et al., 2004)
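The second dialog works because the system remembers the slots the caller has already filled, confirms them back, and re-asks only the slot that was wrong. A minimal slot-filling sketch of that pattern follows; the prompts echo the example, but the function, slot names, and repair logic are illustrative assumptions rather than Cohen et al.'s implementation.

```python
# Minimal slot-filling sketch: remember what the caller already said, confirm
# it back, and re-ask only the slot the caller says was wrong.
PROMPTS = {
    "payee":  "What bill would you like to pay?",
    "amount": "How much would you like to pay?",
    "date":   "On what date would you like it paid?",
}

def run_bill_payment(ask=input):
    """`ask` poses a prompt and returns the caller's reply (console by default)."""
    slots = {name: None for name in PROMPTS}
    while True:
        for name, prompt in PROMPTS.items():
            if slots[name] is None:            # only ask for missing information
                slots[name] = ask(prompt + " ")
        summary = (f"Let me confirm that: you'd like to pay {slots['payee']} "
                   f"in the amount of {slots['amount']} on {slots['date']}. "
                   "Did I get that right? ")
        if ask(summary).strip().lower().startswith("y"):
            return slots
        # Targeted repair: clear just the slot the caller names. A deployed
        # system would map the caller's phrase to a slot more robustly.
        wrong = ask("Which part did I get wrong? ").strip().lower()
        for name in slots:
            if name in wrong:
                slots[name] = None
```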

Task oriented dialogue

…is different from character dialogue… (“Hi! I’m Julie!”)

Natural language interfaces - pro

Claim: easy to use (high-level language)

- Can express negation
- Can express quantification
- Can search very large databases
- Can issue commands over sets
- Can distinguish individuals from kinds
- Can ask for information in novel ways
- Can construct complex queries
- Can refer to things not in the here and now

Natural language interfaces - con

- Require the ability to repair errors
- People may have unreasonable expectations
- There are many ways to say the same thing
- The ambiguity problem
- There are currently no robust NL systems that are domain independent
- NL requires solving many hard problems!

Conversation

Herb: ok! now, next week (looks at Susan)

Susan: ok, I will.

Herb: right.

When is speech useful?

- in the dark
- around corners
- when there’s no keyboard handy
- when your hands are busy
- for some handicapped users

Language in the interface

Whether the interface uses NL or speech or neither, people will still transfer some of their expectations about human partners to computer partners.

For example:

Socrates: Please select command mode

Student: Please find an author named Octavia Butler.

Socrates: Invalid Folio command: please

Guindon claimed that people will not use pronouns with a statistics advisor program. However, we showed that people use just as many pronouns with a computer partner as with a human partner (when the domain lends itself to using pronouns!).

Wizard of Oz studies

- laboratory studies of simulated systems

The experimenter intercepts the subject’s input to the computer and may provide responses as if they were coming from the subject’s computer.

Conversation with and through computers

(Brennan, 1991)

How do you get people to type well-formed questions?

Will people use pronouns?

Will people use restricted vocab & syntax?

Do people have the same expectations of computers that they have of their human partners?

Experiment design (subjects per cell):

Partner      Short answers   Sentence   Lex change
Human              6              6           6
Computer           6              6           6

Answer style variable

Q: What is Aida’s job?

Short: engineer
Sentence: Aida’s job is engineer
Lex change: Aida’s profession is engineer
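One way to picture the manipulation: all three answer styles can be generated from the same database record by a trivial template. The record format below is invented for illustration (only the job/profession wording comes from the slide); this is not the code used in the study.

```python
# Illustrative generator for the three answer styles used in the experiment.
RECORDS = {"Aida": {"job": "engineer"}, "Igor": {"job": "actor"}}
SYNONYMS = {"job": "profession"}          # "lex change" swaps in a near-synonym

def answer(name, attribute, style):
    value = RECORDS[name][attribute]
    if style == "short":
        return value                                         # "engineer"
    if style == "sentence":
        return f"{name}'s {attribute} is {value}"            # same word, full sentence
    if style == "lex change":
        return f"{name}'s {SYNONYMS[attribute]} is {value}"  # changed word, full sentence
    raise ValueError(style)

for style in ("short", "sentence", "lex change"):
    print(answer("Aida", "job", style))
```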

Task: Database query about people and their attributes, e.g., Igor - Russia - actor - dog - soccer; Aida - MIT - truck - cat; Fred - Canada; Megumi - surgeon - SUNY SB - sedan - goldfish - basketball.

Requirements for this experiment

- The “wizard” was blind to condition (“system” or “partner”)
- The task had to avoid labeling the attributes
- A set of rules for generating responses
- Responses had to be produced quickly!
- The task had to be started in an orderly way (using a confederate)
- Thoroughly debrief subjects about the deception

Rules for responses

prompt: Is anyone there?
to start: Hello - what do you want to know?
compound: Answer the 1st part, then “What else?”
unintelligible: What was that again?
out of domain: I can’t answer that
etc.: No comment
thank you: you’re welcome
goodbye: bye
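Because the wizard had to answer quickly and identically for every subject, the rules above amount to a small lookup table. A minimal sketch, with illustrative category and function names (not the materials actually used in the study):

```python
# Illustrative lookup table for the wizard's canned replies, so every subject
# gets the same response to the same category of input.
CANNED = {
    "prompt":         "Is anyone there?",
    "start":          "Hello - what do you want to know?",
    "unintelligible": "What was that again?",
    "out_of_domain":  "I can't answer that",
    "etc":            "No comment",
    "thanks":         "you're welcome",
    "goodbye":        "bye",
}

def respond(category, first_part_answer=None):
    # Compound questions: answer the first part, then prompt for the rest.
    if category == "compound":
        return f"{first_part_answer} What else?"
    return CANNED[category]

print(respond("start"))
print(respond("compound", first_part_answer="Aida's job is engineer."))
```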

Results

How people started their conversations:

Human partner: 100% full sentences
Computer partner: 50% full sentences, 50% telegraphic

More results (people adapt over time)

During first half, people used the same proportion of complete sentence questions overall (about 75%).

But during second half, they used significantly more full sentence questions in the sentence answer style than in the short answer style.

More results: Pronoun use

People used just as many 3rd person pronouns (he, she, it) with partners thought to be computers as with partners thought to be human.

So people had expectations that partners should be tracking the task context, whether computer or human.

Pronoun use, continued

BUT many fewer 1st and 2nd person pronouns were used to computer partners than to human partners!!

e.g., “Can you tell me what Aida’s job is?”

So people have different expectations about the social context with computers.

Results: Another difference

People were much more likely to organize their questions by topic (and announce this) to human than computer partners.

e.g., “Now, Aida. What’s Aida’s favorite sport?”

Results: Yet another difference

For all three response styles, people were more likely to acknowledge the partner’s responses with humans than with computers.

e.g.,
Subject: what is Megumi’s car?
Partner: convertible
Subject: ok

Lexical entrainment

When they had an immediate opportunity to ask about the same attribute, people switched to the partner’s term about 60% of the time, whether the partner was human or computer.

“Human” partner, short response style

Subject: Howdy, are you ready?
Partner: What do you want to know?
Subject: What does Igor have a degree in?
Partner: BA, in fine arts

Subject: ok, i’m not too sure what you’ve got there, but what i have here is a list of people and i’m going to start by asking you: what do you know about Igor?
Partner: What do you want to know?
Subject: Next is Ellen. What degree?
Partner: Ph.D. in psychology
Subject: sport
Partner: weight-lifting

“Computer” partner, sentence style

Subject: Hello
Computer: What do you want to know?
Subject: What is Ellen’s hobby?
Computer: Ellen’s hobby is weight-lifting
Subject: Igor’s school
Computer: Igor’s school was the Sorbonne
Subject: Next is Megumi. What nationality?
Computer: Megumi’s nationality is Japanese.
Subject: Ellen degree
Computer: Ellen’s degree is a Ph.D. in psychology
Subject: What does he or she drive?
Computer: Megumi drives a convertible

Conclusions:

Full sentence responses lead to more well-formed questions.

People expect partners to track dialog context (pronoun coverage is required).

People are not fooled into thinking that the computer can deal w/ social context.

Expectations are both similar & different.

Wizard of Oz methodology: Pros & Cons

Pro:
- You can test something without having to build it.

Cons:
- Can be difficult to run
- You need a consistent set of response rules to avoid bias
- The “system’s” reaction times may be slow, and so be too unlike a real system
- Sometimes subjects catch on

Effects of message style on user’s attributions toward agents (Brennan & Ohaeri, 1994)

Do natural language interfaces cause people to anthropomorphize computers?

(recall Shneiderman’s caveats!)

Task:

Make airline reservations (6 scenarios) using a text-based natural language interface that you log on to.

3 styles of responses

Telegraphic (leaves out words):
Enter first request.
Unknown word: “travdl”

Fluent (uses complete sentences):
Please enter your first request.
The word “travdl” is unknown.

Anthropomorphic (refers to “I”):
How may I help you?
I don’t know the word “trvdl”
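The three styles can be thought of as three sets of templates over the same underlying system events. A sketch with invented event names and the example wordings from above (not Brennan & Ohaeri's actual implementation):

```python
# Illustrative templates for the three message styles; the underlying system
# events are the same, only the wording changes.
TEMPLATES = {
    "first_request": {
        "telegraphic":     "Enter first request.",
        "fluent":          "Please enter your first request.",
        "anthropomorphic": "How may I help you?",
    },
    "unknown_word": {
        "telegraphic":     'Unknown word: "{word}"',
        "fluent":          'The word "{word}" is unknown.',
        "anthropomorphic": 'I don\'t know the word "{word}"',
    },
}

def message(event, style, **details):
    return TEMPLATES[event][style].format(**details)

print(message("unknown_word", "anthropomorphic", word="travdl"))
```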

Measures

- # of scenarios completed
- # words, time per scenario
- # second-person pronouns
- # indirect requests and polite terms
- ratings of partner’s intelligence
- acceptability ratings of utterances (e.g., “Could you recommend some good hotels?”)
- # complete-sentence turns

Results (Brennan & Ohaeri, 1994)

1. All conditions were equally successful.

2. Word counts: 50.6, 59.8, 72.3

(Telegraphic = Fluent < Anthropomorphic)

So the anthropomorphic condition was less efficient than the others.

Results (continued)

3. References to the computer as a social partner (you): 2.7, 2.5, 6.6 (Teleg. = Fluent < Anthro.)

4. Indirect requests and politeness: 7.1, 13.0, 21.6 (Teleg. < Fluent < Anthro.)

5. Anthro. led to twice as many complete-sentence turns as Teleg., with Fluent in between.

(So the styles lead to quite different dialogs.)

Results (Brennan & Ohaeri, 1994)

6. No differences in intelligence ratings, for both explicit and implicit measures

(So while the styles lead to different linguistic behavior, they don’t necessarily mislead people.)

Conclusions: