An Oxygenated Presentation Manager Larry Rudolph Oxygen Workshop, January, 2002


LCS
An Oxygenated
Presentation Manager
Larry Rudolph
Oxygen Workshop, January, 2002
Larry Rudolph & Shalini Agarwal
Goals & Overview
• Integrate Many Oxygen Technologies
• Application Driven
– Use an application that we understand
– Personally use often
– Would help if it were more human-centric
– Portable (as opposed to E-21)
• Develop Architectural Infrastructure
– Exposes new requirements
• Critique of Presentation Manager
– What is wrong with it
– What needs improvement
Application Scenario
An Oxygen Application
Components
– Input
* Vision
* Speech
* Touch
– Output
* Projector
* Handheld
* Archive
– Processing
* Changing configuration
Equipment
– Today, it is too hard
* Linux laptop; Windows laptop; camera; microphone; network; projector; power blocks
– Tomorrow, much easier
* a couple of H21’s
Camera watching laser point on screen
• Camera Challenges
* Inexpensive ones have wrong focal length
* Alignment issues
• Use edge of screen, display pattern, figure out from what is known to be visible
• We ended up displaying a pattern of concentric circles
* Relative size of laser point depends on distance
• Beyond ten feet, had to use only certain types of lasers
• Could slow down camera and let pixels saturate (too complicated)
Camera watching laser point on screen (cont)
• Camera Interface
* Click at point (x,y)
• Hold laser at same location for 5 seconds
* Select horizontal line ( (x1,y1) , (x2,y1) )
• Sweep laser back and forth, line is diameter of ellipse
* Select object centered at point (x,y)
• Sweep laser in circle, point is center of circle
* Previous or Next
• Click in left (right) 1/8 of screen
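The click gesture and the Previous/Next regions listed above can be sketched in a few lines of Python. This is only an illustration, not the authors' vision code: the `Sample` record, the 10-pixel tolerance, and both function names are assumptions. Only the 5-second hold and the 1/8-of-screen margins come from the slide.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """One detected laser-dot position (hypothetical record layout)."""
    t: float  # timestamp in seconds
    x: float  # pixels
    y: float  # pixels

def detect_click(samples, hold_seconds=5.0, tolerance=10.0):
    """Return the (x, y) of a click if the dot stayed put long enough, else None."""
    if len(samples) < 2 or samples[-1].t - samples[0].t < hold_seconds:
        return None
    cx = sum(s.x for s in samples) / len(samples)
    cy = sum(s.y for s in samples) / len(samples)
    steady = all(abs(s.x - cx) <= tolerance and abs(s.y - cy) <= tolerance
                 for s in samples)
    return (cx, cy) if steady else None

def click_to_action(x, screen_width):
    """Map a click to Previous/Next when it lands in the left/right 1/8 of the screen."""
    if x < screen_width / 8:
        return "previous"
    if x > screen_width * 7 / 8:
        return "next"
    return "click"
```

A shaky hand is the main risk here: a tighter tolerance rejects real clicks, a looser one turns ordinary pointing into clicks.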
Microphone listening to speaker
• Microphone
– Many technologies;
– Lapel-mic; mic array; room microphone
– Current approach: ipaq
* Continuous recognition
* Push to speak
• Audio server on ipaq
– Detects start and stop
* Best results when human pushes to start and releases to stop
– Audio wave file sent to Galaxy speech system
• Galaxy output actions via CGI-script
– A nice unifying mechanism
– One more complicated component
Speaker controlling presentation via ipaq
• Ipaq output to CGI-script Server
– Same actions as from speech server
• Actions are
* Next slide, Previous slide, Goto slide #n, Goto slide named <xxx>
* Next item, Previous item, Goto item #n, Goto item named <xxx>
* Next animation, Previous animation, Goto animation #n
* Start presentation <name>, End presentation, Pause presentation
* Initialize camera, Test microphone
• Handheld (Ipaq) display
– GUI generated from speechbuilder grammar
– List of slides, items per slide
* Currently use ad-hoc solution where powerpoint sends lists to ipaq. Need more automatic solution
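Because the speech server and the ipaq emit the same actions, one parser can serve both. The sketch below turns an action string into a (verb, target, argument) triple. The textual encoding is an assumption made for illustration; the slides do not show the actual CGI request format, and `parse_action` is a hypothetical name.

```python
import re

# Hypothetical textual encoding of some of the actions listed above.
_PATTERNS = [
    (re.compile(r"^(next|previous) (slide|item|animation)$"),
     lambda m: (m[1], m[2], None)),
    (re.compile(r"^goto (slide|item|animation) #(\d+)$"),
     lambda m: ("goto", m[1], int(m[2]))),
    (re.compile(r"^goto (slide|item) named (.+)$"),
     lambda m: ("goto", m[1], m[2])),
    (re.compile(r"^start presentation (.+)$"),
     lambda m: ("start", "presentation", m[1])),
    (re.compile(r"^(end|pause) presentation$"),
     lambda m: (m[1], "presentation", None)),
]

def parse_action(text):
    """Return (verb, target, argument) for a known action, or None otherwise."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    for pattern, build in _PATTERNS:
        match = pattern.match(text)
        if match:
            return build(match)
    return None
```

Returning None for anything outside the list keeps unrecognized input from reaching the presentation.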
Output to projector, handheld, archive
• Unlimited number of video / audio output producers
– E.g. powerpoint just one producer of output
– At any time, each output device has an associated producer
* This producer can receive input from several producers
• Handheld has proxy
– To reduce bandwidth to ipaq
– Current slide, list of slides, list of commands
• Archive
– Each slide shown, audio (from a different microphone) sent to archive
* Currently just gif of current slide
Processing – controlling session
• Do not let powerpoint control the world
– Slide viewer; movie player; program execution; browser; etc
– Want to mix all types of applications
– Presenter has control of the output
* Eg: Switch output producer from powerpoint to media player
• Remove interrupting technologies
– Dynamically disconnect any input / output source
• All done via core language
– Or some other glue language, e.g. meta-glue
– Which handles all the other infrastructure issues
Multi-Modal Input
Shalini Agarwal
Oxygen Conference
January 8th, 2002
Initial Experience With Presentation Manager
One Single Monolithic Context
Command within slide, between slides, between applications
Problem
Too many false positives
Preliminary Solution
Slide tracking
* e.g. recognize “Next Slide” command only after at least 60% of words on slide have been said
* e.g. recognize “Show Demo” only after slide 17
Still lots of problems
* Many slide styles hard to track (e.g. figures not words on slide)
* Tracking for within slide different than for between slides
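The 60% slide-tracking heuristic can be sketched as a word-coverage check. This is a minimal illustration under stated assumptions: the tokenizer, the function names, and the choice not to gate slides that contain no words are all hypothetical; only the 60% threshold comes from the slide.

```python
import re

def _words(text):
    """Lowercased set of word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def coverage(slide_text, spoken_text):
    """Fraction of the slide's distinct words that have been spoken so far."""
    slide_words = _words(slide_text)
    if not slide_words:
        # Assumption: a figure-only slide cannot be tracked, so it is not gated.
        return 1.0
    return len(slide_words & _words(spoken_text)) / len(slide_words)

def accept_next_slide(slide_text, spoken_text, threshold=0.6):
    """Accept a "Next Slide" command only once enough of the slide has been said."""
    return coverage(slide_text, spoken_text) >= threshold
```

The empty-slide branch makes the weakness noted above concrete: for slides made of figures, the check has nothing to measure.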
A Better Solution: Multiple Contexts
Very Active Research Area
Intelligent-room project; Galaxy; Others
Three layers, each having its own context
1. Slide (Next Item, Next Animation)
2. Presentation (Next Slide, Goto Conclusion, Goto Example)
3. Session (Start Presentation, Switch to Browser, Show Questions)
Challenges
– Each context requires its own speech recognition system
– Multicasting sound wave to each system
– Selecting the best result
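The last two challenges, multicasting the sound and selecting the best result, can be sketched as follows. The recognizers here are stand-in callables that return a (sentence, score) pair; that interface, the assumption that scores are comparable across contexts, and the name `select_best` are all hypothetical and do not describe the Galaxy API.

```python
def select_best(sound, recognizers):
    """Send `sound` to every context's recognizer and keep the top-scoring result.

    `recognizers` maps a context name to a callable returning (sentence, score),
    with sentence set to None when nothing in that context's grammar matched.
    """
    results = []
    for context, recognize in recognizers.items():
        sentence, score = recognize(sound)
        if sentence is not None:
            results.append((score, context, sentence))
    if not results:
        return None
    best = max(score for score, _, _ in results)
    winners = [(context, sentence) for score, context, sentence in results
               if score == best]
    # A tie is left unresolved here rather than guessed at.
    return winners[0] if len(winners) == 1 else None
```

Returning None on a tie reflects the high cost of a wrong guess during a talk; ignoring a command is cheaper than executing the wrong one.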
Extending the Galaxy System
• Start with context for speech and then extend
• Note, our goals are similar but not identical to those of the Spoken Language Group
• We are not dialog-based
• Exploit their work
• Follow Galaxy
• Recognizer scores different guesses at words
• Language Processing Unit uses input grammar to select best input sentence
• Scott Cyphers gave us the nbest interface
[Figure: Galaxy architecture. A central Hub connects Audio, Speech Recog., Language Processing, Discourse Resolution, Dialogue Management, Database Server, Language Generation, and Speech Synthesis. Callouts: the Recognizer chooses the 10 best guesses at word matches (for this context); the Language Processor picks the best sentence from the recognizer based on the input grammar.]
System Structure
[Figure: Sound Input feeds a Recognizer, which produces the candidates “go to slide nine”, “go to twenty nine”, and “go to nine”. The Language Processor picks “go to slide nine” and passes it to the Presentation Layer.]
System Structure
[Figure: Sound Input is sent to three Recognizers, one per layer, each followed by its own Language Processor.
Slide Layer: candidates “next item”, “next movie”, “previous item”; Language Processor picks “next item”.
Presentation Layer: candidates “go to slide nine”, “go to twenty nine”, “go to nine”; Language Processor picks “go to slide nine”.
Session Layer: candidates “end presentation”, “start presentation”, “start explorer”; Language Processor picks “start presentation”.
A Selector chooses among the three results; the figure shows “start presentation” as its output.]
Add Recognizer for T9
[Figure: the System Structure diagram extended with a T9 Input and its own Recognizer alongside Sound Input. The Language Processors (“next item”, “go to slide nine”, “start presentation”), the Selector, and the Slide, Presentation, and Session Layers are as before.]
Add Recognizer for Graffiti
[Figure: the System Structure diagram with three inputs, T9 Input, Sound Input, and Graffiti Input, each with its own Recognizer. The Language Processors (“next item”, “go to slide nine”, “start presentation”), the Selector, and the Slide, Presentation, and Session Layers are as before.]
Other Input Modes
• T9 (telephone keypad)
– To input a, b, or c press “2”
– Current cell phones have dictionary to select correct word
– Lots of false positives (very annoying)
* Remember my introduction?
– Using an application-dependent grammar would reduce errors
• Pen-based character input
– Use strokes to input characters
– Current palm pilot only recognizes “Graffiti” alphabet
– Lots of false positives (very annoying)
– Using an application-dependent grammar would reduce errors
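To illustrate why an application-dependent grammar reduces T9 errors, the sketch below decodes a key sequence against a supplied vocabulary. The letter layout is the standard telephone keypad; the function names and the example vocabularies are assumptions, and a real grammar would constrain whole sentences rather than single words.

```python
# Standard telephone keypad letter assignment.
T9_KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
           "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in T9_KEYS.items() for ch in letters}

def t9_encode(word):
    """Key sequence that types `word` (letters a-z only)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())

def t9_decode(keys, vocabulary):
    """All vocabulary words that the key sequence could stand for, sorted."""
    return sorted(w for w in vocabulary if t9_encode(w) == keys)
```

With a general dictionary containing "good", "gone", "home", and "hood", the keys 4663 are four ways ambiguous. With a presentation vocabulary such as "next", "previous", "goto", "start", "end", "pause", the keys 6398 can only mean "next".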
Replacing the Recognizers
• Build recognizers for T9 and Graffiti
• Use Galaxy system to process results from new recognizers
[Figure: Galaxy architecture with T9 Recog. and Graffiti Recog. attached to the Hub alongside Speech Recog. The other servers are Audio, Language Processing, Discourse Resolution, Dialogue Management, Database Server, Language Generation, and Speech Synthesis.]
Conclusion
• Each application defines an input grammar
• This grammar can be used to
– Ensure that each application gets valid input
* It might not be what the user wanted, but the application will understand it
– Reduce false-positives
– Identify the input suitable for associated application
* Choose the application with the highest score
* If tie, must do something else (future research)
– Enable T9, Graffiti, Speech, other input modes
Critique of Presentation Manager
Vision / Gesture Recognition
• Laser Pointer
– Great for drawing attention to content
* Audience is primary consumer
* Secondary use to control presentation
– But it is not a mouse
* Semantics are tied to slide context
* Differs from Intelligent-room use
– Small number of identified gestures
* Gestures easily punctuated
– Low computational overhead
* Soon will be handled with an H21
Critique of Vision / Gesture Recognition
• Laser Pointer
– Great for drawing attention to content
* Cheap technology but mostly distracting
* Too shaky, imprecise
– But it is not a mouse
* More awkward to use than mouse
* Another gadget to hold in the hand, button to identify, batteries to maintain
– Small number of identified gestures
* There are better ways of drawing attention to slide content
* I rarely use it and don’t like it when others do
– Low computational overhead
* Dumb vs Intelligent Device Discussion
Speech Recognition
• Initially seems like great idea
– Speaker is already speaking, so can use it to control presentation
• Want passive, intelligent listener
– Not a dialog
– No “prompt” :: alienating distraction
• Want no mistakes
– For dialog, better to guess than ignore
– For us, high cost for incorrect guess
– Most words are not relevant to speech system
• More trouble than it is worth
– But may be good for real-time search of content
More useful aspect – Output modalities
• Presenter has put the time and effort into the production
– Simpler is better
• Audience has harder task
– Understand material being presented
– Record thoughts, impressions, connections
– Filter for later review
– Process in real-time
– Keep up with presentation
– Do all this with minimal distractions
• Output modalities
– Content for live audience
– Content for speaker (superset of audience)
– Content for retrieval
* Correlate notes with content
Record and correlate notes with presentation
CORE: Communication Oriented Routing Environment
(Oxygen Research Group)
Assumptions
– Actuators / Sensors (I/O) in the environment
– Many are shared by apps & users
– Many are flaky / faulty
– “User” does not know much about them
– Environment, applications, and user desires change over time
An Oxygen Application
• Interconnected Collection of Stuff
• Who specifies the stuff?
– I don’t know, but it’s mostly virtual stuff
– Many layers of abstraction
* “Don’t ask, it’s turtles all the way down”
• Two main layers of programming
– Professionals
– Users, e.g. grandmother
Communications-Oriented Programs
• Connecting the (virtual) stuff done by user
– Home stereo / theater analogy
* Plug stuff together; unplug it if it doesn’t work
* Don’t like it, unplug it
• Device drivers, services, clients don’t know to whom or to what they connect
– In client/server model,
* server knows a lot about the client,
* the client knows even more about the server
• Extend Unix Pipes
CORE
[Figure: a CORE connected to Programs (Processes), Physical Devices, Apps, and Other COREs; labels include “Larry Bear” and “Larry Bear’s CORE”.]
Message Flow
• Messages flow between nodes & core
– Core is both language and router
• Within Core Router, some messages
– are interpreted and may trigger actions
– other messages get routed to other nodes
• Request-Reply message strategy
– Even number of messages
– No reply within time period, means error
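The request-reply strategy, where a missing reply within the time period counts as an error, can be sketched with two queues standing in for the channel between a node and the core. The names `request` and `echo_node` and the one-second default are assumptions made for illustration.

```python
import queue
import threading

def request(outbox, inbox, message, timeout=1.0):
    """Send `message`, then wait for its reply; no reply in time is an error."""
    outbox.put(message)
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply to {message!r} within {timeout}s") from None

def echo_node(inbox, outbox):
    """Stand-in node: replies once to the first message it receives."""
    outbox.put(("reply", inbox.get()))

# Usage: one request, one reply, so the message count stays even.
to_node, from_node = queue.Queue(), queue.Queue()
threading.Thread(target=echo_node, args=(to_node, from_node), daemon=True).start()
reply = request(to_node, from_node, "next slide")
```

Treating silence as an error matters when many devices are flaky: the sender learns about a dead node without the node having to report its own failure.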
CORE Language Elements
• Four elements
1. Nodes
2. Links
3. Messages
4. Rules
• Features
– Interpreted Language
– Statement is a message & reply
– Each element has an inverse
Node
handler = (nickname, specifier)
Nodes – Specify via INS
Cam = [device=web-cam; location=518;…]
PTRvision = [device=process; OS=Linux;File=Laser Vision, ..]
[Figure: CORE at the center, connected to the nodes Presentation Speech, Slide Speech, Command Speech, and Laser Vision.]
Node Statement Handler
• When ‘node’ message arrives
– Verified for correctness (statements allowed)
– Routed to Node Manager (just another node)
• Node Manager
– INS lookup, verifies if allowed, creates if needed
– Creates core thread to manage communication with node
– Bookkeeping & reply message with handle/error
Links
L_camera,vision = (Cam, PTRvision)
[Figure: CORE at the center, connected to the nodes Presentation Speech, Slide Speech, Command Speech, and Laser Vision.]
Link Statement Handler
• Message routed to ‘link’ manager
• Two queries to node manager for thread controllers
• Message to thread controller of source node
– Specifying destination thread controller
• Message to thread controller of dest node
– Specifying source thread controller
• Bookkeeping & reply message with handle/error
Messages
Messages flow over the links
[Figure: CORE at the center, connected to the nodes Presentation Speech, Slide Speech, Command Speech, and Laser Vision; a “Next Slide!” message travels over a link.]
Message Handling
• Messages can be encrypted
• Core statement messages have fixed format
• Everything else is data message
• Each node thread has two unbounded buffers
– Core to node & Node to core
• Logging, rollback, fault-tolerance
LCS
Rules
RULES: (trigger,action)
( MESS_Question , L_slide,lcd -- & L_slide,qlcd )
[Figure: CORE at the center, connected to the nodes Presentation Speech, Slide Speech, Command Speech, Laser Vision, and Questions.]
Rule Statement Handler
• ( trigger , consequence )
• Both are “event sets”
• Eight basic events:
+Node, -Node, +Link, -Link
+Message, -Message, +Rule, -Rule
• Event set is a set of events
• Trigger is true when events are true
• Consequence makes events true
Rules – A link is a rule
• A message event is of form
(node, message specifier)
(message specifier, node)
– Message came from or going to node
• A link (x,y) is just shorthand for the rule:
+(x, m) → ( -(x, m) , +(m, y) )
If a message m arrives at node x, then make that event false (remove the message) and make the event of m arriving at y from core true.
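The reading of a link as a rule can be made concrete with sets of events. In this sketch an event is a tagged tuple, ("from", x, m) for a message m that came from node x and ("to", m, y) for m going to node y. The tags, the function names, and fixing m to one concrete message per rule are simplifying assumptions; this is not the CORE interpreter.

```python
def link_rule(x, y, m):
    """Rule for link (x, y): trigger +(x, m), consequence ( -(x, m), +(m, y) )."""
    trigger = {("from", x, m)}
    consequence = [("-", ("from", x, m)), ("+", ("to", m, y))]
    return trigger, consequence

def apply_rule(rule, events):
    """Return the new set of true events after firing `rule`, if it is triggered."""
    trigger, consequence = rule
    if not trigger <= events:  # trigger holds only when all its events are true
        return set(events)
    result = set(events)
    for sign, event in consequence:
        if sign == "+":
            result.add(event)      # make the event true
        else:
            result.discard(event)  # make the event false
    return result
```

For example, with the link (Cam, PTRvision), the event set {("from", "Cam", "frame")} becomes {("to", "frame", "PTRvision")}: the message is removed at the source and delivered to the destination.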
Rules – Access Control Lists
• An access control list is just a rule
• When messages arrive at node, if they arrive from valid node, then allowed to continue to flow.
• Modifying access control lists is just adding or removing rules.
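A minimal sketch of access control as rules: the list is a set of (source, destination) pairs, and a message continues to flow only if its pair is present. The representation and the name `deliver` are assumptions chosen for brevity, not the CORE rule format.

```python
def deliver(acl_rules, source, destination, message, inbox):
    """Append `message` to `inbox` only if (source, destination) is allowed."""
    if (source, destination) in acl_rules:
        inbox.append(message)
        return True
    return False

# Modifying the access control list is just adding or removing a rule.
acl = {("Cam", "PTRvision")}
inbox = []
deliver(acl, "Cam", "PTRvision", "frame", inbox)       # allowed
deliver(acl, "Stranger", "PTRvision", "frame", inbox)  # dropped
acl.add(("Stranger", "PTRvision"))                     # now allowed
```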
Rules
• Rule statement gets sent to rule manager
• Event set is just another shorthand for rules
• Rule manager sends command to trigger node thread that tells it about the consequence
• Rules are reversible
Reversibility
• Each statement is invertible (reversible)
• If there is an error in the application specification, then can undo it all.
• General debugging is possible with reversible rules and message flow
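One way to picture "undo it all" is a log that records the inverse of every statement as it is applied. The `Configuration` class below is a hypothetical sketch of that idea, with elements as plain tuples; it does not reflect how CORE stores nodes, links, or rules.

```python
class Configuration:
    """Set of elements (nodes, links, rules) with an undo log of inverse statements."""

    def __init__(self):
        self.elements = set()
        self._log = []  # inverse of each applied statement, oldest first

    def add(self, element):
        if element not in self.elements:
            self.elements.add(element)
            self._log.append(("-", element))  # inverse of add is remove

    def remove(self, element):
        if element in self.elements:
            self.elements.discard(element)
            self._log.append(("+", element))  # inverse of remove is add

    def undo_all(self):
        """Apply the logged inverses newest first, restoring the starting state."""
        while self._log:
            sign, element = self._log.pop()
            if sign == "+":
                self.elements.add(element)
            else:
                self.elements.discard(element)
```

Replaying the inverses in reverse order is what makes a faulty application specification safe to try: every step can be taken back.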