Basic Representations of Music & Audio (1)


The General Temporal Workbench
(formerly General Multimedia Workbench)
A Universal System for Exploring Time-based
Phenomena
Donald Byrd
School of Informatics & Jacobs School of Music
Indiana University/Bloomington
2 Jan. 2010
1
Introduction: Time Is of the Essence
• Gandini Juggling’s Mozart (Symphony no. 25, 1st mvmt) performance
• “When you hear music, after it’s over, it’s gone, in the air. You can
never capture it again.” —Eric Dolphy (1964)
• Likewise for all complex temporal phenomena
• …and timescale can be microseconds or millions of years
• What if you really want to think about what happened (or, for creative
art, you want to happen)?
• Need a way to “freeze” it
– Playing a recording over & over isn’t enough!
• Obvious answer: visualization—but what’s the best way?
• …and is visualization the only answer?
10 Sep. 2009
2
Motivation: We Have Big Problems
• Long-standing, difficult problems in all fields
• …plus deluge of data in many fields
– Even arts & humanities are getting lots of hard data
– Promises to help, but not much help so far!
• What we need is insight, not data; how to get there?
– Widely recognized as an important goal
• The cross-discipline argument
– Problems in all fields have much in common => a general system
could be very valuable, if it’s possible
– A general system is possible
• The cross-creativity/analysis argument
– Problems of creators & analysts in a field have much in common
=> system for creation & analysis also very valuable, if possible
– A system for both is also possible
3 July 2009
3
Examples
• Create/teach performers/rehearse/study a multimedia show
– Gandini Juggling’s Mozart Symphony
– …or marching band, or dance w/ music & lighting effects, etc.
• Study Hendrix’s Star-Spangled Banner (VAT vs. published
transcription)
• Look for patterns in patient’s medical history (Lifelines)
• Research on embodied language acquisition (Chen Yu)
• Study or learn role in opera/musical (GTW simulation)
• Organize ethnomusicology field research (EVIA AWB)
• Study world events (JFK assassination, Salem Witchcraft
Trials)
2 Jan. 2010
4
Jimi Hendrix’s Woodstock Star-Spangled Banner
(a) Timeline overview w/ labeled segments (Variations Timeliner)
(b) In music notation, guitar tablature, & words (published transcription)
28 Feb. 2009
5
Timelines (1)
Applications of “timelines” in a broad sense
– audio editing (a few hundred millisec. to a few min.)
– juggling (a few seconds; vertically oriented)
– video & motion data of two participants interacting in lab (seconds)
– “bubble” diagrams of structure of pieces of music (minutes)
– movie/video of animal behavior, interview, show, etc. (minutes to hours)
– video annotation (hours)
– weather (hours to days)
– appointment calendar (week or month; 2-D)
– the assassination of President Kennedy (a few days)
– Salem witchcraft accusations (a month)
– personal history: medical, criminal justice, etc. (years)
– dinosaurs (tens of millions of years)
28 Mar. 2009
6
Timelines (2)
Assassination of President Kennedy (SIMILE Timeline)
12 Mar. 2009
7
Concrete & Abstract Forms in Different Fields
• Symbolic forms in music vs. text (& other areas)… Which came first?

Symbolic form is… (Seeger, 1958)   Text         Music
Prescriptive                       script       score/performance parts, “sheet music”
Descriptive                        transcript   transcription

Real-time form: performance, speech, etc.
• Real-time = low level (concrete)
• Symbolic = high level (abstract)
• Very high-level: segmentation w/ labels
• Ex: Hendrix Star-Spangled Banner
• Concrete & abstract forms are useful for all temporal phenomena
14 Apr. 2009
8
Different Fields Have Much in Common
• From music to remote disciplines in small steps, for three aspects of music
• Steps shown aren’t unique; many paths are possible
• All are complex enough that no one way of “looking” at it can capture everything
• For all, people often want to compare two or more instances of the phenomenon
15 Apr. 2009
9
Solution 1: Better Human/Machine
Partnerships
Above figure is from Yu et al (2008), slightly modified
• Integrate info visualization & analysis/data mining (Shneiderman 2002;
Yu et al 2008) => closed loop: use visual perception to generate
hypotheses for analysis; present results of analysis visually
– Cf. browsing vs. searching dichotomy; HCIR, visual analytics
• …or substitute synthesis for analysis, e.g., for composers
7 Apr. 2009
10
Solution 2: Allow All Sensory Modes
• Visual: visualization is most generally useful, but not the only answer
• Auditory: sound is central for many applications
– sonification is surely valuable for some non-audio phenomena
• Tactile: important for the blind
• Other (olfactory): important for ??
• Don’t rule out support for all sensory modalities
14 Mar. 2009
11
Solution 3: Don’t Reinvent (or do without)
the Wheel!
• Problems of all temporal phenomena have much in common
• …but people rarely share ideas or software across disciplines
– Issue of “disjoint technical vocabulary/literature” (cf. Swanson 1988)
– Value of exploratory search (cf. Jeremy Pickens, etc.)
• Idea: a “General Temporal Workbench” (GTW)
– Formerly General Music/General Multimedia Workbench (GMW)
– Supports multiple: coordinated, editable, interactive visualizations &
sonifications (eventually “tactilizations”, “olfactizations”?)
– …of multiple instances
– …of any combination of temporal phenomena
– …plus data mining & analysis (Solution 1)!
• For creative applications or analytical applications?
– Both—the design is neutral
– NB: a misleading question: insightful analysis involves creativity
15 May 2009
12
Use Existing Timeline Software
• There’s an endless variety of timelines
– Orientation: horizontal, vertical, 2D, other
– Spacing: linear, logarithmic, piecewise linear, etc.
– Multiscale coordinated
– Playability: audio, video, Flash, etc.
• Useful generic “timeliners” can go far beyond the basics
– Ex: SIMILE Timeline & Timeplot
• Even doing “the basics” well isn’t that easy
– Ex: axis tick marks & numbers for them
• Can consider all time-domain displays as variations on
timelines
• But what is there besides time-domain displays?
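The spacing options and the tick-mark problem mentioned on this slide can be illustrated with a minimal sketch. Nothing here comes from SIMILE Timeline or any other tool named in the slides; the function names, the pixel-width parameter, and the "1, 2, or 5 times a power of ten" tick rule are assumptions chosen for illustration.

```python
import math

def linear_position(t, t_min, t_max, width):
    """Map time t to an offset along the axis using linear spacing."""
    return (t - t_min) / (t_max - t_min) * width

def log_position(t, t_min, t_max, width):
    """Map time t to an offset using logarithmic spacing (all times must be > 0)."""
    lo, hi = math.log10(t_min), math.log10(t_max)
    return (math.log10(t) - lo) / (hi - lo) * width

def nice_ticks(t_min, t_max, target=5):
    """Choose tick values whose step is 1, 2, or 5 times a power of ten.

    'target' is only a rough hint for how many intervals to aim for.
    """
    raw_step = (t_max - t_min) / target
    magnitude = 10 ** math.floor(math.log10(raw_step))
    for factor in (1, 2, 5, 10):
        step = factor * magnitude
        if step >= raw_step:
            break
    first = math.ceil(t_min / step) * step
    ticks = []
    k = 0
    # The small tolerance keeps the last tick when rounding error creeps in.
    while first + k * step <= t_max + step * 1e-9:
        ticks.append(round(first + k * step, 10))
        k += 1
    return ticks

if __name__ == "__main__":
    # A 100-second excerpt drawn on an 800-pixel-wide linear axis.
    print(nice_ticks(0, 100))
    print(linear_position(25, 0, 100, 800))
    # A span from 1 year to 1 million years on a 600-pixel logarithmic axis.
    print(log_position(1000, 1, 1e6, 600))
```

A piecewise-linear axis could be built the same way by interpolating between a list of (time, offset) anchor points instead of using one formula for the whole range.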
11 Sep. 2009
13
Use Existing Frequency Domain Viewers
• Frequency domain = patterns in time domain
– Example: economic cycles; Kondratiev’s theory of periodic collapses
of capitalist economy :-)
– Time/frequency domain (hybrid) more useful than pure
• Visualization example: spectrogram (via Fourier analysis)
• Very well-known in hard sciences, less in soft sciences
• …almost unheard of in arts & humanities, & by public
– Exception: computer music
• Are periodic changes in cultural areas plausible?
– Politics: U.S. House of Reps. elections correlate w/ franked mail
– Direct experience important => periods of 1-2 generations(?)
– Higher education population turnover => periods of 4 years(?)
• …or periodic changes of blood glucose for diabetics?
– Type-1 diabetics do constant self-medicating => need user-friendly
tool
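As a rough illustration of "frequency domain = patterns in time domain", the sketch below finds the strongest period in an evenly sampled series with a naive discrete Fourier transform. The input is a synthetic sine wave, not real data, and the function names are hypothetical. A spectrogram would apply the same transform to successive short windows of the series rather than to the whole series at once.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. N//2 of a real-valued series (naive O(N^2) DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # remove the constant offset
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        magnitudes.append(abs(acc))
    return magnitudes

def dominant_period(samples, sample_interval=1.0):
    """Return the period (in the units of sample_interval) of the strongest cycle."""
    magnitudes = dft_magnitudes(samples)
    # Skip k = 0, which corresponds to the (already removed) constant offset.
    k = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return len(samples) * sample_interval / k

if __name__ == "__main__":
    # Synthetic series: one full cycle every 8 samples, 64 samples in total.
    series = [math.sin(2 * math.pi * i / 8) for i in range(64)]
    print(dominant_period(series))
```

For long series a fast Fourier transform from a numerical library would replace the naive loop; the naive version is used here only to keep the example self-contained.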
10 Jul 2009
14
What Fields are Candidates for a GTW? (1)
• What fields can really benefit from synergy of
“not reinventing the wheel”?
• Relevant features
1. Complex enough that no one way of “looking” at it can
capture everything, i.e., needs multimodal access
2. People often want to compare two or more instances of
the phenomenon
3. (less important) specialized graphical notation(s) are
widely used for symbolic form
19 Feb. 2009
15
What Fields are Candidates for a GTW? (2)
• How many fields have at least Features 1 & 2?
• An unusual example: juggling (Juggling Lab)
– No single way of looking at it can capture all the information
• Standard: video and/or animated stick figures
• Optional: notation (“siteswap”), timeline showing paths of balls
– People often want to compare versions of a trick
– General human & animal movement is really complex
• Conclusion: all non-trivial fields have 1 & 2; very
many have all three.
– Speculation: area w/ over (say) 100K person-years of serious
interest probably has enough complexity to have Features 1 & 2
– Speculation: area w/ over (say) 500K person-years of serious
interest probably has enough complexity to have all three
19 Feb. 2009
16
HCI: Multiple “Visualizations” Can be Great or
Worthless
• Parable of blind men & elephant
• Point of multiple visualizations: let the user put the pieces
together & “see” big picture
– The more different the visualizations involved, the better…
– But the more different the visualizations, the greater the danger of
user getting confused!
– Ease of navigating between is critical => need coordination
– Often helpful to have a (small) overview on screen
– Ex: viewing modes in PowerPoint
– Ex: “Scrollbar with confetti” (Byrd 1998) gives overview with (v.
often) no additional screen space
• Similar principles apply to sonifications, etc.
1 Mar. 2009
17
The Ultimate Music-and-More System (1)
• If system could do “everything” with music, should be
useful for lots besides music!
• Not just useful for many domains, allows synergy/leverage
– Related to “abstraction”, “modularizing”, “factoring”
– = breaking problem down into separate parts
– Cf. high-school algebra
– Result: no need to reinvent the wheel
• Example: timelines
• Example: apply frequency-domain approaches & software
in many fields
• But could a program do that? Is this practical?
26 Mar. 2009
18
The Ultimate Music-and-more System (2)
• Practical iff it can be broken down to independent
chunks
• Modular design (in layers) is vital
• Architecture plan for GTW
I. Completely general framework: no knowledge of
domain
II. Generic domain-specific modules for file I/O, support
for low-level modules, maybe “automatic” alignment
III. API for user-written analyzers & “visualizers”
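One way to picture the three layers is the sketch below. It is not the GTW design or its API: every class name and method here is hypothetical, and the "seconds,label" text format is invented for the example. The point is only that the framework (Level I) handles tracks and plug-ins without knowing the domain, while loaders (Level II) and analyzers/visualizers (Level III) are supplied separately.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class Track:
    """Level I data: a domain-neutral list of (time_in_seconds, label) events."""
    name: str
    events: list = field(default_factory=list)

class Loader(ABC):
    """Level II: domain-specific input that produces Level I tracks."""
    @abstractmethod
    def load(self, text: str) -> Track: ...

class Analyzer(ABC):
    """Level III: user-written analysis of a track."""
    @abstractmethod
    def analyze(self, track: Track) -> dict: ...

class Visualizer(ABC):
    """Level III: user-written presentation of a track."""
    @abstractmethod
    def render(self, track: Track) -> str: ...

class Workbench:
    """Level I framework: stores tracks and runs plug-ins, with no domain knowledge."""
    def __init__(self):
        self.tracks = {}
        self.analyzers = []
        self.visualizers = []

    def add_track(self, loader: Loader, text: str) -> Track:
        track = loader.load(text)
        self.tracks[track.name] = track
        return track

    def analyze_all(self) -> dict:
        return {name: [a.analyze(t) for a in self.analyzers]
                for name, t in self.tracks.items()}

    def render_all(self) -> dict:
        return {name: [v.render(t) for v in self.visualizers]
                for name, t in self.tracks.items()}

class CsvEventLoader(Loader):
    """Example Level II module: parses lines of the form 'seconds,label'."""
    def __init__(self, name: str):
        self.name = name

    def load(self, text: str) -> Track:
        events = []
        for line in text.strip().splitlines():
            seconds, label = line.split(",", 1)
            events.append((float(seconds), label.strip()))
        return Track(self.name, sorted(events))

class SpanAnalyzer(Analyzer):
    """Example Level III analyzer: event count and time span of a track."""
    def analyze(self, track: Track) -> dict:
        times = [t for t, _ in track.events]
        span = max(times) - min(times) if times else 0.0
        return {"count": len(times), "span": span}

class TextTimeline(Visualizer):
    """Example Level III visualizer: one line of text per event."""
    def render(self, track: Track) -> str:
        return "\n".join(f"{t:8.2f}s  {label}" for t, label in track.events)

if __name__ == "__main__":
    wb = Workbench()
    wb.analyzers.append(SpanAnalyzer())
    wb.visualizers.append(TextTimeline())
    wb.add_track(CsvEventLoader("demo"), "2.0, verse\n0.5, intro\n")
    print(wb.analyze_all())
    print(wb.render_all()["demo"][0])
```

Because `Workbench` only ever calls `load`, `analyze`, and `render`, a new field would need new plug-ins but no change to the framework itself, which is the kind of independence the slide asks for.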
18 Feb. 2009
19
Architecture for a General Visualizer/Analyzer (1)
• Configuration = software + UI (windows, etc.)
• Software for common audio & video uses…
15 Mar. 2009
20
Architecture for a General Visualizer/Analyzer (2)
For sequential art & movies based on sequential art (John Walsh)…
15 Mar. 2009
21
GTW “Screenshot” 1
• Scenario: music-informatics researcher (or ethnomusicologist)
comparing two audio-segmentation algorithms
• …or composer comparing input & output of synthesis programs
20 Apr. 2009
22
GTW “Screenshot” 2 (& “Demo”)
• Scenario: singer (or conductor) comparing videos of
performances to learn role in opera, musical, etc.
• …or stage director, choreographer, or lighting designer comparing
previous versions to own ideas
• …or scholar studying performances (perhaps juggling w/ music!)
4 Apr. 2009
23
What’s Special About the GTW?
• Design (& planned implementation) for our “solutions”
– Better human/machine partnerships: tight coupling of visualization & analysis
– Don’t reinvent wheel: any presentations of anything temporal in any combination
• Factors, roughly from most to least fundamental:
1. Architecture separates framework from assumptions about use (domain knowledge)
2. Support for rapidly changing focus between very different visualizations at vastly different scales
– Configuration files set up internals and UI; experts can create for each use case
– Designed to support comparing “similar” documents
3. Can automatically adjust (in own windows) which visualizations take screen space & sizes/layouts of interfacing programs’ windows
4. Support for showing relationships between features in different views
– Doesn’t assume consistency between coordinated representations
5. Can act as “slave” (client for, e.g., SEASR/Meandre, Max/MSP, Pd) or master
6. Fully multimodal: presentation in non-visual form (sonification, Braille) on same basis as visualization
7. Analysis modules can communicate w/ presentation modules
16 Mar. 2009
24
The Truth: The GTW Can’t Do Everything
• ... but it can enable YOU to!
• Catch: needs technology for your field
– Level II (domain knowledge)
– Level III (user interaction)
• Vast majority of users aren’t technology experts
• Solution: user communities
– Enables experts in each field
• Something like an operating system
– Not so hard because much more synergy (=> less new
work per field) than now
2 Jan. 2010
25
Similar Tools for Non-Temporal Phenomena
• Existing, very general tools for other situations
– Network Workbench (Katy Börner/IU SLIS): visualize/explore networks
– Google Map API: visualize/explore “space” (surface of the earth)
– Both have proven very useful
– But many phenomena have temporal and non-temporal aspects
• Temporal & network: artificial life, computer games, studying software (debugging, etc.): traversal => temporal form
• Temporal & spatial: folksong style vs. region of origin, art or general history, etc.
– Cf. Timemap (Google), Salem Witchcraft Accusations webpage
• All three: public health (as in epidemiology)
– GTW could be used with other tools
2 June 2009
26
Getting Off the Ground
• Working on prototype, based on EVIA Annotators
Workbench; also Variations, CIShell, Chen Yu’s system for
“visual mining of multimedia data” (all from IUB)
• Other possible open-source starting points
– Sonic Visualiser, SyncPlayer, SIMILE Timeline, etc.
• Connections to general tools for nontemporal visualization
– Network Workbench (Katy Börner/IU SLIS): networks
– Google Map API: “space” (surface of the earth)
• Connections to other general tools
– Meandre for SEASR (UIUC): humanities/social science research
– Max/MSP, Pd: musical audio
2 Jan. 2010
27
Conclusions
• How do I know applications are realistic?
– Many probably aren’t, but many, many possibilities exist!
– Have ca. 30 usage scenarios, ca. half written/endorsed by experts
– Some examples (all by experts)
• Ruth Stone: ethnomusicology field work
• Amar Flood: nanoscience/nanotechnology
• Larry Yaeger: artificial life
• Elaine Chew: annotating video of computer-aided musical improvisation
• John Walsh: comic books/movies
• Philip Gossett: musicology
– Personal knowledge/experience for a few
2 Jan. 2010
28
End
• Thanks to Geoff Chirgwin, Will Cowan, Allen Winold,
Paul Sturm, &…
• THE END
20 Feb. 2009
29
Extra Slides
• Following slides are just in case…
rev. 18 Feb. 2009
30
Good Design for Music Can Be Good for Many Things
• Cf. “Why Studying Music is Both Difficult and Important” (Byrd 2009)
1. Music is an art => people use elements in unusual ways
2. Music is a performing art => performances & symbolic representations
3. Much music has complex synchronization requirements
4. Music involves many different instruments, often in groups. Leads to:
– Arrangements/transcriptions for other instruments
– Versions for players with different levels of skill
– Notation may represent sounds or actions
5. Music is often combined with text via singing, narration, etc.
6. Music is extremely popular, so:
– Some works exist in many versions, arrangements for different ensembles, etc.
– Handling challenges is important, even on purely economic grounds
rev. 18 Feb. 2009
31
HCI: Searching, Browsing, & Visualization
• Visualization is essential for browsing, merely helpful for searching
• In browsing, user finds everything; the computer just helps
• Browsing is obviously good because it gives user more control, but few
systems emphasize it. Why?
– “Users are not likely to be pleasantly surprised to find that the library has
something but that it has to be obtained in a slow or inconvenient way.
Nearly all items will come from a search, and we do not know well how to
browse in a remote library.” (Lesk, p. 163)
• For “and”, read “as long as”
• Searching is more natural on computer, browsing in real world
– Effective browsing takes very fast computers—widely available now
– Effective browsing has subtle UI demands
• Cf. HCIR, visual analytics, visual searching, etc.
7 Mar. 2009
32
Why juggling? Who cares?
• A surprising domain, but realistic
– Features 1 & 2 apply
– Feature 3 applies in part: has established (though not graphic)
notation, “siteswap”
• Many juggling programs available
• GTW framework has support for:
– Control of tempo, including pausing or going backwards
– UI for (temporal, not spatial) zooming in on details
– Synchronization of multiple videos and/or animations
– Framework for auto. synchronization
– Framework for combining independent visualizations
• Animal motion in general is much more complex => more
need for GTW!
– Ex: dancing (Labanotation, etc.)
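The tempo-control and synchronization items above can be sketched as a shared clock plus a per-view time mapping. This is an illustrative assumption about how such a framework might work, not a description of the GTW; the class names and the idea of hand-entered anchor points are hypothetical. Each view keeps a list of (reference time, media time) anchors, and the media position for any reference time is found by piecewise-linear interpolation.

```python
from dataclasses import dataclass

@dataclass
class Transport:
    """Shared clock: a position in seconds on a common reference timeline."""
    position: float = 0.0
    rate: float = 1.0  # 1.0 = normal speed, 0.0 = paused, negative = backwards

    def advance(self, wall_seconds: float) -> float:
        """Move the reference position by elapsed wall-clock time times the rate."""
        self.position += self.rate * wall_seconds
        return self.position

@dataclass
class SyncedView:
    """One video or animation aligned to the reference timeline.

    'anchors' is a list of (reference_time, media_time) pairs with strictly
    increasing reference times, e.g. the moments of the same catch in two videos.
    """
    name: str
    anchors: list

    def media_time(self, ref_time: float) -> float:
        """Interpolate linearly between anchors; clamp outside the anchored range."""
        pts = self.anchors
        if ref_time <= pts[0][0]:
            return pts[0][1]
        if ref_time >= pts[-1][0]:
            return pts[-1][1]
        for (r0, m0), (r1, m1) in zip(pts, pts[1:]):
            if r0 <= ref_time <= r1:
                return m0 + (ref_time - r0) * (m1 - m0) / (r1 - r0)

if __name__ == "__main__":
    clock = Transport(position=10.0, rate=-0.5)  # half speed, backwards
    views = [
        SyncedView("video A", [(0.0, 2.0), (10.0, 12.0), (20.0, 32.0)]),
        SyncedView("video B", [(0.0, 0.0), (20.0, 18.0)]),
    ]
    now = clock.advance(4.0)  # four seconds of wall-clock time pass
    for view in views:
        print(view.name, view.media_time(now))
```

Automatic synchronization would amount to computing the anchor lists instead of entering them by hand; temporal zooming only changes which range of reference time is displayed and leaves the mappings untouched.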
Structure in Basic Representations of Music &
Audio
Audio: no explicit structure
MIDI: simple, regular, well-defined structure
Western Music Notation: very complex, irregular structure; some parts well-defined, some not, and what’s well-defined isn’t well-defined
10 Feb. 09
34
Basic Representations of Music & Audio
1. Digital Audio (e.g., CD, MP3): like speech
2. Time-stamped Events (e.g., MIDI file): like unformatted text
3. Music Notation: like text with complex formatting
• Time scales of graphs: #1, milliseconds; #2 & 3, seconds
• Essential difference among forms: “knowledge representation” =
explicit structure
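The difference in explicit structure can be made concrete by writing the same two notes in all three forms. The sketch below is only an illustration: the field names are hypothetical, the tempo of 120 quarter notes per minute and the sample rate of 8000 Hz are assumptions, and real MIDI files and notation formats carry far more detail. The pitch-to-frequency formula is the standard equal-temperament one with A4 (MIDI note 69) at 440 Hz.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 8000  # samples per second; assumed, kept low for a small example

@dataclass
class NoteEvent:
    """Time-stamped event: explicit notes, but no measures, beats, or spelling."""
    onset: float     # seconds
    duration: float  # seconds
    pitch: int       # MIDI note number (60 = middle C)

@dataclass
class NotatedNote:
    """Notation-like form: explicit metrical position, pitch spelling, note value."""
    measure: int
    beat: float
    pitch_name: str  # e.g. "C4"
    value: str       # e.g. "quarter"

def midi_to_hz(pitch: int) -> float:
    """Equal-temperament frequency for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)

def render_audio(events, sample_rate=SAMPLE_RATE):
    """Turn events into raw samples: the note boundaries are no longer explicit."""
    total = max(e.onset + e.duration for e in events)
    samples = [0.0] * int(round(total * sample_rate))
    for e in events:
        start = int(round(e.onset * sample_rate))
        count = int(round(e.duration * sample_rate))
        hz = midi_to_hz(e.pitch)
        for i in range(count):
            if start + i < len(samples):
                samples[start + i] += math.sin(2 * math.pi * hz * i / sample_rate)
    return samples

# The same two notes, assuming 120 quarter notes per minute (0.5 s per quarter).
NOTATION = [NotatedNote(1, 1.0, "C4", "quarter"), NotatedNote(1, 2.0, "E4", "quarter")]
EVENTS = [NoteEvent(0.0, 0.5, 60), NoteEvent(0.5, 0.5, 64)]
AUDIO = render_audio(EVENTS)

if __name__ == "__main__":
    print(len(NOTATION), "notated notes")
    print(len(EVENTS), "time-stamped events")
    print(len(AUDIO), "audio samples with no explicit structure")
```

Going down the list (notation to events to audio) is a mechanical rendering step; going back up requires recovering structure that the lower form does not state, which is why transcription is hard.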
10 Feb. 09
35
“Isn’t it a mistake to use music notation this way?”
• Chris Raphael’s question about Hendrix transcription
• It’s obviously useful: easy to find phrases, “Taps”, etc.
• …but seriously misleading in places
• But CMWN is “always” misleading!
• Is it useful enough to justify danger of misleading?
• Knowledge representation has inevitable bias (Davis et al 1993); notation has more bias (Wiggins et al 1993)
• Fundamental issue of transcription in ethnomusicology
• Conclusion: use it, but be careful
– Cf. my “Logician General’s Warning” on classification
– …in fact, transcribing requires classifying constantly
12 Feb. 09
36
Sequential-Art/Movie: The Hard Goodbye (1)
• From Frank Miller’s “Sin City” series
• John Walsh (SLIS): want to compare comics, movies of them, etc.
18 Feb. 2009
37
Sequential-Art/Movie: The Hard Goodbye (2)
• From Frank Miller’s “Sin City” series
• John Walsh (SLIS): want to compare comics, movies of them, etc.
18 Feb. 2009
38
Types of Visualizations of Music (and more)
• Is visualization static or dynamic?
– Dynamic = time represented by time
– Static = time represented by space
• What features are visualized?
• What basic representation? Audio, symbolic,
both?
– Easy to generalize to plays (score = script) & other text
phenomena, dance, etc.
18 Feb. 2009
39
Types of Visualizations of Music (and more)
• Hendrix example uses coordinated visualizations
– Generalization of parallel, aligned, synchronized, etc.
• How are multiple visualizations coordinated?
1. Parallel panes of a single window
2. Superimposed in a single window
3. Separate coordinated windows
• Forms 1 & 2 apply directly to audio (incl. sonification)
• Easy to interpolate between forms 1 & 2
– Categories in the real world are rarely discrete
26 Feb. 2009
40
The Ultimate Music System
• Original goal: visualizer that can do anything with music
– Handle any no. & combination of visualizations
– Static visualizations: audio, any kind of notation, structural
diagrams, etc.
– Dynamic visualizations: video, etc.
– Automatic (or near-automatic) synchronization
– Support OS-level technologies (QuickTime, etc.)
– Easy-to-learn UI allowing high degree of control
• User may want frequent extreme zoom changes => help with
• If it could do all that, should be useful for lots (domains
with >=2 Features) besides music!
20 May 08
41