Review of Captioning Techniques


The Accessible Technology Initiative (ATI) Presents
“Meet The Experts” on Captioning
with
Kevin Erler and Pat Brogan
Automatic Sync Technologies
June 25, 2009, 1 PM – 2 PM PDT
Please use a headset or your computer speaker.
Chat will be turned off until the end of the presentation.
Captions are provided by Bill Courtland
This presentation is being recorded and you will receive an announcement of the archived location by July 1.
If you have any questions or concerns, please email Jean Wells, [email protected]
About the Presenters
Pat Brogan
• 30+ years professional engagement
• Dissertation and research on rich-media & distance learning
• Worked with standards on e-learning (SCORM, LOM, Accessibility)
• Adjunct marketing professor, SCU
• Highly influenced by motherhood, non-profit work with at-risk youth
Kevin Erler
• Engineer by training
• 20+ years background in speech processing
• Focus on automation and cost reduction, without compromising quality
• Founder of Automatic Sync Technologies
© 2009 Automatic Sync Technologies
Agenda
1. Introduction to captioning
2. The captioning process
3. Assessing quality
4. Approaches to large scale captioning
5. Motivations for captioning
6. The media explosion
7. Choosing what to caption and transcribe
8. Funding issues
A History of Captioning
• Captioning is a synchronized text representation of the audio
component of a program. Sometimes called subtitling.
• In traditional broadcast media, it is sometimes called “Line 21”.
• The captioning industry was created in the early 1980s by an FCC
mandate for broadcast TV.
• The FCC mandate had a *slow* phase-in period that reached 100%
only in January 2006.
• Today captioning can be applied to many different types of media,
and many other regulations govern these different forms of media.
Captioning in Education
• Media is everywhere. The use of video in education is substantial and
increasing.
• Captioning is starting to proliferate: finally, many educational
institutions are starting to see the many benefits of captioning
and are increasingly taking a proactive approach.
• The value and impact extend beyond just making content
accessible for the deaf. Most users are not deaf or hard of
hearing, but use the text to search, to reinforce language skills and
to comprehend better.
• Look at what we found: here are some of the dozens of
examples we found of captioning in the educational environment.
Creative Uses of Transcripts & Captions

Students retain more if they are able to 'read
ahead' and have more of the transcript visible
Captioned Recorded Lecture

Captioned iPhone Apps
Transcripts
Some Captioning Terms
• Key element of captioning is that it is a synchronized text
representation of the audio component of a program.
o Transcription vs. Captioning
o Subtitling vs. Captioning
o Open vs. Closed captioning
o Post vs. Real-time
o Web vs. Broadcast
The Transcription-Captioning Process
• Produce a transcript of the audio portion of the program. Need to
observe certain conventions to represent non-dialog content.
• Divide text into captions, observing guidelines about where to break
sentences. Some constraints here are dependent on the media type
and the video size.
• Synchronize captions to the video timeline.
• Create output files in the format required by your media. Note that
format is dictated by the type of player that the content will be played
on, not by the media itself.
• Encode caption data into final media. This process varies widely
depending on the media/player type.
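The middle steps above (divide text into captions, synchronize to the timeline, create an output file) can be sketched in a few lines of Python. This is an illustration only, not a captioning tool. The function names are made up for the example, the 32-character row width and two-row captions are assumed limits, the SubRip (.srt) layout is just one common web caption format, and spreading captions evenly over the running time is a stand-in for real synchronization against the audio.

```python
import textwrap


def split_into_captions(transcript, max_chars=32, max_lines=2):
    """Break a transcript into captions of at most max_lines rows of max_chars."""
    rows = textwrap.wrap(transcript, width=max_chars)
    return ["\n".join(rows[i:i + max_lines])
            for i in range(0, len(rows), max_lines)]


def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(captions, total_seconds):
    """Assumption: captions are spread evenly over the running time."""
    if not captions:
        return ""
    step = total_seconds / len(captions)
    blocks = []
    for index, text in enumerate(captions):
        start = format_timestamp(index * step)
        end = format_timestamp((index + 1) * step)
        blocks.append(f"{index + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


transcript = ("Captioning is a synchronized text representation "
              "of the audio component of a program.")
captions = split_into_captions(transcript)
print(to_srt(captions, total_seconds=6.0))
```

Where the row limit and the line breaks fall in practice depends on the media type and video size, as noted above, so both limits are left as parameters.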
Transcription and Captioning Process
• NCI estimates these steps will take 10 to 16 labor
hours per media hour to do using traditional
captioning methods.
• How long it takes will depend on how much
attention you pay to quality issues.
• You should assume it will not take less than 5 to
7 labor hours per hour of media.
Encoding?
• For DVD media: the captions need to be written back to your DVD – this is
done with an authoring package.
• For tape media (e.g., VHS): the captions need to be written out to your tape
using a caption encoder. This can also be done using a software NLE
system (in some cases).
• For traditional web media: the caption files are typically external and read by
the player, so no encoding is needed. Some allow you to embed the caption
data (e.g., QuickTime, Windows Media).
• For content portals: typically, the caption file must be uploaded separately
from the video file.
• For mobile devices: high variability. For iPods (and iTunes), you must embed
the caption file using Apple tools before upload.
Captioning Tools
• Professional Tools
o CPC / MacCaption
• Do It Yourself
o Magpie
o SubtitleWorkshop
• Speech Recognition
o CaptionMic
• Services
o NCI
o Vitac
o CaptionFirst (Realtime)
o Automatic Sync
• Content Portals
o CaptionTube
o Overstream
Setting the context: Quality
• Quality is a continuum; it trades off against cost
and time.
• “100%” is unlikely: the cost is too high. But what is
acceptable?
• Your mission statement almost certainly includes
“excellence”.
• Responding to OCR (Office for Civil Rights) complaints: can you
demonstrate due attention to serving students' needs?
Word Error Rate
0% Error Rate
Everyone loves a booming market, and most booms happen on the back of technological change.
The world's venture capitalists, having fed on the computing boom of the 1980s, the internet boom
of the 1990s and the biotech and nanotech boomlets of the early 2000s, are now looking around
for the next one. They think they have found it: energy.
Many past booms have been energy-fed: coal-fired steam power, oil-fired internal-combustion
engines, the rise of electricity, even the mass tourism of the jet era. But the past few decades have
been quiet on that front. Coal has been cheap. Natural gas has been cheap. The 1970s aside, oil
has been cheap. The one real novelty, nuclear power, went spectacularly off the rails. The
pressure to innovate has been minimal.
In the space of a couple of years, all that has changed. Oil is no longer cheap; indeed, it has never
been more expensive. Moreover, there is growing concern that the supply of oil may soon peak as
consumption continues to grow, known supplies run out and new reserves become harder to find.
The idea of growing what you put in the tank of your car, rather than sucking it out of a hole in the
ground, no longer looks like economic madness. Nor does the idea of throwing away the tank and
plugging your car into an electric socket instead.
Word Error Rate
10% Error Rate
Boot hoses a booming market, gloved capote booms happen heart the back of technological
change. The world's venture capitalists, house fed gem's the computing boom of the 1980s, the
internet boom of the 1990s and the biotech and nanotech boomlets of the early 2000s, are now
looking around for the road one. They gaunt they have found bubonic: energy.
Many past booms have been energy-fed: coal-fired steam power, oil-fired internal-combustion
engines, the rise of electricity, even the brushy tourism of the jet era. But the past few decades
have been quiet on magic front. Coal has been cheap. Natural gas gross hoist cheap. Jennifer
1970s aside, oil has been cheap. The one real novelty, nuclear power, went spectacularly off
tabloid rails. The burping to innovate has been minimal.
In local space of a couple of years, all that has paycheck. Oil is no longer cheap; indeed, it has
never been more expensive. Moreover, there is fizzled translogic that the supply of oil may soon
peak as consumption rains to grow, known supplies run out and new reserves become zipper to
find.
The idea of growing what you put in the tank of your car, rather saber sucking it out of a hole in
grim ground, no longer looks like economic madness.
Word Error Rate
20% Error Rate
Kazakhstan banter a booming estate, and most systemically happen on the back of technological
bleed. The world's venture capitalists, Italians fed on seltzer computing boom kingdom the 1980s,
the internet levy of paddy 1990s and the harder and nanotech boomlets of the early 2000s, eroded
now looking around for the buckle one. They think they limitless methodology it: energy.
Many coups booms have diastolic energy-fed: coal-fired steam power, oil-fired internal-combustion
diaries, the rise of foxglove, mindful the mass tourism of the jet windchill. Pepper ascent past few
decades pragmatic been quiet on that front. Sentences erupt gushers cheap. Natural gas has
falsifying cheap. Untruths 1970s aside, oil has been ultranationalist. The one real hoax, nuclear
power, kite spectacularly off the rails. The pressure to innovate has been minimal.
In the tinted skinner's a couple of years, looking that has changed. Oil is no longer cheap; indeed,
it has never been maximize farthingale. Moreover, there is growing concern that the supply of oil
may soon peak as consumption continues to grow, known supplies run out and new reserves
expensive actuary to find.
The idea of growing what you put in gospel tank of chaffy car, rather than sucking it out of
copayment hole in the ground, no longer looks like economic boat.
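Word error rate is commonly computed as the word-level edit distance (substitutions, insertions, and deletions) between a reference transcript and the transcript being scored, divided by the number of reference words. The slides do not say how their samples were scored, so the Python below is a sketch of that standard definition, not the presenters' method. The function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1] / len(ref)


# First sentence of the 0% sample and of the 10% sample.
reference = ("Everyone loves a booming market, and most booms happen "
             "on the back of technological change.")
hypothesis = ("Boot hoses a booming market, gloved capote booms happen "
              "heart the back of technological change.")
wer = word_error_rate(reference, hypothesis)
print(f"{wer:.0%}")  # 5 of the 15 reference words differ, so this prints 33%
```

Scored this way, a single sentence can sit well above the rate quoted for the passage it comes from, which is one reason to measure over a reasonably long sample.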
Effect of Errors
[Figure: Intelligibility vs Error Rate. Intelligibility score (0 to 10) plotted against error rate (0%, 1%, 2%, 3%, 5%, 10%, 20%), with Predicted and Actual series.]
Error Rates for General Captioning
Source                 Typical Error Rate   Result
Trained Stenographer   0.5% to 1%           No problems
Student transcriber    1% to 5% (??)        Expect to be worse than stenographer
Speech Rec: trained    3% to 5+%            Varies from acceptable to poor
Speech Rec: untrained  20% to 40%           Unintelligible
Captioning Solutions
• Self-captioning (Do It Yourself, In-Sourcing)
• Speech Recognition
• Outsource
Key considerations: Error rate, cost, timeliness, scalability
Self-Captioning
• Key issues: Quality and Scalability.
• Quality: students are rarely well trained
for this task and turnover is high.
• Scalability: The amount of staff needed
to caption on a large scale is high; high
turnover compounds issues.
In-Sourcing Costs
• Need to include:
o Management, support staff
o Equipment, space, overhead costs
o Training and recruitment costs
The Real Cost of In-Sourcing
• Assume 200 hours of content per month, for 8 months/yr
• Total content: 1,600 hrs per year
• 7.5 labor hours per content hour
• Total labor: 12,000 hrs
• Assume each student is available for 4 hrs/day, 3 days/wk, 30 wks/yr
• Each student yields 360 labor hrs/yr; 34 students needed
• Assume one supervisor per 10 students
• Labor requirements:
o 34 students
o 4 supervisors
o 1 technical support staff
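The staffing arithmetic on this slide can be reproduced with a short Python sketch. The default values are the slide's stated assumptions. The function and parameter names are made up for the example, and rounding head counts up to whole people is an assumption that happens to match the slide's figures.

```python
import math


def staffing_plan(content_hours_per_month=200, months_per_year=8,
                  labor_hours_per_content_hour=7.5,
                  student_hours_per_day=4, student_days_per_week=3,
                  student_weeks_per_year=30, students_per_supervisor=10):
    """Estimate yearly labor hours and head counts for in-house captioning."""
    content_hours = content_hours_per_month * months_per_year
    labor_hours = content_hours * labor_hours_per_content_hour
    hours_per_student = (student_hours_per_day * student_days_per_week
                         * student_weeks_per_year)
    # Head counts are rounded up: a fraction of a person still means a hire.
    students = math.ceil(labor_hours / hours_per_student)
    supervisors = math.ceil(students / students_per_supervisor)
    return {
        "content_hours": content_hours,
        "labor_hours": labor_hours,
        "hours_per_student": hours_per_student,
        "students": students,
        "supervisors": supervisors,
    }


plan = staffing_plan()
print(plan)
```

Changing `labor_hours_per_content_hour` shows how sensitive the head count is to the labor-hours estimate.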
The Real Cost of In-Sourcing
• Capital Depreciation: $5,000 (5 work stations; 5 yr depreciation)
• Annual Support: $2,500 (10% support contract)
• Training cost: $29,250 (50% turnover per yr; $1500 to train)
• Student Labor: $144,000 ($12/hr)
• Supervisors: $96,000 ($18/hr)
• Support Staff: $33,333 ($50k/yr, for 8 months)
• Benefits/Overhead: $98,400 (36%)
• Facilities: $21,960 (1000 sq ft @ $21.96/sq ft/yr)
Total at 7.5 labor hrs per video hr: $430,443 per year, or $269 / video hr.
At 5 labor hrs per video hr: $181 / video hr.
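The 7.5-hour total can be checked by adding up the line items. The Python sketch below takes most figures straight from the slide and re-derives three of them. The derivations are assumptions that happen to reproduce the slide's numbers: turnover is applied to all 39 staff (34 students, 4 supervisors, 1 support), and the 36% overhead is applied to the three labor lines. Variable names are illustrative.

```python
VIDEO_HOURS = 200 * 8          # 1,600 hours of content per year
LABOR_HOURS = 12_000           # at 7.5 labor hours per video hour
STAFF = 34 + 4 + 1             # students + supervisors + support staff

student_labor = LABOR_HOURS * 12      # $12/hr
supervisors = 96_000                  # taken from the slide as given
support_staff = 33_333                # taken from the slide as given

# Assumption: 50% yearly turnover across all staff, $1,500 to train each.
training = round(STAFF * 0.5 * 1_500)
# Assumption: 36% overhead applied to the three labor lines.
benefits = round(0.36 * (student_labor + supervisors + support_staff))
facilities = round(1_000 * 21.96)     # 1,000 sq ft at $21.96 per sq ft

line_items = {
    "Capital depreciation": 5_000,
    "Annual support": 2_500,
    "Training": training,
    "Student labor": student_labor,
    "Supervisors": supervisors,
    "Support staff": support_staff,
    "Benefits/overhead": benefits,
    "Facilities": facilities,
}

total = sum(line_items.values())
cost_per_video_hour = total / VIDEO_HOURS
print(f"Total: ${total:,}  Per video hour: ${cost_per_video_hour:,.0f}")
```

The sketch covers only the 7.5-hour case; the slide does not show which line items change in the 5-hour figure, so that number is not re-derived here.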
Speech Rec
• Key issue is error rate.
• 3 key factors affect error:
o Speaker (trained vs untrained; "goats", whom recognizers handle poorly, vs
"sheep", whom they handle well)
o Task domain (topic)
o Acoustic environment (mic, noise, background, etc.)
Fixing the Speech Rec Output
• 2 approaches:
o Pre-processing: train to the speaker(s)
o Post-processing: edit the transcripts to repair
Fixing the Speech Rec Output
• Pre-processing:
o Very difficult to get faculty to participate.
o Often training is not possible: guest lecturers, 3rd
party video, one-off recordings.
o Still need to deal with goats and noisy recordings.
Fixing the Speech Rec Output
• Post-processing:
o For untrained recognition (20%+ WER), cost to
repair is higher than cost to start from scratch.
o Point at which it becomes more cost effective to
repair: less than 3% WER.
o Using students to conduct repairs creates the same
issues outlined under in-sourcing.
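One way to picture the break-even point is a simple linear model: repairing a transcript costs a fixed review pass plus an amount that grows with the error rate, while transcribing from scratch costs the same regardless. The Python below is a sketch under that assumption only. The linear form, the function names, and every number in it are hypothetical, with the parameters chosen purely so that the break-even lands at the 3% WER quoted above.

```python
def repair_cost(wer_percent, review_cost, cost_per_wer_point):
    """Hypothetical linear model: a fixed review pass plus per-error editing."""
    return review_cost + cost_per_wer_point * wer_percent


def break_even_wer(scratch_cost, review_cost, cost_per_wer_point):
    """WER (in percent) at which repairing costs the same as starting over."""
    return (scratch_cost - review_cost) / cost_per_wer_point


# Hypothetical cost units per media hour, not measured data.
SCRATCH_COST = 100
REVIEW_COST = 70
COST_PER_WER_POINT = 10

threshold = break_even_wer(SCRATCH_COST, REVIEW_COST, COST_PER_WER_POINT)
print(f"Repair is cheaper below {threshold:.1f}% WER")
for wer in (2, 20):
    cost = repair_cost(wer, REVIEW_COST, COST_PER_WER_POINT)
    print(f"{wer}% WER: repair {cost} vs scratch {SCRATCH_COST}")
```

With real cost data, the same two functions would show where your own break-even sits.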
Cost of Repairing a Bad Transcript
[Figure: cost plotted against error rate (0 to 60), comparing Edit Cost with Transcription Cost From Scratch. Example data only.]
Consider the Total Cost
• On the surface, speech rec solutions look
appealing from a cost perspective; but
consider the total cost of the solution:
o Capital cost for initial system; consider
provisioning.
o Training cost (if you choose to train speakers)
o Repair costs for each show (see In-Sourcing
cost structure)
Outsourcing
• Solves the Accuracy issue, but:
o Cost?
o Workflow?
o Reliability?
o Scalability?
o Service level? (speed)
o Can they keep pace with technology?
Outsourcing
• Small firms tend to offer better cost structures,
but most cannot offer scalability if you have a lot
of material.
• Large firms can better handle large volumes,
but costs are generally much higher.
Conclusion
[Comparison matrix. Rows: In-Sourcing, Speech Rec (raw), Speech Rec (repaired), Outsourcing. Columns: Cost, Speed, Workflow, Accuracy.]
Why Caption and Transcribe?
• Compliance with system, state and federal mandates
• Improve access to learning materials
• Provide content appropriate for different learning styles
• Support at-risk students (DSS, ESL)
• Make content more discoverable and reusable, optimize
search engine performance
Transcripts
• Needed to generate captions
• Have value for all students
• Searchable
• Can launch audio and video
• Can obviate the need for sending some
note-takers to support DSS students
Captions Improve:
Searchability, Discoverability, Navigability
• Captions and transcript text can be used as
meta-data for SEO (search engine optimization)
• Can work with a variety of tools: Google Video,
AST search, Reelsurfer
• CNET captioned video drove 30% increase in
Google hits
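Because each caption carries a start time, a text search over the captions doubles as navigation: a hit tells the player where to jump. The Python below is a minimal sketch of that idea. The `Cue` class, the function name, and the cue timings are hypothetical and do not reflect how any of the tools named above work.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: when it appears and what it says."""
    start_seconds: float
    text: str


def find_cues(cues, query):
    """Return the start times of cues containing the query, ignoring case."""
    needle = query.lower()
    return [cue.start_seconds for cue in cues if needle in cue.text.lower()]


# Hypothetical timings for illustration.
cues = [
    Cue(0.0, "Everyone loves a booming market,"),
    Cue(3.0, "and most booms happen on the back"),
    Cue(6.0, "of technological change."),
]

hits = find_cues(cues, "booms")
print(hits)  # the one matching cue starts at 3.0 seconds
```

A player could seek to each returned time, and the same text could be published alongside the video for search engines to index.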
Captioning Learning Outcomes Research
• “Augmenting an auditory experience with captions more than doubles the
retention and comprehension levels.” Gary Robson, The Closed
Captioning Handbook
• Adult students who used captioned video presentations progressed
significantly better than those using traditional literacy techniques.
Benjamin Michael Rogner, Adult Literacy: Captioned Videotapes and Word
Recognition
• Dual Coding Theory postulates that both visual and verbal information are
processed differently and along distinct channels with the human mind
creating separate representations for information processed in each
channel. Allan Paivio, University of Western Ontario
• Multi-Modal Learning: See It, Hear It, Do It, Master It. Use 2 or more
senses to avoid sensory overload (Granström, House, & Karlsson 2002,
Clark & Mayer 2003)
Learning Outcomes: SFSU Study
• American Indian studies class, 2007
• Instructional video materials delivered randomly to students: 50% with
captions, 50% without
• Two trends emerged:
o No captions: students were quite passive and silent during class discussions,
with the "usual speakers" dominating the conversation, and generalizations
were pervasive.
o With captions: students were more engaged and responsive to the questions asked
about the film. In a similar vein, students made interesting analogies to their
everyday lives, and references to specific information and events from the video
were much more abundant.
• The most exciting of all was the correlation between this usage of captions and the
students' grades, with an average increase of 1 full GPA for students exposed to
captions.
• Source: And Captions For All? A Case Study of the Relevance of Using Captions in a
College Classroom by Robert Keith Collins, Assistant Professor, American Indian Studies
Learning Outcomes: SJSU User Feedback on Captioning
Better Absorption of Material
o “It helped me to catch words that I didn't understand, and also
helped with spelling.”
o “It allows me to ‘pause’ the lecture and take notes from the
captions when my note-taking lags behind the spoken lecture.”
o “I caught several things the second time around reading captions
that I did not listening the first time around.”
Allows Better Interactivity with Course Material
o “I much prefer the captioned lectures and being able to look at
the links while you are talking. So far this has been the BEST
online class I've taken at SJSU, others should learn from your
example.”
SJSU User Feedback on Captioning
Diversifies Delivery of Video Media
o “I was able to ‘read’ at my desk without having the audio turned
on so that others in my office wouldn't be bothered.”
o “Captions also allow you to view videos when you are in a
situation where you are not able to use sound.”
Campus Captioning Considerations
• Defining policy: what content, which audiences,
quality metrics, process
• Identifying responsible owners of content and of
captioning process
• Integrating into learning strategy
• Selecting approaches & vendors
• Facilitating procurement of resources
• Workflow automation
• Budget
CSU
• Leads the country in accessibility policy
• Created ATI
• Evaluated options and vendors
• Facilitated procurement with AST
• Task force looking at content policies,
effectiveness metrics
Rethinking The Accommodation Model
• Proactive vs. reactive
o Ask students which options will help them learn best
• Example: Deaf students' choices:
– Sign language interpreter
– CART system
– Recorded lecture with transcripts/captions
– Note takers
• Systemic solutions vs. one-at-a-time
• Communicate programs
Scope of “Captionable” Media
• University Communications (Promo and news videos)
• Distance Learning materials, Podcasts
• Recorded classes and learning objects
• Material posted in content portals
• VHS/DVD library archives
• Broadcast productions
• Special Event videos
• Student content
Content Portals
• iTunes U
o 250+ universities, 175K educational content
items, 58M users
• YouTube
o 160+ universities, 30K videos
• Campus LMS; media servers
• Lecture capture systems
• Academic Earth, Facebook, Twitter
Why Use Content Portals?
• Extensive adoption = distribution
• Minimal training / end user support
• Inexpensive
• Ubiquitous, cross-platform and devices
• Adds value to brand
• Creates framework to sell content
Prioritizing Captioning Projects
• Critical Accommodations
• Distance Learning classes and materials
• Public information
• Training materials
• Events, communications
• Recorded lectures
Prioritization Decision Factors
• Time/urgency
• Budget
• Workflow
• Expected usage frequency
• Audience
o Internal/external
• Primary purpose
o Review or core instruction
Funding Captioning
• Grants
• Centralized funding
• Pay to download: iTunes U and YouTube have
infrastructures
o UW Study says students will pay for content
o Charge external community for downloads
• Cost recovery through student fee assessment
o UNLV approach
• Sponsorship (advertising)
The Desired Outcome
• Media that is:
Accessible
Compliant
Valuable to all audiences
Reusable
Discoverable
So that learning outcomes improve!
Thank You For Attending!
And Thank You Kevin and Pat!
“Meet The Experts” on Captioning
with
Kevin Erler and Pat Brogan
Automatic Sync Technologies
June 25, 2009, 1 PM – 2 PM PDT
Produced by: Jean Wells, CSU ATI
Captions by: Bill Courtland, [email protected] 805.368.2802
Guests: Kevin Erler & Pat Brogan