
Digital Museum of Taiwan’s
Social and Humanities Video Archives
A National Science Council Digital Museum Research Project
Presenter: Dam-Ming Lee
Associate Professor
Graduate Institute of Arts and Technology
Taipei National University of the Arts
2002/12/17
An Introduction to Digital Museum of Taiwan’s Social and Humanities Video Archive
• A Joint Project of
Center for the Study of Art and Technology
Institute of Information Science, Academia Sinica
Computing Centre, Academia Sinica
• This Research Project Consists of Five Sub-projects:
• Sub-project 1: The Digitization and Metadata Organization of
Video Archival Footage and the Planning of Exhibition Content
• Sub-project 2: The Study and Application of Video Information
Visualization
• Sub-project 3: The Construction of an Integrated Environment for
Supporting Video Retrieval, Streaming and Publishing
• Sub-project 4: Video Shot Change Detection and Watermarking
• Sub-project 5: Development of Audio Processing Techniques and
a Retrieval System for Video Archive
Sub-project 1: The Digitization and Metadata Organization of
Video Archival Footage and the Planning of Exhibition Content
is responsible for analyzing and digitizing the video
material and extracting metadata from it, as well
as planning the webpage content in association with
Sub-project 2.
• Sub-project 2: The Study and Application of Video Information
Visualization
is in charge of designing applications for the video
content from Sub-project 1, developing the webpage
scripts, and visualizing the information.
• Sub-project 3: The Construction of an Integrated Environment
for Supporting Video Retrieval, Streaming and Publishing
has two tasks:
1. to design highly efficient knowledge search
and management software for Sub-project 1, and
2. to design an integrated environment and software
for producing and managing digital video/audio
knowledge.
• Sub-project 4: Video Shot Change Detection and
Watermarking
is responsible for developing the automatic shot-change detection and watermarking techniques needed by the
video footage digital archive management system of
the Digital Museum project.
• Sub-project 5: Development of Audio Processing Techniques
and a Retrieval System for Video Archive
is responsible for developing video retrieval
technology based on speech recognition and spoken
document retrieval technologies. Sub-project 5
works closely with other sub-projects to develop an
efficient information retrieval tool for a digital
library consisting of a large amount of video and
audio information.
[Figure: system overview. Components: Betacam tapes; digitization; MPEG-2 video server; sampling to WAV spoken documents; speech recognition system; spoken-document extraction and retrieval systems; metadata analysis and implanting; shot-change detection with XML output; shots database and key frames; MPEG-1 video with added watermarking; streaming; retrieval system; summarization retrieval system.]
[Figure: Video Digital Museum Research Flow Chart. Video Footage Digital Archive Management System: metadata implanting, spoken-document retrieval, streaming MPEG-1 video, shots database, key-frame retrieval, summarization retrieval. Taiwan’s Social and Humanities Video Archive Digital Museum System: content planning and design for interactive learning; e-Movie programming and visualization; script writing and visualization for e-Learning; database search visualization; script writing for the thematic exhibition.]
Metadata Analysis → Database Search Program Design → Shot Change Detection Technology + Watermark Embedding + Speech Search Technology → Webpage Visualization → Metadata Implantation
Structure of the Video Digital Museum
[Figure: sections Thematic Exhibition, Video Database Search, e-Editing, e-Movie, e-Learning, Video Interactive Learning, About Us, Related Websites.]
Front Page of the Video Digital Museum
Thematic Exhibitions
• Indigenous Personalities: Writers, Tribal Social Workers, Artists, Cultural Renaissance, Language Revival
• Indigenous Culture: Art and Craft, Music, Ceremonies, Architecture
• Taiwan Social Movements: Democratization, Women’s Movement, Labor Movement, Indigenous Movement, Environmental Movement
• Taiwan Water Culture: Taiwan Water Resources, Groundwater, Reservoirs, Wetland Conservation, Forest Conservation, Water Shortage
• Taiwan Ecology: Plants, Birds, Insects, Ocean, Mountains
Digital Museum of Taiwan’s Social and Humanities Video Archive
Front Page of the Thematic Exhibition
Metadata Analysis
Based on the metadata model developed by the European Union’s
ECHO Project (European Chronicles On-line), we have begun
analyzing and implanting our metadata.
By developing and designing our metadata on the ECHO and FRBR
models, we hope to accomplish the following goals:
• Structured on four levels, “whole,” “sequence,” “scene,” and “shot,” our metadata system lets us implant video information at each level.
• By using new information technologies, we can automatically extract metadata from video, audio, and related transcripts.
• We can explicitly mark relationships both across levels (whole, sequence, scene, and shot) and across forms (video, audio, and transcript).
Please see our website: http://www.sinica.edu.tw/~metadata/
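To make the four-level structure concrete, here is a minimal Python sketch of the whole/sequence/scene/shot tree, assuming each node is identified by a title and an inclusive first/last frame range. The class `Segment`, the helper `shots`, and the sample titles are hypothetical illustrations, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

# Levels of the metadata hierarchy, coarsest first
LEVELS = ["whole", "sequence", "scene", "shot"]


@dataclass
class Segment:
    level: str            # one of LEVELS
    title: str
    first_frame: int      # inclusive frame range covered by this node
    last_frame: int
    children: List["Segment"] = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        """Attach a child exactly one level below this node and inside its frame range."""
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"a {child.level} cannot be nested directly under a {self.level}")
        if not (self.first_frame <= child.first_frame <= child.last_frame <= self.last_frame):
            raise ValueError("child frame range lies outside its parent")
        self.children.append(child)
        return child


def shots(node: Segment) -> List[Segment]:
    """Return all shot-level descendants of `node`, in order."""
    if node.level == "shot":
        return [node]
    found: List[Segment] = []
    for child in node.children:
        found.extend(shots(child))
    return found


# Hypothetical example: one tape with one sequence, one scene and two shots
tape = Segment("whole", "Tape 01", 0, 8999)
scene = tape.add(Segment("sequence", "Opening", 0, 2999)).add(
    Segment("scene", "Village square", 0, 1499))
scene.add(Segment("shot", "Shot 1", 0, 449))
scene.add(Segment("shot", "Shot 2", 450, 1499))
print([s.title for s in shots(tape)])  # ['Shot 1', 'Shot 2']
```

Rejecting a shot placed directly under a whole tape, or a frame range that spills outside its parent, keeps each level's metadata consistent with the level above it.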
The ECHO Metadata Model
Source: Amato, G., Castelli, D., Pisani, S., Venerosi, O., Poncin, O., & Vinet, L. (2000). Metadata modeling report (p. 15).
The IFLA FRBR Model (entities: Work, Expression, Manifestation, Item)
Source: IFLA Study Group on the Functional Requirements for Bibliographic Records. (1998). Functional requirements for bibliographic records (p. 13). München: K. G. Saur.
Metadata Requirement: FRBR Based
Legend: * = required; “multi-valued” = the field may repeat; “dropdown” = the value is chosen from a code table
AV Document [Work]
Field | Data type | Size | Req. | Notes
Title | Varchar | 40 | * |
Series Title | Varchar | 30 | * |
Genre | Varchar | 10 | * | dropdown
Description | Text | - | |
People: Name | Varchar | 20 | |
People: Role | Varchar | 20 | |
People: Affiliation | Varchar | 30 | |
Location: County/City | Varchar | 10 | |
Location: Township | Varchar | 10 | |
Location: Village | Varchar | 10 | |
Location: Tribe | Varchar | 10 | |
Themes: Themes Type | Varchar | 20 | | main category; dropdown
Themes: Sub Themes | Varchar | 20 | | subcategory; dropdown
Key Words | Varchar | 60 | |
Event Date | Varchar | 20 | | ISO 8601 format (YYYY-MM-DD); at least YYYY must be entered
Description Language | Varchar | 20 | | default: Chinese
Version [Expression]
Kind Or Variety | Char | 15 | * | dropdown
Duration: In | Varchar | 15 | * | start
Duration: Out | Varchar | 15 | | end
Relationships: Manifested By | Varchar | 20 | | related media; dropdown
All fields are filled in by the cataloguer.
Metadata Requirement: FRBR Based
Video [Expression]
Field | Data type | Size | Req. | Notes
Kind | Char | 15 | * | level; dropdown
Images Description | Text | - | * |
Video Abstract | Varchar | 30 | | video summary; this field stores a file path
Key Frame | Varchar | 30 | | default: the first frame of the segment
Faces: Position | Varchar | 20 | | coordinates; system-detected [Note 1]
Faces: Size | Varchar | 20 | | system-detected [Note 1]
Faces: Angle | Varchar | 10 | | system-detected: high angle, low angle, eye level, profile, or back [Note 1]
Faces: Frame | Varchar | 20 | | time code
Person: Name | Varchar | 20 | |
Person: Role | Varchar | 20 | |
Person: Affiliation | Varchar | 30 | |
Objects: Position | Varchar | 20 | | coordinates
Objects: Size | Varchar | 20 | |
Objects: Description | Text | - | |
Objects: Frame | Varchar | 20 | | time code
Motions: Direction | Varchar | 20 | | average unit vector; system
Motions: Magnitude | Varchar | 20 | | average vector magnitude; system
Motions: Start Frame | Varchar | 20 | | system
Motions: End Frame | Varchar | 20 | | system
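Metadata Requirement: FRBR Based
Video [Expression] (continued)
Field | Data type | Size | Req. | Notes
Camera Movements: Mode | Char | 10 | | system-detected: pan, tilt, crane, follow [Note 1]
Camera Movements: Start Frame | Varchar | 20 | | system [Note 1]
Camera Movements: End Frame | Varchar | 20 | | system [Note 1]
Camera Zooms: Mode | Char | 20 | | system-detected: zoom in, zoom out [Note 1]
Camera Zooms: Start Frame | Varchar | 20 | | system [Note 1]
Camera Zooms: End Frame | Varchar | 20 | | system [Note 1]
Segmented Texts: Position | Varchar | 20 | | text in the image; coordinates
Segmented Texts: Size | Varchar | 20 | |
Segmented Texts: Text | Text | - | |
Segmented Texts: Frame | Varchar | 20 | | time code
Sound | Varchar | 10 | * | whether the video has sound; dropdown
First Frame | Varchar | 20 | * |
Last Frame | Varchar | 20 | * |
Frame Rate | Varchar | 20 | | frames per second; default: 30
Visual Features | Varchar | 60 | |
Color | Varchar | 10 | * | dropdown
Relationships: Part of | Varchar | 20 | * | belongs to
Relationships: Follow by | Varchar | 20 | * | contains
Relationships: Has audio | Varchar | 20 | * | whether the video has sound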
Metadata Requirement: FRBR Based
Audio [Expression]
Field | Data type | Size | Req. | Notes
Kind | Char | 15 | * | level; dropdown
Audio Language | Varchar | 20 | * | dropdown
Channel Number | Varchar | 10 | | dropdown
Frequency | Varchar | 10 | | default: 44.1 kHz
Type | Varchar | 10 | * | sound category; dropdown
Sound Quality: Background Sound | Varchar | 60 | * |
Sound Quality: Sound Recording Quality | Varchar | 60 | |
Start Frame | Varchar | 20 | * |
End Frame | Varchar | 20 | * |
Relationships: Part Of | Varchar | 20 | | belongs to
Relationships: Follow By | Varchar | 20 | | contains
Relationships: Has Transcript | Varchar | 20 | | whether a transcript exists
All fields are filled in by the cataloguer.
Metadata Requirement: FRBR Based
Transcript [Expression]
Field | Data type | Size | Req. | Notes
Transcript | Text | - | | multi-valued
Speaker | Varchar | 20 | | multi-valued
Band Width | Varchar | 10 | |
Speaker Language | Varchar | 20 | | multi-valued; dropdown
Gender | Varchar | 10 | | multi-valued; dropdown
Start frame | Varchar | 20 | * |
End frame | Varchar | 20 | * |
Media [Manifestation]
Original Format | Varchar | 20 | | original tape format; dropdown
Video Tape Format | Varchar | 20 | * | format of the transfer copy; dropdown; default: Betacam-SP
Has Storage | Varchar | 10 | * | whether an original database record exists; choose yes or no
All fields are filled in by the cataloguer.
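Metadata Requirement: FRBR Based
Storage [Item]
Field | Data type | Size | Req. | Notes
Collocation: Original Type | Varchar | 40 | * | location of the original tape
Collocation: Digital Files | Varchar | 40 | * | location of the digital files
Identify: Case Title | Varchar | 60 | |
Identify: Series Title | Varchar | 40 | | program title
Identify: Roll Number | Varchar | 40 | |
Identify: Tape Number | Varchar | 10 | * |
Rights: Copyright Owner | Varchar | 10 | * | default: 李道明
Rights: Digital Copy Provider | Varchar | 10 | * | digital archiving unit; default: 李道明
Rights: Public Access | Varchar | 10 | * | whether the item may be used on the Internet; choose yes or no
All fields are filled in by the cataloguer.
Note 1: The values of the Faces, Camera Movements, and Camera Zooms fields are detected automatically by the system; their default value can be set to null.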
Metadata Requirement: Value
For the fields marked “dropdown” in the requirement tables above, the code tables are as follows:
Item | Codes
AV Document [Work]
Genre | EVENTS/INTERVIEWS/SCENERIES/MUSIC AND DANCE/STILLS
Themes: Theme Type | TAIWAN SOCIETY/INDIGENOUS PEOPLES/ENVIRONMENT PROTECTION/ETC. (TO BE EXPANDED)
Themes: Sub Themes | POLITICS/INSTITUTIONS/PERSONALITIES/NGO/MEETINGS/SOCIAL MOVEMENTS/MEDICINE/EDUCATION/RELIGIOUS RITES/CULTURE AND ARTS/ACTIVITIES/ENVIRONMENT AND ECOLOGY/ETC.
Version [Expression]
Kind Or Variety | VIDEO/AUDIO/TRANSCRIPT
Video [Expression]
Kind | SHOT/SCENE/SEQUENCE/WHOLE
Sound | SOUND/SILENT
Color | COLOR/B&W/MIXED
Audio [Expression]
Kind | SHOT/SCENE/SEQUENCE/WHOLE
Audio Language | MANDARIN/ENGLISH/TAIWANESE/HAKKA/TSOU/SAO/AMIS/PUYUMA/ATAYAL/SAISIAT/PAIWAN/RUKAI/BUNUN/TAO/KETAGALAN/KAVALAN/SIRAYA/ETC.
Channel | 1/2/MIX/SEP
Type | SPEECH/INTERVIEW/MUSIC/LOCATION SOUND
Transcript [Expression]
Speaker Language | MANDARIN/ENGLISH/TAIWANESE/HAKKA/TSOU/SAO/AMIS/PUYUMA/ATAYAL/SAISIAT/PAIWAN/RUKAI/BUNUN/TAO/KETAGALAN/KAVALAN/SIRAYA/ETC.
Gender | M/F/OTHER
Media [Manifestation]
Original Format | Betacam-SP/Digital Betacam/Betacam-SX/DV/DVCAM/S-VHS/VHS/16mm/35mm/Super-8/Super-16/Other
Video Tape Format | HDCAM/Betacam-SP/Digital Betacam/Betacam-SX/MPEG-IMX/DV/DVCAM/S-VHS/VHS/Other
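A record's dropdown fields can be checked against these code tables before it is stored. The following Python sketch assumes a record is a plain dict keyed by the English field names; `CODE_TABLES` copies only a subset of the lists above, and `validate` is a hypothetical helper. The Event Date pattern accepts the ISO 8601 forms YYYY, YYYY-MM, and YYYY-MM-DD, matching the rule that at least YYYY must be entered:

```python
import re
from typing import Dict, List, Set

# A subset of the code tables listed above (field name -> allowed codes)
CODE_TABLES: Dict[str, Set[str]] = {
    "Genre": {"EVENTS", "INTERVIEWS", "SCENERIES", "MUSIC AND DANCE", "STILLS"},
    "Kind": {"SHOT", "SCENE", "SEQUENCE", "WHOLE"},
    "Sound": {"SOUND", "SILENT"},
    "Color": {"COLOR", "B&W", "MIXED"},
    "Channel": {"1", "2", "MIX", "SEP"},
    "Gender": {"M", "F", "OTHER"},
}

# Event Date: ISO 8601 calendar date; YYYY is mandatory, -MM and -DD are optional
EVENT_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = []
    for name, allowed in CODE_TABLES.items():
        value = record.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}: {value!r} is not in the code table")
    date = record.get("Event Date")
    if date is not None and not EVENT_DATE.fullmatch(date):
        problems.append(f"Event Date: {date!r} is not YYYY, YYYY-MM or YYYY-MM-DD")
    return problems


print(validate({"Genre": "INTERVIEWS", "Event Date": "2002-12-17"}))  # []
print(validate({"Genre": "NEWS", "Event Date": "17/12/2002"}))        # two problems
```

The regular expression checks only the shape of the date; it does not reject impossible days such as 2002-02-31.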
Interface of Video Database Management
Visualization of Database Search Interface
Video Database Search Result 1
Video Database Search Result 2
e-Movie
e-Movie Programming → Video Streaming → Interactive Webpage Design
e-Learning
e-Learning Content Planning → Visual and Interactive Webpage Design → Learning Effect Appraisal
e-Learning
“Documentary Film and Taiwan Society” Pilot Course
Please see: http://elearn.tnua.edu.tw/
Video Interactive Learning
Content Analysis → Script Design → Webpage Design
Front Page
Video Interactive Learning
• (1) Video Quiz (2) Wear and Tell (Male) (3) Wear and Tell (Female)
Wear and Tell (Male)
Video Quiz
[Figure: Relationship among Video and Audio Content. Entities: Video Tape, Sequence, Shot, Video, Audio, Transcript.]
[Figure: The E-R Model of Video Tape. Labels: People, Case Title, Tape Number, Title, Themes, Format, Digital Files, Audio, Event Date, Description, Language, Video Tape, Sequence, Images Description, Shot, Video, Transcript, Frame Rate, Location, Genre, Copyright Owner.]
[Figure: The E-R Model of Shot. Labels: Tape Number, Shot Number, Video Tape, Sequence, Shot, Video, Audio, Transcript, Key Frame, People, First Frame, Last Frame, Images Description.]
Video Database Search System
Full Text Search System
Full Text Search Result
Detail of Search Result
Spoken Document Search
Spoken Document Search Result
Advanced Search
Advanced Search Result
Advanced Search Result
Key Frame
Video Streaming
[Figure: Technical Integration of Sub-projects 3, 4, and 5. Components: MPEG-2 files; Video Footage Digital Archive Management System; Shot Change Detection System; Key Frame Extraction System; Summarization Extraction System and video summary; Spoken Document Extraction System; Spoken Document Search and Retrieval System; Video Watermarking System; distribution files.]
[Figure: Video Queries Flow Chart. Off-line: video, shot change detection (XML shot change detection result), manual annotation, automatic annotation, characteristics search, video database. On-line: user word/video queries, shot change detection, characteristics search against the video database, browsing of the query result.]
Shot Change Detection Information Format DTD
<?xml version="1.0" encoding="Big5"?>
<!ELEMENT VideoShotDecision (VideoInfo, ShotInfo)>
<!ELEMENT VideoInfo (FileName, Length, FrameRate, Width, Height, Mode, Format)>
<!ELEMENT FileName (#PCDATA)>
<!ELEMENT Length (#PCDATA)>
<!ELEMENT FrameRate (#PCDATA)>
<!ELEMENT Width (#PCDATA)>
<!ELEMENT Height (#PCDATA)>
<!ELEMENT Mode (#PCDATA)>
<!ELEMENT Format (#PCDATA)>
<!ELEMENT ShotInfo (ShotDecision*)>
<!ELEMENT ShotDecision (ShotNumber, ShotStart, ShotEnd, Direction, KeyFrame)>
<!ELEMENT ShotNumber (#PCDATA)>
<!ELEMENT ShotStart (#PCDATA)>
<!ELEMENT ShotEnd (#PCDATA)>
<!ELEMENT Direction (#PCDATA)>
<!ELEMENT KeyFrame (#PCDATA)>
Example of Shot Change Detection Information
<?xml version="1.0" encoding="Big5"?>
<!DOCTYPE VideoShotDecision SYSTEM "VideoShotDecision.dtd">
<VideoShotDecision>
<VideoInfo>
<FileName>file name (base name)</FileName>
<Length>video length (in frames)</Length>
<FrameRate>playback rate (fps)</FrameRate>
<Width>frame width</Width>
<Height>frame height</Height>
<Mode>video mode</Mode>
<Format>video format</Format>
</VideoInfo>
<ShotInfo>
<ShotDecision>
<ShotNumber>shot sequence number</ShotNumber>
<ShotStart>shot start value</ShotStart>
<ShotEnd>shot end value</ShotEnd>
<Direction>shot length</Direction>
<KeyFrame>key frame</KeyFrame>
</ShotDecision>
<ShotDecision>
<ShotNumber>shot sequence number</ShotNumber>
<ShotStart>shot start value</ShotStart>
<ShotEnd>shot end value</ShotEnd>
<Direction>shot length</Direction>
<KeyFrame>key frame</KeyFrame>
</ShotDecision>
</ShotInfo>
</VideoShotDecision>
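Downstream tools need to read these files back. A minimal Python sketch using the standard-library `xml.etree.ElementTree`, assuming the Big5 bytes have already been decoded to a string, since Python's expat-based parser rejects multi-byte declared encodings such as Big5. The `Shot` class and the sample values are invented for illustration; following the example above, the shot length is read from `<Direction>`:

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Shot:
    number: int
    start: int      # first frame of the shot
    end: int        # last frame of the shot
    length: int     # read from <Direction>, which the example above uses for the shot length
    key_frame: str


def parse_shot_decision(xml_text: str) -> Tuple[Dict[str, str], List[Shot]]:
    # The files declare encoding="Big5". We assume the caller has already decoded
    # the bytes to str (e.g. raw.decode("big5")), so drop the declaration first.
    xml_text = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    root = ET.fromstring(xml_text)
    video_info = {child.tag: (child.text or "").strip() for child in root.find("VideoInfo")}
    shots = [
        Shot(
            number=int(sd.findtext("ShotNumber")),
            start=int(sd.findtext("ShotStart")),
            end=int(sd.findtext("ShotEnd")),
            length=int(sd.findtext("Direction")),
            key_frame=sd.findtext("KeyFrame", default=""),
        )
        for sd in root.iter("ShotDecision")
    ]
    return video_info, shots


# Invented sample values, for illustration only
sample = """<?xml version="1.0" encoding="Big5"?>
<VideoShotDecision>
  <VideoInfo>
    <FileName>tape01</FileName><Length>900</Length><FrameRate>30</FrameRate>
    <Width>352</Width><Height>240</Height><Mode>NTSC</Mode><Format>MPEG-1</Format>
  </VideoInfo>
  <ShotInfo>
    <ShotDecision><ShotNumber>1</ShotNumber><ShotStart>0</ShotStart><ShotEnd>449</ShotEnd>
      <Direction>450</Direction><KeyFrame>0</KeyFrame></ShotDecision>
    <ShotDecision><ShotNumber>2</ShotNumber><ShotStart>450</ShotStart><ShotEnd>899</ShotEnd>
      <Direction>450</Direction><KeyFrame>450</KeyFrame></ShotDecision>
  </ShotInfo>
</VideoShotDecision>"""

info, shots = parse_shot_decision(sample)
print(info["FrameRate"], [(s.number, s.start, s.end) for s in shots])
```

Here the key frames follow the metadata default of using the first frame of each segment.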
Software for Auto-segmentation of Shots
Watermark embedding, step by step:
1. Decode the variable-length codes (VLC) of the input video to obtain the run-level pairs (r_ij, l_ij) of each macroblock MB_i.
2. For each MB_i, calculate its parameters, including d(i) and d_G(i); derive d_G^h(i) by shifting d_G(i) by 1 in the direction given by the watermark bit w_i, and derive d^h(i) from d_G^h(i).
3. Fetch each run-level pair and shift its level by the difference between d^h(i) and d(i), giving l_ij^h.
4. If (r_ij, l_ij^h) does not exist in the VLC table, let (r_ij, l_ij^h) be the closest run-level pair in the table.
5. Repeat while the macroblock has other run-level pairs, then continue with the next frame.
6. Constant bit-rate processing: calculate the total numbers of bits reduced by embedding negative watermark bits and increased by embedding positive watermark bits; then restore (r_ij, l_ij^h) to (r_ij, l_ij) in the macroblocks MB_i that carry positive watermark bits until the increased size no longer exceeds the reduced size.
7. Output the VLC codeword of each (r_ij, l_ij^h).
Flow Chart of Embedding Video Watermark
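The constant bit-rate step can be sketched in Python. This assumes the bit-count change caused by embedding into each macroblock is already known and that restoration proceeds in macroblock order; the chart does not specify the order, and the function name and data layout are hypothetical:

```python
from typing import List, Tuple


def balance_bitrate(changes: List[Tuple[int, int]]) -> List[bool]:
    """Constant bit-rate step, as a sketch.

    changes[i] = (watermark_bit, bit_delta) for macroblock MB_i, where
    watermark_bit is +1 or -1 and bit_delta is the change in coded size (bits)
    caused by embedding into that macroblock. Returns keep[i]: True if MB_i keeps
    its watermarked run-levels, False if it is restored to the original ones.
    """
    keep = [True] * len(changes)
    # Total bits added by positive watermark bits and removed by negative ones
    increased = sum(delta for bit, delta in changes if bit > 0)
    reduced = -sum(delta for bit, delta in changes if bit < 0)
    for i, (bit, delta) in enumerate(changes):
        if increased <= reduced:
            break                      # budget balanced: stop restoring
        if bit > 0 and delta > 0:
            keep[i] = False            # restore (r_ij, l_ij^h) to (r_ij, l_ij)
            increased -= delta
    return keep


print(balance_bitrate([(+1, 12), (-1, -5), (+1, 8), (-1, -10)]))  # [False, True, True, True]
```

Restoring a macroblock also removes its watermark bit, so the order in which macroblocks are restored trades embedded bits against the bit budget.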
[Figure: Adding watermark (original video → watermark-embedded video); piracy; watermark inspection (original video vs. watermark-embedded video).]
A Comparison of Execution Times: Watermark Detection vs. Video Compression and Decompression
Mandarin Chinese Spoken Document Retrieval
Subword vs. Word for Retrieval
• Words contain lexical knowledge: words enhance precision.
• Subwords offer robustness against:
– Word tokenization ambiguity, e.g. 這一晚會如常舉行
• 這一晚 會 如常 舉行 [Tonight it will proceed as usual]
• 這一 晚會 如常 舉行 [This banquet will proceed as usual]
– The open vocabulary problem
• There is an unlimited number of words, but ~6,000 characters and ~400 syllables offer complete textual and phonological coverage of Mandarin Chinese.
– Homophone ambiguity
• 富庶, 負數, 複數, and 覆述 are entirely different words, yet all are pronounced /fu shu/.
• A foreign word may be translated into different Chinese words.
– Speech recognition errors
• Subwords enhance recall.
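The tokenization ambiguity above can be reproduced by enumerating every segmentation a lexicon allows. A small Python sketch, assuming a toy lexicon (hypothetical) containing just the words of the example sentence:

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str], max_word_len: int = 4) -> List[List[str]]:
    """Enumerate every way of splitting `text` into words found in `lexicon`."""
    if not text:
        return [[]]
    results = []
    for n in range(1, min(max_word_len, len(text)) + 1):
        head = text[:n]
        if head in lexicon:
            for rest in segmentations(text[n:], lexicon, max_word_len):
                results.append([head] + rest)
    return results


# Toy lexicon (hypothetical) holding just the words of the example sentence
LEXICON = {"這一", "這一晚", "晚會", "會", "如常", "舉行"}

for words in segmentations("這一晚會如常舉行", LEXICON):
    print(" ".join(words))
# 這一 晚會 如常 舉行
# 這一晚 會 如常 舉行
```

Both readings share the same character and syllable sequence, so a subword index matches the query either way, which is the robustness the slide attributes to subwords.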
Multi-scale Indexing: word, character, syllable
Overlapping N-gram Indexing
Given a document (or a query) u1 u2 u3 u4 u5 u6 u7 u8 …, where each ui can be a word or a subword (character, syllable, phoneme, etc.), we can form:
Uni-gram: u1, u2, u3, u4, u5, u6, u7, u8, …
Overlapping bi-gram: u1u2, u2u3, u3u4, u4u5, u5u6, u6u7, u7u8, …
Overlapping tri-gram: u1u2u3, u2u3u4, u3u4u5, u4u5u6, u5u6u7, u6u7u8, …
…
Each overlapping N-gram is called an indexing term
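Generating the indexing terms is mechanical. A Python sketch, assuming units are given as a list of strings; terms are kept as tuples so that multi-letter syllables stay distinguishable (the romanized syllables are an illustrative choice, not the system's symbol set):

```python
from typing import List, Sequence, Tuple


def overlapping_ngrams(units: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All overlapping N-grams u_i ... u_{i+n-1} of a unit sequence."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def indexing_terms(units: Sequence[str], max_n: int) -> List[Tuple[str, ...]]:
    """Uni-grams up to max_n-grams; each one is an indexing term."""
    terms: List[Tuple[str, ...]] = []
    for n in range(1, max_n + 1):
        terms.extend(overlapping_ngrams(units, n))
    return terms


chars = list("如常舉行")                    # character units
syllables = ["ru", "chang", "ju", "xing"]   # syllable units (toneless pinyin, for illustration)
print(overlapping_ngrams(chars, 2))        # [('如', '常'), ('常', '舉'), ('舉', '行')]
print(len(indexing_terms(syllables, 3)))   # 4 + 3 + 2 = 9
```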
Vector Model – Basic Idea
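The user request goes through query formation to give a query q, and the documents d_j go through indexing; feature extraction then yields the query representation V_q and the document representations V_{d_j}. Retrieval returns a list of documents in decreasing order of cos(V_{d_j}, V_q), and an evaluation of the retrieval performance is fed back.
Indexing terms are weighted by term frequency and inverse document frequency:
$d[t] = 1 + \log f_d(t)$
$q[t] = \left(1 + \log f_q(t)\right) \log\left(N_D / N_{D_t}\right)$
where f_d(t) and f_q(t) are the frequencies of term t in the document and the query, N_D is the number of documents, and N_{D_t} is the number of documents containing t.
$S(q, d) = \cos(V_q, V_d) = \dfrac{V_q \cdot V_d}{\lVert V_q \rVert \, \lVert V_d \rVert}$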
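The weighting and ranking above fit in a few lines of Python. This sketch assumes each document and query has already been reduced to a list of indexing terms, uses natural logarithms, and drops query terms that occur in no document; the toy collection is invented:

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def doc_vector(counts: Counter) -> Vector:
    # d[t] = 1 + log f_d(t), for every term t that occurs in the document
    return {t: 1 + math.log(f) for t, f in counts.items() if f > 0}


def query_vector(counts: Counter, docs: List[Counter]) -> Vector:
    # q[t] = (1 + log f_q(t)) * log(N_D / N_Dt)
    n_d = len(docs)
    vec = {}
    for t, f in counts.items():
        n_dt = sum(1 for d in docs if t in d)   # number of documents containing t
        if f > 0 and n_dt > 0:
            vec[t] = (1 + math.log(f)) * math.log(n_d / n_dt)
    return vec


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rank(query: Sequence[str], documents: List[Sequence[str]]) -> List[Tuple[int, float]]:
    """Return (document index, S(q, d)) pairs in decreasing order of score."""
    docs = [Counter(d) for d in documents]
    q = query_vector(Counter(query), docs)
    scored = [(j, cosine(q, doc_vector(d))) for j, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection of indexing-term lists (hypothetical)
documents = [["如常", "舉行"], ["晚會", "舉行"], ["富庶", "負數"]]
print(rank(["晚會"], documents))
```

Document 1 is the only one sharing the query term, so it ranks first with score 1/√2; the other two tie at zero and keep their original order.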
Vector Model – Information Fusion
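Using confidence measures instead of frequency counts:
$d[t] = 1 + \log\left(\sum_{j=1}^{n_t} cm_t(j)\right)$
$q[t] = \left(1 + \log\left(\sum_{j=1}^{n_t} cm_t(j)\right)\right) \log\left(N_D / N_{D_t}\right)$
where cm_t(j) is the confidence measure of the j-th of the n_t hypothesized occurrences of term t.
Scores computed separately for each of the I index types are fused with weights w_i:
$S_i(q, d) = \dfrac{V_q^i \cdot V_d^i}{\lVert V_q^i \rVert \, \lVert V_d^i \rVert}, \qquad S(q, d) = \sum_{i=1}^{I} w_i \, S_i(q, d)$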
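A Python sketch of the confidence-based weights and the weighted fusion, assuming each term's hypothesized occurrences arrive with confidence measures in (0, 1] and that log(N_D / N_Dt) has been precomputed per index type. Index-type names such as `"syllable"` and all numbers are illustrative:

```python
import math
from typing import Dict, List

Occurrences = Dict[str, List[float]]   # term -> confidence measure cm_t(j) of each occurrence
Vector = Dict[str, float]


def cm_weight(confidences: List[float]) -> float:
    """1 + log(sum_j cm_t(j)): the confidence mass replaces the raw count f(t)."""
    total = sum(confidences)
    return 1 + math.log(total) if total > 0 else 0.0


def doc_vector(occ: Occurrences) -> Vector:
    # d[t] = 1 + log(sum_j cm_t(j))
    return {t: cm_weight(cms) for t, cms in occ.items()}


def query_vector(occ: Occurrences, idf: Vector) -> Vector:
    # q[t] = (1 + log(sum_j cm_t(j))) * log(N_D / N_Dt), with idf[t] = log(N_D / N_Dt)
    return {t: cm_weight(cms) * idf.get(t, 0.0) for t, cms in occ.items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def fused_score(query: Dict[str, Occurrences], doc: Dict[str, Occurrences],
                idf: Dict[str, Vector], weights: Dict[str, float]) -> float:
    """S(q, d) = sum_i w_i * S_i(q, d), where each key i names one index type."""
    return sum(
        w * cosine(query_vector(query.get(i, {}), idf.get(i, {})), doc_vector(doc.get(i, {})))
        for i, w in weights.items()
    )


# Illustrative two-type example: syllable bi-grams and single characters
query = {"syllable": {"fu shu": [1.0]}, "character": {"負": [1.0], "數": [1.0]}}
doc = {"syllable": {"fu shu": [0.9, 0.6]}, "character": {"負": [0.9], "數": [0.6], "富": [0.5]}}
idf = {"syllable": {"fu shu": 2.0}, "character": {"負": 1.5, "數": 1.5}}
print(fused_score(query, doc, idf, {"syllable": 0.5, "character": 0.5}))
```

Because the log is applied to the summed confidence, a term whose total confidence falls below 1/e receives a negative weight; whether to clip such weights is a design choice the formulas leave open.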
Speech Recognition Evaluation
Transcribed 13 segments of speech.
Initial system:
• Acoustic models: broadcast news speech (16 hours)
• Language models: CNA text news (65M characters)
• Lexicon: 61,521 words
• Syllable accuracy: 15.92%; character accuracy: 8.18%
Acoustic model training: transcribed 3.25 hours of data for training acoustic models
• Syllable accuracy: 27.17%
Language model adaptation: collected a 6.4 MB text corpus from indigenous webpages; keyword extraction
Adapted system:
• Acoustic models: broadcast news speech (16 hours) + domain-specific training speech (3.25 hours)
• Language models: CNA text news (65M characters) + domain-specific text corpus (3.2M characters)
• Lexicon: 61,521 words + 3,674 domain-specific words
• Syllable accuracy: 30.04%; character accuracy: 22.08%
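Syllable and character accuracy are presumably computed in the usual way, as (N - S - D - I)/N, with substitutions S, deletions D, and insertions I counted on a minimum-edit alignment against N reference units. A Python sketch of that computation (the syllables in the example are illustrative):

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions + deletions + insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))          # row for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of r
                         cur[j - 1] + 1,           # insertion of h
                         prev[j - 1] + (r != h))   # substitution (or match)
        prev = cur
    return prev[-1]


def accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(N - S - D - I) / N; it can drop below zero when there are many insertions."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


ref = ["ru", "chang", "ju", "xing"]
hyp = ["ru", "chang", "qu", "xing", "le"]     # one substitution, one insertion
print(accuracy(ref, hyp))                     # 0.5
```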
Development of a Demo System
[Figure: demo system pipeline. MPEG-2 video → audio extraction (PCM wave) → segmentation into speech segments → speech recognition → transcriptions → indexing vectors; queries are indexed into vectors in the same way and compared against the document vectors.]
We randomly selected 8 Betacam cassettes to build the prototype retrieval system.
The speech waveform was automatically chopped into segments of 28–32 seconds based on energy detection, yielding a total of 386 segments.
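The energy-based chopping can be sketched as follows. The slide says only that segmentation is based on energy detection; this Python version assumes one plausible rule, cutting at the quietest 10 ms frame between 28 and 32 seconds after the current segment start. Function names, the frame size, and the synthetic signal are hypothetical:

```python
from typing import List, Sequence, Tuple


def short_time_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """Sum of squared samples in consecutive non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples), frame_len)]


def chop_by_energy(samples: Sequence[float], sample_rate: int,
                   min_sec: float = 28.0, max_sec: float = 32.0,
                   frame_sec: float = 0.01) -> List[Tuple[int, int]]:
    """Split a waveform into segments of roughly min_sec..max_sec seconds.

    Each cut is placed at the lowest-energy frame between min_sec and max_sec
    after the current segment start, so cuts tend to fall in pauses.
    Returns (start_sample, end_sample) pairs; the final segment may be shorter.
    """
    frame_len = max(1, int(frame_sec * sample_rate))
    energy = short_time_energy(samples, frame_len)
    lo = int(min_sec * sample_rate) // frame_len
    hi = int(max_sec * sample_rate) // frame_len
    segments = []
    start = 0                                        # segment start, in frames
    while len(energy) - start > hi:
        window = range(start + lo, start + hi + 1)
        cut = min(window, key=lambda f: energy[f])   # quietest frame in the window
        segments.append((start * frame_len, cut * frame_len))
        start = cut
    segments.append((start * frame_len, len(samples)))
    return segments


# Synthetic 100-second signal at a hypothetical 100 Hz rate, silent at 30 s
signal = [1.0] * 10_000
signal[3_000] = 0.0
print(chop_by_energy(signal, sample_rate=100))
# [(0, 3000), (3000, 5800), (5800, 8600), (8600, 10000)]
```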
Retrieved Documents
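Demo System – Web-based spoken document retrieval system
[Figure: components: client web browser, web server, IR server with syllable-lattice indices, video streaming; queries pass from the browser through the web server to the IR server, and retrieved documents are returned to the browser.]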