Transcript EUDC - epub

Requirements of End User Defined Characters & Some Frequently Used Solutions
Chinese Foundation for Digitization Technology (CMEX)
Phobos Chang
Agenda
The Need For End User Defined Characters
 When to Use EUDC
 Embedded Resources
 Multi-Typefaces Support
 Mapping Information for Further Use
 Frequently Used Solutions

The Need For End User Defined Characters

The first widely accepted Traditional Chinese encoding
format defines only 13,053 hanzi characters.
– Some characters used in addresses are not included.
– Characters used in people’s names are not fully included.
• Even the Prime Minister’s name (only three characters) is missing one character.
• Tax collection
• Fortune teller’s issue
– Characters used in ancient books and in the names of historical
people are also missing.
– EUDC support has been needed from day one.

The need dates back to the MS-DOS era.
When to Use EUDC


When an EPUB device cannot display a character with the fonts
bundled at purchase.
GNU Unifont 5.1
– A project started in 1998 by Roman Czyborra
– Covers the Basic Multilingual Plane (BMP) of the Unicode 5.1 standard
– Originally a bitmap font (8x16 or 16x16), later converted to
TrueType
– Priority is a glyph for every BMP code point; appearance comes second
– Sample
Hardware Resource Limitation

Why does CMEX suggest using BMP as a minimum?
– The BMP includes 27,484 normalized hanzi characters
– Supplementary characters are too many for low-end devices
• CJK Unified Ideographs Extension B alone contains 42,711 characters
• Surrogate support is not yet widespread
– Not every book uses code points outside the BMP; books that
need EUDC support are few
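
To illustrate why characters outside the BMP depend on surrogate support, here is a minimal Python sketch of the standard UTF-16 surrogate-pair arithmetic. The function name `to_surrogate_pair` is chosen for illustration only and does not come from the slides.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is inside the BMP or out of range")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


# U+20000 is the first code point of CJK Unified Ideographs Extension B.
high, low = to_surrogate_pair(0x20000)
print(f"U+20000 -> {high:04X} {low:04X}")  # prints "U+20000 -> D840 DC00"
```

A device that handles only single 16-bit units sees two unrelated values here, which is one reason such characters may need EUDC treatment on low-end hardware.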
Requirement 1X

Define EUDC as
– End User Defined Characters are those characters whose interpretation is
not specified by the current Unicode standard, plus characters whose
interpretation is specified by the Unicode standard but whose assigned
code points are outside the BMP.
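
The definition above can be sketched as a small Python check. This is only an approximation under stated assumptions: "not specified by the current Unicode standard" is taken to mean the general categories Cn (unassigned) and Co (private use), and "current" means whatever Unicode version the running Python's `unicodedata` module ships with. The name `is_eudc` is hypothetical.

```python
import unicodedata


def is_eudc(ch: str) -> bool:
    """Return True if ch counts as an EUDC under the definition above."""
    cp = ord(ch)
    # Specified by Unicode, but assigned outside the BMP.
    if cp > 0xFFFF:
        return True
    # Assumption: unassigned (Cn) or private-use (Co) code points are
    # the ones whose interpretation Unicode does not specify.
    return unicodedata.category(ch) in {"Cn", "Co"}


print(is_eudc("中"))            # BMP hanzi
print(is_eudc("\U00020000"))    # CJK Extension B
print(is_eudc("\uE000"))        # Private Use Area
```

Only the first call reports False; the other two characters fall under the EUDC definition.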

Requirement One
– Any character that is not defined in the current Unicode standard,
or that is defined in the current Unicode standard but whose code point is
outside the Unicode BMP, can be used in the context of any
EPUB document via EUDC support.
Embedded Resources & Requirement 2X
Not every EPUB device has wireless connection support.
 Devices that do have a wireless connection may be
carried to a location without coverage, such as a basement. EUDC
support should still work in such circumstances.
 Requirement Two

– For any EPUB document that contains EUDC, all resource files needed to
display the EUDC can be embedded inside the EPUB zip
archive.
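
As a rough illustration of Requirement Two, the following Python sketch packs EUDC resources into a zip container and checks that nothing required is missing. The resource path `OEBPS/fonts/eudc-song.ttf` and the function names are hypothetical, and a real EPUB also needs a container file and package document that are omitted here.

```python
import io
import zipfile


def build_epub_with_eudc(resources: dict) -> bytes:
    """Pack resources (archive path -> bytes) into an EPUB-style zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed
        # (ZipInfo defaults to ZIP_STORED).
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip")
        for name, data in resources.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def missing_resources(epub_bytes: bytes, required: list) -> list:
    """Return the required archive paths that are not embedded."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        embedded = set(zf.namelist())
    return [name for name in required if name not in embedded]


# Hypothetical EUDC font resource with placeholder bytes.
epub = build_epub_with_eudc({"OEBPS/fonts/eudc-song.ttf": b"placeholder"})
print(missing_resources(epub, ["OEBPS/fonts/eudc-song.ttf",
                               "OEBPS/fonts/eudc-kai.ttf"]))
```

A reading system could run a check like `missing_resources` before rendering, so that a missing EUDC resource is reported even when no network is available.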
Multi-Typefaces Support


Some EPUB devices let users choose which typeface
to use for display.
For example, Song (細明體) and Kai (楷體) are the two most commonly
used typefaces in Traditional Chinese.
– Displaying EUDC in either typeface requires a separate resource for
each.

Requirement Three
– It would be better to provide a mechanism that assigns a
corresponding resource for EUDC display to each font
used in an EPUB document.
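
One possible shape for the mechanism in Requirement Three is a lookup table from typeface to EUDC resource. The sketch below is an assumption, not a proposed format: the table contents, the resource paths, and the fallback to Song are all hypothetical.

```python
# Hypothetical table: typeface name -> embedded EUDC font resource.
EUDC_RESOURCES = {
    "Song": "fonts/eudc-song.ttf",
    "Kai": "fonts/eudc-kai.ttf",
}

# Assumption: fall back to the Song resource for unknown typefaces.
DEFAULT_TYPEFACE = "Song"


def eudc_resource_for(typeface: str) -> str:
    """Return the EUDC resource matching the user's chosen typeface."""
    return EUDC_RESOURCES.get(typeface, EUDC_RESOURCES[DEFAULT_TYPEFACE])


print(eudc_resource_for("Kai"))     # fonts/eudc-kai.ttf
print(eudc_resource_for("Hei"))     # fonts/eudc-song.ttf (fallback)
```

Falling back to a default keeps the EUDC visible, at the cost of a style mismatch, when the user picks a typeface for which no resource was embedded.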
Mapping Information for Further Use
What if the EPUB device does not support EUDC?
 Provide useful information for later processing.
 Requirement Four

– It would be better to embed mapping information for all EUDC
used inside an EPUB document.
– When mapping information is embedded inside an EPUB document, for
EUDC that are interpreted by the Unicode standard but lie beyond the BMP,
the mapping information should contain the corresponding code point,
such as U+20000, for each character;
– for EUDC that are not interpreted by the Unicode standard, the mapping
information should contain a useful reference coding scheme, such as
TF-2121 used in Taiwan’s CNS11643 standard.
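
The two kinds of mapping information in Requirement Four could be modeled as follows. This is a minimal sketch: the PUA assignments U+E000 and U+E001, the `CNS11643:TF-2121` string format, and all names are hypothetical and serve only to show the two cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EudcMapping:
    unicode_code_point: Optional[str] = None  # e.g. "U+20000"
    reference: Optional[str] = None           # e.g. a CNS11643 code


# Hypothetical table keyed by the code point used inside the document.
MAPPING = {
    0xE000: EudcMapping(unicode_code_point="U+20000"),
    0xE001: EudcMapping(reference="CNS11643:TF-2121"),
}


def resolve(ch: str) -> str:
    """Replace an EUDC with its Unicode character or a reference tag."""
    entry = MAPPING.get(ord(ch))
    if entry is None:
        return ch  # not an EUDC in this document
    if entry.unicode_code_point is not None:
        # "U+20000" -> the actual supplementary character
        return chr(int(entry.unicode_code_point[2:], 16))
    return f"[{entry.reference}]"


print(resolve("\uE001"))  # prints "[CNS11643:TF-2121]"
```

A later process, such as conversion on a more capable system, can use the table to restore the real character or at least report which external code was intended.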
Use of Private Use Area

Most EUDC solutions in Taiwan are PUA-centric.
– Input within Input Method Environment
– Display for every application
– Printing

Pros
– Easy to use when authoring
– Much more straightforward

Cons
– Will need to check code point range when rendering
– Unicode normalization
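
The code point range check listed under Cons can be sketched in Python as below. The three ranges are the Private Use Area blocks defined by Unicode; the function names are illustrative only.

```python
import unicodedata

# Private-use ranges defined by Unicode (inclusive bounds).
PUA_RANGES = (
    (0xE000, 0xF8FF),        # BMP Private Use Area
    (0xF0000, 0xFFFFD),      # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),    # Supplementary Private Use Area-B
)


def is_private_use(ch: str) -> bool:
    """Return True if ch lies in one of the private-use ranges."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in PUA_RANGES)


def pua_positions(text: str) -> list:
    """Return the indexes in text that need EUDC rendering."""
    return [i for i, ch in enumerate(text) if is_private_use(ch)]


sample = "張\uE000三"
print(pua_positions(sample))  # prints "[1]"
# PUA code points have no decomposition, so normalization leaves them as-is.
print(unicodedata.normalize("NFKC", sample) == sample)  # prints "True"
```

A renderer would run such a scan over each text run and switch to the EUDC resource only at the reported positions.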
Frequently Used Solutions
In-line Images
 Java Applet for EUDC Display and Input
 EUDC Display using Ajax
 Embedded OpenType Font (EOT)
 sIFR
 Web Open Font Format

Embedded OpenType Font
Designed by Microsoft
 Submitted to W3C in 2007 as part of CSS3 and rejected.
 Re-submitted to W3C in 2008 as a standalone submission
 IE only
 Not widely accepted even in Taiwan

sIFR
Scalable Inman Flash Replacement
 Open-source JavaScript and Adobe Flash

Web Open Font Format
Developed in 2009
 A strong favorite for standardization by the W3C Web
Fonts Working Group
 Vendor Support

– Firefox since 3.6
– Microsoft IE 9
– WebKit