Consonant Characters and Inherent Vowels

Towards the Promised Land:
Globalization Developments
in Web Standards
Addison Phillips, Chair
W3C Internationalization WG
Presenter
• Globalization Architect,
Amazon Lab126
We make the Kindle
• Chair,
W3C Internationalization WG
Acknowledgements
• This presentation owes much of its content to these
contributors:
– Richard Ishida (W3C Internationalization Activity lead)
– Felix Sasaki (W3C MLW-LT)
– Aharon Lanin (Google, bidi maven)
– Norbert Lindenberg (ES-I18N)
– Koji Ishii (Rakuten)
The Web: vastly improved or
room for improvement?
• Why “the promised land”?
The promise of a multilingual Web is being
realized and new W3C specifications help
demonstrate that.
Many features are implemented.
• Why only “towards”?
We’ve waited a long time.
Many features we’ll talk about today are not
implemented yet or are only partially
implemented.
• What issues are more or less solved on the
Web?
• What are we doing to address the remaining
problems?
• How can you influence the outcomes?
Unicode
ًّ ‫عاملية‬
ّ ‫العاملية‬
ّ
!‫حقا‬
‫جعل شبكة الويب‬
!‫وب جهانی را بهدرستی جهانی سازیم‬
‫عاملگیر ویب کو حقیقی طور پر عاملگیر بنانا‬
Համաշխարհային ցանցն իրոք համաշխարհային դարձնելը
ᑖᑦᓱᒪ ᐃᑭᐊᖅᑭᕕᒃ ᓯᓚᕐᔪᐊᓕᒫᒥᒃ ᓈᕆᑎᑉᐹ.
"Дүниежүзілік торды" нағыз дүниежүзілік етеміз!
वर्ल्ड वाई् वेबलाई यथाथडमै ववश्वव्यापी बनाउने !
የዓለም አቀፉን ድር በእውነት አለም አቀፍ ማድረግ!
Κάνοντας τον Παγκόσμιο Ιστό πραγματικά Παγκόσμιο
ਵਰਡ ਵਾਈਡ ਵੈਬ ਨੂੰ ਵਾਕਈ ਵਵਸ਼ਵ-ਵਵਆਪੀ ਬਨਾਉਣਾ !
缔造真正全球通行的万维网
!‫ליצור מהרשת רשת כלל עולמית באמת‬
ˈmeɪkɪŋ ðə wɜːld waɪd wɛb ˈtruːlɪ ˈwɜːldˈwaɪd
ワールド・ワイド・ウェッブを世界中に広げましょう
ធ្វើឲ្យធ ើលវ៉ាយធ ៉ាបមានទូទាំងពិភពធោកពិប្រាកដមែន!
전세계의 월드 와이드 웹으로 만들기!
Gwneud y we fyd-eang yn wirioneddol fyd-eang!
การทําให้ World Wide Web แพร่หลายไปทัว่ โลกอย่างแท้ จริง
འཛམ་གླིང་ཡོངས་འབྲེལ་འདླི་ ངོ་མ་འབད་རང་ འཛམ་གླིང་ཡོངས་ལུ་ཁྱབ་ཚུགསཔ་བཟོ་བ།
"The path W3C follows to making text on the Web truly global is Unicode."
Tim Berners-Lee
Unicode
http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html
Encoding declarations
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html lang='en'>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
...
• Strong encouragement to use UTF-8.
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
</head>
...
• New meta charset declaration. Either
approach works, but make sure you don't
have both.
• The declaration must fit completely within
the first 1024 bytes of the file.
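The 1024-byte rule can be checked mechanically. The following is a minimal sketch, assuming Node.js (for `Buffer`); the helper name `charsetDeclaredEarly` and the regular expression are illustrative, not taken from any specification.

```javascript
// Hypothetical helper: true when a charset declaration ends within the
// first 1024 bytes of the document.
function charsetDeclaredEarly(html) {
  // Re-read the UTF-8 bytes as latin1 so string offsets equal byte offsets.
  const bytes = Buffer.from(html, "utf8").toString("latin1");
  // Matches both <meta charset=utf-8> and the older http-equiv form.
  const match = /<meta[^>]*charset\s*=\s*["']?[\w-]+["']?[^>]*>/i.exec(bytes);
  return match !== null && match.index + match[0].length <= 1024;
}

const early = "<!DOCTYPE html><html><head><meta charset=utf-8></head>";
const late = "<!DOCTYPE html><html><head><!--" + " ".repeat(2000) +
  "--><meta charset=utf-8></head>";

console.log(charsetDeclaredEarly(early)); // true
console.log(charsetDeclaredEarly(late));  // false
```

A long comment or inline script ahead of the declaration is the usual way a page ends up in the second case.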
HTML5 Encoding Spec
• Rules for determining, parsing, handling legacy
encodings.
Unicode versions and ids
<h2><a id="რჩეული">რჩეული
ფოტოსურათი</a></h2>
<p><a href="/wiki/ჭიამაია" title="ჭიამაია"
class="mw-redirect">ჭიამაია</a> (Coccinellidae),
ხოჭოების ოჯახს ეკუთვნის. აქვს ამობურცული,
მომრგვალო ან ოვალური სხეული. ზურგზე ღია
ფონზე შავი ლაქები აყრია, იშვიათად
...
History: CharMod
CharMod (Character Model for the World Wide Web) was the start
of the Internationalization Activity, based on requirements
originally published in 1998. So how is this news?
Normalization
NFD
I◌́zeli◌́to◌̋u◌̈l
NFC
Ízelítőül
Ha a világ beszélni akarna, Unicode-ul szólalna meg.
Regisztráljon már most a Tizedik Nemzetközi Unicode
Konferenciára, melyet 1997. március 10-12-én rendeznek
Meinz-ban, Németországban. Ezen a konferencián az iparág
több neves szakértője is résztvesz. Ízelítőül a témákból: a
világháló és a Unicode nemzetközisítése és lokalizálása, a
Unicode alkalmazása működő rendszerekben és
alkalmazásokban, szövegelrendezésnél, és többnyelvű
számítógépeken.
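The difference between the two forms is easy to see in script. This is a small sketch using the standard `String.prototype.normalize` method; the helper name `sameText` is made up for illustration.

```javascript
// "Ízelítőül" written with precomposed characters (NFC).
const nfc = "\u00CDzel\u00EDt\u0151\u00FCl";
// The same word with base letters followed by combining marks (NFD).
const nfd = nfc.normalize("NFD");

console.log(nfc.length, nfd.length); // 9 13
console.log(nfc === nfd);            // false: different code unit sequences

// Hypothetical helper: compare strings after bringing both to NFC.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(sameText(nfc, nfd)); // true
```

Identifiers, ids, and selectors that are compared without normalizing first can fail to match even though they look identical on screen.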
Evolution & Revolution
Bidirectional text support
✘ ،‫نشاط التدويل‬ W3C
✔ W3C ،‫نشاط التدويل‬
<description dir="rtl">W3C ،‫نشاط التدويل‬</description>
Bidi isolation for inserted text
✘ <span dir=rtl>לילית</span> - 3 reviews
• CSS3 added the “isolate” value to the unicode-bidi property.
• HTML5 adds a new <bdi> element, with unicode-bidi: isolate in the default stylesheet.
• The <output> element behaves the same way.
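When text of unknown direction is inserted into a template, wrapping it in <bdi> keeps it from reordering its neighbours. A minimal sketch, assuming a string-templating setup; `escapeHtml` and `reviewLine` are hypothetical helper names.

```javascript
// Escape characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical template: the inserted name is isolated with <bdi>, so the
// trailing " - 3 reviews" is laid out independently of the name's direction.
function reviewLine(name, count) {
  return `<bdi>${escapeHtml(name)}</bdi> - ${count} reviews`;
}

// Hebrew name written with escapes to keep this source file left-to-right.
console.log(reviewLine("\u05DC\u05D9\u05DC\u05D9\u05EA", 3));
```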
Determining direction at run time
• HTML5 adds a new “auto” value for
the dir attribute.
• CSS3 adds a “plaintext” value to the
unicode-bidi property to allow per-paragraph
auto-direction, primarily
for use on <textarea> and <pre>
elements.
• dir=auto sets the unicode-bidi CSS
property to “plaintext” for
<textarea> and <pre> elements, to
“bidi-override isolate” for <bdo>
elements, and to “isolate” otherwise.
• dir=auto estimates the direction from the
first strong character, following the
Unicode Bidirectional Algorithm (UBA).
<p>Your search - <span class=booktitle
dir=auto>‫ יוות דודיק תורהצה‬CSS</span>
- did not match any documents.</p>
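The run-time estimate looks for the first strongly directional character in the text. The sketch below approximates that idea in script; it is an assumption-laden simplification, because JavaScript regular expressions expose script properties rather than the Bidi_Class property the real algorithm uses. The function name `estimateDirection` and the choice of scripts are illustrative.

```javascript
// Approximation: letters from a few right-to-left scripts count as strong RTL.
// The real algorithm consults Bidi_Class (R, AL, L) for every character.
const RTL_LETTER = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;
const LETTER = /\p{L}/u;

function estimateDirection(text) {
  // for...of walks code points, so supplementary characters stay intact.
  for (const ch of text) {
    if (LETTER.test(ch)) {
      return RTL_LETTER.test(ch) ? "rtl" : "ltr";
    }
  }
  return "ltr"; // fallback chosen here when no letter is found
}

console.log(estimateDirection("\u05E4\u05D9\u05E6\u05D4 CSS")); // rtl
console.log(estimateDirection("CSS \u05E4\u05D9\u05E6\u05D4")); // ltr
```

Digits and punctuation are skipped, so a string such as "3 - " followed by Hebrew letters is still reported as rtl.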
Unicode Isolate Controls
Four new codepoints:
• U+2066 LEFT-TO-RIGHT ISOLATE (LRI)
• U+2067 RIGHT-TO-LEFT ISOLATE (RLI)
• U+2068 FIRST STRONG ISOLATE (FSI)
• U+2069 POP DIRECTIONAL ISOLATE (PDI)
FSI!‫פיצה‬PDI
- 3 reviews ==> !‫ פיצה‬3 - reviews
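In plain text, where markup is not available, the same isolation can be had by wrapping the inserted string in the control characters listed above. A minimal sketch; the `isolate` helper is hypothetical.

```javascript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Wrap text of unknown direction so it cannot reorder what surrounds it.
function isolate(text) {
  return FSI + text + PDI;
}

// Hebrew word plus "!" written with escapes to keep this source left-to-right.
const line = `${isolate("\u05E4\u05D9\u05E6\u05D4!")} - 3 reviews`;
console.log(line);
```

Whether the isolate is honoured depends on the renderer supporting these code points.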
Unicode Isolates -> HTML Markup
• http://www.w3.org/International/wiki/Html-bidi-isolation (needs comments!)
– @direction (isolating)
– Other options rejected:
• Change dir to be isolating
• Use <bdi> for isolation
• Add ‘rli’ ‘lri’ to @dir (<span dir="rli">)
• Add @isolate (<span dir="rtl" isolate>)
Other bidi changes
• Reporting the chosen direction of
<input> and <textarea> in form
submissions (@dirname)
• <br> should serve as a bidi
separator
• Block elements as bidi separators
(isolating)
• <title> supports the dir attribute
• <option> supports the dir attribute
and is displayed accordingly both in
the dropdown and after being
chosen
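With dirname, the direction travels as an extra field in the submission, so a server can store it next to the text. The sketch below assumes a form field named `comment` with `dirname="comment.dir"`; both names and the `parseComment` helper are made up for illustration.

```javascript
// Hypothetical handler for a urlencoded body produced by
// <input name="comment" dirname="comment.dir">.
function parseComment(body) {
  const params = new URLSearchParams(body);
  return {
    text: params.get("comment"),
    // Anything other than "rtl" is treated as "ltr" in this sketch.
    dir: params.get("comment.dir") === "rtl" ? "rtl" : "ltr",
  };
}

const comment = parseComment("comment=%D7%A9%D7%9C%D7%95%D7%9D&comment.dir=rtl");
console.log(comment.dir); // rtl
```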
CSS3
[Figure: hanging (ক), alphabetic (A), and ideographic (国) baselines]
Implementers of user agents need to be
prodded by the public to support the
developing marketplace!
Requirements for Japanese Layout
What about my language?
• Other language groups interested in building
similar requirements documents can do so
– Korean nearing FPWD
– Indic languages
– ???
Vertical text
Writing Mode
CSS3 has a new module for “writing mode” that supports
vertical text.
http://www.w3.org/TR/css3-writing-modes/
Ruby annotation
<ruby><rb>凝</rb><rt>ぎょう</rt></ruby>
<ruby><rb>視</rb><rt>し</rt></ruby>
<ruby>凝<rt>ぎょう</rt>視<rt>し</rt></ruby>
<ruby>
<rbc><rb>凝</rb><rb>視</rb></rbc>
<rtc><rt>ぎょう</rt><rt>し</rt></rtc>
</ruby>
Ruby Annotation
• http://rishida.net/misc/ruby/ruby-authoring.html
Hyphenation
Zusätzlich
erleichtert PLS die
Eingrenzung von
Anwendungen,
indem es
Aussprachebelang
e von anderen
Teilen der
Anwendung
abtrennt.
Zusätzlich erleichtert PLS die
Eingrenzung von
Anwendungen, indem es Aussprachebelange von
an-deren Teilen
der Anwendung
ab-trennt.
* { hyphens: auto; }
Hyphenation Support
• Hyphenation support is starting to become available.
– Still works best with embedded (server-side) hinting
– Language support??
Still in flux… development needed
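Server-side hinting usually means inserting soft hyphens (U+00AD) at permitted break points before the page is sent. The following is a minimal sketch under the assumption of a tiny lookup table; the two entries mirror the breaks shown on the slide, and a real system would use a full hyphenation dictionary. `BREAKS` and `addHyphenationHints` are hypothetical names.

```javascript
const SHY = "\u00AD"; // SOFT HYPHEN: invisible unless a line breaks there

// Illustrative break points only.
const BREAKS = new Map([
  ["anderen", "an-deren"],
  ["abtrennt", "ab-trennt"],
]);

function addHyphenationHints(text) {
  // Replace each known word with its hinted form; leave other words alone.
  return text.replace(/\p{L}+/gu, (word) => {
    const hinted = BREAKS.get(word);
    return hinted ? hinted.replace(/-/g, SHY) : word;
  });
}

const hinted = addHyphenationHints("von anderen Teilen der Anwendung abtrennt.");
console.log(hinted.includes(SHY)); // true
```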
Language declarations
<!DOCTYPE html>
<html lang=it>
<head>
<meta http-equiv=Content-Language content="en, it">
</head>
...
• Attributes indicate the language of text
inside that element for text processors.
Only one language value allowed.
• Meta elements indicate the language of
the expected readership. Multiple
languages are ok.
• Attributes override other declarations.
• The meta element with Content-Language is now non-conforming.
BCP 47 improvements
• Basis for Java 7, JavaScript, PHP, .NET and other
locale systems
• -u- extension
– Unicode Locales (RFC 6067)
• :lang pseudo-class
– CSS selection
• -t- extension
– Transliterations and transformations
(RFC 6497)
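A language tag with a -u- extension can be taken apart in script. This sketch assumes an engine that implements `Intl.Locale` and `Intl.getCanonicalLocales` from the current ECMAScript Internationalization API, which postdates this talk; the tags themselves are only examples.

```javascript
// German as used in Germany, with phonebook collation requested via -u-co-.
const locale = new Intl.Locale("de-DE-u-co-phonebk");

console.log(locale.language);  // de
console.log(locale.region);    // DE
console.log(locale.collation); // phonebk

// Canonicalization fixes the casing of each subtag.
const [canonical] = Intl.getCanonicalLocales("ZH-hant-tw");
console.log(canonical); // zh-Hant-TW
```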
Improved Date/Time Support
<time datetime="2004-08-08">8
สิงหาคม ๒๕๔๗</time>
<form>
<input type="date">
</form>
Locale Sensitivity
• Still an issue for the Web
– Date pickers not locale or language sensitive
– No markup-based control over format
– Time zone support is spotty
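Since the markup gives no control over display format, one workaround is to generate the <time> element in script: the datetime attribute carries the machine-readable value and the content carries a locale-formatted label. A sketch, assuming an engine with `Intl.DateTimeFormat` and locale data for the requested tag; `timeElement` is a hypothetical helper and the exact label depends on the engine's data.

```javascript
function timeElement(date, locale) {
  // Machine-readable part: YYYY-MM-DD taken from the UTC ISO string.
  const machine = date.toISOString().slice(0, 10);
  // Human-readable part, formatted for the given locale in UTC.
  const human = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  }).format(date);
  return `<time datetime="${machine}">${human}</time>`;
}

// Thai locale with Thai digits requested through the -u-nu- extension.
const thai = timeElement(new Date(Date.UTC(2004, 7, 8)), "th-TH-u-nu-thai");
console.log(thai);
```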
JavaScript gets locales at last!
• ECMAScript ‘intl’ extension work
– Locales based on BCP 47 language tags
– Date, number formatting
– Collation
– and more…
• Core spec addressing Unicode needs,
particularly supplementary character support
http://wiki.ecmascript.org/doku.php?id=strawman:i18n_api
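Supplementary characters matter because JavaScript strings are sequences of UTF-16 code units, so one character outside the Basic Multilingual Plane occupies two units. A short sketch using standard string methods; the sample string and the `codePoints` helper are illustrative.

```javascript
// U+20BB7 lies outside the BMP; the next two characters are inside it.
const text = "\u{20BB7}\u91CE\u5BB6";

console.log(text.length);                      // 4 (UTF-16 code units)
console.log([...text].length);                 // 3 (code points)
console.log(text.charCodeAt(0).toString(16));  // d842 (high surrogate only)
console.log(text.codePointAt(0).toString(16)); // 20bb7

// Hypothetical helper: list the code points of a string.
function codePoints(s) {
  return Array.from(s, (ch) => ch.codePointAt(0));
}
```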
ES I18N Spec
• Internationalization API Specification
• Developed by ECMA TC 39 + experts
• Collation, number, date & time formatting
• Started fall 2010
• Implementations and test suite in progress
• Approved in December 2012
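Collation and number formatting from this API look roughly like the sketch below. It assumes an engine that ships full locale data for German and Swedish; results can differ where that data is missing.

```javascript
const words = ["z", "\u00E4", "a"]; // "ä" written as an escape

// German sorts ä next to a; Swedish sorts it after z.
const german = [...words].sort(new Intl.Collator("de").compare);
const swedish = [...words].sort(new Intl.Collator("sv").compare);

console.log(german.join(" "));  // a ä z
console.log(swedish.join(" ")); // a z ä

// German number formatting: period for grouping, comma for decimals.
console.log(new Intl.NumberFormat("de-DE").format(123456.789)); // 123.456,789
```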
Demos
• Locale Extension
– http://norbertlindenberg.com/javascript/demos/Collation.html
– http://norbertlindenberg.com/javascript/demos/DateTimeFormat.html
• Core Extension
– http://norbertlindenberg.com/javascript/demos/RegExp.html
– http://norbertlindenberg.com/javascript/demos/Supplementary.html
Webapps at W3C
• Various technologies that make Web-based applications
possible are under development. Some samples:
– IDL
– Web sockets, Web storage, Web workers
– XHR
– Widgets
– Selectors
– File APIs
– DOM
The Widget Spec
• Widget containers deliver “apps” cross-platform based on HTML5
– Extensive localization model
– Ability to set base locale
<widget xmlns="http://www.w3.org/ns/widgets" defaultlocale="en">
<name short="Weather">Weather! a totally awesome application!</name>
<name short="آب و هوا" xml:lang="fa" dir="rtl">
<span dir="ltr" xml:lang="en">Weather!</span> برنامه واقعا بزرگ
</name>
</widget>
ITS 2.0
• Internationalization Tag Set (ITS) 2.0
• Currently being defined in W3C
MultilingualWeb-LT Working Group
• Latest Draft 6 December 2012 (“Last Call”)
http://www.w3.org/TR/its20/
• WG Homepage
http://www.w3.org/International/multilingualweb/lt/
• ITS 2.0 test suite https://github.com/finnle/ITS2.0-Testsuite/
“Translate” locally in HTML5 or XML
(example: DocBook)
<db:article ...>
<db:para>The <db:emphasis its:translate="no">World Wide Web
Consortium</db:emphasis> is making the World Wide Web
worldwide!</db:para> ...</db:article>
<!DOCTYPE html>
<html> ...
<p>The <span translate=no>World Wide Web Consortium</span>
is making the World Wide Web worldwide!</p>...</html>
Capturing guidance for spec developers
and implementers (and you)
• markup for bidirectional text
• normalization
• working with case sensitivity
• more information about date & time
Tests
Internationalization Activity
http://www.w3.org/International/
Articles
Tutorials
Technical notes
Tests
Talks
Tools
Reviews
Checker tool
http://validator.w3.org/i18n-checker/
1. Discover
2. Check
Get involved!
• Follow the discussions on the internationalization mailing
lists (e.g. www-international), and track other
technologies for internationally relevant topics. Follow our
RSS feeds and Twitter channels (@webi18n and
@multilingweb).
• Read and review specifications (http://www.w3.org/TR/tr-technology-drafts)
and send comments to the www-international list or directly
to the Working Group.
• Discuss local requirements for the Multilingual Web, and if
you identify missing features, find ways to coordinate
proposals.
• Use features needed for non-Latin script support and push
implementers to include more in browsers and authoring
tools.
• Join the Working Group
The Web needs your help
this is your Web –
not the W3C's
we need you to
make the Web
worldwide
get involved
Thank you
http://www.inter-locale.com/whitepaper/imug2013