Localization and HTML5: Technical Aspects


Localization and HTML5: Technical Aspects Felix Sasaki DFKI / W3C Fellow

Sasaki – Feisgiltt 2012 1

Pitch: Why this presentation?

• HTML5 is the upcoming (or existing) format for content on the Web
• The Web is becoming multilingual
• HTML5 localization is essential to make this happen
• Localization workflows with HTML5 input / output need to take various aspects of HTML5 into account
  – Learn more here

Acknowledgement

• Thanks to Jirka Kosek for introducing the participants of the W3C MultilingualWeb-LT working group to the dos and don'ts of HTML5 content creation and processing

Overview

• HTML5 Serializations + Model
• Localization Workflow with HTML5
• Metadata for (HTML5) Localization
• What Else?

HTML5 – Serializations + Model

• Two serializations: HTML5 vs. XHTML5
My example ...
My example ...
• One Document Object Model (DOM)
document.getElementsByTagName("meta")
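As a rough illustration of "two serializations, one model", the sketch below reads the same page once in HTML syntax and once in XHTML syntax and collects the attributes of every meta element, roughly what document.getElementsByTagName("meta") would expose in a browser. The two pages are invented for illustration (only the title "My example" comes from the slide), and Python's html.parser is a tolerant tokenizer, not an implementation of the HTML5 parsing algorithm.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical example pages; only the title "My example" comes from the slide.
HTML_PAGE = """<!DOCTYPE html>
<html lang=en>
<head><meta charset=utf-8><title>My example</title></head>
<body><p>...</p></body>
</html>"""

XHTML_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head><meta charset="utf-8"/><title>My example</title></head>
<body><p>...</p></body>
</html>"""


class MetaCollector(HTMLParser):
    """Records the attributes of each <meta> start tag in HTML syntax."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def metas_from_html(text):
    """Collect meta attributes from the HTML serialization."""
    collector = MetaCollector()
    collector.feed(text)
    collector.close()
    return collector.metas


def metas_from_xhtml(text):
    """Collect meta attributes from the XML (XHTML) serialization."""
    root = ET.fromstring(text)
    return [dict(el.attrib) for el in root.iter(f"{{{XHTML_NS}}}meta")]
```

Both functions return the same list of attribute dictionaries for the two pages, which is the point of having one model behind two syntaxes.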

Rationale

• More than 90% of the Web is invalid
  – See the MAMA report by browser vendor Opera
• XHTML was revolution
• HTML5 is evolution
  – Parsing algorithm for existing Web content
  – Two serializations as input
  – Detailed error handling
  – Output: one DOM

Localization Workflow with HTML5

Central: HTML5 parsing library, e.g. validator.nu

Input (HTML5 / XHTML5):
• HTML5 as XML
• HTML5 as HTML
• HTML5 as HTML with errors

Into localization: HTML5 parsing > DOM creation > (XML serialization) > XLIFF generation > XLIFF-based Localization

Back from localization: Transformation > XHTML5 > HTML5 parsing > HTML5 or XHTML5
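The first half of this workflow (parsing > text extraction > XLIFF generation) can be sketched as follows. This is a minimal sketch under stated assumptions: html.parser stands in for a real HTML5 parsing library such as validator.nu, the set of block elements and the names SegmentCollector, extract_segments and to_xliff are hypothetical, and the output is a bare XLIFF 1.2 skeleton without inline markup handling.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
BLOCKS = {"title", "h1", "p", "li"}  # assumption: elements treated as segments


class SegmentCollector(HTMLParser):
    """Collects the text of block-level elements as translation segments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._buffer = None  # None while outside a block element

    def _flush(self):
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.segments.append(text)
        self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKS:
            self._flush()  # tolerate a missing end tag, e.g. <p> without </p>
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in BLOCKS:
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def extract_segments(html):
    collector = SegmentCollector()
    collector.feed(html)
    collector.close()
    return collector.segments


def to_xliff(segments, original="example.html", source_lang="en"):
    """Wrap each segment in an XLIFF 1.2 trans-unit with a source element."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(
        root,
        f"{{{XLIFF_NS}}}file",
        {"original": original, "source-language": source_lang, "datatype": "html"},
    )
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for number, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": str(number)})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(root, encoding="unicode")
```

Calling to_xliff(extract_segments(page)) on a page with an unclosed p element still yields one trans-unit per paragraph, which hints at why error-tolerant parsing sits at the center of the workflow.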

Metadata for (HTML5) Localization: ITS 2.0

• “Internationalization Tag Set” 2.0
• Set of disjoint metadata items (“data categories”) for XML and HTML5
  – Translate, Localization Note, Terminology, Directionality, Ruby, Language Information, Elements Within Text, Domain, Locale Filter, Provenance, Text Analysis Annotation, External Resource, Target Pointer, Id Value, Preserve Space, Localization Quality Issue, Localization Quality Précis, MT Confidence, Allowed Characters, Storage Size

Metadata for (HTML5) Localization: ITS 2.0

• “Internationalization Tag Set” 2.0
• Some items are part of HTML5 spec
  – Translate, Localization Note, Terminology, Directionality, Ruby, Language Information, Elements Within Text, Domain, Locale Filter, Provenance, Text Analysis Annotation, External Resource, Target Pointer, Id Value, Preserve Space, Localization Quality Issue, Localization Quality Précis, MT Confidence, Allowed Characters, Storage Size

“Translate”

<title>Translate flag test: Default</title>
<p>The <span translate=no>World Wide Web Consortium</span> is making the World Wide Web worldwide!</p>
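How a localization tool might honor the translate attribute can be sketched as below. This is only a sketch under assumptions: it relies on well-nested markup (a real tool would sit on top of an HTML5 parser), it treats only the values "yes" and "no" and lets everything else inherit, and the names TranslateFilter and text_runs are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TranslateFilter(HTMLParser):
    """Splits text into runs flagged as translatable or not."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = [True]  # document default: content is translatable
        self.runs = []       # (text, translatable) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        value = (dict(attrs).get("translate") or "").lower()
        if value == "yes":
            flag = True
        elif value == "no":
            flag = False
        else:
            flag = self.stack[-1]  # inherit from the parent element
        self.stack.append(flag)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((data, self.stack[-1]))


def text_runs(html):
    parser = TranslateFilter()
    parser.feed(html)
    parser.close()
    return parser.runs
```

For the sentence on this slide, the run "World Wide Web Consortium" comes back flagged as not translatable, while the text before and after it stays translatable.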

ITS “global rules”

• XPath based metadata approach
  – Attach metadata to several nodes
  – Specify metadata for a document format or (HTML) template
  – Example: map proprietary HTML to ITS “translate”
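The idea of a global rule can be sketched with a small rules document and a selector lookup. Assumptions are stated loudly: the class="notranslate" convention stands for some proprietary markup and is invented here, the document is kept namespace-free for brevity, and ElementTree supports only a subset of XPath, so the sketch handles selectors of the form //name[...] and nothing more.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical rule: map a proprietary class value to ITS "translate".
RULES = """<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule selector="//span[@class='notranslate']" translate="no"/>
</its:rules>"""

# Hypothetical document using that proprietary convention.
DOC = """<html><body>
  <p>Press <span class="notranslate">Ctrl+S</span> to save.</p>
</body></html>"""


def untranslatable_nodes(rules_xml, doc_xml):
    """Return the elements selected by translate="no" rules."""
    rules = ET.fromstring(rules_xml)
    doc = ET.fromstring(doc_xml)
    selected = []
    for rule in rules.iter(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") != "no":
            continue
        # ElementTree rejects absolute paths on an element,
        # so a selector "//x" is evaluated as ".//x".
        selector = "." + rule.get("selector")
        selected.extend(doc.findall(selector))
    return selected
```

One rule attaches the metadata to every matching node, so the content itself needs no ITS markup at all.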

ITS “inline”, e.g. global rules in HTML5

• “Work” inside HTML “script” element with proper MIME type
• Upcoming: application/its+xml
• If possible: avoid; use linked rules
...
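Picking up inline rules from a page could look roughly like this. The page and its selector are invented for illustration, the name inline_rules is hypothetical, and html.parser again stands in for a proper HTML5 parser; the only detail taken from the slide is the application/its+xml type on the script element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITS_MIME = "application/its+xml"

# Hypothetical page with inline global rules.
PAGE = """<!DOCTYPE html>
<html><head><title>Inline rules</title>
<script type="application/its+xml">
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:translateRule selector="//code" translate="no"/>
</its:rules>
</script>
</head><body><p>Use <code>git status</code>.</p></body></html>"""


class InlineRulesCollector(HTMLParser):
    """Gathers the raw content of script elements typed as ITS rules."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == ITS_MIME:
            self._inside = True
            self.blocks.append([])

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1].append(data)


def inline_rules(page):
    """Parse each collected script body as an XML rules document."""
    collector = InlineRulesCollector()
    collector.feed(page)
    collector.close()
    return [ET.fromstring("".join(block).strip()) for block in collector.blocks]
```

Because the script content is opaque text to an HTML parser, it has to be parsed a second time as XML, which is one practical reason to prefer linked rules.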

“Terminology”

<title>Terminology test: default</title>
<p>We need a new <span its-term=yes>motherboard</span></p>

“Directionality”

<title>Dir test: Default</title>
<p>In Arabic, the title <q dir=rtl lang=ar>نشاط التدويل، W3C</q> means Internationalization Activity, W3C.</p>

“Ruby” – XHTML vs. HTML5

日本 にっぽん でん 日本 にっぽん でん

“Domain”

Means:
• Express domain information about content of “body” element
• Domain information is in the “meta” element
• Optional mapping of source content domains, e.g. automotive > auto

Purpose: not to define a domain vocabulary, but to pass domain information to an application (MT system, MT training tool)
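A consumer of domain information, for instance a script in front of an MT system, might resolve it roughly as follows. This is a sketch under assumptions: the meta name "keywords", the mapping string "automotive auto" and all function names are illustrative, and quoted mapping values containing spaces are not handled.

```python
from html.parser import HTMLParser


class MetaContent(HTMLParser):
    """Collects the content values of meta elements with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "meta" and attributes.get("name") == self.name
                and attributes.get("content")):
            self.values.append(attributes["content"])


def parse_mapping(mapping):
    """Turn "automotive auto, medical medicine" into a lookup table."""
    table = {}
    for item in mapping.split(","):
        parts = item.split()
        if len(parts) == 2:
            table[parts[0].lower()] = parts[1]
    return table


def domains(html, meta_name="keywords", mapping="automotive auto"):
    """Read comma-separated domains from the meta element and map them."""
    parser = MetaContent(meta_name)
    parser.feed(html)
    parser.close()
    table = parse_mapping(mapping)
    result = []
    for content in parser.values:
        for raw in content.split(","):
            value = raw.strip()
            if value:
                # Unmapped values are passed through unchanged.
                result.append(table.get(value.lower(), value))
    return result
```

The mapping is optional by design: values without an entry reach the application as they appear in the source content.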

“Storage Size”

Example

String to translate:
<span its-storage-size=25>Papua New-Guinea</span>
<span its-storage-size=25>Dominican Republic</span>
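A storage size check counts bytes in the storage encoding rather than characters. The sketch below assumes UTF-8 as the encoding; the function name fits_storage and the Russian rendering used in the usage note are illustrative only.

```python
def fits_storage(text, storage_size, encoding="utf-8"):
    """Return True if the encoded text fits into storage_size bytes."""
    return len(text.encode(encoding)) <= storage_size


def encoded_length(text, encoding="utf-8"):
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))
```

"Papua New-Guinea" occupies 16 bytes and fits into a limit of 25, while an 18-character rendering such as "Папуа-Новая Гвинея" needs 34 bytes in UTF-8 and does not, even though its character count is below the limit.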

“Translate” in XML and HTML5

• ITS namespace vs. HTML5 native “translate” attribute

XML:
...
You need a new motherboard ...
...

HTML5:
...
You need a new motherboard...
...

“Terminology” in XML and HTML5

• ITS namespace vs. HTML5 its-* “term” attribute

XML:
...
You need a new motherboard ...
...

HTML5:
...
You need a new motherboard...
...

“Quality” metadata in the browser

……

its-loc-quality-issues-ref=#lq1

>c'es le contenu

… See live demo at http://tinyurl.com/its2-lq-html5

Rationale for its-*

• HTML attributes are case insensitive; no namespace-qualified names
• ITS 1.0/2.0 attributes use
  – camel case: its:locNote, its:termInfo, its:withinText, …
  – ITS namespace
• Good news: conversion to HTML5 is straightforward
  – its-loc-note, its-term-info, its-within-text, …
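The name conversion can be sketched in a few lines: drop the namespace prefix, put a hyphen before each uppercase letter, lowercase the result and prepend its-. The function name is hypothetical, and attributes that HTML5 provides natively (such as translate and dir) are deliberately left out of this sketch.

```python
import re


def its_attr_to_html5(qualified_name):
    """Convert an ITS XML attribute name to its HTML5 its-* counterpart."""
    prefix, _, local = qualified_name.partition(":")
    if prefix != "its" or not local:
        raise ValueError(f"not an ITS attribute name: {qualified_name!r}")
    # Each uppercase letter becomes a hyphen followed by its lowercase form.
    hyphenated = re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), local)
    return "its-" + hyphenated
```

The conversion loses no information, since the hyphens mark exactly where the uppercase letters stood, so a tool can map the HTML5 names back to the camel case names when it produces XML.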

Effect on Localization Workflow

translate, dir, its-loc-note, its-term-info, …: “interpretation” like its:translate, its:termInfo, ...

Input (HTML5 / XHTML5):
• HTML5 as XML
• HTML5 as HTML
• HTML5 as HTML with errors

Into localization: HTML5 parsing > DOM creation > (XML serialization) > XLIFF generation > XLIFF-based Localization

Back from localization: Transformation > XHTML5 > HTML5 parsing > HTML5 or XHTML5

Other HTML versions

• “HTML legacy content”: no native support for its-*
• HTML validation tools will complain
• Good news: its-* attributes “work” in older versions of HTML (e.g. 3.2 or 4.01), e.g. they are recognized by an HTML DOM parser

Tool support

• its-* attributes in the pipeline for W3C HTML validator
• Lots of XML+ITS / HTML5+ITS (partially) sensitive tools being developed in W3C MultilingualWeb-LT working group
  – HTML5 validation with ITS 2.0 metadata, XML tool chain, online MT system, translation package creation, simple MT, HTML-to-TMS roundtrip, CMS support (Drupal), quality check, browser based review, named entity annotation, …
  – *Very raw* details (but further links!) at http://tinyurl.com/its2-use-cases

What’s missing?

• ITS 2.0 localization focuses on HTML markup
  – Elements, attributes
• Server side / client side scripting content not taken into account
  – JavaScript, PHP, …
• Using ITS 2.0 in HTML5 with XLIFF: still many bits missing
  – But: moving forward this week

Overview again …

• HTML5 Serializations + Model
• Localization Workflow with HTML5
• Metadata for (HTML5) Localization
• What Else?

Thank you very much. (ありがとうございました。)

Localization and HTML5: Technical Aspects
Felix Sasaki
DFKI / W3C Fellow

LOCALIZATION AND HTML5: POTENTIAL SLIDES FOR “CHALLENGES AND PROMISES”

What is HTML5?

• DOM specification
• Parsing algorithm to cover most of current (and future) Web content
• A set of APIs
  – Part of HTML5 specification
  – Defined in separate documents
• Explanatory and other documents
  – For markup authors, XML tool chains etc.

HTML5: current state

• Developed within
  – W3C: HTML5 to become a standard
  – WHATWG http://www.whatwg.org/: HTML as a “living standard”
• High pressure in W3C to wrap up
  – Rationale: “We need one stable version”
• At the same time: “We need more features!”
  – e.g. ITS 2.0
  – HTML accessibility http://www.w3.org/WAI/PF/html-task-force

Plan: HTML5 finalized by 2014

• Finish HTML5 specification in W3C by 2014
• Work closely with WHATWG and others on new features, for next version
• Don’t try to get everything into HTML5!
  – Allow for extension specifications, e.g. ITS 2.0
  – Moving forward at their own pace

HTML5 time line

          2012        2013        2014        2015        2016
          ----------  ----------  ----------  ----------  ---------
HTML5.0   CR start    ...CR, LC   Rec         ...         ...
HTML5.1   FPWD        --          LC + CR     ...CR       Rec

From http://dev.w3.org/html5/decision-policy/html5-2014-plan.html

Challenge: many extensions

• HTML+RDFa - RDFa WG
• Web Intents - Web Apps WG / Device APIs WG
• HTML Editing APIs - HTML Editing APIs CG
• HTML Media Capture - Device APIs WG
• Media Capture and Streams - Device APIs WG / WebRTC WG
• Media Fragments URI - Media Fragments WG
• Encrypted Media Extensions - HTML WG
• Media Source Extensions - HTML WG
• ... many rel value specifications registered at the link type registry
  – Microformats

Promises: many extensions

• See last slide
• That also means: ease of adding localization features to HTML5

HTML5 and Localization Issues

• Localization: mostly covered by ITS 2.0
• Technical aspects: see presentation from Felix Sasaki on Tuesday
• Important: get buy-in from browser vendors
  – Awareness of ITS 2.0
  – Fostering browser based implementations
  – Ease of adoption for web developers

HTML5 and Internationalization Issues

• Many things to do
  – Ruby
  – International layout (work done mostly via CSS3 modules)
  – Here: our most favorite i18n core issues