What's New in XLIFF 1.2 - Localisation Research Centre


What’s New in XLIFF 1.2?
Tony Jewtushenko
Director Research & Development
Product Innovator Ltd.
Co-Chair – OASIS XLIFF TC
The XML Localisation Interchange File Format
Agenda

Overview of XLIFF
Definition, goals, benefits, architecture and basic XLIFF
concepts

What’s new in XLIFF 1.2
New and changed features of XLIFF 1.2 normative
specification

Non-Normative Representation Guides
A brief introduction of the representation guides provided with
XLIFF 1.2
XLIFF Overview
A glance at the definitions, goals and benefits
of the XML Localisation Interchange File
Format.
What is XLIFF?
A specification
for the lossless interchange of localizable
data and its related information,
which is tool-neutral,
has been formalized as an XML
vocabulary,
and features an extensibility mechanism.
Why XLIFF was created…
Localisation Is Difficult
• Insufficient interoperability between tools
• Lack of support for the overall localisation workflow
• Localisation tool developers must deal with many formats
• Large number of proprietary intermediate formats
XLIFF Timeline
Sep 2000: DataDefinition kickoff
Mar 2001: Draft 1.0 Spec and DTD published
Jun 2001: Whitepaper published
Dec 2001: OASIS XLIFF TC proposal submitted
Apr 2002: XLIFF 1.0 Committee Spec approved
May 2003: XLIFF 1.1 Committee Spec approved
Aug 2003 - Sep 2003: XLIFF 1.1 public peer review
Nov 2003: Revised XLIFF 1.1 Committee Spec approved
Dec 2003 - May 2006: XLIFF 1.2 segmentation; Representation Guides for (X)HTML, Java, PO/POT
May 2006: XLIFF 1.2 Committee Spec and Representation Guides approved
14 Jul 2006 - 12 Sep 2006: XLIFF 1.2 public peer review
Contributors to XLIFF Past and Present
• Alchemy Software
• Bowne Global Solutions
• Convey Software
• Ektron, Inc
• ENLASO Corp (RWS)
• Globalsight
• Heartsome
• HP
• Idiom Technologies, Inc
• Lionbridge
• LRC
• Lotus/IBM
• Microsoft
• Moravia IT
• Novell
• Oracle
• Red Hat
• PASS Engineering
• SAP
• SDL International
• Sun Microsystems
• Tektronix
• TRADOS
• XML Intl
OASIS XLIFF TC Members as of 1 Sept 06
TC Officers:
• Chairs: Tony Jewtushenko, Product Innovator Ltd; Bryan Schnabel, Tektronix
• Secretary: Peter Reynolds, Idiom Technologies, Inc.
Current Members of TC:
• Mat Lovatt, Oracle
• Doug Domeny, Ektron
• Rodolfo Raya, Heartsome
• Eiju Akahane, IBM
• Steven Harris, Idiom Technologies, Inc.
• Fredrik Corneliusson, Lionbridge
• Joachim Schurig, Lionbridge
• Milan Karasek, Moravia IT
• Florian Sachse, Pass Engineering
• Christian Lieske, SAP
• Magnus Martikainen, SDL International
• David Pooley, SDL International
• Kevin Bargary, University of Limerick Localisation Research Centre
• Reinhard Schaler, University of Limerick Localisation Research Centre
• Andrzej Zydron, XML-Intl
OASIS: Standards Body, Home of XLIFF
• OASIS: Organization for the Advancement of Structured Information Standards
• World’s largest independent, non-profit organization dedicated to the standardisation of XML applications and Web Services
• More than 150 member companies plus individuals
• Operates XML.ORG Registry, the open community clearinghouse of XML application schemas
• Technical work on XML interoperability includes XML conformance and XML Registries/Repositories
• General XML technical resource
XLIFF Benefits:
Diagram labels: Interoperability, Open Standards, Automation, Flexibility, Cost/Time, Scalability
• Reduces effort in deploying integrated best-of-breed solutions
• Reduces vendor lock-in; re-use
• Reduces cost and turnaround time
• Reduces defects introduced by manual processing and handling
• Easy to scale and future-proof
• Leverages services, technologies, vendors
High Level XLIFF Architecture
An XLIFF document is a container for all data needed for a localisation project:
1. Localizable objects (e.g. text strings, graphics) in source and target languages.
2. Supplementary information (e.g. glossaries, or material to recreate the original format).
3. Administrative information (e.g. workflow data).
4. Custom data (e.g. initialization information for tools).
The XLIFF Document
• An XLIFF document is designed to store the extracted data related to localisation.
• Each source container (e.g. a file, a database table, and so forth) corresponds to a <file> element in XLIFF.
• Each XLIFF document can include several <file> elements.
• An entire localisation project could be stored in a single XLIFF document.
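As a rough illustration of the "several <file> elements in one document" idea, the following Python sketch lists each <file> with its languages and number of trans-units. The sample project, its file names and its strings are invented for the example; only the element and attribute names come from XLIFF.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical two-file project; names and strings are made up for illustration.
SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'>
 <file original='menu.txt' source-language='en' target-language='fr' datatype='plaintext'>
  <body>
   <trans-unit id='1'><source>Open</source></trans-unit>
   <trans-unit id='2'><source>Close</source></trans-unit>
  </body>
 </file>
 <file original='help.htm' source-language='en' target-language='fr' datatype='html'>
  <body>
   <trans-unit id='1'><source>Getting started</source></trans-unit>
  </body>
 </file>
</xliff>"""


def summarise_files(xliff_text):
    """Return (original, source language, target language, unit count) per <file>."""
    ns = {"x": XLIFF_NS}
    root = ET.fromstring(xliff_text)
    summary = []
    for file_el in root.iterfind("x:file", ns):
        units = file_el.findall(".//x:trans-unit", ns)
        summary.append((
            file_el.get("original"),
            file_el.get("source-language"),
            file_el.get("target-language"),
            len(units),
        ))
    return summary


for row in summarise_files(SAMPLE):
    print(row)
```

Each row corresponds to one source container, which is the mapping the slide describes.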
Bilingual Model
• Each <file> element is designed to store one source language and one target language.
• The rationale is that translation into different target languages is usually done by different people.
• However, the languages in <alt-trans> elements can differ from the target language. For example, proposed matches in national Portuguese when translating into Brazilian Portuguese.
Localisable Objects
• Besides localisable text, XLIFF can also contain other localisable object types, such as binary graphics.
• Supplementary information can be represented in a generic way through inline codes (e.g. formatting of text).
• Relationships between objects can be captured (e.g. a hierarchical menu, or text related to a web graphic).
Supplementary Info
XLIFF provides “hooks” for storing supplementary information in reference elements:
• Glossaries
• Translation memories
• Segmentation rules (via SRX file)
The supplementary information can be referenced (i.e. reside outside of the document) or embedded within the document.
Administrative Info
XLIFF provides mechanisms for capturing administrative information:
• For relating source material to XLIFF documents.
• For storing workflow data.
• For providing pre-translation entries.
• For keeping track of changes.
Administrative Info – Pre-Translation
A set of proposed translations can be included for each <trans-unit> element, using the <alt-trans> element.
<trans-unit id='1'>
<source xml:lang='en'>The text</source>
<alt-trans match-quality='high' origin='MTsystem'>
<target xml:lang='fr'>Le texte</target>
</alt-trans>
</trans-unit>
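A tool consuming pre-translation entries might read them roughly as follows. This is a minimal Python sketch, not a reference implementation: it assumes the attribute is spelled match-quality (as in the segmented alt-trans example later in this deck), and the wrapping <xliff>/<file>/<body> structure and file name are added only to make the sample parseable.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The trans-unit is the one from the slide; the wrapper is an assumption.
SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'>
 <file original='demo.txt' source-language='en' datatype='plaintext'>
  <body>
   <trans-unit id='1'>
    <source xml:lang='en'>The text</source>
    <alt-trans match-quality='high' origin='MTsystem'>
     <target xml:lang='fr'>Le texte</target>
    </alt-trans>
   </trans-unit>
  </body>
 </file>
</xliff>"""


def proposals(xliff_text):
    """Return (unit id, origin, match-quality, language, text) per <alt-trans>."""
    ns = {"x": XLIFF_NS}
    root = ET.fromstring(xliff_text)
    rows = []
    for unit in root.iterfind(".//x:trans-unit", ns):
        for alt in unit.iterfind("x:alt-trans", ns):
            target = alt.find("x:target", ns)
            if target is None:
                continue  # an <alt-trans> without a proposed target is skipped
            rows.append((
                unit.get("id"),
                alt.get("origin"),
                alt.get("match-quality"),
                target.get(XML_LANG),
                target.text,
            ))
    return rows


print(proposals(SAMPLE))
```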
Customising XLIFF
Customise XLIFF by extending (adding) user-defined:
• Elements
• Attributes
• Attribute Values
Extending Elements
• Extension points in the following elements: <alt-trans>, <bin-unit>, <group>, <header>, <tool>, <trans-unit>, and new in 1.2: <xliff> and <seg-source>.
• Content of each custom element can be any valid XML content: empty content, PCDATA, mixed content, and so forth.
• Custom elements are defined in a private namespace schema.
Example of Extending Elements
<xliff version='1.2'
xmlns='urn:oasis:names:tc:xliff:document:1.2'
xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
<file original='passus-1.doc' source-language='enm' datatype='plaintext'>
<body>
<group>
<sup:SourceInfo>
<sup:Book>Piers Plowman, Passus 1</sup:Book>
<sup:Author>William Langland</sup:Author>
</sup:SourceInfo>
<sup:WorkInfo Task='transcription' Context='Middle-English:1360'/>
<trans-unit id='1'>
<source xml:lang='enm'>What this mountaigne bymeneth</source>
<target xml:lang='en'>What this mountain means</target>
<sup:Reference Type='strophe'>1-a</sup:Reference>
</trans-unit>
</group>
</body>
</file>
</xliff>
Non-XLIFF elements (shown in bold on the original slide) are those with the sup: prefix.
Non-XLIFF elements defined in XSD:
<xsd:schema targetNamespace="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sup="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xsd:element name="SourceInfo">
<xsd:complexType>
<xsd:sequence maxOccurs="unbounded">
<xsd:element name="Book" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="WorkInfo">
<xsd:complexType>
<xsd:attribute name="Task" type="xsd:string"/>
<xsd:attribute name="Context" type="xsd:string"/>
</xsd:complexType>
</xsd:element>
<xsd:element name="Reference">
<xsd:complexType>
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="Type" type="xsd:string"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:element>
</xsd:schema>
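One way a tool could find the extension elements in such a document is to compare each element's namespace with the XLIFF namespace. The Python sketch below does this for a shortened copy of the Piers Plowman example; the function names are illustrative, and the <body> wrapper is included on the assumption that a <file> holds its units inside <body>.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

SAMPLE = """<xliff version='1.2'
 xmlns='urn:oasis:names:tc:xliff:document:1.2'
 xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
 <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
  <body>
   <group>
    <sup:SourceInfo>
     <sup:Book>Piers Plowman, Passus 1</sup:Book>
     <sup:Author>William Langland</sup:Author>
    </sup:SourceInfo>
    <trans-unit id='1'>
     <source xml:lang='enm'>What this mountaigne bymeneth</source>
     <sup:Reference Type='strophe'>1-a</sup:Reference>
    </trans-unit>
   </group>
  </body>
 </file>
</xliff>"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def extension_elements(xliff_text):
    """Return (namespace, local name) for every element outside the XLIFF namespace."""
    root = ET.fromstring(xliff_text)
    found = []
    for element in root.iter():  # document order, root included
        namespace, local = split_tag(element.tag)
        if namespace != XLIFF_NS:
            found.append((namespace, local))
    return found


for namespace, local in extension_elements(SAMPLE):
    print(local, "from", namespace)
```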
Extending Attributes
• Attributes from a namespace other than XLIFF's can be included in these XLIFF elements: <alt-trans>, <bin-source>, <bin-target>, <bin-unit>, <bpt>, <bx/>, <ept>, <ex/>, <file>, <g>, <group>, <it>, <mrk>, <ph>, <source>, <target>, <tool>, <trans-unit>, <x/>, and new in 1.2: <xliff>, <seg-source>.
• There is no specific location where the non-XLIFF attributes must be inserted.
• There is no limit to the number of non-XLIFF attributes that can be used in an XLIFF document.
Extending Attributes
Attributes from HTML extend <group> and <trans-unit>
<xliff version='1.2'
xmlns='urn:oasis:names:tc:xliff:document:1.2'
xmlns:htm='http://www.w3.org/1999/xhtml'>
<file original='table.htm' source-language='en' datatype='html'>
<body>
<group restype='table' htm:border='1' htm:cellpadding='5' htm:cellspacing='0' htm:width='100%'>
<group restype='row'>
<trans-unit id='1' htm:valign='top' htm:width='30%'>
<source>Text of row 1 column 1</source>
</trans-unit>
<trans-unit id='2' htm:valign='top' htm:width='30%'>
<source>Text of row 1 column 2</source>
</trans-unit>
</group>
<group restype='row'>
<trans-unit id='3' htm:valign='top' htm:width='30%'>
<source>Text of row 2 column 1</source>
</trans-unit>
<trans-unit id='4' htm:valign='top' htm:width='30%'>
<source>Text of row 2 column 2</source>
</trans-unit>
</group>
</group>
</body>
</file>
</xliff>
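When merging back to HTML, a filter would need to recover these foreign attributes. A minimal Python sketch, assuming a cut-down version of the table example (the helper name foreign_attributes is made up for illustration):

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XML_NS = "http://www.w3.org/XML/1998/namespace"

SAMPLE = """<xliff version='1.2'
 xmlns='urn:oasis:names:tc:xliff:document:1.2'
 xmlns:htm='http://www.w3.org/1999/xhtml'>
 <file original='table.htm' source-language='en' datatype='html'>
  <body>
   <group restype='table' htm:border='1' htm:width='100%'>
    <trans-unit id='1' htm:valign='top'>
     <source xml:lang='en'>Text of row 1 column 1</source>
    </trans-unit>
   </group>
  </body>
 </file>
</xliff>"""


def foreign_attributes(xliff_text):
    """Return (element local name, {attribute local name: value}) for non-XLIFF attributes."""
    root = ET.fromstring(xliff_text)
    found = []
    for element in root.iter():
        extras = {}
        for name, value in element.attrib.items():
            if not name.startswith("{"):
                continue  # unqualified attribute: defined by XLIFF itself
            namespace, local = name[1:].split("}", 1)
            if namespace not in (XLIFF_NS, XML_NS):  # xml:lang is not an extension
                extras[local] = value
        if extras:
            found.append((element.tag.split("}", 1)[-1], extras))
    return found


print(foreign_attributes(SAMPLE))
```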
Extending Attribute Values
• Attributes whose list of values can be extended: context-type, count-type, ctype, datatype, mtype, priority, purpose, restype, size-unit, state, state-qualifier, unit; new in 1.2: alttranstype, reformat.
• User-defined values must start with an “x-” prefix.
• There is no specified mechanism to validate individual user-defined values, beyond checking that they start with “x-”.
Example of Extending Attribute Values
The following excerpt shows how the user-defined value “x-for-engineers” can be used in a document:
...
<group>
<context-group name='EngineersData'>
<context context-type='x-for-engineers'>Data...</context>
</context-group>
</group>
...
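Since the only rule for user-defined values is the “x-” prefix, a simple check is possible. The Python sketch below flags values that are neither in a supplied set of predefined values nor prefixed with “x-”. The set KNOWN_CONTEXT_TYPES is a deliberately small, illustrative subset, not the full list from the specification, and the sample fragment (including the faulty value) is invented.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; consult the specification for the full predefined list.
KNOWN_CONTEXT_TYPES = {"sourcefile", "linenumber", "record"}

# Invented fragment: the last value is missing its "x-" prefix on purpose.
SAMPLE = """<group xmlns='urn:oasis:names:tc:xliff:document:1.2'>
 <context-group name='EngineersData'>
  <context context-type='x-for-engineers'>Data...</context>
  <context context-type='sourcefile'>menu.c</context>
  <context context-type='forengineers'>Data...</context>
 </context-group>
</group>"""


def unrecognised_values(xml_text, attribute, known_values):
    """Return attribute values that are neither predefined nor 'x-' prefixed."""
    root = ET.fromstring(xml_text)
    problems = []
    for element in root.iter():
        value = element.get(attribute)
        if value is None:
            continue
        if value in known_values or value.startswith("x-"):
            continue
        problems.append(value)
    return problems


print(unrecognised_values(SAMPLE, "context-type", KNOWN_CONTEXT_TYPES))
```

This only checks the naming convention; it cannot tell whether a given “x-” value means anything to the receiving tool.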
Embedding XLIFF
• All or part of an XLIFF document can be embedded in another XML document.
• This is valid where the host XML vocabulary is defined by an XML Schema (XSD) that includes an <any> element in the definition of the element where the XLIFF data is inserted.
What’s new in XLIFF 1.2
New and changed features of XLIFF 1.2 normative
specification
New, Deprecated or Changed 1.1 to 1.2
• Validation via Transitional and Strict models
• Segmentation support added
• Added mid as an optional attribute for the <alt-trans> element
• Changed the name attribute for <context-group> from required to optional, and modified its description
• Added extension point at <xliff>
• Tracking/accepting suggested translations added:
  • Add an alttranstype attribute for the <alt-trans> element.
  • Deprecate the use of multiple <target> elements in a single <alt-trans>.
  • Deprecate the restype attribute for the <target> element.
  • Introduce the phase-name attribute for the <alt-trans> element.
  • Introduce a convention: more recent <alt-trans> elements should appear before older ones.
Validation in 1.2
Validation via two “flavours” of XSD (Schema):
• Transitional: Deprecated (obsolete) elements and attributes are permitted. Use to validate when reading older version documents (XLIFF 1.1).
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd'
• Strict: Deprecated items are not permitted. Use to validate when creating XLIFF 1.2 documents.
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd'
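A reader can tell which flavour a document declares by inspecting xsi:schemaLocation on the root element. The Python sketch below only classifies the declared schema file name by its ending; it does not perform validation, which would need a schema-aware validator and the XSD files themselves. The template document and function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"

# Hypothetical minimal document; {location} is filled in per test case.
TEMPLATE = """<xliff version='1.2'
 xmlns='urn:oasis:names:tc:xliff:document:1.2'
 xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
 xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 {location}'>
 <file original='demo.txt' source-language='en' datatype='plaintext'><body/></file>
</xliff>"""


def schema_flavour(xliff_text):
    """Return 'strict', 'transitional', 'unknown', or None if nothing is declared."""
    root = ET.fromstring(xliff_text)
    tokens = root.get(XSI_SCHEMA_LOCATION, "").split()
    # schemaLocation is a whitespace-separated list of namespace/location pairs.
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    location = pairs.get(XLIFF_NS)
    if location is None:
        return None
    if location.endswith("transitional.xsd"):
        return "transitional"
    if location.endswith("strict.xsd"):
        return "strict"
    return "unknown"


print(schema_flavour(TEMPLATE.format(location="xliff-core-1.2-strict.xsd")))
```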
XLIFF 1.2 Segmentation: seg-source
How corresponding segments are referenced between <seg-source> and <target>:
<trans-unit id="1">
<source>First sentence. Second sentence.</source>
<seg-source>
<mrk mtype="seg" mid="1">First sentence.</mrk>
<mrk mtype="seg" mid="2">Second sentence.</mrk>
</seg-source>
<target>
<mrk mtype="seg" mid="1">Translated first sentence.</mrk>
<mrk mtype="seg" mid="2">Translated second sentence.</mrk>
</target>
</trans-unit>
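The mid attribute is what ties a source segment to its translation. A short Python sketch of that lookup, using the trans-unit above as a standalone fragment (no namespace, as on the slide); the function name is illustrative:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<trans-unit id="1">
 <source>First sentence. Second sentence.</source>
 <seg-source>
  <mrk mtype="seg" mid="1">First sentence.</mrk>
  <mrk mtype="seg" mid="2">Second sentence.</mrk>
 </seg-source>
 <target>
  <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
  <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
 </target>
</trans-unit>"""


def segment_pairs(trans_unit_xml):
    """Pair <seg-source> and <target> segments that share the same mid."""
    unit = ET.fromstring(trans_unit_xml)

    def segments(parent_name):
        parent = unit.find(parent_name)
        if parent is None:
            return {}
        return {
            mrk.get("mid"): "".join(mrk.itertext())
            for mrk in parent.iterfind("mrk")
            if mrk.get("mtype") == "seg"
        }

    source = segments("seg-source")
    target = segments("target")
    # A source segment without a matching target mid is paired with None.
    return [(mid, text, target.get(mid)) for mid, text in source.items()]


for mid, src, tgt in segment_pairs(SAMPLE):
    print(mid, "|", src, "|", tgt)
```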
XLIFF 1.2 Segmentation: seg-source
Alt-trans may also be segmented:
<trans-unit id="3">
<source>First sentence. Second sentence.</source>
<alt-trans match-quality="100%">
<source>First sentence. Second sentence.</source>
<seg-source>
<mrk mtype="seg" mid="1">First sentence.</mrk>
<mrk mtype="seg" mid="2">Second sentence.</mrk>
</seg-source>
<target>
<mrk mtype="seg" mid="1">Translated first sentence.</mrk>
<mrk mtype="seg" mid="2">Translated second sentence.</mrk>
</target>
</alt-trans>
</trans-unit>
XLIFF 1.2 Segmentation: merged-trans
Aggregating translations across multiple trans-units:
<group merged-trans="yes">
<trans-unit id="t1">
<source>The German acronym v.</source>
<target equiv-trans="no">Niemiecki skrót v. OT
oznacza górną pozycję silnika.</target>
</trans-unit>
<trans-unit id="t2">
<source>OT signifies the top dead center position for
an engine.</source>
<target equiv-trans="no"/>
</trans-unit>
</group>
XLIFF 1.2 Segmentation: equiv-trans
To denote when a translation is not a direct equivalent of the source:
<trans-unit id="t1">
<source>Constrained text for limited</source>
<target equiv-trans="no">Tekst angielski dla</target>
</trans-unit>
<trans-unit id="t2">
<source>display for English</source>
<target equiv-trans="no">ograniczonego pola</target>
</trans-unit>
XLIFF 1.2 Add a type attribute (alttranstype) for the <alt-trans> element
The attribute is optional and has the following values and meanings:
Value: Meaning
proposal (default): The <alt-trans> represents a translation proposal from a translation memory or other resource.
previous-version: The <alt-trans> represents a previous version of the <target> element.
rejected: The <alt-trans> represents a rejected version of the <target> element.
reference: The <alt-trans> represents a translation to be used for reference purposes only, for example from a related product or a different language.
accepted: The <alt-trans> represents a proposed translation that was used for the translation of the trans-unit, possibly modified.
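Because “proposal” is the default, a consumer has to treat a missing attribute as a proposal. The Python sketch below groups the <alt-trans> entries of one trans-unit by type. It assumes the attribute is spelled alttranstype, as in the "New, Deprecated or Changed" list; the sample strings are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented fragment; the first <alt-trans> relies on the default type.
SAMPLE = """<trans-unit id="1">
 <source>Save file</source>
 <target>Enregistrer le fichier</target>
 <alt-trans>
  <target>Sauvegarder le fichier</target>
 </alt-trans>
 <alt-trans alttranstype="previous-version">
  <target>Enregistrer fichier</target>
 </alt-trans>
 <alt-trans alttranstype="rejected">
  <target>Sauver fichier</target>
 </alt-trans>
</trans-unit>"""


def by_alttranstype(trans_unit_xml):
    """Group <alt-trans> target texts by type, defaulting to 'proposal'."""
    unit = ET.fromstring(trans_unit_xml)
    grouped = defaultdict(list)
    for alt in unit.iterfind("alt-trans"):
        kind = alt.get("alttranstype", "proposal")
        target = alt.find("target")
        grouped[kind].append(target.text if target is not None else None)
    return dict(grouped)


print(by_alttranstype(SAMPLE))
```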
XLIFF 1.2 Additional revision to alt-trans
• Introduce the phase-name attribute for <alt-trans>
  • Makes it possible to find out who made the change, when, and in which process the change was introduced
• Deprecate the restype attribute for the <target> element
  • No longer needed, as the <target> is always of the same restype as the <trans-unit> or <alt-trans> it appears in
• Convention: more recent <alt-trans> elements should appear before older ones
  • Determines the order of changes if multiple previous versions have been introduced
Non-Normative
Representation Guides
A brief walk-through of the Representation
Guides provided with XLIFF 1.2
Purpose of the Guides
• Synonymous with “profile” specifications
• Non-normative
  • Not a requirement for “legal” XLIFF 1.2
• Guidance for consistently representing native formats as XLIFF across implementations
  • Kickstart new implementations
  • Better interoperability between tools
Guide Contents
• Recommended extraction techniques and considerations
• Recommended mappings from native structures to XLIFF
• Strategies for implementing Translation Memory support (using inline tags)
• Detailed examples and supplementary sample files
Extract-Localize-Merge
Minimalist Approach
Process:
1. Identify localisable content (resources) and non-localisable content (code)
2. Populate the XLIFF document’s trans-unit and bin-unit elements with localisable content
3. Create a “Skeleton File” with localisable content stripped out and replaced with tokens that map to XLIFF trans-unit or bin-unit IDs
4. Translate the XLIFF document
5. Merge translated data in XLIFF with the Skeleton to generate the localised material
• Skeleton file is optional and not recommended in certain circumstances (e.g., HTML or if tool interoperability is required)
• In <skl>, embed the entire Skeleton file within the XLIFF file or specify the file’s location
• XLIFF doesn’t define the Skeleton file or token format
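The extract and merge steps above can be sketched in a few lines of Python. This is a toy filter under stated assumptions: it treats any non-blank text between two tags as localisable, and it uses the %%%n%%% token style from the example slide. Since XLIFF does not define a token format, that choice is purely illustrative, as are the function names and the French strings.

```python
import re

TOKEN = "%%%{}%%%"  # token style borrowed from the slide example; not defined by XLIFF

SOURCE_HTML = (
    "<html>\n<head>\n<title>Almost the Smallest HTML File</title>\n</head>\n"
    "<body>\n<p>Just some stuff here to fill up space</p>\n</body>\n</html>"
)


def extract(html_text):
    """Steps 1-3: return (skeleton, units) with text replaced by numbered tokens."""
    units = {}

    def replace(match):
        text = match.group(1)
        if not text.strip():
            return match.group(0)  # whitespace between tags stays in the skeleton
        unit_id = str(len(units) + 1)
        units[unit_id] = text
        return ">" + TOKEN.format(unit_id)

    skeleton = re.sub(r">([^<>]+)(?=<)", replace, html_text)
    return skeleton, units


def merge(skeleton, translations):
    """Step 5: put translated text back where the tokens are."""
    return re.sub(r"%%%(\d+)%%%", lambda m: translations[m.group(1)], skeleton)


skeleton, units = extract(SOURCE_HTML)
translated = {"1": "Presque le plus petit fichier HTML", "2": "Un peu de texte pour remplir"}
print(skeleton)
print(merge(skeleton, translated))
```

In a real filter the extracted units would be written out as trans-unit elements and the skeleton stored or referenced through <skl>; a regular expression is not a robust HTML parser and is used here only to keep the sketch short.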
Convert/Transform Paradigm
(maximalist approach)
[Diagram: Original Material, Filter, XLIFF, Translated Material]
Process:
1. Convert original material by mapping the entire original document to XLIFF (using representation guides)
2. Structural information (code) is stored in the XLIFF container as non-translatable trans-units / bin-units
3. Translate XLIFF content
4. Generate the native translated material directly from the XLIFF content
• Best suited for textual resource formats (RCDATA, Java, PO/POT) and mark-up languages like (X)HTML and XML
• Difficult and impractical for binary resource formats (e.g., EXEs and DLLs)
Minimalist Example – Source Content & Skeleton
A very simple HTML file:
Original Content:
<html>
<head>
<title>Almost the Smallest HTML File</title>
</head>
<body>
<p>Just some stuff here to fill up space</p>
</body>
</html>
Skeleton (produced by the filter):
<html>
<head>
<title>%%%1%%%</title>
</head>
<body>
<p>%%%2%%%</p>
</body>
</html>
XLIFF:
…
<header>
<skl>
<external-file href='sample.skl'/>
</skl>
</header>
<body>
<trans-unit id='%%%1%%%'>
<source xml:lang='en'>Almost the Smallest HTML File</source>
</trans-unit>
<trans-unit id='%%%2%%%' restype='x-html-p'>
<source xml:lang='en'>Just some stuff here to fill up space</source>
</trans-unit>
</body>
…
Maximalist Example – Transform content to XLIFF
Full Transformation:
Original Content:
<html>
<head>
<title class='title'>Almost the Smallest HTML File</title>
</head>
<body>
<p>Just some stuff here to fill up space</p>
</body>
</html>
XLIFF:
…
<body>
<group restype='x-html-html'>
<group restype='x-html-head'>
<trans-unit id='1' restype='x-html-p-title' html:class='title'>
<source xml:lang='en'>Almost the Smallest HTML File</source>
</trans-unit>
</group>
<group restype='x-html-body'>
<trans-unit id='2' restype='x-html-p'>
<source xml:lang='en'>Just some stuff here to fill up space</source>
</trans-unit>
</group>
</group>
</body>
…
Guides provided with XLIFF 1.2
• (X)HTML
  • Many flavours of HTML; the guide focuses on HTML 4.01 and XHTML 1.0
• Java Resource Bundles
  • Support for the two subclasses of the java.util.ResourceBundle abstract class: PropertyResourceBundle and ListResourceBundle
• Gettext PO/POT files
  • Linux resource format
To Get the Most from the Guides
• Review the document in full before commencing design or development of an XLIFF solution
• Consider the Guide’s recommended extraction approach when designing the overall architecture:
  • HTML recommends “maximalist”, but provides examples for “minimalist” as well
  • Both PO/POT and Java make no specific recommendation, but their examples are “maximalist”
  • Order of extraction recommendations: typically the order of the data in the source document
• Refer to the Mappings Reference in each guide when designing and building filters
  • Considerations for recommended source document structure and content
  • Identify exceptions (e.g., dynamically generated HTML via server-side processing)
  • Recommendations are comprehensive, with many examples
  • Non-standard structures and conventions are dealt with (especially for (X)HTML)
• Use the sample files
  • Valuable reference for learning
  • Provide validation during the development effort
  • Verify compliance by feeding sample files into the filter, either native source or XLIFF
More Representation Guides
• Late draft of Windows 32 / .NET
  • Not approved, but is posted on the XLIFF website
  • Requires more expert input
• More to follow upon request
More Information


The XLIFF TC Web Site: http://www.xliff.org
Presenter:

XLIFF TC Co-Chair: Tony Jewtushenko (Product
Innovator Ltd)
([email protected])
Thank You...
Questions?
Product Innovator Ltd
provides product management and software process
improvement training and mentoring services to
technology companies seeking to maximize their
productivity and revenue potential
Contact:
[email protected]
www.productinnovator.com
+353 1 8875183 / +353.87.2479057