What's New in XLIFF 1.2 - Localisation Research Centre
What’s New in XLIFF 1.2?
Tony Jewtushenko
Director Research & Development
Product Innovator Ltd.
Co-Chair – OASIS XLIFF TC
The XML Localisation Interchange File Format
Agenda
Overview of XLIFF
Definition, goals, benefits, architecture and basic XLIFF
concepts
What’s new in XLIFF 1.2
New and changed features of XLIFF 1.2 normative
specification
Non-Normative Representation Guides
A brief introduction of the representation guides provided with
XLIFF 1.2
XLIFF Overview
A glance at the definitions, goals and benefits
of the XML Localisation Interchange File
Format.
What is XLIFF?
A specification
for the lossless interchange of localizable
data and its related information,
which is tool-neutral,
has been formalized as an XML
vocabulary,
and features an extensibility mechanism.
Why XLIFF was created…
Localisation Is Difficult
Insufficient interoperability between tools
Lack of support for overall localisation
workflow
Necessity of localisation tools developers to
deal with many formats
Large number of proprietary intermediate
formats
XLIFF Timeline
Sep 2000: DataDefinition Kickoff
Mar 2001: Draft 1.0 Spec and DTD published
Jun 2001: Whitepaper published
Dec 2001: OASIS XLIFF TC Proposal submitted
Apr 2002: XLIFF 1.0 Committee Spec approved
May 2003: XLIFF 1.1 Committee Spec approved
Aug 2003 - Sep 2003: XLIFF 1.1 Public Peer Review
Nov 2003: Revised XLIFF 1.1 Committee Spec approved
Dec 2003 - May 2006: XLIFF 1.2 Segmentation; Representation Guides for (X)HTML, Java, PO/POT
May 2006: XLIFF 1.2 Committee Spec and Representation Guides approved
14 Jul 2006 - 12 Sep 2006: XLIFF 1.2 Public Peer Review
Contributors to XLIFF Past and Present
Alchemy Software
Bowne Global Solutions
Convey Software
Ektron, Inc
ENLASO Corp (RWS)
Globalsight
Heartsome
HP
Idiom Technologies, Inc
Lionbridge
LRC
Lotus/IBM
Microsoft
Moravia IT
Novell
Oracle
Red Hat
PASS Engineering
SAP
SDL International
Sun Microsystems
Tektronix
TRADOS
XML Intl
OASIS XLIFF TC Members
as of 1 Sept 06
TC Officers:
Chairs: Tony Jewtushenko, Product Innovator Ltd; Bryan Schnabel, Tektronix
Secretary: Peter Reynolds, Idiom Technologies, Inc.
Current Members of TC:
Mat Lovatt, Oracle
Doug Domeny, Ektron
Rodolfo Raya, Heartsome
Eiju Akahane, IBM
Steven Harris, Idiom Technologies, Inc.
Fredrik Corneliusson, Lionbridge
Joachim Schurig, Lionbridge
Milan Karasek, Moravia IT
Florian Sachse, PASS Engineering
Christian Lieske, SAP
Magnus Martikainen, SDL International
David Pooley, SDL International
Kevin Bargary, University of Limerick Localisation Research Centre
Reinhard Schaler, University of Limerick Localisation Research Centre
Andrzej Zydron, XML Intl
OASIS: Standards Body
Home of XLIFF
OASIS: Organization for the Advancement of Structured
Information Standards
World’s largest independent, non-profit organization dedicated to
the standardisation of XML applications and Web Services
More than 150 member companies plus individuals
Operates XML.ORG Registry, the open community
clearinghouse of XML application schemas
Technical work on XML interoperability includes XML
conformance and XML Registries/Repositories
General XML technical resource
XLIFF Benefits:
Interoperability, Open Standards, Automation, Flexibility, Cost and Time, Scalability
Reduces effort in deploying integrated best-of-breed solutions
Reduces vendor lock-in; re-use
Reduces cost and turnaround time
Reduces defects introduced by manual processing and handling
Easy to scale and future-proof
Leverages services, technologies, vendors
High Level XLIFF Architecture
An XLIFF document is a container for all
data needed for a localisation project:
1. Localizable objects (e.g. text strings,
graphics) in source and target languages.
2. Supplementary information (e.g. glossaries,
or material to recreate the original format).
3. Administrative information (e.g. workflow
data).
4. Custom data (e.g. initialization information
for tools).
The XLIFF Document
An XLIFF document is designed to store the
extracted data related to localisation.
Each given source container (e.g. a file, a
database table, and so forth) corresponds to
a <file> element in XLIFF.
Each XLIFF document can include several
<file> elements.
An entire localisation project could be stored in a
single XLIFF document.
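As a minimal sketch of the one-document, many-files idea, the Python below lists each <file> element in an XLIFF 1.2 document with its number of trans-units. The sample document, its file names, and the list_files helper are illustrative assumptions, not part of the specification.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

# Hypothetical project: two source containers stored in one XLIFF document.
SAMPLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'>
  <file original='menu.properties' source-language='en'
        datatype='javapropertyresourcebundle'>
    <body>
      <trans-unit id='1'><source>Open</source></trans-unit>
    </body>
  </file>
  <file original='help.htm' source-language='en' datatype='html'>
    <body>
      <trans-unit id='1'><source>Help</source></trans-unit>
      <trans-unit id='2'><source>Index</source></trans-unit>
    </body>
  </file>
</xliff>"""


def list_files(xliff_text):
    """Return (original, trans-unit count) for every <file> element."""
    root = ET.fromstring(xliff_text)
    ns = {"x": XLIFF_NS}
    return [
        (f.get("original"), len(f.findall(".//x:trans-unit", ns)))
        for f in root.findall("x:file", ns)
    ]


print(list_files(SAMPLE))
```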
Bilingual Model
Each <file> element is designed to store
one source language and one target
language
The rationale is that translation into
different target languages is done by different
people most of the time
However, the languages in <alt-trans> elements
can be different. For example, proposed
matches in national Portuguese when
translating into Brazilian Portuguese.
Localisable Objects
Besides localisable text, XLIFF can also
contain other localisable object types such as
binary graphics
Supplementary information can be
represented in a generic way through inline
codes (e.g. formatting of text)
Relationships between objects can be captured
(e.g. a hierarchical menu or text related to a
web graphic)
Supplementary Info
XLIFF provides “hooks” for storing
supplementary information in reference
element
Glossaries
Translation memories
Segmentation Rules (via SRX file)
The supplementary information can be
referenced (i.e. reside outside of the
document), or embedded within the
document
Administrative Info
XLIFF provides mechanisms for capturing
administrative information:
For relating source material to XLIFF
documents.
For storing workflow data.
For providing pre-translation entries.
For keeping track of changes.
Administrative Info – PreTranslation
A set of proposed translations can be
included for each <trans-unit> element,
using the <alt-trans> element.
<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans match-quality='high' origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>
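A tool consuming such pre-translation entries might read them as sketched below. This is only an illustration in Python: the proposals helper and the dictionary keys are assumptions, the fragment is parsed without a namespace declaration for brevity, and match-quality is the XLIFF 1.2 attribute name for the match score.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

UNIT = """<trans-unit id='1'>
  <source xml:lang='en'>The text</source>
  <alt-trans match-quality='high' origin='MTsystem'>
    <target xml:lang='fr'>Le texte</target>
  </alt-trans>
</trans-unit>"""


def proposals(unit_text):
    """Collect every proposed translation attached to one trans-unit."""
    unit = ET.fromstring(unit_text)
    found = []
    for alt in unit.findall("alt-trans"):
        target = alt.find("target")
        if target is None:
            continue  # an <alt-trans> without a <target> offers no proposal
        found.append({
            "origin": alt.get("origin"),
            "match-quality": alt.get("match-quality"),
            "lang": target.get(XML_LANG),
            "text": target.text,
        })
    return found


print(proposals(UNIT))
```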
Customising XLIFF
Customise XLIFF by extending (adding) user
defined:
Elements
Attributes
Attribute Values
Extending Elements
Extension points in the following elements:
<alt-trans>, <bin-unit>, <group>, <header>, <tool>,
<trans-unit>, and new in 1.2: <xliff> and <seg-source>.
Content of each custom element can be any valid
XML content:
empty content, PCDATA, mixed content, and so forth
Custom elements defined in private namespace
schema
Example of Extending Elements
<xliff version='1.2'
       xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
  <file original='passus-1.doc' source-language='enm'
        datatype='plaintext'>
    <body>
      <group>
        <sup:SourceInfo>
          <sup:Book>Piers Plowman, Passus 1</sup:Book>
          <sup:Author>William Langland</sup:Author>
        </sup:SourceInfo>
        <sup:WorkInfo Task='transcription' Context='Middle-English:1360'/>
        <trans-unit id='1'>
          <source xml:lang='enm'>What this mountaigne bymeneth</source>
          <target xml:lang='en'>What this mountain means</target>
          <sup:Reference Type='strophe'>1-a</sup:Reference>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>
Non-XLIFF elements (sup: prefix) shown in bold on the original slide
Non-XLIFF elements defined in XSD:
<xsd:schema targetNamespace="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sup="http://www.ChaucerState.ac.pg/Frm/XLFSup-v1"
            elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="SourceInfo">
    <xsd:complexType>
      <xsd:sequence maxOccurs="unbounded">
        <xsd:element name="Book" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="WorkInfo">
    <xsd:complexType>
      <xsd:attribute name="Task" type="xsd:string"/>
      <xsd:attribute name="Context" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Reference">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:string">
          <xsd:attribute name="Type" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
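Because custom elements live in a private namespace, a tool can tell them apart from core XLIFF by namespace alone. The Python below is a rough sketch of that check on a shortened version of the example; the custom_elements helper is a hypothetical name, not an XLIFF API.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

DOC = """<xliff version='1.2'
    xmlns='urn:oasis:names:tc:xliff:document:1.2'
    xmlns:sup='http://www.ChaucerState.ac.pg/Frm/XLFSup-v1'>
  <file original='passus-1.doc' source-language='enm' datatype='plaintext'>
    <body>
      <group>
        <sup:SourceInfo>
          <sup:Book>Piers Plowman, Passus 1</sup:Book>
        </sup:SourceInfo>
        <trans-unit id='1'>
          <source xml:lang='enm'>What this mountaigne bymeneth</source>
          <sup:Reference Type='strophe'>1-a</sup:Reference>
        </trans-unit>
      </group>
    </body>
  </file>
</xliff>"""


def custom_elements(xliff_text):
    """Return local names of elements that are outside the XLIFF namespace."""
    root = ET.fromstring(xliff_text)
    xliff_prefix = "{" + XLIFF_NS + "}"
    names = []
    for element in root.iter():  # document order
        if not element.tag.startswith(xliff_prefix):
            # ElementTree tags look like "{namespace}local"; keep the local part.
            names.append(element.tag.split("}", 1)[-1])
    return names


print(custom_elements(DOC))
```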
Extending Attributes
Attributes from a namespace other than XLIFF can
be included in these XLIFF elements:
<alt-trans>, <bin-source>, <bin-target>, <bin-unit>, <bpt>,
<bx/>, <ept>, <ex/>, <file>, <g>, <group>, <it>,
<mrk>, <ph>, <source>, <target>, <tool>, <trans-unit>,
<x/>, and new in 1.2: <xliff>, <seg-source>.
No specific location for inserting the non-XLIFF
attributes
No limit to the number of non-XLIFF attributes that
can be used in an XLIFF document
Example of Extending Attributes
Attributes from HTML extend <group> and <trans-unit>:
<xliff version='1.2'
       xmlns='urn:oasis:names:tc:xliff:document:1.2'
       xmlns:htm='http://www.w3.org/1999/xhtml'>
  <file original='table.htm' source-language='en' datatype='html'>
    <body>
      <group restype='table' htm:border='1' htm:cellpadding='5' htm:cellspacing='0' htm:width='100%'>
        <group restype='row'>
          <trans-unit id='1' htm:valign='top' htm:width='30%'>
            <source>Text of row 1 column 1</source>
          </trans-unit>
          <trans-unit id='2' htm:valign='top' htm:width='30%'>
            <source>Text of row 1 column 2</source>
          </trans-unit>
        </group>
        <group restype='row'>
          <trans-unit id='3' htm:valign='top' htm:width='30%'>
            <source>Text of row 2 column 1</source>
          </trans-unit>
          <trans-unit id='4' htm:valign='top' htm:width='30%'>
            <source>Text of row 2 column 2</source>
          </trans-unit>
        </group>
      </group>
    </body>
  </file>
</xliff>
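One way a reader of such a document might pick out the non-XLIFF attributes is sketched below in Python. Core XLIFF attributes are unqualified, so any namespace-qualified attribute other than xml:lang or xml:space is treated as an extension. The extension_attributes helper and the shortened sample are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

GROUP = """<group xmlns='urn:oasis:names:tc:xliff:document:1.2'
    xmlns:htm='http://www.w3.org/1999/xhtml'
    restype='table' htm:border='1' htm:cellpadding='5'>
  <trans-unit id='1' htm:valign='top'>
    <source xml:lang='en'>Text of row 1 column 1</source>
  </trans-unit>
</group>"""


def extension_attributes(element):
    """Return the namespace-qualified attributes that are not xml:* ones."""
    found = {}
    for name, value in element.attrib.items():
        # ElementTree spells qualified names as "{namespace}local".
        if name.startswith("{") and not name.startswith("{" + XML_NS + "}"):
            found[name] = value
    return found


group = ET.fromstring(GROUP)
print(extension_attributes(group))
```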
Extending Attribute Values
Attributes where the list of values can be extended
are the following: context-type, count-type, ctype,
datatype, mtype, priority, purpose, restype, size-unit,
state, state-qualifier, unit; new in 1.2: alttranstype,
reformat
User-defined values must start with an “x-” prefix
There is no specified mechanism to validate
individual user-defined values, beyond starting with
“x-”
Example of Extending
Attribute Values
The following excerpt shows how the
user-defined value “x-for-engineers” can
be utilized in a document:
...
<group>
  <context-group name='EngineersData'>
    <context context-type='x-for-engineers'>Data...</context>
  </context-group>
</group>
...
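The “x-” rule is simple enough to check in a few lines. The Python below is a sketch under assumptions: is_allowed_value is a hypothetical helper, and CONTEXT_TYPES_SUBSET is only an illustrative subset of the predefined context-type values, not the full list from the specification.

```python
# Illustrative subset of predefined context-type values (not exhaustive).
CONTEXT_TYPES_SUBSET = {"sourcefile", "linenumber"}


def is_allowed_value(value, standard_values):
    """Accept a predefined value, or a user-defined one starting with 'x-'."""
    if value in standard_values:
        return True
    # The prefix is the only check available for user-defined values;
    # requiring something after "x-" is an extra assumption of this sketch.
    return value.startswith("x-") and len(value) > len("x-")


print(is_allowed_value("x-for-engineers", CONTEXT_TYPES_SUBSET))
print(is_allowed_value("for-engineers", CONTEXT_TYPES_SUBSET))
```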
Embedding XLIFF
Can embed an entire XLIFF document, or part of
one, in another XML document
Valid where the host XML vocabulary, defined by XML Schema
(XSD), includes an <any> element in the
definition of the element where the XLIFF
data can be inserted
What’s new in XLIFF 1.2
New and changed features of XLIFF 1.2 normative
specification
New, Deprecated or
Changed 1.1 to 1.2
Validation via Transitional and Strict models
Segmentation Support added
Add mid as an optional attribute for the <alt-trans> element
Changed name attribute for <context-group> from required to
optional, and modified description
Added extension point at <xliff>
Tracking/Accepting Suggested Translations added:
Add an alttranstype attribute for the alt-trans element.
Deprecate the use of multiple target elements in a single alt-trans.
Deprecate the restype attribute for the target element.
Introduce the phase-name attribute for alt-trans element.
Introduce a convention: more recent alt-trans elements should
appear before older ones.
Validation in 1.2
Validation via two “Flavours” of XSD
(Schema):
Transitional: Deprecated (obsolete) elements
and attributes are permitted. Use to validate
when reading older version documents (XLIFF 1.1).
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-transitional.xsd'
Strict: Deprecated items are not permitted. Use
to validate when creating XLIFF 1.2 documents.
xsi:schemaLocation='urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd'
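To decide which flavour a document is likely to need, a tool could look for deprecated constructs before validating. The Python sketch below checks only the two deprecations named in this presentation (several <target> elements in one <alt-trans>, and restype on <target>). It is not a schema validator, and deprecated_usages and suggested_flavour are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def qname(local):
    """Build the ElementTree name of an element in the XLIFF namespace."""
    return "{%s}%s" % (XLIFF_NS, local)


def deprecated_usages(xliff_text):
    """List the deprecated constructs (of the two checked) found in a document."""
    root = ET.fromstring(xliff_text)
    issues = []
    for alt in root.iter(qname("alt-trans")):
        if len(alt.findall(qname("target"))) > 1:
            issues.append("multiple <target> elements in one <alt-trans>")
    for target in root.iter(qname("target")):
        if target.get("restype") is not None:
            issues.append("restype attribute on <target>")
    return issues


def suggested_flavour(xliff_text):
    """Suggest 'transitional' when deprecated items are present, else 'strict'."""
    return "transitional" if deprecated_usages(xliff_text) else "strict"


OLD_STYLE = """<xliff version='1.2' xmlns='urn:oasis:names:tc:xliff:document:1.2'>
  <file original='a.txt' source-language='en' datatype='plaintext'>
    <body>
      <trans-unit id='1'>
        <source>Hello</source>
        <target restype='string'>Bonjour</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""
NEW_STYLE = OLD_STYLE.replace(" restype='string'", "")

print(suggested_flavour(OLD_STYLE), suggested_flavour(NEW_STYLE))
```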
XLIFF 1.2 Segmentation:
seg-source
How corresponding segments are referenced between
<seg-source> and <target>
<trans-unit id="1">
  <source>First sentence. Second sentence.</source>
  <seg-source>
    <mrk mtype="seg" mid="1">First sentence.</mrk>
    <mrk mtype="seg" mid="2">Second sentence.</mrk>
  </seg-source>
  <target>
    <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
    <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
  </target>
</trans-unit>
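The mid attribute is what ties a source segment to its translated counterpart. As a rough Python sketch (segment_pairs is a hypothetical helper, and the fragment is parsed without a namespace for brevity), the pairing could be done like this:

```python
import xml.etree.ElementTree as ET

UNIT = """<trans-unit id="1">
  <source>First sentence. Second sentence.</source>
  <seg-source>
    <mrk mtype="seg" mid="1">First sentence.</mrk>
    <mrk mtype="seg" mid="2">Second sentence.</mrk>
  </seg-source>
  <target>
    <mrk mtype="seg" mid="1">Translated first sentence.</mrk>
    <mrk mtype="seg" mid="2">Translated second sentence.</mrk>
  </target>
</trans-unit>"""


def segment_pairs(unit_text):
    """Return (mid, source segment, target segment) in source order."""
    unit = ET.fromstring(unit_text)

    def segments(parent_name):
        parent = unit.find(parent_name)
        if parent is None:
            return {}
        return {
            mrk.get("mid"): "".join(mrk.itertext())
            for mrk in parent.findall("mrk")
            if mrk.get("mtype") == "seg"
        }

    source_segments = segments("seg-source")
    target_segments = segments("target")
    # A segment with no translated counterpart is paired with None.
    return [
        (mid, text, target_segments.get(mid))
        for mid, text in source_segments.items()
    ]


for pair in segment_pairs(UNIT):
    print(pair)
```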
XLIFF 1.2 Segmentation:
seg-source
Alt-trans may also be segmented:
<trans-unit id="3">
<source>First sentence. Second sentence.</source>
<alt-trans match-quality="100%">
<source>The second sentence.</source>
<seg-source>
<mrk mtype="seg" mid="1">First sentence.</mrk>
<mrk mtype="seg" mid="2">Second sentence.</mrk>
</seg-source>
<target>
<mrk mtype="seg" mid="1">Translated first sentence.</mrk>
<mrk mtype="seg" mid="2">Translated second sentence.</mrk>
</target>
</alt-trans>
</trans-unit>
XLIFF 1.2 Segmentation:
merged-trans
Aggregating translations across multiple
trans-units:
<group merged-trans="yes">
<trans-unit id="t1">
<source>The German acronym v.</source>
<target equiv-trans="no">Niemiecki skrót v. OT
oznacza górną pozycję silnika.</target>
</trans-unit>
<trans-unit id="t2">
<source>OT signifies the top dead center position for
an engine.</source>
<target equiv-trans="no"/>
</trans-unit>
</group>
XLIFF 1.2 Segmentation:
equiv-trans
To denote when a translation is not a direct
equivalent of the source:
<trans-unit id="t1">
<source>Constrained text for limited</source>
<target equiv-trans="no">Tekst angielski dla</target>
</trans-unit>
<trans-unit id="t2">
<source>display for English</source>
<target equiv-trans="no">ograniczonego pola</target>
</trans-unit>
XLIFF 1.2 Add an alttranstype attribute for
the <alt-trans> element
The alttranstype attribute is optional and has the following
values and meanings:
Value: Meaning
proposal (default): The <alt-trans> represents a translation proposal from a translation memory or other resource.
previous-version: The <alt-trans> represents a previous version of the <target> element.
rejected: The <alt-trans> represents a rejected version of the <target> element.
reference: The <alt-trans> represents a translation to be used for reference purposes only, for example from a related product or a different language.
accepted: The <alt-trans> represents a proposed translation that was used for the translation of the trans-unit, possibly modified.
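A consumer has to apply the default itself when the attribute is absent. The Python below sketches one way to group the <alt-trans> entries of a trans-unit by alttranstype, falling back to proposal. The alt_trans_by_type helper and the sample unit are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

UNIT = """<trans-unit id="1">
  <source>The text</source>
  <target>Le texte</target>
  <alt-trans origin="TM">
    <target>Le texte</target>
  </alt-trans>
  <alt-trans alttranstype="previous-version">
    <target>Texte</target>
  </alt-trans>
  <alt-trans alttranstype="rejected">
    <target>La texte</target>
  </alt-trans>
</trans-unit>"""


def alt_trans_by_type(unit_text):
    """Group alt-trans target texts by alttranstype (default: proposal)."""
    unit = ET.fromstring(unit_text)
    grouped = {}
    for alt in unit.findall("alt-trans"):
        kind = alt.get("alttranstype", "proposal")
        target = alt.find("target")
        text = target.text if target is not None else None
        grouped.setdefault(kind, []).append(text)
    return grouped


print(alt_trans_by_type(UNIT))
```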
XLIFF 1.2 Additional
revision to alt-trans
Introduce the phase-name attribute for <alt-trans>:
makes it possible to find out who made the change, when,
and in which process the change was introduced
Deprecate the restype attribute for the <target> element:
no longer needed, as the <target> is always of the same restype
as the <trans-unit> or <alt-trans> it appears in
Convention: more recent <alt-trans> elements should appear
before older ones:
makes it possible to determine the order of changes if multiple
previous versions have been introduced
Non-Normative
Representation Guides
A brief walk-through of the Representation
Guides provided with XLIFF 1.2
Purpose of the Guides
Synonymous with “profile” specifications
Non-normative
Not a requirement for “legal” XLIFF 1.2
Guidance for consistently representing native
formats as XLIFF across implementations
Kickstart new implementations
Better interoperability between tools
Guide Contents
Recommended Extraction Techniques and
Considerations
Recommended mappings from native
structures to XLIFF
Strategies for implementing Translation
Memory support (using inline tags)
Detailed examples and supplementary
sample files
Extract-Localize-Merge
Minimalist Approach
Process:
1. Identify localisable content (resources) and non-localisable content (code)
2. Populate XLIFF document’s trans-unit and bin-unit with localisable content
3. Create “Skeleton File” with localisable content stripped out and replaced with tokens that map to
XLIFF trans-unit or bin-unit IDs
4. Translate XLIFF document
5. Merge translated data in XLIFF with Skeleton to generate the localised translated material
Skeleton file is optional and not recommended in certain circumstances (e.g., HTML or if tool
interoperability is required)
In <skl>, embed the entire Skeleton file within the XLIFF file or specify the file’s location
XLIFF doesn’t define the Skeleton file or token format
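The extract and merge steps can be sketched in a few lines of Python. This is a toy illustration only: the regular expression stands in for a real format filter, the %%%n%%% token format is borrowed from the example slides (XLIFF itself does not define one), and the extract and merge helpers are hypothetical names.

```python
import re

# Matches a non-blank text node sitting between two tags.
TEXT_NODE = re.compile(r">([^<>]*\S[^<>]*)<")
TOKEN = re.compile(r"%%%(\d+)%%%")


def extract(document):
    """Split a document into a tokenised skeleton and its localisable units."""
    units = {}

    def to_token(match):
        unit_id = str(len(units) + 1)
        units[unit_id] = match.group(1)
        return ">%%%" + unit_id + "%%%<"

    skeleton = TEXT_NODE.sub(to_token, document)
    return skeleton, units


def merge(skeleton, translations):
    """Put translated text back in place of each skeleton token."""
    return TOKEN.sub(lambda m: translations[m.group(1)], skeleton)


ORIGINAL = ("<html><head><title>Almost the Smallest HTML File</title></head>"
            "<body><p>Just some stuff here to fill up space</p></body></html>")

skeleton, units = extract(ORIGINAL)
print(skeleton)
print(units)
```

In a real workflow the units dictionary would be written out as trans-unit elements, translated, and read back before calling merge.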
Convert/Transform Paradigm
(maximalist approach)
[Diagram: Original Material, Filter, XLIFF, Translated Material]
Process:
1. Convert original material by mapping entire original document to XLIFF (using
representation guides)
2. Structural information (code) stored in XLIFF container as non-translatable trans-units /
bin-units
3. Translate XLIFF content
4. Generate the native translated material directly from the XLIFF content
Best suited for textual resource formats (RCDATA, Java, PO/POT) and mark-up
languages like (X)HTML and XML
Difficult and impractical for binary resource formats (e.g., EXEs and DLLs)
Minimalist Example – Source
Content & Skeleton
A very simple HTML file (original content):
<html>
  <head>
    <title class='title'>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>
After filtering, the skeleton:
…
<html>
  <head>
    <title class='title'>%%%1%%%</title>
  </head>
  <body>
    <p>%%%2%%%</p>
  </body>
</html>
and the XLIFF:
<header>
  <skl>
    <external-file href='sample.skl'/>
  </skl>
</header>
<body>
  <trans-unit id='%%%1%%%'>
    <source xml:lang='en'>Almost the Smallest HTML File</source>
  </trans-unit>
  <trans-unit id='%%%2%%%' restype='x-html-p'>
    <source xml:lang='en'>Just some stuff here to fill up space</source>
  </trans-unit>
</body>
…
Maximalist Example –
Transform content to XLIFF
Full transformation of the original content:
<html>
  <head>
    <title class='title'>Almost the Smallest HTML File</title>
  </head>
  <body>
    <p>Just some stuff here to fill up space</p>
  </body>
</html>
into XLIFF:
…
<body>
  <group restype='x-html-html'>
    <group restype='x-html-head'>
      <trans-unit id='1' restype='x-html-p-title' html:class='title'>
        <source xml:lang='en'>Almost the Smallest HTML File</source>
      </trans-unit>
    </group>
    <group restype='x-html-body'>
      <trans-unit id='2' restype='x-html-p'>
        <source xml:lang='en'>Just some stuff here to fill up space</source>
      </trans-unit>
    </group>
  </group>
</body>
…
Guides provided with
XLIFF 1.2
(X)HTML
Many flavours of HTML; guide focuses on HTML
4.01, XHTML 1.0
Java Resource Bundles
Support for the two subclasses of the java.util.ResourceBundle
abstract class:
PropertyResourceBundle and
ListResourceBundle
Gettext PO/POT files
Linux resource format
To Get the Most from the
Guides
Review the document in full before commencing design or development of an
XLIFF solution
Consider the Guide’s recommended Extraction approach when designing
overall architecture:
HTML recommends “maximalist”, but provides examples for “minimalist” as well.
Both PO/POT and Java make no specific recommendation, but examples are
“maximalist”
Order of Extraction recommendations: typically in the order of the data in the source
document
Refer to Mappings Reference in each guide when designing and building filters
Considerations for recommended source document structure and content
Identify exceptions (e.g., dynamically generated HTML via server-side processing)
Recommendations are comprehensive with many examples
Non-standard structures and conventions are dealt with (especially for (X)HTML)
Use the Sample files
Valuable reference for learning
Provides validation during development effort
Verify compliance by feeding sample files into filter – either native source or XLIFF
More Representation
Guides
Late draft of Windows 32 / .NET
Not approved, but is posted on the XLIFF website
Requires more expert input
More to follow upon request
More Information
The XLIFF TC Web Site: http://www.xliff.org
Presenter:
XLIFF TC Co-Chair: Tony Jewtushenko (Product
Innovator Ltd)
([email protected])
Thank You...
Questions?
Product Innovator Ltd
provides product management and software process
improvement training and mentoring services to
technology companies seeking to maximize their
productivity and revenue potential
Contact:
[email protected]
www.productinnovator.com
+353 1 8875183 / +353.87.2479057