Internationalization

An Introduction
Part II: Enabling
License
This presentation and its associated materials are licensed under a
Creative Commons Attribution-Noncommercial-No Derivative
Works 2.5 License.
You may use these materials without obtaining permission from the
author. Any materials used or redistributed must contain this notice.
[Derivative works may be permitted with permission of the author.]
This work is copyright © 2008-2011 by Addison P. Phillips
Who is this guy?
• Globalization Architect, Lab126 (we make the technology behind the Kindle)
• Chair, W3C Internationalization WG
Internationalization is:
• the design and development of a product that
is enabled for target audiences that vary in
culture, region, or language. [W3C]
• a fundamental architectural approach to
software development
Related Concepts
• Localization: creation of a product tailored to
a particular target market
• Translation: process of converting text from
one language to another
• Globalization: unified approach to creating
global products, especially those that support
multiple geographies simultaneously
Mystic Numbering (M4C N7G)
I + NTERNATIONALIZATIO (18 letters) + N = I18N
Opinions differ on capitalization (C12N); choose from:
• i18n
• I18n
• i18N
• I18N
Very geeky; not very internationalized (I19G?)
Internationalization = I18N
Localization = L10N
Globalization = G11N
Canonicalization = C14N
Accessibility = A11Y
The Internationalization Approach
• Gather requirements globally
• Enable
• Externalize
• Customize
• Test and support globally
• Localize
The Internationalization Approach
• Enabling—the same code supports multiple regions or
cultures. Sometimes called a “global binary”.
• Externalization—plan for localizability by separating
“content” from code. This makes localization for specific
languages, regions, or cultures easy, fast, and cheap.
• Customization—add culturally specific functionality,
presentation, or content to an application.
A Global Approach
• Internationalization turns technical problems
into business decisions
• Balance priorities based on real user
distribution/requirements
– Consider global user population as a whole
– Consider specific market requirements on an equal
footing
– Potential markets for the product
Internationalization Myths
• We need special experts.
• We need an extra development cycle.
• We need six more months to build it.
• We need people who speak (language).
• We (wrote it in Java/C#, used Unicode, etc.), so it is internationalized.
• We made the assumption that the product would only ever have English screens: all our users understand it anyway.
• A localized product is internationalized.
• An internationalized product is slow/slower.
• It takes longer to write internationalized code.
• We can’t read the screens/it is too hard to test.
• We have no intention of localizing, so no need to internationalize.
• We don’t have any customers there.
• The users in (some country) never complained, so it must work.
• This product is 100% fully internationalized.
Internationalization Truths:
“Well, it depends…”
• Generalize designs
– Locale independent data structures
– Locale sensitive display
• Externalize cultural or linguistic variations
• Customize as a last resort
Buy In: The Key to Success
• For internationalization to be a success over
time, there must be commitment:
– Management
– Product Team
– Development Team
• All developers, not a splinter group
Addressable
Market:
Why Do
Internationalization?
Globalized Product Development
Internationalization turns technical problems into
business decisions.
– Localization: Choose which markets get a translated user interface or documentation, with no engineering.
– Deployment: Choose whether to serve applications from a single site, a cluster of sites, or in each target market.
– Development: Add content and features to products as necessary in each target market.
– Integration and Interoperability: Servers and products
can work together around the world, so customers can
truly create “Enterprise” solutions.
Development Methodologies
• Independent of development methodology
• Agile? Waterfall? You make the choice.
• Encompasses the full development cycle:
– Design
– Development
– QC
– Release
– Support
[Diagram: a cycle of Develop Requirements (all customers) → Develop Roadmap (global deployment) → Develop Requirements & Architecture → Design (internationalized) → Code (enable, externalize, customizable) → Test (non-English/non-ASCII) → RTM/GA (by market)]
The Customization Approach
• Let’s do it in a separate release.
• Let’s make a branch for the international
customers.
• Let’s get a special team of people to work on
the international release.
How That Model Really Looks
[Diagram: two release lines plotted against time. Main Line: 1.0, then 1.0a (bug fixes), then 2.0 (sexy new features). International Branch: International Release 1.0 (1.0i), reached with lots of cost. Annotations: merges and fixes; lots more people and cost; lost $ and opportunity; functionality gaps: intl users waiting for 2.0i now.]
The Problem with Customization
• Code forks. (double, triple coding)
• Lag time for international releases.
• Non-adoption of localized release.
• Full regression of every language.
• Quality or commitment perception.
• Lack of data exchange between language versions.
• Difficult to repeat (every version is a repeat)
• Proliferation of bugs and of support problems.
• International features are cancelled.
• Core product still doesn’t work/can’t address similar markets.
• Loss of market share.
Large Animal Pictures
ANALYZING AND DEVELOPING A
DESIGN
The Problem
[Diagram: Your Application with locale-specific items embedded in it: dates, numbers, images, colors, addresses, strings, and local rules (regulatory requirements, postal addresses, default bookmark lists, your company’s customer service phone numbers)]
The Solution
• Locale-independent global binary
• Locale-dependent resources (includes code)
Large Animal Pictures
[Diagram: a Software Component built from Global Code plus Resources, with Input, Output, and I/O]
Enterprise Animal Pictures
[Diagram: Your System (API, Front End, Business Logic, Data Store, Operating Env.) drawn inside a “Unicode Cloud”. Labels at the boundaries with the Legacy Interface and the Partner/Content Provider: Capture Encoding, Detect / Convert, Legacy Encoding, Convert to Unicode.]
Internationalization Issues
• Text Processing
– Character encodings, including Unicode, spelling, word breaks, collation, and so on
• Language
– Of the software (localization)
– Of solutions built using the software (localizability, data)
• Locale-affected formats
– dates, numbers and the like
• Regionally-affected formats
– names, addresses, currency, and the like
• Time-related issues
– time zone, calendar, holidays, work rules and the like
• Cultural adaptation
– presentation, style, position, color use, and the like
• Legal requirements
– accessibility, SOX, DRM, moderation, security, content, and the like
Levels of Enablement
• Not Enabled
• Single-Language-at-a-Time (SLAAT)
All components run in the same language and
encoding environment correctly.
• Multi-Locale
Unicode support; components run in different
locales, languages, encodings, and time zones
Test Your Assumptions
Gender: ☐ Male ☒ Female
Choose Your Language
How is this company doing?
Making Code Aware of Culture
ENABLING
What is “enabling”?
• Enabled software:
adapts the display, processing, validation, storage,
and transmission of data according to the cultural,
linguistic, and regional needs of the users
– Text, Characters, and Encodings
– Locale Awareness
– Times and Time Zones
A “global binary” is a single
object-code version that is
used in all markets,
regardless of localization.
Don’t Code What You Think You Know
• 5/2/7: sometime in February? sometime in May? sometime in 2005?
• 1.234: more than 1000? less than 2?
• 4.32.MD: number, time, currency? morning or afternoon?
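The "1.234" question can be made concrete with a small Java sketch. The class and method names here are illustrative, not part of the original deck; the sketch only shows that the same text parses to different values depending on the locale handed to the standard NumberFormat API.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class AmbiguousNumber {
    // Parses the same text under the number conventions of the given locale.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        // In the US the "." is a decimal separator: less than 2.
        System.out.println(parse("1.234", Locale.US));
        // In Germany the "." is a grouping separator: more than 1000.
        System.out.println(parse("1.234", Locale.GERMANY));
    }
}
```

The point is not the parsing call itself but that the code never guesses: the locale is always an explicit input.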
Date Formats
Culture   Format        Example
U.S.A.    mdy, /        2/16/05
France    dmy, .        16.2.05
France    dmy, -        16-2-05
CJKT      ymd, /        2005/2/16
CJKT      ymd, 年月日    2005年2月16日
Japan     e¥md,         平成17年2月16日
Japan     ¥md, /        17/2/16
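A minimal Java sketch of letting the platform pick the date layout, assuming the java.time API (Java 8 or later). The class name is made up for illustration, and the exact strings printed depend on the locale data shipped with the JDK, so no output is shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

final class LocalizedDates {
    // Formats one date value in the short style of the requested locale.
    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2005, 2, 16);
        for (Locale locale : new Locale[] {Locale.US, Locale.FRANCE, Locale.JAPAN}) {
            // Field order and separators come from locale data, not from our code.
            System.out.println(locale.toLanguageTag() + ": " + shortDate(date, locale));
        }
    }
}
```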
Time Formats
• U.S.A.: 4:00 p.m.
• France: 16.00
• Japan: 1600
• Japan: ごご4:00
• Korea: 오후 4:32
• Thai: 16:32 น.
• Albanian: 4.32.MD
• Arabic: 04:32 م
More Examples
Assumptions about date tokens:
• USA: Sun, Mon, Tue (3 positions, titlecase)
• French: lun. mar. mer. (four positions, lowercase)
• Russian: Пн Вт Ср (two positions, Cyrillic)
• USA: Jan, Feb, Mar (3 positions, titlecase)
• French: janv. févr. mars avr. (variable (4 or 5) positions, lowercase)
• Spanish (Spain): ene, feb, mar (not titlecase)
• Spanish (Americas): Ene, Feb, Mar (titlecase)
Calendars: What Year Is It?
• Legal, ceremonial, or popular requirement
Gregorian: 2012
Japan Emperor: 24 Heisei (平成24年)
Thailand (Buddhist): 2555 (Gregorian + 543)
Chinese (traditional): 4704 (lunar)
Hebrew: 5767 תשסו (lunar)
Hijri (Islamic): 1428 (lunar)
Armenian: 1461 (ԹՎ ՌՆԾԶ)
etc. etc. etc.
Weekends and Holidays
• When is the weekend?
– Friday is part of the weekend in some countries.
• Both official and unofficial holidays vary widely in number. Here are a few to watch for:
– USA: July 4, MLK, Presidents’ Day, Veterans Day, Flag Day, Columbus Day, Thanksgiving…
– Japan: Golden Week
– China: New Year’s
– Britain: Guy Fawkes Day, Boxing Day
– France: Bastille Day
– Spain: Reyes Magos
Calendar Display
Numbers
Grouping and decimal separators:
England:      12,345.67
Germany:      12.345,67
Switzerland:  12’345,67
Swiss money:  12’345.67
France:       12 345,67
India:        12,34,567.89
France uses a non-breaking space!
India: number of digits in groupings changes!
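As a hedged sketch, the standard Java NumberFormat API can produce these variations from a single code path. The class name is illustrative. The separators actually printed for France, Switzerland, and India depend on the JDK's locale data version, so only the two results that are stable across versions are annotated.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class LocalizedNumbers {
    // Formats a value with the grouping and decimal conventions of a locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(12345.67, Locale.US));       // 12,345.67
        System.out.println(format(12345.67, Locale.GERMANY));  // 12.345,67
        // These depend on the locale data shipped with the JDK.
        System.out.println(format(12345.67, Locale.FRANCE));
        System.out.println(format(12345.67, Locale.forLanguageTag("de-CH")));
        System.out.println(format(1234567.89, Locale.forLanguageTag("en-IN")));
    }
}
```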
Lists
List delimiters & separators can conflict
French example:
2 345,67, 1 012,34, 45,67 (hard to read)
List<Number> myNumberList = getList();
NumberFormat nf = NumberFormat.getInstance();
StringBuilder buf = new StringBuilder();
Iterator<Number> iter = myNumberList.listIterator();
while (iter.hasNext()) {
    buf.append(nf.format(iter.next().doubleValue()));
    if (iter.hasNext()) {
        // ", " collides with the French decimal comma;
        // a separator such as " ; " avoids the clash.
        buf.append(", ");
    }
}
System.out.println(buf.toString());
2 345,67 ; 1 012,34 ; 45,67 (easier to read)
Collation (a fancy word for “sorting”)
English: ABC...RSTUVWXYZ
German: AÄB...NOÖ...SßTUÜV…YZ
Swedish/Finnish: AB...STUVWXYZÅÄÖ
Norwegian: AB...VWXYÜZÆØÅ
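A small Java sketch of locale-sensitive sorting with java.text.Collator. The class name and word list are invented for illustration. On JDKs that ship Swedish collation rules, the Swedish collator should place Å after Z, while the English collator treats Å as a variant of A.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class LocaleSort {
    // Returns a copy of the words sorted by the collation rules of the locale.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "Åsa", "Apple");
        System.out.println("en: " + sorted(words, Locale.ENGLISH));
        System.out.println("sv: " + sorted(words, Locale.forLanguageTag("sv")));
    }
}
```

Sorting with String.compareTo instead would order by code point, which matches no language's expectations once non-ASCII letters appear.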
Organizing Information
• “Alphabet” differences
• Additional information
– for example: yomi
• ASCII vs. the world
• Mixed information sets
“Should I be writing all of this down…”
• Wide range of variation
• Obscure formats
• Difficult to obtain reliable information on formats
• Lots of work to implement and maintain
Enabling means not having to know (m)any of the details
Supporting International Formats
• Use neutral data structures
– Makes code independent of locale
– Most data types are locale-neutral:
• Boolean
• String, char
• Number classes
• Date, Calendar
• Encapsulate formatting/validation in a function
– Format style chosen dynamically at runtime
– Format details don’t have to be specified or researched
– APIs know the gory details
Essence of Enabling
• Object to Presentation, Presentation to Object
– Integers
– Floats
– Percents
– Currencies
– Dates
– Times
– Durations
– Collation (lists)
– Weights/measures/sizes
– Resources (user interface strings)
[Diagram: the Locale sits between the stored object and the user presentation]
Locale
• an identifier or data structure that allows
programmers to access culturally and
linguistically affected functionality in a system.
• Many systems now base their locale identifiers on IETF BCP 47; for example, JavaScript, Java 7, and CLDR
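A brief Java sketch of a BCP 47 identifier in use, assuming Java 7 or later, where Locale.forLanguageTag is available. The class and method names are illustrative only.

```java
import java.util.Locale;

final class LocaleTags {
    // Breaks a BCP 47 language tag into the subtags Java exposes.
    static String describe(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return "language=" + locale.getLanguage()
                + " script=" + locale.getScript()
                + " region=" + locale.getCountry();
    }

    public static void main(String[] args) {
        // Chinese, Simplified script, Singapore
        System.out.println(describe("zh-Hans-SG"));
        // Round trip back to the interchange form
        System.out.println(Locale.forLanguageTag("zh-Hans-SG").toLanguageTag());
    }
}
```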
Complex Types
• Data structures, APIs, or classes built from basic types must
include similar capabilities.
– Store data in a locale-neutral or independent format.
– Display in a language/regional/culturally sensitive manner
– Convert from locale format to locale-neutral or locale-independent
storage format.
Design Time and Data Structures
• Identify your own “locale bias”
– Field names matter!
• “Postal Code”, not “ZIP code”.
• Family Name/Given Name, not First Name/Last Name
– Avoid problematic fields
• Postal address parsing? Area code? Etc.
Currency
• Currency formatting is usually similar to number formatting. But things can vary widely here, too:
– $1,100.00 [USA]
– €1 100,00 [France-Euro]
– ¥1,100 [Japan]
– 1.100$00 Esc. [Portugal, obsolete]
– SFr. 1’000.00 [Switzerland]
• Currency associated with the locale doesn’t always apply. Store the currency type with the value.
– Use ISO 4217 std. codes (USD, JPY, EUR, RUB)
• Not always one symbol.
• Not always two decimal places.
• $100 + ¥100 = $101
• Consider neutral displays!
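One way to "store the currency type with the value" is sketched below in Java. The Money class is hypothetical, not an API from the deck or the JDK; it relies only on java.util.Currency for ISO 4217 codes and their default number of decimal places.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;

final class Money {
    private final BigDecimal amount;
    private final Currency currency;

    Money(String amount, String isoCode) {
        this(new BigDecimal(amount), Currency.getInstance(isoCode));
    }

    private Money(BigDecimal amount, Currency currency) {
        this.currency = currency;
        // Not always two decimal places: JPY has 0, USD has 2.
        this.amount = amount.setScale(currency.getDefaultFractionDigits(), RoundingMode.HALF_EVEN);
    }

    // Refuses to add amounts in different currencies: $100 + ¥100 is not $200.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("Convert currencies explicitly before adding");
        }
        return new Money(amount.add(other.amount), currency);
    }

    // A neutral display: plain digits plus the ISO 4217 code.
    String neutral() {
        return amount.toPlainString() + " " + currency.getCurrencyCode();
    }

    public static void main(String[] args) {
        System.out.println(new Money("1100", "USD").neutral());
        System.out.println(new Money("1100", "JPY").neutral());
    }
}
```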
Being Locale Neutral
• Avoid or reduce locale-affected display to increase portability
– Use unambiguous formats, such as ISO 8601-like dates, especially in log files and the like
• 2005-04-01 14:17:00 UTC
– Use consistent formats (‘user locale’), especially in columns or collections of data
Consistent (one format for every row):
Amount        Currency
351,234.56    USD
102,556.78    EUR
65,336.00     JPY
212,345.00    INR
Inconsistent (each row in its own currency’s format):
Amount        Currency
351,234.56    USD
102 556,78    EUR
65336         JPY
2,12,345.00   INR
“The String is the Thing”
• Text doesn’t get translated on the fly.
• Don’t use text as an identifier or foreign key.
– Use ID Numbers or not-human-readable values instead of requiring text
fields to match.
– “Intrinsic” data value versus “display” data value.
• Enumerated values displayed as strings.
• Use display strings.
Enumerated: ACCOUNTS_PAYABLE
Displayed: “Accounts Payable”, “pagável de clientes”
English-like Construction
• Concatenation
– string1 + string2
• Pluralization
– Dog + “s” = “dogs”
This topic will be covered in
greater depth in the section on
localization.
Databases
• Most databases can only handle one collation sequence per instance
or one collation per index.
– Remove reliance on alphalists.
– Self-collate short lists.
– Pre-collate long lists?
• Example: NLS_SORT controls the way Oracle returns data (collation
sequence).
– Global environment variable.
– Not necessarily under your control.
– Indices are built on a predetermined or binary sort.
Enabling Summary
• Understand Encodings and Unicode
– All text has an encoding!
• Be Locale-Aware
– Create locale-neutral data structures
– Separate display from storage
Dates, Times, Durations, Calendars
a little aside…
IT’S ABOUT TIME
Observed Time
Incremental Time
• Computed time based on “clock ticks” in an
“epoch”
– The epochal date is arbitrary. The UNIX epoch is
midnight, January 1, 1970, UTC.
Field Based Time
• Time based on calendric fields (day, month,
year, hour, minute, second)
• Some systems have data types for “field
based” time also.
What is a Time Zone
• A time zone is a geographical region or area
that has common rules for determining the
local observed time as it relates to monotonic
(computer) time.
• Distinctions include:
– Offset from UTC
– Daylight Savings (Summer Time) behavior
– Historic changes in offset or DST behavior
– Political control
Durations and Repeating Events
• Wall-time: this meeting is at 2 PM Pacific time every Tuesday
– interval between meetings may vary in number of seconds
• Daylight time transitions
• Changes in DST rules
• Fixed-duration: run the virus scanner every 57 minutes
– interval is always 3,420,000 milliseconds
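The difference between wall time and fixed duration can be sketched with java.time (Java 8 or later). The class and method names are illustrative. The chosen week spans the start of US daylight time on March 13, 2011, so the two ways of adding "a week" disagree by an hour.

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class RecurringMeeting {
    // Wall time: same local clock reading one week later.
    static ZonedDateTime nextWallTime(ZonedDateTime start) {
        return start.plusWeeks(1);
    }

    // Fixed duration: exactly 7 * 24 hours of elapsed time later.
    static ZonedDateTime nextFixed(ZonedDateTime start) {
        return start.plus(Duration.ofDays(7));
    }

    public static void main(String[] args) {
        ZonedDateTime start =
                ZonedDateTime.of(2011, 3, 8, 14, 0, 0, 0, ZoneId.of("America/Los_Angeles"));
        ZonedDateTime wall = nextWallTime(start);
        ZonedDateTime fixed = nextFixed(start);
        // Still 2 PM on the wall clock, but only 167 hours have elapsed.
        System.out.println(wall + " after " + Duration.between(start, wall).toHours() + "h");
        // Exactly 168 hours later, which is 3 PM on the wall clock.
        System.out.println(fixed + " after " + Duration.between(start, fixed).toHours() + "h");
    }
}
```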
Time Zone Affected Scenarios
• Zone independent
– only “incremental” times are necessary
• Local time, past only
– future changes to time zone rules not applicable
– example: logging system
• Local time, both past and future
– time zone rule changes may affect some time values
– example: calendar program
• Floating times
– events not tied to a specific time zone
– example: birthdate, start date, definition of “night” for phone usage
• Recurring events
– events that recur, sometimes during and sometimes not during daylight savings.
– example: weekly status meeting
Time Zone Scenarios
• Zone Independent — generally timestamps
that don’t refer to a specific time zone.
– Record local offset or (better) use UTC
– May want wall time for analysis
Time Zone Scenarios
• Local Time (Past Only): times that cannot change their relationship to DST
– Store zone ID and time value [may store offset instead of zone ID]
• Local Time (Past+Future): time values may need to change if DST rules change
– Store original offset along with zone ID and time value
– May require a database crawl if DST rules change
Time Zone Scenarios
• Floating Times — times that don’t change
regardless of where you are in the world.
– Publication dates
– Birth dates (or any anniversary date)
– Etc.
• Handle using UTC and avoiding zone casting
Time Zone Scenarios
• Recurring Events — time values that occur in
both DST and non-DST time
– Store time, recurrence period, zone ID, original
offset, and whether to tie recurrence to DST
Time Zone Identifiers
• Often based on the IANA time zone database (tzinfo) [formerly “Olson IDs”]
Offset: Etc/UTC, Etc/GMT+1
Continent/City: America/Los_Angeles, Europe/Paris, Asia/Tokyo, Antarctica/DumontDUrville
Continent/Region/City: America/Indiana/Indianapolis
Ocean/Island(City): Atlantic/Canary, Pacific/Auckland, Pacific/Pago_Pago
Time Zone Hints
• Only 21 countries have more than one time
zone (if you know the country, you often know
the time zone)
• Argentina, Australia, Brazil, Canada, Chile, Democratic Republic of the Congo, Ecuador, France, Greenland, Indonesia, Kazakhstan, Kiribati, Mexico, Micronesia, Mongolia, New Zealand, Portugal, Russia, Spain, and the United States.
– Of these, most have maritime or overseas regions.
Examples:
• Ecuador: Galapagos Islands
• Chile: Easter Island
• Portugal: Azores
Locale-Neutral Formats
• Use locale-neutral formats for interchange:
– ISO 8601
– Incremental time values (e.g. time_t)
– Distinguish time zone if necessary for
interpretation
• Offset is not the same as time zone
SQL data types and XML formats are often field-based, while programming languages are usually incremental.
At any given time, in UTC, it is the same time everywhere that time is measured.
Formatting Dates and Times
October 10, 14H 6:05:45 AM JST
Requires more than just a locale!
• date: the value being formatted (1034197545321L)
• time zone: defines relation to “wall time” (Asia/Tokyo)
• calendar: defines rules for calculating field values (Japanese Imperial)
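The three inputs on this slide can be traced in a short Java sketch using java.time (Java 8 or later). The class and method names are illustrative; the epoch value, zone ID, and calendar are the ones shown on the slide.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.chrono.JapaneseDate;
import java.time.temporal.ChronoField;

final class ThreeInputs {
    // Incremental time plus a time zone gives the wall time.
    static ZonedDateTime wallTime(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
    }

    // The calendar decides how the year field is computed.
    static int imperialYear(ZonedDateTime wallTime) {
        return JapaneseDate.from(wallTime).get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        ZonedDateTime tokyo = wallTime(1034197545321L, "Asia/Tokyo");
        System.out.println(tokyo);                          // 2002-10-10T06:05:45.321+09:00[Asia/Tokyo]
        System.out.println(JapaneseDate.from(tokyo).getEra() + " " + imperialYear(tokyo)); // Heisei 14
    }
}
```

The same epoch value in UTC falls on October 9, which is why the time zone has to be an explicit input to formatting.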
Externalization
Making software localizable
What is localization?
“What is localization?” Zula asked.
Peter sighed, letting her know it was a stupid
question.
“Translating foreign software into Hungarian,
making things work correctly in the special
environment of Hungary,” Csongor explained, and
Zula thought that she could glimpse, here, in the
way that he contentedly explained things, Csongor’s
father the school-teacher.
Reamde by Neal Stephenson
What is Localization?
• The process of tailoring a product to a specific
target market.
– Translation of messages
– Adaptation to local preferences
– Addition (or subtraction) of content or features
Localization is Obvious
… but it isn’t “internationalization”
• Localizability is internationalization.
– Externalize text
– Externalize presentation
– Dynamic composition
– Distribution of language content
– “Plug-in” features
What is a ‘Resource’?
any application component loaded dynamically at runtime, rather than compiled into the application
In localization: source code files containing language, region, or culturally-affected materials
– Text
– Error messages
– Icons
– Pictures
– Fonts
– Colors
– Graphics
– Sizes
– Positions
– Magic Numbers
– Mnemonics (“Alt+G”, “F4”, etc.)
– File Locations
– Dictionaries
– Glossaries
– Grammar Rules
– Code
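Why Resources?
[Diagram: Before and After views of an application. The items moved out of the code into resources: text, error messages, icons, pictures, fonts, colors, graphics, sizes, positions, magic numbers, mnemonics, dictionaries, glossaries, grammar rules, culturally specific code]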
Avoiding Forks
[Diagram: instead of an English Version and a separate Language +1 Version, one Global Binary loads a set of Resources for each language]
Forked Code Woes
• Hard to fix and maintain
• Different versions in the field
• Delays in releasing localized product
• Different functionality by region
• Confusing for customers/users
• Versions are not interoperable and might not be able to exchange data!
More Benefits
• Rename or re-brand product
• Fix spelling or grammar mistakes
• Fix usability
• Make terminology consistent
• Test drive new customer experiences, try new designs, etc.
… all without a rebuild!
"Project-Id-Version: blanket 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2011-03-23 15:43-0700\n"
"PO-Revision-Date: 2011-03-23 15:43-0700\n"
"Last-Translator: Richard Gillam <gillam (a] lab126.com>\n"
"Language-Team: en <kindle-i18n-team (a] lab126.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
# font
msgid "my.font.name"
msgstr "dialog"
#: progress bar in point based
msgid "progress_bar.rect"
msgstr "43.11,64.67,172.45,12.93"
msgid "progress_bar.border"
msgstr "2"
# bounding box: x_pos,y_pos,width,height
msgid "shutdown.cust_service.header.rect"
msgstr "0,14.65,258.68,12.07"
msgid "shutdown.cust_service.header"
msgstr "Repair Needed"
What’s wrong here?
String1 = There are
String2 = no
String3 = tables in
String4 = files
String5 = .
Messages I Could Build:
There are files.
There are no files.
There are 50 files.
There are tables in files.
There are no tables in files.
There are 50 tables in no files.
There are tables in.
Let’s Google Translate That!
Messages I Could Build, and their machine translations:
There are files. → Il ya des fichiers.
There are no files. → Il n'y a pas les fichiers.
There are 50 files. → Il ya 50 fichiers.
There are tables in files. → Il ya des tables dans des fichiers.
There are no tables in files. → Il n'ya pas de tables dans des fichiers.
There are 50 tables in no files. → Il ya 50 tableaux dans aucun fichier.
There are tables in. → Il ya des tables po.
Don’t Build Text From Fragments
• Text fragments are hard to translate
– Fragments may not follow grammar rules
– Cannot know which parts go together
– Parts can be reused in incompatible ways
• Internationalization APIs offer “patterns” to fix this:
[] files out of [] were deleted.
An error occurred at [] on [].
Page [] of []
Processing: []% complete.
Example: MessageFormat (Java)
• Number replacement variables.
• Provide typing and formatting information where possible.
• Externalize as a single unitary string.
There were {0} tables on {1}.
There were {0,number,integer} tables on {1,date,short}.
{1,date}に{0,number,integer}のテーブルがあった。
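A minimal Java sketch of the pattern approach with java.text.MessageFormat. The class name, the helper, and the argument values are made up for illustration. To keep the output independent of JDK date data, argument {1} is a plain string here rather than the date used on the slide.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class TableMessage {
    // Fills a whole-sentence pattern; the translator controls word order.
    static String render(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        // In a real product these patterns would be loaded from resources.
        String en = "There were {0,number,integer} tables on {1}.";
        String ja = "{1}に{0,number,integer}のテーブルがあった。";
        System.out.println(render(en, Locale.US, 1234, "server1"));
        System.out.println(render(ja, Locale.JAPAN, 1234, "server1"));
    }
}
```

Note that the Japanese pattern puts {1} before {0}: numbered arguments let each translation reorder the sentence without any code change.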
What’s My Gender
“Documenti del Chris”
“Documenti della Chris”
“Documenti - Chris”
More Issues With Text Composition
– There were one errors found.
– You have earned your 22th set of bonus points.
Sentence Parts Must Agree
• Endings, Gender, Plurality, Case
– e.g. Japanese counting uses different words for
different kinds of objects
– e.g. Slavic languages use different endings for
singular, few, many…
Complex Message Formatting
There were no errors.
There was 1 error.
There were 2 errors.
0:There were no errors.
1:There was {0} error.
2:There were {0} errors.
“choice format” APIs allow for different resources to be used based on runtime values.
Examples:
• ordinal numbers (1st, 2nd, 3rd, 4th, etc.)
• complex messages, such as “27 seconds ago” vs. “10 minutes ago”
0:не было ошибок
1:была {0} ошибка
2:были {0} ошибки
5:были {0} ошибок
The number of resources may need to vary by locale or language
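The English case can be sketched with the choice syntax that java.text.MessageFormat supports. The class name is illustrative, and the pattern would normally live in a resource file. This sketch covers only the three English ranges; languages with more plural categories, such as the Russian example, generally need a plural-aware API (ICU's plural formatting is one option, named here as a suggestion rather than something the deck prescribes).

```java
import java.text.MessageFormat;
import java.util.Locale;

final class ErrorCount {
    // One externalized resource holding all three English variants.
    private static final String PATTERN =
            "{0,choice,0#There were no errors.|1#There was {0} error.|1<There were {0,number,integer} errors.}";

    static String message(int count) {
        return new MessageFormat(PATTERN, Locale.US).format(new Object[] {count});
    }

    public static void main(String[] args) {
        System.out.println(message(0)); // There were no errors.
        System.out.println(message(1)); // There was 1 error.
        System.out.println(message(2)); // There were 2 errors.
    }
}
```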
Images and Icons
• Avoid metaphors
• Avoid cultural sensitivities
• Avoid body parts
• Replace as necessary
• Avoid putting text into graphics
– Graphic: $20
– Text: $0.06
Images and Culture
• Beware your
biases—even
“good” ones.
Meet your friends on our
new social website for
India
Isn’t it Swell?
English is very succinct.
– Words in other languages
are longer
– Sentences are longer
– Characters may be larger
More Swollen Text
• 30% in length (alphabetics, abjads, etc.)
• 30% in height (ideographics)
• But… a rule of thumb, not a “fact”
– Measure your results with care.
A Cautionary Tale
GUI Layout
Dereferencing
• Minimize sentence building
• Minimize arguments per string
• Use subject:predicate wherever possible
Don’t do this:
Your balance is $100.00.
When you can do this:
Balance: $100.00
Dynamic vs. Static Layout
• Magic numbers
• Externalized layouts
• Mnemonics
• Colors
Localizing Styles
• Bolding is not universal for emphasis
– Italicization, Capitalization, etc. are also not
universal (some scripts don’t have these
attributes)
• Use Logical not Presentational names
– Describe the function not the appearance. For
example, use “emphasis” instead of “italics”.
中国
Amikake
Wakiten
Use of Color
“Going Down”
“Going Up”
Non-Translatable Resources
• Some content should be externalized but not translated
– Sometimes referred to as “DNT” for “do not translate”
• Externalize? Yes…
– Segregate DNT material from translated material if possible (by
using separate resource files or separate resource blocks within a
file).
– Developers can’t always tell when something should or should not
be DNT… and neither can translators (context is missing)
The “Locale” in “Localization”
• Resources “fall back”
to find the best match
Global Binary
Resources
Falling back
zh-Hans-SG (Chinese, Simplified script, Singapore)
zh-Hans (Chinese, Simplified script)
zh (Chinese)
(root)
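The fallback chain above can be sketched as simple truncation of the language tag. This is an illustrative helper, not the lookup algorithm of any particular resource API; real implementations (for example Java's ResourceBundle) may consult additional candidates.

```java
import java.util.ArrayList;
import java.util.List;

final class FallbackChain {
    // Drops the last subtag repeatedly until only the root remains.
    static List<String> chain(String languageTag) {
        List<String> result = new ArrayList<>();
        String tag = languageTag;
        while (!tag.isEmpty()) {
            result.add(tag);
            int cut = tag.lastIndexOf('-');
            tag = cut < 0 ? "" : tag.substring(0, cut);
        }
        result.add("(root)");
        return result;
    }

    public static void main(String[] args) {
        System.out.println(chain("zh-Hans-SG")); // [zh-Hans-SG, zh-Hans, zh, (root)]
    }
}
```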
Sparse Population
• A given language resource may not contain a
complete set of resources.
– Resources fall back to another language for each sub-resource (such as a particular value)
French resources:
“appName” “Démo”
“dialogTitle” “Bonjour monde”
Root resources:
“appName” “Demo”
“maxRows” 57
“dialogTitle” “Hello World”
Getting the Right Locale
[Diagram: a client calling a system made of Front End, Business Logic, Data Store, and Operating Env. Locales in play: Client Locale, Server Locale, API Request Locale, System Mgmt Locale]
One request might serve multiple purposes or be seen in multiple contexts
Resources and Translation
“key”, “display string”
“dialogTitle”, “Dialog Title”
“aMessage”, “This is a message.”
Pseudo-Translation
“key”, “ðìsplàÿ stríñg”
“dialogTitle”, “Ðîálòg Tïtlè”
“aMessage”, “Thìß ís â Mésßãgê.”
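A pseudo-translator can be very small. The Java sketch below is one hypothetical approach: it swaps vowels for accented look-alikes, leaves {0}-style placeholders untouched, and wraps the result in brackets so truncated strings are easy to spot in the UI. The character mapping and the brackets are illustrative choices, not something the deck specifies.

```java
final class PseudoTranslator {
    private static final String PLAIN = "aeiouAEIOU";
    private static final String ACCENTED = "àéîöüÀÉÎÖÜ";

    static String pseudo(String source) {
        StringBuilder out = new StringBuilder("[");
        int depth = 0; // greater than zero while inside a {placeholder}
        for (char c : source.toCharArray()) {
            if (c == '{') {
                depth++;
            }
            int i = depth > 0 ? -1 : PLAIN.indexOf(c);
            out.append(i < 0 ? c : ACCENTED.charAt(i));
            if (c == '}' && depth > 0) {
                depth--;
            }
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(pseudo("Dialog Title"));    // [Dîàlög Tîtlé]
        System.out.println(pseudo("Page {0} of {1}")); // [Pàgé {0} öf {1}]
    }
}
```

Any string that still shows up unaccented in a pseudo-translated build is a candidate for a hard-coded (non-externalized) string.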
Keyboards
Input Method Editors
Some languages require software to assemble keystrokes into characters
• Asian languages with very large character sets
• Complex scripts with vowel-killers and other contextual editing requirements
Applications that interact directly with key-pressed events can disable or disrupt IME input.
• On- and over-the-spot editing
Customization
When is it okay?
• Content should be highly
localized or have locale-specific
requirements:
– customization lets you address
this requirement in the most
localized possible manner
Externalization again
[Diagram: Your Application with locale-specific items embedded in it: dates, numbers, images, colors, addresses, local rules, etc. (local rules, regulatory requirements, postal addresses, default bookmark lists, your company’s customer service phone numbers)]
Externalization again
• Locale-independent global binary
• Locale-dependent resources (includes code)
Large Animal Pictures
[Diagram: a Software Component built from Global Code plus Resources, with Input, Output, and I/O]
Customization Examples
• Postal address validation
• Postal code validation
• Telephone number formatter
• “Personality” questions
– blood type vs. sun sign
• Personal name formatter
– first/last position, space, highlighting, formality, etc.
• Tax codes and shipping schedules
[Diagram: a Generic API backed by a Generic Implementation, a US Implementation, a DE Implementation, and further implementations]
Example: Postal Addresses
US-only schema:
address1 varchar(32)
address2 varchar(32)
city varchar(16)
state char(2)
zip char(5)
i18n schema:
country char(2)
address1 varchar(64)
address2 varchar(64)
city varchar(64)
province varchar(64)
postcode varchar(64)
public interface Address {
public class genericAddress implements Address {
public class USAddress extends genericAddress {
public class UKAddress extends genericAddress {
country=US, postcode=‘WC2 1GH’ // error
country=UK, postcode=‘95111’ // error
country=DE, postcode=‘1A4喪’ // okay?
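The three validation cases above can be sketched in Java as a generic rule with per-country overrides. Everything here is hypothetical: the class name, the table-driven design, and both regular expressions, which are deliberately loose approximations rather than authoritative postal rules. The country key "UK" follows the slide; the ISO 3166 code is GB.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class PostcodeRules {
    // Country-specific customizations; anything not listed uses the generic rule.
    private static final Map<String, Pattern> RULES = new HashMap<>();
    static {
        RULES.put("US", Pattern.compile("\\d{5}(-\\d{4})?"));
        RULES.put("UK", Pattern.compile("[A-Z]{1,2}\\d[A-Z\\d]? ?\\d[A-Z]{2}"));
    }

    static boolean isValid(String country, String postcode) {
        Pattern rule = RULES.get(country);
        if (rule == null) {
            // Generic implementation: accept any non-blank value.
            return !postcode.trim().isEmpty();
        }
        return rule.matcher(postcode).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("US", "WC2 1GH")); // false
        System.out.println(isValid("UK", "95111"));   // false
        System.out.println(isValid("DE", "1A4喪"));    // true: no DE rule has been written
    }
}
```

The last case shows the trade-off behind "okay?": without a customization for a market, the generic implementation cannot reject implausible input.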
Building Global Software
Beyond Just Coding:
Localization, QA, and all that
The Internationalization Cycle
• Encompasses the full development cycle:
– Requirements
– Design
– Development
– QC
– Release
– Support
[Diagram: a cycle of Support Issues and Requests (all customers) → Develop Roadmap (where is the product going?) → Develop Requirements & Architecture → Design (internationalized) → Code (enable, externalize, modularize) → Test (non-English/non-ASCII) → RTM/GA (by market)]
What is “internationalization QA”?
• Does the enabled product work correctly?
– Non-English configurations
– Non-ASCII data and encoding support
– Cross time zone support
– Market specific features or customizations
• Does localization appear correctly?
– Is the product localizable?
What makes this different
from “regular” QA?
Growing (and Pruning) the Matrix
Include non-English configurations in your test
matrix; include non-ASCII data in your tests.
Be prepared to prune the test
matrix.
What to Test With
– Test Non-English configurations
• Non-English locales (lying to your machine)
• Native configurations (when does it make sense?)
– Test Non-ASCII data
• Encodings, encodings, everywhere
• Non-ASCII character values
– Test Across Time Zones
• Two or more time zones; consider international date
line (“it’s tomorrow in Japan”) and DST issues
Planning Testing
Initially
• Get tools that are enabled!
– Automation allows greater coverage, but only if it works.
• Plan encodings and locales as part of the test matrix.
• Acquire third-party products as necessary.
Increasing Maturity
• Use test driven development practices.
• Get developers to write unit tests that are internationalized.
• Put the ‘i18n’ bugs into the regression suite.
Configuring Machines
Create both native and simulated environments:
– Native operating systems may have minor but
sometimes critical differences (folder names,
keywords, localized registry entries)
– Most features don’t run into native differences
(easier to work with English-localized machines)
– Don’t buy physical keyboards (use software
keyboards) unless your application relies on scan
codes from keys
Incorporate Localization
Localization is part of the release process too.
– Changes to the user interface cost the localization
team time and money.
– (Changes to the product cost the documentation
and QA folks too)
• May need to institute change control or a UI
freeze
Simultaneous Shipment (Simship)
Ideally, to maximize opportunity, ship the target
languages the same day as the source language.
– It might not make sense for your product.
– But it might not be as difficult as you think it is. It
might even be good for you.
Distribution of Content
• How does the localized text get into the
running product?
– Satellite assemblies, DLLs, shared libraries
– Message catalogs
– Special directory
– Database
– Etc.
More Distribution
• “Specific Language” (per-language)
• “Language Included” (one or more languages)
• “Language Pack” (product plus something)
[Diagram: separate English, German, and French products; one product that includes English, German, and French; and an English Global Binary plus German and French packs]
Completing the Product
• Static content is often under source control and can be localized “normally”
• Dynamic content may include the initial set of data or other items which need to be localized beyond software.
– Demos and Demo Data
– Dictionary, Language add-ons
– Local offers, links to Web store, etc.
– Packaging
– Regulatory
Quality Checking and Development Methodologies
• Translation is a human-oriented task.
– Translation time lines are linear with volume.
• Localized product should be tested for functionality
– translation can break things
– usually the first language finds most of the bugs
• Translations should be checked for quality
• Development cycle has to include time for translators and quality assurance to catch up.
– This does not mean “no agile” or “no changes”
– Do pilot language(s) or moving-target translation; do better UI design and usability reviews; etc.
Summary
Internationalization
… is a fundamental architectural approach: it is
how software is built.
– Design
– Enabling
– Externalization
– Customization
– Testing and Support
– Lifecycle
Q&A
Would you write the code for I18N on the
whiteboard before you go?
#define UNICODE
#import I18N.h