Multilingual Support of WWW Applications in Ukraine and


Testing multilingual support in
Mail User Agents
TERENA Pilot Project
Yuri Demchenko, TERENA
TNC’98 Dresden October 5-8, 1998
1998. Yuri Demchenko. TNC'98, Dresden.
ML MUA Testing - TERENA Pilot Project
аз
TERENA Pilot Project on Testing
Multilingual MUAs
• Officially ran from April 1998 to September 1998
• The project objectives were to:
– Develop a benchmarking methodology for multilingual MUAs, and specify
templates for collecting the results in a coherent way.
– Design a set of composite multilingual test messages.
– Configure each MUA for all supported national character sets and send the
test messages to other MUAs and to itself.
– Compile the results, analyzing how each MUA composes, sends, receives
and displays the test messages.
– Prepare recommendations for users on the correct setup and operation of
popular multilingual MUAs.
буки
The list of mail clients to be tested
• Derived from TERENA MUA usage statistics, based on an analysis of more
than 3000 messages from the TERENA mail archives collected from
August 1997 to March 1998
Microsoft Windows (NT, 3.11, 95)
• Microsoft Outlook Express
• Netscape Mail 3.x and 4.x
• Netscape Messenger
• Qualcomm Eudora 3.0 and 4.0 beta
• Pegasus Mail
• The Bat!
• ESYS Simeon
• Alis Tango Mailer
UNIX Terminal
• Elm
• MH
• Pine
UNIX GUI (with X11R6)
• Netscape Mail
• EXMH
• Z-Mail
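The client list above was derived from usage statistics over the TERENA mail archives. The slides do not say how those statistics were extracted; the following is a minimal sketch under two assumptions: each message names its client in an `X-Mailer` or `User-Agent` header, and the archive is stored as a Unix mbox file. The helper names are hypothetical.

```python
import mailbox
from collections import Counter
from email.message import Message
from typing import Iterable


def count_agents(messages: Iterable[Message]) -> Counter:
    """Count messages per mail client (hypothetical helper)."""
    counts: Counter = Counter()
    for msg in messages:
        # Assumption: the client identifies itself in one of these two headers.
        agent = msg.get("X-Mailer") or msg.get("User-Agent") or "unknown"
        counts[str(agent).strip()] += 1
    return counts


def count_agents_in_mbox(path: str) -> Counter:
    """Hypothetical helper: apply count_agents to every message in an mbox archive."""
    return count_agents(mailbox.mbox(path))
```

For example, `count_agents_in_mbox("archive.mbox").most_common(20)` would list the twenty most frequent clients; messages whose client adds neither header are counted as "unknown" and would need manual review.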
веди
Activity and Projects in i18n and Multilingual Support
• i18n activity (ISO, IETF, ECMA, TERENA, Unicode Consortium)
• CEN/TC304 work on European character sets and keyboards
• MAITS project
• Internet Mail Consortium - Report on Using International
Characters in Internet Mail
• TERENA Pilot Project on Testing Multilingual support in MUAs
глагол
Internet Mail Consortium - i18n Report
Summary of recommendations
1. Explicit charset parameter
2. Sending UTF-8
3. Displaying UTF-8
4. Choosing charsets on creation
5. Specifying languages
6. Multi-language text
7. Non-ASCII headers
8. Handling all common charsets
9. MTAs and 8-bit content
The report strongly recommends that all mail-creating and mail-displaying
programs created or revised after January 1, 1999, be able to create and
display mail using UTF-8 and be able to handle all common charsets in
addition to UTF-8.
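Recommendations 1, 2 and 7 can be illustrated with Python's standard `email` package. This is a minimal sketch, not taken from the IMC report: it labels the body with an explicit `charset=utf-8` parameter and sends the non-ASCII Subject and sender name as RFC 2047 encoded-words. The function name, sample texts and `example.org` address are hypothetical.

```python
from email.header import Header
from email.mime.text import MIMEText
from email.utils import formataddr


def build_utf8_message(name: str, addr: str, subject: str, body: str) -> MIMEText:
    """Hypothetical helper: a text/plain message following IMC recommendations 1, 2, 7."""
    # Recommendations 1 and 2: UTF-8 body with an explicit charset parameter.
    # Python's defaults choose Base64 as the transfer encoding for UTF-8 bodies.
    msg = MIMEText(body, "plain", "utf-8")
    # Recommendation 7: non-ASCII header text is sent as RFC 2047 encoded-words.
    msg["Subject"] = Header(subject, "utf-8")
    msg["From"] = formataddr((name, addr), charset="utf-8")
    return msg


msg = build_utf8_message("Юрій", "yuri@example.org", "Привіт", "Текст листа")
print(msg.as_string())
```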
добро
Standards on i18n and Character Set Technologies
• ISO standards
– ISO 2022 Character Set Concept and Terminology
– ISO 8859-x Character Sets
– ISO standards on i18n APIs and FDCC
• Unicode standards
• RFC 2277 IETF Policy on Character Sets and Languages
• Recommendations of the IAB Workshop on character set technology
(RFC 2130)
• MIME message format (Using MIME in Internet Mail), RFC 2045 - RFC 2049
• RFC 822 - Syntax of electronic messages format according
есть
Standards in i18n and Multilingual Support in Internet Mail
• RFC 2045 - RFC 2049, RFC 2231 - MIME
– Coded Character Set
– Character Encoding Scheme, specified by the charset parameter of the
Content-Type header field
– Transfer Encoding Syntax, such as Base64 or QP, specified by the
Content-Transfer-Encoding header field
• RFC 2277 - IETF Policy on Character Sets and Languages
– main definitions and requirements for language tagging
• RFC 2130 - Recommendations of the IAB Workshop on character set
technology
– framework for interoperability between the many character sets in use
– an architecture model for on-the-wire transmission of text
– recommendations for tagging transmitted (and stored) text
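On the receiving side, the two MIME header fields named above tell the MUA how to undo the layers. A minimal sketch with Python's standard `email` package: `get_payload(decode=True)` removes the Transfer Encoding Syntax named by `Content-Transfer-Encoding`, and the `charset` parameter then selects the codec that turns octets back into characters. The helper names and the KOI8-R sample message are hypothetical.

```python
import base64
import email
from email.message import Message


def decode_text_body(msg: Message) -> str:
    """Hypothetical helper: undo the transfer encoding, then the charset, of a text part."""
    # Transfer Encoding Syntax: get_payload(decode=True) reads
    # Content-Transfer-Encoding (base64, quoted-printable, 7bit, 8bit).
    octets = msg.get_payload(decode=True)
    # Character Encoding Scheme: the charset parameter of Content-Type.
    # RFC 2045 makes us-ascii the default when the parameter is missing.
    charset = msg.get_content_charset() or "us-ascii"
    return octets.decode(charset)


def make_sample() -> Message:
    """Hypothetical test message: KOI8-R text carried in Base64."""
    body = base64.encodebytes("Добрый день".encode("koi8-r"))
    raw = (
        b"MIME-Version: 1.0\n"
        b"Content-Type: text/plain; charset=koi8-r\n"
        b"Content-Transfer-Encoding: base64\n"
        b"\n" + body
    )
    return email.message_from_bytes(raw)


print(decode_text_body(make_sample()))  # Добрый день
```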
живете
RFC 2130 Architecture model
• User interface issues (OS, GUI, API)
– Layout
– Culture
– Locale
– Language
• On-the-wire
– The Coded Character Set
– The Character Encoding Scheme
– The Transfer Encoding Syntax
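The three on-the-wire layers can be made concrete for a single word. A minimal sketch in Python, where the sample word, the helper name and the choice of charsets are illustrative: the Coded Character Set assigns code points (here taken from Unicode), the Character Encoding Scheme turns them into octets that differ between KOI8-U, Windows CP-1251, ISO 8859-5 and UTF-8, and the Transfer Encoding Syntax then makes those octets 7-bit safe.

```python
import base64
import quopri


def wire_layers(text: str, charset: str) -> tuple[list[int], bytes, bytes, bytes]:
    """Hypothetical helper: code points, charset octets and two transfer-encoded forms."""
    code_points = [ord(ch) for ch in text]    # Coded Character Set (Unicode/UCS)
    octets = text.encode(charset)             # Character Encoding Scheme
    b64 = base64.b64encode(octets)            # Transfer Encoding Syntax: Base64
    qp = quopri.encodestring(octets)          # Transfer Encoding Syntax: QP
    return code_points, octets, b64, qp


for cs in ("koi8-u", "cp1251", "iso8859-5", "utf-8"):
    points, octets, b64, qp = wire_layers("Київ", cs)
    print(cs, octets.hex(" "), b64.decode("ascii"), qp.decode("ascii"))
```

The code points are identical in every row; only the octets and their transfer-encoded forms change.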
зело
The testing and the evaluation scheme
[Diagram: a set of ML test messages passes from the sending MUA through the MTA to the receiving MUA]
Sending MUA (Message Composer, Message Editor, Message Sender)
• Set of ML Test Messages
• Compose Message (Type, Cut&Paste, Reply, Forward, Attachment)
• Compose Settings (Font (A, S, B, Q), Mapping)
• Change Settings (Language/Encoding)
• Send Settings (MIME (QP, Base64), uuencode)
• Sending Message to the MTA
• OS Environment (Language, KBD, TTFs, l10n, etc.)
Receiving MUA (Message Receiver, Message Reader)
• Receiving Message from the MTA
• Receiving Settings (MIME (QP, Base64), uuencode)
• Read Settings (Font (A, S, B, Q), Mapping)
• Change Settings (Language/Encoding)
• Read Message (Replied Msg, Forwarded Msg, Attachment)
• Message Reader (Human, User)
• OS Environment (Language, KBD, TTFs, l10n, etc.)
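The scheme compares what the sending MUA hands to the MTA with what the receiving MUA shows. A minimal sketch of that comparison, assuming Python's `email` package stands in for both MUAs and the serialized bytes stand in for MTA transport; no real MTA is involved, so only MUA-side encoding and decoding is exercised. The function name is hypothetical.

```python
import email
from email import policy
from email.header import Header
from email.mime.text import MIMEText


def round_trip(subject: str, body: str, charset: str) -> dict[str, bool]:
    """Hypothetical helper: compose, serialize and re-parse; report what survived."""
    # Sending side: compose with the chosen Language/Encoding setting.
    sent = MIMEText(body, "plain", charset)
    sent["Subject"] = Header(subject, charset)
    wire = sent.as_bytes()  # assumption: stands in for the message handed to the MTA
    # Receiving side: parse and decode the message as a reader would.
    received = email.message_from_bytes(wire, policy=policy.default)
    return {
        "subject": str(received["Subject"]) == subject,
        "body": received.get_content() == body,
    }


for cs in ("koi8-r", "windows-1251", "utf-8"):
    print(cs, round_trip("Тема", "Привет", cs))
```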
земля
Testing of Multilingual support in MUAs
• Includes the following phases:
– Evaluation of Multilingual features/settings of MUAs
– Testing Message Reading procedure
– Testing Message Composing procedure
– Testing Message Sending and Receiving procedure
иже
Evaluation of Multilingual
features/settings of MUAs
• READ operation mode
– choose Language/Encoding
– choose Fonts (Optional for Address, Subject, Message Body, Quoted Text)
• Optional - Font mapping
• COMPOSE operation mode
– choose Language/Encoding Settings
• Optional - Possibility to switch Language/Encoding during composition/typing
– choose Fonts (Optional for Address, Subject, Message Body, Quoted Text)
• Optional - choose Spelling/Language/Dictionary
• SEND operation mode
– set MIME encoding (Quoted Printable, Base64)
• Optional - select/disable Uuencode mode (non-standard)
– Allow/disallow 8-bit in Header Fields
– select/disable HTML in body parts
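The SEND options (Quoted-Printable versus Base64 for the body, and how non-ASCII header text is encoded) map onto per-charset settings in Python's `email.charset` module. A minimal sketch, assuming the user's choice arrives as a string and the same choice is applied to headers and body; the function name and sample text are hypothetical. Uuencode is left out because it is not a MIME transfer encoding.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header
from email.mime.text import MIMEText

ENCODINGS = {"base64": BASE64, "quoted-printable": QP}


def compose_with_settings(subject: str, body: str, charset: str, transfer: str) -> MIMEText:
    """Hypothetical helper: apply a SEND-mode transfer encoding choice."""
    cs = Charset(charset)
    cs.body_encoding = ENCODINGS[transfer]    # Content-Transfer-Encoding of the body
    cs.header_encoding = ENCODINGS[transfer]  # "b" or "q" encoded-words in headers
    msg = MIMEText(body, "plain", cs)
    msg["Subject"] = Header(subject, cs)
    return msg


qp_msg = compose_with_settings("Тема", "Привет", "koi8-r", "quoted-printable")
print(qp_msg.as_string())
```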
и
Message Reading procedure
• Multilingual MUAs should support the following features:
– Reading/Displaying non-ASCII characters in Message Body
– Reading/Displaying non-ASCII characters in Message Header (Address,
Subject Lines)
– Reading Forwarded Message with non-ASCII characters in Address,
Subject, Message Body, using the same or different MIME character set
attributes
– Reading Attached non-ASCII Text File (Document)
• Possible problems are detected by comparing the appearance of the
original and the delivered test messages
– This includes evaluating whether the MUA processes the MIME
attributes of the test message correctly.
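For the header checks, the MUA must decode RFC 2047 encoded-words, and each field may use its own charset. A minimal sketch with Python's `email` package and its `policy.default` parser, which performs this decoding; the helper names and the sample (KOI8-R Subject, Windows-1251 sender name, `example.org` address) are hypothetical.

```python
import email
from email import policy
from email.header import Header


def read_headers(raw: bytes) -> tuple[str, str, str]:
    """Hypothetical helper: decoded Subject, sender display name and sender address."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    sender = msg["From"].addresses[0]
    return str(msg["Subject"]), sender.display_name, sender.addr_spec


def make_test_headers() -> bytes:
    """Hypothetical message: KOI8-R Subject, Windows-1251 freeform name."""
    subject = Header("Привет", "koi8-r").encode()
    name = Header("Юрий", "windows-1251").encode()
    text = f"From: {name} <yuri@example.org>\nSubject: {subject}\n\nbody\n"
    return text.encode("ascii")


print(read_headers(make_test_headers()))
```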
како
Message Composing procedure
• Message composition operations to be tested
– Typing message from keyboard
– Copy and Paste operations
– Text/File attachments
– Quoted text/message
– Editing different parts of message
– Charset/Encoding processing by Message Composer/Editor
• Real message composition also includes operations such as:
– Typing non-ASCII text in Message Body and Message Header
– Pasting non-ASCII text into Body and Header fields
– Replying to a message with non-ASCII text
– Forwarding a message with non-ASCII content
– Attaching text documents containing non-ASCII characters
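Forwarding is one place where the original charset labels are easily lost. A minimal sketch, assuming the MUA forwards the whole original as a `message/rfc822` part (one common approach, not necessarily what each tested MUA did), so the original's own MIME headers, including its charset, travel unchanged. The helper name and sample texts are hypothetical.

```python
import email
from email.message import Message
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def forward_as_attachment(original: Message, note: str, note_charset: str) -> MIMEMultipart:
    """Hypothetical helper: wrap the original, with its own charset labels."""
    fwd = MIMEMultipart()
    # The forwarding note may use a different charset from the original.
    fwd.attach(MIMEText(note, "plain", note_charset))
    # message/rfc822 keeps the original headers and body encoding intact.
    fwd.attach(MIMEMessage(original))
    return fwd


original = MIMEText("Привет", "plain", "koi8-r")
forwarded = forward_as_attachment(original, "Пересылаю", "windows-1251")
received = email.message_from_bytes(forwarded.as_bytes())
inner = received.get_payload(1).get_payload(0)
print(inner.get_content_charset(), inner.get_payload(decode=True).decode("koi8-r"))
```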
люди
Test messages set
Each test is performed in at least 2 character sets: one of them US-ASCII (or
ISO 8859-1), the other containing characters that are not part of US-ASCII or
ISO 8859-1.
• Mandatory
– tmsg1 - Message with non-ASCII characters/text in the Subject line
– tmsg2 - Message with non-ASCII characters/text in Mail Address freeform name
– tmsg3 - Message with non-ASCII characters/text in the Message Body text
(single part)
– tmsg4 - Message with non-ASCII characters/text in text/plain attachment
• Optionally
– tmsg6* - Message with UTF-7/UTF-8 Character set in
Message Body and Header (optional)
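The mandatory test messages and the optional UTF-8 one can be generated programmatically for any charset. A minimal sketch with Python's `email` package, assuming one non-ASCII sample string per charset; the function name, sample text, addresses and attachment file name are hypothetical.

```python
from email.header import Header
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formataddr

SAMPLE = "Привіт світ"  # hypothetical non-ASCII sample text (Ukrainian)
ASCII_BODY = "Test message body in US-ASCII."


def make_test_messages(charset: str, text: str = SAMPLE) -> dict[str, Message]:
    """Hypothetical helper: tmsg1-tmsg4 in the given charset, plus tmsg6 in UTF-8."""
    msgs: dict[str, Message] = {}

    tmsg1 = MIMEText(ASCII_BODY, "plain", "us-ascii")
    tmsg1["Subject"] = Header(text, charset)  # non-ASCII Subject
    msgs["tmsg1"] = tmsg1

    tmsg2 = MIMEText(ASCII_BODY, "plain", "us-ascii")
    # non-ASCII freeform name in the address
    tmsg2["From"] = formataddr((text, "tester@example.org"), charset=charset)
    tmsg2["Subject"] = "tmsg2"
    msgs["tmsg2"] = tmsg2

    tmsg3 = MIMEText(text, "plain", charset)  # non-ASCII single-part body
    tmsg3["Subject"] = "tmsg3"
    msgs["tmsg3"] = tmsg3

    tmsg4 = MIMEMultipart()  # non-ASCII text/plain attachment
    tmsg4["Subject"] = "tmsg4"
    tmsg4.attach(MIMEText(ASCII_BODY, "plain", "us-ascii"))
    attachment = MIMEText(text, "plain", charset)
    attachment.add_header("Content-Disposition", "attachment", filename="tmsg4.txt")
    tmsg4.attach(attachment)
    msgs["tmsg4"] = tmsg4

    tmsg6 = MIMEText(text, "plain", "utf-8")  # optional: UTF-8 body and header
    tmsg6["Subject"] = Header(text, "utf-8")
    msgs["tmsg6"] = tmsg6
    return msgs


for name, msg in make_test_messages("koi8-u").items():
    print(name, msg["Subject"])
```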
мыслете
Testing program map
Test messages:
• tmsg1 - non-ASCII Subject
• tmsg2 - non-ASCII Address
• tmsg3 - non-ASCII Body
• tmsg4 - non-ASCII Attachment
• tmsg5 - non-Latin1 default
• tmsg6 - UTF8 in Body, Header
Tests:
• test-1 - display
• test-2 - print
• test-3 - reply to tmsg1, tmsg2
• test-4 - reply to tmsg3
• test-5 - reply to tmsg3 (Cut&Paste)
• test-6 - forward all
• test-7 - type from kbd
• test-8 - exchange tmsg5
• test-9 - test-1 to test-5 with tmsg6
наш
Testing Methodology: The tests to be performed
• test-1 - Receive all 4 test messages tmsg1-tmsg4 and display them correctly
(change the Language/Alphabet/Encoding options if needed)
• test-2 - Print all 4 messages tmsg1-tmsg4 on the standard printer
• test-3 - Reply to messages tmsg1 and tmsg2, and check that the information is
returned in the same character set it arrived in
• test-4 - Reply to message tmsg3 using "reply including quote of body"
• test-5 - Reply to message tmsg3 using the environment's "cut and paste"
function to insert the non-ASCII characters into the outgoing message
• test-6 - Forward all 4 messages to the originator address
• test-7 - Generate, as completely as possible, the same messages from the
keyboard of the IUT (implementation under test)
• test-8* - Check for possible text distortion when exchanging
tmsg1-tmsg3 with a non-ASCII default Language/Alphabet/Encoding
• test-9* - Perform tests 1-5 for message tmsg6* with UTF-7/UTF-8
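Tests test-3 and test-4 check that a reply keeps the charset of the incoming message and quotes its body. A minimal sketch of that behavior with Python's `email` package; it assumes a single-part text message, reuses the charset of the first encoded-word found in the Subject, and uses a simple `> ` quote prefix. The helper names and sample message are hypothetical.

```python
import email
from email.header import Header, decode_header, make_header
from email.message import Message
from email.mime.text import MIMEText


def header_charset(value: str, default: str) -> str:
    """Hypothetical helper: charset of the first RFC 2047 encoded-word, if any."""
    for _, charset in decode_header(value):
        if charset:
            return charset
    return default


def make_reply(original: Message) -> MIMEText:
    """Hypothetical helper: quote the body and keep the original charsets."""
    body_charset = original.get_content_charset() or "us-ascii"
    body = original.get_payload(decode=True).decode(body_charset)
    quoted = "".join(f"> {line}\n" for line in body.splitlines())
    reply = MIMEText(quoted, "plain", body_charset)

    raw_subject = original.get("Subject", "")
    subject = str(make_header(decode_header(raw_subject)))
    reply["Subject"] = Header("Re: " + subject, header_charset(raw_subject, body_charset))
    return reply


incoming = MIMEText("Привет\nмир", "plain", "koi8-r")
incoming["Subject"] = Header("Тема", "koi8-r")
parsed = email.message_from_bytes(incoming.as_bytes())
print(make_reply(parsed).as_string())
```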
он
Testing Results Presentation
MS Outlook Express 97 for Windows 95, URL: http://www.microsoft.com/outlook/
Results table layout:
• Rows - Language/Encoding Setting: Central European (ISO, Windows); Cyrillic (ISO, Windows, KOI8-R, KOI8-RU); ……; Universal Alphabet (UTF-7, UTF-8)
• Examination - non-ASCII text (8-bit) Send/Receive/Attachment: support of non-ASCII text in RFC 822 message parts/fields (Body, Subject, Address Freeform) as is, in MIME (QP, Base64), in UTF7/UTF8 and in HTML (Multipart/Alternative)
• Testing - support of non-ASCII text: Read; Compose (Type, Paste); Send; Forward message; Attached text; Messages List
• Notes, Problems, Recommendations
Cells are marked "+", some with footnote markers (**, ***, *4, *5, *6).
** The encoding of Cyrillic text cannot be changed when reading a message.
покой
ML MUAs Testing Results and Data
Analysis
• Testing results are documented and presented at
– http://park.kiev.ua/multiling/ml-mua/prjdocs/mlmua-repv1.html
• Standards overview on Internationalisation and Multilinguality
– http://park.kiev.ua/multiling/ml-mua/mldoc-review.html
• Test messages constructor pilot version
– http://park.kiev.ua/multiling/ml-mua/testcon.html
рцы
Evaluation of ML MUAs
• First group - MUAs that support multiple languages/alphabets through
support for multiple charsets and internal language/charset transformation
– Microsoft Outlook Express
– Netscape Messenger 4.04 and the previous product, Netscape Mail 3
– exmh for X Windows
• Second group - MUAs that provide ML support by selecting the proper font
for creating and displaying messages
– Eudora Pro 3.0
– Pegasus
– Forte Agent
– The Bat!
– Simeon
– UNIX terminal products: pine, elm
слово
First group - Full Multilingual Support
• Microsoft Outlook Express
– has the best and richest multilingual support
– uses an effective internal conversion scheme that users can control well
via setup and the Alphabet/Charset selection menu
• Netscape Messenger 4.04 and Netscape Mail 3.04
– provide rich multilingual support for many charsets/encodings
– but are very inflexible for languages that have many charsets in use (e.g.,
Cyrillic Windows CP-1251 and KOI8-R/U for Russian/Ukrainian, or ISO
8859-2 and Windows CP-1250 for Central European languages)
– Netscape products for X Windows have the same features.
• exmh for X Windows
– provides good support for the main groups of European languages
using Latin 1, Latin 2 and Cyrillic charsets
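The internal conversion scheme of the first group can be pictured as decoding every incoming charset into one internal representation and encoding it again for display or sending. A minimal sketch, assuming Unicode as the internal form (the slides do not say what these products use internally); the function name is hypothetical. It also shows a limitation: Ukrainian "ґ" exists in KOI8-U but not in KOI8-R, so it cannot survive a conversion to KOI8-R.

```python
def convert(octets: bytes, src: str, dst: str, errors: str = "strict") -> bytes:
    """Hypothetical helper: re-encode text from one charset to another via Unicode."""
    text = octets.decode(src)        # incoming charset -> internal Unicode text
    return text.encode(dst, errors)  # internal Unicode text -> target charset


koi8r = "Привет".encode("koi8-r")
print(convert(koi8r, "koi8-r", "cp1251").decode("cp1251"))  # Привет
print(convert("ґанок".encode("koi8-u"), "koi8-u", "koi8-r", "replace"))
```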
твердо
Second group – Simplified Multilingual Support
• Popular in the Latin1 (ISO 8859-1) and English-speaking community
• Language and charset/encoding support is provided by selecting the
proper font for creating and displaying messages.
– Eudora Pro 3.0
– Pegasus
– Forte Agent
– The Bat! (provides simple conversion between Cyrillic encodings: ISO
8859-5, Windows CP-1251, KOI8-R)
– Simeon
– pine and elm for UNIX
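With the font-based approach, the received octets are effectively reinterpreted in whatever 8-bit charset the selected font covers, regardless of the MIME `charset` label. A minimal simulation of that effect (not code from any of these MUAs; the function name is hypothetical): KOI8-R text shown through a Windows CP-1251 font becomes unreadable, while a matching font restores it.

```python
def view_with_font(octets: bytes, font_charset: str) -> str:
    """Hypothetical helper: simulate showing raw octets with a font for font_charset."""
    # The MIME charset label is ignored: each octet is drawn as the glyph the
    # font has at that position, which decoding with font_charset models.
    return octets.decode(font_charset, errors="replace")


received = "Привет".encode("koi8-r")       # body labelled charset=koi8-r
print(view_with_font(received, "koi8-r"))  # matching font: readable
print(view_with_font(received, "cp1251"))  # Windows Cyrillic font: garbled
```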
ук
Common problems of multilingual
support in MUAs
• Conversion between different encodings/charsets for the same
language
• Correct processing of MIME tags in message Header fields
(Subject and Address lines) when displaying a message whose header
charset differs from the charset of the Message Body
• The same problems occur when the user tries to change the
Charset/Encoding while displaying or composing a message, or uses
Copy&Paste operations between different charsets
• Viewing the message source and/or message info (charset/encoding
for the Header and Body, multipart MIME structure, and so on)
• Using common and correct terminology
for language/charset settings in MUAs
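The header/body charset mismatch and the need for a message info view can both be illustrated by listing, per header and per MIME part, the declared charset and transfer encoding. A minimal sketch with Python's `email` package; the `message_info` helper and the sample (KOI8-R Subject, Windows-1251 body) are hypothetical.

```python
import email
from email.header import Header, decode_header
from email.message import Message
from email.mime.text import MIMEText
from typing import Optional


def message_info(msg: Message) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Hypothetical helper: (item, charset, transfer encoding) for headers and parts."""
    info: list[tuple[str, Optional[str], Optional[str]]] = []
    for name in ("Subject", "From"):
        value = msg.get(name)
        if value:
            # Charsets named by RFC 2047 encoded-words in this header field.
            charsets = sorted({cs for _, cs in decode_header(value) if cs})
            info.append((f"header {name}", ", ".join(charsets) or None, None))
    for part in msg.walk():
        info.append((part.get_content_type(), part.get_content_charset(),
                     part.get("Content-Transfer-Encoding")))
    return info


sample = MIMEText("Привет", "plain", "windows-1251")
sample["Subject"] = Header("Тема", "koi8-r")
for row in message_info(email.message_from_bytes(sample.as_bytes())):
    print(row)
```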
ферть
Project’s Main Results
• The international environment of the project made it possible to identify
the main problems in multilingual MUA support
• Multilingual test message set
• Evaluation scheme for forthcoming ML MUAs
• Project activity was coordinated with other
multilingual-related projects:
– IMC MAIL-I18N report on Internationalization and Character Set
technologies
– Mozilla i18n project (Netscape 5.0)
• Project Team (PT) members contributed to the new Ukrainian-language-enabled Mozilla
• the proposed model of multilingual support in MUAs was discussed
– ESYS Simeon IMAP Mail multilingual features testing
хер
Follow-on Projects and Activity
• Testing new products using the proposed methodology
– New releases of Outlook Express 98, Netscape Messenger 4.5 and 5.0
– New 1999 products, which are expected to implement the
IETF/IMC recommendations
• Other areas of further activity
– Establishing a charset repository supporting ML/i18n, for online support of
multilingual mail (mapping reference table downloads, translation,
configuration, etc.)
– Creating a Web-based ML test message Constructor, a pilot version of which is
demonstrated on the project’s page
• http://park.kiev.ua/multiling/ml-mua/testcon.html
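For single-byte charsets, a mapping reference table of the kind such a repository could offer is simply a 256-entry octet table. A minimal sketch of deriving one from Python's codecs and applying it with `bytes.translate`; the function name is hypothetical, the repository format is not specified in the slides, and octets with no counterpart in the target charset are left unchanged here, which a real table would have to handle explicitly.

```python
def build_table(src: str, dst: str) -> bytes:
    """Hypothetical helper: 256-byte table mapping src-charset octets to dst-charset octets."""
    src_octets, dst_octets = bytearray(), bytearray()
    for octet in range(256):
        try:
            target = bytes([octet]).decode(src).encode(dst)
        except UnicodeError:
            continue  # no counterpart: the octet is left unchanged by the table
        if len(target) == 1:
            src_octets.append(octet)
            dst_octets.append(target[0])
    return bytes.maketrans(bytes(src_octets), bytes(dst_octets))


koi8r_to_cp1251 = build_table("koi8-r", "cp1251")
print(len(koi8r_to_cp1251))  # 256
print("Привет".encode("koi8-r").translate(koi8r_to_cp1251).decode("cp1251"))  # Привет
```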
Test Messages Constructor
от
http://park.kiev.ua/multiling/ml-mua/testcon.html
Test Messages Constructor
цы
Creating test message
червь
Project Team
Yuri Demchenko, TERENA
Konstantin Chuguev, Ural Technical University, Russia
Janja Faganel, Jozef Stefan Institute, Slovenia
Vadim Shevchenko, Kiev Polytechnic Institute
Alexey Medvedev, Kiev Polytechnic Institute
шта
Acknowledgments
• Borka Jerman-Blazic, Jozef Stefan Institute, Slovenia
• Claudio Allocchio, Sincrotrone Trieste & INFN Trieste, Italy
• Peter Heijmens Visser from TERENA, for providing the MUA usage
statistics
• Harald T. Alvestrand, Maxware Norway
ер
IMPORTANT NOTE
The Multilingual page will be moved to, and maintained on, the TERENA
web server
http://www.terena.nl/multiling/
еры
ерь
ять
ю
иа
юс малый
юс большой
кси
пси
фита
Russian/Ukrainian Languages
Historical overview
• VI-XI cent. - Ancient Rus written language
• X-XIV cent. - Cyrillic written language
– Invented by Cyril and Methodius (Saloniki) in the IX cent.
– First introduced in Moravia with the advent of Christianity
– Introduced in Kievan Rus with the advent of Christianity in the X cent.
• XIV-XVII - Formation of the Russian literary language
– With the formation of the Moscow State after the Mongol yoke
• XVII - Development of the modern Russian literary language
– Lomonosov, Pushkin
ижица
Ukrainian Literary Language
• Common ancient roots with Russian and all Slavic languages
• Was influenced by centuries of conquerors’ languages
– has features of an analytic language (like English)
• 1818 - Publication of the Grammar of the Ukrainian (Malorussian) dialect
– introduced the Ukrainian “і”, “Ґґ” (for “kg” sounds), and the spelling of “дз”, “дж”
– Formation of the modern Ukrainian literary language (Taras Shevchenko)
• 1921 - Publication of “Main rules of Ukrainian orthography”
• 1984 - introduction of the new/lost Ukrainian letter “Ґґ”