Internationalizing OpenClinica

Download Report

Transcript Internationalizing OpenClinica

Localizing OpenClinica

Hiroaki Honshuku: SQA 1

© What is Character Encoding?

 Morse Code (1840) → Latin Alphabet  ASCII (1963)  The American Standard Code for Information Interchange   Characters, Numerals, Symbols, Control Characters 7-bit: 0~127  0x41 = letter ‘A’, 0x61 = letter ‘a’  ISO-8859-n  8-bit: 0-255    iso-8859-1: Latin-1, covers most of European Language iso-8859-5: Cyrillic alphabet No CJK (Chinese, Japanese, Korean) support 2

© What is Character Encoding (cont.)  iso-8859-1 versus iso-8859-5 iso-8859-1 A B 0x65 0x66 iso-8859-5 A B 0x176 0x178 3

© What is Character Encoding (cont.)  iso-8859-1 versus iso-8859-5 iso-8859-1 A B 0x65 0x66 iso-8859-5 A B 0x176 0x178  CJK Encoding Mess    Chinese: Big5 (Traditional), GB18030 (Simplified) Japanese: iso-2022-JP, EUC-JP, Shift-JIS Korean: EUC-KR, KS C 5861 4

© What is Character Encoding (cont.)  iso-8859-1 versus iso-8859-5 iso-8859-1 A B 0x65 0x66 iso-8859-5 A B 0x176 0x178  CJK Encoding Mess    Chinese: Big5 (Traditional), GB18030 (Simplified) Japanese: iso-2022-JP, EUC-JP, Shift-JIS Korean: EUC-KR, KS C 5861  Windows propriety Encoding  CP1252, CP932, etc 5

© Unicode  1887: Apple + Xerox  1991: Unicode Consortium 6

© Unicode  1887: Apple + Xerox  1991: Unicode Consortium 

UTF-8

: 1,112,064 Code Points  Standard    ASCII Compatible Unix, Linux, Mac OS Big Endian 7

© Unicode  1887: Apple + Xerox  1991: Unicode Consortium 

UTF-8

: 1,112,064 Code Points  Standard    ASCII Compatible Unix, Linux, Mac OS Big Endian 

UTF-16

(UCS-2) : 1,112,064 Code Points  Windows Only  Little Endian: Requires BOM (Bite Order Marker) 8

© OpenClinica and i18n  i18n Support since 3.1.3

     OpenClinica i18n Work in Progress  Data Mart   Response OptionText CRF Name Discrepancy Note data passing Escaping Ctrl Chars and MS Propriety Chars  Should detect at CRF upload Hard-coded strings Missing encode declaration in some Export formats 9

© Microsoft Specific issues  Display issues on Windows  Pre-Win7, GUI was not fully UTF-8 compatible  Displayed character corruption after saving data  Viewing extracted data  Use UTF-8 compatible Text Editor  Never Copy/Paste from MSOffice 10

© Demonstration  Search Subjects and Tables  CRF and Data Entry  Discrepancy Notes  Rules  Data Import  Data Extract 11

© How to Localize  Documentation  https://docs.openclinica.com/3.1/technical documents/openclinica-and-internationalization  UTF-8 Converter  i18n strings needs to be Hex value   http://www.branah.com/unicode-converter Calendar Widget can take UTF-8 strings  Pseudo Translation  Insert one distinctive non-ASCII character   Duplicate English properties files first Search “ = “ and replace by “ = \u8a66” 12

© How to Localize (cont.) 1.

 Duplicate English properties files Exclude licensing.properties

13

© How to Localize (cont.) 1.

 Duplicate English properties files Exclude licensing.properties

2.

Rename duplicated files to your Locale NO 14

© How to Localize (cont.) 1.

 Duplicate English properties files Exclude licensing.properties

2.

Rename duplicated files to your Locale 3.

 Date Format Edit format.properties file 15

© How to Localize (cont.) 1.

 Duplicate English properties files Exclude licensing.properties

2.

Rename duplicated files to your Locale 3.

 Date Format Edit format.properties file 4.

  Translate per GUI page Avoids possible legacy strings Use Text Editor that supports global search 16

© Thank You!

17