F03: “Text” goes in, “mxyzptlk” comes out! Debugging code page problems


Exchange
2002
PROGRESS WORLDWIDE
“Text” goes in,
“mxyzptlk” comes out!
Debugging code page problems
Tex Texin
Director, International Business
the Progress Company
“Text” goes in, “Mxyzptlk” comes out!
[Diagram: text goes into files and a database server; “mxyzptlk” comes out]
Exchange 2002, Chicago, IL, USA
© 2002, Progress Software Corporation
Agenda
What is mxyzptlk?
What is text to a computer?
Code pages (single-byte)
The Progress architecture for text
Code pages (multi-byte)
Problem solving
What is “mxyzptlk”?
a) Garbled text (aka “garbage”)
b) “Mojibake” in Japan
c) A magical imp that plays tricks on Superman
d) M6K
e) All of the above
Cartoon courtesy Warner Bros, Inc.
Mr. Mxyzptlk is a trademark of DC Comics
In Japan, it’s called Mojibake
Agenda: What is text to a computer?
What is text (to a computer)?
A rose by any other name?
41 20 72 6F 73 65 20 62 79 20 61 6E 79 20 6F 74 68 65 72 20 6E 61 6D 65 3F
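As a quick check of the byte values above, a minimal Python sketch (Python is used only for illustration; it is not part of Progress) can encode the sentence as ASCII and print each byte in hex:

```python
# Encode a short sentence and show the numeric values a computer stores.
text = "A rose by any other name?"
encoded = text.encode("ascii")        # ASCII: one byte per character
hex_dump = encoded.hex(" ").upper()   # bytes.hex(sep) requires Python 3.8+

print(hex_dump)
# 41 20 72 6F 73 65 20 62 79 20 61 6E 79 20 6F 74 68 65 72 20 6E 61 6D 65 3F
```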
“Text? We don’t need no stinkin’ text”


Computers (software) do not know about text. They only know binary values.
By assigning numbers to characters, and associating behaviors and values with those numbers, programs give the illusion of characters, text and their semantics.
– Behaviors: word breaking, hyphenation
– Values: glyph image, case, alphabetic, numeric, sort
What makes text, text?






Typing, Input method
Display
Print
Digits
Operators
“Words”




Collation
Word wrapping
Justification
Hyphenation
– Next, Previous


Punctuation
Upper, Lower case
Agenda: Code pages (single-byte)
Code pages
A collection of ordered symbols
Take a collection of symbols
– Letters, digits: a, å, 1, 2, ...
– Punctuation, arithmetic operators
– Special symbols, e.g. ©, ¶, ¥, £, §
– Line drawing
– Control codes
Assign each a unique number (code point):
– å = 229 in ISO 8859-1
– å = 134 in IBM 850
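A minimal Python sketch of the same point, assuming Python's codec names `latin-1` (ISO 8859-1) and `cp850` (IBM 850) stand in for the Progress code page names:

```python
# The same character, two code pages, two different code points.
ch = "å"
iso_8859_1 = ch.encode("latin-1")[0]   # Python's name for ISO 8859-1
ibm_850 = ch.encode("cp850")[0]        # Python's name for IBM 850
print(f"{ch}: ISO 8859-1 = {iso_8859_1}, IBM 850 = {ibm_850}")

# Reading the ISO 8859-1 byte as if it were IBM 850 gives another character.
misread = bytes([iso_8859_1]).decode("cp850")
print(f"byte {iso_8859_1} read as IBM 850: {misread}")
```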
Western Europe (CP 1252) code page
[Code chart: rows begin at code points 32, 64, 96, 128, 160, 192 and 224]
Characters in the range 128-159 in 1252 are not in ISO 8859-1.
Don’t label text as ISO 8859-1 if it is Windows-1252.
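A Python sketch of why the label matters, using Python's `cp1252` and `latin-1` codecs: it counts the bytes in 128-159 that Windows-1252 assigns, then decodes the same bytes under both labels. The sample bytes are illustrative, not taken from the slides.

```python
# Which byte values in 128-159 are assigned in Windows-1252?
assigned_1252 = []
for value in range(128, 160):
    try:
        bytes([value]).decode("cp1252")   # strict decoding
        assigned_1252.append(value)
    except UnicodeDecodeError:
        pass                              # 0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned

# Curly quotes and a euro sign typed on Windows: “Hi” €5
sample = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x80, 0x35])
as_1252 = sample.decode("cp1252")       # what the author typed
as_8859_1 = sample.decode("latin-1")    # same bytes mislabeled as ISO 8859-1
print(as_1252)
print(repr(as_8859_1))                  # C1 control codes, not quotes or €
```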
Russian Windows code page CP1251
[Code chart: rows begin at code points 32, 64, 96, 128, 160, 192 and 224]
If a character is not in the current code page, how can it be expressed?
?
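One way to see the answer is a small Python sketch: when the target code page (here ISO 8859-1 via Python's `latin-1` codec) lacks a character, a converter can substitute a placeholder such as “?”, write a numeric reference, or fail. These are Python's codec error handlers, shown as an assumption about typical converter behavior rather than as what Progress does.

```python
ch = "я"  # CYRILLIC SMALL LETTER YA (U+044F), not in ISO 8859-1

replaced = ch.encode("latin-1", errors="replace")              # b'?'
referenced = ch.encode("latin-1", errors="xmlcharrefreplace")  # b'&#1103;'
try:
    ch.encode("latin-1")                                        # strict: raises
    representable = True
except UnicodeEncodeError:
    representable = False

print(replaced, referenced, representable)
```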
If a file contains a 255, what
character does it represent?
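A Python sketch that answers the question by decoding the single byte 255 under several code pages; the list of code pages is an arbitrary illustrative choice, named with Python's codec identifiers:

```python
# Decode the single byte 255 under several code pages.
CODEPAGES = ["latin-1", "cp1252", "cp850", "cp1251", "koi8_r", "iso8859_5"]
readings = {cp: bytes([255]).decode(cp) for cp in CODEPAGES}

for cp, ch in readings.items():
    print(f"{cp:>10}: {ch!r} (U+{ord(ch):04X})")
```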
Text: numeric values with associated attributes and behaviors
Alphabet recognition: Aa1+¿
Case: Zz߮9&
Collation: “XYZ” > “ABC” < “5”
Word break: I paid $123.45 for a Z-3
Number recognition: He paid €1k for a 1998
Arithmetic: 4+5
Conversion: Windows to Unix
Rules for “text” vary with language
Sorting, for example:
English: ABC...RSTUVWXYZ
German: AÄB...NOÖ...SßTUÜV…YZ
Swedish/Finnish: ABC...RSTUVWXYZÅÄÖ
Norwegian: ABC...VWXÜZÆØÅ (note: Ü sorts as Y)
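To make the rules concrete, here is a simplified Python sketch that sorts the same words with a German-style key (umlauts sort with their base letter) and a Swedish-style key (Å, Ä, Ö after Z). The alphabets and the tie-breaking rule are assumptions for illustration; they are not Progress's collation tables.

```python
# Simplified alphabets from the slide (hypothetical; real collations also
# handle accents, case and multi-letter rules such as German ß).
SWEDISH = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"
GERMAN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
GERMAN_EQUIVALENTS = {"Ä": "A", "Ö": "O", "Ü": "U"}  # umlaut sorts with base letter

def make_sort_key(alphabet, equivalents=None):
    rank = {letter: i for i, letter in enumerate(alphabet)}
    equivalents = equivalents or {}

    def key(word):
        primary = [rank[equivalents.get(c, c)] for c in word.upper()]
        return (primary, word)  # ties broken by code point
    return key

words = ["Zorn", "Äpfel", "Öl", "Ost", "Apfel"]
german = sorted(words, key=make_sort_key(GERMAN, GERMAN_EQUIVALENTS))
swedish = sorted(words, key=make_sort_key(SWEDISH))
print("German: ", german)
print("Swedish:", swedish)
```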
What happens in the operating system when you press a key?
Keyboard generates scan codes: Alt + Row C key 2
Keyboard driver maps to character values + controls: Alt + 115
O/S (code page) tables define character behavior: 115 => letter, lowercase, sort rank = a + 18
Font tables map code point to glyph (image): 115 => s
Agenda: The Progress architecture for text
Text rules that Progress “knows”
Alphabet recognition: FORMAT “A(9)”
Case: CAPS, LC
Collation: “XYZ” > “ABC”
Word break: Where x contains y
Number recognition: INTEGER(mystring)
Arithmetic: 4+5
Conversion: INPUT FROM
Conversion: OUTPUT TO
Defining text in Progress
Tables in DLC/prolang/CONVMAP/*.DAT, compiled into CONVMAP.CP:
– ISALPHA: “Type 1” defines alphabetic characters; “Type 2” defines multibyte lead/tail bytes
– UPPERCASE-MAP
– LOWERCASE-MAP
– CONVERT
– COLLATION
WordBreak tables: *.WBT in …/CONVMAP/
From DLC/PROLANG/CONVMAP/*.DAT
CONVERT
 SOURCE-NAME "ISO8859-1"
 TARGET-NAME "IBM850"
 TYPE "1"
 /*000-015*/ 000 001 002 003 004 005 ... 015
 …
 /*208-223*/ 209 165 227 224 226 229 ...
 /*224-239*/ 133 160 131 198 132 134 ...
 /*240-255*/ 208 164 149 162 147 228...
 ENDTABLE
 ENDCONVERT
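The rows above can be cross-checked with a short Python sketch that derives the ISO8859-1 to IBM850 mapping from Python's `latin-1` and `cp850` codecs. It covers only code points 160-255 (Progress's own handling of the 128-159 control range is not reproduced), and it does not read the *.DAT file.

```python
# Sketch: rebuild rows of the ISO8859-1 -> IBM850 CONVERT table from
# Python's codecs, for the printable range 160-255 only.
def build_convert_table(source="latin-1", target="cp850", first=160, last=255):
    table = {}
    for code_point in range(first, last + 1):
        ch = bytes([code_point]).decode(source)   # code point -> character
        table[code_point] = ch.encode(target)[0]  # character -> target code point
    return table

table = build_convert_table()
for start in (208, 224, 240):
    row = " ".join(f"{table[c]:03d}" for c in range(start, start + 6))
    print(f"/*{start:03d}-{start + 15:03d}*/ {row} ...")
```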
CONVMAP.CP
In Progress 9.1C:
Over 50 character sets
25 collations, including:
– Arabic9, Basic, Basic9, Croatian, Czech, Danish, Finnish, German Library, German9, Greek, Hebrew, Hungarian, Icelandic, Latvian, Lithuanian, Norwegian, Polish, Romanian, Russian, Spanish9, Swedish, Thai, Turkish
2 case rules (Basic, French)
300+ code page conversions
ISALPHA tables map characters by code point to a true/false value
Code page = ISO8859-1
Character   Code point   Alpha value
a           97           1
b           98           1
’           146          0
ç           231          1
Case tables map characters by code point to uppercase and lowercase code points
Code page = ISO8859-1
Character   Code point   Uppercase              Lowercase
a           97           65                     97
A           65           65                     97
c           99           67                     99
é           233          69 (E)? or 201 (É)?    233
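A Python sketch of how ISALPHA and case-map tables like the two above could be derived for a single-byte code page from Unicode character properties. It is an illustration only; Progress uses the tables in CONVMAP, and `build_tables` and `map_case` are hypothetical names.

```python
# Derive ISALPHA and case-map tables for a single-byte code page from
# Python's Unicode properties (a sketch; Progress reads its own tables).
def map_case(ch, code_point, codepage):
    """Return the code point of ch in codepage, or keep code_point if the
    mapping leaves the code page or expands to several characters."""
    try:
        encoded = ch.encode(codepage)
    except UnicodeEncodeError:
        return code_point
    return encoded[0] if len(encoded) == 1 else code_point

def build_tables(codepage="latin-1"):
    isalpha, upper, lower = {}, {}, {}
    for code_point in range(256):
        ch = bytes([code_point]).decode(codepage)
        isalpha[code_point] = 1 if ch.isalpha() else 0
        upper[code_point] = map_case(ch.upper(), code_point, codepage)
        lower[code_point] = map_case(ch.lower(), code_point, codepage)
    return isalpha, upper, lower

isalpha, upper, lower = build_tables()
for ch in "aAcé":
    cp = ord(ch)  # for ISO 8859-1, the code point equals the Unicode value
    print(ch, cp, isalpha[cp], upper[cp], lower[cp])
```

Python's Unicode rules give É (201) for é; the slide's “E or É?” question is the kind of choice a case table encodes, and this sketch simply follows Unicode. Characters whose uppercase leaves the code page or expands, such as ÿ and ß, keep their own code point here.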
Two collation tables (case-sensitive and case-insensitive) map characters by code point to sort weight
Code page = ISO8859-1
Character   Code point   Case-sensitive weight   Case-insensitive weight
a           97           1                       1
A           65           2                       1
c           99           5                       5
ç           231          5                       5
ç sorts like c, having identical sort weights.
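A Python sketch of sorting with collation weights like those in the table. The weights cover only the four characters shown on the slide and are otherwise hypothetical; Python's `sorted` is stable, so strings with equal weights keep their input order.

```python
# Hypothetical collation weights for four ISO 8859-1 characters, copied from
# the slide; a real table assigns a weight to every code point.
CASE_SENSITIVE = {97: 1, 65: 2, 99: 5, 231: 5}    # a, A, c, ç
CASE_INSENSITIVE = {97: 1, 65: 1, 99: 5, 231: 5}

def sort_key(text, weights, codepage="latin-1"):
    """Map each code point of text to its sort weight."""
    return [weights[b] for b in text.encode(codepage)]

words = ["ca", "Aa", "ça", "aa"]
sensitive = sorted(words, key=lambda w: sort_key(w, CASE_SENSITIVE))
insensitive = sorted(words, key=lambda w: sort_key(w, CASE_INSENSITIVE))
print(sensitive)
print(insensitive)
```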
Conversion tables map characters by code point to other code points
Code page = ISO8859-1
Character   ISO 8859-1 code point   IBM 850 code point
a           97                      97
b           98                      98
’           146                     202
ç           231                     135
Code page startup parameters
-cpinternal - code page in memory and GUI
-cpstream - code page for in/out data
-cpprint - printer code page
-cpterm - terminal’s code page
-cpcase - upper/lower case rules
-cpcoll - collation table for 4GL, not DB
-convmap - convmap.cp directory
-cprcodein - override R-code code page
-cprcodeout - R-code code page
Items labeled with code page
R-code files
Progress and other databases
Dump files (.df, .d)
Promsgs files
Progress database also “labels” collation and word break
HTML, XML files can be labeled
Russian Progress configuration
(every blue line is a conversion)
-CPINTERNAL 1251 -CPCASE BASIC -CPCOLL RUSSIAN
[Diagram: clients, a database server, DOS files, a printer and r-code, connected through -CPINTERNAL, -CPPRINT and -CPSTREAM; code pages shown include 1251, ISO 8859-5, KOI8-R and IBM866; the database is labeled 1251 with RUSSIAN collation, and the r-code is labeled]
Startup parameters and labels determine table choices
– Collation for 4GL: cpinternal+cpcoll, or cpinternal+db collation
– Collation for indexes: db collation
– Case: cpinternal+cpcase
– Conversion: cpinternal+cpstream
– Conversion: cpinternal+cpprint
– Conversion: cpinternal+cpterm
– Conversion: cpinternal+item cp
– Conversion: client+server cp
Agenda: Code pages (multi-byte)
Traditional Chinese (CP Big-5)
Double-byte programming
Japanese, Chinese, Korean only
– Thousands of characters require a code page of more than 8 bits
Some characters are 1 byte, some 2 bytes
Problems are caused by:
– Assuming 1 character is 1 byte
– Assuming 1 character is 1 column
– Manipulating bytes instead of characters
Japanese, Chinese, Korean
Languages with >255 characters
Mixed-size characters: 1 or 2 bytes
How long is a DBCS string?
String:     A b c 日 本 語 d e
Byte type:  S  S  S  L  T  L  T  L  T  S  S
# Chars:    1  2  3  4  4  5  5  6  6  7  8
# Bytes:    1  2  3  4  5  6  7  8  9  10 11
(S = single-byte character, L = lead byte, T = tail byte)
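The counts in the table can be reproduced with a Python sketch, assuming Python's `cp932` codec (Microsoft's Shift-JIS variant) as the double-byte code page. Column width uses Unicode East Asian Width as a stand-in for a terminal's rules:

```python
import unicodedata

text = "Abc日本語de"
encoded = text.encode("cp932")   # 1 byte per Latin letter, 2 per kanji

def display_columns(s):
    # Wide (W) and fullwidth (F) characters occupy two terminal columns.
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1 for c in s)

print("characters:", len(text))
print("bytes:     ", len(encoded))
print("columns:   ", display_columns(text))
```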
Unicode character set
A worldwide, multilingual code page
UTF-8 is multibyte!
[Image: example Unicode characters]
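A Python sketch showing that UTF-8 uses one to four bytes per character; the sample characters are an arbitrary choice:

```python
SAMPLE = "Aç日𝄞"  # ASCII, Latin-1 accented, CJK, and a character beyond U+FFFF
sizes = {}
for ch in SAMPLE:
    encoded = ch.encode("utf-8")
    sizes[ch] = len(encoded)
    print(f"U+{ord(ch):04X} {ch}: {len(encoded)} byte(s) {encoded.hex(' ').upper()}")
```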
Agenda: Problem solving
“When I print a report I get garbage characters in my report. Can you help me?”
John or Jill Q. Public
“Text” in, “Mxyzptlk” out!
Start debugging. What is the 1st step?
[Diagram: text goes into files and a database server; “mxyzptlk” comes out]
Techniques at your disposal
Identify all the players
Keyboard->client->server->database->server->client->printer
Code pages: A (keyboard, client), B (server), C (database), B (server), A (client), D + font (printer)
“ç” > client > AB convert > BC convert > CB convert > BA convert > AD convert > font > “ç”
Which components are involved?
– Devices (terminal, printer)
– Operating system(s)
– Drivers, 3rd party S/W (Java, ODBC, OCX, ...), and are they internationalized?
– Progress components (client, AppServer, ...)
Which code pages do they use?
Which conversions are performed, and when?
Determine expected and actual results
Identify, as much as possible:
Characters that went in and resulting characters
Determine their code points if possible.
Evaluate inputting other characters and the results
How they were input (keyboard, paste, file, net, ...)
Determine expected and actual results
Identify, as much as possible:
How they were output (terminal, printer, file, ...)
All steps (processing) in between
Related information (e.g. OS, fonts, regional settings, ...)
Caution: 3rd party S/W may require regional settings in unique ways or with different values
Confirm the Progress environment
Use 4GL statements for verification
– DBCODEPAGE (db_id)
– DBCOLLATION (db_id)
– GET-CODEPAGES
– GET-COLLATIONS (codepage)
– SESSION:CPINTERNAL, CPSTREAM, etc.
– RCODE-INFO
– ASC
– CHR
Techniques at your disposal
Compare with other code pages
“ç” went in. Which value came out?
Does the pattern match a conversion table?
The letter “ç” in different code pages:
– 1252: 231
– ISO 8859-1: 231
– ROMAN-8: 181
– IBM 850: 135
– IBM 273: 072
How do I find out about code pages?
Windows accessory utility: Charmap
DLC/prolang/convmap/*.dat
4GL functions:
– GET-CODEPAGES, CODEPAGE-CONVERT
Popular code pages are on the web
Printing IBM 850 vs. ISO 8859-1
Print 135, when 231 is needed...
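A Python sketch of the printing problem. Python's `latin-1` codec decodes 135 to an invisible C1 control code, so this sketch decodes with `cp1252` (Windows-1252) to make the wrong glyph visible; that printer setup is an assumption.

```python
data = "ç"
sent = data.encode("cp850")                       # IBM 850 code point 135
printed = sent.decode("cp1252")                   # printer set up for Windows-1252
reverse = data.encode("latin-1").decode("cp850")  # 231 sent to an IBM 850 printer
print(sent[0], repr(printed), repr(reverse))
```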
Compare with other code pages
Conversion scenarios to consider
Keyboard->client->server->database->server->client->printer
Code pages: A (keyboard, client), B (server), C (database), B (server), A (client), D + font (printer)
“ç” > client > AB convert > BC convert > CB convert > BA convert > AD convert > font > “ç”
– Correct source to wrong target: A-X
– Correct target from wrong source: X-B
– Inverse conversion: B-A
– Extra conversion: A-B-X
– Missing conversion: A-A
Caution: multiple wrongs can seem right
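A Python sketch of some of these scenarios, with ISO 8859-1 as code page A and IBM 850 as code page B (Python codecs `latin-1` and `cp850`). The helper `convert` is hypothetical; the last lines show how two missing conversions can make bad stored data look right.

```python
def convert(data, source, target):
    """Reinterpret bytes from the source code page in the target code page."""
    return data.decode(source).encode(target)

A, B = "latin-1", "cp850"          # A = ISO 8859-1, B = IBM 850
original = "ç".encode(A)           # 231

correct = convert(original, A, B)  # A-B: 231 -> 135, shows ç on a B device
missing = original                 # A-A: 231 unchanged, a B device shows þ
inverse = convert(original, B, A)  # B-A: 231 read as IBM 850 (þ) -> 254

# Multiple wrongs can seem right: a B terminal stores its bytes with no
# conversion into A-labeled data, then reads them back with no conversion.
stored = "ç".encode(B)             # 135 stored where 231 is expected
echoed = stored.decode(B)          # the same terminal displays "ç" again
other_client = stored.decode(A)    # an ISO 8859-1 client sees a control code
```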
Compare with other code pages
Consider mislabeling
Mislabeling of the data’s code page
– E.g. new Euro code pages (ISO 8859-15)
– Asian and other code pages have vendor variations (e.g. “\” as a currency symbol)
Especially data labeled “ISO 8859-1”
– Windows code pages are commonly misrepresented as ISO 8859-1
– Unlabeled Web pages are presumed to be ISO 8859-1
Ask an I18n expert about other similar code pages and problematic code points.
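A Python sketch of the Euro-related mislabeling above: one byte, three labels. Codec names are Python's (`latin-1`, `iso8859_15`, `cp1252`).

```python
byte_a4 = bytes([0xA4])
as_8859_1 = byte_a4.decode("latin-1")      # ¤ (currency sign)
as_8859_15 = byte_a4.decode("iso8859_15")  # € (euro sign)

euro_1252 = "€".encode("cp1252")                   # b'\x80'
euro_1252_as_8859_1 = euro_1252.decode("latin-1")  # a C1 control code, not €
print(as_8859_1, as_8859_15, euro_1252, repr(euro_1252_as_8859_1))
```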
Consider fonts
Fonts are code-page-based.
The code point may be right, and the character image wrong.
abcdefghij
Verify the font (check device, configuration, cartridges, etc.)
On Windows, check the script setting.
4GL can generate possibilities or provide verification
Statements that specify conversions:
– ASC, CHR
– CODEPAGE-CONVERT
– INPUT, OUTPUT, INPUT-OUTPUT
Use these during analysis to:
– Insert additional conversions
– Replace existing conversions
– Undo or invert extra conversions
– Evaluate alternatives
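Use 4GL to generate possibilities
/* Assumes the source code page is -cpinternal (e.g. ISO8859-1) */
define variable sval  as integer   no-undo initial 231. /* code point to test */
define variable schar as character no-undo.
define variable srcCP as character no-undo.
define variable tgtCP as character no-undo.
define variable tval  as integer   no-undo.
define variable i     as integer   no-undo.
srcCP = SESSION:CPINTERNAL.
/* take code point integer, make it a character */
schar = chr(sval).
do i = 1 to num-entries(GET-CODEPAGES):
  tgtCP = entry(i, GET-CODEPAGES). /* code page */
  if srcCP = tgtCP then next.
  if tgtCP = "undefined" then next.
  /* Convert it to the current target */
  tval = asc(schar, tgtCP, srcCP) no-error.
  /* ignore illegal conversions; on error tval keeps its old value */
  if error-status:error or tval = -1 then next.
  display sval tgtCP tval with frame fres down. /* list results */
  down with frame fres.
end.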
Use 4GL to generate possibilities
Conversions for ISO8859-1 code point 231:
Source   Target code page   Target
231      IBM850             135
231      ISO8859-15         231
231      1252               231
231      ROMAN-8            181
231      IBM861             135
231      IBM437             135
231      IBM037             072
231      IBM500             072
231      IBM297             224
231      IBM284             072
231      IBM280             224
231      UTF-8              50,087
231      UCS2               59,136
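The 4GL loop can be mirrored in Python to produce the same kind of list. This is a sketch under assumptions: the candidate list uses Python codec names instead of GET-CODEPAGES, Python has no codecs for IBM297, IBM284 or IBM280 (so those rows cannot be reproduced), and multibyte results are printed as one big-endian integer, which is how the UTF-8 (50,087 = C3 A7) and UCS2 (59,136 = E7 00, little-endian bytes) values above read.

```python
SOURCE = "latin-1"
CANDIDATES = ["cp850", "iso8859_15", "cp1252", "hp_roman8", "cp861", "cp437",
              "cp037", "cp500", "utf-8", "utf-16-le"]

def possibilities(code_point, source=SOURCE, candidates=CANDIDATES):
    ch = bytes([code_point]).decode(source)
    results = {}
    for target in candidates:
        try:
            encoded = ch.encode(target)
        except UnicodeEncodeError:
            continue  # ignore illegal conversions, like tval = -1 in the 4GL
        # Read the resulting bytes as one big-endian integer.
        results[target] = int.from_bytes(encoded, "big")
    return results

results = possibilities(231)
for target, value in results.items():
    print(f"{231:>3}  {target:<10} {value:,}")
```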
Techniques at your disposal
Reduce variables
Keyboard->client->server->database->server->client->printer
Code pages: A (keyboard, client), B (server), C (database), B (server), A (client), D + font (printer)
“ç” > client > AB convert > BC convert > CB convert > BA convert > AD convert > font > “ç”
Print character values after each step
Replace each step with known or hardcoded input and repeat comparisons
– E.g. replace UPDATE with CHR(231)
– Replace returned values with CHR(231)
– Use pattern: CHR(128)+CHR(129)...CHR(255)
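A Python sketch of the pattern technique: send every value from 128 to 255 through a step and list the values that changed. The `suspect_step` function is a hypothetical stand-in (an ISO 8859-1 to IBM 850 conversion); in practice you would compare against the bytes you actually captured.

```python
pattern = bytes(range(128, 256))   # CHR(128)+CHR(129)...CHR(255)

def suspect_step(data):
    # Stand-in for the real step: an ISO 8859-1 -> IBM 850 conversion.
    # Characters IBM 850 lacks (the 128-159 controls) become "?".
    return data.decode("latin-1").encode("cp850", errors="replace")

def changed_values(expected, actual):
    """Pairs (sent, received) for every position that changed."""
    return [(e, a) for e, a in zip(expected, actual) if e != a]

received = suspect_step(pattern)
diffs = changed_values(pattern, received)
print(f"{len(diffs)} of {len(pattern)} values changed; first few: {diffs[:3]}")
```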
Techniques at your disposal
Pattern analysis
Identify “bad” records using 4GL
Identify records with incorrect, infrequent characters using CHARSCAN:
proutil <db> -C convchar charscan 1252 "188,189,190"
Charscan searching for iso8859-1 character: 188 0xbc. (6570)
Charscan found a character match in Customer.Comments, recid 103. (6569)
Charscan match count: 1 (6568)
Techniques at your disposal
Additional checks
Differentiate conversion problems from case or collation problems
– CAPS(é) = “E” vs. CAPS(é) = “É”
– “Å” < “B” or “Å” > “Z”
Review DLC/prolang/convmap/*.dat
Try a different widget (occasional bug)
Watch for third-party software/hardware
PUT CONTROL "~033E~033(10U~033&l2A~033&l1O~033".
“My customer is screaming ‘Mojibake! Mojibake!’ Can you help me?”
John or Jill Q. Public
What causes Mojibake?
Not treating all the bytes of a character as one character
– Lead-byte and tail-byte tables define valid bytes and column widths
– Keep all bytes of a character together
– Don’t insert in the middle, don’t delete one byte without the other, and take care at block boundaries
– Tail bytes can be syntax-significant, e.g. “\” in pathnames, “~” in 4GL
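A Python sketch of the pathname problem, assuming Python's `cp932` codec for Shift-JIS: the kanji 表 ends in byte 0x5C, the same value as “\”, so splitting the raw bytes on backslash breaks the character, while splitting the decoded text does not.

```python
path = "C:\\表\\data"
raw = path.encode("cp932")      # 表 is 0x95 0x5C in Shift-JIS

naive_parts = raw.split(b"\\")  # byte-level split cuts 表 in half
safe_parts = path.split("\\")   # character-level split is correct
print(naive_parts)
print(safe_parts)
```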
Making Mojibake
Splitting multibyte characters
Original:   日 本 語
Byte type:  L  T  L  T  L  T
Bytes:      93 FA 96 7B 8C EA
Inserting “a” (hex 61) as the second byte gives 殿 坙 { 語:
Byte type:  L  T  L  T  S  L  T
Bytes:      93 61 FA 96 7B 8C EA
Making Mojibake
Splitting multibyte characters
Original:   日 本 語
Byte type:  L  T  L  T  L  T
Bytes:      93 FA 96 7B 8C EA
Deleting the second byte gives 当 { 語:
Byte type:  L  T  S  L  T
Bytes:      93 96 7B 8C EA
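The two slides can be reproduced with a Python sketch, assuming the bytes shown are Shift-JIS as decoded by Python's `cp932` codec. The checks cover only the structure of the result (which bytes pair up), not the specific characters shown on the slides.

```python
original = "日本語".encode("cp932")             # 93 FA 96 7B 8C EA

inserted = original[:1] + b"a" + original[1:]  # "a" (0x61) becomes the second byte
deleted = original[:1] + original[2:]          # the second byte (FA) is dropped

garbled_insert = inserted.decode("cp932")      # pairs: 93 61 | FA 96 | 7B | 8C EA
garbled_delete = deleted.decode("cp932")       # pairs: 93 96 | 7B | 8C EA
print(garbled_insert, garbled_delete)

# Safer: edit characters, then encode; no character is ever split.
safe = ("日" + "a" + "本語").encode("cp932")
print(safe.decode("cp932"))
```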
Considerations for Mojibake
At the 4GL level:
– “RAW” vs. “CHARACTER”, “COLUMN”?
– Is -cpinternal a multi-byte code page?
– Is the software DBE (double-byte enabled)?
– Are the bytes in the lead-byte and tail-byte tables?
– Test with the IS-LEAD-BYTE function.
– Windows requires the Default Language setting (actually a conversion issue).
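In the spirit of IS-LEAD-BYTE, a Python sketch of a lead-byte test and a byte-limited truncation that never splits a character. The helper names are hypothetical, and the ranges 0x81-0x9F and 0xE0-0xFC apply to Shift-JIS (cp932) only; other DBCS code pages have different lead-byte ranges.

```python
def is_lead_byte(b):
    """Shift-JIS / cp932 lead-byte test (hypothetical helper)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def truncate_bytes(data, limit):
    """Cut data to at most `limit` bytes without splitting a character."""
    i = 0
    while i < len(data):
        step = 2 if is_lead_byte(data[i]) else 1
        if i + step > limit:
            break
        i += step
    return data[:i]

data = "日本語".encode("cp932")
print(truncate_bytes(data, 5).decode("cp932"))  # a naive data[:5] would end on a lead byte
```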
“I want to sell my application in Poland. Which code page supports Polish? Also, I have an opportunity in Viet Nam. Which code page should I use there?”
John or Jill Q. Public
Choosing a code page
For most languages, there is a clear choice based on the platform.
Consider the source of the data (i.e. is there legacy data in a known code page?)
There is no conversion to the GUI, so on Windows you must use the Windows code page.
Progress will add code pages for customers if there is a business need.
Use Unicode wherever possible.
“I disagree with the conversion table. Should I create my own conversion table? I also disagree with other tables; should I make my own?”
John or Jill Q. Public
Should customers define code page tables?
In general, ask Support first.
OK for one-way CPPRINT and CPTERM code pages
For data storage code pages, collations and conversions, refer to Support first:
– Conversions must be one-to-one
– Round-trip compatibility is required among families of code pages
– There can be subtle dependencies for ISALPHA, indexes and word-break tables
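A Python sketch of the first two requirements as checks on a table stored as a dict. The ISO 8859-1 to IBM 850 table is built from Python's codecs for 160-255 only, and the “folded” table is a hypothetical customer table that maps both 146 and 39 to 39.

```python
def is_one_to_one(table):
    """True if no two source code points map to the same target."""
    return len(set(table.values())) == len(table)

def round_trips(forward, backward):
    """True if converting there and back returns every original code point."""
    return all(backward.get(target) == source for source, target in forward.items())

# Built from Python's codecs for the printable ISO 8859-1 range.
latin1_to_850 = {c: bytes([c]).decode("latin-1").encode("cp850")[0] for c in range(160, 256)}
cp850_to_latin1 = {t: s for s, t in latin1_to_850.items()}

# Hypothetical customer table folding 146 (’ on the slides) into 39 (').
folded = {39: 39, 146: 39}
folded_back = {t: s for s, t in folded.items()}

print(is_one_to_one(latin1_to_850), round_trips(latin1_to_850, cp850_to_latin1))
print(is_one_to_one(folded), round_trips(folded, folded_back))
```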
“Hey, thanks. I found the problem. Some of my data is now in the wrong code page in my database. How can I correct these records?”
John or Jill Q. Public
Replacing “mxyzptlk” with “text”
Sometimes it is not possible.
It may be possible to look for illegal values, or identify misconverted records.
Heuristics exist for identifying the code page, if the text is large enough.
If the misconversion is well identified, it may be possible to find the records and correct the conversion.
Experience/expertise is helpful here.
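When the misconversion is well identified, a repair can be targeted. A Python sketch for one hypothetical case, UTF-8 text that was decoded as ISO 8859-1 and stored: it changes a value only when the reverse conversion succeeds and alters the text. The records are invented examples, and a real fix still needs review, because some legitimate ISO 8859-1 text can also pass this test.

```python
def repair_utf8_read_as_latin1(value):
    """Undo one specific misconversion, or return None if it does not apply."""
    try:
        candidate = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # not this misconversion: review by hand
    return candidate if candidate != value else None

records = {101: "CafÃ©", 102: "Café", 103: "Ã§a va", 104: "plain ASCII"}
fixes = {}
for recid, value in records.items():
    fixed = repair_utf8_read_as_latin1(value)
    if fixed is not None:
        fixes[recid] = fixed

print(fixes)
```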
Want to know more?
Globalization Empowerment
http://www.progress.com/consulting/globalization_empowerment_solutions.htm
Progress Internationalization Guide
OS vendors have international web pages
– http://www.microsoft.com/globaldev
– http://www.microsoft.com/globaldev/dis_v1/disv1.asp
– http://www.sun.com/globalization
Text goes in, Text comes out!
Progress code page architecture is straightforward, yet powerful and flexible.
Debugging code page problems is easy with knowledge of the architecture.
The 4GL has diagnostic functions that can help.
A little knowledge goes a long way; sign up for Globalization Empowerment.
Questions