Title Slide: First Slide in a Presentation

IDN
Patrik Fältström
[email protected]
DNSSEC
© 2000, Cisco Systems, Inc.
1
The Domain Name System
• It is a distributed database with only
a limited lookup mechanism
• It is a protocol
• Often the two get mixed up
IDN
© 2001, Cisco Systems, Inc.
2
Protocol is “safe”
• In each label you can have any octets
Including for example ‘.’
• So, what is the problem?
• Well, people try to register words and
phrases in DNS, when DNS is
designed for registration of
identifiers
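Why the protocol itself is "safe" can be sketched in a few lines: on the wire, each label is a length octet followed by that many raw octets, so a '.' inside a label is just another octet. The helper names `encode_name` and `decode_name` below are made up for illustration, and name compression is ignored.

```python
def encode_name(labels: list[bytes]) -> bytes:
    """Encode labels in DNS wire format: length octet + raw octets, then a zero octet."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("a label must be 1..63 octets long")
        out.append(len(label))
        out += label          # any octet values are allowed here
    out.append(0)             # the empty root label terminates the name
    return bytes(out)


def decode_name(wire: bytes) -> list[bytes]:
    """Split an uncompressed wire-format name back into its labels."""
    labels = []
    pos = 0
    while wire[pos] != 0:
        length = wire[pos]
        labels.append(wire[pos + 1 : pos + 1 + length])
        pos += 1 + length
    return labels


# A label containing a dot and a label containing UTF-8 octets both round-trip.
labels = [b"www", b"a.b", "fältström".encode("utf-8"), b"com"]
assert decode_name(encode_name(labels)) == labels
```

The dot only becomes a separator in the textual presentation of a name, which is where applications start to have trouble.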
Protocol - conclusion
• Even though the DNS protocol can handle
8-bit octets, many applications have
problems with them
• Is it not (also) the applications that
have to be fixed?
• Can a solution be backward
compatible with old protocols?
A real solution
• The user types in information he knows,
i.e. a search query and a context
“Patrik Fältström”, person, lives in Sweden
Gets back alternative(s)
Selects correct alternative
• Which domain name is used afterwards
doesn’t matter, just as the IP address
doesn’t matter: it is hidden from the user
• Keyword Systems work almost like this
DNS is about equality
• Sweden:
http://www.torbjörn.com
• Norway:
http://www.torbjørn.com
• Are they the same site?
DNS is about equality
• About Swedish lakes:
http://Å.com (Å as U+00C5, LATIN CAPITAL LETTER A WITH RING ABOVE)
• About a physical unit:
http://Å.com (Å as U+212B, ANGSTROM SIGN)
• Are they the same site?
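The two equality questions can be tried out with Python's `unicodedata` module. This is a sketch under two assumptions: that the second Å URL was written with the ANGSTROM SIGN (U+212B), and that "equal" means equal after case folding and normalization form KC. The function name `same_identifier` is hypothetical.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Compare two labels after case folding and NFKC normalization."""
    def prep(s: str) -> str:
        return unicodedata.normalize("NFKC", s.casefold())
    return prep(a) == prep(b)


ring_a = "\u00c5"     # LATIN CAPITAL LETTER A WITH RING ABOVE
angstrom = "\u212b"   # ANGSTROM SIGN, looks identical on screen

print(ring_a == angstrom)                 # the raw code points differ
print(same_identifier(ring_a, angstrom))  # normalization treats them as one character

# Swedish ö (U+00F6) and Norwegian ø (U+00F8) stay distinct: no normalization
# form maps one to the other, so this is a policy question, not an encoding one.
print(same_identifier("torbj\u00f6rn", "torbj\u00f8rn"))
```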
What can we do in DNS?
Let someone (else) decide
• We have one algorithm, and one only
• Given this algorithm, people can
register whatever they want
Unicode Consortium
• The Unicode Consortium has
produced a couple of interesting
things:
A character set
Also accepted by ISO as 10646
Technical reports
Normalization
Case Folding
Example: Normalization
• A description of what characters are
to be treated as the same when
comparing them
• Example, U+00C4, Ä
U+00C4: LATIN CAPITAL LETTER A WITH DIAERESIS
Equivalent to U+0041 followed by U+0308
U+0041: LATIN CAPITAL LETTER A
U+0308: COMBINING DIAERESIS
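The Ä example can be checked directly with Python's standard `unicodedata` module (a small illustration, not part of the original slides):

```python
import unicodedata

precomposed = "\u00c4"        # LATIN CAPITAL LETTER A WITH DIAERESIS
decomposed = "\u0041\u0308"   # LATIN CAPITAL LETTER A + COMBINING DIAERESIS

# As raw code point sequences the two strings are different...
assert precomposed != decomposed

# ...but normalization maps each form onto the other.
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed

# The names of the decomposed code points match the slide.
names = [unicodedata.name(ch) for ch in decomposed]
print(names)
```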
Decisions in the IETF
• The IETF does not have expertise in characters
• Discussions in ISO and the Unicode
Consortium have so far gone on for 25 years
• Why should the IETF be more successful?
• Because of this, the IETF inherits results
from other organizations
In this case the Unicode Consortium
Ultimate goal
• DNS is designed for registration of
identifiers
• Users (believe they are) registering
words
• IETF can solve the problem of not
being able to use local characters in
identifiers
• IETF can NOT solve the problem of
using words in DNS
One more step…
• One more step is taken
• Distinguish between the generic
stringprep, which defines in what order
the various translations are to be done,
and application specific profiles
Examples of decisions made in the profiles
include
Case sensitivity
Special groups of characters which are
mapped out (forbidden)
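The split between a generic stringprep and application-specific profiles might be sketched as follows. Everything here is illustrative: the `Profile` class, `run_stringprep`, and both example profiles are hypothetical, and the mapping step is simplified to Python's `str.casefold` rather than the tables a real profile would reference.

```python
import unicodedata
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Profile:
    """The per-application decisions: case handling, deletions, prohibitions."""
    name: str
    fold_case: bool
    mapped_out: frozenset
    prohibited: Callable[[str], bool]


def run_stringprep(profile: Profile, text: str) -> str:
    """Generic part: fixes the order map -> normalize -> prohibit."""
    mapped = "".join(ch for ch in text if ch not in profile.mapped_out)
    if profile.fold_case:
        mapped = mapped.casefold()
    normalized = unicodedata.normalize("NFKC", mapped)
    for ch in normalized:
        if profile.prohibited(ch):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized


# Hypothetical profile for host-name-like identifiers: case-insensitive,
# soft hyphen and zero-width space deleted, spaces and controls forbidden.
HOSTNAME_LIKE = Profile(
    name="hostname-like",
    fold_case=True,
    mapped_out=frozenset({"\u00ad", "\u200b"}),
    prohibited=lambda ch: unicodedata.category(ch) in {"Zs", "Cc"},
)

# Hypothetical case-sensitive profile that forbids only control characters.
CASE_SENSITIVE = Profile(
    name="case-sensitive",
    fold_case=False,
    mapped_out=frozenset(),
    prohibited=lambda ch: unicodedata.category(ch) == "Cc",
)

print(run_stringprep(HOSTNAME_LIKE, "Exa\u00admple"))
print(run_stringprep(CASE_SENSITIVE, "Example"))
```

The generic function never changes; a new application only supplies a new `Profile`.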
Profiles?
• So far profiles are created for:
Domain Names (IDN)
iSCSI units
Kerberos Realms (and other things)
Standard / Test?
• There is considerable confusion about the
state of various test beds and
products
i. .nu
ii. Verisign Global Registry System
iii. ICANN policy
.nu
• Only handles WWW (i.e. URLs)
• Microsoft Internet Explorer on Windows
happens to send non-ASCII characters as
UTF-8 encoded Unicode
• As the DNS protocol is “8-bit clean”, the
query in UTF-8 reaches the server
• Why bad?
Only one application, one vendor
Other applications cannot handle UTF-8
No “Normalization” is done
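The missing normalization matters because an 8-bit clean server compares octets. A short Python illustration (the name "torbjörn" is borrowed from an earlier slide; which form a given client actually sends is not something the slides state):

```python
import unicodedata

precomposed = "torbj\u00f6rn"                          # ö as one code point
decomposed = unicodedata.normalize("NFD", precomposed)  # o + COMBINING DIAERESIS

precomposed_octets = precomposed.encode("utf-8")
decomposed_octets = decomposed.encode("utf-8")

print(precomposed_octets)
print(decomposed_octets)

# Same text on screen, different octets in the query: a server that stored one
# form will not find the other unless somebody normalizes first.
assert precomposed_octets != decomposed_octets
```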
VGRS
• Follows, after a lot of discussions,
the process in the IDN working group
• Nothing is allowed to happen if
ICANN objects
• Used RACE, but is now changing to
ACE-Z
• Most “correct” testbed out there
ICANN
• Points at IETF (so far)
Objections…
• Objections exist
• Why not UTF-8?
Backward compatibility is important
Even if UTF-8 is used, nameprep is needed
• Simplified/Traditional Chinese (GB/BIG5)
Unicode Consortium objects to trying to do
something
Groups which have been working on SC/TC
issues object to IETF doing anything
It is “easy” for 90% of the problem
IDNA proposal
(Nameprep in detail)
A few steps
User interface (local character set)
Application
1. Conversion to Unicode
2. Nameprep Algorithm
3. ACE Encoding
Application protocol
DNS (A-Z, 0-9 etc)
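The three steps can be sketched with what Python ships today. Note the assumptions: the standard library implements the later, standardized form of this work (nameprep per RFC 3491, Punycode per RFC 3492, the "xn--" prefix per RFC 3490), not the 2001 drafts with RACE or ACE-Z, and `to_ace_label` is a made-up helper name.

```python
import encodings.idna

ACE_PREFIX = b"xn--"  # prefix from the later RFC 3490, not from the 2001 drafts


def to_ace_label(raw: bytes, local_charset: str) -> bytes:
    """Turn one label typed in a local character set into an ASCII-only label."""
    text = raw.decode(local_charset)            # 1. conversion to Unicode
    prepared = encodings.idna.nameprep(text)    # 2. nameprep algorithm
    if prepared.isascii():
        return prepared.encode("ascii")         # nothing left to encode
    return ACE_PREFIX + prepared.encode("punycode")  # 3. ACE encoding


typed = "B\u00fccher".encode("latin-1")   # what a Latin-1 user interface delivers
print(to_ace_label(typed, "latin-1"))
```

Only letters, digits and hyphens reach the application protocol and the DNS, which is what makes the approach backward compatible.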
Nameprep (order is important)
1. Mapping of characters
Case Mapping (UTR 21)
Additional Folding
Mapped out (deleted)
2. Normalizing characters
Normalization (KC in UTR15)
3. Prohibition of code points
Currently prohibited Characters
Space Characters
Control Characters
Private Use and Replacement
Non-character code points
Surrogate codes
Inappropriate for text
Inappropriate for domain names
Change display property marks
Inappropriate for some input systems
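The three ordered steps can be written out with the tables in Python's `stringprep` module. This is a sketch under stated assumptions: those tables come from the later RFC 3454 and RFC 3491 and may differ from the draft described here, the bidirectional-text check of the final standard is left out, and `nameprep_sketch` is a hypothetical name.

```python
import stringprep
from unicodedata import ucd_3_2_0 as ucd  # the Unicode 3.2 data the tables assume

# Prohibition tables used by the standardized nameprep profile (RFC 3491).
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def nameprep_sketch(label: str) -> str:
    # 1. Mapping: delete "mapped out" code points, then case-map and fold the rest.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # 2. Normalization: form KC.
    normalized = ucd.normalize("NFKC", mapped)
    # 3. Prohibition: reject anything that is still forbidden after steps 1 and 2.
    for ch in normalized:
        if any(check(ch) for check in PROHIBITED):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized


print(nameprep_sketch("F\u00c4LTSTR\u00d6M"))  # upper case is folded away
print(nameprep_sketch("a\u200bb"))             # the zero-width space is deleted
```

Running prohibition last is what lets a deleted character such as the zero-width space pass silently instead of causing an error.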
Mapping
• Case Mapping (UTR 21)
• Additional Folding
Greek characters
Symbols which include latin characters
b = NormalizeWithKC(Fold(a));
c = NormalizeWithKC(Fold(b));
if c is not the same as b, add a mapping for "a to c";
• Mapped out (deleted)
Characters only of interest in line-based text (zero-width space etc.)
Variation selectors (Mongolian) and cursive selectors which don’t
carry any semantics (zero width joiner)
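The rule for finding additional foldings can be run as written. In this sketch `Fold` is approximated by Python's `str.casefold` and `NormalizeWithKC` by `unicodedata.normalize("NFKC", ...)`; both substitutions are assumptions, since the slide refers to UTR 21 and UTR 15 as they stood at the time. The function name `additional_mapping` is hypothetical.

```python
import unicodedata


def fold(s: str) -> str:
    return s.casefold()


def normalize_with_kc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)


def additional_mapping(a: str):
    """Return the extra mapping target for a, or None if plain folding suffices."""
    b = normalize_with_kc(fold(a))
    c = normalize_with_kc(fold(b))
    # If a second round still changes the result, folding alone was not stable,
    # so an explicit "a to c" mapping is needed.
    return c if c != b else None


# TELEPHONE SIGN is a symbol that includes Latin letters: normalization turns
# it into "TEL", which then still needs to be folded to lower case.
print(additional_mapping("\u2121"))
print(additional_mapping("A"))
```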
Normalizing characters
• Normalization (KC in UTR15)
Sorting also described in ISO/IEC 14651
Prohibition of code points
• Currently prohibited Characters
Control characters, braces and brackets etc in ASCII
• Space Characters
Various space characters (including em space etc)
• Control Characters
Control characters, line separators etc
• Private Use and Replacement
Private character code points and replacement character
• Non-character code points
Prohibition of code points
• Surrogate codes
• Inappropriate for plain text
Interlinear annotation anchor etc
• Inappropriate for domain names
Ideographic description characters
• Change display property marks
Left-To-Right Mark, Activate Arabic Form Shaping etc
• Inappropriate for some input systems
Ideographic Full Stop
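Several of the examples on these two slides can be looked up in Python's `stringprep` module. The pairing of slide categories with table functions below is an assumption based on the later RFC 3454 table names, which may not match the draft exactly, and `reasons_prohibited` is a made-up helper.

```python
import stringprep

# (slide category, RFC 3454 table check); the pairing is this sketch's assumption.
CATEGORIES = [
    ("space character", stringprep.in_table_c11_c12),
    ("control character", stringprep.in_table_c21_c22),
    ("private use", stringprep.in_table_c3),
    ("non-character code point", stringprep.in_table_c4),
    ("surrogate code", stringprep.in_table_c5),
    ("inappropriate for plain text", stringprep.in_table_c6),
    ("inappropriate for domain names", stringprep.in_table_c7),
    ("change display property mark", stringprep.in_table_c8),
]


def reasons_prohibited(ch: str) -> list:
    """List every category whose table contains the given code point."""
    return [reason for reason, check in CATEGORIES if check(ch)]


for ch in ("\u2003", "\ufff9", "\u2ff0", "\u200e", "a"):
    # EM SPACE, INTERLINEAR ANNOTATION ANCHOR, an ideographic description
    # character, LEFT-TO-RIGHT MARK, and an ordinary letter for comparison.
    print(f"U+{ord(ch):04X}", reasons_prohibited(ch))
```

A code point may appear in more than one table, which is why the helper returns a list.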
Classes of characters
• AO - Code points that may be in the
output
• MN - Code points that cannot be in
the output because they are mapped
to nothing or never appear as output
from normalization
• D - Code points that cannot be in the
output because they are disallowed
in the prohibition step
• U - Unassigned code points
Versioning
• New versions of nameprep will move
code points from class U to one of
AO, MN or D
• Only class AO code points will exist
in authoritative name servers
• Applications seeing class U code
points must treat them as AO
(Much more explanation in the document...)
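The four classes and the versioning rule can be approximated with the same standard-library tables. This is a coarse sketch with loud assumptions: the tables are those of the later RFC 3454 (Unicode 3.2), class MN is taken to mean "deleted, or changed by mapping plus normalization", and `classify` and `acceptable_to_application` are hypothetical names.

```python
import stringprep
from unicodedata import ucd_3_2_0 as ucd

PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def classify(ch: str) -> str:
    """Assign one code point to class U, MN, D or AO (approximation)."""
    if stringprep.in_table_a1(ch):
        return "U"    # unassigned in this version of the tables
    if stringprep.in_table_b1(ch):
        return "MN"   # mapped to nothing
    if ucd.normalize("NFKC", stringprep.map_table_b2(ch)) != ch:
        return "MN"   # never survives mapping and normalization unchanged
    if any(check(ch) for check in PROHIBITED):
        return "D"    # disallowed in the prohibition step
    return "AO"       # may appear in the output


def acceptable_to_application(ch: str) -> bool:
    # Versioning rule from the slide: class U is treated like AO, because a
    # newer nameprep version may have assigned and allowed the code point.
    return classify(ch) in {"AO", "U"}


for ch in ("a", "\u200b", "\ue000", "\u0378"):
    print(f"U+{ord(ch):04X}", classify(ch), acceptable_to_application(ch))
```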
Conclusion...
This is not easy...
…and what the IDN WG is doing is not a
perfect solution, but a working one…
…for the problem of being able to use
local script in identifiers, not for the
desire to store words in DNS…
Patrik Fältström
[email protected]