International Domain Name

TWNIC
Nai-Wen Hsu
Domain name

RFC 1035




A label cannot be longer than 63 characters
A domain name cannot be longer than 255 characters
Maximum labels: 127
Only a-z, 0-9, and '-' are accepted in a domain name
Limited set of ASCII code points: the 37 LDH (Letter-Digit-Hyphen) characters
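The limits above can be sketched as a small checker. This is a minimal illustration, not a full RFC 1035 validator: the function name `check_ldh_name` is hypothetical, the limits are counted in characters as on the slide, and upper case is folded to lower case on the assumption that DNS names compare case-insensitively.

```python
import re

MAX_LABEL_LEN = 63   # per-label limit quoted on the slide
MAX_NAME_LEN = 255   # whole-name limit quoted on the slide
LDH_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-"  # the 37 LDH characters
LDH_LABEL = re.compile(r"[a-z0-9-]+")


def check_ldh_name(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append(f"name is longer than {MAX_NAME_LEN} characters")
    for label in name.lower().split("."):
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL_LEN:
            problems.append(f"label is longer than {MAX_LABEL_LEN} characters")
        elif not LDH_LABEL.fullmatch(label):
            problems.append(f"label {label!r} contains non-LDH characters")
    return problems


if __name__ == "__main__":
    for candidate in ("twnic.net.tw", "慎昌鐘錶.tw"):
        print(candidate, check_ldh_name(candidate))
```

A non-ASCII name such as 慎昌鐘錶.tw fails the LDH check, which is the gap that IDN is meant to close.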
International Domain Name

The IETF IDN WG adopted Unicode 3.2


Greek, Cyrillic, Armenian, Hebrew, Arabic,
Syriac, Thaana, Devanagari, Bengali,
Gurmukhi, Gujarati, Oriya, Tamil, Telugu,
Kannada, Malayalam, Sinhala, Thai, …
95,156 characters
International Domain Name samples
レコード会社.jp
gwmöbler.com
慎昌鐘錶.tw
阿克苏诺贝尔油漆公司.cn
소프트웨어.kr
ישראל.קום
IETF IDN Standard

IDNA (RFC 3490)
Internationalizing Domain Names in Applications
NAMEPREP (RFC 3491)
A Stringprep Profile for Internationalized Domain Names
PUNYCODE (RFC 3492)
A Bootstring encoding of Unicode for Internationalized Domain Names in Applications
STRINGPREP (RFC 3454)
Preparation of Internationalized Strings
IDNA components and interfaces
User
Input and display: local interface methods (pen, keyboard, ...)
IDNA-aware application (ToASCII and ToUnicode operations may be called here)
Call to resolver: ACE (xn--de-jg4avhby1noc0d)
Resolver
DNS protocol: ACE
DNS servers
Application-specific protocol: ACE, unless the protocol is updated to handle other encodings
Application servers
The end system contains the application and the resolver.
"Application" is where the application splits a host name into labels, sets the appropriate flags, and performs the ToASCII and ToUnicode operations.
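The application-side step described above (split the host name into labels, then run ToASCII or ToUnicode on each label) can be sketched with Python's standard `encodings.idna` module, which implements the RFC 3490 operations. The helper names `host_to_ace` and `host_to_display` are hypothetical, and this sketch does not expose the IDNA flags (such as UseSTD3ASCIIRules); it only shows the label-by-label flow.

```python
from encodings import idna  # stdlib implementation of ToASCII / ToUnicode


def host_to_ace(host: str) -> str:
    """Split a host name into labels and apply ToASCII to each label."""
    ace_labels = [idna.ToASCII(label).decode("ascii") for label in host.split(".")]
    return ".".join(ace_labels)


def host_to_display(host: str) -> str:
    """Split an ACE host name into labels and apply ToUnicode to each label."""
    return ".".join(idna.ToUnicode(label) for label in host.split("."))


if __name__ == "__main__":
    ace = host_to_ace("bücher.example")
    print(ace)                   # the form handed to the resolver
    print(host_to_display(ace))  # the form shown to the user
```

Only the ACE form travels to the resolver and over the DNS protocol; the Unicode form stays at the user interface.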
IDNA Structure
User input (Unicode)
IDNA: ToASCII and ToUnicode
NAMEPREP: a Stringprep profile for Internationalized Domain Names (STRINGPREP)
• Mapping
• Normalization
• Prohibited output
ACE (PUNYCODE)
ACE to resolver
NAMEPREP
A Stringprep Profile for Internationalized Domain Names
Mapping
Stringprep tables B.1, B.2
Normalization
Form KC
Prohibited Output
Stringprep tables C.1.2, C.2.2, C.3, C.4, C.5, C.6, C.7, C.8, C.9
NAMEPREP -- Mapping

Commonly mapped to nothing: 27
Mapping for case-folding used with NFKC: 1371
Ex:
A → a (U+0041 → U+0061)
Ϋ → ϋ (U+03AB → U+03CB)
㍱ → hpa (U+3371 → U+0068 U+0070 U+0061)
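The mapping step can be tried out with Python's standard `stringprep` module, which exposes the RFC 3454 tables. This is a minimal sketch of the mapping step only; `nameprep_map` is a hypothetical helper name, and the soft hyphen (U+00AD) is used here as an assumed example of a character that is mapped to nothing.

```python
import stringprep  # stdlib access to the RFC 3454 tables


def nameprep_map(label: str) -> str:
    """Apply only the NAMEPREP mapping step (tables B.1 and B.2)."""
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):           # B.1: commonly mapped to nothing
            continue
        mapped.append(stringprep.map_table_b2(ch))  # B.2: case folding used with NFKC
    return "".join(mapped)


if __name__ == "__main__":
    for sample in ("A", "\u03ab", "\u3371", "ex\u00adample"):
        result = nameprep_map(sample)
        print([f"U+{ord(c):04X}" for c in sample], "->",
              [f"U+{ord(c):04X}" for c in result])
```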
NAMEPREP -- Normalization
Unicode normalization with form KC
‘u’ + combining diaeresis (U+0308) → ‘ü’
a compatibility variant of ‘a’ → ‘a’
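Form KC can be demonstrated with Python's standard `unicodedata` module. In this sketch the combining diaeresis is U+0308, and the fullwidth letter U+FF41 is an assumed stand-in for a compatibility variant of 'a'; the slide does not say which variant it showed.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Normalize text with Unicode normalization form KC."""
    return unicodedata.normalize("NFKC", text)


# 'u' followed by COMBINING DIAERESIS composes into the single character 'ü'.
composed = nfkc("u\u0308")

# FULLWIDTH LATIN SMALL LETTER A (assumed example) folds to plain ASCII 'a'.
folded = nfkc("\uff41")

if __name__ == "__main__":
    print(f"U+{ord(composed):04X}", f"U+{ord(folded):04X}")
```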
NAMEPREP -- Prohibited output
Non-ASCII space characters: 17
Ex: U+00A0 (NO-BREAK SPACE)
Non-ASCII control characters: 54
Ex: U+0090 (DEVICE CONTROL STRING)
Private use: 133371
Non-character code points: 49
Surrogate codes: 2048
Inappropriate for plain text: 4
Inappropriate for canonical representation: 12
Change display properties or are deprecated: 13
Tagging characters: 97
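The prohibited-output categories correspond to Stringprep tables C.1.2 through C.9, which Python's standard `stringprep` module exposes as membership tests. The sketch below only looks up which table, if any, a single character falls into; `prohibited_reason` is a hypothetical helper name, and the sample code points other than the two named on the slide are assumed examples.

```python
import stringprep  # stdlib access to the RFC 3454 tables

# (table label, membership test) in the order the categories are listed above
PROHIBITED_TABLES = [
    ("C.1.2 non-ASCII space characters", stringprep.in_table_c12),
    ("C.2.2 non-ASCII control characters", stringprep.in_table_c22),
    ("C.3 private use", stringprep.in_table_c3),
    ("C.4 non-character code points", stringprep.in_table_c4),
    ("C.5 surrogate codes", stringprep.in_table_c5),
    ("C.6 inappropriate for plain text", stringprep.in_table_c6),
    ("C.7 inappropriate for canonical representation", stringprep.in_table_c7),
    ("C.8 change display properties or are deprecated", stringprep.in_table_c8),
    ("C.9 tagging characters", stringprep.in_table_c9),
]


def prohibited_reason(ch: str):
    """Return the label of the first prohibited table containing ch, else None."""
    for label, is_member in PROHIBITED_TABLES:
        if is_member(ch):
            return label
    return None


if __name__ == "__main__":
    for ch in ("\u00a0", "\u0090", "\ue000", "\U000e0001", "a"):
        print(f"U+{ord(ch):04X}", prohibited_reason(ch))
```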
PUNYCODE


A Bootstring encoding of Unicode for IDNA
One of the ACEs (ASCII Compatible Encodings)
Translates non-ASCII characters to ASCII characters
Prefix: xn--
Ex:
慎昌鐘錶.tw → xn--ciun9hb52c2za.tw
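The encoding and the prefix can be tried with Python's built-in `punycode` codec. This is a minimal sketch that assumes the label has already been through NAMEPREP; `label_to_ace` and `ace_to_label` are hypothetical helper names, not part of any standard API.

```python
ACE_PREFIX = "xn--"


def label_to_ace(label: str) -> str:
    """Punycode-encode one prepared label and prepend the ACE prefix."""
    if label.isascii():
        return label  # plain LDH labels are passed through unchanged
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ace_to_label(ace: str) -> str:
    """Strip the ACE prefix and decode the Punycode part back to Unicode."""
    if ace.lower().startswith(ACE_PREFIX):
        return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return ace


if __name__ == "__main__":
    name = "慎昌鐘錶.tw"
    ace_name = ".".join(label_to_ace(label) for label in name.split("."))
    print(ace_name)  # compare with the ACE form shown in the example above
```

Running the script on the sample name is a quick way to compare the result with the ACE string quoted in the example.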
Insufficiencies of the IDN standard
The current IDN standards (IDNA, NAMEPREP, PUNYCODE) cannot meet the requirements of Chinese domain names
Traditional/Simplified Chinese mapping
Ex: 台 → 臺
Writing variant mapping
Ex: 峰 → 峯
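The gap can be observed directly: NAMEPREP folds case and compatibility forms, but it leaves these Chinese variants as distinct characters, so they produce different ACE names. The sketch below uses Python's built-in `idna` codec (an IDNA 2003 implementation); `idna_equivalent` is a hypothetical helper name, and the `.tw` and `.example` suffixes are only there to make complete names.

```python
import unicodedata


def idna_equivalent(a: str, b: str) -> bool:
    """True if both names become the same ACE string under the stdlib IDNA codec."""
    return a.encode("idna") == b.encode("idna")


VARIANT_PAIRS = [("台", "臺"), ("峰", "峯")]

if __name__ == "__main__":
    # Case differences are folded away by NAMEPREP.
    print(idna_equivalent("Bücher.example", "bücher.example"))
    # Traditional/Simplified and writing variants are not.
    for left, right in VARIANT_PAIRS:
        same_nfkc = (unicodedata.normalize("NFKC", left)
                     == unicodedata.normalize("NFKC", right))
        print(left, right, same_nfkc, idna_equivalent(left + ".tw", right + ".tw"))
```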
Insufficiencies of the IDN standard
The same meaning is written with a different character in different countries
In China: 劝 (529D)
In Japan: 勧 (52E7)
In Taiwan: 勸 (52F8)
IDN administration guideline
Registration policy to solve the problems listed above
Every language has a variant table with 3 fields:
valid code point
recommended variant
character variant
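One possible in-memory shape for such a per-language table is sketched below. The class name `VariantEntry`, the dictionary `ZH_TW`, and the helper `invalid_code_points` are all hypothetical, and the two rows are a tiny illustrative excerpt rather than the real table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantEntry:
    valid_code_point: str                 # VCP: a character allowed in this language
    recommended_variant: str              # the form recommended for this language
    character_variants: tuple[str, ...]   # CV: every character treated as equivalent


# Hypothetical two-row excerpt of a zh-tw table, keyed by valid code point.
ZH_TW = {
    entry.valid_code_point: entry
    for entry in (
        VariantEntry("丁", "丁", ("丁",)),
        VariantEntry("萬", "萬", ("万", "萬")),
    )
}


def invalid_code_points(label: str, table: dict) -> list[str]:
    """Return the characters of label that are not valid code points in table."""
    return [ch for ch in label if ch not in table]


if __name__ == "__main__":
    print(invalid_code_points("丁萬", ZH_TW))   # every character is in the excerpt
    print(invalid_code_points("丁x", ZH_TW))    # 'x' is not in the excerpt
```

Keying the table by valid code point makes the registrability check a plain dictionary lookup per character.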
Variant Table sample
Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks
丁(4E01) | 丁(4E01) | 丁(4E01) | 丁(4E01) | Singular-relation character (1)
丄(4E04) | 上(4E0A) | 上(4E0A) | 丄(4E04) 上(4E0A) | Pair-relation characters (2.1)
上(4E0A) | 上(4E0A) | 上(4E0A) | 丄(4E04) 上(4E0A) | Pair-relation characters (2.1)
万(4E07) | 万(4E07) | 万(4E07) | 万(4E07) 萬(842C) | Pair-relation characters (2.2)
萬(842C) | 萬(842C) | 万(4E07) | 万(4E07) 萬(842C) | Pair-relation characters (2.2)
Variant Table sample
Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks
叶(53F6) | 葉(8449) | 叶(53F6) | 叶(53F6) 葉(8449) | Pair-relation characters (2.3)
葉(8449) | 葉(8449) | 叶(53F6) | 叶(53F6) 葉(8449) | Pair-relation characters (2.3)
个(4E2A) | 個(500B) | 个(4E2A) | 个(4E2A) 個(500B) 箇(7B87) | Multiple-relation characters
個(500B) | 個(500B) | 个(4E2A) | 个(4E2A) 個(500B) 箇(7B87) | Multiple-relation characters
箇(7B87) | 個(500B) | 个(4E2A) | 个(4E2A) 個(500B) 箇(7B87) | Multiple-relation characters
Variant Table





Singular-relation character (VCP=twRV=cnRV=CV): 13888 (66.4%)
VCP=twRV≠cnRV: 2783 (13.3%)
VCP=cnRV≠twRV: 2453 (11.7%)
VCP≠(twRV=cnRV): 333 (1.6%)
VCP≠twRV≠cnRV: 387 (1.9%)
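The five categories can be expressed as a small classifier over table rows. This is a sketch under stated assumptions: the function name `classify` is hypothetical, the last category is read as "all three values differ", and the sample rows are an assumed reading of the sample variant table (the 箇 row in particular).

```python
def classify(vcp: str, tw_rv: str, cn_rv: str, variants: set) -> str:
    """Place one variant-table row into one of the categories listed above."""
    if vcp == tw_rv == cn_rv and variants == {vcp}:
        return "VCP=twRV=cnRV=CV"
    if vcp == tw_rv != cn_rv:
        return "VCP=twRV≠cnRV"
    if vcp == cn_rv != tw_rv:
        return "VCP=cnRV≠twRV"
    if tw_rv == cn_rv != vcp:
        return "VCP≠(twRV=cnRV)"
    if vcp != tw_rv and tw_rv != cn_rv and vcp != cn_rv:
        return "VCP≠twRV≠cnRV"
    return "other"  # e.g. VCP=twRV=cnRV but with more than one character variant


# Assumed sample rows: (VCP, twRV, cnRV, CV)
SAMPLE_ROWS = [
    ("丁", "丁", "丁", {"丁"}),
    ("萬", "萬", "万", {"万", "萬"}),
    ("叶", "葉", "叶", {"叶", "葉"}),
    ("丄", "上", "上", {"丄", "上"}),
    ("箇", "個", "个", {"个", "個", "箇"}),
    ("上", "上", "上", {"丄", "上"}),
]

if __name__ == "__main__":
    for row in SAMPLE_ROWS:
        print(row[0], classify(*row))
```

Rows such as 上, whose three values agree but which have more than one character variant, fall outside the five listed categories, which is consistent with the listed percentages adding up to less than 100%.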
Variant Table
Number of character variant(s) | Number of characters
1 | 13888 (66.4%)
2 | 5156 (24.7%)
3 | 1158 (5.5%)
4 | 424 (2.0%)
5 | 165 (0.79%)
6 | 60 (0.29%)
7 | 35 (0.17%)
8 | 16 (0.08%)
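The percentages follow from the counts. The short check below sums the counts to get the table size (the total is inferred from the figures, it is not stated on the slide) and recomputes each share, using one decimal place for shares of at least 1% and two decimal places otherwise, as the figures above do.

```python
# Number of character variants -> number of characters, as listed above
COUNTS = {1: 13888, 2: 5156, 3: 1158, 4: 424, 5: 165, 6: 60, 7: 35, 8: 16}

TOTAL = sum(COUNTS.values())  # inferred table size


def share(count: int) -> str:
    """Format a count as a percentage of the table."""
    percent = 100 * count / TOTAL
    return f"{percent:.1f}%" if percent >= 1 else f"{percent:.2f}%"


SHARES = {n: share(count) for n, count in COUNTS.items()}

if __name__ == "__main__":
    print("total characters:", TOTAL)
    for n, count in COUNTS.items():
        print(n, count, SHARES[n])
```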
Variant Table
• The table draft has been prepared by the CCMT Task Force, organized by TWNIC, since January 2002.
• The task force has 9 expert members: linguists, computer experts, and DNS experts.
• The table draft has been submitted to the Bureau of Standards, Ministry of Economic Affairs, for final review.
Registration procedure



A registrant should select the language(s)
The Registry should activate the requested domain name(s) and reserve the equivalent(s) within the language-based character set
The registrant can request activation of the reserved equivalent domain name(s) at any time
Registration sample
A user selects the zh-tw and zh-cn languages with the domain name 丁上萬.com
丁上萬.com (Recommended variant for zh-tw)
丁上万.com (Recommended variant for zh-cn)
丁丄万.com (Character Variant)
丁丄萬.com (Character Variant)
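The sample above can be reproduced by taking, for each character, its recommended variant per language and the cross product of its character variants. This is a minimal sketch: `registration_bundle` is a hypothetical helper name, and the three-row table is an assumed excerpt covering only the characters of this one label.

```python
from itertools import product

# VCP -> (twRV, cnRV, character variants); assumed excerpt for this example
TABLE = {
    "丁": ("丁", "丁", ("丁",)),
    "上": ("上", "上", ("丄", "上")),
    "萬": ("萬", "万", ("万", "萬")),
}


def registration_bundle(label: str):
    """Return (zh-tw recommended, zh-cn recommended, other character variants)."""
    tw_form = "".join(TABLE[ch][0] for ch in label)
    cn_form = "".join(TABLE[ch][1] for ch in label)
    # Every combination of per-character variants is an equivalent label.
    all_forms = {"".join(combo) for combo in product(*(TABLE[ch][2] for ch in label))}
    others = sorted(all_forms - {tw_form, cn_form})
    return tw_form, cn_form, others


if __name__ == "__main__":
    tw_form, cn_form, others = registration_bundle("丁上萬")
    print(tw_form + ".com", "(recommended for zh-tw)")
    print(cn_form + ".com", "(recommended for zh-cn)")
    for form in others:
        print(form + ".com", "(character variant)")
```

The number of equivalent labels grows as the product of the per-character variant counts, so long labels built from multiple-relation characters can produce large bundles.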
Q&A