Globalization Features in Whidbey’s CLR


Michael Kaplan
Technical Lead, Globalization Infrastructure, Fonts and Tools
Microsoft Windows International Division
http://blogs.msdn.com/michkap
April 25, 2005

Customized Cultures and Regions

  

CultureAndRegionInfoBuilder class
  Create an override to an existing culture
  Create based on an existing culture
  Create from scratch
Must be an administrator to register
Can register the file on multiple machines

CultureAndRegionInfoBuilder sample

CultureAndRegionInfoBuilder carib =
    new CultureAndRegionInfoBuilder("de-DE-MineMine", CultureAndRegionModifiers.None);

// Load up all of the existing data for German and for Germany
carib.LoadDataFromCultureInfo(new CultureInfo("de-DE", false));
carib.LoadDataFromRegionInfo(new RegionInfo("de"));

// Change a property
carib.ThreeLetterISORegionName = "ZZZ";

// Register the culture on the machine
carib.Register();

// Use the new culture
CultureInfo ci = new CultureInfo("de-DE-MineMine");

CaRIB serialization with LDML

    

Locale Data Markup Language (LDML)
  Described in UTS #35 at http://unicode.org/reports/tr35/
CaRIB objects can be saved as LDML files
Data can be loaded from LDML files
CaRIB will do its best with files it did not create

LDML Sample

// GetTempFileName creates the file, so delete it before saving to that path
string file1 = Path.GetTempFileName();
File.Delete(file1);

CultureInfo ci = new CultureInfo("ar-EG");
RegionInfo ri = new RegionInfo("de-DE");
CultureAndRegionInfoBuilder carib =
    new CultureAndRegionInfoBuilder("x-en-US-Pepsi", CultureAndRegionModifiers.None);
carib.LoadDataFromCultureInfo(ci);
carib.LoadDataFromRegionInfo(ri);

// Round-trip the definition through an LDML file, then register it
carib.Save(file1);
carib = CultureAndRegionInfoBuilder.CreateFromLdml(file1);
carib.Register();

When Windows knows more than .NET

As of Windows XP SP2, there are 25 new locales in Windows:
  Bengali - India
  Croatian - Bosnia and Herzegovina
  Bosnian - Bosnia and Herzegovina
  Serbian - Bosnia and Herzegovina (Latin)
  Serbian - Bosnia and Herzegovina (Cyrillic)
  Welsh - United Kingdom
  Maori - New Zealand
  Malayalam - India
  Maltese - Malta
  Quechua - Bolivia
  Quechua - Ecuador
  Quechua - Peru
  Setswana / Tswana - South Africa
  isiXhosa / Xhosa - South Africa
  isiZulu / Zulu - South Africa
  Sesotho sa Leboa / Northern Sotho - South Africa
  Northern Sami - Norway
  Northern Sami - Sweden
  Northern Sami - Finland
  Lule Sami - Norway
  Lule Sami - Sweden
  Southern Sami - Norway
  Southern Sami - Sweden
  Skolt Sami - Finland
  Inari Sami - Finland
There will be more in future service packs
In Longhorn, there will be 75 or more new locales

Windows-only Cultures

The solution: Windows-only cultures!

  Synthesizes a CultureInfo object when Windows supports a locale that the .NET Framework does not know how to create itself

Windows-only culture test

foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.WindowsOnlyCultures))
{
    Console.WriteLine(culture.Name);
}

// New cultures on XP SP2 include:
// mt-MT, bs-BA-Latn, smn-FI, smj-NO, smj-SE, sms-FI, sma-NO,
// sma-SE, quz-BO, quz-EC, quz-PE, ml-IN, bn-IN, cy-GB, and more

Special CultureInfo support for SQL Server 2005 (Yukon)

SQL Server locale semantics
  One setting for UI and formatting
  Another setting for collation/encoding
.NET/Windows semantics
  One setting for UI
  Another setting for formatting/collation
Solution
  Special GetCultureInfo overload that takes two culture names, one for each of the two SQL Server settings
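As a rough illustration of the two-name idea, the sketch below calls the CultureInfo.GetCultureInfo(name, altName) overload, where the second name supplies the TextInfo and CompareInfo. The culture names "en-US" and "sv-SE" and the Combine helper are chosen only for illustration; this is not how SQL Server itself calls the API.

```csharp
using System;
using System.Globalization;

public static class TwoNameCultureDemo
{
    // Hypothetical helper: formatting data comes from `name`,
    // TextInfo/CompareInfo (collation) come from `altName`.
    public static CultureInfo Combine(string name, string altName)
    {
        return CultureInfo.GetCultureInfo(name, altName);
    }

    public static void Main()
    {
        CultureInfo ci = Combine("en-US", "sv-SE");

        // Culture used for formatting
        Console.WriteLine(ci.Name);

        // Culture whose collation rules are used
        Console.WriteLine(ci.CompareInfo.Name);

        // Instances returned by GetCultureInfo are cached and read-only
        Console.WriteLine(ci.IsReadOnly);
    }
}
```

Because the returned object is read-only, it can be shared across threads without copying, which suits a server that handles many sessions with the same pair of settings.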

How Yukon uses this support

Microsoft.ReportingServices.Diagnostics.Localization
  CatalogCulture
  ClientPrimaryCulture
  DefaultReportServerCulture
  FallbackUICulture
  InstalledCultureNames
  ReportParameterCulture
  SqlCulture

     

New locale properties/methods

TextInfo
  CultureName
  LCID
CompareInfo
  Name
DateTimeFormatInfo
  ShortestDayNames
  MonthGenitiveNames
  AbbreviatedMonthGenitiveNames
NumberFormatInfo
  NativeDigits
  DigitSubstitution
CultureInfo
  IsCustomCulture
  IetfLanguageTag
  CultureTypes
  GetCultureInfo()
  GetCultureInfoByIetfLanguageTag()
RegionInfo
  GeoId
  NativeName
  CurrencyEnglishName
  (Can now create via full culture names)
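A few of these members can be exercised as in the sketch below. The cultures "ru-RU", "ar-EG" and "de-DE" and the helper names are picked only for illustration, and the printed values depend on the culture data installed on the machine, so no output is shown.

```csharp
using System;
using System.Globalization;

public static class LocalePropertiesDemo
{
    // Hypothetical helper: genitive month names for a culture
    // (the array has 13 entries; the last is for 13-month calendars).
    public static string[] GenitiveMonths(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).DateTimeFormat.MonthGenitiveNames;
    }

    // Hypothetical helper: the ten native digits for a culture.
    public static string[] NativeDigits(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).NumberFormat.NativeDigits;
    }

    public static void Main()
    {
        DateTimeFormatInfo dtfi = CultureInfo.GetCultureInfo("ru-RU").DateTimeFormat;

        // Nominative versus genitive form of the first month
        Console.WriteLine(dtfi.MonthNames[0]);
        Console.WriteLine(GenitiveMonths("ru-RU")[0]);

        // Shortest unique day-name abbreviations
        Console.WriteLine(string.Join(" ", dtfi.ShortestDayNames));

        // Native digits and the digit substitution setting
        NumberFormatInfo nfi = CultureInfo.GetCultureInfo("ar-EG").NumberFormat;
        Console.WriteLine(string.Join(" ", NativeDigits("ar-EG")));
        Console.WriteLine(nfi.DigitSubstitution);

        // RegionInfo created from a full culture name
        RegionInfo ri = new RegionInfo("de-DE");
        Console.WriteLine(ri.NativeName);
        Console.WriteLine(ri.CurrencyEnglishName);
    }
}
```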

Updates to encodings

Now built into the BCL
  Improved performance
  More flexibility
  Consistent results across supported platforms
Encoding enumeration API
UTF-32 support (little endian and big endian)
UTF-16 big endian support
Encoding/decoding fallbacks
  Exception
  Replacement
  "Best fit"
  Custom

using System;
using System.Text;

// Custom encoder fallback: replaces unencodable characters with
// numeric character references such as &#8364;
public class NumericEntitiesFallback : EncoderFallback
{
    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new NEFallbackBuffer();
    }

    // The longest entity is "&#1114111;" (U+10FFFF), which is 10 characters
    public override int MaxCharCount
    {
        get { return 10; }
    }
}

public class NEFallbackBuffer : EncoderFallbackBuffer
{
    // Store our default string
    private String strEntity;
    int fallbackCount = -1;
    int fallbackIndex = 0;

    // Fallback methods
    public override bool Fallback(char charUnknown, int index)
    {
        // If we had a buffer already we're being recursive, throw,
        // it's probably at the suspect character in our array.
        if (fallbackCount >= 0)
            ThrowLastCharRecursive(unchecked((int)charUnknown));

        // Go ahead and get our fallback
        strEntity = String.Format("&#{0};", (int)charUnknown);
        fallbackCount = strEntity.Length;
        fallbackIndex = 0;

        return fallbackCount != 0;
    }

    public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
    {
        // Double check input surrogate pair
        if (!Char.IsHighSurrogate(charUnknownHigh))
            throw new ArgumentOutOfRangeException("charUnknownHigh",
                "supposed to be between 0xD800 and 0xDBFF");

        if (!Char.IsLowSurrogate(charUnknownLow))
            throw new ArgumentOutOfRangeException("charUnknownLow",
                "supposed to be between 0xDC00 and 0xDFFF");

        // If we had a buffer already we're being recursive, throw, it's
        // probably at the suspect character in our array.
        if (fallbackCount >= 0)
            ThrowLastCharRecursive(Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));

        // Go ahead and get our fallback
        strEntity = String.Format("&#{0};",
            Char.ConvertToUtf32(charUnknownHigh, charUnknownLow));
        fallbackCount = strEntity.Length;
        fallbackIndex = 0;

        return fallbackCount != 0;
    }

    public override char GetNextChar()
    {
        // We want it to get < 0 because == 0 means that the current/last
        // character is a fallback and we need to detect recursion. We
        // could have a flag but we already have this counter.
        fallbackCount--;

        // Do we have anything left? 0 is now last fallback char, negative
        // is nothing left
        if (fallbackCount < 0)
            return (char)0;

        // Need to get it out of the buffer.
        return strEntity[fallbackIndex++];
    }

    public override bool MovePrevious()
    {
        fallbackCount++;
        fallbackIndex--;
        return true;
    }

    public override int Remaining
    {
        get { return (fallbackCount < 0) ? 0 : fallbackCount; }
    }

    // Private helper methods
    private void ThrowLastCharRecursive(int charRecursive)
    {
        // Throw it, using our complete character
        throw new ArgumentException(
            String.Format("Last character \\u{0:X4} was a recursive fallback", charRecursive),
            "chars");
    }
}
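For comparison with a custom fallback, the built-in replacement and exception fallbacks can be attached when an encoding is requested. The following is a minimal sketch; the helper names and the "us-ascii" target are assumptions made for the example.

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    // Hypothetical helper: round-trips text through ASCII, substituting
    // `replacement` for every character ASCII cannot represent.
    public static string EncodeWithReplacement(string text, string replacement)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback(replacement));
        byte[] bytes = ascii.GetBytes(text);
        return ascii.GetString(bytes);
    }

    // Hypothetical helper: true if the exception fallback rejects the text.
    public static bool ThrowsOnUnencodable(string text)
    {
        Encoding strict = Encoding.GetEncoding(
            "us-ascii",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return false;
        }
        catch (EncoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // U+00E9 (e with acute) has no ASCII representation
        Console.WriteLine(EncodeWithReplacement("caf\u00E9", "?"));  // prints "caf?"
        Console.WriteLine(ThrowsOnUnencodable("caf\u00E9"));         // prints "True"
        Console.WriteLine(ThrowsOnUnencodable("cafe"));              // prints "False"
    }
}
```

A custom EncoderFallback subclass can be passed in the same position as the built-in ones.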

Collation Improvements

OrdinalIgnoreCase
  Same results as ToUpper/Ordinal
  Matches OS file system results
Correct Serbian collation
  Fixed in Windows XP SP2
  Customer reported (MSDN Feedback Center)
Better handling of ignored/ignorable characters
  IndexOf/LastIndexOf/IsPrefix/IsSuffix
  StartsWith/EndsWith, too

OrdinalIgnoreCase sample

string strTest1 = "IamAString";
string strTest2 = "STRING";

if (strTest1.EndsWith(strTest2, StringComparison.OrdinalIgnoreCase))
{
    Console.WriteLine("Successful test!");
}

  

Unicode normalization

Described in UAX #15 at http://www.unicode.org/reports/tr15/
String.IsNormalized()
String.IsNormalized(NormalizationForm normalizationForm)
String.Normalize()
String.Normalize(NormalizationForm normalizationForm)
NormalizationForm enumeration
  FormC, FormD, FormKC, FormKD

Example: õĥµ¨ (U+00f5 U+0068 U+0302 U+00b5 U+00a8)
LATIN SMALL LETTER O WITH TILDE; LATIN SMALL LETTER H; COMBINING CIRCUMFLEX ACCENT; MICRO SIGN; DIAERESIS
  FormC:  õĥµ¨ (U+00f5 U+0125 U+00b5 U+00a8)
  FormD:  õĥµ¨ (U+006f U+0303 U+0068 U+0302 U+00b5 U+00a8)
  FormKC: õĥμ ̈ (U+00f5 U+0125 U+03bc U+0020 U+0308)
  FormKD: õĥμ ̈ (U+006f U+0303 U+0068 U+0302 U+03bc U+0020 U+0308)
In collation, õĥµ¨ ≅ õĥµ¨ ≅ õĥμ ̈ ≅ õĥμ ̈
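The four forms of the example string can be reproduced with String.Normalize, as in this small sketch. The CodePoints helper is a hypothetical name and assumes the input stays within the Basic Multilingual Plane, which holds for this sample.

```csharp
using System;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: lists each UTF-16 code unit as U+xxxx.
    public static string CodePoints(string s)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:x4}", (int)c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // o-tilde, h, combining circumflex, micro sign, diaeresis
        string sample = "\u00f5\u0068\u0302\u00b5\u00a8";

        NormalizationForm[] forms =
        {
            NormalizationForm.FormC,
            NormalizationForm.FormD,
            NormalizationForm.FormKC,
            NormalizationForm.FormKD
        };

        foreach (NormalizationForm form in forms)
        {
            string normalized = sample.Normalize(form);
            Console.WriteLine("{0}: {1}", form, CodePoints(normalized));
        }

        // The sample mixes a precomposed letter with a decomposed one,
        // so it is in neither FormC nor FormD as written.
        Console.WriteLine(sample.IsNormalized(NormalizationForm.FormC));
        Console.WriteLine(sample.IsNormalized(NormalizationForm.FormD));
    }
}
```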

namespace àáâãäå
{
    using System;
    using System.Text;
    using System.Globalization;

    class àáâãäå
    {
        [STAThread]
        static void Main(string[] args)
        {
            àáâãäå();
            àáâãäå();
            àáâãäå();
            àáâãäå();
            àáâãäå();
            àáâãäå();
            àáâãäå();
        }

        static void àáâãäå(string àáâãäå)
        {
            StringBuilder àáâãäå = new StringBuilder();
            StringInfo àáâãäå = new StringInfo(àáâãäå);

            àáâãäå.Append(àáâãäå.Normalize(NormalizationForm.FormC));
            àáâãäå.Append(": ");

            for (int àáâãäå = 0; àáâãäå < àáâãäå.LengthInTextElements; àáâãäå++)
            {
                string àáâãäå = àáâãäå.SubstringByTextElements(àáâãäå, 1);
                if (àáâãäå.IsNormalized(NormalizationForm.FormC))
                {
                    àáâãäå.Append("C");
                }
                else if (àáâãäå.IsNormalized(NormalizationForm.FormD))
                {
                    àáâãäå.Append("D");
                }
                else
                {
                    àáâãäå.Append("_");
                }
            }

            Console.WriteLine(àáâãäå.ToString());
            return;
        }

        static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
        static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
        static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
        static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
        static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
        static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
        static void àáâãäå() { àáâãäå.àáâãäå("àáâãäå"); }
    }
}

IDN Mapping APIs

IdnMapping class
Based on three RFCs (standard based on Unicode 3.2)
  3490 - Internationalizing Domain Names in Applications (IDNA)
  3491 - Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)
  3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)

Example
  \u5B89\u5BA4\u5948\u7F8E\u6075-with-SUPER-MONKEYS
becomes
  xn---with-SUPER-MONKEYS-pc58ag80a8qai00g7n9n

Properties
  AllowUnassigned (allows new Unicode characters)
  UseStd3AsciiRules (more like DNS rules)
Methods
  GetAscii - Gets the ASCII (Punycode) version of the string
  GetUnicode - Gets the Unicode version of the string, normalized and limited to IDNA characters
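A minimal sketch of the two methods follows, using the string from the example above with default property values. The ToAscii and ToUnicode wrappers are hypothetical names; the exact letter case of the results can vary because Nameprep applies case mapping, so no output is shown.

```csharp
using System;
using System.Globalization;

public static class IdnDemo
{
    // Hypothetical wrapper: Unicode domain label to its ASCII (Punycode) form.
    public static string ToAscii(string unicodeName)
    {
        IdnMapping idn = new IdnMapping();
        return idn.GetAscii(unicodeName);
    }

    // Hypothetical wrapper: ASCII (Punycode) form back to Unicode.
    public static string ToUnicode(string asciiName)
    {
        IdnMapping idn = new IdnMapping();
        return idn.GetUnicode(asciiName);
    }

    public static void Main()
    {
        string name = "\u5B89\u5BA4\u5948\u7F8E\u6075-with-SUPER-MONKEYS";

        // Encoded labels carry the "xn--" prefix defined by RFC 3490
        string ascii = ToAscii(name);
        Console.WriteLine(ascii);

        // Decoding returns the Nameprep-normalized Unicode form
        Console.WriteLine(ToUnicode(ascii));

        // Names that are already plain ASCII pass through unchanged
        Console.WriteLine(ToAscii("example.com"));
    }
}
```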


Unicode property information

New CharUnicodeInfo class
Extends methods on Char
Official data from the Unicode Character Database at http://www.unicode.org/ucd/
  IsWhiteSpace
  GetNumericValue
  GetDigitValue
  GetDecimalDigitValue
  GetUnicodeCategory
  GetBidiCategory
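The difference between the three numeric lookups is easiest to see side by side. The sketch below uses only GetUnicodeCategory, GetNumericValue, GetDigitValue and GetDecimalDigitValue; IsWhiteSpace and GetBidiCategory are left out because their public availability on this class is not assumed here. The Describe helper is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class CharUnicodeInfoDemo
{
    // Hypothetical helper: one-line summary of a character's properties.
    public static string Describe(char c)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "U+{0:X4} category={1} numeric={2} digit={3} decimal={4}",
            (int)c,
            CharUnicodeInfo.GetUnicodeCategory(c),
            CharUnicodeInfo.GetNumericValue(c),
            CharUnicodeInfo.GetDigitValue(c),
            CharUnicodeInfo.GetDecimalDigitValue(c));
    }

    public static void Main()
    {
        // ARABIC-INDIC DIGIT THREE: a decimal digit, so all three lookups agree
        Console.WriteLine(Describe('\u0663'));

        // SUPERSCRIPT TWO: a digit, but not a decimal digit (decimal lookup gives -1)
        Console.WriteLine(Describe('\u00B2'));

        // VULGAR FRACTION ONE HALF: numeric value only (digit lookups give -1)
        Console.WriteLine(Describe('\u00BD'));
    }
}
```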

New text element support in the StringInfo class

    

StringInfo constructor that takes a string
StringInfo.String
StringInfo.LengthInTextElements
StringInfo.SubstringByTextElements()
  Both use ParseCombiningCharacters() to get their results

New StringInfo props/methods sample

StringInfo si = new StringInfo("A\u0300\u0301\u0300e\u0300\u0301\u0300");

Console.WriteLine(si.LengthInTextElements);   // Length is two

for (int ich = 0; ich < si.LengthInTextElements; ich++)
{
    Console.WriteLine(si.SubstringByTextElements(ich, 1));
}

New supplementary character support in lots of methods

New signature -- (String s, int index)
  IsControl, IsDigit, IsLetter, IsLetterOrDigit, IsLower, IsNumber, IsPunctuation, IsSeparator, IsSurrogate, IsSymbol, IsUpper, IsWhiteSpace, GetUnicodeCategory, GetNumericValue, IsHighSurrogate, IsLowSurrogate, IsSurrogatePair
ConvertToUtf32, ConvertFromUtf32 methods
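The (String s, int index) overloads make it possible to walk a string by code point rather than by UTF-16 code unit. The sketch below is one way to do that; CountCodePoints is a hypothetical helper and U+10400 (DESERET CAPITAL LETTER LONG I) is chosen only as a convenient supplementary letter.

```csharp
using System;

public static class SupplementaryDemo
{
    // Hypothetical helper: counts code points, treating each
    // well-formed surrogate pair as a single character.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // Skip the low surrogate when a pair starts at i
            if (Char.IsSurrogatePair(s, i)) i++;
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // ConvertFromUtf32 returns the surrogate pair for U+10400
        string s = "a" + Char.ConvertFromUtf32(0x10400) + "b";

        Console.WriteLine(s.Length);                     // prints 4 (UTF-16 code units)
        Console.WriteLine(CountCodePoints(s));           // prints 3 (code points)

        // The index-based overloads see the whole pair starting at index 1
        Console.WriteLine(Char.IsLetter(s, 1));          // prints True
        Console.WriteLine(Char.ConvertToUtf32(s, 1) == 0x10400);  // prints True
    }
}
```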

    

References

MSDN Magazine article
  Make the .NET World a Friendlier Place with the Many Faces of the CultureInfo Class (March 2005)
  http://msdn.microsoft.com/msdnmag/issues/05/03/CultureInfo/
SQL Server Books Online
  "International Considerations for SQL Server"
  http://whidbey.msdn.microsoft.com/library/en-us/icsql9/html/50dc4fa8-4772-46a8-a8ef-bc134502b4e0.asp
My blog
  http://blogs.msdn.com/michkap
Some other blogs for international support in Whidbey
  http://blogs.msdn.com/AchimR
  http://www.dasblonde.net/
  http://blogs.msdn.com/BCLTeam
Other useful sites
  http://www.microsoft.com/globaldev/
  http://lab.msdn.microsoft.com/productfeedback/
  http://www.unicode.org/

Globalization Features in Whidbey’s CLR

Questions
