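CharUnicodeInfo.GetUnicodeCategory Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it's released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Gets the Unicode category of a Unicode character.
Overloads
| GetUnicodeCategory(Char) | Gets the Unicode category of the specified character. |
| GetUnicodeCategory(Int32) | Gets the Unicode category of the character with the specified code point. |
| GetUnicodeCategory(String, Int32) | Gets the Unicode category of the character at the specified index of the specified string. |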
GetUnicodeCategory(Char)
- Source:
- CharUnicodeInfo.cs
Gets the Unicode category of the specified character.
public:
 static System::Globalization::UnicodeCategory GetUnicodeCategory(char ch);
public static System.Globalization.UnicodeCategory GetUnicodeCategory (char ch);
static member GetUnicodeCategory : char -> System.Globalization.UnicodeCategory
Public Shared Function GetUnicodeCategory (ch As Char) As UnicodeCategory
Parameters
- ch
- Char
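The Unicode character for which to get the Unicode category.
Returns
A UnicodeCategory value indicating the category of the specified character.
Examples
The following code example shows the values returned by each method for different types of characters.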
using namespace System;
using namespace System::Globalization;
void PrintProperties( Char c );
int main()
{
   Console::WriteLine( "                                        c  Num   Dig   Dec   UnicodeCategory" );
   Console::Write( "U+0061 LATIN SMALL LETTER A            " );
   PrintProperties( L'a' );
   Console::Write( "U+0393 GREEK CAPITAL LETTER GAMMA      " );
   PrintProperties( L'\u0393' );
   Console::Write( "U+0039 DIGIT NINE                      " );
   PrintProperties( L'9' );
   Console::Write( "U+00B2 SUPERSCRIPT TWO                 " );
   PrintProperties( L'\u00B2' );
   Console::Write( "U+00BC VULGAR FRACTION ONE QUARTER     " );
   PrintProperties( L'\u00BC' );
   Console::Write( "U+0BEF TAMIL DIGIT NINE                " );
   PrintProperties( L'\u0BEF' );
   Console::Write( "U+0BF0 TAMIL NUMBER TEN                " );
   PrintProperties( L'\u0BF0' );
   Console::Write( "U+0F33 TIBETAN DIGIT HALF ZERO         " );
   PrintProperties( L'\u0F33' );
   Console::Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE   " );
   PrintProperties( L'\u2788' );
}
void PrintProperties( Char c )
{
   Console::Write( " {0,-3}", c );
   Console::Write( " {0,-5}", CharUnicodeInfo::GetNumericValue( c ) );
   Console::Write( " {0,-5}", CharUnicodeInfo::GetDigitValue( c ) );
   Console::Write( " {0,-5}", CharUnicodeInfo::GetDecimalDigitValue( c ) );
   Console::WriteLine( "{0}", CharUnicodeInfo::GetUnicodeCategory( c ) );
}
/*
This code produces the following output.  Some characters might not display at the console.
                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       Γ   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  ²   2     2     -1   OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      ¼   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 ௯   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 ௰   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          ༳   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    ➈   9     9     -1   OtherNumber
*/
using System;
using System.Globalization;
public class SamplesCharUnicodeInfo  {
   public static void Main()  {
      Console.WriteLine( "                                        c  Num   Dig   Dec   UnicodeCategory" );
      Console.Write( "U+0061 LATIN SMALL LETTER A            " );
      PrintProperties( 'a' );
      Console.Write( "U+0393 GREEK CAPITAL LETTER GAMMA      " );
      PrintProperties( '\u0393' );
      Console.Write( "U+0039 DIGIT NINE                      " );
      PrintProperties( '9' );
      Console.Write( "U+00B2 SUPERSCRIPT TWO                 " );
      PrintProperties( '\u00B2' );
      Console.Write( "U+00BC VULGAR FRACTION ONE QUARTER     " );
      PrintProperties( '\u00BC' );
      Console.Write( "U+0BEF TAMIL DIGIT NINE                " );
      PrintProperties( '\u0BEF' );
      Console.Write( "U+0BF0 TAMIL NUMBER TEN                " );
      PrintProperties( '\u0BF0' );
      Console.Write( "U+0F33 TIBETAN DIGIT HALF ZERO         " );
      PrintProperties( '\u0F33' );
      Console.Write( "U+2788 CIRCLED SANS-SERIF DIGIT NINE   " );
      PrintProperties( '\u2788' );
   }
   public static void PrintProperties( char c )  {
      Console.Write( " {0,-3}", c );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( c ) );
      Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( c ) );
      Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( c ) );
   }
}
/*
This code produces the following output.  Some characters might not display at the console.
                                        c  Num   Dig   Dec   UnicodeCategory
U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
U+0393 GREEK CAPITAL LETTER GAMMA       Γ   -1    -1    -1   UppercaseLetter
U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
U+00B2 SUPERSCRIPT TWO                  ²   2     2     -1   OtherNumber
U+00BC VULGAR FRACTION ONE QUARTER      ¼   0.25  -1    -1   OtherNumber
U+0BEF TAMIL DIGIT NINE                 ௯   9     9     9    DecimalDigitNumber
U+0BF0 TAMIL NUMBER TEN                 ௰   10    -1    -1   OtherNumber
U+0F33 TIBETAN DIGIT HALF ZERO          ༳   -0.5  -1    -1   OtherNumber
U+2788 CIRCLED SANS-SERIF DIGIT NINE    ➈   9     9     -1   OtherNumber
*/
Imports System.Globalization
Public Class SamplesCharUnicodeInfo
   Public Shared Sub Main()
      Console.WriteLine("                                        c  Num   Dig   Dec   UnicodeCategory")
      Console.Write("U+0061 LATIN SMALL LETTER A            ")
      PrintProperties("a"c)
      Console.Write("U+0393 GREEK CAPITAL LETTER GAMMA      ")
      PrintProperties(ChrW(&H0393))
      Console.Write("U+0039 DIGIT NINE                      ")
      PrintProperties("9"c)
      Console.Write("U+00B2 SUPERSCRIPT TWO                 ")
      PrintProperties(ChrW(&H00B2))
      Console.Write("U+00BC VULGAR FRACTION ONE QUARTER     ")
      PrintProperties(ChrW(&H00BC))
      Console.Write("U+0BEF TAMIL DIGIT NINE                ")
      PrintProperties(ChrW(&H0BEF))
      Console.Write("U+0BF0 TAMIL NUMBER TEN                ")
      PrintProperties(ChrW(&H0BF0))
      Console.Write("U+0F33 TIBETAN DIGIT HALF ZERO         ")
      PrintProperties(ChrW(&H0F33))
      Console.Write("U+2788 CIRCLED SANS-SERIF DIGIT NINE   ")
      PrintProperties(ChrW(&H2788))
   End Sub
   Public Shared Sub PrintProperties(c As Char)
      Console.Write(" {0,-3}", c)
      Console.Write(" {0,-5}", CharUnicodeInfo.GetNumericValue(c))
      Console.Write(" {0,-5}", CharUnicodeInfo.GetDigitValue(c))
      Console.Write(" {0,-5}", CharUnicodeInfo.GetDecimalDigitValue(c))
      Console.WriteLine("{0}", CharUnicodeInfo.GetUnicodeCategory(c))
   End Sub
End Class
'This code produces the following output.  Some characters might not display at the console.
'
'                                        c  Num   Dig   Dec   UnicodeCategory
'U+0061 LATIN SMALL LETTER A             a   -1    -1    -1   LowercaseLetter
'U+0393 GREEK CAPITAL LETTER GAMMA       Γ   -1    -1    -1   UppercaseLetter
'U+0039 DIGIT NINE                       9   9     9     9    DecimalDigitNumber
'U+00B2 SUPERSCRIPT TWO                  ²   2     2     -1   OtherNumber
'U+00BC VULGAR FRACTION ONE QUARTER      ¼   0.25  -1    -1   OtherNumber
'U+0BEF TAMIL DIGIT NINE                 ௯   9     9     9    DecimalDigitNumber
'U+0BF0 TAMIL NUMBER TEN                 ௰   10    -1    -1   OtherNumber
'U+0F33 TIBETAN DIGIT HALF ZERO          ༳   -0.5  -1    -1   OtherNumber
'U+2788 CIRCLED SANS-SERIF DIGIT NINE    ➈   9     9     -1   OtherNumber
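Remarks
Unicode characters are divided into categories. A character's category is one of its properties. For example, a character might be an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a connector punctuation mark, a math symbol, or a currency symbol. The UnicodeCategory enumeration defines the category of a Unicode character. For more information on Unicode characters, see the Unicode Standard.
The GetUnicodeCategory method assumes that ch corresponds to a single linguistic character and returns its category. This means that for either half of a surrogate pair, it returns UnicodeCategory.Surrogate instead of the category of the character that the pair encodes. For example, the Ugaritic alphabet occupies code points U+10380 to U+1039F. The following example uses the ConvertFromUtf32 method to instantiate a string that represents UGARITIC LETTER ALPA (U+10380), which is the first letter of the Ugaritic alphabet. As the output from the example shows, the GetUnicodeCategory(Char) method returns UnicodeCategory.Surrogate whether it is passed the high surrogate or the low surrogate of this character.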
int utf32 = 0x10380;       // UGARITIC LETTER ALPA
string surrogate = Char.ConvertFromUtf32(utf32);
foreach (var ch in surrogate)
    Console.WriteLine($"U+{(ushort)ch:X4}: {System.Globalization.CharUnicodeInfo.GetUnicodeCategory(ch):G}");
// The example displays the following output:
//       U+D800: Surrogate
//       U+DF80: Surrogate
Dim utf32 As Integer = &h10380       ' UGARITIC LETTER ALPA
Dim surrogate As String = Char.ConvertFromUtf32(utf32)
For Each ch In surrogate
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(ch), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(ch))
Next
' The example displays the following output:
'       U+D800: Surrogate
'       U+DF80: Surrogate
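Note that the CharUnicodeInfo.GetUnicodeCategory method does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when it is passed the same character. The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.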
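One way to see whether the two methods actually diverge on a given runtime is to compare them across every UTF-16 code unit. The following is a minimal sketch, not an official sample; the class name CategoryComparisonExample and the helper CountDifferences are hypothetical names introduced here for illustration. The result depends on the runtime version, so no expected output is shown.

```csharp
using System;
using System.Globalization;

public static class CategoryComparisonExample
{
    // Hypothetical helper (not part of .NET): counts the UTF-16 code units
    // for which Char.GetUnicodeCategory and
    // CharUnicodeInfo.GetUnicodeCategory return different values.
    public static int CountDifferences()
    {
        int differences = 0;
        // Use an int loop variable so the loop can terminate after char.MaxValue.
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (Char.GetUnicodeCategory(c) != CharUnicodeInfo.GetUnicodeCategory(c))
                differences++;
        }
        return differences;
    }

    public static void Main()
    {
        Console.WriteLine($"Code units with differing categories: {CountDifferences()}");
    }
}
```

A count of zero would only mean that the two methods agree on the runtime where the code ran; it says nothing about other runtime versions.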
GetUnicodeCategory(Int32)
- Source:
- CharUnicodeInfo.cs
Gets the Unicode category of the character with the specified code point.
public:
 static System::Globalization::UnicodeCategory GetUnicodeCategory(int codePoint);
public static System.Globalization.UnicodeCategory GetUnicodeCategory (int codePoint);
static member GetUnicodeCategory : int -> System.Globalization.UnicodeCategory
Public Shared Function GetUnicodeCategory (codePoint As Integer) As UnicodeCategory
Parameters
- codePoint
- Int32
A number representing the 32-bit code point value of the Unicode character.
Returns
A UnicodeCategory value indicating the category of the specified character.
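Because this overload takes a full code point rather than a UTF-16 code unit, it can classify characters outside the Basic Multilingual Plane directly. The following is a minimal sketch, assuming a runtime that exposes the GetUnicodeCategory(Int32) overload; the class name CodePointCategoryExample and the helper Describe are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Globalization;

public static class CodePointCategoryExample
{
    // Hypothetical helper (not part of .NET): formats a code point
    // together with the category reported for it.
    public static string Describe(int codePoint)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(codePoint);
        return $"U+{codePoint:X4}: {category}";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(0x0061));   // LATIN SMALL LETTER A
        Console.WriteLine(Describe(0x0039));   // DIGIT NINE
        Console.WriteLine(Describe(0x10380));  // UGARITIC LETTER ALPA (outside the BMP)
    }
}
// The example displays the following output:
//       U+0061: LowercaseLetter
//       U+0039: DecimalDigitNumber
//       U+10380: OtherLetter
```

Passing the code point avoids having to split U+10380 into its surrogate pair first, which is what causes the Char overload to report UnicodeCategory.Surrogate.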
GetUnicodeCategory(String, Int32)
- Source:
- CharUnicodeInfo.cs
Gets the Unicode category of the character at the specified index of the specified string.
public:
 static System::Globalization::UnicodeCategory GetUnicodeCategory(System::String ^ s, int index);
public static System.Globalization.UnicodeCategory GetUnicodeCategory (string s, int index);
static member GetUnicodeCategory : string * int -> System.Globalization.UnicodeCategory
Public Shared Function GetUnicodeCategory (s As String, index As Integer) As UnicodeCategory
Parameters
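- s
- String
The String containing the Unicode character for which to get the Unicode category.
- index
- Int32
The index of the Unicode character for which to get the Unicode category.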
Returns
A UnicodeCategory value indicating the category of the character at the specified index of the specified string.
Exceptions
- ArgumentNullException
s is null.
- ArgumentOutOfRangeException
index is outside the range of valid indexes in s.
Examples
The following code example shows the values returned by each method for different types of characters.
using namespace System;
using namespace System::Globalization;
int main()
{
   // The String to get information for.
   String^ s = "a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788";
   Console::WriteLine( "String: {0}", s );
   // Print the values for each of the characters in the string.
   Console::WriteLine( "index c  Num   Dig   Dec   UnicodeCategory" );
   for ( int i = 0; i < s->Length; i++ )
   {
      Console::Write( "{0,-5} {1,-3}", i, s[ i ] );
      Console::Write( " {0,-5}", CharUnicodeInfo::GetNumericValue( s, i ) );
      Console::Write( " {0,-5}", CharUnicodeInfo::GetDigitValue( s, i ) );
      Console::Write( " {0,-5}", CharUnicodeInfo::GetDecimalDigitValue( s, i ) );
      Console::WriteLine( "{0}", CharUnicodeInfo::GetUnicodeCategory( s, i ) );
   }
}
/*
This code produces the following output.  Some characters might not display at the console.
String: a9Γ²¼௯௰➈
index c  Num   Dig   Dec   UnicodeCategory
0     a   -1    -1    -1   LowercaseLetter
1     9   9     9     9    DecimalDigitNumber
2     Γ   -1    -1    -1   UppercaseLetter
3     ²   2     2     -1   OtherNumber
4     ¼   0.25  -1    -1   OtherNumber
5     ௯   9     9     9    DecimalDigitNumber
6     ௰   10    -1    -1   OtherNumber
7     ➈   9     9     -1   OtherNumber
*/
using System;
using System.Globalization;
public class SamplesCharUnicodeInfo  {
   public static void Main()  {
      // The String to get information for.
      String s = "a9\u0393\u00B2\u00BC\u0BEF\u0BF0\u2788";
      Console.WriteLine( "String: {0}", s );
      // Print the values for each of the characters in the string.
      Console.WriteLine( "index c  Num   Dig   Dec   UnicodeCategory" );
      for ( int i = 0; i < s.Length; i++ )  {
         Console.Write( "{0,-5} {1,-3}", i, s[i] );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetNumericValue( s, i ) );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetDigitValue( s, i ) );
         Console.Write( " {0,-5}", CharUnicodeInfo.GetDecimalDigitValue( s, i ) );
         Console.WriteLine( "{0}", CharUnicodeInfo.GetUnicodeCategory( s, i ) );
      }
   }
}
/*
This code produces the following output.  Some characters might not display at the console.
String: a9Γ²¼௯௰➈
index c  Num   Dig   Dec   UnicodeCategory
0     a   -1    -1    -1   LowercaseLetter
1     9   9     9     9    DecimalDigitNumber
2     Γ   -1    -1    -1   UppercaseLetter
3     ²   2     2     -1   OtherNumber
4     ¼   0.25  -1    -1   OtherNumber
5     ௯   9     9     9    DecimalDigitNumber
6     ௰   10    -1    -1   OtherNumber
7     ➈   9     9     -1   OtherNumber
*/
Imports System.Globalization
Public Class SamplesCharUnicodeInfo
   Public Shared Sub Main()
      ' The String to get information for.
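      ' Visual Basic has no \u escape sequences, so build the string with ChrW.
      Dim s As String = "a9" & ChrW(&H0393) & ChrW(&H00B2) & ChrW(&H00BC) & ChrW(&H0BEF) & ChrW(&H0BF0) & ChrW(&H2788)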
      Console.WriteLine("String: {0}", s)
      ' Print the values for each of the characters in the string.
      Console.WriteLine("index c  Num   Dig   Dec   UnicodeCategory")
      Dim i As Integer
      For i = 0 To s.Length - 1
         Console.Write("{0,-5} {1,-3}", i, s(i))
         Console.Write(" {0,-5}", CharUnicodeInfo.GetNumericValue(s, i))
         Console.Write(" {0,-5}", CharUnicodeInfo.GetDigitValue(s, i))
         Console.Write(" {0,-5}", CharUnicodeInfo.GetDecimalDigitValue(s, i))
         Console.WriteLine("{0}", CharUnicodeInfo.GetUnicodeCategory(s, i))
      Next i
   End Sub
End Class
'This code produces the following output.  Some characters might not display at the console.
'
'String: a9Γ²¼௯௰➈
'index c  Num   Dig   Dec   UnicodeCategory
'0     a   -1    -1    -1   LowercaseLetter
'1     9   9     9     9    DecimalDigitNumber
'2     Γ   -1    -1    -1   UppercaseLetter
'3     ²   2     2     -1   OtherNumber
'4     ¼   0.25  -1    -1   OtherNumber
'5     ௯   9     9     9    DecimalDigitNumber
'6     ௰   10    -1    -1   OtherNumber
'7     ➈   9     9     -1   OtherNumber
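Remarks
Unicode characters are divided into categories. A character's category is one of its properties. For example, a character might be an uppercase letter, a lowercase letter, a decimal digit number, a letter number, a connector punctuation mark, a math symbol, or a currency symbol. The UnicodeCategory enumeration defines the category of a Unicode character. For more information on Unicode characters, see the Unicode Standard.
If the Char object at position index is the first character of a valid surrogate pair, the GetUnicodeCategory(String, Int32) method returns the Unicode category of the surrogate pair instead of returning UnicodeCategory.Surrogate. For example, the Ugaritic alphabet occupies code points U+10380 to U+1039F. The following example uses the ConvertFromUtf32 method to instantiate a string that represents UGARITIC LETTER ALPA (U+10380), which is the first letter of the Ugaritic alphabet. As the output from the example shows, the GetUnicodeCategory(String, Int32) method returns UnicodeCategory.OtherLetter when index points to the high surrogate of this character, which indicates that it considers the whole surrogate pair. However, when index points to the low surrogate, the method considers the low surrogate in isolation and returns UnicodeCategory.Surrogate.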
int utf32 = 0x10380;       // UGARITIC LETTER ALPA
string surrogate = Char.ConvertFromUtf32(utf32);
for (int ctr = 0; ctr < surrogate.Length; ctr++)
    Console.WriteLine($"U+{(ushort)surrogate[ctr]:X4}: {System.Globalization.CharUnicodeInfo.GetUnicodeCategory(surrogate, ctr):G}");
// The example displays the following output:
//       U+D800: OtherLetter
//       U+DF80: Surrogate
Dim utf32 As Integer = &h10380       ' UGARITIC LETTER ALPA
Dim surrogate As String = Char.ConvertFromUtf32(utf32)
For ctr As Integer = 0 To surrogate.Length - 1
   Console.WriteLine("U+{0:X4}: {1:G}", 
                     Convert.ToUInt16(surrogate(ctr)), 
                     System.Globalization.CharUnicodeInfo.GetUnicodeCategory(surrogate, ctr))
Next
' The example displays the following output:
'       U+D800: OtherLetter
'       U+DF80: Surrogate
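Note that the CharUnicodeInfo.GetUnicodeCategory method does not always return the same UnicodeCategory value as the Char.GetUnicodeCategory method when it is passed the same character. The CharUnicodeInfo.GetUnicodeCategory method is designed to reflect the current version of the Unicode standard. In contrast, although the Char.GetUnicodeCategory method usually reflects the current version of the Unicode standard, it might return a character's category based on a previous version of the standard, or it might return a category that differs from the current standard to preserve backward compatibility.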
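Since this overload reports UnicodeCategory.Surrogate when index lands on a low surrogate, a loop that visits every index yields one extra entry per supplementary character. The following is a minimal sketch of one way to obtain a single category per code point by skipping the low surrogate of each valid pair; the class name CodePointWalkExample and the helper CategoriesByCodePoint are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CodePointWalkExample
{
    // Hypothetical helper (not part of .NET): returns one category per
    // code point in s rather than one per UTF-16 code unit.
    public static List<UnicodeCategory> CategoriesByCodePoint(string s)
    {
        var result = new List<UnicodeCategory>();
        for (int i = 0; i < s.Length; i++)
        {
            // At the start of a valid pair this returns the pair's category.
            result.Add(CharUnicodeInfo.GetUnicodeCategory(s, i));
            // Advance past the low surrogate so it is not reported separately.
            if (Char.IsSurrogatePair(s, i))
                i++;
        }
        return result;
    }

    public static void Main()
    {
        // "a", UGARITIC LETTER ALPA (a surrogate pair), and "9".
        string s = "a" + Char.ConvertFromUtf32(0x10380) + "9";
        foreach (UnicodeCategory category in CategoriesByCodePoint(s))
            Console.WriteLine(category);
    }
}
// The example displays the following output:
//       LowercaseLetter
//       OtherLetter
//       DecimalDigitNumber
```

An unpaired surrogate is still reported as UnicodeCategory.Surrogate, because Char.IsSurrogatePair returns false for it and the loop does not skip anything.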