Introduced in iOS 12, the Natural Language framework enables on-device natural language processing. It supports language recognition, tokenization, and tagging. Tokenization splits text into its component words, sentences, or paragraphs; tagging identifies parts of speech, people, places, and organizations.
The Natural Language framework can also use custom Core ML models to classify and tag text in specialized contexts.
The NSLinguisticTagger class is still available, but the Natural Language framework is the preferred mechanism for natural language processing.
Sample app: XamarinNL
To learn how to use the Natural Language framework with Xamarin.iOS, explore the following concepts:
- Recognize languages.
- Tokenize text into words and sentences.
- Tag named entities and parts of speech.
Recognizing languages
The Recognizer tab of the sample app demonstrates how to use an NLLanguageRecognizer to determine the language for a block of text.
Note
Language recognition is a specific type of text classification. The Natural Language framework also supports custom text classification via developer-provided Core ML models. For more information, take a look at the Introducing Natural Language Framework session from WWDC 2018.
Dominant language
Tap the Language button to identify the dominant language in the user input.
The HandleDetermineLanguageButtonTap method of the LanguageRecognizerViewController uses the GetDominantLanguage method of an NLLanguageRecognizer to fetch the NLLanguage for the primary language found in the text:
partial void HandleDetermineLanguageButtonTap(UIButton sender)
{
    // Dismiss the keyboard before processing the input.
    UserInput.ResignFirstResponder();
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        // Ask the framework for the single most likely language.
        NLLanguage lang = NLLanguageRecognizer.GetDominantLanguage(UserInput.Text);
        DominantLanguageLabel.Text = lang.ToString();
    }
}
Language probabilities
Tap the Language probabilities button to fetch a list of language hypotheses for the user input.
The HandleLanguageProbabilitiesButtonTap method of the LanguageRecognizerViewController class instantiates an NLLanguageRecognizer and asks it to Process the user's text. It then calls the language recognizer's GetNativeLanguageHypotheses method, passing the maximum number of hypotheses to return (10 in the code below). The method returns a dictionary of languages and their associated probabilities. The LanguageRecognizerTableViewController class then renders these languages and probabilities.
partial void HandleLanguageProbabilitiesButtonTap(UIButton sender)
{
    // Dismiss the keyboard before processing the input.
    UserInput.ResignFirstResponder();
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        var recognizer = new NLLanguageRecognizer();
        recognizer.Process(UserInput.Text);
        // Request at most 10 language hypotheses with their probabilities.
        NSDictionary<NSString, NSNumber> probabilities = recognizer.GetNativeLanguageHypotheses(10);
        PerformSegue(ShowLanguageProbabilitiesSegue, this);
    }
}
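To show what a table view might do with these hypotheses, the following is a minimal sketch of ordering and formatting them for display. It is not taken from the XamarinNL sample: HypothesisFormatter and Format are hypothetical names, and the helper works on a plain IDictionary<string, double> so that it does not depend on the iOS bindings.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of the XamarinNL sample.
public static class HypothesisFormatter
{
    // Orders (language code, probability) pairs from most to least likely
    // and formats each one as "code: 0.000".
    public static List<string> Format(IDictionary<string, double> hypotheses)
    {
        return hypotheses
            .OrderByDescending(pair => pair.Value)
            .Select(pair => pair.Key + ": " +
                pair.Value.ToString("0.000", CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

Assuming the generic NSDictionary exposes a Keys array and an indexer, the dictionary returned by GetNativeLanguageHypotheses could be converted first with something like probabilities.Keys.ToDictionary(k => k.ToString(), k => probabilities[k].DoubleValue). Treat that conversion as an assumption to verify against the binding you are using.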
Potential NLLanguage values include:
- Amharic
- Arabic
- Armenian
- Bengali
- Bulgarian
- Burmese
- Catalan
- Cherokee
- Croatian
- Czech
- Danish
- Dutch
- English
- Finnish
- French
- Georgian
- German
- Greek
- Gujarati
- Hebrew
- Hindi
- Hungarian
- Icelandic
- Indonesian
- Italian
- Japanese
- Kannada
- Khmer
- Korean
- Lao
- Malay
- Malayalam
- Marathi
- Mongolian
- Norwegian
- Oriya
- Persian
- Polish
- Portuguese
- Punjabi
- Romanian
- Russian
- SimplifiedChinese
- Sinhalese
- Slovak
- Spanish
- Swedish
- Tamil
- Telugu
- Thai
- Tibetan
- TraditionalChinese
- Turkish
- Ukrainian
- Undetermined
- Urdu
- Vietnamese
A full list of supported languages is available as part of the NLLanguage enum API documentation.
Tokenizing text into words, sentences, and paragraphs
The Tokenizer tab of the sample app demonstrates how to separate a block of text into its component words or sentences with an NLTokenizer.
Tap the Words or Sentences button to fetch a list of tokens. Each token is associated with a word or sentence in the original text.
ShowTokens splits the user's input into tokens by calling the GetTokens method of an NLTokenizer. This method returns an array of NSValue objects, each wrapping an NSRange value corresponding to a token in the original text.
void ShowTokens(NLTokenUnit unit)
{
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        // The unit (word or sentence) determines the token boundaries.
        var tokenizer = new NLTokenizer(unit);
        tokenizer.String = UserInput.Text;
        // Tokenize the entire input string.
        var range = new NSRange(0, UserInput.Text.Length);
        NSValue[] tokens = tokenizer.GetTokens(range);
        PerformSegue(ShowTokensSegue, this);
    }
}
LanguageTokenizerTableViewController renders a single token in each table
cell. It extracts an NSRange from a token NSValue, finds the
corresponding string in the original text, and sets a label on the table
view cell:
public override UITableViewCell GetCell(UITableView tableView, NSIndexPath indexPath)
{
    var cell = TableView.DequeueReusableCell(TokenCell);
    // Unwrap the NSRange for this row and show the matching substring.
    NSRange range = Tokens[indexPath.Row].RangeValue;
    cell.TextLabel.Text = Text.Substring((int)range.Location, (int)range.Length);
    return cell;
}
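The range-to-substring step above can be sketched in isolation. This is an illustrative helper rather than code from the sample: TokenSlicer and Slice are hypothetical names, and ranges are passed as plain (Location, Length) tuples in place of NSRange so the snippet has no iOS dependency. NSRange offsets count UTF-16 code units, as do C# string indices, which is why Substring can be applied to them directly.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the XamarinNL sample.
public static class TokenSlicer
{
    // Returns the substring for each (Location, Length) range,
    // skipping any range that falls outside the text.
    public static List<string> Slice(
        string text, IEnumerable<(int Location, int Length)> ranges)
    {
        var tokens = new List<string>();
        foreach (var (location, length) in ranges)
        {
            bool inBounds = location >= 0
                && length >= 0
                && location + length <= text.Length;
            if (inBounds)
            {
                tokens.Add(text.Substring(location, length));
            }
        }
        return tokens;
    }
}
```

The bounds check is a defensive choice for this sketch; ranges returned by a tokenizer for the same string should already be valid.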
Tagging named entities and parts of speech
The Tagger tab of the XamarinNL sample app demonstrates how to use the NLTagger class to associate categories with tokens of an input string.
The Natural Language framework includes built-in support for recognizing
people, places, organizations, and parts of speech.
Note
The Natural Language framework also supports custom tagging schemes via developer-provided Core ML models. For more information, take a look at the Introducing Natural Language Framework session from WWDC 2018.
Tap the Named entities or Parts of speech button to fetch:
- An array of NSValue objects, each wrapping an NSRange for a token in the original text.
- An array of NLTag values: categories for the NSValue tokens at the same array index.
In LanguageTaggerViewController, HandlePartsOfSpeechButtonTap and HandleNamedEntitiesButtonTap each call ShowTags, passing along an NLTagScheme: either NLTagScheme.LexicalClass (for parts of speech) or NLTagScheme.NameType (for named entities).
ShowTags creates an NLTagger, instantiating it with an array of NLTagScheme types for which it will be queried (in this case, only the passed-in NLTagScheme value). It then uses the GetTags method on the NLTagger to determine the tags relevant to the text in the user input.
void ShowTags(NLTagScheme tagScheme)
{
    if (!String.IsNullOrWhiteSpace(UserInput.Text))
    {
        // The tagger must be created with every scheme it will be queried for.
        var tagger = new NLTagger(new NLTagScheme[] { tagScheme });
        var range = new NSRange(0, UserInput.Text.Length);
        tagger.String = UserInput.Text;
        // tags[i] is the category for the token at ranges[i].
        NLTag[] tags = tagger.GetTags(range, NLTokenUnit.Word, tagScheme, NLTaggerOptions.OmitWhitespace, out NSValue[] ranges);
        NSValue[] tokenRanges = ranges;
        detailViewTitle = tagScheme == NLTagScheme.NameType ? "Named Entities" : "Parts of Speech";
        PerformSegue(ShowEntitiesSegue, this);
    }
}
The tags are then displayed in a table by the LanguageTaggerTableViewController.
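Because the tags and token ranges are parallel arrays, one way to consume them is to group tokens by tag. The following is a minimal sketch under stated assumptions: TagGrouper and GroupByTag are hypothetical names, tags are passed as strings (for example, the result of calling ToString on each NLTag), and ranges are plain (Location, Length) tuples in place of NSRange so the snippet has no iOS dependency.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the XamarinNL sample.
public static class TagGrouper
{
    // Pairs tags[i] with the substring at ranges[i] and collects
    // the tokens under their tag name.
    public static Dictionary<string, List<string>> GroupByTag(
        string text,
        IReadOnlyList<string> tags,
        IReadOnlyList<(int Location, int Length)> ranges)
    {
        var groups = new Dictionary<string, List<string>>();
        // Guard against arrays of different lengths.
        int count = Math.Min(tags.Count, ranges.Count);
        for (int i = 0; i < count; i++)
        {
            var (location, length) = ranges[i];
            string token = text.Substring(location, length);
            if (!groups.TryGetValue(tags[i], out List<string> tokens))
            {
                tokens = new List<string>();
                groups[tags[i]] = tokens;
            }
            tokens.Add(token);
        }
        return groups;
    }
}
```

A table view could then show one section per tag, with the matching tokens as rows.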
Potential NLTag values include:
- Adjective
- Adverb
- Classifier
- CloseParenthesis
- CloseQuote
- Conjunction
- Dash
- Determiner
- Idiom
- Interjection
- Noun
- Number
- OpenParenthesis
- OpenQuote
- OrganizationName
- Other
- OtherPunctuation
- OtherWhitespace
- OtherWord
- ParagraphBreak
- Particle
- PersonalName
- PlaceName
- Preposition
- Pronoun
- Punctuation
- SentenceTerminator
- Verb
- Whitespace
- Word
- WordJoiner
A full list of supported tags is available as part of the NLTag enum API documentation.