The PII Detection skill extracts personal information from an input text and gives you the option of masking it. This skill uses the detection models provided in Azure AI Language.
Note
This skill is bound to Azure AI services and requires a billable resource for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing Azure AI services Standard price.
@odata.type
Microsoft.Skills.Text.PIIDetectionSkill
Data limits
The maximum size of a record should be 50,000 characters as measured by String.Length. You can use the Text Split skill to chunk larger data. Set the page length to 5000 for the best results.
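Because the limit is measured by String.Length, it counts UTF-16 code units rather than visible characters. The following is a minimal Python sketch of how you might check a record against the limit before indexing. The helper names `dotnet_string_length` and `needs_chunking` are hypothetical and aren't part of any Azure SDK.

```python
# Hypothetical helpers for illustration only; not part of any Azure SDK.
MAX_RECORD_LENGTH = 50_000  # record limit stated in this article


def dotnet_string_length(text: str) -> int:
    """Return the length in UTF-16 code units, which is what .NET String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def needs_chunking(text: str, limit: int = MAX_RECORD_LENGTH) -> bool:
    """Return True if the record exceeds the limit and should be split first."""
    return dotnet_string_length(text) > limit


if __name__ == "__main__":
    print(needs_chunking("a" * 50_000))   # exactly at the limit
    print(needs_chunking("a" * 50_001))   # one code unit over the limit
```

Characters outside the Basic Multilingual Plane, such as most emoji, count as two code units each, so a record can reach the limit with fewer visible characters than expected.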
Skill parameters
Parameters are case-sensitive and all are optional.
| Parameter name | Description | 
|---|---|
| defaultLanguageCode | (Optional) The language code to apply to documents that don't specify language explicitly.  If the default language code isn't specified,  English (en) is the default language code. See the full list of supported languages. | 
| minimumPrecision | A value between 0.0 and 1.0. If the confidence score (in the piiEntities output) is lower than the set minimumPrecision value, the entity isn't returned or masked. The default is 0.0. | 
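| maskingMode | A parameter that provides various ways to mask the personal information detected in the input text. The following options are supported: none (default), where no masking occurs and the maskedText output isn't returned; and replace, where each detected entity is replaced with the character given in the maskingCharacter parameter. | 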
| maskingCharacter | The character used to mask the text if the maskingMode parameter is set to replace. The following option is supported: * (default). This parameter can only be null if maskingMode isn't set to replace. | 
| domain | (Optional) A string value that, if specified, sets the domain to a subset of the entity categories. Possible values include: "phi" (detect confidential health information only), "none". | 
| piiCategories | (Optional) If you want to specify which entities are detected and returned, use this optional parameter (defined as a list of strings) with the appropriate entity categories. This parameter can also let you detect entities that aren't enabled by default for your document language. See Supported Personally Identifiable Information entity categories for the full list. | 
| modelVersion | (Optional) Specifies the version of the model to use when calling personally identifiable information detection. It defaults to the most recent version when not specified. We recommend you don't specify this value unless it's necessary. | 
Skill inputs
| Input name | Description | 
|---|---|
| languageCode | A string indicating the language of the records. If this parameter isn't specified, the default language code is used to analyze the records. See the full list of supported languages. | 
| text | The text to analyze. | 
Skill outputs
| Output name | Description | 
|---|---|
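| piiEntities | An array of complex types that contains the following fields: text (the PII as extracted), type, subtype, score (a higher value means the entity is more likely to be real), offset (into the input text), and length. See Supported Personally Identifiable Information entity categories for the full list. | 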
| maskedText | This output varies depending on maskingMode. If maskingMode is replace, the output is the string result of the masking performed over the input text, as described by maskingMode. If maskingMode is none, there's no output. | 
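To make the interaction between minimumPrecision, maskingMode set to replace, and maskingCharacter more concrete, here's a rough local sketch in Python. It isn't the service's implementation. The function name `mask_text` is hypothetical, and the sketch assumes that offsets can be used directly as Python string indices, which holds only for simple text without emoji or combining characters.

```python
# Illustrative re-creation of "replace" masking; not the service's actual code.
def mask_text(text, entities, minimum_precision=0.0, masking_character="*"):
    """Replace each sufficiently confident entity with the masking character."""
    chars = list(text)
    for entity in entities:
        # Entities scoring below minimumPrecision are neither returned nor masked.
        if entity["score"] < minimum_precision:
            continue
        start = entity["offset"]
        end = start + entity["length"]
        # Repeat the character so the masked text keeps the original length.
        chars[start:end] = masking_character * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    source = "Microsoft employee with ssn 859-98-0987 is using our awesome API's."
    found = [{"text": "859-98-0987", "offset": 28, "length": 11, "score": 0.65}]
    print(mask_text(source, found, minimum_precision=0.5))
```

Raising `minimum_precision` above 0.65 in this sketch leaves the text unchanged, which mirrors how an entity below the threshold isn't masked.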
Sample definition
  {
    "@odata.type": "#Microsoft.Skills.Text.PIIDetectionSkill",
    "defaultLanguageCode": "en",
    "minimumPrecision": 0.5,
    "maskingMode": "replace",
    "maskingCharacter": "*",
    "inputs": [
      {
        "name": "text",
        "source": "/document/content"
      }
    ],
    "outputs": [
      {
        "name": "piiEntities"
      },
      {
        "name": "maskedText"
      }
    ]
  }
Sample input
{
    "values": [
      {
        "recordId": "1",
        "data":
           {
             "text": "Microsoft employee with ssn 859-98-0987 is using our awesome API's."
           }
      }
    ]
}
Sample output
{
  "values": [
    {
      "recordId": "1",
      "data" : 
      {
        "piiEntities":[ 
           { 
              "text":"859-98-0987",
              "type":"U.S. Social Security Number (SSN)",
              "subtype":"",
              "offset":28,
              "length":11,
              "score":0.65
           }
        ],
        "maskedText": "Microsoft employee with ssn *********** is using our awesome API's."
      }
    }
  ]
}
The offsets returned for entities in the output of this skill come directly from the Language service APIs. If you use them to index into the original string, use the StringInfo class in .NET to extract the correct content. For more information, see Multilingual and emoji support in Language service features.
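If you're working outside .NET, the same idea can be approximated. The Python sketch below assumes that offset and length count text elements, and it groups each base character with the combining marks that follow it. This is a simplification: it doesn't implement full Unicode grapheme segmentation (for example, emoji joined with zero-width joiners), and the helper names `text_elements` and `entity_text` are hypothetical.

```python
import unicodedata


def text_elements(text):
    """Approximate text elements: a base character plus any following combining marks."""
    elements = []
    for ch in text:
        if elements and unicodedata.combining(ch):
            elements[-1] += ch  # attach the combining mark to the previous element
        else:
            elements.append(ch)
    return elements


def entity_text(text, entity):
    """Extract an entity, treating offset and length as text-element counts (an assumption)."""
    elements = text_elements(text)
    start = entity["offset"]
    return "".join(elements[start:start + entity["length"]])


if __name__ == "__main__":
    source = "Microsoft employee with ssn 859-98-0987 is using our awesome API's."
    print(entity_text(source, {"offset": 28, "length": 11}))
```

For plain ASCII text like the sample input, this gives the same result as ordinary string slicing. The two diverge only when the text before or inside the entity contains multi-code-point elements.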
Errors and warnings
If the language code for the document is unsupported, a warning is returned and no entities are extracted. If your text is empty, a warning is returned. If your text is larger than 50,000 characters, only the first 50,000 characters are analyzed and a warning is issued.
If the skill returns a warning, the output maskedText may be empty, which can impact any downstream skills that expect the output. For this reason, be sure to investigate all warnings related to missing output when writing your skillset definition.