If you're trying to import your data into custom NER, it has to follow a specific format. If you don't have data to import, you can create your project and use Azure AI Foundry to label your documents.
Labels file format
Your labels file must be in JSON format so that you can import your labels into a project. The following example shows the expected structure.
{
  "projectFileVersion": "2022-05-01",
  "stringIndexType": "Utf16CodeUnit",
  "metadata": {
    "projectKind": "CustomEntityRecognition",
    "storageInputContainerName": "{CONTAINER-NAME}",
    "projectName": "{PROJECT-NAME}",
    "multilingual": false,
    "description": "Project-description",
    "language": "en-us",
    "settings": {}
  },
  "assets": {
    "projectKind": "CustomEntityRecognition",
    "entities": [
      {
        "category": "Entity1"
      },
      {
        "category": "Entity2"
      }
    ],
    "documents": [
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "entities": [
          {
            "regionOffset": 0,
            "regionLength": 500,
            "labels": [
              {
                "category": "Entity1",
                "offset": 25,
                "length": 10
              },
              {
                "category": "Entity2",
                "offset": 120,
                "length": 8
              }
            ]
          }
        ]
      },
      {
        "location": "{DOCUMENT-NAME}",
        "language": "{LANGUAGE-CODE}",
        "dataset": "{DATASET}",
        "entities": [
          {
            "regionOffset": 0,
            "regionLength": 100,
            "labels": [
              {
                "category": "Entity2",
                "offset": 20,
                "length": 5
              }
            ]
          }
        ]
      }
    ]
  }
}
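
Because `stringIndexType` is `Utf16CodeUnit`, `offset` and `length` count UTF-16 code units rather than Python characters. The two differ whenever the text contains characters outside the Basic Multilingual Plane, such as emoji. The following is a minimal sketch, not part of the service or its SDK, of how you might compute a label entry in Python. The helper names `utf16_len` and `make_label` are hypothetical, and the sketch assumes `offset` is counted from the start of the document, which is consistent with the example above where `regionOffset` is 0.

```python
def utf16_len(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode text."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2


def make_label(document_text: str, entity_text: str, category: str) -> dict:
    """Build one label entry for the first occurrence of entity_text.

    Assumption: offset is measured from the start of the document.
    """
    start = document_text.find(entity_text)  # index in Python characters
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in document")
    return {
        "category": category,
        # Convert the Python character index to a UTF-16 code unit offset.
        "offset": utf16_len(document_text[:start]),
        "length": utf16_len(entity_text),
    }


# The emoji occupies one Python character but two UTF-16 code units,
# so the Python index (19) and the UTF-16 offset (20) differ.
text = "Order 😀 shipped to Contoso Ltd."
label = make_label(text, "Contoso Ltd", "Entity1")
print(label)  # {'category': 'Entity1', 'offset': 20, 'length': 11}
```

For plain ASCII text the Python index and the UTF-16 offset are identical, so the conversion only matters for documents that contain supplementary characters.
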
| Key | Placeholder | Value | Example | 
|---|---|---|---|
| multilingual | true | A boolean value that lets you include documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language, not only the languages included in your training documents. See language support to learn more about multilingual support. | true | 
| projectName | {PROJECT-NAME} | Project name | myproject | 
| storageInputContainerName | {CONTAINER-NAME} | Container name | mycontainer | 
| entities | | Array containing all the entity types you have in the project. These are the entity types that are extracted from your documents. | |
| documents | | Array containing all the documents in your project and the list of entities labeled within each document. | [] |
| location | {DOCUMENT-NAME} | The location of the documents in the storage container. Since all the documents are in the root of the container, this location should be the document name. | doc1.txt | 
| dataset | {DATASET} | The dataset that this document is assigned to when the data is split before training. Learn more about data splitting. Possible values for this field are Train and Test. | Train | 
| regionOffset | | The inclusive character position of the start of the text. | 0 |
| regionLength | | The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region. | 500 |
| category | | The type of entity associated with the span of text specified. | Entity1 |
| offset | | The start position for the entity text. | 25 |
| length | | The length of the entity in terms of UTF16 characters. | 20 |
| language | {LANGUAGE-CODE} | A string specifying the language code for the document used in your project. If your project is a multilingual project, choose the language code for most of the documents. For more information, see Language support. | en-us | 
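
Before importing, it can help to check a labels file against the constraints described in the table. The following is a rough sketch of such a check in Python. The function name `validate_labels` and the specific rules are illustrative assumptions, not the service's own validation: it assumes that a `dataset` value, when present, must be `Train` or `Test`, that every label `category` should be declared under `assets.entities`, and that each label should fall inside its region.

```python
ALLOWED_DATASETS = {"Train", "Test"}


def validate_labels(project: dict) -> list:
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    assets = project.get("assets", {})
    # Categories declared at the project level.
    declared = {entity["category"] for entity in assets.get("entities", [])}

    for doc in assets.get("documents", []):
        name = doc.get("location", "<missing location>")
        dataset = doc.get("dataset")
        # Assumption: only flag a dataset value that is present but not allowed.
        if dataset is not None and dataset not in ALLOWED_DATASETS:
            problems.append(f"{name}: dataset {dataset!r} is not Train or Test")

        for region in doc.get("entities", []):
            region_start = region["regionOffset"]
            region_end = region_start + region["regionLength"]
            for label in region.get("labels", []):
                if label["category"] not in declared:
                    problems.append(
                        f"{name}: category {label['category']!r} is not declared"
                    )
                start = label["offset"]
                end = start + label["length"]
                # Assumption: labels are expected to lie within their region.
                if start < region_start or end > region_end:
                    problems.append(
                        f"{name}: label at offset {start} falls outside its region"
                    )
    return problems


# A deliberately faulty sample: bad dataset value, undeclared category,
# and a label that runs past the end of its region (95 + 10 > 100).
sample = {
    "assets": {
        "entities": [{"category": "Entity1"}],
        "documents": [
            {
                "location": "doc1.txt",
                "language": "en-us",
                "dataset": "Training",
                "entities": [
                    {
                        "regionOffset": 0,
                        "regionLength": 100,
                        "labels": [
                            {"category": "Entity2", "offset": 95, "length": 10}
                        ],
                    }
                ],
            }
        ],
    }
}

for problem in validate_labels(sample):
    print(problem)
```

To check a real file, load it with `json.load` and pass the resulting dictionary to the function.
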
Next steps
- You can import your labeled data into your project directly. Learn how to import a project.
- See the how-to article for more information about labeling your data. When you're done labeling your data, you can train your model.