Issues with Azure recognizing which row text belongs to in table fields
Hello, one of our projects uses a custom model to extract data from tables on scanned documents. Typically this works well, and the model correctly associates the text for each field with the right row.
With some documents, however, the model fails to parse the data correctly for all of the fields in the table. For example, for each row we collect name and address data, and in some cases the model returns the first names from both the first and second rows as the first_name value for the first row.
I'm not attaching a scan to avoid sharing PII, but imagine a table on the scan like the one below. In some cases our model returns "Will Bill" for the first_name field in the first row of the table.
| First Name | Last Name |
|---|---|
| Will | Test |
| Bill | Test |
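While waiting on a fix to the model itself, one way to at least catch this pattern is a post-processing check on the extracted rows. The sketch below is a hypothetical heuristic, not part of the Azure SDK: it assumes the table field has already been converted into a list of plain dicts (field name to text), and it flags a row whose value ends with the value of the same field in the next row, which is the "Will Bill" / "Bill" shape described above.

```python
from typing import Dict, List

# Hypothetical shape: one dict per extracted table row, mapping field name -> text.
Row = Dict[str, str]


def find_merged_cells(rows: List[Row], field: str) -> List[int]:
    """Return indices of rows whose value for `field` appears to have absorbed
    the value of the same field in the following row (heuristic only)."""
    suspects = []
    for i in range(len(rows) - 1):
        current = rows[i].get(field, "").split()
        following = rows[i + 1].get(field, "").split()
        # Skip if the next row is empty or the current value is not longer.
        if not following or len(current) <= len(following):
            continue
        # Suspicious if the current value ends with the next row's tokens.
        if current[-len(following):] == following:
            suspects.append(i)
    return suspects


rows = [
    {"first_name": "Will Bill", "last_name": "Test"},
    {"first_name": "Bill", "last_name": "Test"},
]
print(find_merged_cells(rows, "first_name"))  # [0]
```

Flagged rows could then be routed to manual review instead of being accepted automatically. Legitimate multi-word names that happen to end with the next row's name would also be flagged, so this is only a triage aid.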
We have retrained our model several times with these edge cases included, but haven't seen any improvement. Even when we test the new model on some of the documents we just trained it on, it produces results with the same issue.
I think in some of these cases certain characters from the first name extend into the row below. For example, the tail of a "y" may cross into row 2, although usually this does not cause the model to fail.
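To illustrate why a descender could matter: if words were assigned to rows by *any* vertical overlap, a "y" whose tail dips below the row line would touch both rows, whereas assigning by the word's vertical center keeps it in its own row. This is a hypothetical sketch of that idea applied to OCR word boxes in post-processing; it is not how the Azure custom model assigns rows internally, and the `Word` type and row bands are assumed inputs (mapping them from the service's output is left out).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    top: float     # y of the top edge (page units, y grows downward)
    bottom: float  # y of the bottom edge


def rows_touched(word: Word, rows: List[Tuple[float, float]]) -> List[int]:
    """Naive assignment: every row band the word's box overlaps at all."""
    return [i for i, (top, bottom) in enumerate(rows)
            if word.top < bottom and word.bottom > top]


def row_for_word(word: Word, rows: List[Tuple[float, float]]) -> Optional[int]:
    """Assign the word to the row containing its vertical center, so a
    descender that dips into the next row does not pull it there."""
    center = (word.top + word.bottom) / 2
    for i, (top, bottom) in enumerate(rows):
        if top <= center < bottom:
            return i
    return None


rows = [(100.0, 120.0), (120.0, 140.0)]   # two table rows, hypothetical units
word = Word("Wally", top=104.0, bottom=123.0)  # descender crosses y=120
print(rows_touched(word, rows))  # [0, 1]
print(row_for_word(word, rows))  # 0
```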
I was wondering whether this is a pattern that has been noticed before, and whether there are any solutions or best practices for training on table fields specifically that you could point me to.
Thanks,
Will