DocumentPage Class

Content and layout elements extracted from a page from the input.

Constructor

DocumentPage(*args: Any, **kwargs: Any)

Variables

Name Description
page_number
int

1-based page number in the input document. Required.

angle

The general orientation of the content, measured clockwise in degrees within the interval (-180, 180].

width

The width of the image/PDF in pixels/inches, respectively.

height

The height of the image/PDF in pixels/inches, respectively.

unit

The unit used by the width, height, and polygon properties. For images, the unit is "pixel". For PDF, the unit is "inch". Known values are: "pixel" and "inch".

spans

Location of the page in the reading-order concatenated content. Required.

words

Extracted words from the page.

selection_marks

Extracted selection marks from the page.

lines

Extracted lines from the page, potentially containing both textual and visual elements.

barcodes

Extracted barcodes from the page.

formulas

Extracted formulas from the page.

Methods

as_dict

Return a dict that can be serialized to JSON using json.dump.

clear
copy
get
items
keys
pop
popitem
setdefault
update
values

as_dict

Return a dict that can be serialized to JSON using json.dump.

as_dict(*, exclude_readonly: bool = False) -> Dict[str, Any]

Keyword-Only Parameters

Name Description
exclude_readonly

Whether to remove the readonly properties.

Default value: False

Returns

Type Description
Dict[str, Any]

A JSON-compatible dict.
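
The following is a minimal sketch of the serialization path described above, not the SDK's own sample. StubPage and page_to_json are hypothetical names introduced here so the snippet runs without the SDK or a service call, and the key names in the stub are illustrative only. With a real DocumentPage the same helper should apply, assuming as_dict returns only JSON-compatible values as stated.

```python
import json
from typing import Any, Dict


class StubPage:
    """Hypothetical stand-in for DocumentPage; only as_dict is imitated."""

    def as_dict(self, *, exclude_readonly: bool = False) -> Dict[str, Any]:
        # Key names below are assumptions for illustration, not SDK output.
        return {"pageNumber": 1, "unit": "inch", "width": 8.5, "height": 11.0}


def page_to_json(page: Any) -> str:
    """Serialize a page-like object through its as_dict() method."""
    return json.dumps(page.as_dict(), sort_keys=True)


serialized = page_to_json(StubPage())
print(serialized)
```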

clear

clear() -> None

copy

copy() -> Model

get

get(key: str, default: Any = None) -> Any

Parameters

Name Description
key
Required
default
Default value: None

items

items() -> ItemsView[str, Any]

keys

keys() -> KeysView[str]

pop

pop(key: str, default: Any = <object object>) -> Any

Parameters

Name Description
key
Required
default

popitem

popitem() -> Tuple[str, Any]

setdefault

setdefault(key: str, default: Any = <object object>) -> Any

Parameters

Name Description
key
Required
default

update

update(*args: Any, **kwargs: Any) -> None

values

values() -> ValuesView[Any]
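
The methods listed above share their names and signatures with those of a Python dict, so their use can be sketched with a plain dict standing in for a DocumentPage instance. The stand-in and the key names ("pageNumber", "unit", and so on) are assumptions made for illustration; the keys a real instance exposes are not stated on this page.

```python
from typing import Any, Dict, List

# Plain dict used as a hypothetical stand-in for a DocumentPage instance.
# The key names are assumptions for illustration only.
page: Dict[str, Any] = {"pageNumber": 2, "unit": "pixel"}

angle = page.get("angle", 0.0)        # missing key, so the default is returned
page.setdefault("words", [])          # inserts the key only when it is absent
page.update({"width": 1700, "height": 2200})
unit = page.pop("unit")               # removes the key and returns its value
present: List[str] = sorted(page.keys())

for key, value in page.items():
    print(f"{key}: {value!r}")
```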

Attributes

angle

The general orientation of the content, measured clockwise in degrees within the interval (-180, 180].

angle: float | None
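
The interval (-180, 180] is half-open: 180 is included and -180 is not. As a worked illustration, the hypothetical helper below (not part of the SDK) maps an arbitrary angle in degrees into that interval, which can be useful when comparing a page angle against angles computed elsewhere.

```python
def normalize_angle(degrees: float) -> float:
    """Map any angle in degrees into the half-open interval (-180, 180]."""
    wrapped = degrees % 360.0  # Python's % yields a value in [0, 360) here
    # Values above 180 are shifted down by a full turn; 180 itself is kept.
    return wrapped - 360.0 if wrapped > 180.0 else wrapped


for raw in (270.0, -180.0, 180.0, -90.0):
    print(raw, "->", normalize_angle(raw))
```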

barcodes

Extracted barcodes from the page.

barcodes: List[_models.DocumentBarcode] | None

formulas

Extracted formulas from the page.

formulas: List[_models.DocumentFormula] | None

height

The height of the image/PDF in pixels/inches, respectively.

height: float | None

lines

Extracted lines from the page, potentially containing both textual and visual elements.

lines: List[_models.DocumentLine] | None

page_number

1-based page number in the input document. Required.

page_number: int

selection_marks

Extracted selection marks from the page.

selection_marks: List[_models.DocumentSelectionMark] | None

spans

Location of the page in the reading-order concatenated content. Required.

spans: List[_models.DocumentSpan]
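
As a sketch of how spans locate a page inside the concatenated content, the hypothetical helper below slices a content string by each span. It assumes that every span exposes offset and length attributes, that content is the full reading-order text of the analysis result, and that offsets count Python string characters; none of these details are stated on this page. SimpleNamespace objects stand in for DocumentSpan so the snippet runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def page_text(content: str, spans: Iterable[Any]) -> str:
    """Join the slices of `content` covered by each span, in order."""
    return "".join(
        content[span.offset : span.offset + span.length] for span in spans
    )


# Hypothetical data: two pages of text separated by a newline.
content = "Page one text.\nPage two text."
second_page_spans = [SimpleNamespace(offset=15, length=14)]
second_page = page_text(content, second_page_spans)
print(second_page)
```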

unit

The unit used by the width, height, and polygon properties. For images, the unit is "pixel". For PDF, the unit is "inch". Known values are: "pixel" and "inch".

unit: str | _models.LengthUnit | None
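
Because width and height are expressed in pixels for images and in inches for PDFs, code that needs a single unit has to branch on unit. The sketch below converts a page size to pixels. The helper name, the dpi parameter, and its default of 96 are assumptions introduced here, and the comparison assumes that unit compares equal to the plain strings "pixel" and "inch". SimpleNamespace objects stand in for DocumentPage.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def size_in_pixels(page: Any, dpi: float = 96.0) -> Tuple[float, float]:
    """Return (width, height) in pixels for a page-like object."""
    if page.width is None or page.height is None:
        raise ValueError("page has no width or height")
    if page.unit == "pixel":
        return float(page.width), float(page.height)
    if page.unit == "inch":
        # The dpi value is a caller-supplied assumption, not a page property.
        return page.width * dpi, page.height * dpi
    raise ValueError(f"unsupported unit: {page.unit!r}")


pdf_page = SimpleNamespace(width=8.5, height=11.0, unit="inch")
image_page = SimpleNamespace(width=1700, height=2200, unit="pixel")
pdf_size = size_in_pixels(pdf_page, dpi=100.0)
image_size = size_in_pixels(image_page)
print(pdf_size, image_size)
```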

width

The width of the image/PDF in pixels/inches, respectively.

width: float | None

words

Extracted words from the page.

words: List[_models.DocumentWord] | None
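
Most of the list-valued attributes above are typed as optional, so a value of None has to be handled before iterating or calling len. The following sketch counts the extracted elements on a page while treating None as empty. The helper name is hypothetical, and a SimpleNamespace object stands in for DocumentPage so the snippet runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Attribute names taken from the list above.
ELEMENT_ATTRIBUTES = ("words", "lines", "selection_marks", "barcodes", "formulas")


def element_counts(page: Any) -> Dict[str, int]:
    """Count extracted elements per attribute, treating None as an empty list."""
    return {
        name: len(getattr(page, name, None) or [])
        for name in ELEMENT_ATTRIBUTES
    }


stub_page = SimpleNamespace(
    words=["Hello", "world"],
    lines=["Hello world"],
    selection_marks=None,
    barcodes=None,
    formulas=[],
)
counts = element_counts(stub_page)
print(counts)
```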