|
| 1 | +The Python OCR SDK supports [custom-built APIs](https://developers.mindee.com/docs/build-your-first-document-parsing-api) from the API Builder. |
| 2 | + |
| 3 | +If your document isn't covered by one of Mindee's Off-the-Shelf APIs, you can create your own API using the |
| 4 | +[API Builder](https://developers.mindee.com/docs/overview). |
| 5 | + |
| 6 | +For the following examples, we are using our own [W9s custom API](https://developers.mindee.com/docs/w9-forms-ocr), |
| 7 | +created with the [API Builder](https://developers.mindee.com/docs/overview). |
| 8 | + |
| 9 | +```python |
| 10 | +from mindee import Client, documents |
| 11 | + |
| 12 | +# Init a new client and add your custom endpoint (document) |
| 13 | +mindee_client = Client(api_key="my-api-key").add_endpoint( |
| 14 | + account_name="john", |
| 15 | + endpoint_name="wnine", |
| 16 | + # version="1.2", # optional, see "Adding the Endpoint" below |
| 17 | +) |
| 18 | + |
| 19 | +# Load a file from disk and parse it. |
| 20 | +# The endpoint name must be specified since it can't be determined from the class. |
| 21 | +result = mindee_client.doc_from_path( |
| 22 | + "/path/to/the/w9.jpg" |
| 23 | +).parse(documents.TypeCustomV1, endpoint_name="wnine") |
| 24 | + |
| 25 | +# Print a brief summary of the parsed data |
| 26 | +print(result.document) |
| 27 | +``` |
| 28 | + |
| 29 | +## Adding the Endpoint |
| 30 | +Below are the arguments for adding a custom endpoint using the `add_endpoint` method. |
| 31 | + |
| 32 | +**`endpoint_name`**: The API name shown on the [Settings](https://developers.mindee.com/docs/build-your-first-document-parsing-api#settings-api-keys-and-documentation) page. |
| 33 | + |
| 34 | +**`account_name`**: Your organization's or user's name in the API Builder. |
| 35 | + |
| 36 | +**`version`**: If set, locks the version of the model to use. You will then need to update your code every time a new model is trained. |
| 37 | + Pinning a version is usually unnecessary during development but is essential for production use. |
| 38 | + If not set, the latest version of the model is used. |
| 39 | + |
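
One way to apply the `version` guidance above is to pin the model only in production. The sketch below is illustrative: `build_endpoint_kwargs` and the `environment` values are hypothetical names, not part of the SDK, and the account, endpoint, and version values are the sample ones used on this page.

```python
def build_endpoint_kwargs(environment, pinned_version="1.2"):
    """Build keyword arguments for `add_endpoint` (hypothetical helper)."""
    kwargs = {"account_name": "john", "endpoint_name": "wnine"}
    # Lock the model version in production only; elsewhere use the latest model.
    if environment == "production":
        kwargs["version"] = pinned_version
    return kwargs


prod_kwargs = build_endpoint_kwargs("production")
dev_kwargs = build_endpoint_kwargs("development")
print(prod_kwargs)
print(dev_kwargs)

# Intended use with the client shown earlier (not executed here):
# mindee_client = Client(api_key="my-api-key").add_endpoint(**prod_kwargs)
```

Keeping the pinned version in one place means a retrained model only requires a single change before it is promoted to production.
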
| 40 | +## Parsing Documents |
| 41 | +Call the `parse` method on the loaded document to parse your custom document. It returns an object containing the prediction results for the sent file. |
| 42 | +The `endpoint_name` must be specified when calling the `parse` method for a custom endpoint. |
| 43 | + |
| 44 | +```python |
| 45 | +result = mindee_client.doc_from_path("/path/to/the/w9.jpg").parse( |
| 46 | + documents.TypeCustomV1, endpoint_name="wnine" |
| 47 | +) |
| 48 | + |
| 49 | +print(result.document) |
| 50 | +``` |
| 51 | + |
| 52 | +> 📘 **Info** |
| 53 | +> |
| 54 | +> If your custom document has the same name as an [off-the-shelf API](https://developers.mindee.com/docs/what-is-off-the-shelf-api) document, |
| 55 | +> you **must** specify your account name when calling the `parse` method: |
| 56 | + |
| 57 | +```python |
| 58 | +from mindee import Client, documents |
| 59 | + |
| 60 | +mindee_client = Client(api_key="johndoe-receipt-api-key").add_endpoint( |
| 61 | + endpoint_name="receipt", |
| 62 | + account_name="JohnDoe", |
| 63 | +) |
| 64 | + |
| 65 | +result = mindee_client.doc_from_path("/path/to/receipt.jpg").parse( |
| 66 | + documents.TypeCustomV1, |
| 67 | + endpoint_name="receipt", |
| 68 | + account_name="JohnDoe", |
| 69 | +) |
| 70 | +``` |
| 71 | + |
| 72 | +## Document Fields |
| 73 | +All the fields defined in the API Builder when creating your custom document are available. |
| 74 | + |
| 75 | +In custom documents, each field holds a list of all the words in the document that relate to that field. |
| 76 | +Each word is an object that has the text content, geometry information, and confidence score. |
| 77 | + |
| 78 | +Value fields can be accessed via the `fields` attribute. |
| 79 | + |
| 80 | +Classification fields can be accessed via the `classifications` attribute. |
| 81 | + |
| 82 | +> 📘 **Info** |
| 83 | +> |
| 84 | +> Document-level and page-level objects work in the same way. |
| 85 | + |
| 86 | +### Fields Attribute |
| 87 | +The `fields` attribute is a dictionary with the following structure: |
| 88 | + |
| 89 | +* key: the API name of the field, as a `str` |
| 90 | +* value: a `ListField` object which has a `values` attribute, containing a list of all values found for the field. |
| 91 | + |
| 92 | +Individual field values can be accessed by using the field's API name. The examples below use the `address` field. |
| 93 | + |
| 94 | +```python |
| 95 | +# raw data, list of each word object |
| 96 | +print(result.document.fields["address"].values) |
| 97 | + |
| 98 | +# list of all values |
| 99 | +print(result.document.fields["address"].contents_list) |
| 100 | + |
| 101 | +# default string representation |
| 102 | +print(str(result.document.fields["address"])) |
| 103 | + |
| 104 | +# custom string representation |
| 105 | +print(result.document.fields["address"].contents_string(separator="_")) |
| 106 | +``` |
| 107 | + |
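
Because each word carries a confidence score, the raw `values` list can be filtered before use. This is a minimal sketch that assumes each word object exposes `content` and `confidence` attributes; those attribute names are assumptions to check against the SDK reference. `StubWord`, `confident_words`, and the sample words are hypothetical stand-ins so the snippet runs without an API call.

```python
from dataclasses import dataclass


@dataclass
class StubWord:
    """Stand-in for a word object; attribute names are assumed, not confirmed."""
    content: str
    confidence: float


def confident_words(words, threshold=0.8):
    """Return the text of every word whose confidence meets the threshold."""
    return [word.content for word in words if word.confidence >= threshold]


# Made-up sample data, for illustration only.
stub_values = [StubWord("42", 0.99), StubWord("Main", 0.95), StubWord("5t", 0.41)]
kept = confident_words(stub_values)
print(kept)

# With a real result, the input would be:
# confident_words(result.document.fields["address"].values)
```
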
| 108 | +To iterate over all the fields: |
| 109 | +```python |
| 110 | +for name, info in result.document.fields.items(): |
| 111 | + print(name) |
| 112 | + print(info.values) |
| 113 | +``` |
| 114 | + |
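
The iteration above can also be used to flatten every field into a plain dictionary, for example before serializing to JSON. The sketch below relies only on the `contents_string(separator=...)` method shown earlier. `fields_to_dict`, `StubListField`, and the sample data are hypothetical and exist only to make the snippet self-contained.

```python
def fields_to_dict(fields, separator=" "):
    """Map each field's API name to its words joined into one string."""
    return {
        name: field.contents_string(separator=separator)
        for name, field in fields.items()
    }


class StubListField:
    """Stand-in for the SDK's `ListField`, for illustration only."""

    def __init__(self, words):
        self.contents_list = words

    def contents_string(self, separator=" "):
        return separator.join(self.contents_list)


# Made-up sample data; a real call would pass `result.document.fields`.
stub_fields = {
    "address": StubListField(["42", "Main", "St"]),
    "name": StubListField(["Jane", "Doe"]),
}
flat = fields_to_dict(stub_fields)
print(flat)
```
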
| 115 | +### Classifications Attribute |
| 116 | +The `classifications` attribute is a dictionary with the following structure: |
| 117 | + |
| 118 | +* key: the API name of the field, as a `str` |
| 119 | +* value: a `ClassificationField` object which has a `value` attribute, containing a string representation of the detected classification. |
| 120 | + |
| 121 | +```python |
| 122 | +# string value of the detected classification |
| 123 | +print(result.document.classifications["doc_type"].value) |
| 124 | +``` |
| 125 | + |
| 126 | +To iterate over all the classifications: |
| 127 | +```python |
| 128 | +for name, info in result.document.classifications.items(): |
| 129 | + print(name) |
| 130 | + print(info.value) |
| 131 | +``` |
| 132 | + |
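
Since `classifications` is a dictionary, looking up a name that the API did not return raises a `KeyError`. A small accessor with a fallback avoids that. This is a sketch: `classification_value` is a hypothetical helper, and the stub object and its `"w9"` value are made-up stand-ins for a `ClassificationField`.

```python
from types import SimpleNamespace


def classification_value(classifications, name, default=None):
    """Return the `value` of a classification, or `default` if it is absent."""
    field = classifications.get(name)
    return field.value if field is not None else default


# Made-up sample data; a real call would pass `result.document.classifications`.
stub_classifications = {"doc_type": SimpleNamespace(value="w9")}

doc_type = classification_value(stub_classifications, "doc_type")
missing = classification_value(stub_classifications, "language", default="unknown")
print(doc_type)
print(missing)
```
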
| 133 | +## Questions? |
| 134 | +[Join our Slack](https://join.slack.com/t/mindee-community/shared_invite/zt-1jv6nawjq-FDgFcF2T5CmMmRpl9LLptw) |