Commit fcef49e

(release): A2I custom and IDP Mortgage
1 parent 087f06c commit fcef49e

Some content is hidden

Large commits have some content hidden by default.

46 files changed: 46542 lines added, 1 line removed

04-idp-document-a2i.ipynb

Lines changed: 882 additions & 0 deletions
Large diffs are not rendered by default.

04.01-idp-a2i-with-custom-rules.ipynb

Lines changed: 912 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -63,7 +63,7 @@ Once the SageMaker Studio IDE has fully loaded in your browser, you can clone th
 * Next, clone this repository using
 
 ```
-git clone <repo_url> idp_workshop
+git clone https://github.com/aws-samples/aws-ai-intelligent-document-processing idp_workshop
 ```
 
 * Once the repository is cloned, a directory named `idp_workshop` will appear in the "File Browser" on the left panel of SageMaker Studio IDE
````

a2idata/990-sample-page-1.jpg

175 KB

a2idata/__init__.py

Whitespace-only changes.

a2idata/a2i-bi-sample-data.csv

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@

```csv
timestamp,doc_id,condition_category,condition_setting,field_name,field_value,human_loop_name,reviewer,process_method
2022/07/01,9.34932E+13,LengthCheck,^[0-9a-zA-Z]{16}$,dln,9.34932E+13,custom-loop-a8e97e82-2b71-43cc-9b9f-9f3cc1f17bc5,Reviewer 1,manu
2022/07/01,93493188018523,Confidence,99,d.employer_id,98.06,custom-loop-a8e97e82-2b71-43cc-9b9f-9f3cc1f17bc5,Reviewer 1,manu
2022/07/02,93493188018524,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1d2,Reviewer 2,manu
2022/07/02,93493188018525,LengthCheck,,e.phone_number,123,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1w3,Reviewer 1,manu
2022/07/02,93493188018525,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1w3,Reviewer 1,manu
2022/07/03,93493188018526,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a13d,Reviewer 1,manu
2022/07/03,93493188018527,Confidence,,dln,94,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a134,Reviewer 3,manu
2022/07/04,93493188018528,LengthCheck,,omb_no,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1xs,Reviewer 3,manu
2022/07/01,93493188018529,,,,,,,auto
2022/07/01,93493188018530,,,,,,,auto
2022/07/01,93493188018531,,,,,,,auto
2022/07/01,93493188018532,,,,,,,auto
2022/07/01,93493188018533,,,,,,,auto
2022/07/02,93493188018534,,,,,,,auto
2022/07/02,93493188018535,,,,,,,auto
2022/07/02,93493188018536,,,,,,,auto
2022/07/02,93493188018537,,,,,,,auto
2022/07/02,93493188018538,,,,,,,auto
2022/07/02,93493188018539,,,,,,,auto
2022/07/03,93493188018540,,,,,,,auto
2022/07/03,93493188018541,,,,,,,auto
2022/07/03,93493188018542,,,,,,,auto
2022/07/03,93493188018543,,,,,,,auto
2022/07/03,93493188018544,,,,,,,auto
2022/07/03,93493188018545,,,,,,,auto
2022/07/03,93493188018546,,,,,,,auto
2022/07/03,93493188018546,,,,,,,auto
2022/07/03,93493188018547,,,,,,,auto
2022/07/03,93493188018548,,,,,,,auto
2022/07/03,93493188018549,,,,,,,auto
2022/07/04,93493188018550,,,,,,,auto
2022/07/04,93493188018551,,,,,,,auto
2022/07/04,93493188018552,,,,,,,auto
2022/07/04,93493188018553,,,,,,,auto
2022/07/04,93493188018554,,,,,,,auto
```
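The first data row stores `doc_id` and `field_value` as `9.34932E+13`, which appears to be a spreadsheet's scientific-notation rendering of a 14-digit ID; the lost digits cannot be recovered from this file. When analysing the sample it is safer to read every column as text. A minimal standard-library sketch, assuming the column layout shown above; the inline rows are copied from the file, and `summarize` is a hypothetical helper, not part of the repository:

```python
import csv
import io
from collections import Counter

# Rows copied from a2idata/a2i-bi-sample-data.csv; to read the real file use
# open("a2idata/a2i-bi-sample-data.csv", newline="") instead of io.StringIO.
SAMPLE = """\
timestamp,doc_id,condition_category,condition_setting,field_name,field_value,human_loop_name,reviewer,process_method
2022/07/01,93493188018523,Confidence,99,d.employer_id,98.06,custom-loop-a8e97e82-2b71-43cc-9b9f-9f3cc1f17bc5,Reviewer 1,manu
2022/07/02,93493188018524,Required,,dln,,custom-loop-8703c751-4a83-4cd8-b19e-81ea8d09a1d2,Reviewer 2,manu
2022/07/01,93493188018529,,,,,,,auto
2022/07/01,93493188018530,,,,,,,auto
"""


def summarize(rows):
    """Count rows per process_method, and per condition_category for 'manu' rows."""
    methods = Counter(row["process_method"] for row in rows)
    # Only human-reviewed ("manu") rows carry a condition_category.
    categories = Counter(
        row["condition_category"] for row in rows if row["process_method"] == "manu"
    )
    return methods, categories


# csv.DictReader returns every value as str, so 14-digit IDs keep all their digits.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
methods, categories = summarize(rows)
auto_rate = methods["auto"] / sum(methods.values())
print(dict(methods), dict(categories), f"auto-processed: {auto_rate:.0%}")
```

Across all 35 data rows in the file, 8 are `manu` (the rows that carry a `human_loop_name` and `reviewer`) and 27 are `auto`. Note also that `doc_id` `93493188018546` appears twice among the `auto` rows, so counting distinct documents and counting rows give different answers.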

a2idata/a2i-custom-ui.html

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@

```html
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<link rel="stylesheet" href="https://s3.amazonaws.com/smgtannotation/web/static/css/1.3fc3007b.chunk.css">
<link rel="stylesheet" href="https://s3.amazonaws.com/smgtannotation/web/static/css/main.9504782e.chunk.css">
<link href="/static/css/1.fe2e351b.chunk.css" rel="stylesheet">
<link href="/static/css/main.2b80d815.chunk.css" rel="stylesheet">
<style>
  .wrapper {
    position: relative;
    display: block;
    overflow-y: scroll;
    max-height: 1000px;
    background-color: #e9ecec;
    padding: 30px;
    border: red 10px;
  }
  .img-overlay-wrap {
    position: relative;
    display: inline-block; /* <= shrinks container to image size */
    transition: transform 150ms ease-in-out;
    overflow-y: scroll;
    background-color: #e9ecec;
  }

  .img-overlay-wrap img { /* <= optional, for responsiveness */
    display: block;
    max-width: 800px;
    height: auto;
    box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
  }

  .img-overlay-wrap svg {
    position: absolute;
    top: 0;
    left: 0;
  }

  .img-overlay-wrap svg rect {
    stroke: #009879;
    stroke-width: 2;
    fill: #009879;
    fill-opacity: 20%;
  }

  .styled-table input {
    width: 250px;
    height: 100px;
    vertical-align: top;
  }
  .styled-table {
    border-collapse: collapse;
    margin: 10px 0;
    font-size: 0.9em;
    font-family: sans-serif;
    width: 100%;
    box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
  }
  .styled-table thead tr {
    background-color: #009879;
    color: #ffffff;
    text-align: left;
  }
  .styled-table th,
  .styled-table td {
    padding: 12px 15px;
    vertical-align: top;
  }
  .styled-table tbody tr {
    border-bottom: 1px solid #dddddd;
  }

  .styled-table tbody tr:nth-of-type(even) {
    background-color: #f3f3f3;
  }

  .styled-table tbody tr:last-of-type {
    border-bottom: 2px solid #009879;
  }
  .styled-table tbody tr.active-row {
    font-weight: bold;
    color: #009879;
  }
</style>
<script>
  // Highlight a missed condition's bounding box and table row on hover.
  // The <rect> only exists when the condition has a Textract block, so guard for null.
  function condition_over(idx) {
    const rect = document.getElementById("rectm_" + idx);
    if (rect) rect.style.cssText = "stroke-width: 2px; stroke: #9e4064; fill: #c5a7be;";
    document.getElementById("tr_" + idx).className = "active-row";
  }
  // Restore the default styling when the pointer leaves the row.
  function condition_out(idx) {
    const rect = document.getElementById("rectm_" + idx);
    if (rect) rect.style.cssText = "stroke-width: 2; fill: #009879; fill-opacity: 20%;";
    document.getElementById("tr_" + idx).className = "";
  }
</script>
<div id='document-text' style="display: none;">
  {{ task.input.text }}
</div>
<div id='document-image' style="display: none;">
  {{ task.input.s3.url | grant_read_access }}
</div>

<table>
  <tr>
    <td style="vertical-align: top;">
      <div class="wrapper">
        <div class="img-overlay-wrap">
          <img src="{{ task.input.s3.url | grant_read_access }}">
          <svg viewBox="0 0 {{task.input.s3.image_width}} {{task.input.s3.image_height}}">
            {% for b in task.input.Results.ConditionMissed %}
              {% if b.block != null %}
                <rect id="rectm_{{b.index}}" width="{{ b.block.Geometry.BoundingBox.Width | times: task.input.s3.image_width }}" height="{{ b.block.Geometry.BoundingBox.Height | times: task.input.s3.image_height }}" x="{{ b.block.Geometry.BoundingBox.Left | times: task.input.s3.image_width }}" y="{{ b.block.Geometry.BoundingBox.Top | times: task.input.s3.image_height }}"></rect>
              {% endif %}
            {% endfor %}
          </svg>
        </div>
      </div>
    </td>
    <td>&nbsp;&nbsp;&nbsp;</td>
    <td style="vertical-align: top; padding: 20px;">
      <crowd-form>
        <div>
          <h3>Instructions</h3>
          <p>Please review the extracted results and make corrections where appropriate.</p>
        </div>
        <br>
        <h3>Missed Conditions</h3>
        <table class="styled-table">
          <thead>
            <tr>
              <th style="width:250px">DESCRIPTION</th>
              <th>ACTUAL VALUE</th>
              <th>YOUR VALUE</th>
              <th>CHANGE REASON</th>
            </tr>
          </thead>
          <tbody>
            {% for r in task.input.Results.ConditionMissed %}

            <tr id="tr_{{r.index}}" onmouseover="condition_over({{r.index}})" onmouseout="condition_out({{r.index}})">
              <td title="Field name: {{r.field_name}} ({{ r.condition_category }})">{{ r.message }}</td>
              <td>{{ r.field_value }}</td>
              <td>
                <p>
                  <input type="text" name="True Value {{r.index}}" placeholder="Enter your value" />
                </p>
              </td>
              <td>
                <p>
                  <input type="text" name="Change Reason {{r.index}}" placeholder="Explain why you changed the value" />
                </p>
              </td>
            </tr>

            {% endfor %}
          </tbody>
        </table>
      </crowd-form>
    </td>
  </tr>
</table>
```
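The template reads `task.input.text`, `task.input.s3.url`, `task.input.s3.image_width`/`image_height`, and a list at `task.input.Results.ConditionMissed` whose entries carry `index`, `message`, `field_name`, `field_value`, `condition_category` and an optional Textract `block`; these match the keys that `Condition.check_all()` in `a2idata/condition.py` produces for broken conditions. The `times:` filters scale Textract's bounding box, which is expressed as ratios of the page width and height, into pixel coordinates on the SVG overlay, and entries without a `block` simply get no rectangle. The sketch below mirrors that arithmetic, assembles the input object and starts a loop through the `sagemaker-a2i-runtime` `StartHumanLoop` API. The helper names and the `custom-loop-<uuid>` naming (modelled on the `human_loop_name` values in the sample CSV) are assumptions, not code from this repository:

```python
import json
import uuid


def to_pixel_rect(bbox, image_width, image_height):
    """Mirror the template's `times:` filters: Textract BoundingBox values are
    ratios of the page width/height, so scale them by the image size in pixels."""
    return {
        "x": bbox["Left"] * image_width,
        "y": bbox["Top"] * image_height,
        "width": bbox["Width"] * image_width,
        "height": bbox["Height"] * image_height,
    }


def build_input_content(s3_uri, image_width, image_height, broken_conditions, text=""):
    """Assemble the object the template reads as `task.input` (assumed shape)."""
    return {
        "text": text,
        "s3": {"url": s3_uri, "image_width": image_width, "image_height": image_height},
        # Entries as returned by Condition.check_all(): index, message, field_name,
        # field_value, condition_category and an optional Textract block.
        "Results": {"ConditionMissed": broken_conditions},
    }


def start_review(a2i_client, flow_definition_arn, input_content):
    """Start one human loop; a2i_client is assumed to be
    boto3.client("sagemaker-a2i-runtime")."""
    loop_name = f"custom-loop-{uuid.uuid4()}"  # hypothetical naming scheme
    a2i_client.start_human_loop(
        HumanLoopName=loop_name,
        FlowDefinitionArn=flow_definition_arn,
        # InputContent must be a JSON string, not a dict.
        HumanLoopInput={"InputContent": json.dumps(input_content, default=str)},
    )
    return loop_name


# A box starting a quarter of the way across and halfway down a 1000x1200 image.
print(to_pixel_rect({"Left": 0.25, "Top": 0.5, "Width": 0.125, "Height": 0.0625}, 1000, 1200))
# {'x': 250.0, 'y': 600.0, 'width': 125.0, 'height': 75.0}
```

Taking the client as a parameter keeps the payload logic testable without AWS credentials; in a notebook it would be called as `start_review(boto3.client("sagemaker-a2i-runtime"), flow_definition_arn, content)`, where `flow_definition_arn` is a flow definition that uses this template.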

a2idata/condition.py

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@

```python
import re


class Condition:
    """Check extracted document fields against a list of validation conditions."""

    _data = None
    _conditions = None
    _result = None

    def __init__(self, data, conditions):
        # data: {field_name: {"value": ..., "confidence": ..., "block": ...} or None}
        # conditions: dicts with "field_name" (or "field_name_regex"), "condition_type",
        #   "condition_setting", "condition_category" and "description"
        self._data = data
        self._conditions = conditions

    @staticmethod
    def _entry(message, c, field_name, field_value, block):
        # Common record shape for broken and satisfied conditions
        return {
            "message": message,
            "field_name": field_name,
            "field_value": field_value,
            "condition_type": str(c["condition_type"]),
            "condition_setting": c.get("condition_setting"),
            "condition_category": c["condition_category"],
            "block": block,
        }

    def check(self, field_name, obj):
        r, s = [], []  # r: broken conditions, s: satisfied conditions
        for c in self._conditions:
            # Match on field_name, or on field_name_regex when no field_name is given
            condition_setting = c.get("condition_setting")
            name_regex = c.get("field_name_regex")
            if c.get("field_name") == field_name \
                    or (c.get("field_name") is None and name_regex is not None
                        and re.search(name_regex, field_name)):
                field_value, block, confidence = None, None, None
                if obj is not None:
                    field_value = obj.get("value")
                    block = obj.get("block")
                    confidence = obj.get("confidence")

                if c["condition_type"] == "Required" \
                        and (field_value is None or len(str(field_value)) == 0):
                    r.append(self._entry(f"The required field [{field_name}] is missing.",
                                         c, field_name, field_value, block))
                elif c["condition_type"] == "ConfidenceThreshold" \
                        and condition_setting is not None and confidence is not None \
                        and float(confidence) < float(condition_setting):
                    r.append(self._entry(
                        f"The field [{field_name}] confidence score {confidence} "
                        f"is lower than the threshold {condition_setting}",
                        c, field_name, field_value, block))
                elif c["condition_type"] == "ValueRegex" and field_value is not None \
                        and condition_setting is not None \
                        and re.search(condition_setting, str(field_value)) is None:
                    r.append(self._entry(c.get("description"), c, field_name, field_value, block))
                else:
                    # Field has this condition defined and it is satisfied
                    s.append(self._entry(c.get("description"), c, field_name, field_value, block))

        return r, s

    def check_all(self):
        if self._data is None or self._conditions is None:
            return [], []

        broken_conditions = []
        satisfied_conditions = []
        for key, obj in self._data.items():
            value = obj.get("value") if obj is not None else None
            if isinstance(value, str):
                # NOTE: this space-stripped value is not passed to check(),
                # which reads the raw value from obj
                value = value.replace(" ", "")

            r, s = self.check(key, obj)
            broken_conditions += r
            satisfied_conditions += s

        # Number broken conditions from 1; the review UI uses the index in element ids
        for idx, r in enumerate(broken_conditions, start=1):
            r["index"] = idx
        return broken_conditions, satisfied_conditions
```
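`Condition` expects `data` to map each field name to a dict with `value`, `confidence` and `block` (or to `None`), and `conditions` to be a list of rule dicts. One hedged way to build `data` is from the `KEY_VALUE_SET` blocks that Amazon Textract returns for a `FORMS` analysis. The helper below, `textract_forms_to_fields`, is hypothetical: it lower-cases the key text and strips a trailing colon to form the field name, and it takes the confidence from the value block; both choices are assumptions, and mapping form labels onto names such as `d.employer_id` is left out. The sample rule reuses the `^[0-9a-zA-Z]{16}$` setting and `LengthCheck` category from the sample CSV; its description text is illustrative.

```python
import re


def textract_forms_to_fields(blocks):
    """Build Condition's `data` dict from Textract FORMS blocks (hypothetical helper)."""
    by_id = {b["Id"]: b for b in blocks}

    def related(block, rel_type):
        # Yield the blocks linked to `block` by a relationship of type rel_type.
        for rel in block.get("Relationships", []):
            if rel["Type"] == rel_type:
                for block_id in rel["Ids"]:
                    yield by_id[block_id]

    def text_of(block):
        # Join the WORD children (and selection-box states) of a key or value block.
        parts = []
        for child in related(block, "CHILD"):
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
        return " ".join(parts)

    fields = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        # Assumed normalization, e.g. "DLN:" -> "dln"
        key = text_of(block).strip().rstrip(":").lower()
        values = list(related(block, "VALUE"))
        if not values:
            fields[key] = None  # key found but no value linked
            continue
        value_block = values[0]
        fields[key] = {
            "value": text_of(value_block),
            "confidence": value_block.get("Confidence"),
            "block": value_block,
        }
    return fields


# Illustrative rules in the shape Condition expects; the regex and category are
# taken from the sample CSV, the description is made up.
CONDITIONS = [
    {
        "field_name": "dln",
        "condition_type": "ValueRegex",
        "condition_setting": r"^[0-9a-zA-Z]{16}$",
        "condition_category": "LengthCheck",
        "description": "DLN must be exactly 16 alphanumeric characters.",
    },
]

# Minimal Textract-style response: one key ("DLN:") linked to one value.
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"], "Confidence": 98.06,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "DLN:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "93493188018523"},
]

data = textract_forms_to_fields(SAMPLE_BLOCKS)
# The 14-digit value fails the 16-character rule, so the regex search returns None.
print(data["dln"]["value"], re.search(CONDITIONS[0]["condition_setting"], data["dln"]["value"]))
# With the class above: Condition(data, CONDITIONS).check_all()
```

Because `check_all()` only iterates over the keys present in `data`, a required field that Textract did not detect at all is never reported. Seeding `data` with `None` for every expected field name before merging in the extracted fields is one way to make `Required` rules fire in that case.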

dist/idp-deploy.yaml

Lines changed: 8 additions & 0 deletions
```diff
@@ -106,10 +106,18 @@ Resources:
   - textract:GetDocumentTextDetection
   - textract:GetDocumentAnalysis
   - textract:AnalyzeDocument
+  - textract:AnalyzeID
+  - textract:AnalyzeExpense
   - textract:DetectDocumentText
   - textract:StartDocumentAnalysis
   - textract:StartDocumentTextDetection
   - comprehend:DetectEntities
+  - comprehend:DetectPiiEntities
+  - comprehend:ContainsPiiEntities
+  - comprehend:DescribePiiEntitiesDetectionJob
+  - comprehend:ListPiiEntitiesDetectionJobs
+  - comprehend:StartPiiEntitiesDetectionJob
+  - comprehend:StopPiiEntitiesDetectionJob
   - comprehend:StartEntitiesDetectionJob
   - comprehend:ClassifyDocument
   - comprehend:DescribeDocumentClassificationJob
```
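This change grants the role the Textract `AnalyzeID` and `AnalyzeExpense` actions plus a set of Amazon Comprehend PII actions. As a hedged illustration of what `comprehend:DetectPiiEntities` makes possible, the sketch below masks PII in extracted text, for example before showing it in a review task. `redact_pii` and its `min_score` cut-off are assumptions, not part of this commit; the client is expected to be `boto3.client("comprehend")`, whose `detect_pii_entities` response lists entities with `Type`, `Score`, `BeginOffset` and `EndOffset`. It also assumes detected entities do not overlap.

```python
def redact_pii(text, comprehend_client, min_score=0.9, language_code="en"):
    """Replace each PII span Comprehend detects with its entity type, e.g. [NAME]."""
    response = comprehend_client.detect_pii_entities(Text=text, LanguageCode=language_code)
    # Keep only entities at or above the (assumed) confidence cut-off.
    entities = [e for e in response["Entities"] if e["Score"] >= min_score]
    # Replace from the end of the string so earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text


# Usage (requires AWS credentials):
# redact_pii(extracted_text, boto3.client("comprehend"))
```

Passing the client in, rather than creating it inside the function, lets the redaction logic be exercised with a stub and reused with clients configured for a specific region or role.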

images/.DS_Store

-6 KB
Binary file not shown.
