Commit 3832479

Update tutorial (#133)
1 parent ad04189 commit 3832479

20 files changed

+310
-676
lines changed

images/tutorial_amount.png

224 KB

images/tutorial_amount_run.png

253 KB

images/tutorial_division.png

206 KB

images/tutorial_division_run.png

301 KB

images/tutorial_metric.png

179 KB

images/tutorial_preposition.png

194 KB

images/tutorial_revbydiv.png

183 KB

images/tutorial_revbydiv_run.png

662 KB
229 KB
699 KB

images/tutorial_revofdiv.png

128 KB

images/tutorial_revofdiv1.png

146 KB

images/tutorial_revofdiv1_run.png

191 KB

images/tutorial_revofdiv2.png

171 KB

images/tutorial_revofdiv2_run.png

279 KB

images/tutorial_revofdiv_rename.png

253 KB

images/tutorial_revofdiv_run.png

380 KB

sample-flows/tutorial-flow.json

+278-96
Large diffs are not rendered by default.

sample-flows/tutorial_flow.json

-574
This file was deleted.

tutorial.md

+32-6
@@ -29,7 +29,7 @@ Under **Extractors**, drag and drop **Input Documents** on the canvas. Configure
 ## Create a dictionary of division names

 Under **Extractors**, drag **Dictionary** on the canvas. Connect its input to the output of **Input Documents**.
-Rename the node to `Division` and enter the terms: `Software`, `Global Business Services`, and `Global Technology Services`. Click **Save**.
+Rename the node to `Division` and enter the terms: `Software`, `Hardware`, `Global Business Services`, and `Global Technology Services`. Click **Save**.

 ![Creating a dictionary of division names](images/tutorial_division.png)

@@ -41,11 +41,11 @@ Select the `Division` node, and click **Run**.

 ## Create a second dictionary of metric names

-Similar to the prior step, create a dictionary called `Metric` with a single term `revenue`. Select **Ignore case** and **Lemma Match**. Don't forget to click **Save**.
+Similar to the prior step, create a dictionary called `Metric` with a single term `revenue`. Select **Lemma Match**. Don't forget to click **Save**.

 ![Creating a dictionary of metrics](images/tutorial_metric.png)

-## Create a second dictionary of prepositions
+## Create a third dictionary of prepositions

 Create a dictionary `Preposition` with terms `for`, and `from`. Select **Ignore case**. Click **Save**.

@@ -74,15 +74,23 @@ Click **Save** and **Run**.
 ## Create a union

 Under **Generation**, drag **Union** to the canvas. Connect its inputs to the outputs of `RevenueOfDivision1` and `RevenueOfDivision2`. Rename the union to `RevenueOfDivision`. Click **Close** and **Run**.
-You will see 6 results: one result from `RevenueOfDivision1`, and five results `RevenueOfDivision2`.

 ![Create a union](images/tutorial_revofdiv.png)

+You will see an error _"Union node requires attribute aligned"_ because the two attributes of the two input nodes have different names. You must make the input nodes union compatible by renaming the attributes.
+
+For this, open the node `RevenueOfDivision1` and rename the first attribute `RevenueOfDivision` and click **Save**.
+Do the same for the node `RevenueOfDivision2`: rename the first attribute `RevenueOfDivision` and **Save**.
+
+![Renaming an attribute](images/tutorial_revofdiv_rename.png)
+
+Now select the Union node `RevenueOfDivision` and run it. You will see 6 results: one result from `RevenueOfDivision1`, and five results `RevenueOfDivision2`.
+
 ![Running a union](images/tutorial_revofdiv_run.png)

 ## Create a regular expression to capture currency amounts

-Under **Extractors**, drag **ReGex** to the canvas. Name it `Amount` and specify the regular expression as `\d+(\.\d+)?\s+billion`.
+Under **Extractors**, drag **ReGex** to the canvas. Name it `Amount` and specify the regular expression as `\$\d+(\.\d+)?\s+billion`.
 Click **Save**, then **Run**.
 The regular expression captures mentions of currency amounts.

@@ -92,10 +100,28 @@ The regular expression captures mentions of currency amounts.

 ## Create a sequence to combine the division, metric and amount

-Create a sequence called `RevenueByDivision` and specify the pattern as `(<RevenueOfDivision.RevenueOfDivision>)<Token>{0,35}(<Amount.Amount>)`. Click **Save**.
+Create a sequence called `RevenueByDivision` and specify the pattern as `(<RevenueOfDivision.RevenueOfDivision>)<Token>{0,35}(<Amount.Amount>)`. Ensure the name of the first attribute is also `RevenueByDivision`, renaming it if necessary. Click **Save** and **Run**.

 ![Combining division, metric and amount](images/tutorial_revbydiv.png)

+![Running a larger sequence to find the actual revenue amount of a division](images/tutorial_revbydiv_run.png)
+
+## Remove overlapping results with Consolidate
+
+In the result, we notice a few overlapping results: the second result `revenues from Global Technology Services ... $8.6 billion` overlaps with the third results `revenues from Global Technology Services ... $8.6 billion ... $4.2 billion`.
+The third result is incorrect, as `$4.2 billion` is the revenue of a different division.
+
+We can remove such overlaps using the Consolidate node.
+Under **Refinement**, drag **Consolidate** on the canvas and connect its input with `RevenueByDivision`.
+Rename it to `RevenueConsolidated` and configure it using the `NotContainedWithin` policy, as shown below. Click **Save**.
+
+![Remove overlaps with Consolidate](images/tutorial_revenueconsolidated.png)
+
+Run `RevenueConsolidated`. The incorrect overlapping results have been removed.
+
+![Running a Consolidate node](images/tutorial_revenueconsolidated_run.png)
+
+


