tutorial.md (+32 -6)
@@ -29,7 +29,7 @@ Under **Extractors**, drag and drop **Input Documents** on the canvas. Configure
29
29
## Create a dictionary of division names
30
30
31
31
Under **Extractors**, drag **Dictionary** on the canvas. Connect its input to the output of **Input Documents**.
32
-
Rename the node to `Division` and enter the terms: `Software`, `Global Business Services`, and `Global Technology Services`. Click **Save**.
32
+
Rename the node to `Division` and enter the terms: `Software`, `Hardware`, `Global Business Services`, and `Global Technology Services`. Click **Save**.
33
33
34
34

35
35
@@ -41,11 +41,11 @@ Select the `Division` node, and click **Run**.
41
41
42
42
## Create a second dictionary of metric names
43
43
44
-
Similar to the prior step, create a dictionary called `Metric` with a single term `revenue`. Select **Ignore case** and **Lemma Match**. Don't forget to click **Save**.
44
+
Similar to the prior step, create a dictionary called `Metric` with a single term `revenue`. Select **Lemma Match**. Don't forget to click **Save**.
45
45
46
46

47
47
48
-
## Create a second dictionary of prepositions
48
+
## Create a third dictionary of prepositions
49
49
50
50
Create a dictionary `Preposition` with the terms `for` and `from`. Select **Ignore case**. Click **Save**.
51
51
@@ -74,15 +74,23 @@ Click **Save** and **Run**.
74
74
## Create a union
75
75
76
76
Under **Generation**, drag **Union** to the canvas. Connect its inputs to the outputs of `RevenueOfDivision1` and `RevenueOfDivision2`. Rename the union to `RevenueOfDivision`. Click **Close** and **Run**.
77
-
You will see 6 results: one result from `RevenueOfDivision1`, and five results `RevenueOfDivision2`.
78
77
79
78

80
79
80
+
You will see an error _"Union node requires attribute aligned"_ because the two attributes of the two input nodes have different names. You must make the input nodes union compatible by renaming the attributes.
81
+
82
+
To do this, open the node `RevenueOfDivision1`, rename the first attribute to `RevenueOfDivision`, and click **Save**.
83
+
Do the same for the node `RevenueOfDivision2`: rename the first attribute to `RevenueOfDivision` and click **Save**.
84
+
85
+

86
+
87
+
Now select the Union node `RevenueOfDivision` and run it. You will see 6 results: one result from `RevenueOfDivision1`, and five results from `RevenueOfDivision2`.
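The union-compatibility requirement can be illustrated outside the tool with a minimal Python sketch. It assumes each node's results are a list of dictionaries keyed by attribute name; the `union` helper, the error text, and the sample rows are hypothetical and are not the tool's implementation.

```python
def union(left, right):
    """Concatenate two result lists, refusing inputs whose attribute names differ."""
    if left and right:
        left_names = set(left[0])
        right_names = set(right[0])
        if left_names != right_names:
            # Mirrors the idea behind the tool's error; the wording here is made up.
            raise ValueError(f"attribute names differ: {sorted(left_names)} vs {sorted(right_names)}")
    return left + right


# Hypothetical rows: before renaming, each input names its attribute after its own node.
before_1 = [{"RevenueOfDivision1": "revenue from Software"}]
before_2 = [{"RevenueOfDivision2": "revenues from Global Technology Services"}]

# After renaming the first attribute of both inputs to the same name, the union succeeds.
after_1 = [{"RevenueOfDivision": "revenue from Software"}]
after_2 = [{"RevenueOfDivision": "revenues from Global Technology Services"}]
combined = union(after_1, after_2)
```

Calling `union(before_1, before_2)` raises the error, while `union(after_1, after_2)` returns both rows in one list.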
88
+
81
89

82
90
83
91
## Create a regular expression to capture currency amounts
84
92
85
-
Under **Extractors**, drag **ReGex** to the canvas. Name it `Amount` and specify the regular expression as `\d+(\.\d+)?\s+billion`.
93
+
Under **Extractors**, drag **ReGex** to the canvas. Name it `Amount` and specify the regular expression as `\$\d+(\.\d+)?\s+billion`.
86
94
Click **Save**, then **Run**.
87
95
The regular expression captures mentions of currency amounts.
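To see what the updated pattern matches, it can be tried in Python's `re` module. This is a sketch under two assumptions: that the tool's regex flavor treats this pattern the same way Python does, and that the sample sentence (invented here) resembles the input documents.

```python
import re

# Pattern copied from the step above; the leading \$ requires a literal dollar sign.
AMOUNT = re.compile(r"\$\d+(\.\d+)?\s+billion")


def find_amounts(text):
    """Return every substring of text matched by the Amount pattern."""
    return [m.group(0) for m in AMOUNT.finditer(text)]


# Invented sample sentence, for illustration only.
sample = "Revenues from Software were $5.9 billion, up 3 percent, on 2 billion shares."
amounts = find_amounts(sample)
```

Here `amounts` contains only `$5.9 billion`; the phrase `2 billion` is skipped because it has no dollar sign, which is the effect of adding `\$` to the pattern.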
88
96
@@ -92,10 +100,28 @@ The regular expression captures mentions of currency amounts.
92
100
93
101
## Create a sequence to combine the division, metric and amount
94
102
95
-
Create a sequence called `RevenueByDivision` and specify the pattern as `(<RevenueOfDivision.RevenueOfDivision>)<Token>{0,35}(<Amount.Amount>)`. Click **Save**.
103
+
Create a sequence called `RevenueByDivision` and specify the pattern as `(<RevenueOfDivision.RevenueOfDivision>)<Token>{0,35}(<Amount.Amount>)`. Ensure the name of the first attribute is also `RevenueByDivision`, renaming it if necessary. Click **Save** and **Run**.
96
104
97
105

98
106
107
+

108
+
109
+
## Remove overlapping results with Consolidate
110
+
111
+
In the result, we notice a few overlapping results: the second result `revenues from Global Technology Services ... $8.6 billion` overlaps with the third result `revenues from Global Technology Services ... $8.6 billion ... $4.2 billion`.
112
+
The third result is incorrect, as `$4.2 billion` is the revenue of a different division.
113
+
114
+
We can remove such overlaps using the Consolidate node.
115
+
Under **Refinement**, drag **Consolidate** on the canvas and connect its input with `RevenueByDivision`.
116
+
Rename it to `RevenueConsolidated` and configure it using the `NotContainedWithin` policy, as shown below. Click **Save**.
117
+
118
+

119
+
120
+
Run `RevenueConsolidated`. The incorrect overlapping results have been removed.
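The effect of the policy can be sketched in Python. This assumes that `NotContainedWithin` discards any span that fully contains another span and keeps the shorter one, which is consistent with the outcome described above; the function name and the character offsets are hypothetical, and the tool's handling of edge cases may differ.

```python
def not_contained_within(spans):
    """Keep only spans that do not contain another span; exact duplicates collapse to one."""
    unique = sorted(set(spans))
    kept = []
    for a in unique:
        # a contains b when b starts at or after a's start and ends at or before a's end.
        contains_other = any(
            a != b and a[0] <= b[0] and b[1] <= a[1]
            for b in unique
        )
        if not contains_other:
            kept.append(a)
    return kept


# Hypothetical (begin, end) character offsets for three sequence results.
short_span = (10, 60)    # "... Global Technology Services ... $8.6 billion"
long_span = (10, 95)     # same start, but runs on to "$4.2 billion"
other_span = (120, 170)  # an unrelated result elsewhere in the document
result = not_contained_within([short_span, long_span, other_span])
```

In this sketch `long_span` is dropped because it contains `short_span`, while the two non-overlapping results are kept.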
121
+
122
+
