Contextual19 is a markup standard to save rules for transformation based tagging (the most simple example described in this NLTK book). It's just a lighter way to save them and make human-readable.
Every rule consists of if and then part where you can specify conditions whether token's tagging must be changed.
Here's the syntax for rules:
if
...
then
...
if part consists of selectors (just like in CSS markup) and their properties:
if
next
gender is fem
tense is past
previous
pos is noun
then
...
You don't need to use brackets or quotes, but no field can contain spaces. As comparison operators you can use is or is not:
gender is fem
voice is not active
Using selectors you can get oncoming and preceding context (next and previous selectors) or reach second next/previous or even n-th next/previous tokens using this technique:
if
first next
# Rules that will be applied to first next token
first previous
# Rules for first previous token
second next/previous
...
third next/previous
...
fourth next/previous
...
fifth next/previous
...
[0-9]th next/previous
# Using this selector you can reach n-next and n-preceding token.
# Yes, selectors like '1th' and '22th' will be absolutely legally,
# furthermore, endings like 'st', 'nd' and 'rd' is out of standard.
then
# Rules to be applied
In order to reach the very beginning or the end of the sentence, you can use end and beginning selectors.
In then part you should specify new tagging properties for selected token. Use becomes as assigning operator:
if
...
then
case becomes accusative
voice becomes passive
Your code can be recognized with the following regex:
(?(DEFINE)
(?<absolute_token> (token|beginning|end) )
(?<num_position> (first|second|third|fourth|fifth|\d+th) )
(?<sel_name> (previous|next) )
(?<sel_full>
\b(
(?&absolute_token) |
( (?&num_position) \s )? (?&sel_name)
)\b
)
(?<comparison> (is ( \s not )? ) )
(?<property_name> \w+(?=\s(?:is|becomes)) )
(?<property_value> (\w+) )
(?<property> ( (?&property_name) \s (?&comparison) \s (?&property_value) ) )
(?<assign> (becomes) )
(?<assigning> ( (?&property_name) \s (?&assign) \s (?&property_value) ) )
(?<if> (if) )
(?<then> (then) )
(?<properties_block> ( \t{2} (?&property)\n )+ )
(?<assignings_block> ( \t (?&assigning)\n )+ )
(?<selector_block> \t (?&sel_full) \n (?&properties_block) )
(?<selectors_block> (?&selector_block)+ )
(?<if_block> ( (?&if) \n (?&selectors_block) ) )
(?<then_block> ( (?&then) \n (?&assignings_block) ) )
(?<rule> (?&if_block) (?&then_block) )
(?<rules_block> ( (?&rule) \n? )* )
)
^(?&rules_block)$Use it with /gmx flags.
You can also test it on regex101.
This standard can be easily represented in object notation. Here's the example using JSON with comments.
[
{
"if": [
{
"__name": "next",
"__position": 1,
"name": [true, "value"],
"name": [false, "value"]
}
],
"then": {
"name": "value"
}
}
]- Rules must be enclosed into array, even though there's only one rule.
- Every rule should contain
ifandthenproperties. - Selectors and their comparisons must be stored in one dictionary.
- Position of token should be written in
__positionproperty, inside of selector dictionary. E.g.: forsecond previous__namebecomespreviousand__positionwill be2. - Statement
name is valueshould be encoded like[true, "value"]andname is not valuebecomes[false, "value"].
You can also use a tojson.c program in the converters in this repo (go there). Just compile it with GCC compiler and run with the following parameters:
tojson --file rules.ctx19 --output rules.json
Here file is a path to your .ctx19 file and output is the path to file to parse in. If it's not exist, it'll be created.
Contextual19 can be converted to YAML as well. It'll have the same structure as the JSON document. To perform it automatically, compile toyaml.c (go there) with GCC compiler and run with the following parameters:
toyaml --file rules.ctx19 --output rules.yml
Here parameters is the same as in paragraph above.