Running Unitex CasSys cascades with Python

Overview

This repository provides Python code for the execution of Unitex CasSys cascades for annotating texts.

The Unitex/Gramlab program should be installed: https://unitexgramlab.org/

The code available in this repository is working with a Unitex CasSys pipeline composed of two cascades of transducers (i.e., graphs):

analysis.csc
synthesis.csc

Analysis cascade

The analysis cascade (analysis.csc) is the core of the annotation process, it executes a sequence of transducers which annotate elements in a specific order. The annotations added by transducers can be reused as patterns in the next transducers of the cascade. The POS tagset used in patterns shoud be adapted depending the POS tagger used in input.

Synthesis cascade

The synthesis cascade (synthesis.csc) transforms the output of the first cascade (XML-CasSys) into a valid XML markup language following the annotation tagset defined in the analysis cascade. By default, all tokens are embedded in a <w> xml element with their pos and lemma in attributes.

Input

The input should be a POS tagged txt-file where each token is encapsulated within brackets: {token,lemma,pos}

Example:

The following sentence:

Mark your calendars for an exciting event at 35 Charles Street, London, on the 3rd of June 2023.

should be tagged as follows:

{Mark,mark.VERB} {your,your.PRON} {calendars,calendar.NOUN} {for,for.ADP} {an,an.DET} {exciting,exciting.ADJ} {event,event.NOUN} {at,at.ADP} {35,35.NUM} {Charles,Charles.PROPN} {Street,Street.PROPN} {,,.PUNCT} {London,London.PROPN} {,,.PUNCT} {on,on.ADP} {the,the.DET} {3rd,3rd.NOUN} {of,of.ADP} {June,June.PROPN} {2023,2023.NUM} {.,.PUNCT}

You can use the functions from scripts/posTagger_to_unitex.py to produce this format depending on the POS tagger you use. At the moment, functions are given for converting spaCy doc object. Functions for converting Treetagger and Stanza formats will be available soon.

Directory structure

The Unitex path folders should have the following structure:

{unitex-directory}/{language}/CasSys/{project-name}/analysis.csc
{unitex-directory}/{language}/CasSys/{project-name}/synthesis.csc
{unitex-directory}/{language}/Graphs/{project-name}/analysis/
{unitex-directory}/{language}/Graphs/{project-name}/synthesis/

{unitex-directory} refers to your Unitex personal working directory. It might be different from the Unitex installation directory and is usually installed in your $HOME directory.

The synthesis cascade and graphs can be downloaded here: Unitex-CasSys

Demonstration

An example of cascades and graphs can be downloaded here: Unitex-CasSys.

Notebook: usage-demo.ipynb

Configure a conda environment

Clone this repository

git clone https://github.com/ludovicmoncla/cascades-transducers-unitex.git

Configure the environment with all dependencies

Create a new environment: unitex-py39

conda create -n unitex-py39 python=3.9

Activate the environment

conda activate unitex-py39

Install dependencies with pip

pip install -r requirements.txt

Acknowledgement

The authors are grateful to the ASLAN project (ANR-10-LABX-0081) of the Université de Lyon, for its financial support within the French program "Investments for the Future" operated by the National Research Agency (ANR).

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
Unitex-CasSys		Unitex-CasSys
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
usage-demo.ipynb		usage-demo.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Running Unitex CasSys cascades with Python

Overview

Analysis cascade

Synthesis cascade

Input

Directory structure

Demonstration

Configure a conda environment

Clone this repository

Configure the environment with all dependencies

Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

ludovicmoncla/python-unitex-cassys

Folders and files

Latest commit

History

Repository files navigation

Running Unitex CasSys cascades with Python

Overview

Analysis cascade

Synthesis cascade

Input

Directory structure

Demonstration

Configure a conda environment

Clone this repository

Configure the environment with all dependencies

Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages