An open-source collaborative web-based application for multi-task lexical normalisation

LexiClean: An annotation tool for rapid multi-task lexical normalisation

Important

LexiClean v2 is currently being released. It can already be tested, but an updated video walkthrough is still pending and will be released as soon as possible. In the meantime, please consult the original systems demonstration or the documentation for more information.

LexiClean is a rapid annotation tool for acquiring parallel corpora for lexical normalisation, built with MongoDB, React and FastAPI.

📌 Quick Links:

Annotation Interface

📦 Dependencies

To run LexiClean using Docker, you'll need Docker and Docker Compose installed.

🚀 Quick Start with Docker Compose

  1. Clone the repository:
git clone https://github.com/nlp-tlp/lexiclean.git
cd lexiclean
  2. Start the application:
docker compose up --build

Available Services

| Service | URL | Description |
|---|---|---|
| Frontend | http://localhost:3000 | User interface |
| Backend API | http://localhost:8000 | API server |
| Documentation | http://localhost:4000 | User documentation |
| MongoDB | localhost:27018 | Database |
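
Once the stack is up, a quick way to confirm that each service listed above is accepting connections is to attempt a TCP connection to its port. The following is a minimal sketch using only the Python standard library. The helper names (is_port_open, check_services) are illustrative and not part of LexiClean, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host/port pairs taken from the services listed above.
SERVICES = {
    "Frontend": ("localhost", 3000),
    "Backend API": ("localhost", 8000),
    "Documentation": ("localhost", 4000),
    "MongoDB": ("localhost", 27018),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


def check_services(services=SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<15} {'up' if up else 'DOWN'}")
```

Run it after docker compose up --build has finished starting the containers; any service reported as DOWN is worth checking in the Compose logs.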

🏗️ Architecture

The application consists of four main services:

  • Frontend (React): User interface running on port 3000
  • Backend (FastAPI): API server running on port 8000
  • MongoDB: Database running on port 27018
  • Documentation (React, Docusaurus): Service running on port 4000

⚙️ Environment Variables

Backend (FastAPI)

MONGODB__URI=mongodb://root:example@mongodb:27017/lexiclean?authSource=admin
MONGODB__DB_NAME=lexiclean
AUTH__SECRET_KEY=secret
AUTH__ALGORITHM=HS256
AUTH__ACCESS_TOKEN_EXPIRE_MINUTES=360
API__PREFIX=/api
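
The double underscore in these names reads as a section separator: MONGODB__URI is the URI setting of a MONGODB group, and AUTH__SECRET_KEY is the SECRET_KEY of an AUTH group. This README does not say how the backend loads them, so the sketch below only illustrates that grouping convention with a hypothetical helper (group_env). It should not be taken as LexiClean's actual settings code.

```python
def group_env(env: dict[str, str], delimiter: str = "__") -> dict[str, dict[str, str]]:
    """Group SECTION__FIELD variables into {section: {field: value}}."""
    grouped: dict[str, dict[str, str]] = {}
    for key, value in env.items():
        section, sep, field = key.partition(delimiter)
        if not sep:
            # No delimiter in the name: not a grouped setting, so skip it.
            continue
        grouped.setdefault(section.lower(), {})[field.lower()] = value
    return grouped


# The backend values shown above.
backend_env = {
    "MONGODB__URI": "mongodb://root:example@mongodb:27017/lexiclean?authSource=admin",
    "MONGODB__DB_NAME": "lexiclean",
    "AUTH__SECRET_KEY": "secret",
    "AUTH__ALGORITHM": "HS256",
    "AUTH__ACCESS_TOKEN_EXPIRE_MINUTES": "360",
    "API__PREFIX": "/api",
}

settings = group_env(backend_env)
print(settings["auth"]["algorithm"])  # HS256
```

Values stay strings in this sketch; converting AUTH__ACCESS_TOKEN_EXPIRE_MINUTES to an integer would be a separate step.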

Frontend

VITE_API_URL=http://localhost:8000/api
VITE_DOCS_URL=http://localhost:4000
VITE_GITHUB_URL=https://github.com/nlp-tlp/lexiclean
NODE_ENV=development
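
The values above suggest that VITE_API_URL is the backend address followed by the backend's API__PREFIX (http://localhost:8000 plus /api). If either one is changed, the other presumably needs to follow. A small sketch of that consistency check, using a hypothetical helper name:

```python
from urllib.parse import urlsplit


def api_url_matches_prefix(vite_api_url: str, api_prefix: str) -> bool:
    """True if the path of the frontend's API URL equals the backend prefix."""
    path = urlsplit(vite_api_url).path.rstrip("/")
    return path == api_prefix.rstrip("/")


print(api_url_matches_prefix("http://localhost:8000/api", "/api"))  # True
print(api_url_matches_prefix("http://localhost:8000", "/api"))      # False
```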

MongoDB

MONGO_INITDB_ROOT_USERNAME=root
MONGO_INITDB_ROOT_PASSWORD=example
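
These root credentials are the same ones embedded in the backend's MONGODB__URI. Note that the URI addresses MongoDB as mongodb:27017, while the services list gives localhost:27018. A likely explanation, assumed here rather than confirmed by this README, is that the Compose file publishes the container's port 27017 on host port 27018. The sketch below splits the URI into its parts with the standard library so that mismatched credentials or ports are easy to spot; describe_mongo_uri is an illustrative name.

```python
from urllib.parse import parse_qs, urlsplit


def describe_mongo_uri(uri: str) -> dict:
    """Break a mongodb:// connection string into its components."""
    parts = urlsplit(uri)
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        # Keep the first value of each query option, e.g. authSource.
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


info = describe_mongo_uri(
    "mongodb://root:example@mongodb:27017/lexiclean?authSource=admin"
)
print(info["host"], info["port"])  # mongodb 27017
```

The username and password returned here should match MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD.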

🛠️ Manual Installation

If you prefer to run the application without Docker, follow these steps:

  1. Install MongoDB (v4.4.6 or later):
  2. Verify MongoDB is running:
service mongod status
  3. Create and activate a Python virtual environment:
# Create a virtual environment in the server directory
cd server
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
  4. Install dependencies:
# Install backend dependencies (run from the server directory, with the virtual environment active)
pip install -r requirements.txt

# Install frontend dependencies
cd ../client
npm install

# Optional: Install documentation dependencies
cd ../docs
npm install
  5. Set up environment variables using the .env.example files as examples

  6. Start the services manually:

# Start backend (from the repository root, with the virtual environment active)
cd server
uvicorn main:app --reload

# Start frontend (in a new terminal)
cd client
npm run dev

# Optional: Start the documentation server (in a new terminal)
cd docs
npm run start
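
The environment setup step above can be scripted. The sketch below copies each .env.example to a sibling .env without overwriting files that already exist. It assumes, hypothetically, that the example files sit directly in the server, client and docs directories; adjust the list to match the repository.

```python
import shutil
from pathlib import Path


def copy_env_examples(root: Path, subdirs=("server", "client", "docs")) -> list[Path]:
    """Copy <subdir>/.env.example to <subdir>/.env where .env is missing."""
    created = []
    for name in subdirs:
        example = root / name / ".env.example"
        target = root / name / ".env"
        # Skip directories without an example file and never overwrite a .env.
        if example.is_file() and not target.exists():
            shutil.copyfile(example, target)
            created.append(target)
    return created
```

Calling copy_env_examples(Path.cwd()) from the repository root returns the list of files it created. Review each new .env before starting the services, since the example values may not suit your machine.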

Development

To run the application in development mode with hot-reloading enabled:

# Start all services in development mode
docker compose -f docker-compose.dev.yml up

# To run in detached mode (background)
docker compose -f docker-compose.dev.yml up -d

# To rebuild containers after dependency changes
docker compose -f docker-compose.dev.yml up --build

The development configuration (docker-compose.dev.yml) enables:

  • Hot-reloading for both frontend and backend changes
  • Volume mounting of local files into containers
  • Development server configurations
  • Exposed ports for debugging

Your changes to the following directories will automatically reflect in the running containers:

  • ./client - Frontend React application
  • ./server - Backend FastAPI application
  • ./docs - Documentation site

To stop the development servers:

# If running in foreground, use Ctrl+C
# If running in detached mode:
docker compose -f docker-compose.dev.yml down

📝 Attribution

Please cite our conference paper if you find LexiClean useful in your research:

@inproceedings{bikaun2021lexiclean,
  title={LexiClean: An annotation tool for rapid multi-task lexical normalisation},
  author={Bikaun, Tyler and French, Tim and Hodkiewicz, Melinda and Stewart, Michael and Liu, Wei},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={212--219},
  year={2021}
}

📫 Feedback

Please email any feedback or questions to Tyler Bikaun ([email protected])

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Bug Reports & Feature Requests

  • Open an issue using the appropriate template
  • Provide a clear description and steps to reproduce (for bugs)
  • Include relevant environment details or examples

Pull Requests

  1. Fork and create a branch
  2. Make changes following our code style
  3. Test your changes
  4. Submit a PR with a clear description
