AnnoABSA: A Web-Based Annotation Tool for Aspect-Based Sentiment Analysis with Retrieval-Augmented Suggestions
Accepted at LREC 2026 (15th edition) · Palma, Mallorca (Spain)
Nils Constantin Hellwig¹✉ · Jakob Fehle¹ · Udo Kruschwitz² · Christian Wolff¹
¹Media Informatics Group, University of Regensburg, Germany
²Information Science Group, University of Regensburg, Germany
✉ Correspondence to: nils-constantin.hellwig@ur.de
{nils-constantin.hellwig, jakob.fehle, udo.kruschwitz, christian.wolff}@ur.de
Abstract: We introduce AnnoABSA, the first web-based annotation tool to support the full spectrum of Aspect-Based Sentiment Analysis (ABSA) tasks. The tool is highly customizable, enabling flexible configuration of sentiment elements and task-specific requirements. Alongside manual annotation, AnnoABSA provides optional Large Language Model (LLM)-based retrieval-augmented generation (RAG) suggestions that offer context-aware assistance while keeping the human annotator in control. To improve prediction quality over time, the system retrieves the ten most similar examples that are already annotated and adds them as few-shot examples in the prompt, ensuring that suggestions become increasingly accurate as the annotation process progresses. Released as open-source software under the MIT License, AnnoABSA is freely accessible and easily extendable for research and practical applications.
tbaThis tool helps you annotate text data for Aspect-Based Sentiment Analysis (ABSA) through a modern web interface built with React, TypeScript, and Vite. You can select text phrases by clicking, assign sentiment labels (positive, negative, neutral) to specific aspects, and categorize them into predefined or custom categories. The tool supports configuring any number of sentiment elements - choose from the standard aspect_term, aspect_category, sentiment_polarity, and opinion_term. It handles both CSV files (UTF-8 encoded with text,label,translation structure) and JSON files (flexible object structure), supports multilingual data with optional translation display, and provides progress tracking through navigation, session IDs, and real-time annotation status.
- Modern TypeScript Frontend - Built with React, TypeScript, and Vite for fast development and type safety
- Dark Mode Support - Toggle between light and dark themes with persistent localStorage settings
- Intuitive UI - Clean, modern interface for efficient annotation with smooth transitions
- Smart Phrase Selection - Click-to-select text spans with visual feedback
- Visual Phrase Highlighting - Annotated phrases are highlighted directly in the text with unique colors
- Color-Coded Annotations - Each annotation gets a unique color with visual indicators in the annotation list
- Intelligent Color Mixing - Overlapping phrases show mixed colors to visualize annotation overlaps
- Automatic Phrase Cleaning - Removes punctuation from start/end of selected phrases (configurable)
- Click-on-Token Selection - Smart token-based text selection that snaps to word boundaries (configurable)
- Automatic Position Filling - Automatically adds missing character positions for existing phrases on startup
- Combined Annotation Popup - When both aspect and opinion terms are configured, annotate both in a single, unified dialog
- Separate Text Selection - Independent phrase selection for aspect terms and opinion terms
- Progress Tracking - Real-time annotation progress and navigation
- Flexible Configuration - Customizable sentiment elements and categories
- Translation Support - Optional translations displayed below original text
- Session Management - Optional session IDs for tracking annotation sessions
- Timing Analytics - Optional timing data collection for annotation behavior analysis
- CLI Tool - Command-line configuration for different domains
- Per-example aspect categories — support for
aspect_category_listin/data/{index}to render sample-specific categories (falls back to defaults). - AI-Powered Predictions - Optional AI assistance with
--ai-suggestionsflag for automated annotation suggestions based on your previous annotations - Manual AI Control - Use
--disable-ai-automatic-predictionflag to disable automatic AI triggering while keeping manual AI button functionality - Annotation Guidelines - Display PDF guidelines with
--annotation-guideline <file.pdf>for consistent annotation standards - LLM Integration - Uses Gemma 3:4B model (local) or OpenAI GPT-4o (cloud) for intelligent aspect and sentiment prediction
- Smart Similarity Matching - Uses BM25 for finding relevant examples with keyword-based retrieval
- Optional timing tracking - Enable with
--store-timeflag to collect annotation performance metrics - Duration measurement - Records time spent on each annotation from viewing to saving
- Change detection - Tracks whether annotations were modified during the session
- Per-example data - Maintains a list of timing entries for each text example, supporting multiple annotation attempts
- JSON storage - Timing data saved as
[{"duration": <seconds>, "change": true/false}, ...]per example - Privacy-focused - Timing collection is disabled by default and must be explicitly enabled
The tool includes optional AI assistance for automated annotation suggestions using Large Language Models (LLMs). You can choose between local processing with Ollama or cloud-based processing with OpenAI:
All processing happens locally on your machine via the Ollama API, ensuring complete data privacy. Enable with the --ai-suggestions CLI flag. Requires Ollama installed and a compatible model (e.g., Gemma 3:4B) downloaded.
Use OpenAI's advanced language models (like GPT-4o) by providing your API key with --openai-key your-api-key. This option provides more sophisticated predictions but sends data to OpenAI's servers.
Example usage:
# Local AI with Ollama
annoabsa data.csv --ai-suggestions --llm-model gemma3:4b
# Cloud AI with OpenAI
annoabsa data.csv --ai-suggestions --openai-key sk-your-api-key --llm-model gpt-4o-2024-08-06By default, when --ai-suggestions is enabled, the AI automatically triggers predictions when:
- Navigating to a new annotation that hasn't been annotated yet
- The current item has no existing annotations
If you prefer manual control, use the --disable-ai-automatic-prediction flag in combination with --ai-suggestions. This keeps the AI button functional for manual triggering while disabling automatic predictions.
For finding relevant examples to provide context to the LLM, the tool uses BM25 (Robertson and Zaragoza 2009), a sparse retrieval function that calculates relevance scores between sentences based on term frequency (TF), inverse document frequency (IDF), and sentence length. This provides efficient keyword-based retrieval that identifies sentences sharing common keywords.
- UI Integration - Click the ✨ AI button next to "Text to annotate" to get suggestions
- Few-Shot Learning - The AI analyzes existing annotations in your dataset
You can display PDF guidelines directly in the annotation interface to ensure consistent annotation standards across annotators. Use the --annotation-guideline <path/to/guidelines.pdf> flag to specify a PDF file containing annotation instructions, examples, or coding guidelines.
- Collapsible Card - Guidelines are displayed in a collapsible card between the text and annotation form
- Embedded PDF Viewer - PDF is displayed directly in the interface using an iframe
- Always Available - Guidelines remain accessible throughout the annotation session
- Standard Compliance - Helps maintain consistent annotation quality and standards
./annoabsa examples/restaurant_reviews.json --annotation-guideline docs/annotation_guidelines.pdf# Install Python dependencies
pip install -r requirements.txt
# Install frontend dependencies
cd frontend && npm install && cd ..# Start with example data
./annoabsa examples/restaurant_reviews.csv
./annoabsa examples/restaurant_reviews.json
# Use configuration file
./annoabsa examples/restaurant_reviews.json --load-config examples/example_config.json
# Enable AI suggestions
./annoabsa examples/restaurant_reviews.json --ai-suggestions# Backend
pip install fastapi uvicorn pandas rank-bm25
uvicorn main:app --reload --port 8000
# Frontend (in new terminal)
cd frontend && npm install && npm run devThe app will open at http://localhost:3000
The examples/ folder contains sample data to get you started:
| File | Format | Description |
|---|---|---|
restaurant_reviews.csv |
CSV | 10 restaurant reviews in CSV format with English text and German translations |
restaurant_reviews.json |
JSON | Same reviews in JSON format with additional metadata |
example_config.json |
JSON | Example configuration file with restaurant domain settings |
The annoabsa CLI tool configures and runs your annotation environment:
# Basic usage
./annoabsa examples/restaurant_reviews.csv
# Load configuration from file
./annoabsa examples/restaurant_reviews.json --load-config examples/example_config.json
# Custom ports and session
./annoabsa examples/restaurant_reviews.json --backend-port 8080 --frontend-port 3001 --session-id "study_2024"Save and reuse configurations with JSON files:
# Save current configuration
./annoabsa examples/restaurant_reviews.csv --elements aspect_term sentiment_polarity --save-config examples/my_config.json
# Load and use saved configuration
./annoabsa examples/restaurant_reviews.csv --load-config examples/example_config.json| Option | Description | Default |
|---|---|---|
--backend |
Start only backend server | - |
--backend-port |
Backend server port | 8000 |
--frontend-port |
Frontend server port | 3000 |
--backend-ip |
Backend server IP address | 127.0.0.1 |
--frontend-ip |
Frontend server IP address | 127.0.0.1 |
--session-id |
Session identifier for annotation tracking | None |
--elements |
Sentiment elements to annotate | aspect_term, aspect_category, sentiment_polarity, opinion_term |
--polarities |
Available sentiment polarities | positive, negative, neutral |
--categories |
Available aspect categories | Restaurant domain (13 categories) |
--implicit-aspect |
Allow implicit aspect terms | True |
--disable-implicit-aspect |
Disable implicit aspect terms | - |
--implicit-opinion |
Allow implicit opinion terms | False |
--disable_implicit_opinion |
Disable implicit opinion terms | True (default) |
--disable_clean_phrases |
Disable automatic punctuation cleaning from phrase start/end | Enabled by default |
--disable-save-positions |
Disable saving phrase positions (at_start, at_end, ot_start, ot_end) for faster processing | Enabled by default |
--disable-click-on-token |
Disable click-on-token feature (precise character clicking instead of token snapping) | Enabled by default |
--auto-positions |
Enable automatic position filling** on startup for existing phrases without positions | Disabled by default |
--store-time |
Store timing data for annotation sessions (duration and change detection) | Disabled by default |
--display-avg-annotation-time |
Display average annotation time in the interface (requires timing data) | Disabled by default |
--ai-suggestions |
Enable AI-powered prediction for automated annotation suggestions using LLM | Disabled by default |
--disable-ai-automatic-prediction |
Disable automatic AI prediction triggering (AI button still works manually) | Disabled by default |
--annotation-guideline |
Path to PDF file containing annotation guidelines to display in the UI | Disabled by default |
--llm-model |
Language model for predictions (e.g., gemma3:4b for Ollama, gpt-4o-2024-08-06 for OpenAI) | gemma-3:4b |
--openai-key |
OpenAI API key for using OpenAI models instead of local LLM | None |
--n-few-shot |
Maximum number of few-shot examples to include in LLM prompts | 10 |
--save-config |
Save config to JSON file | - |
--load-config |
Load config from JSON file | - |
The tool supports both CSV and JSON formats with UTF-8 encoding:
Your CSV file should contain at least these columns:
| Column | Type | Description |
|---|---|---|
text |
string | Text to be annotated |
label |
string | JSON array of annotations (auto-generated) |
translation |
string | Optional: Translation of the text |
Example CSV (with UTF-8 encoding):
text,translation,label
"The food was amazing but service was slow.","Das Essen war fantastisch, aber der Service war langsam.",""
"Schönes Ambiente und günstiger Preis!","Nice atmosphere and affordable price!",""
"El servicio fue excelente 👍","The service was excellent 👍",""Alternative JSON structure for more flexibility:
Example JSON (examples/restaurant_reviews.json):
[
{
"text": "The food was amazing but service was slow.",
"translation": "Das Essen war fantastisch, aber der Service war langsam.",
"label": [
{
"aspect_term": "food",
"aspect_category": "food quality",
"sentiment_polarity": "positive",
"opinion_term": "amazing",
"at_start": 4,
"at_end": 7,
"ot_start": 13,
"ot_end": 19
},
{
"aspect_term": "service",
"aspect_category": "service general",
"sentiment_polarity": "negative",
"opinion_term": "slow",
"at_start": 25,
"at_end": 31,
"ot_start": 37,
"ot_end": 40
}
]
},
{
"text": "Great atmosphere and reasonable prices!",
"translation": "Tolles Ambiente und vernünftige Preise!",
"label": []
},
{
"text": "This sentence has not been annotated yet."
}
]Key States:
- Not annotated: No
labelkey present - No aspects found:
labelis an empty array[] - Aspects found:
labelcontains annotation objects
When timing data collection is enabled with --store-time, the tool adds timing analytics:
{
"text": "The food was amazing but service was slow.",
"label": [...],
"timings": [
{"duration": 15.2, "change": true},
{"duration": 3.8, "change": false},
{"duration": 22.1, "change": true}
]
}Timing Fields:
- duration: Time spent in seconds from viewing the text to saving annotations
- change: Whether the annotation was modified (
true) or left unchanged (false) - Multiple entries: Each annotation session appends a new timing entry, supporting re-annotation analysis
When both timing data collection and average time display are enabled, the interface shows annotation performance metrics:
# Enable timing collection and average time display
./annoabsa examples/restaurant_reviews.csv --store-time --display-avg-annotation-timeThis displays the average annotation time between the dark mode toggle and index input field. The statistic is calculated by:
- Collecting all
durationvalues from all examples that have timing data - Computing the average across all annotation sessions
- Displaying the result as "Ø {time}s per annotation" in the interface
The average time helps researchers understand annotation efficiency and can guide training or process improvements.
When phrase position saving is enabled (default), the tool automatically adds character position information:
| Field | Description |
|---|---|
at_start |
Start character position of aspect term in text |
at_end |
End character position of aspect term in text |
ot_start |
Start character position of opinion term in text |
ot_end |
End character position of opinion term in text |
Position indices are 0-based and inclusive. This data is useful for downstream processing and analysis.
Optional Feature: When enabled with --auto-positions, the tool automatically scans existing annotations and fills missing position data for phrases that have values but no position information. This is useful when:
- Importing existing annotation data from other tools
- Working with datasets that lack position information
- Migrating between different annotation formats
Example: If your data contains annotations like this:
{
"text": "The pasta was excellent",
"label": [
{
"aspect_term": "pasta",
"opinion_term": "excellent"
}
]
}The tool will automatically add position data on startup:
{
"text": "The pasta was excellent",
"label": [
{
"aspect_term": "pasta",
"opinion_term": "excellent",
"at_start": 4,
"at_end": 8,
"ot_start": 14,
"ot_end": 22
}
]
}Usage:
- Default: Auto-position filling is disabled
- Enable: Use
--auto-positionsflag to enable this preprocessing step - Algorithm: Uses first occurrence of each phrase in the text
Important: Position data is only saved when the corresponding term has an actual value (not NULL or empty). This ensures data consistency and prevents storing meaningless position information for implicit aspects/opinions.
To disable position saving entirely, use the --disable-save-positions CLI option.
By default, the tool automatically cleans selected phrases by:
- Trimming whitespace from start and end
- Removing common punctuation marks:
. , ; : ! ? ¡ ¿ " '´ ' ' " " „ « » ( ) [ ] { }` - Adjusting saved positions to match the cleaned phrase
Examples:
"amazing!"→amazing(exclamation mark removed), great,→great(whitespace and commas removed)(excellent)→excellent(parentheses removed)
This ensures consistent annotation quality and removes common annotation errors. To disable phrase cleaning, use the --disable_clean_phrases CLI option.
Important: Both CSV and JSON files must be saved with UTF-8 encoding to support international characters and emojis.
The tool supports optional translations to help annotators understand text in foreign languages:
- CSV: Add a
translationcolumn - JSON: Add a
translationkey to each object
When available, translations are displayed below the original text in a blue-tinted box. This feature is especially useful for multilingual datasets or when annotating text in languages the annotator may not fully understand.
aspect_term- Specific aspect mentioned in textaspect_category- General aspect categorysentiment_polarity- Sentiment towards aspectopinion_term- Opinion expression about aspect
The annotation interface displays fields in this order for optimal workflow:
- Aspect term (phrase selection)
- Opinion term (phrase selection) - displayed next to aspect term
- Aspect category (dropdown)
- Sentiment polarity (dropdown)
When both Aspect term and Opinion term are configured:
- Clicking "Select phrase" on either field opens a combined popup
- The popup shows two separate text areas for independent phrase selection
- Both fields must be completed (either by phrase selection or marking as implicit) before proceeding
- Each field has its own "Implicit" checkbox when implicit terms are allowed
Food, Service, Price, Ambience, Location, Restaurant
Feel free to open issues or submit pull requests to improve the tool!
For questions, suggestions, or support, please reach out:
Nils Constantin Hellwig
📧 Nils-Constantin.Hellwig@ur.de
