text-cleaning-ai-agent

A FastAPI agent that automatically cleans and normalizes raw text data.

🧠 Text Cleaning AI Agent

This is a FastAPI-based AI micro-agent that automatically cleans raw text for NLP and data preprocessing. It handles both single strings and batches of strings, and its cleaning pipeline covers:

✅ Contraction expansion (e.g. "can't" → "cannot")
✅ Lowercasing and punctuation removal
✅ Stopword filtering
✅ Lemmatization
✅ Duplicate removal for batch inputs
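To make the order of these steps concrete, here is a minimal stdlib-only sketch of a single-text cleaning pass. It is not the code in `utils/cleaner.py`: the tiny `CONTRACTIONS` map and `STOPWORDS` set are hypothetical stand-ins for the `contractions` package and NLTK's English stopword list, and WordNet lemmatization is left out so the example runs without downloaded corpora. One ordering detail matters: contractions are expanded before punctuation is stripped, otherwise "can't" would be broken into "can" and "t".

```python
import re

# Hypothetical stand-ins for illustration only: the agent itself uses the
# `contractions` package and NLTK's English stopword list.
CONTRACTIONS = {"can't": "cannot", "it's": "it is", "don't": "do not"}
STOPWORDS = {"i", "it", "is", "this", "a", "an", "the"}


def expand_contractions(text: str) -> str:
    # Replace each known contraction, matching whole words case-insensitively.
    for short, full in CONTRACTIONS.items():
        text = re.sub(rf"\b{re.escape(short)}\b", full, text, flags=re.IGNORECASE)
    return text


def clean_text(text: str) -> str:
    text = expand_contractions(text)          # 1. "can't" -> "cannot" (before stripping apostrophes)
    text = text.lower()                       # 2. lowercase
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # 3. replace punctuation/special characters with spaces
    tokens = text.split()                     # 4. simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # 5. stopword filtering
    # 6. Lemmatization (WordNet in the real agent) is omitted in this sketch.
    return " ".join(tokens)
```

With these stand-ins, `clean_text("I can't believe it's working!")` yields `"cannot believe working"`, matching the sample output further down.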

🚀 Features

| Feature | Supported |
|---|---|
| Contraction Expansion | ✅ |
| Lowercasing | ✅ |
| Special Character Removal | ✅ |
| Stopword Removal | ✅ |
| Lemmatization | ✅ |
| Duplicate Detection (Batch) | ✅ |

🔧 How It Works

The agent exposes two endpoints:

POST /clean: Accepts a single string and returns its cleaned form.
POST /clean_batch: Accepts a list of strings, cleans each one, and removes duplicate results.

🧱 Libraries Used

| Library | Purpose |
|---|---|
| `fastapi` | API framework |
| `uvicorn` | ASGI server to run the app |
| `nltk` | Tokenization, lemmatization, stopword removal |
| `contractions` | Expands contractions in English |
| `pandas` | Drops duplicate rows during batch cleanup |
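The pandas de-duplication step might look like the sketch below. The function name `dedupe_cleaned` is hypothetical. It assumes the texts have already been cleaned, which is what lets inputs that differed only in case, punctuation, or stopwords collapse into a single row.

```python
import pandas as pd


def dedupe_cleaned(cleaned_texts: list[str]) -> list[str]:
    # Put the cleaned strings in a one-column frame and drop repeated rows,
    # keeping the first occurrence so the original order is preserved.
    df = pd.DataFrame({"cleaned": cleaned_texts})
    return df.drop_duplicates(subset="cleaned", keep="first")["cleaned"].tolist()
```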

Install dependencies via:

```bash
pip install fastapi uvicorn nltk contractions pandas
```

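Don’t forget to download the necessary NLTK data once:

```python
import nltk

nltk.download('wordnet')    # WordNet lexical database for lemmatization
nltk.download('stopwords')  # stopword lists
nltk.download('omw-1.4')    # Open Multilingual WordNet data
```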

🧪 How We Tested

Run the server locally:

```bash
uvicorn main:app --reload
```

Then test in Swagger UI:
📍 http://127.0.0.1:8000/docs
Example request body for /clean:

```json
{
  "text": "I can't believe it's working!"
}
```

Example request body for /clean_batch:

```json
{
  "texts": [
    "I can't believe it's working!",
    "It's working!",
    "This is another test.",
    "This is another test."
  ]
}
```

Expected response from /clean_batch:

```json
{
  "original_count": 4,
  "unique_cleaned_texts": [
    "cannot believe working",
    "working",
    "another test"
  ]
}
```
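Besides Swagger UI, the endpoints can be exercised from a script. This stdlib-only sketch builds JSON POST requests with `urllib.request`. `build_request` and `call_agent` are hypothetical helper names, and the base URL assumes the default `uvicorn` host and port used in the steps above.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed default uvicorn host/port


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for one of the agent's endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_agent(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response (requires the server to be running)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)
```

With the server running, `call_agent("/clean_batch", {"texts": [...]})` should return a dict with the `original_count` and `unique_cleaned_texts` keys shown in the expected response.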

🌍 Real-World Use Cases
1. Customer Feedback & Review Analysis

Clean messy product reviews before sentiment or keyword analysis.
2. Survey Data Preprocessing

Standardize open-ended answers for aggregation and clustering.
3. NLP Model Training

Normalize raw text data before feeding it into models for better generalization.
4. CRM Ticket Deduplication

Detect repeated support queries more reliably after cleaning.
5. Web-Scraped Text or Transcription Cleanup

Process noisy scraped/transcribed input for cleaner datasets.
6. Social Media Text Normalization

Expand contractions and strip hashtag symbols and other special characters from posts before trend analysis or moderation.


📂 Repo Structure

```
.
├── main.py                           # Original FastAPI version (/clean, /clean_batch)
├── agents/
├── app/
│   └── main.py                       # LLM-enhanced router
├── llm_router/
│   └── intent_router.py              # Uses Ollama to route intent
├── utils/
│   └── cleaner.py                    # Text cleaning logic
├── streamlit_ui/
│   └── app.py                        # Streamlit frontend
├── oracle_use_cases/                 # CLI demos for 3 enterprise use cases
├── screenshots/
│   └── streamlit_result.gif          # Sample output
├── Agentic_ai_agents_toggle_on_and_off.md
├── Changelog.md
├── README.md                         # Setup & instructions
├── oracle-Text-Cleaning-AI-Agent.md  # Demo for the Oracle use case
└── requirements.txt
```

⚠️ License

This project is NOT open-source.
You may NOT copy, use, distribute, or modify this code for any purpose — personal or commercial — without express written permission from the creator.

    © All rights reserved by the author.
