sunilnarayan419-ui/llm_data_pipeline

🤖 Trustworthy LLM Data Pipeline

> Internship Project demonstrating production-grade AI engineering

🎯 Project Philosophy

This project implements eight principles for building trustworthy AI systems:

  1. Don't trust AI - Strict validation layers
  2. Control AI - Zero temperature, strict prompts
  3. Verify everything - Schema validation + knowledge grounding
  4. Design for failure - Retry mechanisms with exponential backoff
  5. Ground in reality - RAG with domain constraints
  6. Scale horizontally - Async processing + batch pipelines
  7. Distribute load - Redis + Celery workers
  8. Measure reality - Prometheus + Grafana observability
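Principles 1, 3 and 4 can be illustrated together: treat every model response as untrusted, validate it, and retry with exponential backoff when validation fails. The following is a minimal sketch using only the standard library, not the code in `app/core/retries.py` or `app/services/validator.py`. The names `validate_record`, `call_with_backoff` and `ValidationFailed`, the required fields, and the delay values are all assumptions made for illustration; the project itself lists Pydantic for schema validation.

```python
import json
import random
import time
from typing import Callable


class ValidationFailed(Exception):
    """Raised when a model response does not match the expected shape."""


def validate_record(raw: str) -> dict:
    """Parse a JSON string and check a few required fields.

    The field list is a hypothetical schema used only for this sketch.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationFailed(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValidationFailed("expected a JSON object")
    required = {"id": str, "name": str, "active": bool}
    for field, expected_type in required.items():
        if not isinstance(data.get(field), expected_type):
            raise ValidationFailed(
                f"field {field!r} is missing or not of type {expected_type.__name__}"
            )
    return data


def call_with_backoff(
    call: Callable[[], str],
    validate: Callable[[str], dict],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `call`, validate its output, and retry on validation failure.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...),
    with up to 10% random jitter so parallel workers do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return validate(call())
        except ValidationFailed:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting; in a worker it would simply default to `time.sleep`.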

🏗️ Architecture

llm_data_pipeline/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI service
│   ├── config.py            # Configuration
│   ├── models/
│   │   ├── __init__.py
│   │   ├── schemas.py       # Pydantic schemas (validation layer)
│   │   └── requests.py      # Request/Response models
│   ├── services/
│   │   ├── __init__.py
│   │   ├── llm_client.py    # LLM with guardrails
│   │   ├── validator.py     # Schema validation
│   │   ├── cleaner.py       # Data cleaning logic
│   │   └── rag_service.py   # Knowledge grounding
│   ├── workers/
│   │   ├── __init__.py
│   │   └── celery_worker.py # Background processing
│   ├── core/
│   │   ├── __init__.py
│   │   ├── security.py      # Input sanitization
│   │   ├── retries.py       # Self-healing mechanisms
│   │   └── metrics.py       # Prometheus metrics
│   └── api/
│       ├── __init__.py
│       └── routes.py        # API endpoints
├── tests/
│   ├── test_validation.py
│   ├── test_llm_client.py
│   └── test_integration.py
├── docker-compose.yml       # Redis + App + Worker
├── requirements.txt
├── Dockerfile
├── prometheus.yml           # Observability config
└── README.md                # Documentation for internship

🚀 Quick Start

# 1. Clone and setup
git clone <repo>
cd llm_data_pipeline
cp .env.example .env  # Add your OPENAI_API_KEY

# 2. Start infrastructure
docker-compose up -d redis prometheus grafana

# 3. Run API
uvicorn app.main:app --reload

# 4. Start worker (another terminal)
celery -A app.workers.celery_worker worker --loglevel=info

# 5. Test
curl -X POST http://localhost:8000/clean \
  -H "Content-Type: application/json" \
  -d '{
    "raw_data": {
      "id": "emp_001",
      "name": "john smith",
      "email": "john@company.com",
      "dept": "engineering",
      "pay": "120k",
      "start": "2023-01-15",
      "active": "yes"
    }
  }' 
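The same test request can be sent from Python. This is a sketch that assumes the API is reachable at `http://localhost:8000` as in the `curl` command above; the helper names `build_clean_request` and `post_clean` are hypothetical, and the shape of the response is not documented here, so it is returned as raw text.

```python
import json
import urllib.request


def build_clean_request(
    raw_data: dict, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for the /clean endpoint without sending it."""
    body = json.dumps({"raw_data": raw_data}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/clean",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_clean(raw_data: dict, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_clean_request(raw_data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the API and worker running, `print(post_clean({"id": "emp_001", "name": "john smith", "pay": "120k", "active": "yes"}))` would print whatever the service returns for that record.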
