> Internship Project demonstrating production-grade AI engineering
This project implements the 8 principles of trustworthy AI systems:
- Don't trust AI - Strict validation layers
- Control AI - Zero temperature, strict prompts
- Verify everything - Schema validation + knowledge grounding
- Design for failure - Retry mechanisms with exponential backoff
- Ground in reality - RAG with domain constraints
- Scale horizontally - Async processing + batch pipelines
- Distribute load - Redis + Celery workers
- Measure reality - Prometheus + Grafana observability
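The "Design for failure" principle can be illustrated with a minimal retry helper. This is a sketch under stated assumptions, not the project's actual `retries.py`: the function name `retry_with_backoff`, its parameters, and the default delays are all hypothetical, and it uses only the standard library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or max_attempts calls have failed.

    All names and defaults here are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # The wait doubles after each failure (0.5s, 1s, 2s, ...),
            # capped at max_delay so it cannot grow without bound.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)

    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

The `sleep` parameter is injectable so the helper can be tested without real waiting. A production version would likely retry only on specific transient errors and add random jitter to the delay.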
llm_data_pipeline/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI service
│   ├── config.py            # Configuration
│   ├── models/
│   │   ├── __init__.py
│   │   ├── schemas.py       # Pydantic schemas (validation layer)
│   │   └── requests.py      # Request/Response models
│   ├── services/
│   │   ├── __init__.py
│   │   ├── llm_client.py    # LLM with guardrails
│   │   ├── validator.py     # Schema validation
│   │   ├── cleaner.py       # Data cleaning logic
│   │   └── rag_service.py   # Knowledge grounding
│   ├── workers/
│   │   ├── __init__.py
│   │   └── celery_worker.py # Background processing
│   ├── core/
│   │   ├── __init__.py
│   │   ├── security.py      # Input sanitization
│   │   ├── retries.py       # Self-healing mechanisms
│   │   └── metrics.py       # Prometheus metrics
│   └── api/
│       ├── __init__.py
│       └── routes.py        # API endpoints
├── tests/
│   ├── test_validation.py
│   ├── test_llm_client.py
│   └── test_integration.py
├── docker-compose.yml       # Redis + App + Worker
├── requirements.txt
├── Dockerfile
├── prometheus.yml           # Observability config
└── README.md                # Documentation for internship
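To give a rough idea of what a validation layer does with LLM output, here is a standard-library sketch. It is not the contents of `schemas.py` or `validator.py`: the project uses Pydantic, whereas this example swaps in plain type checks so it runs without dependencies. The field names (`department`, `salary`, `start_date`, and so on), the expected types, and the function name `validate_record` are all assumptions made for illustration.

```python
import re
from datetime import date
from typing import Any

# Hypothetical target schema for a cleaned employee record (an assumption).
EXPECTED_FIELDS: dict[str, type] = {
    "id": str,
    "name": str,
    "email": str,
    "department": str,
    "salary": int,
    "start_date": str,
    "active": bool,
}

# Deliberately loose email check: something@something.tld
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    errors: list[str] = []

    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # Exact type match, so True is not accepted where an int is expected.
        elif type(record[field]) is not expected_type:
            got = type(record[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {got}")

    # Reject fields the model invented instead of silently passing them on.
    for field in sorted(set(record) - set(EXPECTED_FIELDS)):
        errors.append(f"unexpected field: {field}")

    email = record.get("email")
    if isinstance(email, str) and not EMAIL_RE.match(email):
        errors.append("email: not a valid address")

    start = record.get("start_date")
    if isinstance(start, str):
        try:
            date.fromisoformat(start)
        except ValueError:
            errors.append("start_date: not an ISO 8601 date")

    return errors
```

Returning a list of errors rather than raising on the first one lets a caller log every problem at once, or feed them back into a retry prompt.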
# 1. Clone and setup
git clone <repo>
cd llm_data_pipeline
cp .env.example .env # Add your OPENAI_API_KEY
# 2. Start infrastructure
docker-compose up -d redis prometheus grafana
# 3. Run API
uvicorn app.main:app --reload
# 4. Start worker (another terminal)
celery -A app.workers.celery_worker worker --loglevel=info
# 5. Test
curl -X POST http://localhost:8000/clean \
  -H "Content-Type: application/json" \
  -d '{
    "raw_data": {
      "id": "emp_001",
      "name": "john smith",
      "email": "john@company.com",
      "dept": "engineering",
      "pay": "120k",
      "start": "2023-01-15",
      "active": "yes"
    }
  }'
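The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: the `/clean` path, port 8000, and the `raw_data` wrapper come from that command, while the helper names `build_clean_request` and `post_clean` are hypothetical. It also assumes the service replies with JSON, which is FastAPI's default but is not confirmed here.

```python
import json
from typing import Any
from urllib.request import Request, urlopen


def build_clean_request(
    raw_data: dict[str, Any],
    base_url: str = "http://localhost:8000",
) -> Request:
    """Build the POST request for /clean without sending it."""
    body = json.dumps({"raw_data": raw_data}).encode("utf-8")
    return Request(
        f"{base_url}/clean",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_clean(raw_data: dict[str, Any], timeout: float = 10.0) -> Any:
    """Send the request and parse the reply (assumed to be JSON)."""
    with urlopen(build_clean_request(raw_data), timeout=timeout) as response:
        return json.load(response)
```

With the API running locally, `post_clean({"id": "emp_001", "name": "john smith", "pay": "120k"})` would submit a record for cleaning. Separating request construction from sending keeps the first function testable without a live server.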