jwteeba/news_stream_recommender

📰 News Stream Recommender (Topic Clustering With OpenAI)

Real-Time NLP Data Pipeline using Docker, FastAPI, Kafka, Spark, MongoDB, Streamlit, and Poetry



🧠 Overview

The News Stream Recommender continuously ingests breaking news from live sources (NewsAPI), applies NLP topic modeling in real time using OpenAI, stores the results in MongoDB, and serves them through a FastAPI backend to a Streamlit dashboard.
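The flow described above (ingest, assign a topic, store, read back) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: `Article`, `InMemoryStore`, `run_pipeline`, and `keyword_topic` are hypothetical names, the in-memory store stands in for MongoDB, and the toy keyword classifier stands in for the OpenAI call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Article:
    """One headline as it moves through the pipeline (hypothetical shape)."""
    title: str
    url: str
    topic: str = ""


@dataclass
class InMemoryStore:
    """Stand-in for a MongoDB collection; keyed by URL so re-ingestion overwrites."""
    docs: Dict[str, Article] = field(default_factory=dict)

    def upsert(self, article: Article) -> None:
        self.docs[article.url] = article

    def by_topic(self) -> Dict[str, List[str]]:
        # Group stored titles by topic, the kind of view a dashboard might request.
        grouped: Dict[str, List[str]] = {}
        for article in self.docs.values():
            grouped.setdefault(article.topic, []).append(article.title)
        return grouped


def run_pipeline(
    headlines: Iterable[Article],
    assign_topic: Callable[[str], str],
    store: InMemoryStore,
) -> None:
    """Label each incoming headline with a topic, then persist it."""
    for article in headlines:
        article.topic = assign_topic(article.title)
        store.upsert(article)


def keyword_topic(title: str) -> str:
    """Toy classifier standing in for the OpenAI call (assumption)."""
    return "markets" if "stocks" in title.lower() else "general"


if __name__ == "__main__":
    store = InMemoryStore()
    run_pipeline(
        [Article("Stocks rally on earnings", "https://example.com/a")],
        keyword_topic,
        store,
    )
    print(store.by_topic())
```

Passing the classifier in as a callable keeps the sketch independent of any particular model client, so the same loop could be exercised in tests without network access.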


🚀 Features

  • 🌍 Real-Time Ingestion – fetches live headlines via Kafka producers
  • 🧩 Streaming NLP Pipeline – maps each headline title to a topic using OpenAI
  • 💾 Data Persistence – stores clustered topics in MongoDB
  • RESTful API – built with FastAPI for frontend data access
  • 📊 Interactive Dashboard – live topic view via Streamlit
  • 🐳 Fully Containerized – built and orchestrated using Docker Compose
  • 📦 Poetry-Managed – clean and reproducible Python environments

🏗️ Architecture

[Architecture diagram: Design]


⚙️ Tech Stack

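| Layer      | Technology                                                          |
|------------|---------------------------------------------------------------------|
| Ingestion  | Kafka, NewsAPI, RSS Feeds                                           |
| Processing | Apache Spark (OpenAI for deterministic mapping from title to topic) |
| Storage    | MongoDB                                                             |
| Backend    | FastAPI                                                             |
| Frontend   | Streamlit                                                           |
| Management | Docker Compose, Poetry                                              |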
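The README does not say how the title-to-topic mapping is kept deterministic. One possible approach, sketched here purely as an assumption, is to cache the first topic returned for each normalized title so that repeated lookups always return the same answer. `TopicMapper`, `normalize_title`, and the `classify` callable are hypothetical names; in a real deployment `classify` would wrap the OpenAI request and the cache might live in MongoDB.

```python
import hashlib
from typing import Callable, Dict


def normalize_title(title: str) -> str:
    """Collapse whitespace and case so trivially different titles share a key."""
    return " ".join(title.lower().split())


class TopicMapper:
    """Caches one topic per normalized title so repeated lookups are stable."""

    def __init__(self, classify: Callable[[str], str]) -> None:
        self._classify = classify         # hypothetical: would wrap the model call
        self._cache: Dict[str, str] = {}  # hypothetical: could be a database collection
        self.calls = 0                    # how many times the classifier actually ran

    def topic_for(self, title: str) -> str:
        normalized = normalize_title(title)
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Only the first sighting of a title reaches the classifier.
            self.calls += 1
            self._cache[key] = self._classify(normalized)
        return self._cache[key]


if __name__ == "__main__":
    mapper = TopicMapper(lambda t: "tech" if "ai" in t.split() else "other")
    print(mapper.topic_for("AI  chips surge"))
    print(mapper.topic_for("ai chips SURGE"))  # served from the cache
    print(mapper.calls)
```

Caching also reduces the number of model calls when the same headline arrives from several sources, though that benefit depends on how aggressively titles are normalized.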

🧪 Testing

This project includes a comprehensive test suite with both simple and advanced testing options.

🚀 Quick Start

Run Simple Tests (Recommended)

# Run all simple tests
make test
pytest --verbose

# Run with coverage (assumes the pytest-cov plugin, which provides --cov)
pytest --cov --verbose

# Run specific component
make test-spark
pytest tests/test_spark_streaming.py --verbose

🎯 Test Coverage

The simple test suite covers:

  • Core Business Logic - Data processing, filtering, validation
  • API Structure - Endpoint responses and data formats
  • Utility Functions - Date formatting, text processing, URL validation
  • Error Handling - Exception handling patterns
  • Data Flow - Basic integration between components
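As an illustration of the "Utility Functions" category above, a simple test might look like the following sketch. The helpers `is_valid_url` and `format_published_date` are hypothetical and defined inline so the example is self-contained; the project's actual utilities and test names may differ.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only http(s) URLs that include a host (hypothetical helper)."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def format_published_date(iso_timestamp: str) -> str:
    """Convert 'YYYY-MM-DDTHH:MM:SSZ' to 'YYYY-MM-DD HH:MM UTC' (hypothetical helper)."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


def test_is_valid_url():
    assert is_valid_url("https://example.com/story")
    assert not is_valid_url("ftp://example.com/story")
    assert not is_valid_url("not a url")


def test_format_published_date():
    assert format_published_date("2024-05-01T09:30:00Z") == "2024-05-01 09:30 UTC"
```

Tests of this shape need no Kafka, Spark, or MongoDB instance, which is presumably why the simple suite is the recommended starting point.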

🔧 Install Test Dependencies

poetry install
