Frederick is a chatbot that provides course descriptions for the University of Naples Federico II. It is built from two web scrapers (one using traditional scraping, one using LLM-based scraping) and a RAG pipeline.
Clone the repository:
git clone https://github.com/msolki/Frederick.git
cd Frederick
To start the chatbot, open the notebook in your browser and follow the instructions in it.
The code retrieves API keys from environment variables or prompts the user to input them if they are not found. It supports both Google Colab and local environments.
You'll need to set OPENAI_API_KEY for the LLM scraper (gpt-3.5-turbo), groq_api_key for the chatbot's LLM (llama3-8b-8192), and huggingface_token for the embedder (all-MiniLM-L6-v2).
You are welcome to modify the models as you see fit to achieve better results.
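The key lookup described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `get_api_key` and its `prompt_fn` and `environ` parameters are hypothetical, and any Colab-specific handling is left out.

```python
import os
from getpass import getpass


def get_api_key(name, prompt_fn=getpass, environ=os.environ):
    """Return the key stored under `name`, prompting the user if it is missing.

    `get_api_key`, `prompt_fn`, and `environ` are hypothetical names used
    for illustration; the notebook may structure this differently.
    """
    value = environ.get(name)
    if not value:
        # Fall back to an interactive prompt; getpass hides the typed value.
        value = prompt_fn(f"Enter {name}: ")
        # Store it so later lookups in the same session skip the prompt.
        environ[name] = value
    return value
```

Calling `get_api_key("OPENAI_API_KEY")`, `get_api_key("groq_api_key")`, and `get_api_key("huggingface_token")` once at the top of a session would then make all three keys available, whether they came from the environment or were typed in.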
Contributions are welcome! Please open an issue or submit a pull request for any changes.
This project is licensed under the Apache License 2.0.
- Lewis et al. (2021) – Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (arXiv).
- ScrapeGraphAI – Python library for scraping with LLMs (GitHub).
- all-MiniLM-L6-v2 – Sentence transformer for semantic search (Hugging Face).
- Reimers & Gurevych (2019) – Sentence-BERT: Siamese BERT-Networks (arXiv).
- Comparison of Sentence Transformers (SBERT).
- Wang et al. (2020) – MiniLM: Self-Attention Distillation for Transformer Compression (arXiv).
- Chroma – AI-native open-source vector database (LangChain).
- Meta Llama 3 – Most capable open LLM (Groq, Ollama).
- LangChain Q&A with RAG (Docs).
- Galileo – AI chatbot assistant for the University of Padova (GitHub).
- Wei et al. (2023) – Chain-of-Thought Prompting for LLM Reasoning (arXiv).
- LangChain & Chain of Thought Prompting (Article).