PhD Research Assistant Agent 🎓🤖

An intelligent AI-powered research assistant designed to streamline PhD research workflows by integrating with multiple platforms and automating common academic tasks.

🌟 Key Features

📚 Literature Management

Automated Paper Search: Search academic papers across arXiv, Google Scholar, and other databases
Smart Paper Analysis: Extract key insights, methodology, and findings from research papers using AI
Zotero Integration: Seamlessly manage your reference library with automated paper organization and tagging

💬 Collaboration & Communication

Slack Integration: Interactive research assistant bot for team collaboration
Paper Monitoring: Real-time alerts for new papers in your research domain
Discussion Facilitation: AI-powered paper discussions and brainstorming sessions

📝 Productivity Tools

Meeting Agenda Generation: Automatically generate supervisor meeting agendas based on recent work
GitHub Integration: Track research code progress and commits
Notion Integration: Organize research notes, papers, and ideas in structured databases
Weekly Reports: Automated progress reporting and task tracking
DeepWiki Codebase Indexing: Index and search paper implementation codebases for deep understanding
Conference Schedule Planner ⭐ NEW (Oct 11, 2025): RAG-based personalized conference scheduling with thesis integration

🔗 MCP (Model Context Protocol) Support

Advanced integration with Claude AI through MCP
Context-aware assistance for research tasks
Seamless workflow automation

🚀 Quick Start

Prerequisites

Python 3.8+
API keys for:
- Anthropic Claude API
- Slack (optional)
- Notion (optional)
- GitHub (optional)
- Zotero (optional)

Installation

Clone the repository:

git clone https://github.com/yourusername/PhD_Agent.git
cd PhD_Agent

Configure your environment variables:

cp .env.example .env
# Edit .env with your API keys

Install the package:

pip install -e .

Or install dependencies only:

pip install -r requirements.txt

📖 Usage Examples

Paper Search and Analysis

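import asyncio
from phd_agent import PhdAgent

async def main():
    agent = PhdAgent()
    # Search for papers on a specific topic
    papers = await agent.search_papers("transformer architectures NLP")
    # Analyze a specific paper
    analysis = await agent.analyze_paper("https://arxiv.org/abs/...")

# `await` is only valid inside a coroutine, so run the example with asyncio.run
asyncio.run(main())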

Slack Bot for Research Teams

import asyncio
from phd_agent.integrations import SlackPaperMonitor

async def main():
    monitor = SlackPaperMonitor()
    # Start monitoring papers and responding to team queries
    await monitor.start()

asyncio.run(main())

Index and Query Paper Codebases with DeepWiki

import asyncio
from phd_agent.integrations import DeepWikiMCPIntegration

async def main():
    deepwiki = DeepWikiMCPIntegration()

    # Index a paper's implementation
    result = await deepwiki.index_paper_codebase(
        github_url="https://github.com/huggingface/transformers",
        paper_title="Transformers: State-of-the-Art NLP"
    )

    # Ask questions about the codebase
    answer = await deepwiki.ask_about_codebase(
        repository="huggingface/transformers",
        question="How do I fine-tune BERT for classification?"
    )

    # Search for specific implementations
    results = await deepwiki.search_codebase(
        repository="huggingface/transformers",
        query="attention mechanism"
    )

# `await` is only valid inside a coroutine, so run the example with asyncio.run
asyncio.run(main())

🎉 Conference Schedule Planner (NEW - Oct 11, 2025)

The Conference Planner uses RAG (Retrieval-Augmented Generation) to create personalized conference schedules based on your research interests, with optional thesis integration for maximum precision.

Features

📄 PDF Parsing: Extracts talks/posters from conference abstract PDFs (ASHG 2025 tested with 4000+ pages)
🎯 Smart Matching: Semantic similarity matching using ChromaDB and sentence transformers
👤 Author Tracking ⭐ NEW: Prioritize talks by specific authors (PIs, collaborators, etc.) with +15% relevance boost
🚫 Exclusion Filtering: Filter out wet-lab/clinical work for computational researchers
📖 Thesis Integration: Upload your unpublished work for highly precise matching (stored locally only)
⚡ Smart Caching: Parses the PDF once and caches talks and embeddings, so regenerating a schedule from cache takes ~2 seconds
⚠️ Conflict Detection: Identifies overlapping sessions for manual decision-making
📅 Day-by-Day Schedule: Markdown output organized by day with relevance scores
📊 Excel Export: Convert schedules to Microsoft Excel with filtering and color-coding
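
The conflict detection feature listed above can be pictured as an interval-overlap check. The following is a minimal sketch under stated assumptions, not the project's implementation: the `Talk` dataclass, its field names, the `find_conflicts` helper, and the two non-poster talk entries are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Talk:
    """Hypothetical talk record; the real planner's fields may differ."""
    title: str
    start: datetime
    end: datetime


def find_conflicts(talks):
    """Return every pair of talk titles whose time ranges overlap."""
    conflicts = []
    for a, b in combinations(sorted(talks, key=lambda t: t.start), 2):
        # Two half-open intervals [start, end) overlap when each one
        # starts before the other ends.
        if a.start < b.end and b.start < a.end:
            conflicts.append((a.title, b.title))
    return conflicts


# Illustrative data only (not taken from a real programme).
talks = [
    Talk("Fine-mapping poster", datetime(2025, 10, 17, 13, 35), datetime(2025, 10, 17, 13, 40)),
    Talk("eQTL platform talk", datetime(2025, 10, 17, 13, 30), datetime(2025, 10, 17, 13, 45)),
    Talk("Perturb-seq session", datetime(2025, 10, 17, 14, 0), datetime(2025, 10, 17, 14, 15)),
]
conflicts = find_conflicts(talks)
print(conflicts)
```

The sketch only reports overlapping pairs and leaves the choice between them to the reader, which matches the "manual decision-making" described above.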

Quick Start

1. Run the PhD Agent:

phd-agent
# Or: python -m phd_agent.cli

2. Update your research interests:

📝 You: interests update

You'll be prompted for:

Research interests (e.g., "statistical fine-mapping", "single-cell RNA seq")
Exclusion topics (optional, e.g., "wet-lab protocols", "clinical case studies")
Thesis/dissertation (optional, drag & drop your PDF for best matching)

3. Generate your personalized schedule:

📝 You: conference plan ASHG2025

The system will:

Load cached talks (or parse PDF if first time)
Load your thesis if provided
Generate embeddings and match talks to your interests
Create a personalized schedule with conflict detection

Output: data/ASHG2025/ashg_schedule.md
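
To give a feel for the day-by-day Markdown output, here is a minimal sketch of how talks might be grouped by day and rendered. It is an illustration only: `render_schedule`, the dictionary keys, and the second talk are hypothetical, and the real file contains more fields (type, authors, abstract).

```python
from collections import defaultdict
from datetime import datetime


def render_schedule(talks):
    """Render talks as Markdown grouped by day, earliest talk first."""
    by_day = defaultdict(list)
    for talk in talks:
        by_day[talk["start"].date()].append(talk)

    lines = ["# Conference Schedule", ""]
    for day in sorted(by_day):
        lines += [f"## {day.strftime('%A, %B %d')}", ""]
        for talk in sorted(by_day[day], key=lambda t: t["start"]):
            lines += [
                f"### {talk['title']}",
                "",
                f"**Relevance Score:** {talk['score']:.2%}",
                "",
            ]
    return "\n".join(lines)


talks = [
    {
        "title": "Fine-mapping of loci under directional selection reveals functional architecture",
        "start": datetime(2025, 10, 17, 13, 35),
        "score": 0.6362,
    },
    # Hypothetical second entry, included only to show the ordering.
    {"title": "Hypothetical eQTL talk", "start": datetime(2025, 10, 16, 9, 0), "score": 0.41},
]
markdown = render_schedule(talks)
print(markdown)
```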

Example Workflow

# Start the agent
$ phd-agent

📝 You: interests update

# Enter your interests
Interest #1: statistical fine-mapping, Bayesian approaches
Interest #2: eQTL, multi-omics, regulatory elements
Interest #3: perturb-seq, CRISPR perturbation
Interest #4: done

# Optional exclusions
Exclude #1: wet-lab protocols and techniques
Exclude #2: clinical case studies without methods
Exclude #3: done

# Optional thesis (drag & drop or type path)
Do you want to add your thesis/dissertation? (y/n): y
Enter path to thesis PDF (or drag file here): /path/to/thesis.pdf
  ✓ Thesis added: thesis.pdf

# Generate schedule
📝 You: conference plan ASHG2025

📦 Loading cached talks (faster!)...
✅ Loaded 323 talks from cache
📊 Talks already indexed in ChromaDB (323 talks)
📖 Loading thesis: thesis.pdf...
✅ Thesis loaded (10000 words)
🎯 Using thesis content for precise matching...
✅ Found 50 relevant talks!
🚫 Filtered out 12 talks matching exclusion criteria

🎊 CONFERENCE SCHEDULE COMPLETE!
  Conference: ASHG2025
  Total talks in PDF: 323
  Relevant to your interests: 50
  Scheduling conflicts: 13

📄 Schedule saved to:
  data/ASHG2025/ashg_schedule.md

Example Output with Author Highlighting:

### Fine-mapping of loci under directional selection reveals functional architecture

**Type:** 📋 Poster | **Relevance Score:** 63.62%  ← Boosted from ~48%!

**⏰ Time:** Friday, October 17 at 1:35pm – 1:40pm

**👥 Authors:** Javier Maravall-López, Ali Akbari, Gaspard Kerner, ... **⭐ Alkes L. Price** *et al.* (6 total)

**📝 Abstract:**

Studies of directional selection inform our understanding of evolution and biology...

Note the ⭐ highlighting Alkes L. Price (your author of interest): he is matched even though he appears last in the author list as senior author.
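
The boost and highlighting in the example above could be sketched as follows. Several things here are assumptions rather than documented behaviour: the boost is treated as additive (15 points, capped at 100%), names are matched on first and last name so that "Alkes Price" matches "Alkes L. Price", and `name_key` and `boost_for_authors` are hypothetical helpers. The starting score of 0.4862 is an illustrative value consistent with the "~48%" shown in the example.

```python
AUTHOR_BOOST = 0.15  # assumption: additive boost, capped at 1.0


def name_key(name):
    """Reduce a non-empty name to (first, last), ignoring middle names and initials."""
    parts = name.replace(".", "").lower().split()
    return (parts[0], parts[-1])


def boost_for_authors(score, talk_authors, authors_of_interest):
    """Return (possibly boosted score, author list with matches starred)."""
    wanted = {name_key(a) for a in authors_of_interest}
    display, matched = [], False
    for author in talk_authors:
        if name_key(author) in wanted:
            display.append(f"⭐ {author}")
            matched = True
        else:
            display.append(author)
    if matched:
        score = min(1.0, score + AUTHOR_BOOST)
    return score, display


score, authors = boost_for_authors(
    0.4862,  # illustrative pre-boost score
    ["Javier Maravall-López", "Ali Akbari", "Gaspard Kerner", "Alkes L. Price"],
    ["Alkes Price", "Bogdan Pasaniuc"],
)
print(f"{score:.2%}", authors[-1])
```

Matching on first and last name is a deliberately loose choice: it tolerates middle initials but can confuse two people who share both names.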

Export to Excel:

python scripts/convert_schedule_to_excel.py
# Creates: data/ASHG2025/ashg2025_schedule.xlsx

Advanced Usage

Programmatic API:

from phd_agent.tools import ConferencePlanner

# Initialize planner
planner = ConferencePlanner(
    conference_name="ASHG2025",
    conference_dir="./data/ASHG2025"
)

# Parse PDF (cached after first run)
talks = planner.parse_conference_pdf("path/to/conference.pdf")

# Set research profile
planner.research_interests = [
    "statistical fine-mapping",
    "eQTL analysis",
    "machine learning for genomics"
]
planner.authors_of_interest = [
    "Alkes Price",
    "Bogdan Pasaniuc",
    "Nicholas Mancuso"
]
planner.exclusion_topics = [
    "wet-lab protocols",
    "clinical case studies"
]
planner.thesis_path = "/path/to/thesis.pdf"

# Index and find relevant talks
planner.index_talks()
relevant_talks = planner.find_relevant_talks(top_k=50, min_relevance_score=0.3)

# Generate schedule
planner.generate_schedule_markdown(
    relevant_talks,
    "data/ASHG2025/schedule.md"
)

Configuration

Research Interests File (research_interests.md):

# Research Interests

*Generated on: 2025-10-11*

## My Research Focus

- statistical fine-mapping, Bayesian approaches
- eQTL, multi-omics, regulatory elements
- perturb-seq, CRISPR perturbation

## Authors of Interest

*Senior authors, PIs, or collaborators to prioritize:*

- Alkes Price
- Bogdan Pasaniuc
- Nicholas Mancuso
- Jonathan Pritchard

## Topics to Exclude

*These topics will be filtered out from recommendations:*

- wet-lab protocols and techniques
- clinical case studies without methods

## Unpublished Work (Private)

*Thesis/dissertation stored locally for enhanced matching:*

- Path: `/Users/you/Documents/thesis.pdf`
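
A file laid out like the one above is straightforward to read back into a profile. The sketch below is one way to do it, assuming each `## ` heading starts a section and each `- ` line is an item; `parse_interests` is a hypothetical helper, not the project's loader.

```python
def parse_interests(markdown_text):
    """Map each '## ' heading to the list of '- ' bullet items under it."""
    sections = {}
    current = None
    for raw in markdown_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


sample = """# Research Interests

## My Research Focus

- statistical fine-mapping, Bayesian approaches
- eQTL, multi-omics, regulatory elements

## Authors of Interest

*Senior authors, PIs, or collaborators to prioritize:*

- Alkes Price

## Topics to Exclude

- wet-lab protocols and techniques
"""
profile = parse_interests(sample)
print(profile["Authors of Interest"])
```

Italic helper lines such as `*Senior authors, ...*` are skipped because they do not start with `- `.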

How It Works

PDF Parsing: Custom ASHG parser extracts title, abstract, authors (all authors including senior), day, time, location
Caching: Talks cached as .ashg2025_talks_cache.pkl, embeddings in .chromadb/
Embedding: Uses sentence-transformers (all-MiniLM-L6-v2) to encode talks and interests
Thesis Integration: If provided, thesis text (first 10k words) is weighted 2x in the query
RAG Matching: ChromaDB vector search finds semantically similar talks
Author Boosting: Talks featuring your authors of interest receive +15% relevance boost and are highlighted with ⭐
Exclusion Filtering: Keyword-based filtering removes unwanted topics (wet-lab, pure clinical)
Conflict Detection: Groups talks by time slot, identifies overlaps
Schedule Generation: Markdown output organized by day with relevance scores
Excel Export: Optional conversion to Excel with sortable/filterable columns and color-coded conflicts
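
The matching steps above (query building, thesis weighting, exclusion filtering, similarity ranking) can be sketched end to end without any external dependencies. This is a toy illustration only: `embed` is a bag-of-words stand-in for the all-MiniLM-L6-v2 embeddings, a sorted list replaces the ChromaDB vector search, and the way the 2x thesis weight is applied (doubling thesis term counts in the query) is an assumption, since the description does not specify it. All function names are hypothetical.

```python
import math
from collections import Counter

THESIS_WORD_LIMIT = 10_000  # "first 10k words", per the description above
THESIS_WEIGHT = 2.0         # "weighted 2x", per the description above


def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-transformers embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dict-like counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_query(interests, thesis_text=None):
    """Sum interest vectors, then add the truncated thesis vector at 2x weight."""
    query = Counter()
    for interest in interests:
        query.update(embed(interest))
    if thesis_text:
        truncated = " ".join(thesis_text.split()[:THESIS_WORD_LIMIT])
        for word, count in embed(truncated).items():
            query[word] += THESIS_WEIGHT * count
    return query


def rank(talks, query, exclusions=()):
    """Drop talks containing an exclusion keyword, then sort by similarity."""
    kept = [t for t in talks if not any(x.lower() in t.lower() for x in exclusions)]
    return sorted(kept, key=lambda t: cosine(embed(t), query), reverse=True)


# Illustrative titles only.
talks = [
    "Bayesian fine-mapping of eQTL loci",
    "Wet-lab protocols for organoid culture",
    "Deep learning for image segmentation",
]
query = build_query(
    ["statistical fine-mapping", "eQTL analysis"],
    thesis_text="fine-mapping eQTL loci with Bayesian priors",
)
ranked = rank(talks, query, exclusions=["wet-lab protocols"])
print(ranked)
```

A real embedding model would also score paraphrases that share no words with the query, which this word-count stand-in cannot do.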

Performance

First run: ~60 seconds (parse PDF + generate embeddings)
Subsequent runs: ~2 seconds (load from cache)
Memory: ~500MB for 300+ talks with embeddings
Accuracy: Semantic similarity matching surfaces relevant talks even when they share no exact keywords with your stated interests, which plain keyword search would miss
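
The gap between the first and subsequent runs comes from a parse-once, load-from-cache pattern. A minimal sketch of that pattern with `pickle` is shown below; `load_or_parse` and `slow_parse` are hypothetical names, and only the cache filename is taken from the description above.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_parse(cache_path, parse_fn):
    """Return cached talks if the cache file exists; otherwise parse and cache."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    talks = parse_fn()
    with cache_path.open("wb") as fh:
        pickle.dump(talks, fh)
    return talks


calls = []


def slow_parse():
    """Stand-in for the expensive PDF parse; records each invocation."""
    calls.append(1)
    return [{"title": "Example talk"}]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / ".ashg2025_talks_cache.pkl"
    first = load_or_parse(cache, slow_parse)   # parses and writes the cache
    second = load_or_parse(cache, slow_parse)  # served from the cache

print(len(calls))
```

Pickle files can execute code when loaded, so a cache like this should only ever be read from a location you control. Deleting the cache file forces a fresh parse.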

Supported Conferences

Currently optimized for:

✅ ASHG (American Society of Human Genetics)
🔄 Generic parser for other conference formats (may need customization)

To add a new conference format, implement a custom parser in src/phd_agent/tools/conference_planner.py.
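
As a rough idea of what a custom parser involves, the sketch below turns already-extracted text into talk records. Everything in it is hypothetical: the `ParsedTalk` dataclass, the `parse_generic` function, and the `Field: value` input layout are invented for illustration, and the actual interface expected by conference_planner.py may look quite different.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedTalk:
    """Hypothetical record with a subset of the fields the ASHG parser extracts."""
    title: str
    authors: list = field(default_factory=list)
    time: str = ""
    abstract: str = ""


FIELD_RE = re.compile(r"^(Title|Authors|Time|Abstract):\s*(.*)$")


def parse_generic(text):
    """Parse blank-line-separated 'Field: value' records (an invented format)."""
    talks = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            match = FIELD_RE.match(line.strip())
            if match:
                fields[match.group(1).lower()] = match.group(2).strip()
        if "title" in fields:  # skip blocks that are not talks
            talks.append(ParsedTalk(
                title=fields["title"],
                authors=[a.strip() for a in fields.get("authors", "").split(";") if a.strip()],
                time=fields.get("time", ""),
                abstract=fields.get("abstract", ""),
            ))
    return talks


sample = """Title: Hypothetical talk on fine-mapping
Authors: A. Researcher; B. Collaborator
Time: Friday 1:35pm

Title: Second hypothetical talk
Authors: C. Author
"""
parsed = parse_generic(sample)
print([t.title for t in parsed])
```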

🏗️ Architecture

The PhD Agent is organized as a Python package with the following structure:

src/phd_agent/
├── agent.py              # Main PhdAgent class
├── cli.py                # Command-line entry point
├── core/
│   ├── react_agent.py    # ReAct reasoning agent
│   └── phd_agent_tools.py # Tool wrappers
├── tools/
│   ├── paper_search.py   # Academic paper discovery
│   ├── paper_analyzer.py # AI-powered paper analysis
│   └── conference_planner.py # RAG-based conference scheduling
└── integrations/
    ├── mcp.py            # GitHub and Notion connectivity
    ├── slack.py          # Slack team collaboration
    ├── zotero.py         # Reference management
    ├── deepwiki.py       # Codebase indexing
    └── slack_monitor.py  # Paper monitoring

Additional directories:

scripts/: Utility scripts (meeting agenda, Excel export, etc.)
tests/: Test suite
data/: Conference data and schedules
docs/: Documentation

🔧 Configuration

Environment Variables

Create a .env file with the following:

# Claude AI
ANTHROPIC_API_KEY=your_api_key

# Slack (optional)
SLACK_BOT_TOKEN=your_bot_token
SLACK_APP_TOKEN=your_app_token

# Notion (optional)
NOTION_API_KEY=your_api_key

# GitHub (optional)
GITHUB_TOKEN=your_token

# Zotero (optional)
ZOTERO_API_KEY=your_api_key
ZOTERO_USER_ID=your_user_id

# DeepWiki (optional)
DEEPWIKI_API_KEY=your_api_key  # For private repos
DEEPWIKI_MAX_CONCURRENCY=5

See .env.example for a complete template.
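
A quick way to check a configuration before starting the agent is to validate these variables up front. The sketch below is an illustration, not part of the package: `load_settings` is a hypothetical helper, it assumes the variables have already been exported into the process environment (it does not read the .env file itself), and it treats only ANTHROPIC_API_KEY as required because every other key is marked optional above.

```python
import os

REQUIRED = ["ANTHROPIC_API_KEY"]
OPTIONAL = [
    "SLACK_BOT_TOKEN",
    "SLACK_APP_TOKEN",
    "NOTION_API_KEY",
    "GITHUB_TOKEN",
    "ZOTERO_API_KEY",
    "ZOTERO_USER_ID",
    "DEEPWIKI_API_KEY",
]


def load_settings(env=None):
    """Collect known settings from a mapping, failing fast on missing required keys."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    settings = {name: env.get(name) for name in REQUIRED + OPTIONAL}
    # Default of 5 mirrors the template value shown above.
    settings["DEEPWIKI_MAX_CONCURRENCY"] = int(env.get("DEEPWIKI_MAX_CONCURRENCY", "5"))
    return settings


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"ANTHROPIC_API_KEY": "your_api_key"})
print(settings["DEEPWIKI_MAX_CONCURRENCY"])
```

Unset optional keys come back as `None`, so integrations can check for them and stay disabled.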

📚 Documentation

Slack & Zotero Setup Guide - Detailed setup instructions for Slack bot and Zotero integration
Author Filter Guide - How to prioritize talks by specific authors
ReAct Demo Tasks - Example tasks for the ReAct agent

🧪 Testing

Run the test suite:

pytest tests/ -v

Or run individual tests:

python tests/test_agent.py
python tests/test_react_mock.py

🤝 Contributing

Contributions are welcome! This project is actively being developed as part of PhD research. Feel free to:

Report bugs
Suggest new features
Submit pull requests
Share research use cases

🛠️ Tech Stack

Python 3.8+: Core language
Claude AI (Anthropic): Advanced AI reasoning
Async/Await: Efficient concurrent operations
BeautifulSoup4: Web scraping
arXiv API: Academic paper access
Slack SDK: Team collaboration
Pyzotero: Reference management
MCP SDK: Model Context Protocol support
ChromaDB: Vector database for RAG
Sentence Transformers: Semantic embeddings
PyPDF: PDF text extraction

📄 License

This project is developed for academic research purposes. Please cite if used in your research work.

👨‍🎓 Author

Developed as part of PhD research to enhance academic productivity through AI automation.

🚦 Current Status

Active Development - New features and integrations are regularly added based on research needs.

Recent Updates

✅ Oct 11, 2025: Conference Schedule Planner with RAG, thesis integration, exclusion filtering, and author-based prioritization
✅ DeepWiki integration for codebase indexing
✅ MCP integration for enhanced AI capabilities
✅ Slack bot for team collaboration
✅ Zotero reference management
✅ Meeting agenda automation
✅ Multi-source paper search

Upcoming Features

🎯 Multi-conference support (ISMB, NeurIPS, etc.)
📅 Google Calendar integration for conference schedules
📊 Research progress visualization
🔍 Citation network analysis
📝 Automated literature review generation

Built with ❤️ for PhD researchers, by a PhD researcher
