CamelliaRui/PhD_Agent


PhD Research Assistant Agent 🎓🤖

An intelligent AI-powered research assistant designed to streamline PhD research workflows by integrating with multiple platforms and automating common academic tasks.

PhD Agent Architecture

🌟 Key Features

πŸ“š Literature Management

  • Automated Paper Search: Search academic papers across ArXiv, Google Scholar, and other databases
  • Smart Paper Analysis: Extract key insights, methodology, and findings from research papers using AI
  • Zotero Integration: Seamlessly manage your reference library with automated paper organization and tagging

💬 Collaboration & Communication

  • Slack Integration: Interactive research assistant bot for team collaboration
  • Paper Monitoring: Real-time alerts for new papers in your research domain
  • Discussion Facilitation: AI-powered paper discussions and brainstorming sessions

📝 Productivity Tools

  • Meeting Agenda Generation: Automatically generate supervisor meeting agendas based on recent work
  • GitHub Integration: Track research code progress and commits
  • Notion Integration: Organize research notes, papers, and ideas in structured databases
  • Weekly Reports: Automated progress reporting and task tracking
  • DeepWiki Codebase Indexing: Index and search paper implementation codebases for deep understanding
  • Conference Schedule Planner ⭐ NEW (Oct 11, 2025): RAG-based personalized conference scheduling with thesis integration

🔗 MCP (Model Context Protocol) Support

  • Advanced integration with Claude AI through MCP
  • Context-aware assistance for research tasks
  • Seamless workflow automation

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • API keys for:
    • Anthropic Claude API
    • Slack (optional)
    • Notion (optional)
    • GitHub (optional)
    • Zotero (optional)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/PhD_Agent.git
cd PhD_Agent
  2. Configure your environment variables:
cp .env.example .env
# Edit .env with your API keys
  3. Install the package:
pip install -e .

Or install dependencies only:

pip install -r requirements.txt

📖 Usage Examples

Paper Search and Analysis

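import asyncio
from phd_agent import PhdAgent

async def main():
    agent = PhdAgent()
    # Search for papers on a specific topic
    papers = await agent.search_papers("transformer architectures NLP")
    # Analyze a specific paper
    analysis = await agent.analyze_paper("https://arxiv.org/abs/...")

# await is only valid inside a coroutine, so run main() on an event loop
asyncio.run(main())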

Slack Bot for Research Teams

import asyncio
from phd_agent.integrations import SlackPaperMonitor

async def main():
    monitor = SlackPaperMonitor()
    # Start monitoring papers and responding to team queries
    await monitor.start()

asyncio.run(main())

Index and Query Paper Codebases with DeepWiki

import asyncio
from phd_agent.integrations import DeepWikiMCPIntegration

async def main():
    deepwiki = DeepWikiMCPIntegration()

    # Index a paper's implementation
    result = await deepwiki.index_paper_codebase(
        github_url="https://github.com/huggingface/transformers",
        paper_title="Transformers: State-of-the-Art NLP"
    )

    # Ask questions about the codebase
    answer = await deepwiki.ask_about_codebase(
        repository="huggingface/transformers",
        question="How do I fine-tune BERT for classification?"
    )

    # Search for specific implementations
    results = await deepwiki.search_codebase(
        repository="huggingface/transformers",
        query="attention mechanism"
    )

asyncio.run(main())

🎉 Conference Schedule Planner (NEW - Oct 11, 2025)

The Conference Planner uses RAG (Retrieval-Augmented Generation) to create personalized conference schedules based on your research interests, with optional thesis integration for maximum precision.

Features

  • 📄 PDF Parsing: Extracts talks/posters from conference abstract PDFs (tested on the ASHG 2025 abstract PDF, 4000+ pages)
  • 🎯 Smart Matching: Semantic similarity matching using ChromaDB and sentence transformers
  • 👤 Author Tracking ⭐ NEW: Prioritize talks by specific authors (PIs, collaborators, etc.) with a +15% relevance boost
  • 🚫 Exclusion Filtering: Filter out wet-lab/clinical work for computational researchers
  • 📖 Thesis Integration: Upload your unpublished work for more precise matching (stored locally only)
  • ⚡ Smart Caching: Parses the PDF once, then caches talks and embeddings (~2 seconds to regenerate)
  • ⚠️ Conflict Detection: Identifies overlapping sessions for manual decision-making
  • 📅 Day-by-Day Schedule: Markdown output organized by day with relevance scores
  • 📊 Excel Export: Convert schedules to Microsoft Excel with filtering and color-coding
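The conflict detection feature can be illustrated with a minimal sketch. It is not the planner's actual code: the `Talk` dataclass and `find_conflicts` function are hypothetical names, and the sketch compares every pair of talks for interval overlap rather than grouping by time slot.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Talk:
    # Hypothetical record; the real planner may store more fields.
    title: str
    start: datetime
    end: datetime


def find_conflicts(talks):
    """Return (earlier, later) title pairs for talks whose times overlap."""
    conflicts = []
    for a, b in combinations(sorted(talks, key=lambda t: t.start), 2):
        # Half-open intervals [start, end) overlap when each one
        # starts before the other one ends.
        if a.start < b.end and b.start < a.end:
            conflicts.append((a.title, b.title))
    return conflicts


day = datetime(2025, 10, 17)
talks = [
    Talk("Fine-mapping poster", day.replace(hour=13, minute=35), day.replace(hour=13, minute=40)),
    Talk("eQTL session", day.replace(hour=13, minute=30), day.replace(hour=14, minute=0)),
    Talk("Perturb-seq talk", day.replace(hour=14, minute=0), day.replace(hour=14, minute=15)),
]
conflicts = find_conflicts(talks)
print(conflicts)
```

Back-to-back sessions (one ends exactly when the next starts) are not reported as conflicts under this half-open convention.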

Quick Start

1. Run the PhD Agent:

phd-agent
# Or: python -m phd_agent.cli

2. Update your research interests:

📝 You: interests update

You'll be prompted for:

  • Research interests (e.g., "statistical fine-mapping", "single-cell RNA seq")
  • Exclusion topics (optional, e.g., "wet-lab protocols", "clinical case studies")
  • Thesis/dissertation (optional, drag & drop your PDF for best matching)

3. Generate your personalized schedule:

📝 You: conference plan ASHG2025

The system will:

  • Load cached talks (or parse PDF if first time)
  • Load your thesis if provided
  • Generate embeddings and match talks to your interests
  • Create a personalized schedule with conflict detection

Output: data/ASHG2025/ashg_schedule.md

Example Workflow

# Start the agent
$ phd-agent

📝 You: interests update

# Enter your interests
Interest #1: statistical fine-mapping, Bayesian approaches
Interest #2: eQTL, multi-omics, regulatory elements
Interest #3: perturb-seq, CRISPR perturbation
Interest #4: done

# Optional exclusions
Exclude #1: wet-lab protocols and techniques
Exclude #2: clinical case studies without methods
Exclude #3: done

# Optional thesis (drag & drop or type path)
Do you want to add your thesis/dissertation? (y/n): y
Enter path to thesis PDF (or drag file here): /path/to/thesis.pdf
  ✓ Thesis added: thesis.pdf

# Generate schedule
📝 You: conference plan ASHG2025

📦 Loading cached talks (faster!)...
✅ Loaded 323 talks from cache
📊 Talks already indexed in ChromaDB (323 talks)
📖 Loading thesis: thesis.pdf...
✅ Thesis loaded (10000 words)
🎯 Using thesis content for precise matching...
✅ Found 50 relevant talks!
🚫 Filtered out 12 talks matching exclusion criteria

🎊 CONFERENCE SCHEDULE COMPLETE!
  Conference: ASHG2025
  Total talks in PDF: 323
  Relevant to your interests: 50
  Scheduling conflicts: 13

📄 Schedule saved to:
  data/ASHG2025/ashg_schedule.md

Example Output with Author Highlighting:

### Fine-mapping of loci under directional selection reveals functional architecture

**Type:** 📋 Poster | **Relevance Score:** 63.62%  ← Boosted from ~48%!

**⏰ Time:** Friday, October 17 at 1:35pm – 1:40pm

**👥 Authors:** Javier Maravall-López, Ali Akbari, Gaspard Kerner, ... **⭐ Alkes L. Price** *et al.* (6 total)

**📝 Abstract:**

Studies of directional selection inform our understanding of evolution and biology...

Note the ⭐ highlighting Alkes L. Price (your author of interest): he is shown even though, as the senior author, he falls outside the first few names listed.
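One way the author line above could be produced is sketched below. This is an illustration only. The function names are hypothetical, the name matching (first and last name, ignoring middle initials) is an assumption, and the boost is assumed to be an additive 0.15, which is consistent with the ~48% to 63.62% example. The authors "Author D" and "Author E" are placeholders for names the example elides.

```python
def is_author_of_interest(author, authors_of_interest):
    """Match on first and last name, ignoring middle names and initials."""
    parts = author.lower().replace(".", "").split()
    if not parts:
        return False
    for name in authors_of_interest:
        wanted = name.lower().replace(".", "").split()
        if wanted and parts[0] == wanted[0] and parts[-1] == wanted[-1]:
            return True
    return False


def format_authors(authors, authors_of_interest, max_shown=3):
    """List the first few authors, plus any starred author hidden by truncation."""
    def mark(name):
        starred = is_author_of_interest(name, authors_of_interest)
        return f"**⭐ {name}**" if starred else name

    head = ", ".join(mark(a) for a in authors[:max_shown])
    rest = authors[max_shown:]
    if not rest:
        return head
    hidden = [mark(a) for a in rest if is_author_of_interest(a, authors_of_interest)]
    tail = " " + ", ".join(hidden) if hidden else ""
    return f"{head}, ...{tail} *et al.* ({len(authors)} total)"


def boost_score(score, has_author_of_interest, boost=0.15):
    """Assumed additive boost, capped at 1.0."""
    return min(1.0, score + boost) if has_author_of_interest else score


authors = ["Javier Maravall-López", "Ali Akbari", "Gaspard Kerner",
           "Author D", "Author E", "Alkes L. Price"]  # D and E are placeholders
line = format_authors(authors, ["Alkes Price"])
print(line)
```

Matching on first and last name lets "Alkes Price" in the interests file match "Alkes L. Price" in an abstract, at the cost of occasional false positives for common names.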

Export to Excel:

python scripts/convert_schedule_to_excel.py
# Creates: data/ASHG2025/ashg2025_schedule.xlsx

Advanced Usage

Programmatic API:

from phd_agent.tools import ConferencePlanner

# Initialize planner
planner = ConferencePlanner(
    conference_name="ASHG2025",
    conference_dir="./data/ASHG2025"
)

# Parse PDF (cached after first run)
talks = planner.parse_conference_pdf("path/to/conference.pdf")

# Set research profile
planner.research_interests = [
    "statistical fine-mapping",
    "eQTL analysis",
    "machine learning for genomics"
]
planner.authors_of_interest = [
    "Alkes Price",
    "Bogdan Pasaniuc",
    "Nicholas Mancuso"
]
planner.exclusion_topics = [
    "wet-lab protocols",
    "clinical case studies"
]
planner.thesis_path = "/path/to/thesis.pdf"

# Index and find relevant talks
planner.index_talks()
relevant_talks = planner.find_relevant_talks(top_k=50, min_relevance_score=0.3)

# Generate schedule
planner.generate_schedule_markdown(
    relevant_talks,
    "data/ASHG2025/schedule.md"
)

Configuration

Research Interests File (research_interests.md):

# Research Interests

*Generated on: 2025-10-11*

## My Research Focus

- statistical fine-mapping, Bayesian approaches
- eQTL, multi-omics, regulatory elements
- perturb-seq, CRISPR perturbation

## Authors of Interest

*Senior authors, PIs, or collaborators to prioritize:*

- Alkes Price
- Bogdan Pasaniuc
- Nicholas Mancuso
- Jonathan Pritchard

## Topics to Exclude

*These topics will be filtered out from recommendations:*

- wet-lab protocols and techniques
- clinical case studies without methods

## Unpublished Work (Private)

*Thesis/dissertation stored locally for enhanced matching:*

- Path: `/Users/you/Documents/thesis.pdf`
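A file in this layout can be read back with a few lines of standard-library Python. The sketch below is an assumption about how such a file might be parsed, not the project's own loader, and `parse_interests_markdown` is a hypothetical name. It collects the bullet items under each second-level heading.

```python
def parse_interests_markdown(text):
    """Map each '## ' heading to the list of '- ' bullet items beneath it."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
        # Italic notes, blank lines, and the top-level title are ignored.
    return sections


sample = """# Research Interests

## My Research Focus

- statistical fine-mapping, Bayesian approaches
- eQTL, multi-omics, regulatory elements

## Authors of Interest

*Senior authors, PIs, or collaborators to prioritize:*

- Alkes Price

## Topics to Exclude

- wet-lab protocols and techniques
"""
profile = parse_interests_markdown(sample)
print(profile["Authors of Interest"])
```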

How It Works

  1. PDF Parsing: Custom ASHG parser extracts title, abstract, authors (all authors including senior), day, time, location
  2. Caching: Talks cached as .ashg2025_talks_cache.pkl, embeddings in .chromadb/
  3. Embedding: Uses sentence-transformers (all-MiniLM-L6-v2) to encode talks and interests
  4. Thesis Integration: If provided, thesis text (first 10k words) is weighted 2x in the query
  5. RAG Matching: ChromaDB vector search finds semantically similar talks
  6. Author Boosting: Talks featuring your authors of interest receive +15% relevance boost and are highlighted with ⭐
  7. Exclusion Filtering: Keyword-based filtering removes unwanted topics (wet-lab, pure clinical)
  8. Conflict Detection: Groups talks by time slot, identifies overlaps
  9. Schedule Generation: Markdown output organized by day with relevance scores
  10. Excel Export: Optional conversion to Excel with sortable/filterable columns and color-coded conflicts
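Steps 3 to 7 can be approximated in a dependency-free sketch. To keep it runnable without a model download, it swaps the all-MiniLM-L6-v2 embeddings for plain word counts and replaces the ChromaDB vector search with a linear scan. How the 2x thesis weighting is applied in the project is not stated here, so doubling the thesis term counts is an assumption. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a sentence-transformer vector)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_query(interests, thesis_text="", thesis_weight=2, max_words=10_000):
    """Combine interests with the first max_words of the thesis, weighted thesis_weight times."""
    query = Counter()
    for interest in interests:
        query.update(embed(interest))
    thesis_head = " ".join(thesis_text.split()[:max_words])
    for word, count in embed(thesis_head).items():
        query[word] += thesis_weight * count
    return query


def find_relevant(talks, interests, exclusions=(), thesis_text="", top_k=50, min_score=0.3):
    """Rank (title, abstract) pairs by similarity, dropping excluded topics first."""
    query = build_query(interests, thesis_text)
    results = []
    for title, abstract in talks:
        text = f"{title} {abstract}".lower()
        # Keyword-based exclusion: skip talks that mention any excluded phrase.
        if any(term.lower() in text for term in exclusions):
            continue
        score = cosine(query, embed(text))
        if score >= min_score:
            results.append((title, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top_k]


talks = [
    ("Bayesian fine-mapping of eQTL", "statistical fine-mapping with Bayesian priors"),
    ("Pipetting basics", "wet-lab protocols for sample preparation"),
    ("Clinical cohort overview", "recruitment logistics"),
]
relevant = find_relevant(talks, ["statistical fine-mapping", "eQTL"],
                         exclusions=["wet-lab protocols"], min_score=0.1)
print([title for title, _ in relevant])
```

Word counts only match identical tokens, so this stand-in loses the main benefit of real embeddings: relating talks and interests that use different vocabulary.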

Performance

  • First run: ~60 seconds (parse PDF + generate embeddings)
  • Subsequent runs: ~2 seconds (load from cache)
  • Memory: ~500MB for 300+ talks with embeddings
  • Accuracy: Semantic similarity matching can surface relevant talks that share no exact keywords with your interests, which a plain keyword search would miss
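The gap between the first and later runs comes from caching the parsed talks. A minimal sketch of that load-or-parse pattern with `pickle` follows. The helper name `load_or_parse` and the cache file name are assumptions, and the real planner additionally caches embeddings in ChromaDB.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_parse(cache_path, parse_fn):
    """Return (talks, cache_hit). Parse and write the cache only on a miss."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh), True
    talks = parse_fn()
    with cache_path.open("wb") as fh:
        pickle.dump(talks, fh)
    return talks, False


calls = []


def slow_parse():
    # Stand-in for the expensive PDF parse; records how often it runs.
    calls.append(1)
    return [{"title": "Example talk"}]


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / ".talks_cache.pkl"
    first, first_hit = load_or_parse(cache, slow_parse)
    second, second_hit = load_or_parse(cache, slow_parse)

print(first_hit, second_hit, len(calls))
```

Unpickling can execute arbitrary code, so only load cache files that were produced locally. Delete the cache file to force a fresh parse after the source PDF changes.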

Supported Conferences

Currently optimized for:

  • ✅ ASHG (American Society of Human Genetics)
  • 🔄 Generic parser for other conference formats (may need customization)

To add a new conference format, implement a custom parser in src/phd_agent/tools/conference_planner.py.
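As a rough idea of what a custom parser might look like, the sketch below splits extracted text into talk records with a regular expression. The hook or base class that conference_planner.py actually expects is not documented here, so every name (`ParsedTalk`, `parse_generic_program`) and the "Title:/Authors:/Abstract:" layout are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedTalk:
    # Hypothetical record; the ASHG parser also extracts day, time, and location.
    title: str
    authors: list
    abstract: str


# Assumed layout: each entry is "Title:", then "Authors:", then "Abstract:",
# with the abstract running until the next "Title:" line or the end of the text.
ENTRY_RE = re.compile(
    r"Title:\s*(?P<title>.+?)\s*\n"
    r"Authors:\s*(?P<authors>.+?)\s*\n"
    r"Abstract:\s*(?P<abstract>.+?)(?=\nTitle:|\Z)",
    re.DOTALL,
)


def parse_generic_program(text):
    talks = []
    for match in ENTRY_RE.finditer(text):
        authors = [a.strip() for a in match.group("authors").split(",") if a.strip()]
        # Collapse line breaks left over from PDF text extraction.
        abstract = " ".join(match.group("abstract").split())
        talks.append(ParsedTalk(match.group("title"), authors, abstract))
    return talks


sample = (
    "Title: Talk A\n"
    "Authors: Ada Example, Bo Sample\n"
    "Abstract: First line\nsecond line.\n"
    "Title: Talk B\n"
    "Authors: Cy Placeholder\n"
    "Abstract: Only line.\n"
)
parsed = parse_generic_program(sample)
print([t.title for t in parsed])
```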

🏗️ Architecture

The PhD Agent is organized as a Python package with the following structure:

src/phd_agent/
├── agent.py              # Main PhdAgent class
├── cli.py                # Command-line entry point
├── core/
│   ├── react_agent.py    # ReAct reasoning agent
│   └── phd_agent_tools.py # Tool wrappers
├── tools/
│   ├── paper_search.py   # Academic paper discovery
│   ├── paper_analyzer.py # AI-powered paper analysis
│   └── conference_planner.py # RAG-based conference scheduling
└── integrations/
    ├── mcp.py            # GitHub and Notion connectivity
    ├── slack.py          # Slack team collaboration
    ├── zotero.py         # Reference management
    ├── deepwiki.py       # Codebase indexing
    └── slack_monitor.py  # Paper monitoring

Additional directories:

  • scripts/: Utility scripts (meeting agenda, Excel export, etc.)
  • tests/: Test suite
  • data/: Conference data and schedules
  • docs/: Documentation

🔧 Configuration

Environment Variables

Create a .env file with the following:

# Claude AI
ANTHROPIC_API_KEY=your_api_key

# Slack (optional)
SLACK_BOT_TOKEN=your_bot_token
SLACK_APP_TOKEN=your_app_token

# Notion (optional)
NOTION_API_KEY=your_api_key

# GitHub (optional)
GITHUB_TOKEN=your_token

# Zotero (optional)
ZOTERO_API_KEY=your_api_key
ZOTERO_USER_ID=your_user_id

# DeepWiki (optional)
DEEPWIKI_API_KEY=your_api_key  # For private repos
DEEPWIKI_MAX_CONCURRENCY=5

See .env.example for a complete template.
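A startup check along the following lines can catch missing keys early. It is a sketch under assumptions: the values from .env are taken to be already present in the process environment (for example via a loader such as python-dotenv), `ANTHROPIC_API_KEY` is treated as the only required variable because it is the only one not marked optional, and `load_settings` is a hypothetical helper rather than part of the package.

```python
import os

REQUIRED = ["ANTHROPIC_API_KEY"]
OPTIONAL = [
    "SLACK_BOT_TOKEN", "SLACK_APP_TOKEN", "NOTION_API_KEY", "GITHUB_TOKEN",
    "ZOTERO_API_KEY", "ZOTERO_USER_ID", "DEEPWIKI_API_KEY",
]


def load_settings(environ=None):
    """Validate required variables and collect whichever optional ones are set."""
    environ = os.environ if environ is None else environ
    missing = [key for key in REQUIRED if not environ.get(key)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    settings = {key: environ[key] for key in REQUIRED}
    settings.update({key: environ[key] for key in OPTIONAL if environ.get(key)})
    # Assumed default of 5, mirroring the example value in the template above.
    settings["DEEPWIKI_MAX_CONCURRENCY"] = int(environ.get("DEEPWIKI_MAX_CONCURRENCY", "5"))
    return settings


settings = load_settings({"ANTHROPIC_API_KEY": "your_api_key", "GITHUB_TOKEN": "your_token"})
print(sorted(settings))
```

Passing a plain dict instead of `os.environ` keeps the check easy to test without touching real credentials.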

📚 Documentation

🧪 Testing

Run the test suite:

pytest tests/ -v

Or run individual tests:

python tests/test_agent.py
python tests/test_react_mock.py

🀝 Contributing

Contributions are welcome! This project is actively being developed as part of PhD research. Feel free to:

  • Report bugs
  • Suggest new features
  • Submit pull requests
  • Share research use cases

🛠️ Tech Stack

  • Python 3.8+: Core language
  • Claude AI (Anthropic): Advanced AI reasoning
  • Async/Await: Efficient concurrent operations
  • BeautifulSoup4: Web scraping
  • ArXiv API: Academic paper access
  • Slack SDK: Team collaboration
  • Pyzotero: Reference management
  • MCP SDK: Model Context Protocol support
  • ChromaDB: Vector database for RAG
  • Sentence Transformers: Semantic embeddings
  • PyPDF: PDF text extraction

📄 License

This project is developed for academic research purposes. Please cite if used in your research work.

👨‍🎓 Author

Developed as part of PhD research to enhance academic productivity through AI automation.

🚦 Current Status

Active Development - New features and integrations are regularly added based on research needs.

Recent Updates

  • ✅ Oct 11, 2025: Conference Schedule Planner with RAG, thesis integration, exclusion filtering, and author-based prioritization
  • ✅ DeepWiki integration for codebase indexing
  • ✅ MCP integration for enhanced AI capabilities
  • ✅ Slack bot for team collaboration
  • ✅ Zotero reference management
  • ✅ Meeting agenda automation
  • ✅ Multi-source paper search

Upcoming Features

  • 🎯 Multi-conference support (ISMB, NeurIPS, etc.)
  • πŸ“… Google Calendar integration for conference schedules
  • πŸ“Š Research progress visualization
  • πŸ” Citation network analysis
  • πŸ“ Automated literature review generation

Built with ❤️ for PhD researchers, by a PhD researcher
