An intelligent AI-powered research assistant designed to streamline PhD research workflows by integrating with multiple platforms and automating common academic tasks.
- Automated Paper Search: Search academic papers across ArXiv, Google Scholar, and other databases
- Smart Paper Analysis: Extract key insights, methodology, and findings from research papers using AI
- Zotero Integration: Seamlessly manage your reference library with automated paper organization and tagging
- Slack Integration: Interactive research assistant bot for team collaboration
- Paper Monitoring: Real-time alerts for new papers in your research domain
- Discussion Facilitation: AI-powered paper discussions and brainstorming sessions
- Meeting Agenda Generation: Automatically generate supervisor meeting agendas based on recent work
- GitHub Integration: Track research code progress and commits
- Notion Integration: Organize research notes, papers, and ideas in structured databases
- Weekly Reports: Automated progress reporting and task tracking
- DeepWiki Codebase Indexing: Index and search paper implementation codebases for deep understanding
- Conference Schedule Planner (NEW, Oct 11, 2025): RAG-based personalized conference scheduling with thesis integration
- Advanced integration with Claude AI through MCP
- Context-aware assistance for research tasks
- Seamless workflow automation
- Python 3.8+
- API keys for:
  - Anthropic Claude API
  - Slack (optional)
  - Notion (optional)
  - GitHub (optional)
  - Zotero (optional)
- Clone the repository:
git clone https://github.com/yourusername/PhD_Agent.git
cd PhD_Agent
- Configure your environment variables:
cp .env.example .env
# Edit .env with your API keys
- Install the package:
pip install -e .
Or install dependencies only:
pip install -r requirements.txt

import asyncio

from phd_agent import PhdAgent

async def main():
    agent = PhdAgent()
    # Search for papers on a specific topic
    papers = await agent.search_papers("transformer architectures NLP")
    # Analyze a specific paper
    analysis = await agent.analyze_paper("https://arxiv.org/abs/...")

# `await` is only valid inside a coroutine, so run the calls via asyncio.run()
asyncio.run(main())

from phd_agent.integrations import SlackPaperMonitor
import asyncio

async def main():
    monitor = SlackPaperMonitor()
    # Start monitoring papers and responding to team queries
    await monitor.start()

asyncio.run(main())

from phd_agent.integrations import DeepWikiMCPIntegration
import asyncio

async def main():
    deepwiki = DeepWikiMCPIntegration()
    # Index a paper's implementation
    result = await deepwiki.index_paper_codebase(
        github_url="https://github.com/huggingface/transformers",
        paper_title="Transformers: State-of-the-Art NLP"
    )
    # Ask questions about the codebase
    answer = await deepwiki.ask_about_codebase(
        repository="huggingface/transformers",
        question="How do I fine-tune BERT for classification?"
    )
    # Search for specific implementations
    results = await deepwiki.search_codebase(
        repository="huggingface/transformers",
        query="attention mechanism"
    )

asyncio.run(main())

The Conference Planner uses RAG (Retrieval-Augmented Generation) to create personalized conference schedules based on your research interests, with optional thesis integration for more precise matching.
- PDF Parsing: Extracts talks/posters from conference abstract PDFs (tested on the ASHG 2025 abstract PDF, 4000+ pages)
- 🎯 Smart Matching: Semantic similarity matching using ChromaDB and sentence transformers
- 👤 Author Tracking (NEW): Prioritize talks by specific authors (PIs, collaborators, etc.) with a +15% relevance boost
- 🚫 Exclusion Filtering: Filter out wet-lab/clinical work for computational researchers
- Thesis Integration: Upload your unpublished work for highly precise matching (stored locally only)
- ⚡ Smart Caching: Parses PDF once, caches talks and embeddings (~2 seconds to regenerate)
- ⚠️ Conflict Detection: Identifies overlapping sessions for manual decision-making
- Day-by-Day Schedule: Markdown output organized by day with relevance scores
- Excel Export: Convert schedules to Microsoft Excel with filtering and color-coding
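As a rough illustration of the Smart Matching feature, the sketch below ranks talks by cosine similarity between an interest vector and talk vectors. It is not the project's implementation: the real pipeline uses ChromaDB and sentence-transformers embeddings, whereas the three-dimensional vectors and the `cosine_similarity` and `rank_talks` helpers here are hypothetical stand-ins. The `top_k` and `min_relevance_score` names are borrowed from `find_relevant_talks` in the programmatic API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_talks(interest_vec, talk_vecs, top_k=50, min_relevance_score=0.3):
    """Return (title, score) pairs at or above the threshold, best first."""
    scored = [
        (title, cosine_similarity(interest_vec, vec))
        for title, vec in talk_vecs.items()
    ]
    kept = [(title, score) for title, score in scored if score >= min_relevance_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy "embeddings" (hypothetical values, not real model output).
interest = [1.0, 0.0, 0.0]
talks = {
    "Bayesian fine-mapping": [0.9, 0.1, 0.0],
    "Wet-lab protocol": [0.0, 0.0, 1.0],
    "eQTL colocalization": [0.6, 0.8, 0.0],
}
ranked = rank_talks(interest, talks, top_k=2)
for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

With these toy vectors the wet-lab talk scores 0 and falls below the threshold, so only the two computational talks are returned.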
1. Run the PhD Agent:
phd-agent
# Or: python -m phd_agent.cli
2. Update your research interests:
You: interests update
You'll be prompted for:
- Research interests (e.g., "statistical fine-mapping", "single-cell RNA seq")
- Exclusion topics (optional, e.g., "wet-lab protocols", "clinical case studies")
- Thesis/dissertation (optional, drag & drop your PDF for best matching)
3. Generate your personalized schedule:
You: conference plan ASHG2025
The system will:
- Load cached talks (or parse PDF if first time)
- Load your thesis if provided
- Generate embeddings and match talks to your interests
- Create a personalized schedule with conflict detection
Output: data/ASHG2025/ashg_schedule.md
# Start the agent
$ phd-agent
You: interests update
# Enter your interests
Interest #1: statistical fine-mapping, Bayesian approaches
Interest #2: eQTL, multi-omics, regulatory elements
Interest #3: perturb-seq, CRISPR perturbation
Interest #4: done
# Optional exclusions
Exclude #1: wet-lab protocols and techniques
Exclude #2: clinical case studies without methods
Exclude #3: done
# Optional thesis (drag & drop or type path)
Do you want to add your thesis/dissertation? (y/n): y
Enter path to thesis PDF (or drag file here): /path/to/thesis.pdf
Thesis added: thesis.pdf
# Generate schedule
You: conference plan ASHG2025
📦 Loading cached talks (faster!)...
✅ Loaded 323 talks from cache
Talks already indexed in ChromaDB (323 talks)
Loading thesis: thesis.pdf...
✅ Thesis loaded (10000 words)
🎯 Using thesis content for precise matching...
✅ Found 50 relevant talks!
🚫 Filtered out 12 talks matching exclusion criteria
CONFERENCE SCHEDULE COMPLETE!
Conference: ASHG2025
Total talks in PDF: 323
Relevant to your interests: 50
Scheduling conflicts: 13
Schedule saved to:
data/ASHG2025/ashg_schedule.md

Example Output with Author Highlighting:
### Fine-mapping of loci under directional selection reveals functional architecture
**Type:** Poster | **Relevance Score:** 63.62% ⭐ Boosted from ~48%!
**⏰ Time:** Friday, October 17 at 1:35pm - 1:40pm
**👥 Authors:** Javier Maravall-López, Ali Akbari, Gaspard Kerner, ... **⭐ Alkes L. Price** *et al.* (6 total)
**Abstract:**
Studies of directional selection inform our understanding of evolution and biology...

Note the ⭐ highlighting Alkes L. Price (your author of interest), even though he is the senior author!
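The boost in the example above can be reproduced with a small sketch. It assumes the +15% boost is additive in percentage points, which is consistent with the example (63.62% minus 15 points gives 48.62%, matching "~48%"). The name-matching rule, the cap at 1.0, and the helper names `has_author_of_interest` and `boosted_score` are assumptions made for illustration, not the project's confirmed behavior.

```python
AUTHOR_BOOST = 0.15  # assumed additive boost: +15 percentage points


def _tokens(name):
    """Lower-cased name parts with periods removed, e.g. 'Alkes L. Price' -> {'alkes', 'l', 'price'}."""
    return set(name.lower().replace(".", " ").split())


def has_author_of_interest(talk_authors, authors_of_interest):
    """True if every part of a wanted name appears in some talk author's name (hypothetical rule)."""
    return any(
        _tokens(wanted) <= _tokens(author)
        for wanted in authors_of_interest
        for author in talk_authors
    )


def boosted_score(base_score, talk_authors, authors_of_interest, boost=AUTHOR_BOOST):
    """Add the boost when an author of interest is present, capped at 1.0."""
    if has_author_of_interest(talk_authors, authors_of_interest):
        return min(1.0, base_score + boost)
    return base_score


score = boosted_score(
    0.4862,  # base score inferred from the example: 63.62% - 15 points
    ["Javier Maravall-López", "Ali Akbari", "Alkes L. Price"],
    ["Alkes Price"],
)
print(f"{score:.2%}")  # prints 63.62%
```

Matching on name parts lets "Alkes Price" in the interests file match "Alkes L. Price" in the author list, whatever his position among the authors.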
Export to Excel:
python scripts/convert_schedule_to_excel.py
# Creates: data/ASHG2025/ashg2025_schedule.xlsx

Programmatic API:
from phd_agent.tools import ConferencePlanner

# Initialize planner
planner = ConferencePlanner(
    conference_name="ASHG2025",
    conference_dir="./data/ASHG2025"
)
# Parse PDF (cached after first run)
talks = planner.parse_conference_pdf("path/to/conference.pdf")
# Set research profile
planner.research_interests = [
    "statistical fine-mapping",
    "eQTL analysis",
    "machine learning for genomics"
]
planner.authors_of_interest = [
    "Alkes Price",
    "Bogdan Pasaniuc",
    "Nicholas Mancuso"
]
planner.exclusion_topics = [
    "wet-lab protocols",
    "clinical case studies"
]
planner.thesis_path = "/path/to/thesis.pdf"
# Index and find relevant talks
planner.index_talks()
relevant_talks = planner.find_relevant_talks(top_k=50, min_relevance_score=0.3)
# Generate schedule
planner.generate_schedule_markdown(
    relevant_talks,
    "data/ASHG2025/schedule.md"
)

Research Interests File (research_interests.md):
# Research Interests
*Generated on: 2025-10-11*
## My Research Focus
- statistical fine-mapping, Bayesian approaches
- eQTL, multi-omics, regulatory elements
- perturb-seq, CRISPR perturbation
## Authors of Interest
*Senior authors, PIs, or collaborators to prioritize:*
- Alkes Price
- Bogdan Pasaniuc
- Nicholas Mancuso
- Jonathan Pritchard
## Topics to Exclude
*These topics will be filtered out from recommendations:*
- wet-lab protocols and techniques
- clinical case studies without methods
## Unpublished Work (Private)
*Thesis/dissertation stored locally for enhanced matching:*
- Path: `/Users/you/Documents/thesis.pdf`

- PDF Parsing: Custom ASHG parser extracts title, abstract, authors (all authors, including the senior author), day, time, location
- Caching: Talks cached as `.ashg2025_talks_cache.pkl`, embeddings in `.chromadb/`
- Embedding: Uses `sentence-transformers` (`all-MiniLM-L6-v2`) to encode talks and interests
- Thesis Integration: If provided, thesis text (first 10k words) is weighted 2x in the query
- RAG Matching: ChromaDB vector search finds semantically similar talks
- Author Boosting: Talks featuring your authors of interest receive a +15% relevance boost and are highlighted with ⭐
- Exclusion Filtering: Keyword-based filtering removes unwanted topics (wet-lab, pure clinical)
- Conflict Detection: Groups talks by time slot, identifies overlaps
- Schedule Generation: Markdown output organized by day with relevance scores
- Excel Export: Optional conversion to Excel with sortable/filterable columns and color-coded conflicts
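The conflict detection step in the list above can be sketched as follows. Note the swap: this version checks every pair of talks for overlapping time intervals rather than grouping talks by identical time slot as the description says, so it is an illustration and not the project's code. The talk dictionaries and the `find_conflicts` and `at` helpers are hypothetical.

```python
from datetime import datetime
from itertools import combinations


def find_conflicts(talks):
    """Return pairs of titles whose [start, end) intervals overlap."""
    ordered = sorted(talks, key=lambda talk: talk["start"])
    conflicts = []
    for first, second in combinations(ordered, 2):
        # Two intervals overlap when each one starts before the other ends.
        if first["start"] < second["end"] and second["start"] < first["end"]:
            conflicts.append((first["title"], second["title"]))
    return conflicts


def at(hhmm):
    """Build a datetime on the example conference day (Friday, October 17, 2025)."""
    return datetime.strptime(f"2025-10-17 {hhmm}", "%Y-%m-%d %H:%M")


talks = [
    {"title": "Poster A", "start": at("13:35"), "end": at("13:40")},
    {"title": "Talk B", "start": at("13:30"), "end": at("13:45")},
    {"title": "Talk C", "start": at("14:00"), "end": at("14:15")},
]
conflicts = find_conflicts(talks)
print(conflicts)  # prints [('Talk B', 'Poster A')]
```

The pairwise check is quadratic in the number of talks, which is acceptable for the roughly 50 talks a schedule contains.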
- First run: ~60 seconds (parse PDF + generate embeddings)
- Subsequent runs: ~2 seconds (load from cache)
- Memory: ~500MB for 300+ talks with embeddings
- Accuracy: Semantic similarity matching can surface relevant talks that share no exact keywords with your interests, which plain keyword search would miss
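The gap between the first run and later runs comes from caching. Below is a minimal sketch of the parse-once pattern, assuming a pickle file like the `.ashg2025_talks_cache.pkl` mentioned above. The `load_or_parse` helper and the stand-in parser are hypothetical, and the real planner also caches embeddings in ChromaDB, which this does not show.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_parse(cache_path, parse_fn):
    """Return cached talks if the cache file exists; otherwise parse once and cache."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as handle:
            return pickle.load(handle)
    talks = parse_fn()
    with cache_path.open("wb") as handle:
        pickle.dump(talks, handle)
    return talks


parse_calls = []


def slow_parse():
    """Stand-in for the expensive PDF parse; records each invocation."""
    parse_calls.append(1)
    return [{"title": "Example talk", "day": "Friday"}]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / ".ashg2025_talks_cache.pkl"
    first = load_or_parse(cache_file, slow_parse)   # parses and writes the cache
    second = load_or_parse(cache_file, slow_parse)  # reads the cache, no parse

print(len(parse_calls))  # prints 1
```

Pickle files should only be loaded from locations you trust, since unpickling can execute arbitrary code.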
Currently optimized for:
- ASHG (American Society of Human Genetics)
- Generic parser for other conference formats (may need customization)
To add a new conference format, implement a custom parser in src/phd_agent/tools/conference_planner.py.
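As a starting point for a custom parser, here is a sketch that turns extracted text into talk records with some of the fields the ASHG parser extracts (title, authors, day, time, abstract). Everything in it is hypothetical: the record layout in `SAMPLE`, the `RECORD` pattern, and the `parse_talks` function are invented for illustration, and the interface that `conference_planner.py` actually expects may differ.

```python
import re

# Hypothetical text layout; a real abstract book will look different.
SAMPLE = """\
Title: Fine-mapping with summary statistics
Authors: A. Researcher, B. Scientist
When: Friday 1:35pm
Abstract: We study fine-mapping under model misspecification.

Title: Single-cell eQTL mapping
Authors: C. Student
When: Saturday 9:00am
Abstract: We map eQTLs across cell types.
"""

# One talk record: four labelled lines in a fixed order.
RECORD = re.compile(
    r"Title: (?P<title>.+)\n"
    r"Authors: (?P<authors>.+)\n"
    r"When: (?P<day>\w+) (?P<time>\S+)\n"
    r"Abstract: (?P<abstract>.+)"
)


def parse_talks(text):
    """Extract one dictionary per talk from text in the layout above."""
    talks = []
    for match in RECORD.finditer(text):
        talk = match.groupdict()
        # Keep every author, including the last (senior) one.
        talk["authors"] = [name.strip() for name in talk["authors"].split(",")]
        talks.append(talk)
    return talks


talks = parse_talks(SAMPLE)
for talk in talks:
    print(talk["day"], talk["time"], talk["title"])
```

A real parser would first extract text from the PDF (the project lists PyPDF for this) and would need to tolerate page breaks and multi-line abstracts.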
The PhD Agent is organized as a Python package with the following structure:
src/phd_agent/
├── agent.py                    # Main PhdAgent class
├── cli.py                      # Command-line entry point
├── core/
│   ├── react_agent.py          # ReAct reasoning agent
│   └── phd_agent_tools.py      # Tool wrappers
├── tools/
│   ├── paper_search.py         # Academic paper discovery
│   ├── paper_analyzer.py       # AI-powered paper analysis
│   └── conference_planner.py   # RAG-based conference scheduling
└── integrations/
    ├── mcp.py                  # GitHub and Notion connectivity
    ├── slack.py                # Slack team collaboration
    ├── zotero.py               # Reference management
    ├── deepwiki.py             # Codebase indexing
    └── slack_monitor.py        # Paper monitoring
Additional directories:
- `scripts/`: Utility scripts (meeting agenda, Excel export, etc.)
- `tests/`: Test suite
- `data/`: Conference data and schedules
- `docs/`: Documentation
Create a .env file with the following:
# Claude AI
ANTHROPIC_API_KEY=your_api_key
# Slack (optional)
SLACK_BOT_TOKEN=your_bot_token
SLACK_APP_TOKEN=your_app_token
# Notion (optional)
NOTION_API_KEY=your_api_key
# GitHub (optional)
GITHUB_TOKEN=your_token
# Zotero (optional)
ZOTERO_API_KEY=your_api_key
ZOTERO_USER_ID=your_user_id
# DeepWiki (optional)
DEEPWIKI_API_KEY=your_api_key # For private repos
DEEPWIKI_MAX_CONCURRENCY=5

See .env.example for a complete template.
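To check a configuration before starting the agent, a sketch like the one below reports which required variables are missing and which optional integrations are fully configured. It uses the variable names from the template above, but the `check_environment` helper and the grouping into `REQUIRED` and `OPTIONAL` are assumptions. It also assumes the values are already in the process environment (for example via a `.env` loader), and it leaves out DeepWiki, whose key is only needed for private repos.

```python
import os

REQUIRED = ["ANTHROPIC_API_KEY"]
OPTIONAL = {
    "Slack": ["SLACK_BOT_TOKEN", "SLACK_APP_TOKEN"],
    "Notion": ["NOTION_API_KEY"],
    "GitHub": ["GITHUB_TOKEN"],
    "Zotero": ["ZOTERO_API_KEY", "ZOTERO_USER_ID"],
}


def check_environment(env=None):
    """Return (missing required variables, optional integrations with all variables set)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    enabled = [
        service
        for service, names in OPTIONAL.items()
        if all(env.get(name) for name in names)
    ]
    return missing, enabled


# Example with an explicit mapping instead of the real environment.
example_env = {
    "ANTHROPIC_API_KEY": "your_api_key",
    "GITHUB_TOKEN": "your_token",
    "SLACK_BOT_TOKEN": "your_bot_token",  # SLACK_APP_TOKEN is absent
}
missing, enabled = check_environment(example_env)
print(missing, enabled)  # prints [] ['GitHub']
```

Slack is reported as disabled in the example because only one of its two tokens is set.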
- Slack & Zotero Setup Guide - Detailed setup instructions for Slack bot and Zotero integration
- Author Filter Guide - How to prioritize talks by specific authors
- ReAct Demo Tasks - Example tasks for the ReAct agent
Run the test suite:
pytest tests/ -v
Or run individual tests:
python tests/test_agent.py
python tests/test_react_mock.py

Contributions are welcome! This project is actively being developed as part of PhD research. Feel free to:
- Report bugs
- Suggest new features
- Submit pull requests
- Share research use cases
- Python 3.8+: Core language
- Claude AI (Anthropic): Advanced AI reasoning
- Async/Await: Efficient concurrent operations
- BeautifulSoup4: Web scraping
- ArXiv API: Academic paper access
- Slack SDK: Team collaboration
- Pyzotero: Reference management
- MCP SDK: Model Context Protocol support
- ChromaDB: Vector database for RAG
- Sentence Transformers: Semantic embeddings
- PyPDF: PDF text extraction
This project is developed for academic research purposes. Please cite if used in your research work.
Developed as part of PhD research to enhance academic productivity through AI automation.
Active Development - New features and integrations are regularly added based on research needs.
- ✓ Oct 11, 2025: Conference Schedule Planner with RAG, thesis integration, exclusion filtering, and author-based prioritization
- ✓ DeepWiki integration for codebase indexing
- ✓ MCP integration for enhanced AI capabilities
- ✓ Slack bot for team collaboration
- ✓ Zotero reference management
- ✓ Meeting agenda automation
- ✓ Multi-source paper search
- 🎯 Multi-conference support (ISMB, NeurIPS, etc.)
- Google Calendar integration for conference schedules
- Research progress visualization
- Citation network analysis
- Automated literature review generation
Built with ❤️ for PhD researchers, by a PhD researcher
