researchpp is a staged, retrieval-grounded research pipeline that generates markdown reports from a user query.
It decomposes a question into subquestions, retrieves web evidence, extracts structured notes, writes a synthesis, and runs a reviewer pass that can trigger one bounded retry.
Given a query, the app runs this pipeline:
- Plan: turn the query into a research goal, subquestions, and report outline.
- Search: retrieve results for each subquestion (in parallel).
- Extract: convert snippets into high-value structured evidence notes (in parallel).
- Write: generate a markdown report grounded in extracted notes.
- Review: evaluate report support quality and optionally trigger one retry path.
Outputs are persisted at each stage for inspection and debugging.
The workflow is orchestrated with LangGraph and a shared typed state object.
START -> planner -> searcher -> extractor -> writer -> reviewer, after which the reviewer routes to one of:
- approved -> END
- failed (max retries reached) -> END
- retry_search -> prepare_retry -> searcher
- retry_write -> prepare_retry -> writer
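The post-review branches above can be read as a small routing function. The sketch below is illustrative only: the name `route_after_review`, the state keys `review`, `retry_count`, and `max_retries`, and the rule that sends `needs_more_research` to a search retry are assumptions rather than the code in `workflow.py`. In LangGraph, a function like this is typically registered with `StateGraph.add_conditional_edges`.

```python
from typing import Literal

Route = Literal["approved", "failed", "retry_search", "retry_write"]


def route_after_review(state: dict) -> Route:
    """Hypothetical router mirroring the reviewer branches of the graph."""
    decision = state["review"]  # assumed: a dict shaped like ReviewDecision
    if decision["approved"]:
        return "approved"
    # Bounded retry: once the budget is spent, end with the latest report.
    if state["retry_count"] >= state["max_retries"]:
        return "failed"
    # Assumption: missing evidence -> search again; otherwise only rewrite.
    if decision["needs_more_research"]:
        return "retry_search"
    return "retry_write"
```

For example, a rejected review with `needs_more_research=True`, `retry_count=0`, and `max_retries=1` routes to `"retry_search"`; the same review with `retry_count=1` routes to `"failed"`.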
Core orchestration lives in `workflow.py`:
- `ResearchWorkflowState`: typed cross-node state contract.
- Node functions (`_planner_node`, `_searcher_node`, etc.) update partial state.
- Conditional routing after review determines retry behavior.
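A typed state contract with nodes that return only the keys they change might look like the following minimal sketch. The field names, the placeholder planner (which just echoes the query as its only subquestion), and the `_merge` helper are hypothetical. In LangGraph the runtime applies node updates itself; `_merge` only stands in for that step so the sketch runs without dependencies.

```python
from typing import Any, TypedDict


class ResearchWorkflowState(TypedDict, total=False):
    # Hypothetical fields; the real contract lives in workflow.py.
    query: str
    plan: dict[str, Any]
    search_results: dict[str, list[dict[str, str]]]
    notes: dict[str, list[dict[str, Any]]]
    report: str
    retry_count: int


def _planner_node(state: ResearchWorkflowState) -> ResearchWorkflowState:
    # Placeholder planner: returns a partial update, not a full copy of the state.
    return {"plan": {"research_goal": state["query"], "subquestions": [state["query"]]}}


def _merge(state: ResearchWorkflowState, update: ResearchWorkflowState) -> ResearchWorkflowState:
    # Stand-in for LangGraph's default behavior: returned keys overwrite existing ones.
    return {**state, **update}


state: ResearchWorkflowState = {"query": "How do heat pumps work?", "retry_count": 0}
state = _merge(state, _planner_node(state))
```

Keeping node outputs partial means each stage only declares what it produces, and untouched keys such as `retry_count` carry through unchanged.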
- Graph topology is unchanged: `planner -> searcher -> extractor -> writer -> reviewer`.
- Parallelism is internal to stage implementations, not graph nodes:
  - `searcher.py`: subquestion searches run concurrently via `ThreadPoolExecutor`.
  - `extractor.py`: per-subquestion evidence extraction runs concurrently via `ThreadPoolExecutor`.
- Why this is safe:
  - each subquestion is independent in search and extraction
  - these calls are I/O-bound (web + model API latency), so threads reduce wall-clock time
- Failure isolation is preserved:
  - search failure for one subquestion -> `[]` for that subquestion
  - extraction failure for one subquestion -> `[]` for that subquestion
- Output schemas and artifact formats are unchanged.
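The fan-out and failure-isolation rules above can be sketched with the standard library alone. This is a minimal sketch, not the code in `searcher.py`: `search_all`, the injected `search_one` callable, and the `max_workers=4` default are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SearchFn = Callable[[str], list[dict]]


def search_all(
    subquestions: list[str], search_one: SearchFn, max_workers: int = 4
) -> dict[str, list[dict]]:
    """Run one search per subquestion concurrently; a failure yields [] for that one only."""

    def safe_search(subquestion: str) -> list[dict]:
        try:
            return search_one(subquestion)
        except Exception:
            # Failure isolation: this subquestion gets no results, the others proceed.
            return []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with subquestions.
        results = list(pool.map(safe_search, subquestions))
    return dict(zip(subquestions, results))
```

Because each call spends most of its time waiting on the network, threads overlap that waiting; the same wrapper shape applies to per-subquestion extraction.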
researchpp/
├── main.py # Typer CLI entrypoint
├── workflow.py # LangGraph orchestration + routing
├── schemas.py # Pydantic contracts between stages
├── planner.py # Query -> ResearchPlan
├── searcher.py # Tavily retrieval + normalization
├── extractor.py # Evidence extraction + filtering/dedup
├── writer.py # Report synthesis + deterministic sources section
├── reviewer.py # Structured quality review + retry signals
├── research # Bash launcher (`uv run python main.py`)
└── outputs/ # Persisted artifacts (json + markdown)
Defined in `schemas.py`:
- `ResearchPlan`: `research_goal`, `subquestions[]`, `report_outline[]`
- `SearchResult`: `title`, `url`, `snippet`
- `EvidenceNote`: `subquestion`, `source_title`, `source_url`, `key_point`, `evidence`, `relevance_reason`, optional `confidence_score` in `[0.0, 1.0]`
- `ReviewDecision`: `approved`, `needs_more_research`, `overall_assessment`, `support_gaps[]`, `revision_instructions[]`, `weak_sections[]`
LLM stages use structured outputs to enforce these contracts.
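The project defines these contracts with Pydantic. To show their shape without third-party dependencies, the sketch below swaps in stdlib dataclasses for two of them and enforces the documented `confidence_score` range by hand. It is a stand-in, not the contents of `schemas.py`; in Pydantic the same range check could be expressed with `Field(ge=0.0, le=1.0)`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    snippet: str


@dataclass(frozen=True)
class EvidenceNote:
    subquestion: str
    source_title: str
    source_url: str
    key_point: str
    evidence: str
    relevance_reason: str
    confidence_score: Optional[float] = None  # optional; must lie in [0.0, 1.0]

    def __post_init__(self) -> None:
        # Hand-written version of the documented range constraint.
        if self.confidence_score is not None and not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be in [0.0, 1.0]")
```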
From `pyproject.toml`:
- Python `>=3.13`
- `langgraph`
- `langchain` / `langchain-openai`
- `langchain-tavily` / `tavily-python`
- `rich`
- `pyfiglet`
- `typer`
- `python-dotenv`
From the project root:
cd researchpp
uv sync
Create `.env` with the required keys:
OPENAI_API_KEY=
TAVILY_API_KEY=
LANGSMITH_API_KEY=
LANGSMITH_TRACING=
LANGSMITH_PROJECT=
`main.py` calls `load_dotenv()`, so these are read automatically at runtime.
Use the bash wrapper:
./research "Your research question here"
Or call the CLI directly:
uv run python main.py "Your research question here"
Arguments and options:
- `query` (positional): user research query
- `--max-results INTEGER`: top results per subquestion (min 1, default 5)
- `--max-notes INTEGER`: max extracted notes per subquestion (min 1, default 4)
- `--report-title TEXT`: optional report title
- `--report-output TEXT`: optional custom markdown output path
- `--max-retries INTEGER`: reviewer-triggered retry passes (0 or 1, default 1)
Planner (`planner.py`):
- Uses `ChatOpenAI(model="gpt-4o-mini")`.
- Produces a validated `ResearchPlan`.
Searcher (`searcher.py`):
- Uses Tavily (`langchain_tavily.TavilySearch`).
- Runs all subquestion searches concurrently with `ThreadPoolExecutor`.
- Normalizes the response shape into `SearchResult`.
- If one subquestion search fails, that subquestion gets an empty list so the pipeline continues.
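Normalizing raw search output into `SearchResult` rows could look like this minimal sketch. It assumes the raw response is either a dict with a `results` list or a bare list of result dicts carrying `title`, `url`, and `content` keys; treat that shape, the `normalize_results` name, and the fallback to a `snippet` key as assumptions to check against the actual `TavilySearch` output.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    snippet: str


def normalize_results(raw: Any, max_results: int = 5) -> list[SearchResult]:
    # Accept either {"results": [...]} or a bare list of result dicts (assumed shapes).
    if isinstance(raw, dict):
        items = raw.get("results", [])
    elif isinstance(raw, list):
        items = raw
    else:
        items = []
    normalized: list[SearchResult] = []
    for item in items:
        if not isinstance(item, dict) or not item.get("url"):
            continue  # skip malformed entries instead of failing the whole stage
        normalized.append(
            SearchResult(
                title=str(item.get("title", "")).strip(),
                url=str(item["url"]).strip(),
                # Assumed: snippet text lives under "content", with "snippet" as fallback.
                snippet=str(item.get("content") or item.get("snippet") or "").strip(),
            )
        )
    return normalized[:max_results]
```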
Extractor (`extractor.py`):
- Structured extraction into `EvidenceNoteList`.
- Runs per-subquestion extraction concurrently with `ThreadPoolExecutor`.
- Applies deterministic post-filters for signal quality:
  - rejects short/vague/generic notes
  - prefers quantitative/comparative/mechanistic evidence
- Deduplicates notes by `(source_url, key_point)`.
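The deterministic post-filter and dedup steps can be sketched as plain functions over note dicts. The length threshold, the generic-phrase list, and the digit-based "quantitative" test are hypothetical heuristics chosen for illustration, and this sketch covers only the quantitative preference, not comparative or mechanistic evidence. Normalizing `key_point` case and whitespace before dedup is also an assumption.

```python
import re
from typing import Any

GENERIC_PHRASES = ("it is important", "plays a key role", "various factors")  # hypothetical
MIN_KEY_POINT_CHARS = 40  # hypothetical threshold for "too short"


def _is_specific(note: dict[str, Any]) -> bool:
    text = note["key_point"].strip().lower()
    if len(text) < MIN_KEY_POINT_CHARS:
        return False  # too short to carry real evidence
    return not any(phrase in text for phrase in GENERIC_PHRASES)


def _is_quantitative(note: dict[str, Any]) -> bool:
    # Crude proxy: any digit in the evidence (numbers, percentages, years).
    return bool(re.search(r"\d", note["evidence"]))


def filter_and_dedupe(notes: list[dict[str, Any]], max_notes: int = 4) -> list[dict[str, Any]]:
    seen: set[tuple[str, str]] = set()
    kept: list[dict[str, Any]] = []
    for note in notes:
        if not _is_specific(note):
            continue
        # Assumed normalization of key_point before comparing pairs.
        key = (note["source_url"], note["key_point"].strip().lower())
        if key in seen:
            continue  # duplicate (source_url, key_point) pair
        seen.add(key)
        kept.append(note)
    # Prefer quantitative notes; list.sort is stable, so ties keep their original order.
    kept.sort(key=lambda n: not _is_quantitative(n))
    return kept[:max_notes]
```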
Writer (`writer.py`):
- Synthesizes markdown from plan + notes only.
- Includes strict grounding instructions in the prompt.
- On retry passes, consumes reviewer guidance (`revision_instructions`, `weak_sections`).
- Strips any model-generated `## Sources` section and appends a deterministic source list built from the notes.
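Stripping a model-written `## Sources` section and appending one built from the notes can be done with a single regex plus an ordered dedup. This is a minimal sketch: `finalize_report` is a hypothetical name, and the link format `- [title](url)` and "first-seen title wins" rule are assumptions about how the deterministic list is rendered.

```python
import re
from typing import Any

# A "## Sources" heading line and everything after it, up to the next level-2 heading or the end.
_SOURCES_SECTION = re.compile(r"^## Sources[ \t]*$.*?(?=^## |\Z)", re.MULTILINE | re.DOTALL)


def finalize_report(markdown: str, notes: list[dict[str, Any]]) -> str:
    body = _SOURCES_SECTION.sub("", markdown).rstrip()
    # Deterministic source list: unique URLs in first-seen order, first title wins.
    sources: dict[str, str] = {}
    for note in notes:
        sources.setdefault(note["source_url"], note["source_title"])
    lines = [f"- [{title}]({url})" for url, title in sources.items()]
    return body + "\n\n## Sources\n\n" + "\n".join(lines) + "\n"
```

Because the list is derived only from extracted notes, the model cannot introduce a citation that never passed through search and extraction.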
Reviewer (`reviewer.py`):
- Structured quality gate (`ReviewDecision`).
- Evaluates report support against extracted notes only.
- Signals whether to retry search or rewrite.
Artifacts are written to `outputs/`:
- `search_results.json`: grouped normalized search results.
- `notes.json`: grouped extracted evidence notes.
- `revised_report.md`: latest report artifact (default file name for every run).
- `review.json`: latest reviewer decision and retry count.
- optional custom markdown path via `--report-output`.
- Planner, writer, and reviewer raise `RuntimeError` on stage failure.
- Search and extraction are fault-tolerant per subquestion and continue with empty lists on failures.
- Retry loop is bounded:
  - reviewer approves -> workflow ends
  - reviewer requests retry and retry budget remains -> one retry path executes
  - retry budget exhausted -> workflow ends with latest report
- The system is intentionally modular: each stage has one clear responsibility.
- Structured schemas reduce free-form LLM drift between stages.
- Persisted intermediate artifacts make outputs explainable and debuggable.
- Stages run in sequence, but search and extraction fan out subquestion work in parallel for latency reduction.