diff --git a/container/skills/README.md b/container/skills/README.md
index 2b58a20..b6b475d 100644
--- a/container/skills/README.md
+++ b/container/skills/README.md
@@ -13,6 +13,15 @@ Anything the WhatsApp / WeCom / Discord / local-web agent should use belongs **h
 | `sds-gel-review/` | SDS gel image review |
 | `query-*` | Database / API usage (Ensembl, UniProt, KEGG, …) |
 | `blast-search/`, `pubmed-search/`, `sequence-analysis/` | Literature & sequence workflows |
+| `bio-manuscript-*` | Community-contributed manuscript planning pipeline for idea screening, figure planning, manuscript drafting, refinement, and implementation blueprints |
+| `bio-manuscript-common/` | Shared templates and helper scripts used by the manuscript pipeline skill family |
+
+## Community Skills
+
+Some runtime skills may be integrated from BioClaw community contributors when they prove useful in real workflows. The manuscript pipeline skill family currently staged here is integrated as a community-contributed workflow.
+
+Contributor reference:
+- Yuhong Dong, Westlake University PhD candidate, BioClaw community contributor
 
 ## Developer-only skills
 
diff --git a/container/skills/bio-analysis-system/SKILL.md b/container/skills/bio-analysis-system/SKILL.md
new file mode 100644
index 0000000..39954ba
--- /dev/null
+++ b/container/skills/bio-analysis-system/SKILL.md
@@ -0,0 +1,161 @@
+# bio-analysis-system
+
+**Step 5: Analysis system design (分析方法体系构建)**
+
+Build the analysis layer for the manuscript by identifying which analyses, tools, and biological validations should support each figure and each task.
+
+## Purpose
+
+1. Extract analysis patterns from related work
+2. Borrow useful analyses from adjacent domains when needed
+3. Map analyses to BioClaw-compatible tools or fallback software
+4. Explain why each analysis is included and what biological claim it supports
+5. Connect analyses to figure panels
+
+## Input Format
+
+```text
+topic: [research topic]
+paper_count: [number of related papers]
+task_system: [task system]
+metric_system: [metric system]
+dataset_catalog: [dataset catalog]
+```
+
+## Workflow
+
+### Step 5.1: Extract analyses from existing work
+
+If enough related papers exist, inspect their figures and extract:
+
+- panel type
+- analysis method
+- software / package
+- important parameters
+- the scientific or biological conclusion the panel supports
+
+### Step 5.2: Borrow from adjacent fields
+
+If the field has little prior work, adapt common analyses from nearby areas such as:
+
+- clustering
+- marker visualization
+- latent embedding visualization
+- pathway enrichment
+- cell-cell communication
+- spatial statistics
+- gene regulatory network (GRN) analysis
+
+### Step 5.3: Categorize analyses
+
+Use three broad groups:
+
+- **Quantitative analyses**
+  - clustering
+  - metric computation
+  - statistical tests
+  - baseline comparisons
+- **Qualitative analyses**
+  - spatial visualization
+  - feature / violin plots
+  - UMAP / t-SNE
+  - before / after alignment comparisons
+  - heatmaps
+- **Biological analyses**
+  - cell annotation
+  - marker genes
+  - pathway enrichment
+  - GRN
+  - ligand-receptor communication
+  - spatial statistics
+  - trajectory analysis
+
+### Step 5.4: Map to BioClaw or fallback tools
+
+Whenever possible, map analysis needs to BioClaw-compatible skills or established tools.
+
+Examples:
+
+- clustering -> Scanpy / Leiden
+- annotation -> CellTypist / SingleR
+- marker plots -> Scanpy
+- enrichment -> gseapy
+- spatial statistics -> squidpy
+- GRN -> pySCENIC
+- communication -> CellChat-like workflow
+
+### Step 5.5: Standardize analysis descriptions
+
+For each analysis, define:
+
+- category
+- purpose
+- biological claim supported
+- preferred tool
+- fallback tool
+- key function
+- recommended parameters
+- inputs / outputs
+- mapped task
+- mapped figure / panel
+
+## Output Format
+
+```markdown
+# Analysis System
+
+## Analysis Sources
+- Extracted from related papers:
+- Borrowed from adjacent domains:
+
+## Quantitative Analyses
+
+### Clustering
+- Category:
+- Purpose:
+- Biological claim supported:
+- Preferred tool:
+- Fallback tool:
+- Key function:
+- Recommended parameters:
+- Inputs / outputs:
+- Relevant tasks:
+- Figure mapping:
+
+### Metric computation
+- Category:
+- Purpose:
+- Preferred tools:
+- Relevant tasks:
+- Figure mapping:
+
+## Qualitative Analyses
+- spatial plot
+- marker / feature plot
+- latent embedding plot
+- heatmap
+- before / after alignment visualization
+
+## Biological Analyses
+- annotation
+- marker recovery
+- pathway enrichment
+- GRN
+- communication
+- trajectory
+
+## Next Step
+- Use the analysis system to design figures in Step 6
+```
+
+## Usage
+
+```bash
+/bio-analysis-system "spatial multi-omics integration | paper_count: 5 | task_system: [...] | metric_system: [...] | dataset_catalog: [...]"
+```
+
+## Notes
+
+1. Prefer analyses that directly support paper claims.
+2. Make the biological readouts visible early; they should not appear only at the very end.
+3. Map each major analysis to a concrete figure panel.
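As a minimal sketch of how Steps 5.4 and 5.5 of bio-analysis-system could be captured in code, the Python below stores each standardized analysis description as a record and lists the analyses that are not yet tied to a figure panel (Note 3). The names `TOOL_MAP`, `AnalysisSpec`, `build_spec` and `analyses_without_panel` are hypothetical, the record holds only a subset of the Step 5.5 fields, and reading "CellTypist / SingleR" as preferred / fallback is an assumption rather than something the skill specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup built from the Step 5.4 examples: (preferred, fallback).
# Splitting "CellTypist / SingleR" into preferred / fallback is an assumption.
TOOL_MAP: Dict[str, Tuple[str, Optional[str]]] = {
    "clustering": ("Scanpy / Leiden", None),
    "annotation": ("CellTypist", "SingleR"),
    "marker plots": ("Scanpy", None),
    "enrichment": ("gseapy", None),
    "spatial statistics": ("squidpy", None),
    "GRN": ("pySCENIC", None),
    "communication": ("CellChat-like workflow", None),
}


@dataclass
class AnalysisSpec:
    """A subset of the Step 5.5 fields for one analysis (names are illustrative)."""
    name: str
    category: str  # "quantitative", "qualitative", or "biological"
    purpose: str
    claim: str  # biological claim supported
    preferred_tool: str
    fallback_tool: Optional[str] = None
    tasks: List[str] = field(default_factory=list)
    panels: List[str] = field(default_factory=list)


def build_spec(name: str, category: str, purpose: str, claim: str,
               tasks: List[str], panels: List[str]) -> AnalysisSpec:
    """Fill the tool fields from TOOL_MAP; raises KeyError for unmapped analyses."""
    preferred, fallback = TOOL_MAP[name]
    return AnalysisSpec(name, category, purpose, claim, preferred, fallback,
                        list(tasks), list(panels))


def analyses_without_panel(specs: List[AnalysisSpec]) -> List[str]:
    """Names of analyses that are not yet tied to any figure panel."""
    return [s.name for s in specs if not s.panels]


specs = [
    build_spec("clustering", "quantitative", "recover spatial domains",
               "domains match annotated tissue regions", ["Task 1"], ["Fig. 2b"]),
    build_spec("enrichment", "biological", "interpret domain markers",
               "domain markers map to known pathways", ["Task 1"], []),
]
print(analyses_without_panel(specs))  # ['enrichment']
```

Raising `KeyError` for an analysis missing from the map is a deliberate choice in this sketch: it surfaces analyses that still need a BioClaw-compatible or fallback tool before figure planning in Step 6.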
diff --git a/container/skills/bio-dataset-search/SKILL.md b/container/skills/bio-dataset-search/SKILL.md
new file mode 100644
index 0000000..1b9ec2f
--- /dev/null
+++ b/container/skills/bio-dataset-search/SKILL.md
@@ -0,0 +1,135 @@
+# bio-dataset-search
+
+**Step 3: Dataset search and task matching (数据集搜索与匹配)**
+
+Find suitable datasets for each task and map datasets to the task system defined earlier in the manuscript pipeline.
+
+## Purpose
+
+1. Extract datasets from related papers when possible
+2. Search public repositories directly when needed
+3. Normalize dataset metadata into a common structure
+4. Match datasets to tasks in a defensible way
+
+## Input Format
+
+```text
+topic: [research topic]
+task_system: [task system from Step 2]
+paper_count: [number of related papers]
+existing_papers: [optional list of related papers]
+```
+
+## Workflow
+
+### Step 3.1: Extract datasets from existing work
+
+If `paper_count >= 5`, start from the strongest existing papers.
+
+Read Methods / Data Availability sections and extract:
+
+- dataset name
+- data source
+- platform
+- modality
+- sample scale
+- download path
+- annotation availability
+
+### Step 3.2: Search datasets directly
+
+If there is not enough prior work, search repositories such as:
+
+- GEO
+- ArrayExpress
+- project-specific public portals
+
+Use keyword sets built from:
+
+- topic
+- modality
+- tissue / disease
+- benchmark intent
+
+### Step 3.3: Normalize dataset metadata
+
+For each dataset, record:
+
+- source
+- platform
+- species
+- tissue / disease
+- sample size
+- feature count
+- modalities
+- annotation quality
+- histology / region metadata
+- format
+- preprocessing needs
+- recommended task fit
+
+### Step 3.4: Match datasets to tasks
+
+A good match should satisfy:
+
+1. Every major task has at least one viable dataset
+2. Dataset structure matches the task's technical assumptions
+3. Download remains feasible
+4. Metadata quality is sufficient for evaluation
+5. Ideally, each important task also has at least one backup dataset
+
+## Output Format
+
+```markdown
+# Dataset Catalog
+
+## Data Sources
+- Extracted from related papers:
+- Direct repository search:
+- Borrowed from adjacent domains:
+
+## Dataset Entries
+
+### Dataset 1: [name]
+- Source:
+- Platform:
+- Species:
+- Tissue / disease:
+- Modalities:
+- Sample scale:
+- Annotation quality:
+- Download URL:
+- Format:
+- Recommended tasks:
+- Why it fits:
+
+## Dataset-Task Mapping
+| Task | Recommended dataset | Why it fits | Notes |
+|------|---------------------|-------------|-------|
+| ... | ... | ... | ... |
+
+## Acquisition Notes
+- GEO download hints
+- Public portal download hints
+
+## Preprocessing Recommendations
+| Dataset | Preprocessing needs | Suggested skill / tool |
+|---------|---------------------|------------------------|
+| ... | ... | ... |
+
+## Next Step
+- Build the metric system in Step 4
+```
+
+## Usage
+
+```bash
+/bio-dataset-search "spatial multi-omics integration | paper_count: 5 | task_system: [task system from Step 2]"
+```
+
+## Notes
+
+1. Prefer datasets already used in related work when possible.
+2. Verify links before committing them to the benchmark plan.
+3. Capture QC and annotation metadata whenever available.
+4. Match datasets to tasks based on actual experimental needs, not just popularity.
diff --git a/container/skills/bio-figure-design/SKILL.md b/container/skills/bio-figure-design/SKILL.md
new file mode 100644
index 0000000..496552c
--- /dev/null
+++ b/container/skills/bio-figure-design/SKILL.md
@@ -0,0 +1,125 @@
+# bio-figure-design
+
+**Step 6: Figure design (Figure 详细设计)**
+
+Design the manuscript figures panel by panel, including the figure logic, panel content, and caption intent.
+
+## Purpose
+
+1. Design Figure 1 as the method / framework overview
+2. Design Figures 2-N as task-driven application figures
+3. Plan supplementary figures
+4. Draft figure captions
+5. Keep figure logic synchronized with manuscript claims
+
+## Input Format
+
+```text
+topic: [research topic]
+task_system: [task system]
+dataset_catalog: [dataset catalog]
+metric_system: [metric system]
+analysis_system: [analysis system]
+target_journal: [optional, default nat-communications]
+```
+
+## Workflow
+
+### Step 6.1: Design Figure 1
+
+Figure 1 should explain the overall method and paper framing:
+
+- Panel a: method / model overview
+- Panel b: data or modality overview
+- Panel c: task overview
+- Panel d: metric overview
+- Panel e: analysis overview
+
+The goal is to make the full manuscript logic visible in one figure.
+
+### Step 6.2: Design Figures 2-N
+
+Each application figure should be task-first:
+
+- one major task per figure
+- panel a: data flow / experimental setup
+- panels b-d: quantitative evaluation
+- panel e or later: qualitative / biological validation
+
+This keeps the paper organized around claims rather than around plots.
+
+### Step 6.3: Supplementary figures
+
+Use supplementary figures for:
+
+- ablations
+- robustness checks
+- extra markers
+- extended datasets
+- alternative parameter settings
+
+### Step 6.4: Caption planning
+
+Every figure should have a caption plan that explains:
+
+- what each panel shows
+- what claim it supports
+- what dataset it uses
+- what metric or biological conclusion it demonstrates
+
+## Output Format
+
+```markdown
+# Figure Designs
+
+## Figure 1: Framework Overview
+- Panel a:
+- Panel b:
+- Panel c:
+- Panel d:
+- Panel e:
+- Caption intent:
+
+## Figure 2: [Task 1]
+- Panel a:
+- Panel b:
+- Panel c:
+- Panel d:
+- Panel e:
+- Caption intent:
+
+## Figure 3: [Task 2]
+...
+
+## Supplementary Figures
+- Supplementary Figure 1:
+- Supplementary Figure 2:
+
+## Design Notes
+- visual consistency
+- panel ordering logic
+- expected take-home message per figure
+
+## Next Step
+- Use the figure plan to draft manuscript text in Step 7
+```
+
+## Figure Design Principles
+
+1. One main claim per figure
+2. Quantitative evidence should appear before broad interpretation
+3. Biological validation should be visible, not hidden
+4. Reviewer-facing clarity matters more than decorative complexity
+5. Figure order should match the manuscript story
+
+## Usage
+
+```bash
+/bio-figure-design "topic: spatial multi-omics integration | task_system: [...] | dataset_catalog: [...] | metric_system: [...] | analysis_system: [...] | target_journal: nat-communications"
+```
+
+## Notes
+
+1. Keep Figure 1 conceptual and clean.
+2. For application figures, tie every panel to a concrete task and metric.
+3. Do not overload a figure if a supplementary figure can carry the extra material.
diff --git a/container/skills/bio-human-feedback/SKILL.md b/container/skills/bio-human-feedback/SKILL.md
new file mode 100644
index 0000000..a53aba3
--- /dev/null
+++ b/container/skills/bio-human-feedback/SKILL.md
@@ -0,0 +1,80 @@
+# bio-human-feedback
+
+**Phase 2.6: Human review checkpoint (人类反馈验证)**
+
+Present the refined proposal to a human reviewer, collect feedback, and decide whether to continue or loop back for revision.
+
+## Purpose
+
+1. Present the current proposal in a concise reviewable form
+2. Wait for explicit human feedback
+3. Record approval or requested changes
+4. Route the workflow back to the correct phase if needed
+
+## Inputs
+
+- `FINAL_PROPOSAL.md`
+- current round number
+
+## Workflow
+
+1. Present the proposal summary
+2. Wait for approval or feedback
+3. Classify the feedback severity
+4. Record the response and decide the next phase
+
+## Human Review Summary Format
+
+```markdown
+# Proposal Review - Round X
+
+## Innovation Summary
+- ...
+
+## Figure Overview
+- Figure 1:
+- Figure 2:
+
+## Experimental Plan
+| Task | Dataset | Metric | Baseline |
+|------|---------|--------|----------|
+| ... | ... | ... | ... |
+
+## Key Decisions Requiring Human Approval
+1. ...
+2. ...
+
+## Reviewer Summary
+| Reviewer | Score / status | Main advice |
+|----------|----------------|-------------|
+| Editor | ... | ... |
+| Computational | ... | ... |
+| Biological | ... | ... |
+
+## Human Decision
+- Approve and continue
+- Request revisions
+```
+
+## Feedback Severity
+
+- **Critical**: return to early planning
+- **Major**: return to design / manuscript refinement
+- **Minor**: revise locally and continue
+
+## Files To Record
+
+- `refine-logs/human-feedback/feedback-round-X.md`
+- `refine-logs/HUMAN_APPROVAL.md`
+
+## Usage
+
+```bash
+/bio-human-feedback --round 2 --proposal refine-logs/FINAL_PROPOSAL.md
+```
+
+## Notes
+
+1. Do not continue automatically without explicit human approval.
+2. Record all human feedback in the workspace.
+3. Keep the requested decisions concise and actionable.
diff --git a/container/skills/bio-innovation-check/SKILL.md b/container/skills/bio-innovation-check/SKILL.md
new file mode 100644
index 0000000..09a9e37
--- /dev/null
+++ b/container/skills/bio-innovation-check/SKILL.md
@@ -0,0 +1,141 @@
+# bio-innovation-check
+
+**Step 1: Innovation assessment (创新性检测)**
+
+Estimate whether a research idea is sufficiently novel for a strong methods-style paper by expanding the topic and searching the literature.
+
+## Purpose
+
+1. Generate multiple topic variants and synonyms
+2. Search PubMed, bioRxiv, and arXiv q-bio
+3. Count and de-duplicate related papers
+4. Assign a novelty level
+5. Suggest how to sharpen or reposition the idea if needed
+
+## Input Format
+
+```text
+topic: [research topic]
+```
+
+## Workflow
+
+### Step 1.1: Topic expansion
+
+Use several types of expansions:
+
+1. Core term substitution
+2. Phrase re-ordering
+3. Parent / child concept expansion
+4. Adjacent-domain vocabulary borrowing
+5. Method keyword enrichment
+
+Example:
+
+- "spatial multi-omics integration"
+- "integration of spatial transcriptomics and proteomics"
+- "spatial multi-modal data fusion"
+
+Target output: 15-20 topic variants by default.
+
+### Step 1.2: Literature search
+
+Search these sources:
+
+- PubMed
+- bioRxiv
+- arXiv q-bio
+
+Suggested pattern:
+
+```python
+all_papers = []  # search() and deduplicate() stand in for the platform clients
+for variant in topic_variants:
+    results = search(variant, platforms=["PubMed", "bioRxiv", "arXiv"])
+    all_papers.extend(results)
+unique_papers = deduplicate(all_papers, threshold=0.8)
+```
+
+### Step 1.3: Novelty scoring
+
+Use a simple first-pass threshold:
+
+```python
+if paper_count <= 2:
+    level = "strong novelty / methods-journal candidate"
+elif paper_count <= 5:
+    level = "promising but needs sharpening"
+else:
+    level = "needs repositioning"
+```
+
+This is only a heuristic. Final judgment should still use human reasoning.
+
+### Step 1.4: Repositioning suggestions
+
+If the project is not yet strong enough, suggest improvements from one or more of these angles:
+
+1. Method angle
+2. Task angle
+3. Data / validation angle
+4. Analysis angle
+
+## Output Format
+
+```markdown
+# Innovation Assessment Report
+
+## Search Strategy
+- Number of variants:
+- Search sources:
+- Search date:
+
+## Topic Variants
+| No. | Variant |
+|-----|---------|
+| 1 | ... |
+
+## Search Results Summary
+| Variant | PubMed | bioRxiv | arXiv | Total |
+|---------|--------|---------|-------|-------|
+| ... | ... | ... | ... | ... |
+
+## De-duplicated Counts
+- Total related studies:
+- Published papers:
+- Preprints:
+
+## Novelty Decision
+- Level:
+- Reason:
+
+## Representative Related Work
+1. [title]
+   - Source:
+   - Year:
+   - Main method:
+   - Overlap with the proposed idea:
+
+## Repositioning Suggestions
+1. Method:
+2. Task:
+3. Data / validation:
+
+## Next Step
+- If novelty is strong: continue to Step 2
+- If the idea needs sharpening: refine and continue
+- If it needs repositioning: redesign before proceeding
+```
+
+## Usage
+
+```bash
+/bio-innovation-check "spatial multi-omics integration"
+```
+
+## Notes
+
+1. Use timeouts because search latency varies by source.
+2. De-duplication matters: duplicate records inflate the paper count and understate novelty, while over-aggressive merging hides distinct work and overstates it.
+3. Overlap scoring still needs human judgment.
+4. Journal-specific novelty expectations can differ by field.
diff --git a/container/skills/bio-manuscript-common/README.md b/container/skills/bio-manuscript-common/README.md
new file mode 100644
index 0000000..c278d1c
--- /dev/null
+++ b/container/skills/bio-manuscript-common/README.md
@@ -0,0 +1,13 @@
+## Bio Manuscript Common
+
+Shared assets for the BioClaw integration of `bio-manuscript-forge`.
+
+Contents:
+- `templates/`: journal and manuscript planning templates
+- `scripts/`: helper scripts copied from the upstream repository
+
+Integration notes:
+- These assets were copied from `external/bio-manuscript-forge/bio-manuscript-forge/`.
+- Runtime skills live under `/home/node/.claude/skills/` inside the BioClaw container.
+- Keep skill-to-skill references within `container/skills/` and avoid `~/.openclaw/...` assumptions.
+- Treat this directory as shared support data for the manuscript pipeline skill family.
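The bio-innovation-check Step 1.2 pattern calls `deduplicate(all_papers, threshold=0.8)` without defining it. Below is a minimal sketch, assuming each paper is a dict with a `title` key and reading the 0.8 threshold as a title-similarity ratio from Python's `difflib.SequenceMatcher`; the skill does not say which similarity measure it intends, so treat that choice (and the helper names `_normalize` and `novelty_level`) as assumptions. The Step 1.3 thresholds are included so the count feeds straight into a first-pass novelty level.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List


def _normalize(title: str) -> str:
    """Lowercase and collapse punctuation/whitespace so formatting differences don't matter."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(papers: List[Dict], threshold: float = 0.8) -> List[Dict]:
    """Keep the first paper of every group whose normalized titles are >= threshold similar."""
    kept: List[Dict] = []
    kept_titles: List[str] = []
    for paper in papers:
        title = _normalize(paper["title"])
        if any(SequenceMatcher(None, title, seen).ratio() >= threshold for seen in kept_titles):
            continue  # near-duplicate of a paper already kept (e.g. preprint vs. published)
        kept.append(paper)
        kept_titles.append(title)
    return kept


def novelty_level(paper_count: int) -> str:
    """Step 1.3 first-pass heuristic; the final call still needs human judgment."""
    if paper_count <= 2:
        return "strong novelty / methods-journal candidate"
    if paper_count <= 5:
        return "promising but needs sharpening"
    return "needs repositioning"


papers = [
    {"title": "Spatial multi-omics integration with graph networks", "source": "PubMed"},
    {"title": "Spatial Multi-Omics Integration with Graph Networks.", "source": "bioRxiv"},
    {"title": "A benchmark of spatial transcriptomics deconvolution", "source": "arXiv"},
]
unique = deduplicate(papers)
print(len(unique), novelty_level(len(unique)))  # 2 strong novelty / methods-journal candidate
```

Because the first occurrence wins, input order decides which record survives; sorting published papers ahead of preprints before calling `deduplicate` is one way to keep the peer-reviewed version. The pairwise comparison is quadratic in the number of kept papers, which is acceptable for the few hundred hits a 15-20 variant search typically returns.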
diff --git a/container/skills/bio-manuscript-common/scripts/generate_figure_image.py b/container/skills/bio-manuscript-common/scripts/generate_figure_image.py
new file mode 100644
index 0000000..3072d11
--- /dev/null
+++ b/container/skills/bio-manuscript-common/scripts/generate_figure_image.py
@@ -0,0 +1,247 @@
+#!/usr/bin/env python3
+"""
+Figure illustration generator
+
+Calls the Gemini API to generate the illustrations for Figure 1 panels b, c, d and e.
+"""
+
+import os
+from typing import Optional, Dict, List
+
+
+class FigureImageGenerator:
+    """Figure illustration generator"""
+
+    def __init__(self, api_key: Optional[str] = None):
+        """
+        Initialize the generator.
+
+        Args:
+            api_key: Gemini API key (can also be read from the GEMINI_API_KEY environment variable)
+        """
+        self.api_key = api_key or os.environ.get("GEMINI_API_KEY")
+        if not self.api_key:
+            print("Warning: No Gemini API key provided. Image generation will be simulated.")
+
+    def generate_panel_image(self,
+                             panel_type: str,
+                             content: str,
+                             description: str,
+                             output_path: str) -> Dict:
+        """
+        Generate the illustration for one panel item.
+
+        Args:
+            panel_type: panel type (data/task/metric/analysis)
+            content: content name
+            description: short description
+            output_path: output file path
+
+        Returns:
+            Generation result
+        """
+        prompt = self._build_prompt(panel_type, content, description)
+
+        # Call the Gemini API (simplified)
+        result = self._call_gemini(prompt, output_path)
+
+        return result
+
+    def _build_prompt(self, panel_type: str, content: str, description: str) -> str:
+        """Build the image-generation prompt."""
+
+        panel_descriptions = {
+            "data": "Data type overview",
+            "task": "Task hierarchy overview",
+            "metric": "Evaluation metric overview",
+            "analysis": "Analysis method overview"
+        }
+
+        prompt = f"""
+Generate a concise scientific illustration for Figure 1 of a Nature Methods paper.
+
+Panel type: {panel_descriptions.get(panel_type, panel_type)}
+
+Content: {content}
+
+Short description: {description}
+
+Requirements:
+1. Style: Nature Methods figure style, clean and clear, with a professional color scheme
+2. Layout: title label + schematic + short descriptive text
+3. Colors: professional scientific palette; avoid overly saturated colors
+4. Size: suitable for a Figure 1 panel (about 60 mm × 30 mm per item)
+5. Format: vector graphics or high-resolution bitmap
+
+Please generate a boxed design in which each box contains:
+- the content name (e.g. "Proteomics")
+- a simple schematic
+- a one-sentence description
+"""
+        return prompt
+
+    def _call_gemini(self, prompt: str, output_path: str) -> Dict:
+        """Call the Gemini API."""
+
+        if not self.api_key:
+            # Simulated generation
+            return {
+                "success": False,
+                "message": "No API key provided. This is a simulation.",
+                "prompt": prompt,
+                "output_path": output_path,
+                "note": "Please provide GEMINI_API_KEY to enable actual image generation."
+            }
+
+        try:
+            # Actual Gemini API call (not implemented yet)
+            # import google.generativeai as genai
+            # genai.configure(api_key=self.api_key)
+            # model = genai.GenerativeModel('gemini-pro-vision')
+            # response = model.generate_content(prompt)
+            # ... save the image
+
+            return {
+                "success": True,
+                "prompt": prompt,
+                "output_path": output_path,
+                "message": "Image generated successfully (placeholder)"
+            }
+        except Exception as e:
+            return {
+                "success": False,
+                "error": str(e),
+                "prompt": prompt
+            }
+
+    def generate_figure1_panel_b(self, data_types: List[Dict]) -> Dict:
+        """
+        Generate Figure 1 panel b (data overview).
+
+        Args:
+            data_types: list of data types
+                [{"name": "Proteomics", "description": "Protein expression measurement"}, ...]
+
+        Returns:
+            Generation result
+        """
+        results = []
+
+        for data_type in data_types:
+            result = self.generate_panel_image(
+                panel_type="data",
+                content=data_type["name"],
+                description=data_type["description"],
+                output_path=f"figure1_panel_b_{data_type['name'].lower().replace(' ', '_')}.png"
+            )
+            results.append(result)
+
+        return {
+            "panel": "Figure 1 Panel b",
+            "results": results
+        }
+
+    def generate_figure1_panel_c(self, tasks: List[Dict]) -> Dict:
+        """
+        Generate Figure 1 panel c (task overview).
+
+        Args:
+            tasks: list of tasks
+                [{"name": "Vertical Integration", "description": "Multiple modalities from the same cells"}, ...]
+
+        Returns:
+            Generation result
+        """
+        results = []
+
+        for task in tasks:
+            result = self.generate_panel_image(
+                panel_type="task",
+                content=task["name"],
+                description=task["description"],
+                output_path=f"figure1_panel_c_{task['name'].lower().replace(' ', '_')}.png"
+            )
+            results.append(result)
+
+        return {
+            "panel": "Figure 1 Panel c",
+            "results": results
+        }
+
+    def generate_figure1_panel_d(self, metrics: List[Dict]) -> Dict:
+        """
+        Generate Figure 1 panel d (metric overview).
+
+        Args:
+            metrics: list of metrics
+                [{"name": "ARI", "description": "Clustering agreement"}, ...]
+
+        Returns:
+            Generation result
+        """
+        results = []
+
+        for metric in metrics:
+            result = self.generate_panel_image(
+                panel_type="metric",
+                content=metric["name"],
+                description=metric["description"],
+                output_path=f"figure1_panel_d_{metric['name'].lower().replace(' ', '_')}.png"
+            )
+            results.append(result)
+
+        return {
+            "panel": "Figure 1 Panel d",
+            "results": results
+        }
+
+    def generate_figure1_panel_e(self, analyses: List[Dict]) -> Dict:
+        """
+        Generate Figure 1 panel e (analysis overview).
+
+        Args:
+            analyses: list of analyses
+                [{"name": "Clustering", "description": "Leiden clustering"}, ...]
+
+        Returns:
+            Generation result
+        """
+        results = []
+
+        for analysis in analyses:
+            result = self.generate_panel_image(
+                panel_type="analysis",
+                content=analysis["name"],
+                description=analysis["description"],
+                output_path=f"figure1_panel_e_{analysis['name'].lower().replace(' ', '_')}.png"
+            )
+            results.append(result)
+
+        return {
+            "panel": "Figure 1 Panel e",
+            "results": results
+        }
+
+
+def main():
+    """Entry point."""
+    generator = FigureImageGenerator()
+
+    # Example: generate panel b (data overview)
+    data_types = [
+        {"name": "Proteomics", "description": "Protein expression measurements that inform on cell function"},
+        {"name": "Transcriptomics", "description": "Whole-transcriptome expression that reveals gene regulation"},
+        {"name": "Epigenomics", "description": "Epigenetic modifications that reflect chromatin state"},
+    ]
+
+    result = generator.generate_figure1_panel_b(data_types)
+
+    print(f"Panel: {result['panel']}")
+    print(f"Generated {len(result['results'])} items")
+
+    for r in result['results']:
+        print(f"  - {r.get('output_path', 'N/A')}: {r.get('message', r.get('error', 'Unknown'))}")
+
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/container/skills/bio-manuscript-common/scripts/integrate_omicsclaw.py b/container/skills/bio-manuscript-common/scripts/integrate_omicsclaw.py
new file mode 100644
index 0000000..afcf11e
--- /dev/null
+++ b/container/skills/bio-manuscript-common/scripts/integrate_omicsclaw.py
@@ -0,0 +1,194 @@
+#!/usr/bin/env python3
+"""
+OmicsClaw integration script
+
+Wraps OmicsClaw CLI calls for running the actual data analyses.
+"""
+
+import subprocess
+import tempfile
+from typing import Optional, Dict, List
+
+
+class OmicsClawRunner:
+    """Runs OmicsClaw skills through the omicsclaw CLI."""
+
+    def __init__(self, omicsclaw_path: str = "omicsclaw"):
+        """
+        Initialize the runner.
+
+        Args:
+            omicsclaw_path: path to the omicsclaw command
+        """
+        self.omicsclaw_path = omicsclaw_path
+
+    def run_skill(self,
+                  skill_name: str,
+                  input_file: str,
+                  output_dir: Optional[str] = None,
+                  params: Optional[Dict] = None) -> Dict:
+        """
+        Run an OmicsClaw skill.
+
+        Args:
+            skill_name: skill name
+            input_file: input file path
+            output_dir: output directory
+            params: parameter dictionary
+
+        Returns:
+            Run result
+        """
+        cmd = [self.omicsclaw_path, "run", skill_name, "--input", input_file]
+
+        if output_dir:
+            cmd.extend(["--output", output_dir])
+
+        if params:
+            for key, value in params.items():
+                cmd.extend([f"--{key}", str(value)])
+
+        # Run the command
+        result = subprocess.run(cmd, capture_output=True, text=True)
+
+        return {
+            "skill": skill_name,
+            "command": " ".join(cmd),
+            "returncode": result.returncode,
+            "stdout": result.stdout,
+            "stderr": result.stderr,
+            "success": result.returncode == 0
+        }
+
+    def preprocess(self, input_file: str, output_dir: str) -> Dict:
+        """
+        Data preprocessing.
+
+        Args:
+            input_file: input file
+            output_dir: output directory
+
+        Returns:
+            Run result
+        """
+        return self.run_skill(
+            skill_name="spatial-preprocess",
+            input_file=input_file,
+            output_dir=output_dir
+        )
+
+    def cluster(self,
+                input_file: str,
+                method: str = "leiden",
+                resolution: float = 0.5) -> Dict:
+        """
+        Clustering analysis.
+
+        Args:
+            input_file: input file
+            method: clustering method
+            resolution: resolution parameter
+
+        Returns:
+            Run result
+        """
+        return self.run_skill(
+            skill_name="spatial-domains",
+            input_file=input_file,
+            params={"method": method, "resolution": resolution}
+        )
+
+    def annotate(self, input_file: str, method: str = "markers") -> Dict:
+        """
+        Cell-type annotation.
+
+        Args:
+            input_file: input file
+            method: annotation method
+
+        Returns:
+            Run result
+        """
+        return self.run_skill(
+            skill_name="spatial-annotate",
+            input_file=input_file,
+            params={"method": method}
+        )
+
+    def find_markers(self, input_file: str, output_dir: str) -> Dict:
+        """
+        Marker gene identification.
+
+        Args:
+            input_file: input file
+            output_dir: output directory
+
+        Returns:
+            Run result
+        """
+        return self.run_skill(
+            skill_name="sc-markers",
+            input_file=input_file,
+            output_dir=output_dir
+        )
+
+    def spatial_statistics(self, input_file: str) -> Dict:
+        """
+        Spatial statistics analysis.
+
+        Args:
+            input_file: input file
+
+        Returns:
+            Run result
+        """
+        return self.run_skill(
+            skill_name="spatial-statistics",
+            input_file=input_file
+        )
+
+    def enrichment(self,
+                   gene_list: List[str],
+                   database: str = "KEGG_2021_Human") -> Dict:
+        """
+        Pathway enrichment analysis.
+
+        Args:
+            gene_list: list of genes
+            database: enrichment database
+
+        Returns:
+            Run result
+        """
+        # Write the gene list to a uniquely named temporary file (avoids clashes between runs)
+        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
+            f.write("\n".join(gene_list))
+            temp_file = f.name
+
+        return self.run_skill(
+            skill_name="spatial-enrichment",
+            input_file=temp_file,
+            params={"database": database}
+        )
+
+
+def main():
+    """Entry point."""
+    runner = OmicsClawRunner()
+
+    # Example: preprocessing
+    print("Example: running spatial-preprocess")
+    result = runner.preprocess(
+        input_file="data/example.h5ad",
+        output_dir="results/"
+    )
+
+    print(f"Command: {result['command']}")
+    print(f"Success: {result['success']}")
+
+    if not result['success']:
+        print(f"Error: {result['stderr']}")
+
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/container/skills/bio-manuscript-common/scripts/topic_synonym_transform.py b/container/skills/bio-manuscript-common/scripts/topic_synonym_transform.py
new file mode 100644
index 0000000..3744699
--- /dev/null
+++ b/container/skills/bio-manuscript-common/scripts/topic_synonym_transform.py
@@ -0,0 +1,172 @@
+#!/usr/bin/env python3
+"""
+Topic synonym-variant generator
+
+Generates multiple semantically equivalent phrasings of a research topic for literature search.
+"""
+
+import re
+from typing import List, Dict
+
+
+class TopicSynonymTransformer:
+    """Topic synonym transformer"""
+
+    # Domain term mapping: Chinese term -> English equivalents
+    DOMAIN_TERMS = {
+        "空间组学": ["spatial omics", "spatial transcriptomics", "spatial biology"],
+        "多组学": ["multi-omics", "multi-modal", "multimodal"],
+        "整合": ["integration", "fusion", "alignment", "mapping"],
+        "单细胞": ["single-cell", "scRNA-seq", "single cell"],
+        "转录组": ["transcriptomics", "RNA-seq", "gene expression"],
+        "蛋白质组": ["proteomics", "protein expression"],
+        "表观遗传": ["epigenomics", "epigenetic", "ATAC-seq", "ChIP-seq"],
+        "细胞类型": ["cell type", "cell identity", "cell annotation"],
+        "轨迹": ["trajectory", "pseudotime", "development"],
+        "通讯": ["communication", "interaction", "signaling"],
+        "空间": ["spatial"],
+    }
+
+    def __init__(self):
+        self.variants = []
+
+    def transform(self, topic: str) -> List[str]:
+        """
+        Generate synonym variants of a topic.
+
+        Args:
+            topic: original research topic
+
+        Returns:
+            List of variants
+        """
+        self.variants = []
+
+        # 1. Original topic
+        self.variants.append(topic)
+
+        # 2. Translate Chinese terms to English
+        en_topic = self._translate_to_english(topic)
+        if en_topic != topic:
+            self.variants.append(en_topic)
+
+        # 3. Term substitution
+        term_variants = self._replace_terms(en_topic)
+        self.variants.extend(term_variants)
+
+        # 4. Word-order changes
+        order_variants = self._reorder_words(en_topic)
+        self.variants.extend(order_variants)
+
+        # 5. Broader / narrower concept expansion
+        hierarchy_variants = self._expand_hierarchy(en_topic)
+        self.variants.extend(hierarchy_variants)
+
+        # De-duplicate while keeping the original order (set() would shuffle it)
+        self.variants = list(dict.fromkeys(self.variants))
+
+        return self.variants
+
+    def _translate_to_english(self, topic: str) -> str:
+        """Translate Chinese domain terms to English."""
+        # Simple term-by-term substitution; pad with spaces so adjacent terms stay separated
+        result = topic
+        for cn, en_list in self.DOMAIN_TERMS.items():
+            if cn in topic:
+                result = result.replace(cn, f" {en_list[0]} ")
+        return re.sub(r"\s+", " ", result).strip()
+
+    def _replace_terms(self, topic: str) -> List[str]:
+        """Substitute terms with synonyms from the same group."""
+        variants = []
+
+        for en_list in self.DOMAIN_TERMS.values():
+            for en in en_list:
+                # Only swap a term for a synonym from its own group
+                # (e.g. "integration" -> "fusion", never -> "proteomics")
+                for other_en in en_list:
+                    if other_en in topic.lower():
+                        new_topic = topic.lower().replace(other_en, en)
+                        if new_topic != topic.lower():
+                            variants.append(new_topic)
+
+        return variants
+
+    def _reorder_words(self, topic: str) -> List[str]:
+        """Reorder words."""
+        variants = []
+        words = topic.lower().split()
+
+        if len(words) >= 3:
+            # Swap the first two words
+            variant1 = ' '.join([words[1], words[0]] + words[2:])
+            variants.append(variant1)
+
+        # "integration of X and Y" → "X and Y integration"
+        if "integration" in words:
+            idx = words.index("integration")
+            if idx == 0 and words[1:2] == ["of"]:
+                variant2 = ' '.join(words[2:] + ["integration"])
+                variants.append(variant2)
+
+        return variants
+
+    def _expand_hierarchy(self, topic: str) -> List[str]:
+        """Broader / narrower concept expansion."""
+        variants = []
+
+        # Broader terms
+        if "spatial" in topic.lower():
+            # Broader: multi-modal
+            variants.append(topic.lower().replace("spatial", "multi-modal"))
+            variants.append(topic.lower().replace("spatial", "single-cell"))
+
+        if "multi-omics" in topic.lower():
+            # Narrower: specific modalities
+            variants.append(topic.lower().replace("multi-omics", "transcriptomics proteomics"))
+            variants.append(topic.lower().replace("multi-omics", "RNA protein"))
+
+        return variants
+
+    def generate_search_queries(self, topic: str, max_variants: int = 15) -> List[Dict]:
+        """
+        Generate search queries.
+
+        Returns:
+            List of queries, each containing a topic variant and the platforms to search
+        """
+        variants = self.transform(topic)[:max_variants]
+
+        queries = []
+        for i, variant in enumerate(variants):
+            queries.append({
+                "id": i + 1,
+                "query": variant,
+                "platforms": ["PubMed", "bioRxiv", "arXiv q-bio"]
+            })
+
+        return queries
+
+
+def main():
+    """Entry point."""
+    transformer = TopicSynonymTransformer()
+
+    # Example
+    topic = "空间多组学整合"  # "spatial multi-omics integration"
+    variants = transformer.transform(topic)
+
+    print(f"Original topic: {topic}")
+    print(f"\nGenerated {len(variants)} variants:")
+    for i, v in enumerate(variants, 1):
+        print(f"  {i}. {v}")
+
+    print("\nSearch queries:")
+    queries = transformer.generate_search_queries(topic)
+    for q in queries:
+        print(f"  Query {q['id']}: {q['query']}")
+        print(f"    Platforms: {', '.join(q['platforms'])}")
+
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/container/skills/bio-manuscript-common/templates/nature-methods/figure_template.md b/container/skills/bio-manuscript-common/templates/nature-methods/figure_template.md
new file mode 100644
index 0000000..2a90e31
--- /dev/null
+++ b/container/skills/bio-manuscript-common/templates/nature-methods/figure_template.md
@@ -0,0 +1,33 @@
+# Figure Template
+
+## Figure Planning Block
+
+### Figure title
+
+[Figure title]
+
+### Scientific claim
+
+[What this figure proves]
+
+### Panel layout
+
+- Panel a:
+- Panel b:
+- Panel c:
+- Panel d:
+- Panel e:
+
+### Dataset / benchmark
+
+- dataset:
+- baseline:
+- metric:
+
+### Biological interpretation
+
+[What biological conclusion or validation this figure supports]
+
+### Caption intent
+
+[2-4 sentences describing what the full figure caption should communicate]
diff --git a/container/skills/bio-manuscript-common/templates/nature-methods/introduction_template.md b/container/skills/bio-manuscript-common/templates/nature-methods/introduction_template.md
new file mode 100644
index 0000000..090e40f
--- /dev/null
+++ b/container/skills/bio-manuscript-common/templates/nature-methods/introduction_template.md
@@ -0,0 +1,35 @@
+# Introduction Template
+
+## Five-Paragraph Structure
+
+### Paragraph 1: Field background
+
+Introduce the field, explain why the problem matters, and highlight the broad motivation.
+
+### Paragraph 2: Related work
+
+Summarize the strongest prior methods and their main ideas.
+
+### Paragraph 3: Current limitations
+
+State the main gaps in existing methods, benchmarks, analyses, or biological interpretation.
+
+### Paragraph 4: Our method
+
+Introduce the proposed method and clearly separate:
+
+- conceptual novelty
+- algorithmic novelty
+- task / analysis novelty
+
+### Paragraph 5: Significance
+
+Explain why the method matters technically, biologically, and methodologically.
+
+## Writing Checklist
+
+- concise field motivation
+- reviewer-friendly related work summary
+- explicit gap statement
+- concrete method positioning
+- realistic significance claim
diff --git a/container/skills/bio-manuscript-common/templates/nature-methods/manuscript_template.md b/container/skills/bio-manuscript-common/templates/nature-methods/manuscript_template.md
new file mode 100644
index 0000000..7895bbf
--- /dev/null
+++ b/container/skills/bio-manuscript-common/templates/nature-methods/manuscript_template.md
@@ -0,0 +1,45 @@
+# Manuscript Template
+
+## Title
+
+[Working title]
+
+## Abstract
+
+- background:
+- gap:
+- method:
+- results:
+- significance:
+
+## Introduction
+
+- field background
+- related work
+- limitations
+- our method
+- significance
+
+## Results
+
+- Figure 1 overview
+- Figure 2 task result
+- Figure 3 task result
+- Figure 4 task result
+- Figure 5 task result
+
+## Discussion
+
+- strengths
+- biological implications
+- limitations
+- future directions
+
+## Methods
+
+- preprocessing
+- model
+- training
+- metrics
+- analyses
+- baselines
diff --git a/container/skills/bio-manuscript-common/templates/nature-methods/methods_template.md b/container/skills/bio-manuscript-common/templates/nature-methods/methods_template.md
new file mode 100644
index 0000000..d8ce1b0
--- /dev/null
+++ b/container/skills/bio-manuscript-common/templates/nature-methods/methods_template.md
@@ -0,0 +1,46 @@
+# Methods Template
+
+## Data
+
+- dataset sources
+- accession numbers
+- preprocessing assumptions
+
+## Preprocessing
+
+- QC
+- normalization
+- feature selection
+- dimensionality reduction
+
+## Model
+
+- architecture overview
+- key modules
+- latent space definition
+
+## Training
+
+- losses
+- optimization
+- scheduling
+- calibration / regularization
+
+## Evaluation
+
+- metrics
+- baselines
+- statistical testing
+
+## Biological Analyses
+
+- marker analysis
+- enrichment
+- spatial statistics
+- communication / GRN / trajectory if relevant
+
+## Reproducibility
+
+- software versions
+- config files
+- output artifacts
diff --git a/container/skills/bio-manuscript-pipeline/SKILL.md b/container/skills/bio-manuscript-pipeline/SKILL.md
new file mode 100644
index 0000000..74b03c6
--- /dev/null
+++ b/container/skills/bio-manuscript-pipeline/SKILL.md
@@ -0,0 +1,622 @@
+# bio-manuscript-pipeline
+
+**End-to-end pipeline from structured research input to a full manuscript plan (一条龙 Pipeline)**
+
+BioClaw integration notes:
+- This skill is staged under `container/skills/` as part of a multi-skill manuscript pipeline.
+- Shared templates and helper scripts are available under the sibling directory `bio-manuscript-common/`.
+- When this pipeline needs supporting capabilities, prefer the copied BioClaw sibling skills in `container/skills/` over any `~/.openclaw/...` layout assumptions.
+- This skill family is being integrated as a BioClaw community-contributed workflow.
+- Contributor reference for attribution and documentation: Yuhong Dong, Westlake University PhD candidate, BioClaw community contributor / BioClaw co-creation team (共创团队) contributor.
+- In BioClaw, treat the sibling manuscript skills as stepwise companion skills. You should explicitly follow their guidance phase by phase rather than assuming an automatic runtime dispatcher.
+- If a later phase depends on outputs from an earlier phase, write those outputs into the group workspace first and then continue with the next sibling skill using those artifacts as context.
+- At the end of each substantial run, also write a concise human-readable execution summary. Prefer `FINAL_EXEC_SUMMARY.md`; for integration-focused runs, also write `INTEGRATION_TEST_REPORT.md`.
+
+---
+
+## Welcome
+
+Welcome to Bio-Manuscript-Forge. This workflow helps turn a rough research idea into a manuscript-ready planning package.
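The Usage section of this skill passes every field as one pipe-separated `key: value` string. A minimal sketch of how such a string could be parsed and checked against the Field Guide (required: `topic`, `base_work`, `innovation`, `demo_data`; defaults: `nat-communications` and `2` refine rounds) is shown below. `parse_pipeline_args` is a hypothetical helper, not part of BioClaw; it assumes no value contains `|` and it does not handle the nested innovation subfields of the full multi-line template.

```python
from typing import Any, Dict

# Required fields and defaults, taken from the Field Guide table of this skill
REQUIRED_FIELDS = ("topic", "base_work", "innovation", "demo_data")
DEFAULTS = {"target_journal": "nat-communications", "num_refine_rounds": "2"}


def parse_pipeline_args(arg: str) -> Dict[str, Any]:
    """Parse 'key: value | key: value' into a dict (hypothetical helper)."""
    fields: Dict[str, Any] = dict(DEFAULTS)
    for chunk in arg.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # partition() splits on the first colon only, so URLs in values stay intact
        key, sep, value = chunk.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {chunk!r}")
        fields[key.strip()] = value.strip()

    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    # The refine-round count is the only numeric field
    fields["num_refine_rounds"] = int(fields["num_refine_rounds"])
    return fields
```

Rejecting missing required fields up front keeps a later phase from running on an incomplete project description.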
+
+### Input Template
+
+Provide your project in the following structure:
+
+```text
+topic: [research topic]
+base_work:
+  - paper: [related paper link]
+  - code: [related code repository]
+
+innovation: [one-sentence innovation summary]
+- algorithmic novelty (算法创新性): [core method novelty]
+- tasks (任务): [task1, task2, task3, ...]
+- data (数据): [dataset source or type]
+- benchmark: [evaluation benchmark]
+- metrics (计算指标): [metric1, metric2, ...]
+- biological analyses (生物学分析手段): [how biological significance will be shown]
+
+demo_data: [demo dataset link]
+target_journal: [optional, default nat-communications]
+num_refine_rounds: [optional, default 2]
+```
+
+### Example Input
+
+```
+topic: spatial multi-omics integration
+base_work:
+  - paper: https://www.nature.com/articles/s41592-021-01336-8
+  - code: https://github.com/broadinstitute/Tangram
+
+innovation: jointly align spatial transcriptomics and proteomics while preserving tissue-domain boundaries
+- algorithmic novelty (算法创新性): boundary-aware cross-modal alignment with explicit domain-consistency regularization
+- tasks (任务): cell annotation, spatial domain detection, cross-modal integration, biological interpretation
+- data (数据): public spatial transcriptomics and spatial proteomics cohorts with matched single-cell references
+- benchmark: compare against mapping, domain, and integration baselines on public tumor datasets
+- metrics (计算指标): ARI, NMI, Macro-F1, boundary preservation score, biological consistency
+- biological analyses (生物学分析手段):
+  - marker recovery across modalities
+  - pathway enrichment consistency
+  - neighborhood preservation
+  - tissue-boundary case studies
+
+demo_data: https://zenodo.org/record/0000000
+target_journal: nat-communications
+num_refine_rounds: 2
+```
+
+### Expected Outputs
+
+| File | Content |
+|------|---------|
+| **PPT** | Lab meeting / progress presentation |
+| **FINAL_PROPOSAL** | Full research proposal |
+| **Figure 2-7 (v3)** | Detailed task-wise figure designs |
+| **Manuscript text (v2)** | Introduction, Results, Discussion, Methods |
+
+---
+
+Provide the project description and the pipeline can begin.
+
+---
+
+## Purpose
+
+Run the full manuscript pipeline, generate a journal-style plan, and iteratively refine it through reviewer-style feedback.
+
+## Input Schema
+
+```text
+topic: [research topic]
+base_work: [paper links + code links]
+innovation: [high-level innovation summary]
+- algorithmic novelty (算法创新性): [core algorithmic novelty]
+- tasks (任务): [downstream tasks, comma-separated]
+- data (数据): [dataset source / type]
+- benchmark: [benchmark dataset or evaluation setup]
+- metrics (计算指标): [safety + task metrics such as ASR, ARI, etc.]
+- biological analyses (生物学分析手段): [marker genes, pathways, neighborhood analysis, etc.]
+
+demo_data: [demo dataset link]
+target_journal: [optional, default nat-communications]
+num_refine_rounds: [optional, default 2]
+```
+
+### Example Input (public-safe sample)
+
+```
+topic: spatial multi-omics integration
+base_work:
+  - paper: https://www.nature.com/articles/s41592-021-01336-8
+  - code: https://github.com/broadinstitute/Tangram
+
+innovation: jointly align spatial transcriptomics and proteomics while preserving tissue-domain boundaries
+- algorithmic novelty (算法创新性): boundary-aware cross-modal alignment with explicit domain-consistency regularization
+- tasks (任务): cell annotation, spatial domain detection, cross-modal integration, biological interpretation
+- data (数据): public spatial transcriptomics and spatial proteomics cohorts with matched single-cell references
+- benchmark: compare against mapping, domain, and integration baselines on public tumor datasets
+- metrics (计算指标): ARI, NMI, Macro-F1, boundary preservation score, biological consistency
+- biological analyses (生物学分析手段):
+  - marker recovery across modalities
+  - pathway enrichment consistency
+  - neighborhood preservation
+  - tissue-boundary case studies
+
+demo_data: https://zenodo.org/record/0000000
+target_journal: nat-communications
+num_refine_rounds: 2
+```
+
+### Field Guide
+
+| Field | Required | Description |
+|------|----------|-------------|
+| topic | yes | concise research topic |
+| base_work | yes | paper + code links |
+| innovation | yes | high-level idea plus structured subfields |
+| demo_data | yes | demo dataset link |
+| target_journal | no | default `nat-communications` |
+| num_refine_rounds | no | default `2` |
+
+### Innovation Subfields
+
+| Subfield | Description | Example |
+|----------|-------------|---------|
+| algorithmic novelty (算法创新性) | core method novelty | attention entropy, loss redesign, architecture change |
+| tasks (任务) | downstream tasks covered | cell annotation, perturbation, GRN inference |
+| data (数据) | dataset source / type | public cohorts, user data, target tissue |
+| benchmark | evaluation setup | existing benchmark or new benchmark |
+| metrics (计算指标) | safety + task metrics | ASR, Accuracy, F1, ARI, Pearson |
+| biological analyses (生物学分析手段) | how biology will be demonstrated | marker gene, pathway, regulatory links |
+
+## Execution Flow
+
+### Phase 1: System building (Steps 1-5)
+
+**Input parsing**: first extract the key signals from user input:
+- `topic` -> novelty search
+- `base_work` -> extract datasets, metrics, and methods from prior work
+- `innovation.algorithmic novelty` / `innovation.算法创新性` -> novelty assessment
+- `innovation.tasks` / `innovation.任务` -> task system design
+- `innovation.data` / `innovation.数据` -> dataset search direction
+- `innovation.metrics` / `innovation.计算指标` -> metric system design
+- `innovation.biological analyses` / `innovation.生物学分析手段` -> analysis system design
+
+```
+Step 1: Novelty check
+├─ Parse input: topic, base_work, innovation.algorithmic novelty
+├─ Search with searxng/web_search
+├─ Generate 10-20 topic variants via synonym transformation
+├─ Search PubMed + bioRxiv + arXiv q-bio
+├─ Count similar papers
+├─ Judge the novelty level together with innovation.algorithmic novelty
+└─ Output: 01_INNOVATION_ASSESSMENT.md
+
+Step 2: Task system design
+├─ Parse input: innovation.tasks
+├─ If the user provides a task list → use it directly
+├─ If not → search for the field's main task taxonomy
+├─ Identify task tiers (Level 1-4)
+├─ Ensure difficulty increases across tiers
+└─ Output: 02_TASK_SYSTEM.md
+
+Step 3: Dataset search
+├─ Parse input: innovation.data, innovation.benchmark, demo_data
+├─ If the user describes the data → search for matching datasets
+├─ Extract datasets from the base_work papers
+├─ Match datasets to tasks
+└─ Output: 03_DATASET_CATALOG.md
+
+Step 4: Metric system design
+├─ Parse input: innovation.metrics
+├─ If the user provides metrics → use them directly and supplement them
+├─ If not → extract metrics from prior work
+├─ Categorize: safety metrics + task metrics
+└─ Output: 04_METRIC_SYSTEM.md
+
+Step 5: Analysis system design
+├─ Parse input: innovation.biological analyses
+├─ If the user provides analyses → use them directly and supplement them
+├─ If not → extract analysis methods from prior work
+├─ Tag the corresponding OmicsClaw/BioClaw skill
+├─ Explain why each analysis is used, what it proves, and what biological meaning it shows
+└─ Output: 05_ANALYSIS_SYSTEM.md
+```
+
+### Phase 2: Design and Writing (Steps 6-7)
+
+**⚠️ Core principles**:
+1. **Task first**: each of Figures 2-N corresponds to one task; data, metrics, and analyses follow from the task
+2. **Analysis-rich**: every figure must include safety + biological analyses
+3. **Keep the text in sync**: update the Results as soon as a figure changes
+
+```
+Step 6: Figure design
+│
+├─ Figure 1: algorithmic novelty (method framework)
+│   ├─ Panel a: method overview
+│   ├─ Panel b: schematic of the innovations
+│   ├─ Panel c: model coverage
+│   ├─ Panel d: task coverage
+│   └─ Panel e: metric system
+│
+├─ Figures 2-N: each figure = one task ⭐ task-first principle
+│   │
+│   ├─ Panel a: task overview (data flow)
+│   │
+│   ├─ Panels b-d: quantitative evaluation
+│   │   ├─ multi-model comparison
+│   │   ├─ ASR reduction
+│   │   └─ task-metric preservation
+│   │
+│   ├─ Panel e: Technical analysis ⭐ must include
+│   │   ├─ representation pattern shifts
+│   │   ├─ error / uncertainty analysis
+│   │   └─ failure-mode or boundary-case inspection
+│   │
+│   ├─ Panel f: biological analysis ⭐ must include
+│   │   ├─ Marker gene recovery
+│   │   ├─ Pathway preservation
+│   │   └─ concrete biological significance
+│   │
+│   ├─ Panel g: In-depth case studies ⭐ 1-2 cases
+│   │   ├─ concrete biological question
+│   │   ├─ baseline vs proposed method comparison
+│   │   └─ interpretation of recovered biological structure
+│   │
+│   └─ choose data, metrics, and analyses according to the task
+│
+├─ Figure N+1: summary + biological significance
+│
+└─ Output: 06_FIGURE_DESIGNS/
+
+Step 6.5: Text-sync check ⭐ required
+├─ Figure has this panel → does the Results section have a matching paragraph?
+├─ Figure has this case study → does the Results section discuss it in detail?
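+└─ Proceed to the next step only after the check passes
+
+Step 7: Manuscript text generation
+│
+├─ Introduction (5 paragraphs)
+│   ├─ Paragraph 1: field background
+│   ├─ Paragraph 2: related work survey
+│   ├─ Paragraph 3: limitations of existing methods
+│   ├─ Paragraph 4: introduction of the proposed method
+│   └─ Paragraph 5: significance and applications
+│
+├─ Results (mapped to the figures) ⭐ structural alignment
+│   ├─ 2.1 Overview (Figure 1)
+│   ├─ 2.2 Task 1 / Main claim (Figure 2)
+│   │   ├─ quantitative evaluation
+│   │   ├─ technical analysis
+│   │   ├─ biological analysis
+│   │   └─ case study
+│   ├─ 2.3 Task 2 / Main claim (Figure 3)
+│   ├─ ... one section per task
+│   └─ 2.N Summary (last figure)
+│
+├─ Discussion
+│   ├─ summary of method strengths
+│   ├─ significance of combining safety and biology ⭐
+│   ├─ comparison with existing methods
+│   ├─ method limitations
+│   └─ future directions
+│
+├─ Methods
+│   ├─ data preprocessing
+│   ├─ model architecture
+│   ├─ task-specific methods ⭐ organized by task
+│   ├─ biological analysis methods ⭐
+│   ├─ statistical analysis
+│   └─ code and data availability
+│
+└─ Output: 07_MANUSCRIPT_TEXT/
+```
+
+**Figure design checklist**:
+
+```
+- [ ] Is Figure 1 the method framework?
+- [ ] Does each of Figures 2-N correspond to one task?
+- [ ] Does every figure include a multi-model comparison?
+- [ ] Does every figure have a quantitative evaluation panel?
+- [ ] Does every figure have a safety analysis panel? ⭐
+- [ ] Does every figure have a biological analysis panel? ⭐
+- [ ] Does every figure have 1-2 in-depth case studies? ⭐
+- [ ] Are the analysis methods diverse?
+- [ ] Does the Results structure match the figures? ⭐
+```
+
+### Phase 2.5: Refine Loop ⭐
+
+```
+Step 7.5: Three-reviewer iterative refinement
+│
+├─ Round 0: save the initial proposal
+│   └─ Output: refine-logs/round-0-initial-proposal.md
+│
+├─ Round 1 Review:
+│   ├─ Editor review (novelty assessment, Nature-family journal standard)
+│   │   ├─ conceptual / methodological / application novelty
+│   │   └─ scores: novelty / feasibility / recommendation
+│   │
+│   ├─ Computational reviewer (algorithm / method review)
+│   │   ├─ soundness of algorithm design / method novelty
+│   │   ├─ rigor of experimental design (baselines / metrics / ablations)
+│   │   └─ scores: method novelty / technical rigor / code feasibility
+│   │
+│   ├─ Biological analysis reviewer (biological significance review)
+│   │   ├─ biological significance / soundness of analysis design
+│   │   ├─ soundness of dataset choice
+│   │   └─ scores: biological significance / analysis design / data selection
+│   │
+│   └─ Output: refine-logs/round-1/
+│
+├─ Round 1 Refinement:
+│   ├─ consolidate the three reviewers' comments
+│   ├─ classify issues (Critical/Major/Minor)
+│   ├─ respond to and address each point
+│   ├─ update the proposal
+│   └─ Output: refine-logs/round-1/refinement.md
+│
+├─ Round 2 Review: (same as Round 1)
+│   └─ Output: refine-logs/round-2/
+│
+├─ Round 2 Refinement:
+│   └─ Output: refine-logs/round-2/refinement.md
+│
+└─ Final outputs:
+    ├─ refine-logs/REVIEW_SUMMARY.md (per-round summary)
+    ├─ refine-logs/FINAL_PROPOSAL.md (final proposal)
+    ├─ refine-logs/score-history.md (score history)
+    └─ refine-logs/REFINEMENT_REPORT.md (full report)
+```
+
+### Phase 2.6: Human Feedback Validation ⭐ NEW
+
+```
+Step 7.6: Human feedback loop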
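+│
+├─ Present the proposal
+│   ├─ Show the core content of FINAL_PROPOSAL.md
+│   ├─ Include: innovations, figure designs, experimental plan, key revisions
+│   └─ Format: structured summary + key decision points
+│
+├─ Wait for human feedback
+│   ├─ Option A: approve → continue to Phase 3
+│   └─ Option B: has comments → collect the feedback
+│
+├─ Feedback handling
+│   ├─ If approved → record the approval and enter Phase 3
+│   └─ If not approved →
+│       ├─ Record the feedback in refine-logs/human-feedback/
+│       ├─ Choose the return point by feedback type:
+│       │   ├─ Phase 1-level issue: novelty / task system must be rebuilt
+│       │   ├─ Phase 2-level issue: figures / text need adjustment
+│       │   └─ Phase 2.5-level issue: detail refinement
+│       ├─ Apply the revisions
+│       ├─ Rerun the Phase 2.5 Refine Loop
+│       └─ Present to the human again for validation
+│
+└─ Output:
+    ├─ refine-logs/human-feedback/feedback-round-X.md
+    └─ refine-logs/HUMAN_APPROVAL.md (final approval record)
+```
+
+**Human feedback handling flow:**
+
+```
+Human feedback → classify the issue → decide the return point
+│
+├─ Critical issue (wrong novelty direction)
+│   └─ Return to Phase 1 → reassess the innovation
+│
+├─ Major issue (design / plan needs major changes)
+│   └─ Return to Phase 2 → adjust figures / text
+│
+├─ Minor issue (detail refinement)
+│   └─ Return to Phase 2.5 → Refine Loop
+│
+└─ Approved
+    └─ Enter Phase 3
+```
+
+**Feedback collection format:**
+
+```markdown
+## Human Feedback Round X
+
+**Feedback time**: YYYY-MM-DD HH:MM
+**Feedback**: [user comments]
+**Issue level**: Critical / Major / Minor
+**Return phase**: Phase 1 / Phase 2 / Phase 2.5
+**Proposed changes**: [revision plan after AI analysis]
+
+---
+
+## Revision Log
+
+- [ ] Revision item 1
+- [ ] Revision item 2
+...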
+```
+
+### Phase 3: Validation and Reporting (Steps 8-12)
+
+```
+Step 8: Code modification plan
+├─ Clone the original code repository
+├─ Analyze the code structure
+├─ Map each innovation to the code locations it changes
+├─ Design new files + modified files
+└─ Output: 08_CODE_MODIFICATION_PLAN.md
+
+Step 9: Quick demo validation
+├─ Apply the code changes
+├─ Download the demo data
+├─ Quick run with subsampling + few epochs
+├─ Judge feasibility
+├─ If not feasible → suggest modifications
+└─ Output: 09_DEMO_VALIDATION.md
+
+Step 10: Detailed analysis execution (optional)
+├─ Call OmicsClaw/BioClaw
+├─ Run the full analysis
+├─ Generate real data
+└─ Output: 10_ANALYSIS_RESULTS.md
+
+Step 11: Generate the lab-meeting PPT (⭐ new)
+├─ Extract the core content from FINAL_PROPOSAL.md
+├─ Extract the demo results from DEMO_VALIDATION.md
+├─ Generate a 12-15 slide lab-meeting presentation
+├─ Format: Markdown (Marp) / HTML (reveal.js) / PPTX
+└─ Output: 11_PPT_PRESENTATION.md
+
+Step 12: Execution summary and report digest (⭐ recommended for BioClaw integration)
+├─ Summarize the stages actually run in this session
+├─ Summarize the key output files and their paths
+├─ Mark which steps truly ran end to end and which are only drafts / scaffolding
+├─ Note current blockers
+├─ Give next-step recommendations (at most 3)
+├─ Record conclusions suitable for an integration report
+└─ Output: FINAL_EXEC_SUMMARY.md
+```
+
+## Output Directory Structure
+
+```
+manuscript-plan/
+├── 01_INNOVATION_ASSESSMENT.md
+├── 02_TASK_SYSTEM.md
+├── 03_DATASET_CATALOG.md
+├── 04_METRIC_SYSTEM.md
+├── 05_ANALYSIS_SYSTEM.md
+│
+├── 06_FIGURE_DESIGNS/
+│   ├── FIGURE_1_DESIGN.md
+│   ├── FIGURE_2_DESIGN.md
+│   ├── FIGURE_3_DESIGN.md
+│   ├── FIGURE_4_DESIGN.md
+│   ├── FIGURE_5_DESIGN.md
+│   └── SUPPLEMENTARY_DESIGN.md
+│
+├── 07_MANUSCRIPT_TEXT/
+│   ├── INTRODUCTION.md
+│   ├── RESULTS.md
+│   ├── DISCUSSION.md
+│   └── METHODS.md
+│
+├── refine-logs/                  # ⭐ new
+│   ├── round-0-initial-proposal.md
+│   │
+│   ├── round-1/
+│   │   ├── editor-review.md
+│   │   ├── computational-review.md
+│   │   ├── biological-review.md
+│   │   ├── review-summary.md
+│   │   └── refinement.md
+│   │
+│   ├── round-2/
+│   │   ├── editor-review.md
+│   │   ├── computational-review.md
+│   │   ├── biological-review.md
+│   │   ├── review-summary.md
+│   │   └── refinement.md
+│   │
+│   ├── human-feedback/           # ⭐ NEW: human feedback records
+│   │   ├── feedback-round-1.md
+│   │   ├── feedback-round-2.md
+│   │   └── ...
+│   │
+│   ├── REVIEW_SUMMARY.md
+│   ├── FINAL_PROPOSAL.md
+│   ├── HUMAN_APPROVAL.md         # ⭐ NEW: human approval record
+│   ├── score-history.md
+│   └── REFINEMENT_REPORT.md
+│
+├── 08_CODE_MODIFICATION_PLAN.md
+├── 09_DEMO_VALIDATION.md
+├── 10_ANALYSIS_RESULTS.md
+│
+├── 11_PPT_PRESENTATION.md        # ⭐ new: lab-meeting PPT
+├── FINAL_EXEC_SUMMARY.md         # ⭐ new: execution summary for human reporting
+├── INTEGRATION_TEST_REPORT.md    # ⭐ optional: integration / validation test report
+│
+└── FINAL_MANUSCRIPT_PLAN.md
+```
+
+## Execution Summary Template
+
+After each reasonably complete run, add a report-friendly summary file that covers at least the following:
+
+```md
+# FINAL_EXEC_SUMMARY
+
+## Run Scope
+- Topic:
+- Date:
+- Workspace:
+- Pipeline entry:
+
+## Stages Executed
+- Step / Phase:
+- Step / Phase:
+
+## Key Files Generated
+- path/to/file
+- path/to/file
+
+## Verified Outputs
+- What actually ran successfully
+- What was only drafted / scaffolded
+
+## Current Blockers
+- blocker 1
+- blocker 2
+
+## Recommended Next Steps
+1. ...
+2. ...
+3. ...
+
+## Attribution
+- Workflow family: Bio-Manuscript-Forge
+- BioClaw integration: community-contributed workflow
+- Contributor reference: Yuhong Dong, Westlake University PhD candidate, BioClaw community contributor
+```
+
+## Three-Reviewer Evaluation Criteria
+
+### Editor
+- **Role**: initial screening; judge whether the work meets the bar of a Nature-family journal
+- **Review dimensions**: novelty, feasibility, journal fit
+- **Scores**: novelty /10, feasibility /10, recommendation
+
+### Computational Reviewer
+- **Role**: review from a computational / algorithmic perspective
+- **Review dimensions**: algorithm design, method novelty, experimental rigor, code feasibility
+- **Scores**: method novelty /10, technical rigor /10, code feasibility /10
+
+### Biological Analysis Reviewer
+- **Role**: review from a biological / analysis perspective
+- **Review dimensions**: biological significance, analysis design, data selection
+- **Scores**: biological significance /10, analysis design /10, data selection /10
+
+## Usage
+
+```bash
+/bio-manuscript-pipeline "topic: spatial multi-omics integration | base_work: https://github.com/example/project | innovation: boundary-aware cross-modal alignment | demo_data: https://example.com/data.h5ad | target_journal: nat-communications | num_refine_rounds: 2"
+```
+
+## Sub-skill Calls
+
+This pipeline uses the following sub-skills in order:
+- `bio-innovation-check` (Step 1)
+- `bio-task-system` (Step 2)
+- `bio-dataset-search` (Step 3)
+- `bio-metric-system` (Step 4)
+- `bio-analysis-system` (Step 5)
+- `bio-figure-design` (Step 6)
+- `bio-manuscript-text` (Step 7)
+- `bio-manuscript-refine` (Step 7.5) ⭐
+- `bio-human-feedback` (Step 7.6) ⭐ NEW: human feedback validation
+- `bio-code-modification` (Step 8)
+- `bio-demo-validate` (Step 9)
+- `bio-ppt-generate` (Step 11) ⭐
+
+## Notes
+
+1. **After Phase 1**: check the novelty assessment results
+2. **After Phase 2**: check the figure designs and text
+3. **Phase 2.5 (Refine Loop)**: each round's scores must reach 7+ (out of 10) before moving to the next stage
+4. **Phase 2.6 (Human feedback validation)**: ⭐ key checkpoint
+   - Present FINAL_PROPOSAL.md for human review
+   - Wait for explicit human feedback
+   - Approved → continue to Phase 3
+   - Not approved → return to the phase that matches the issue level and iterate
+   - Record all feedback in refine-logs/human-feedback/
+5. **Phase 3**: if the demo validation shows the plan is not feasible, return to Step 8 and redesign
+6. **Convergence**: scores usually stabilize after 2 refine rounds
+7. **Final check**: use FINAL_PROPOSAL.md as the basis for execution
+8. **Human approval**: a human approval record (HUMAN_APPROVAL.md) is required before entering Phase 3
diff --git a/container/skills/bio-manuscript-refine/SKILL.md b/container/skills/bio-manuscript-refine/SKILL.md
new file mode 100644
index 0000000..f42c670
--- /dev/null
+++ b/container/skills/bio-manuscript-refine/SKILL.md
@@ -0,0 +1,117 @@
+# bio-manuscript-refine
+
+**Refine loop: three-reviewer iterative refinement (三审稿人迭代优化)**
+
+Run a reviewer-style refinement loop over the manuscript plan using three perspectives: editor, computational reviewer, and biological reviewer.
+
+## Purpose
+
+1. Review the current manuscript plan
+2. Produce structured review comments
+3. Revise the proposal round by round
+4. Track score history and revision history
+
+## Input Format
+
+```text
+manuscript_plan: [full manuscript plan generated in previous steps]
+target_journal: [target journal, default nat-communications]
+num_rounds: [number of refine rounds, default 2]
+```
+
+## Workflow
+
+### Round 0
+
+- save the initial proposal snapshot
+
+### Each review round
+
+Produce three reviews:
+
+1. **Editor**
+   - novelty
+   - feasibility
+   - journal fit
+2. **Computational reviewer**
+   - method design
+   - technical rigor
+   - benchmark quality
+   - implementation feasibility
+3. **Biological reviewer**
+   - biological significance
+   - analysis design
+   - dataset suitability
+
+Then generate:
+
+- a review summary
+- a revision response
+- a refined proposal
+
+## Output Format
+
+```markdown
+# Refine Report
+
+## Round 0
+- initial proposal snapshot
+
+## Round 1 Reviews
+### Editor
+- scores:
+- key concerns:
+
+### Computational Reviewer
+- scores:
+- key concerns:
+
+### Biological Reviewer
+- scores:
+- key concerns:
+
+## Round 1 Revision
+- addressed concerns:
+- remaining risks:
+
+## Score History
+| Round | Editor | Computational | Biological | Overall |
+|-------|--------|---------------|------------|---------|
+| 0 | ... | ... | ... | ... |
+
+## Final Proposal Status
+- ready for next phase / needs more revision
+```
+
+## Reviewer Criteria
+
+### Editor
+
+- novelty
+- feasibility
+- journal fit
+
+### Computational reviewer
+
+- algorithmic soundness
+- method novelty
+- benchmark rigor
+- code feasibility
+
+### Biological reviewer
+
+- biological significance
+- analysis relevance
+- dataset realism
+
+## Usage
+
+```bash
+/bio-manuscript-refine "manuscript_plan: [path to proposal] | target_journal: nat-communications | num_rounds: 2"
+```
+
+## Notes
+
+1. Revision should update the proposal itself, not only append comments.
+2. Keep a full history of each round.
+3. Scores are guidance, not absolute truth; comments matter more than raw numbers.
diff --git a/container/skills/bio-manuscript-text/SKILL.md b/container/skills/bio-manuscript-text/SKILL.md
new file mode 100644
index 0000000..f03b913
--- /dev/null
+++ b/container/skills/bio-manuscript-text/SKILL.md
@@ -0,0 +1,127 @@
+# bio-manuscript-text
+
+**Step 7: Manuscript drafting (论文文案生成)**
+
+Draft the main manuscript text from the figure plan, metric system, and analysis system.
+
+## Purpose
+
+1. Write the Introduction
+2. Draft the Results around the figure logic
+3. Draft the Discussion
+4. Draft the Methods skeleton
+
+## Input Format
+
+```text
+topic: [research topic]
+figure_designs: [figure design document]
+innovation: [innovation summary]
+base_work: [related prior work]
+metric_system: [metric system]
+analysis_system: [analysis system]
+paper_count: [number of related papers]
+related_papers: [optional list]
+target_journal: [target journal]
+```
+
+## Workflow
+
+### Step 7.1: Write the Introduction
+
+Use a five-paragraph structure:
+
+1. field background
+2. related work
+3. limitations of current methods
+4. introduce the proposed method
+5. significance and broader impact
+
+### Step 7.2: Write the Results
+
+Organize the Results around figure order:
+
+- Figure 1: framework / method overview
+- Figure 2-N: one main task or claim per figure
+
+For each figure section, explain:
+
+- setup
+- quantitative findings
+- qualitative or biological findings
+- take-home message
+
+### Step 7.3: Write the Discussion
+
+Cover:
+
+- method strengths
+- comparison to prior methods
+- biological implications
+- limitations
+- future directions
+
+### Step 7.4: Write the Methods skeleton
+
+Include at minimum:
+
+- preprocessing
+- model architecture
+- training strategy
+- metrics
+- biological analyses
+- baselines
+
+## Output Format
+
+```markdown
+# Manuscript Text
+
+## INTRODUCTION
+### Paragraph 1: Field background
+### Paragraph 2: Related work
+### Paragraph 3: Current limitations
+### Paragraph 4: Our method
+### Paragraph 5: Significance
+
+## RESULTS
+### 2.1 Framework overview
+### 2.2 Task 1 / Figure 2
+### 2.3 Task 2 / Figure 3
+### 2.4 Task 3 / Figure 4
+### 2.5 Task 4 / Figure 5
+
+## DISCUSSION
+- strengths
+- comparisons
+- biological implications
+- limitations
+- future work
+
+## METHODS
+- preprocessing
+- model
+- training
+- metrics
+- biological analyses
+- baselines
+```
+
+## Writing Principles
+
+1. Make the manuscript track the figure logic.
+2. Keep claims tied to evidence.
+3. Separate technical and biological claims clearly.
+4. Prefer reviewer-friendly clarity over stylistic complexity.
+
+## Usage
+
+```bash
+/bio-manuscript-text "topic: spatial multi-omics integration | figure_designs: [...] | innovation: [...] | base_work: [...] | paper_count: 5 | target_journal: nat-communications"
+```
+
+## Notes
+
+1. Do not draft text before the figure logic is stable.
+2. Every Results subsection should map to a figure.
+3. Keep Methods detailed enough that later implementation planning remains consistent.
diff --git a/container/skills/bio-metric-system/SKILL.md b/container/skills/bio-metric-system/SKILL.md
new file mode 100644
index 0000000..5af5e0f
--- /dev/null
+++ b/container/skills/bio-metric-system/SKILL.md
@@ -0,0 +1,149 @@
+# bio-metric-system
+
+**Step 4: Metric system design (评价指标体系构建)**
+
+Build a defensible set of quantitative and qualitative metrics by extracting them from related work or adapting them from adjacent fields.
+
+## Purpose
+
+1. Extract evaluation metrics from existing literature
+2. Borrow metrics from adjacent domains when needed
+3. Organize metrics into quantitative and qualitative groups
+4. Explain what each metric measures and how it should be computed
+
+## Input Format
+
+```text
+topic: [research topic]
+paper_count: [number of related papers]
+task_system: [task system from Step 2]
+```
+
+## Workflow
+
+### Step 4.1: Extract metrics from existing work
+
+If `paper_count >= 5`, review the Results / Benchmark sections of the strongest related papers and extract:
+
+- metric name
+- what it evaluates
+- formula or computation method
+- expected range
+- how often it appears in the field
+
+### Step 4.2: Borrow metrics from adjacent domains
+
+If the literature is still thin, adapt metrics from a nearby field.
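For the clustering-agreement case, ARI and NMI both compare two label vectors over the same cells, for example reference annotations against predicted clusters. The sketch below is a minimal pure-Python rendering of the standard definitions, with NMI normalized by the arithmetic mean of the two entropies. The function names are illustrative. In a real run, `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.normalized_mutual_info_score` compute the same quantities.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def _check(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> int:
    """Validate inputs and return the number of samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    if len(labels_true) < 2:
        raise ValueError("need at least two samples")
    return len(labels_true)


def _pairs(n: int) -> float:
    """Number of unordered pairs, C(n, 2)."""
    return n * (n - 1) / 2


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Pair-counting agreement between two partitions, corrected for chance."""
    n = _check(labels_true, labels_pred)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(_pairs(c) for c in contingency.values())
    sum_rows = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_cols = sum(_pairs(c) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / _pairs(n)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate partitions (e.g. both a single cluster): count as perfect agreement
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred) -> float:
    """NMI = I(U; V) / ((H(U) + H(V)) / 2), using natural logarithms."""
    n = _check(labels_true, labels_pred)
    contingency = Counter(zip(labels_true, labels_pred))
    rows, cols = Counter(labels_true), Counter(labels_pred)

    mutual_info = sum(
        (c / n) * math.log(n * c / (rows[u] * cols[v]))
        for (u, v), c in contingency.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in cols.values())
    denom = (h_true + h_pred) / 2
    if denom == 0:
        return 1.0  # both partitions are a single cluster
    return mutual_info / denom
```

Both scores ignore the actual label values and depend only on how cells are grouped. Relabelled but identical clusterings therefore still score 1.0.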
+
+Examples:
+
+- clustering agreement -> ARI / NMI
+- modality agreement -> Pearson / cosine similarity
+- reconstruction / registration -> MSE / MAE
+- biological relevance -> marker recovery / enrichment scores
+
+### Step 4.3: Organize the metric system
+
+Split metrics into:
+
+- **Quantitative metrics**
+  - integration quality
+  - modality consistency
+  - registration / alignment quality
+  - biological agreement
+- **Qualitative metrics**
+  - spatial plots
+  - feature plots
+  - latent visualizations
+  - heatmaps
+  - pathway / enrichment figures
+
+### Step 4.4: Standardize each metric
+
+For each metric, define:
+
+- English name
+- optional Chinese reference in parentheses
+- category
+- what it measures
+- formula (if needed)
+- range / interpretation
+- software implementation
+- task relevance
+- mapped figure / panel
+
+## Output Format
+
+```markdown
+# Metric System
+
+## Metric Sources
+- Extracted from related papers:
+- Borrowed from adjacent domains:
+
+## Quantitative Metrics
+
+### ARI (Adjusted Rand Index)
+- Category:
+- What it measures:
+- Formula:
+- Range:
+- Interpretation:
+- Implementation:
+- Relevant tasks:
+- Figure mapping:
+
+### NMI (Normalized Mutual Information)
+- Category:
+- What it measures:
+- Formula:
+- Range:
+- Interpretation:
+- Implementation:
+- Relevant tasks:
+- Figure mapping:
+
+### Pearson correlation
+- Category:
+- What it measures:
+- Formula:
+- Range:
+- Interpretation:
+- Implementation:
+- Relevant tasks:
+- Figure mapping:
+
+## Qualitative Metrics / Visual Readouts
+- spatial domain map
+- feature plot
+- violin plot
+- UMAP / latent visualization
+- heatmap
+- pathway enrichment figure
+
+## Next Step
+- Use the metric system to build the analysis system in Step 5
+```
+
+## Recommended Core Metrics
+
+For most manuscript-planning runs, include at least:
+
+- ARI
+- NMI
+- Macro-F1 or annotation accuracy
+- Pearson / cosine similarity when cross-modal agreement matters
+- MSE / MAE when reconstruction or alignment quality matters
+- at least one biological validation readout
+
+## Usage
+
+```bash
+/bio-metric-system "spatial multi-omics integration | paper_count: 5 | task_system: [task system from Step 2]"
+```
+
+## Notes
+
+1. Do not overload the paper with too many metrics; prefer a compact but defensible set.
+2. Match each metric to a specific task claim.
+3. Include at least one metric that reflects biological value, not just technical fit.
diff --git a/container/skills/bio-ppt-generate/SKILL.md b/container/skills/bio-ppt-generate/SKILL.md
new file mode 100644
index 0000000..25e9baa
--- /dev/null
+++ b/container/skills/bio-ppt-generate/SKILL.md
@@ -0,0 +1,84 @@
+# bio-ppt-generate
+
+**Presentation generation (组会汇报 PPT)**
+
+Generate a concise presentation package from the final proposal and demo / validation outputs.
+
+## Purpose
+
+1. Extract the core message from `FINAL_PROPOSAL.md`
+2. Incorporate demo or validation results
+3. Produce a 10-15 slide presentation outline
+4. Output Markdown-first slides that can be converted later
+
+## Input Format
+
+```text
+final_proposal: [path to FINAL_PROPOSAL.md]
+demo_result: [path to DEMO_VALIDATION.md or equivalent]
+ppt_title: [presentation title]
+author: [author]
+date: [date]
+```
+
+## Suggested Slide Order
+
+1. Title
+2. Background
+3. Research question
+4. Innovation
+5. Method overview
+6. Task design
+7. Data and metrics
+8. Demo / validation results
+9. Expected figure set
+10. Biological significance
+11. Next steps
+12. Summary
+13. Q&A
+
+## Output Format
+
+Preferred output:
+
+- Markdown slide deck
+- optionally HTML / reveal.js structure
+- optionally a PowerPoint-style outline
+
+```markdown
+---
+marp: true
+theme: gaia
+paginate: true
+---
+
+# [Title]
+
+**[Author]**
+
+[Date]
+
+---
+
+# Background
+
+- ...
+
+---
+
+# Research Question
+
+- ...
+```
+
+## Usage
+
+```bash
+/bio-ppt-generate "final_proposal: FINAL_PROPOSAL.md | demo_result: DEMO_VALIDATION.md | ppt_title: Project Update | author: Name | date: 2026-04-04"
+```
+
+## Notes
+
+1. Keep the deck concise and review-friendly.
+2. One slide should carry one main idea.
+3. Demo evidence should be included if available.
diff --git a/container/skills/bio-task-system/SKILL.md b/container/skills/bio-task-system/SKILL.md
new file mode 100644
index 0000000..97033c5
--- /dev/null
+++ b/container/skills/bio-task-system/SKILL.md
@@ -0,0 +1,118 @@
+# bio-task-system
+
+**Step 2: Task system design (任务体系构建)**
+
+Identify the main task categories in the field and organize them into a staged difficulty ladder.
+
+## Purpose
+
+1. Find the dominant task taxonomy in the target field
+2. Define Level 1-4 task tiers
+3. Ensure the task ladder increases in difficulty
+4. Prepare the task system for downstream dataset and metric design
+
+## Input Format
+
+```text
+topic: [research topic]
+paper_count: [number of related papers from Step 1]
+```
+
+## Workflow
+
+### Step 2.1: Task taxonomy search
+
+If there is substantial prior work, extract tasks from existing papers.
+
+If there is not enough prior work, borrow the taxonomy from a parent domain and adapt it.
+
+Typical adaptation logic:
+
+- single-cell multi-omics -> spatial multi-omics
+- modality alignment -> spatial-cell alignment
+- batch integration -> cross-sample integration
+
+### Step 2.2: Task tier design
+
+Define four levels:
+
+- **Level 1**: basic validation task
+- **Level 2**: intermediate application task
+- **Level 3**: challenge task
+- **Level 4**: flagship innovation task
+
+Increase across three dimensions:
+
+1. Data complexity
+2. Technical difficulty
+3. Biological value
+
+### Step 2.3: Standardize task descriptions
+
+For each task, write:
+
+- definition
+- difficulty level
+- data requirements
+- technical focus
+- biological value
+- representative methods
+- mapped figure
+
+## Output Format
+
+```markdown
+# Task System Design
+
+## Task Sources
+- Extracted from related papers:
+- Borrowed from parent domain:
+
+## Tier Overview
+| Level | Task type | Difficulty | Figure |
+|-------|-----------|------------|--------|
+| 1 | ... | low | Figure 2 |
+| 2 | ... | medium | Figure 3 |
+| 3 | ... | high | Figure 4 |
+| 4 | ... | highest | Figure 5 |
+
+## Detailed Task Descriptions
+
+### Task 1: [task name]
+- Definition:
+- Difficulty:
+- Data requirements:
+- Technical focus:
+- Biological value:
+- Representative methods:
+- Mapped figure:
+
+### Task 2: ...
+
+## Progression Rationale
+1. Data complexity rises across tasks
+2. Technical difficulty rises across tasks
+3. Biological value rises across tasks
+
+## Next Step
+- Use the task system to search for datasets in Step 3
+```
+
+## Example Ladder
+
+- Level 1: vertical integration
+- Level 2: horizontal / cross-slice integration
+- Level 3: mosaic integration with missing modalities
+- Level 4: diagonal integration across platform / resolution / cohort
+
+## Usage
+
+```bash
+/bio-task-system "spatial multi-omics integration | paper_count: 5"
+```
+
+## Notes
+
+1. Keep the ladder interpretable to reviewers.
+2. Avoid adding too many tasks; four well-designed tiers are usually enough.
+3. Make sure each task can later be tied to datasets, metrics, and figures.
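Step 2.2 asks for a ladder whose difficulty rises level by level. The Tier Overview assigns each level its own figure, starting at Figure 2, because Figure 1 is reserved for the method overview in the pipeline. A small check like the sketch below could flag a ladder that breaks those rules before datasets and metrics are attached. The `Task` dataclass and `validate_ladder` function are hypothetical. The difficulty labels are copied from the Tier Overview table, and an unknown label raises `ValueError`.

```python
from dataclasses import dataclass
from typing import List

# Difficulty labels in the order used by the Tier Overview table
DIFFICULTY_ORDER = ["low", "medium", "high", "highest"]


@dataclass
class Task:
    level: int
    name: str
    difficulty: str
    figure: int


def validate_ladder(tasks: List[Task]) -> List[str]:
    """Return a list of problems; an empty list means the ladder passes these checks."""
    problems: List[str] = []
    ordered = sorted(tasks, key=lambda t: t.level)

    # Levels should run 1, 2, 3, ... without gaps or duplicates
    levels = [t.level for t in ordered]
    if levels != list(range(1, len(ordered) + 1)):
        problems.append(f"levels should be consecutive from 1, got {levels}")

    # Difficulty must strictly increase with level
    for prev, cur in zip(ordered, ordered[1:]):
        if DIFFICULTY_ORDER.index(cur.difficulty) <= DIFFICULTY_ORDER.index(prev.difficulty):
            problems.append(f"{cur.name!r} is not harder than {prev.name!r}")

    # One figure per task, starting at Figure 2 (Figure 1 is the method overview)
    figures = [t.figure for t in ordered]
    if len(set(figures)) != len(figures):
        problems.append("two tasks map to the same figure")
    if any(f < 2 for f in figures):
        problems.append("task figures should start at Figure 2")

    return problems
```

For example, the Example Ladder above (vertical, horizontal, mosaic, diagonal integration at low, medium, high, and highest difficulty, mapped to Figures 2-5) produces an empty problem list.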