Merged
9 changes: 9 additions & 0 deletions container/skills/README.md
@@ -13,6 +13,15 @@ Anything the WhatsApp / WeCom / Discord / local-web agent should use belongs **h
| `sds-gel-review/` | SDS gel image review |
| `query-*` | Database / API usage (Ensembl, UniProt, KEGG, …) |
| `blast-search/`, `pubmed-search/`, `sequence-analysis/` | Literature & sequence workflows |
| `bio-manuscript-*` | Community-contributed manuscript planning pipeline for idea screening, figure planning, manuscript drafting, refinement, and implementation blueprints |
| `bio-manuscript-common/` | Shared templates and helper scripts used by the manuscript pipeline skill family |

## Community Skills

Some runtime skills may be integrated from BioClaw community contributors when they prove useful in real workflows. The manuscript pipeline skill family currently staged here is integrated as a community-contributed workflow.

Contributor reference:
- Yuhong Dong, Westlake University PhD candidate, BioClaw community contributor

## Developer-only skills

161 changes: 161 additions & 0 deletions container/skills/bio-analysis-system/SKILL.md
@@ -0,0 +1,161 @@
# bio-analysis-system

**Step 5: Analysis system design**

Build the analysis layer for the manuscript by identifying which analyses, tools, and biological validations should support each figure and each task.

## Purpose

1. Extract analysis patterns from related work
2. Borrow useful analyses from adjacent domains when needed
3. Map analyses to BioClaw-compatible tools or fallback software
4. Explain why each analysis is included and what biological claim it supports
5. Connect analyses to figure panels

## Input Format

```text
topic: [research topic]
paper_count: [number of related papers]
task_system: [task system]
metric_system: [metric system]
dataset_catalog: [dataset catalog]
```

## Workflow

### Step 5.1: Extract analyses from existing work

If enough related papers exist, inspect their figures and extract:

- panel type
- analysis method
- software / package
- important parameters
- the scientific or biological conclusion the panel supports

### Step 5.2: Borrow from adjacent fields

If the field is still thin, adapt common analyses from nearby areas such as:

- clustering
- marker visualization
- latent embedding visualization
- pathway enrichment
- cell-cell communication
- spatial statistics
- GRN analysis

### Step 5.3: Categorize analyses

Use three broad groups:

- **Quantitative analyses**
  - clustering
  - metric computation
  - statistical tests
  - baseline comparisons
- **Qualitative analyses**
  - spatial visualization
  - feature / violin plots
  - UMAP / t-SNE
  - before / after alignment comparisons
  - heatmaps
- **Biological analyses**
  - cell annotation
  - marker genes
  - pathway enrichment
  - GRN
  - ligand-receptor communication
  - spatial statistics
  - trajectory analysis
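
The three groups above could be kept as a small lookup table so that every analysis in a plan gets a consistent category label. The sketch below is only an illustration: `ANALYSIS_CATEGORIES` and `categorize` are hypothetical names, not part of BioClaw.

```python
# Hypothetical lookup table mirroring the three groups in Step 5.3.
ANALYSIS_CATEGORIES = {
    "quantitative": [
        "clustering", "metric computation",
        "statistical tests", "baseline comparisons",
    ],
    "qualitative": [
        "spatial visualization", "feature / violin plots", "UMAP / t-SNE",
        "before / after alignment comparisons", "heatmaps",
    ],
    "biological": [
        "cell annotation", "marker genes", "pathway enrichment", "GRN",
        "ligand-receptor communication", "spatial statistics",
        "trajectory analysis",
    ],
}


def categorize(analysis: str) -> str:
    """Return the category of an analysis name, or 'uncategorized'."""
    needle = analysis.strip().lower()
    for category, names in ANALYSIS_CATEGORIES.items():
        # Compare case-insensitively so "grn" and "GRN" both match.
        if any(needle == name.lower() for name in names):
            return category
    return "uncategorized"
```

Returning `"uncategorized"` instead of raising keeps the lookup usable while the list is still being extended.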

### Step 5.4: Map to BioClaw or fallback tools

Whenever possible, map analysis needs to BioClaw-compatible skills or established tools.

Examples:

- clustering -> Scanpy / Leiden
- annotation -> CellTypist / SingleR
- marker plots -> Scanpy
- enrichment -> gseapy
- spatial statistics -> squidpy
- GRN -> pySCENIC
- communication -> CellChat-like workflow
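
One way to make this mapping reusable is a plain dictionary with an explicit marker for analyses that have no mapping yet. This is a minimal sketch: the tool names are copied from the examples above, while `PREFERRED_TOOLS` and `resolve_tool` are hypothetical names introduced for illustration.

```python
# Tool names are taken from the examples in Step 5.4; keys are lower-cased.
PREFERRED_TOOLS = {
    "clustering": "Scanpy / Leiden",
    "annotation": "CellTypist / SingleR",
    "marker plots": "Scanpy",
    "enrichment": "gseapy",
    "spatial statistics": "squidpy",
    "grn": "pySCENIC",
    "communication": "CellChat-like workflow",
}

UNMAPPED = "unmapped: choose a fallback tool manually"


def resolve_tool(analysis: str, default: str = UNMAPPED) -> str:
    """Look up the preferred tool for an analysis, ignoring case and padding."""
    return PREFERRED_TOOLS.get(analysis.strip().lower(), default)
```

For example, `resolve_tool("GRN")` returns `"pySCENIC"`, while an analysis missing from the table returns the `UNMAPPED` marker so the gap is visible in the plan.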

### Step 5.5: Standardize analysis descriptions

For each analysis, define:

- category
- purpose
- biological claim supported
- preferred tool
- fallback tool
- key function
- recommended parameters
- inputs / outputs
- mapped task
- mapped figure / panel
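
The fields above could be captured in a small record type, which makes it easy to spot analyses that are still underspecified. The sketch below assumes free-text values for every field; `AnalysisSpec` and `missing_fields` are hypothetical names, not an existing BioClaw interface.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class AnalysisSpec:
    """One standardized analysis description (hypothetical structure)."""
    name: str
    category: str = ""
    purpose: str = ""
    biological_claim: str = ""
    preferred_tool: str = ""
    fallback_tool: str = ""
    key_function: str = ""
    recommended_parameters: str = ""
    inputs_outputs: str = ""
    mapped_task: str = ""
    mapped_figure_panel: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


spec = AnalysisSpec(
    name="Clustering",
    category="quantitative",
    preferred_tool="Scanpy / Leiden",
)
# Lists every field not yet filled in, e.g. 'purpose' and 'mapped_task'.
print(spec.missing_fields())
```

An empty result from `missing_fields()` would mean the description is complete enough to copy into the output template.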

## Output Format

```markdown
# Analysis System

## Analysis Sources
- Extracted from related papers:
- Borrowed from adjacent domains:

## Quantitative Analyses

### Clustering
- Category:
- Purpose:
- Biological claim supported:
- Preferred tool:
- Fallback tool:
- Key function:
- Recommended parameters:
- Inputs / outputs:
- Relevant tasks:
- Figure mapping:

### Metric computation
- Category:
- Purpose:
- Preferred tools:
- Relevant tasks:
- Figure mapping:

## Qualitative Analyses
- spatial plot
- marker / feature plot
- latent embedding plot
- heatmap
- before / after alignment visualization

## Biological Analyses
- annotation
- marker recovery
- pathway enrichment
- GRN
- communication
- trajectory

## Next Step
- Use the analysis system to design figures in Step 6
```

## Usage

```bash
/bio-analysis-system "spatial multi-omics integration | paper_count: 5 | task_system: [...] | metric_system: [...] | dataset_catalog: [...]"
```
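
The usage string above appears to be a topic followed by `key: value` pairs separated by `|`. Assuming that format (it is inferred from the example, not from a documented grammar), a parser might look like the following sketch; `parse_skill_args` is a hypothetical helper.

```python
def parse_skill_args(raw: str) -> dict:
    """Split 'topic | key: value | ...' into a dictionary of strings.

    Assumes values never contain the '|' character.
    """
    parts = [part.strip() for part in raw.split("|")]
    parsed = {"topic": parts[0]}
    for part in parts[1:]:
        # partition() splits on the first ':' only, so values may contain ':'.
        key, sep, value = part.partition(":")
        if sep:
            parsed[key.strip()] = value.strip()
    return parsed


args = parse_skill_args(
    "spatial multi-omics integration | paper_count: 5 | task_system: [...]"
)
print(args["topic"], args["paper_count"])
```

All values stay as strings here; converting `paper_count` to an integer is left to the caller.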

## Notes

1. Prefer analyses that directly support paper claims.
2. Make the biological readouts visible early; they should not appear only at the very end.
3. Map each major analysis to a concrete figure panel.
135 changes: 135 additions & 0 deletions container/skills/bio-dataset-search/SKILL.md
@@ -0,0 +1,135 @@
# bio-dataset-search

**Step 3: Dataset search and task matching**

Find suitable datasets for each task and map datasets to the task system defined earlier in the manuscript pipeline.

## Purpose

1. Extract datasets from related papers when possible
2. Search public repositories directly when needed
3. Normalize dataset metadata into a common structure
4. Match datasets to tasks in a defensible way

## Input Format

```text
topic: [research topic]
task_system: [task system from Step 2]
paper_count: [number of related papers]
existing_papers: [optional list of related papers]
```

## Workflow

### Step 3.1: Extract datasets from existing work

If `paper_count >= 5`, start from the strongest existing papers.

Read Methods / Data Availability sections and extract:

- dataset name
- data source
- platform
- modality
- sample scale
- download path
- annotation availability
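
The branching between this step and direct repository search can be written down explicitly. The sketch below uses the `paper_count >= 5` threshold stated above; the function name, the strategy labels, and the returned structure are hypothetical.

```python
# Fields to pull from Methods / Data Availability sections (from Step 3.1).
EXTRACTION_FIELDS = [
    "dataset name", "data source", "platform", "modality",
    "sample scale", "download path", "annotation availability",
]


def plan_dataset_sourcing(paper_count: int, threshold: int = 5) -> dict:
    """Choose how to source datasets based on the amount of related work."""
    if paper_count >= threshold:
        return {"strategy": "extract_from_papers",
                "fields": list(EXTRACTION_FIELDS)}
    # Too little prior work: fall back to searching repositories directly.
    return {"strategy": "direct_repository_search", "fields": []}
```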

### Step 3.2: Search datasets directly

If there is not enough prior work, search repositories such as:

- GEO
- ArrayExpress
- project-specific public portals

Use keyword sets built from:

- topic
- modality
- tissue / disease
- benchmark intent
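
Keyword sets like these can be expanded into concrete query strings by combining one term from each group. The following is a sketch under stated assumptions: the `AND`-joined quoted-phrase syntax is an assumption about what the target repository accepts, and `geo_esearch_url` only builds a URL for the NCBI E-utilities `esearch` endpoint (with `db=gds` for GEO DataSets) without sending any request. Both function names are hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def build_queries(topic, modalities, tissues, intents):
    """Combine the topic with one term from each keyword group."""
    queries = []
    # An empty group is treated as a single empty term so it is skipped
    # instead of eliminating every combination.
    groups = [modalities or [""], tissues or [""], intents or [""]]
    for modality, tissue, intent in product(*groups):
        terms = [topic, modality, tissue, intent]
        queries.append(" AND ".join(f'"{t}"' for t in terms if t))
    return queries


def geo_esearch_url(query: str, retmax: int = 20) -> str:
    """Build (but do not send) an E-utilities search URL for GEO DataSets."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    return base + "?" + urlencode({"db": "gds", "term": query, "retmax": retmax})


queries = build_queries(
    "spatial transcriptomics", ["Visium"], ["brain", "liver"], ["benchmark"]
)
```

Here `queries` holds two strings, one per tissue. Results from any such search would still need manual review before a dataset enters the catalog.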

### Step 3.3: Normalize dataset metadata

For each dataset, record:

- source
- platform
- species
- tissue / disease
- sample size
- feature count
- modalities
- annotation quality
- histology / region metadata
- format
- preprocessing needs
- recommended task fit
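
A common structure for these fields could be a dataclass with explicit defaults for anything a source does not report. This is a minimal sketch: `DatasetRecord`, `normalize`, and the raw dictionary keys are hypothetical, and real repository metadata will need source-specific handling.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Normalized dataset metadata (hypothetical structure)."""
    name: str
    source: str = "unknown"
    platform: str = "unknown"
    species: str = "unknown"
    tissue_or_disease: str = "unknown"
    sample_size: Optional[int] = None
    feature_count: Optional[int] = None
    modalities: List[str] = field(default_factory=list)
    annotation_quality: str = "unknown"
    histology_region_metadata: str = "unknown"
    data_format: str = "unknown"
    preprocessing_needs: List[str] = field(default_factory=list)
    recommended_tasks: List[str] = field(default_factory=list)


def _as_int(value) -> Optional[int]:
    """Parse counts such as '1,200'; return None when not parseable."""
    try:
        return int(str(value).replace(",", ""))
    except (TypeError, ValueError):
        return None


def _as_list(value) -> List[str]:
    """Accept a comma-separated string or an iterable of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return [str(item).strip() for item in value]


def normalize(raw: dict) -> DatasetRecord:
    """Map a loosely structured metadata dict onto DatasetRecord."""
    text = lambda key: str(raw.get(key) or "unknown").strip()
    return DatasetRecord(
        name=text("name"),
        source=text("source"),
        platform=text("platform"),
        species=text("species"),
        tissue_or_disease=text("tissue_or_disease"),
        sample_size=_as_int(raw.get("sample_size")),
        feature_count=_as_int(raw.get("feature_count")),
        modalities=_as_list(raw.get("modalities")),
        annotation_quality=text("annotation_quality"),
        histology_region_metadata=text("histology_region_metadata"),
        data_format=text("format"),
        preprocessing_needs=_as_list(raw.get("preprocessing_needs")),
        recommended_tasks=_as_list(raw.get("recommended_tasks")),
    )
```

Using `"unknown"` and `None` as explicit defaults keeps missing metadata visible instead of silently dropping the field.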

### Step 3.4: Match datasets to tasks

A good match should satisfy:

1. Every major task has at least one viable dataset
2. Dataset structure matches the task's technical assumptions
3. The dataset remains feasible to download
4. Metadata quality is sufficient for evaluation
5. Each important task has at least one backup dataset where possible
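
The first and last criteria can be checked mechanically once a draft mapping exists. The sketch below assumes the mapping is a dictionary from task name to a list of candidate dataset names; `check_task_coverage` and its status labels are hypothetical. The remaining criteria (structure, download feasibility, metadata quality) still need human judgment.

```python
def check_task_coverage(task_to_datasets, important_tasks=()):
    """Flag tasks with no dataset, and important tasks with no backup."""
    report = {}
    for task, datasets in task_to_datasets.items():
        if len(datasets) == 0:
            report[task] = "missing"
        elif len(datasets) == 1 and task in important_tasks:
            report[task] = "no backup"
        else:
            report[task] = "ok"
    return report


coverage = check_task_coverage(
    {
        "integration": ["dataset_a", "dataset_b"],
        "clustering": ["dataset_a"],
        "imputation": [],
    },
    important_tasks={"integration", "clustering"},
)
```

In this example `clustering` is flagged as `"no backup"` and `imputation` as `"missing"`, which points to where further dataset search is needed.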

## Output Format

```markdown
# Dataset Catalog

## Data Sources
- Extracted from related papers:
- Direct repository search:
- Borrowed from adjacent domains:

## Dataset Entries

### Dataset 1: [name]
- Source:
- Platform:
- Species:
- Tissue / disease:
- Modalities:
- Sample scale:
- Annotation quality:
- Download URL:
- Format:
- Recommended tasks:
- Why it fits:

## Dataset-Task Mapping
| Task | Recommended dataset | Why it fits | Notes |
|------|---------------------|-------------|-------|
| ... | ... | ... | ... |

## Acquisition Notes
- GEO download hints
- Public portal download hints

## Preprocessing Recommendations
| Dataset | Preprocessing needs | Suggested skill / tool |
|---------|---------------------|------------------------|
| ... | ... | ... |

## Next Step
- Build the metric system in Step 4
```

## Usage

```bash
/bio-dataset-search "spatial multi-omics integration | paper_count: 5 | task_system: [task system from Step 2]"
```

## Notes

1. Prefer datasets already used in related work when possible.
2. Verify links before committing them to the benchmark plan.
3. Capture QC and annotation metadata whenever available.
4. Match datasets to tasks based on actual experimental needs, not just popularity.