
Add source_environment field to media schema for cross-repo environmental linking #2

@realmarcin

Summary

Add source_environment field to CultureMech media records to enable environmental linking with CommunityMech and improve cross-repository queries.

Background

Triggered by: CommunityMech issue #24 (SPRUCE Peatland Community addition)

Currently, CommunityMech has robust environmental metadata using ENVO terms:

environment_term:
  preferred_term: peatland
  term:
    id: ENVO:00000044
    label: peatland

However, CultureMech has no corresponding field to link media to environments, creating a gap in cross-repository environmental linking.

Proposed Schema Addition

Add a source_environment field to media records:

# In culturemech.yaml schema
SourceEnvironmentDescriptor:
  is_a: Descriptor
  description: Environment from which target organisms originate
  attributes:
    preferred_term:
      description: Human-readable environment name
      required: true
      range: string
    term:
      description: ENVO term for the environment
      range: EnvironmentTerm
      recommended: true
      inlined: true
    notes:
      description: Additional environmental context
      range: string

# Add to CultureMedia class:
source_environment:
  description: Environment(s) from which target organisms originate
  range: SourceEnvironmentDescriptor
  multivalued: true
  inlined_as_list: true
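To show how the proposed classes might look once the Python dataclasses are regenerated (Phase 2), here is a minimal hand-written mirror. It is an assumption, not the output of LinkML's generator: the class names follow the YAML above, `CultureMedia` is trimmed to `id` and `name` (the real class has more slots), and `envo_ids()` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical hand-written mirror of the proposed schema; the dataclasses
# actually generated from culturemech.yaml may differ in names and details.


@dataclass
class EnvironmentTerm:
    id: str                      # ENVO CURIE, e.g. "ENVO:00000044"
    label: Optional[str] = None  # ENVO label, e.g. "peatland"


@dataclass
class SourceEnvironmentDescriptor:
    preferred_term: str                     # required in the proposal
    term: Optional[EnvironmentTerm] = None  # recommended, inlined
    notes: Optional[str] = None             # free-text context


@dataclass
class CultureMedia:
    id: str
    name: str
    # Optional and multivalued: default to a fresh empty list so that
    # existing records without the field still construct cleanly.
    source_environment: List[SourceEnvironmentDescriptor] = field(default_factory=list)

    def envo_ids(self) -> List[str]:
        """Return the ENVO CURIEs attached to this medium, skipping
        descriptors that only carry a preferred_term."""
        return [d.term.id for d in self.source_environment if d.term is not None]
```

Building the `CultureMech:010001` example with two descriptors and calling `envo_ids()` returns both CURIEs in order, while a legacy record created with only `id` and `name` gets an empty `source_environment`, which is the backward-compatibility property the Notes section asks for.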

Example Usage

id: CultureMech:010001
name: Acidic Peatland Medium
category: bacterial
medium_type: COMPLEX

source_environment:
  - preferred_term: peatland
    term:
      id: ENVO:00000044
      label: peatland
    notes: "Designed for acidophilic bacteria from northern boreal peatlands"
  
  - preferred_term: peat bog
    term:
      id: ENVO:00005773
      label: peat bog

ingredients:
  - preferred_term: Humic acid
    concentration:
      value: '0.5'
      unit: G_PER_L

Benefits

  1. Cross-repository queries: Find all media for specific environments (e.g., "all peatland media")
  2. Community curation: Auto-suggest relevant media when adding communities
  3. Environmental coverage: Track which environments have cultivation media
  4. FAIR data: Standardized environmental metadata across repos
  5. Discovery: Enable environment-based browsing and search

Use Cases

Use Case 1: Adding a Peatland Community

When curating a peatland community (like SPRUCE), automatically find and suggest:

  • Media tagged with source_environment: ENVO:00000044 (peatland)
  • Media for acidophilic organisms
  • Media for methanogenic archaea
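The first suggestion (media tagged with the community's ENVO term) reduces to an inverted index from ENVO CURIE to media IDs. The sketch below assumes records are loaded as plain dicts shaped like the YAML examples above; `build_envo_index` and `suggest_media` are hypothetical names. It does not cover the organism-based suggestions (acidophiles, methanogens), which would need organism metadata.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_envo_index(media: List[dict]) -> Dict[str, Set[str]]:
    """Map each ENVO CURIE to the IDs of media whose source_environment cites it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for medium in media:
        for env in medium.get("source_environment", []):
            term = env.get("term") or {}
            if term.get("id"):
                index[term["id"]].add(medium["id"])
    # Return a plain dict so later lookups cannot silently add keys.
    return dict(index)


def suggest_media(community: dict, index: Dict[str, Set[str]]) -> List[str]:
    """Suggest media for a community via its environment_term.term.id
    (the CommunityMech structure shown in the Background section)."""
    envo_id = community["environment_term"]["term"]["id"]
    return sorted(index.get(envo_id, set()))
```

Descriptors that have only a `preferred_term` are skipped, which is one practical reason to keep the ENVO `term` recommended rather than purely optional.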

Use Case 2: Environmental Coverage Analysis

Environment: Peatland (ENVO:00000044)
- Communities: 3 (CommunityMech)
- Media: 15 (CultureMech) ✅
- Coverage: HIGH

Environment: Deep-sea hydrothermal vent (ENVO:01000030)  
- Communities: 5 (CommunityMech)
- Media: 2 (CultureMech) ⚠️ LOW
- Action needed: Create more media for this environment
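A report like the one above could be computed by counting records per ENVO CURIE in each repository. The HIGH/LOW rule here is purely an assumption (at least one medium per community counts as HIGH); the issue does not define a threshold, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_by_envo(ids_per_record: List[List[str]]) -> Counter:
    """Count records per ENVO CURIE; a record listing the same term twice counts once."""
    counts: Counter = Counter()
    for ids in ids_per_record:
        counts.update(set(ids))
    return counts


def coverage_report(community_envs: List[List[str]],
                    media_envs: List[List[str]],
                    min_media_per_community: float = 1.0) -> Dict[str, dict]:
    """Per environment that has communities, report counts and a coverage level.

    Assumed rule (hypothetical): HIGH if media / communities >= the threshold.
    """
    communities = count_by_envo(community_envs)
    media = count_by_envo(media_envs)
    report = {}
    for envo_id, n_communities in communities.items():
        n_media = media.get(envo_id, 0)
        ratio = n_media / n_communities  # n_communities >= 1 here
        level = "HIGH" if ratio >= min_media_per_community else "LOW"
        report[envo_id] = {"communities": n_communities, "media": n_media, "coverage": level}
    return report
```

With 3 peatland communities and 15 peatland media, plus 5 vent communities and 2 vent media, this rule reproduces the HIGH and LOW labels in the example.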

Use Case 3: Cross-Repository SPARQL Query

SELECT ?community ?media ?ingredient
WHERE {
  ?community communitymech:environment_term ENVO:00000044 .
  ?media culturemech:source_environment ENVO:00000044 .
  ?ingredient mediaingredientmech:environmental_context ENVO:00000044 .
}
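The query is schematic: in both proposed structures the ENVO CURIE sits inside a nested `term` object (`environment_term.term.id` in CommunityMech, `source_environment[*].term.id` here), so a real RDF query would need to traverse those intermediate nodes. As a sketch of the intended semantics, the in-memory join below returns the same cross product of matching communities, media, and ingredients. The shape of MediaIngredientMech's `environmental_context` (a list of CURIEs) and the `id` keys are assumptions.

```python
from itertools import product
from typing import List, Optional, Set, Tuple


def community_envo_id(community: dict) -> Optional[str]:
    # CommunityMech: environment_term.term.id
    term = (community.get("environment_term") or {}).get("term") or {}
    return term.get("id")


def media_envo_ids(medium: dict) -> Set[str]:
    # CultureMech (proposed): source_environment[*].term.id
    return {
        env["term"]["id"]
        for env in medium.get("source_environment", [])
        if env.get("term") and env["term"].get("id")
    }


def cross_repo_join(envo_id: str, communities: List[dict], media: List[dict],
                    ingredients: List[dict]) -> List[Tuple[str, str, str]]:
    """Equivalent of the SPARQL pattern: every (community, medium, ingredient)
    combination where all three reference the same ENVO term."""
    matched_communities = [c["id"] for c in communities if community_envo_id(c) == envo_id]
    matched_media = [m["id"] for m in media if envo_id in media_envo_ids(m)]
    # Assumed shape for MediaIngredientMech: environmental_context is a list of CURIEs.
    matched_ingredients = [i["id"] for i in ingredients
                           if envo_id in i.get("environmental_context", [])]
    return list(product(matched_communities, matched_media, matched_ingredients))
```

Because the three triple patterns share no variables other than the fixed ENVO term, the result size is the product of the three match counts; if one repository has no match, the whole result is empty.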

Implementation Plan

Phase 1: Schema Design

  • Review and refine SourceEnvironmentDescriptor class
  • Decide on required vs. recommended fields
  • Define validation rules for ENVO terms
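One possible starting point for these rules, sketched under assumptions: ENVO IDs are treated as `ENVO:` followed by exactly eight digits (the pattern of every ID in this issue), a missing `term` is a warning because the proposal marks it recommended, and a missing `source_environment` is valid because the field is optional. This only checks syntax; it does not confirm that an ID exists in ENVO.

```python
import re
from typing import List, Tuple

# Assumed ID pattern: "ENVO:" plus eight ASCII digits, as in ENVO:00000044.
ENVO_CURIE = re.compile(r"ENVO:[0-9]{8}")


def validate_source_environment(record: dict) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a record's source_environment field."""
    errors: List[str] = []
    warnings: List[str] = []
    envs = record.get("source_environment")
    if envs is None:
        return errors, warnings  # optional field: absence is valid
    if not isinstance(envs, list):
        return ["source_environment must be a list"], warnings
    for i, env in enumerate(envs):
        if not env.get("preferred_term"):
            errors.append(f"source_environment[{i}]: missing required preferred_term")
        term = env.get("term")
        if term is None:
            warnings.append(f"source_environment[{i}]: no ENVO term (recommended)")
            continue
        curie = str(term.get("id", ""))
        if not ENVO_CURIE.fullmatch(curie):
            errors.append(f"source_environment[{i}]: {curie!r} is not an ENVO CURIE")
    return errors, warnings
```

Splitting errors from warnings mirrors the required vs. recommended distinction, so a pipeline can fail on the first list and only report the second.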

Phase 2: Schema Migration

  • Update culturemech.yaml schema
  • Regenerate Python dataclasses
  • Update validation pipelines
  • Create migration documentation

Phase 3: Data Curation (Optional)

  • Automated: Infer environments from organism metadata in existing media
  • LLM-assisted: Suggest ENVO terms based on media names, ingredients, notes
  • Manual: Curate high-priority environments (peatland, marine, soil, gut)
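As a rough baseline for the automated pass, the sketch below uses a simple keyword heuristic on media names and notes. It is a plainly simpler stand-in, not the organism-metadata inference or the LLM-assisted approach listed above. The keyword map is hypothetical and seeded only with the two terms from the example record; its output is meant for curator review, not direct import.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword map, seeded with the terms used in the example record.
KEYWORD_TO_ENVO: Dict[str, Tuple[str, str]] = {
    "peatland": ("ENVO:00000044", "peatland"),
    "peat bog": ("ENVO:00005773", "peat bog"),
}


def suggest_source_environments(medium: dict,
                                keyword_map: Dict[str, Tuple[str, str]] = KEYWORD_TO_ENVO
                                ) -> List[dict]:
    """Propose source_environment entries from keywords in name/notes,
    skipping ENVO terms the record already has."""
    text = " ".join(str(medium.get(key, "")) for key in ("name", "notes")).lower()
    existing = {(env.get("term") or {}).get("id")
                for env in medium.get("source_environment", [])}
    suggestions = []
    for keyword, (envo_id, label) in keyword_map.items():
        if keyword in text and envo_id not in existing:
            suggestions.append({"preferred_term": label,
                                "term": {"id": envo_id, "label": label}})
            existing.add(envo_id)  # avoid duplicate suggestions
    return suggestions
```

Substring matching is deliberately crude (for example, "peatland" will not match "peat soil"), which is where the LLM-assisted and manual passes would take over.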

Notes

  • The ENVO prefix is already declared in the CultureMech schema (ENVO: http://purl.obolibrary.org/obo/ENVO_)
  • Field should be optional to avoid breaking existing records
  • Field is multivalued (media can be relevant to multiple environments)
  • Coordinated with MediaIngredientMech issue for environmental_context field
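Cross-repository matching only works if every repository expands `ENVO:` CURIEs to the same IRIs. The sketch below uses the prefix stated in the notes (`http://purl.obolibrary.org/obo/ENVO_`); `expand_curie` and `compress_iri` are hypothetical helper names, not an existing CultureMech API.

```python
from typing import Dict

# Prefix taken from the CultureMech schema note above.
PREFIXES: Dict[str, str] = {"ENVO": "http://purl.obolibrary.org/obo/ENVO_"}


def expand_curie(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn a CURIE such as 'ENVO:00000044' into its full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes or not local:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return prefixes[prefix] + local


def compress_iri(iri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Inverse of expand_curie for IRIs under a known prefix."""
    for prefix, base in prefixes.items():
        if iri.startswith(base) and len(iri) > len(base):
            return f"{prefix}:{iri[len(base):]}"
    raise ValueError(f"no known prefix for IRI: {iri!r}")
```

For example, `expand_curie("ENVO:00000044")` yields `http://purl.obolibrary.org/obo/ENVO_00000044`, the IRI that a SPARQL `ENVO:00000044` resolves to under the same prefix declaration.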

Related Issues

  • MediaIngredientMech: [Issue for environmental_context field]
  • CommunityMech: [Issue for cross-repo coordination]
