Add Obsidian vault URL ingestion #5

@MakiDevelop

What needs to be done

Extend ingest.py to scan an Obsidian vault directory and extract URLs from markdown files.

Which file(s) to modify

  • ingest.py — add Obsidian scanning logic

Proposed approach

  1. Add an --obsidian /path/to/vault flag
  2. Recursively find all .md files in the vault
  3. Extract URLs using the existing extract_urls() function
  4. Pass the results through normalize_url(), then ingest_urls(source="obsidian")
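
A minimal sketch of steps 2 and 3, for discussion only. The real extract_urls() signature is not shown in this issue, so the sketch uses a hypothetical stand-in (extract_urls_stub) and a hypothetical helper name (collect_vault_urls); it assumes extract_urls() takes a string and returns an iterable of URL strings.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical stand-in for the project's extract_urls(); the real one
# should be passed in instead. Assumed shape: str -> iterable of URLs.
_URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def extract_urls_stub(text: str) -> List[str]:
    """Find http(s) URLs, whether bare or inside [text](url) links."""
    return _URL_RE.findall(text)


def collect_vault_urls(
    vault: Path,
    extract: Callable[[str], Iterable[str]] = extract_urls_stub,
) -> List[str]:
    """Recursively read every .md file under `vault` and return the
    URLs found, de-duplicated, in first-seen order."""
    urls: List[str] = []
    seen = set()
    # sorted() keeps the scan order stable between runs
    for md_file in sorted(vault.rglob("*.md")):
        if not md_file.is_file():
            continue  # skip directories that happen to end in .md
        # errors="replace" avoids crashing on a note with bad encoding
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for url in extract(text):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls
```

The returned list would then go through normalize_url() and ingest_urls(source="obsidian") as in step 4; those calls are left out here because their signatures are not given in this issue.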

Example usage

python3 ingest.py --obsidian ~/Documents/MyVault
python3 ingest.py --obsidian ~/Documents/MyVault --after 2024-01-01

Hints

  • extract_urls() already handles URL extraction from text
  • Obsidian markdown links look like [text](https://...) or bare URLs
  • Consider adding an --after date filter to avoid re-processing old notes
  • Use pathlib.Path.rglob("*.md") for recursive file finding
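
One possible shape for the --obsidian and --after flags, as a sketch rather than a final design. The issue does not say which date --after compares against; this sketch assumes the file's modification time and a YYYY-MM-DD format (matching the example usage). The names parse_date, iter_notes, and build_parser are hypothetical.

```python
import argparse
from datetime import datetime
from pathlib import Path
from typing import Iterator, Optional


def parse_date(value: str) -> datetime:
    """Parse YYYY-MM-DD; a ValueError makes argparse report a usage error."""
    return datetime.strptime(value, "%Y-%m-%d")


def iter_notes(vault: Path, after: Optional[datetime] = None) -> Iterator[Path]:
    """Yield .md files under `vault`, optionally only those modified
    on or after `after` (assumption: mtime is the relevant date)."""
    for md_file in sorted(vault.rglob("*.md")):
        if not md_file.is_file():
            continue
        if after is not None:
            modified = datetime.fromtimestamp(md_file.stat().st_mtime)
            if modified < after:
                continue  # older note, skip it
        yield md_file


def build_parser() -> argparse.ArgumentParser:
    """Only the two flags discussed in this issue; ingest.py's existing
    arguments are not shown here."""
    parser = argparse.ArgumentParser(description="Obsidian vault ingestion (sketch)")
    parser.add_argument("--obsidian", type=Path, metavar="VAULT",
                        help="path to an Obsidian vault directory")
    parser.add_argument("--after", type=parse_date, metavar="YYYY-MM-DD",
                        help="only scan notes modified on or after this date")
    return parser
```

Modification time is cheap to check but changes whenever a note is touched, so a note edited for unrelated reasons would be scanned again. That is probably acceptable if ingest_urls() already de-duplicates, which this issue does not confirm.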

Acceptance criteria

  • Recursively scans .md files
  • Extracts both [text](url) and bare URLs
  • Uses existing extract_urls() for consistency
  • Adds tests
