What needs to be done
Extend ingest.py to scan an Obsidian vault directory and extract URLs from markdown files.
Which file(s) to modify
ingest.py — add Obsidian scanning logic
Proposed approach
- Add --obsidian /path/to/vault flag
- Recursively find all .md files in the vault
- Extract URLs using the existing extract_urls() function
- Pass through normalize_url() → ingest_urls(source="obsidian")
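The steps above could be sketched roughly as follows. The real extract_urls() and normalize_url() live in ingest.py and their signatures are not shown in this issue, so the two functions below are simplified stand-ins that only keep the example self-contained. collect_vault_urls() is a hypothetical helper name, not something that exists in the project.

```python
import re
from pathlib import Path
from typing import List

# Stand-in for ingest.py's extract_urls(); assumed shape: text in, list of URLs out.
_URL_RE = re.compile(r"https?://[^\s)\]>]+")


def extract_urls(text: str) -> List[str]:
    """Find http(s) URLs, whether bare or inside [text](url) links."""
    return _URL_RE.findall(text)


def normalize_url(url: str) -> str:
    """Stand-in for ingest.py's normalize_url(); the real one may do more."""
    return url.rstrip("/")


def collect_vault_urls(vault: Path) -> List[str]:
    """Return normalized, de-duplicated URLs from every .md file under vault."""
    seen = set()
    ordered: List[str] = []
    # rglob walks subdirectories too; sorting keeps the output order stable.
    for md_file in sorted(vault.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for url in extract_urls(text):
            normalized = normalize_url(url)
            if normalized not in seen:
                seen.add(normalized)
                ordered.append(normalized)
    return ordered
```

The resulting list would then be handed to ingest_urls(..., source="obsidian"). How ingest_urls() takes its URL argument is an assumption here and should be checked against the existing code.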
Example usage
python3 ingest.py --obsidian ~/Documents/MyVault
python3 ingest.py --obsidian ~/Documents/MyVault --after 2024-01-01
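One possible way to wire up these two flags with argparse is sketched below. build_parser() is a hypothetical name, and ingest.py may already define its own parser, in which case only the two add_argument() calls would be relevant. Parsing --after as an ISO date (YYYY-MM-DD) is an assumption based on the example above.

```python
import argparse
from datetime import date
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags proposed in this issue."""
    parser = argparse.ArgumentParser(prog="ingest.py")
    parser.add_argument(
        "--obsidian",
        type=Path,
        metavar="VAULT",
        help="path to an Obsidian vault to scan for URLs",
    )
    parser.add_argument(
        "--after",
        type=date.fromisoformat,  # argparse reports an error on an invalid date
        metavar="YYYY-MM-DD",
        help="only process notes newer than this date",
    )
    return parser
```

Both flags default to None when omitted, so existing invocations of ingest.py would be unaffected.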
Hints
- extract_urls() already handles URL extraction from text
- Obsidian markdown links look like [text](https://...) or bare URLs
- Consider adding --after date filter to avoid re-processing old notes
- Use pathlib.Path.rglob("*.md") for recursive file finding
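The issue does not say what the --after date is compared against. A minimal sketch, assuming it means the file's last-modified time and that notes modified on the given date itself are kept, might look like this. iter_notes() is a hypothetical helper name.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Iterator, Optional


def iter_notes(vault: Path, after: Optional[date] = None) -> Iterator[Path]:
    """Yield .md files under vault, skipping those last modified before `after`."""
    for md_file in sorted(vault.rglob("*.md")):
        if after is not None:
            # Assumption: "after" refers to the filesystem modification time.
            modified = datetime.fromtimestamp(md_file.stat().st_mtime).date()
            if modified < after:
                continue
        yield md_file
```

Modification times can change when a vault is synced or copied, so a date taken from note frontmatter could be a more reliable signal if the notes carry one.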
Acceptance criteria
- Recursively scans .md files
- Handles [text](url) and bare URLs
- Uses extract_urls() for consistency