Skip to content

feat: add Obsidian vault search support (--obsidian flag)#20

Merged
sinzin91 merged 3 commits intomainfrom
feat/obsidian-search
Mar 23, 2026
Merged

feat: add Obsidian vault search support (--obsidian flag)#20
sinzin91 merged 3 commits intomainfrom
feat/obsidian-search

Conversation

@sinzin91
Copy link
Copy Markdown
Owner

Summary

  • Add --obsidian <path> flag to search markdown files in Obsidian vaults
  • Uses ripgrep for fast search with pure Rust fallback when rg unavailable
  • Skips hidden directories (.obsidian, .git, .trash)
  • Supports AND semantics (all query terms must match)

Test plan

  • All 22 integration tests pass
  • Manual testing with fixture vault
  • CI build passes

🤖 Generated with Claude Code

Add ability to search markdown files in Obsidian vaults with a new
--obsidian <path> flag. Uses ripgrep for fast search with pure Rust
fallback. Skips hidden directories (.obsidian, .git, .trash).

- Add ObsidianMatch struct and search_obsidian/search_obsidian_rust functions
- Add print_obsidian_results for formatted output
- Add test fixtures and 6 integration tests
- Support AND semantics (all query terms must match)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@sinzin91
Copy link
Copy Markdown
Owner Author

sinzin91 commented Mar 23, 2026

Automated Analysis Results

Verdict: FAIL
Model: gpt-5.4 (unknown)
Parser: gemini/gemini-3-flash-preview
Reviewing: #20 @ e711241 (feat/obsidian-search)

Summary

The PR contains two blocker-level issues: a fragile ripgrep output parser that fails on paths containing colons (common on Windows) and a discrepancy in search semantics where multi-term AND logic is restricted to single lines rather than whole files. Several nits regarding path validation and test coverage were also identified.

Required Changes

  • src/main.rs:search_obsidian Ripgrep output parsing breaks on paths containing colons. The code uses line.find(':'), which fails for Windows absolute paths or vault paths with colons. 3fb2063c27ba
    Fix: Use ripgrep's --json output and parse structured records, or use --null separators for safe parsing.
  • src/main.rs:search_obsidian Ripgrep search and Rust fallback only match AND terms if they appear on the same line, whereas users expect file-level matching. d44ad39509a6
    Fix: Decide the intended contract. If file-level, scan entire file contents for all terms then emit relevant snippets.

Follow-ups

  • [NIT] src/main.rs:find_markdown_files Case-sensitive .md extension check skips valid files like README.MD. 4fc382bd0d25
    Fix: Normalize extension case before comparison in the Rust fallback logic.
  • [NIT] src/main.rs:main Vault path validation only checks for existence, not if the path is actually a directory. 5a75a6b01d39
    Fix: Validate vault_path.is_dir() and emit a specific error if it is a file.
  • [NIT] src/main.rs:search_obsidian Ripgrep command may search hidden items differently than the manual fallback traversal. fa8ee8d6b121
    Fix: Sync the traversal policies between the two paths exactly.
  • [NIT] tests/integration_tests.rs No explicit test ensures parity between the ripgrep execution path and the Rust fallback path. c2f222ec820f
    Fix: Add tests that force fallback mode and compare outputs to the ripgrep mode.
  • [NIT] tests/integration_tests.rs Tests do not cover vault paths containing colons, masking the parsing bug. 8258001cd8ab
    Fix: Add unit tests for the parser or use structured JSON parsing.
  • [NIT] tests/integration_tests.rs:obsidian_search Tests do not disambiguate if AND terms must be on the same line or just in the same file. b912b16f5bc2
    Fix: Add a fixture where terms appear on separate lines to define expected behavior.
  • [NIT] tests/integration_tests.rs Missing negative test for passing a file instead of a directory to the vault path argument. 17e46d82948d
    Fix: Add a test using a markdown file as the --obsidian argument.
  • [OPTIONAL] src/main.rs:find_markdown_files Hidden-file behavior is inconsistent; hidden directories are skipped but hidden files in visible directories are searched. da60610ec6cb
    Fix: Either document this explicitly or skip dotfiles for consistency.
  • [OPTIONAL] src/main.rs:find_markdown_files Potential performance issue on large vaults in fallback mode due to collecting all paths in memory before searching. 994e03933f03
    Fix: Search while walking the directory tree to allow for streaming results.
  • [OPTIONAL] src/main.rs The constant MAX_MATCHES_PER_SESSION is reused for Obsidian file matches, which is naming-wise misleading. 34d001c3612f
    Fix: Rename to MAX_MATCHES_PER_SOURCE or introduce a separate constant.
  • [FYI] src/main.rs:print_obsidian_results format_project_path usage for arbitrary vault files might be semantically confusing. d67548ce7f46
    Fix: No action required unless a more neutral helper name is desired.

Security Notes

  • None noted.

Tool Notes

  • Warning: gpt-5.4 does not support 'reasoning_effort', ignoring.
  • Warning: Repo-map can't include Cargo.lock (deleted from file system but not git?)

Misc

The reviewer suggests the README documentation implies file-level search which contradicts the line-level implementation.


Generated by passfail.py
Time: 1m 0s
passfail: 09a07151 (2026-03-09T17:17:21-04:00)

passfail:structured-data
{
  "review": {
    "verdict": "FAIL",
    "summary": "The PR contains two blocker-level issues: a fragile ripgrep output parser that fails on paths containing colons (common on Windows) and a discrepancy in search semantics where multi-term AND logic is restricted to single lines rather than whole files. Several nits regarding path validation and test coverage were also identified.",
    "relevant_files": [
      "src/main.rs",
      "tests/integration_tests.rs",
      "tests/fixtures/obsidian-vault/Projects.md",
      "Cargo.toml",
      "README.md",
      "tests/fixtures/obsidian-vault/Daily Notes.md"
    ],
    "findings": [
      {
        "id": "F1",
        "file": "src/main.rs",
        "line_range": "search_obsidian",
        "issue": "Ripgrep output parsing breaks on paths containing colons. The code uses line.find(':'), which fails for Windows absolute paths or vault paths with colons.",
        "level": "Blocker",
        "fix_hint": "Use ripgrep's --json output and parse structured records, or use --null separators for safe parsing."
      },
      {
        "id": "F2",
        "file": "src/main.rs",
        "line_range": "search_obsidian",
        "issue": "Ripgrep search and Rust fallback only match AND terms if they appear on the same line, whereas users expect file-level matching.",
        "level": "Blocker",
        "fix_hint": "Decide the intended contract. If file-level, scan entire file contents for all terms then emit relevant snippets."
      },
      {
        "id": "F3",
        "file": "src/main.rs",
        "line_range": "find_markdown_files",
        "issue": "Case-sensitive .md extension check skips valid files like README.MD.",
        "level": "Nit",
        "fix_hint": "Normalize extension case before comparison in the Rust fallback logic."
      },
      {
        "id": "F4",
        "file": "src/main.rs",
        "line_range": "find_markdown_files",
        "issue": "Hidden-file behavior is inconsistent; hidden directories are skipped but hidden files in visible directories are searched.",
        "level": "Optional",
        "fix_hint": "Either document this explicitly or skip dotfiles for consistency."
      },
      {
        "id": "F5",
        "file": "src/main.rs",
        "line_range": "main",
        "issue": "Vault path validation only checks for existence, not if the path is actually a directory.",
        "level": "Nit",
        "fix_hint": "Validate vault_path.is_dir() and emit a specific error if it is a file."
      },
      {
        "id": "F6",
        "file": "src/main.rs",
        "line_range": "search_obsidian",
        "issue": "Ripgrep command may search hidden items differently than the manual fallback traversal.",
        "level": "Nit",
        "fix_hint": "Sync the traversal policies between the two paths exactly."
      },
      {
        "id": "F7",
        "file": "tests/integration_tests.rs",
        "line_range": null,
        "issue": "No explicit test ensures parity between the ripgrep execution path and the Rust fallback path.",
        "level": "Nit",
        "fix_hint": "Add tests that force fallback mode and compare outputs to the ripgrep mode."
      },
      {
        "id": "F8",
        "file": "tests/integration_tests.rs",
        "line_range": null,
        "issue": "Tests do not cover vault paths containing colons, masking the parsing bug.",
        "level": "Nit",
        "fix_hint": "Add unit tests for the parser or use structured JSON parsing."
      },
      {
        "id": "F9",
        "file": "tests/integration_tests.rs",
        "line_range": "obsidian_search",
        "issue": "Tests do not disambiguate if AND terms must be on the same line or just in the same file.",
        "level": "Nit",
        "fix_hint": "Add a fixture where terms appear on separate lines to define expected behavior."
      },
      {
        "id": "F10",
        "file": "src/main.rs",
        "line_range": "find_markdown_files",
        "issue": "Potential performance issue on large vaults in fallback mode due to collecting all paths in memory before searching.",
        "level": "Optional",
        "fix_hint": "Search while walking the directory tree to allow for streaming results."
      },
      {
        "id": "F11",
        "file": "src/main.rs",
        "line_range": null,
        "issue": "The constant MAX_MATCHES_PER_SESSION is reused for Obsidian file matches, which is naming-wise misleading.",
        "level": "Optional",
        "fix_hint": "Rename to MAX_MATCHES_PER_SOURCE or introduce a separate constant."
      },
      {
        "id": "F12",
        "file": "src/main.rs",
        "line_range": "print_obsidian_results",
        "issue": "format_project_path usage for arbitrary vault files might be semantically confusing.",
        "level": "FYI",
        "fix_hint": "No action required unless a more neutral helper name is desired."
      },
      {
        "id": "F13",
        "file": "tests/integration_tests.rs",
        "line_range": null,
        "issue": "Missing negative test for passing a file instead of a directory to the vault path argument.",
        "level": "Nit",
        "fix_hint": "Add a test using a markdown file as the --obsidian argument."
      }
    ],
    "security_notes": [],
    "aider_warnings": [
      "gpt-5.4 does not support 'reasoning_effort', ignoring.",
      "Repo-map can't include Cargo.lock (deleted from file system but not git?)"
    ],
    "aider_errors": [],
    "misc": "The reviewer suggests the README documentation implies file-level search which contradicts the line-level implementation.",
    "is_infrastructure_error": false,
    "infrastructure_error_message": ""
  },
  "usage": {
    "total_tokens": "tokens unavailable",
    "total_cost": "$0.19",
    "notes": "Token counts are provided per-message but not summed as a total for the full run in the provided text; session cost is explicitly tracked and ended at $0.19."
  }
}

@sinzin91
Copy link
Copy Markdown
Owner Author

sinzin91 commented Mar 23, 2026

Automated Analysis Results

Verdict: FAIL
Model: openrouter/google/gemini-3.1-pro-preview (reasoning)
Parser: gemini/gemini-3-flash-preview
Reviewing: #20 @ e711241 (feat/obsidian-search)

Summary

The PR introduces a search feature using ripgrep that contains several critical flaws, including broken search logic for multi-word queries, potential command injection vulnerabilities, and performance inefficiencies in the Rust fallback. Additionally, the integration tests use weak assertions that allow them to pass even when zero matches are found.

Required Changes

  • src/main.rs:1333-1348 The rg command construction breaks intended AND semantics and is vulnerable to regex/argument injection. It treats multi-word queries as a single regex phrase and fails if the query starts with a dash or contains regex special characters. f5db94a48e0a
    Fix: Use --fixed-strings and pass each query term as a separate -e argument to the ripgrep command.
  • tests/integration_tests.rs:406-409 The test assertion for 'matches found' is too weak because the CLI output contains '0 matches found' on failure, which still satisfies the check. This hides the fact that the search is currently finding zero results. c8b010d89454
    Fix: Strengthen assertions to check for specific non-zero match counts or expected filenames like Projects.md.

Follow-ups

  • [NIT] src/main.rs:1376-1378 The parsing logic for ripgrep output assumes colons only separate path and line numbers, which will fail on Windows systems where absolute paths contain drive letter colons. 99aaeff5c8e1
    Fix: Consider splitting the output string from the right or utilizing ripgrep's --json output format for robust parsing.
  • [NIT] src/main.rs:38 The --obsidian flag is silently ignored if --openclaw is also provided due to the control flow logic in main(). e5a72aa29edf
    Fix: Add conflicts_with = "openclaw" to the obsidian argument definition in Clap.
  • [OPTIONAL] src/main.rs:1428-1431 High memory allocation overhead due to calling file_path.clone() for every single line of every markdown file processed in the Rust fallback loop. 61cc313539b2
    Fix: Refactor search_obsidian_rust to use a local counter outside the line loop instead of a HashMap keyed by PathBuf.

Security Notes

  • Query strings are passed directly to a shell command (rg), which could lead to command argument injection if users provide inputs starting with dashes.

Misc

The reviewer noted that some issues, like the Windows path parsing bug, exist elsewhere in the codebase but are being highlighted here because they affect the new functionality.


Generated by passfail.py
Time: 4m 0s
passfail: 09a07151 (2026-03-09T17:17:21-04:00)

passfail:structured-data
{
  "review": {
    "verdict": "FAIL",
    "summary": "The PR introduces a search feature using ripgrep that contains several critical flaws, including broken search logic for multi-word queries, potential command injection vulnerabilities, and performance inefficiencies in the Rust fallback. Additionally, the integration tests use weak assertions that allow them to pass even when zero matches are found.",
    "relevant_files": [
      "src/main.rs",
      "tests/integration_tests.rs",
      "tests/fixtures/obsidian-vault/Projects.md"
    ],
    "findings": [
      {
        "id": "F1",
        "file": "src/main.rs",
        "line_range": "1333-1348",
        "issue": "The rg command construction breaks intended AND semantics and is vulnerable to regex/argument injection. It treats multi-word queries as a single regex phrase and fails if the query starts with a dash or contains regex special characters.",
        "level": "Blocker",
        "fix_hint": "Use --fixed-strings and pass each query term as a separate -e argument to the ripgrep command."
      },
      {
        "id": "F2",
        "file": "tests/integration_tests.rs",
        "line_range": "406-409",
        "issue": "The test assertion for 'matches found' is too weak because the CLI output contains '0 matches found' on failure, which still satisfies the check. This hides the fact that the search is currently finding zero results.",
        "level": "Blocker",
        "fix_hint": "Strengthen assertions to check for specific non-zero match counts or expected filenames like Projects.md."
      },
      {
        "id": "F3",
        "file": "src/main.rs",
        "line_range": "1428-1431",
        "issue": "High memory allocation overhead due to calling file_path.clone() for every single line of every markdown file processed in the Rust fallback loop.",
        "level": "Optional",
        "fix_hint": "Refactor search_obsidian_rust to use a local counter outside the line loop instead of a HashMap keyed by PathBuf."
      },
      {
        "id": "F4",
        "file": "src/main.rs",
        "line_range": "1376-1378",
        "issue": "The parsing logic for ripgrep output assumes colons only separate path and line numbers, which will fail on Windows systems where absolute paths contain drive letter colons.",
        "level": "Nit",
        "fix_hint": "Consider splitting the output string from the right or utilizing ripgrep's --json output format for robust parsing."
      },
      {
        "id": "F5",
        "file": "src/main.rs",
        "line_range": "38",
        "issue": "The --obsidian flag is silently ignored if --openclaw is also provided due to the control flow logic in main().",
        "level": "Nit",
        "fix_hint": "Add conflicts_with = \"openclaw\" to the obsidian argument definition in Clap."
      }
    ],
    "security_notes": [
      "Query strings are passed directly to a shell command (rg), which could lead to command argument injection if users provide inputs starting with dashes."
    ],
    "aider_warnings": [],
    "aider_errors": [],
    "misc": "The reviewer noted that some issues, like the Windows path parsing bug, exist elsewhere in the codebase but are being highlighted here because they affect the new functionality.",
    "is_infrastructure_error": false,
    "infrastructure_error_message": ""
  },
  "usage": {
    "total_tokens": "28k sent, 3.0k received",
    "total_cost": "$0.31 session",
    "notes": "Token and cost values represent the full session summary provided in the final aider output."
  }
}

Addresses passfail F5: validate that the vault path is actually a
directory, not just that it exists.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@sinzin91
Copy link
Copy Markdown
Owner Author

Passfail Findings Classification

ID Finding Bucket Rationale
F1 Colon parsing in rg output GUARD Real risk on Windows paths with C:\. This is a macOS-focused personal tool, so lower priority. Would require --json output parsing - significant change for edge case.
F2 Line-level AND semantics WRONG This matches the existing Claude/OpenClaw search behavior exactly - grep-style line matching is intentional by design.
F3 Case-sensitive .md extension NOISE Edge case; .MD files are rare in Obsidian vaults.
F4 Hidden files vs directories NOISE Current behavior is correct - skip hidden dirs, search visible files. Standard Unix behavior.
F5 Missing is_dir() validation GUARD Fixed in 5b40c1d
F6 rg vs fallback hidden item handling NOISE Minor edge case, both implementations are reasonable.
F7 No rg/fallback parity test NOISE Overkill for a personal CLI tool.
F8 No colon path tests NOISE Related to F1, would be moot with JSON parsing.
F9 AND semantics test ambiguity WRONG Line-level AND is intentional, matches other search modes.
F10 Performance on large vaults NOISE Premature optimization for personal tool.
F11 Constant naming POLISH One-liner rename, skip.
F12 format_project_path naming NOISE FYI acknowledged, no action needed.
F13 Missing file-as-dir test GUARD Fixed in 5b40c1d

Summary

  • Fixed: F5, F13 (is_dir validation + test)
  • Intentional design: F2, F9 (line-level AND semantics matches existing Claude/OpenClaw behavior)
  • Deferred: F1 (Windows colon paths - would need --json parsing, low priority for macOS tool)
  • Skipped: F3, F4, F6-F8, F10-F12 (noise/polish)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@sinzin91 sinzin91 merged commit 1e0fd83 into main Mar 23, 2026
5 checks passed
@sinzin91 sinzin91 deleted the feat/obsidian-search branch March 23, 2026 03:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant