Handle frontmatter in chunking and title extraction #551
@tobi pushed a fix for the seemingly flaky CI test.
Summary
This teaches qmd to treat leading frontmatter as document metadata instead of ordinary prose.
Before this change, smart chunking could split frontmatter across multiple chunks, and
`extractTitle()` ignored frontmatter entirely. That meant metadata could get mixed into semantic content chunks, and documents that explicitly declared a title in frontmatter still fell back to headings or filenames.

With this PR:
- leading frontmatter is kept together in its own chunk
- `extractTitle()` honors a frontmatter `title` when one is declared

What changed
Frontmatter-aware chunking
- `src/store.ts`

Frontmatter-aware title extraction
- `extractTitle()` now checks frontmatter first and returns `data.title` when present
- A `#` line inside frontmatter can no longer be treated as the document title

Parsing behavior
- Uses `gray-matter` for frontmatter parsing
- Alongside `gray-matter`'s default `---` delimiters, `+++ ... +++` frontmatter is also recognized for chunk separation
- If `gray-matter` does not decode the frontmatter into an object, qmd falls back to parsing the raw matter as YAML or JSON when looking for `title`

Tests
Added coverage for:
- `+++` frontmatter as its own chunk

Validated with:
- `bunx vitest run test/store.test.ts -t frontmatter`
- `bun run build`

Why this is useful
Many Markdown collections rely on frontmatter for canonical titles and metadata. Keeping that metadata in a dedicated chunk keeps it out of the semantic content chunks used for retrieval, and honoring the frontmatter
`title` makes qmd’s document display and title behavior line up better with how static-site generators and note-taking tools already structure documents.
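The behavior described under "What changed" can be illustrated with a small, dependency-free sketch. This is a hypothetical simplification, not the code in `src/store.ts`: qmd itself uses `gray-matter`, whereas the names `splitFrontmatter`, `chunkWithFrontmatter`, `frontmatterTitle`, and `extractTitleSketch` are invented here. The blank-line split stands in for qmd's smart chunker, and the line-based `title` match is only a rough stand-in for real YAML/TOML parsing.

```typescript
// Hypothetical, dependency-free sketch of frontmatter-aware chunking and
// title extraction. NOT the implementation in qmd's src/store.ts, which
// uses gray-matter; every name below is invented for illustration.

interface FrontmatterSplit {
  frontmatter: string | null; // raw leading block including its fences, or null
  body: string; // the rest of the document
}

const FENCES: readonly string[] = ["---", "+++"]; // YAML and TOML delimiters

// Separate a leading frontmatter block from the body. A block only counts
// if it starts on the first line and has a matching closing fence.
function splitFrontmatter(doc: string): FrontmatterSplit {
  const lines = doc.split(/\r?\n/);
  const fence = lines[0].trim();
  if (!FENCES.includes(fence)) return { frontmatter: null, body: doc };
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === fence) {
      return {
        frontmatter: lines.slice(0, i + 1).join("\n"),
        body: lines.slice(i + 1).join("\n"),
      };
    }
  }
  return { frontmatter: null, body: doc }; // unclosed: treat as ordinary prose
}

// Emit frontmatter as its own chunk; the blank-line split is a placeholder
// for qmd's real smart chunker.
function chunkWithFrontmatter(doc: string): string[] {
  const { frontmatter, body } = splitFrontmatter(doc);
  const bodyChunks = body
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
  return frontmatter === null ? bodyChunks : [frontmatter, ...bodyChunks];
}

// Read a top-level `title`: try JSON first, then a line-based match for
// YAML `title: ...` or TOML `title = "..."` (a rough stand-in for a parser).
function frontmatterTitle(raw: string): string | null {
  const inner = raw.split("\n").slice(1, -1).join("\n"); // drop the fences
  try {
    const parsed: unknown = JSON.parse(inner);
    if (typeof parsed === "object" && parsed !== null) {
      const title = (parsed as Record<string, unknown>).title;
      if (typeof title === "string" && title.trim() !== "") return title.trim();
    }
  } catch {
    // Not JSON; fall through to the YAML/TOML line match.
  }
  const match = inner.match(/^title[ \t]*[:=][ \t]*(.+)$/m);
  if (match === null) return null;
  const value = match[1].trim().replace(/^(["'])(.*)\1$/, "$2"); // strip quotes
  return value === "" ? null : value;
}

// Frontmatter title first, then the first H1 in the body (so a `#` line
// inside frontmatter is never considered), then the filename stem.
function extractTitleSketch(doc: string, filename: string): string {
  const { frontmatter, body } = splitFrontmatter(doc);
  const fromFrontmatter = frontmatter === null ? null : frontmatterTitle(frontmatter);
  if (fromFrontmatter !== null) return fromFrontmatter;
  const heading = body.match(/^#[ \t]+(.+)$/m);
  if (heading !== null) return heading[1].trim();
  return filename.replace(/\.[^.]+$/, "");
}

const sample = [
  "+++",
  'title = "TOML Title"',
  "# a TOML comment, not a heading",
  "+++",
  "# Body Heading",
  "",
  "First paragraph.",
].join("\n");

console.log(chunkWithFrontmatter(sample).length); // 3
console.log(extractTitleSketch(sample, "notes.md")); // TOML Title
```

Requiring a matching closing fence means a stray leading `---` with no partner stays in the body as ordinary prose, and scanning only the body for headings is what keeps a TOML or YAML `#` comment from being picked up as the title.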