When querying Notion compliance pages (e.g. SOC 2 & HIPAA, Policy Central), PDF attachments hosted in Notion cannot be read or summarized. The Notion MCP tools return file reference metadata but not the document content, so questions that require information inside those PDFs hit a dead end.
- Notion search and fetch work for page text, but PDFs attached to pages (e.g. SOC 2 Type 2 reports, policy summaries) are opaque
- This blocks use cases like auditing compliance controls, answering security policy questions, or summarizing audit findings without a human manually reading the PDF
- Affects any Notion page that stores key information as PDF attachments rather than inline page content
Example: The SOC 2 & HIPAA page stores yearly SOC 2 reports as PDF files. Junior can see the file names but cannot extract or summarize their contents.
Options:
- Download and parse PDFs via the Notion file URL, then feed extracted text to the query pipeline
- Use an OCR/PDF extraction tool as a post-processing step after Notion fetch
- Investigate whether Notion's API exposes any content extraction for file blocks
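A minimal sketch of the first option, assuming the blocks have already been fetched as JSON from the Notion API's block-children endpoint and that the third-party `pypdf` package is available for text extraction. The function names (`pdf_urls_from_blocks`, `download`, `extract_pdf_text`) are hypothetical and not part of any existing pipeline. It is not a tested integration.

```python
import io
import urllib.parse
import urllib.request
from typing import Any


def pdf_urls_from_blocks(blocks: list[dict[str, Any]]) -> list[str]:
    """Collect download URLs for PDF attachments from Notion block objects."""
    urls = []
    for block in blocks:
        block_type = block.get("type")
        if block_type not in ("pdf", "file"):
            continue
        payload = block.get(block_type, {})
        # The payload's own "type" names the key holding the URL:
        # "file" (Notion-hosted) or "external".
        source = payload.get(payload.get("type"), {})
        url = source.get("url")
        if not url:
            continue
        # Generic "file" blocks can hold any attachment; keep only PDFs.
        path = urllib.parse.urlparse(url).path
        if block_type == "pdf" or path.lower().endswith(".pdf"):
            urls.append(url)
    return urls


def download(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the raw bytes behind a file URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def extract_pdf_text(data: bytes) -> str:
    """Extract the text layer of a PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party; imported lazily

    reader = PdfReader(io.BytesIO(data))
    # extract_text() may yield nothing for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Two caveats apply to this sketch. Notion-hosted file URLs are signed and expire, so the download should happen soon after the blocks are fetched instead of from a cached URL. Scanned reports without a text layer would come back empty from `extract_pdf_text`, which is where the OCR option would be needed.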
Action taken on behalf of David Cramer.