Feature Proposal: FERPA-Compliant Content Handling for U.S. Educational Deployments #537
ashutoshrana
started this conversation in
Ideas
Background
OpenMAIC is actively used in U.S. educational institutions. Once a deployment involves students as users, even if they only interact with the classroom chat, it enters the scope of the Family Educational Rights and Privacy Act (FERPA, 20 U.S.C. § 1232g). FERPA's implementing regulations (34 CFR § 99.31) limit the conditions under which student record data can be disclosed to third parties without consent, and that includes AI/LLM providers. There is currently no FERPA handling in the generation or chat pipeline.
I'd like to propose adding a lightweight FERPA compliance layer and am happy to implement it if maintainers are interested.
The Problem in the Current Codebase
Two entry points in the current pipeline can inadvertently transmit FERPA-protected data to LLM providers:
1. app/api/generate-classroom/route.ts: the pdfContent field
The generation API accepts a pdfContent field alongside the lesson requirement. An educator generating a lesson from an uploaded PDF may accidentally upload a document containing student records (grade reports, disability accommodation files, class rosters with grades). That raw content flows directly into runClassroomGenerationJob() and eventually into LLM prompts via the generation pipeline. There is no scan step.
2. lib/orchestration/summarizers/conversation-summary.ts / message-converter.ts
Student chat messages are accumulated into state.messages and passed to LLM providers via convertMessagesToOpenAI(). If a student types a message containing their own student ID, disability status, or financial aid situation (common in an advising-style AI classroom), that PII flows into LLM API calls. FERPA's restrictions on disclosing personally identifiable information from education records apply here.
Proposed Implementation
A single optional middleware module, lib/ferpa/ferpa-filter.ts, with two functions: redactFerpaContent() and scanChatMessageForFerpaPii().
Insertion points (minimal, non-breaking):
- app/api/generate-classroom/route.ts: run redactFerpaContent(body.pdfContent) before passing to runClassroomGenerationJob. If redactions occur, log them and (optionally) surface a warning to the client.
- lib/orchestration/summarizers/message-converter.ts: run scanChatMessageForFerpaPii() on user message content in convertMessagesToOpenAI(), in warn-only mode by default so it does not break existing behavior.
Both are opt-in via environment variable (ENABLE_FERPA_FILTER=true) so non-U.S. deployments see zero overhead.
Detection Patterns
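To make the proposal concrete, here is a minimal sketch of what lib/ferpa/ferpa-filter.ts might contain. Only the function names redactFerpaContent and scanChatMessageForFerpaPii, the ENABLE_FERPA_FILTER variable, and the \d{3}-\d{2}-\d{4} pattern come from this proposal. The types, the pattern table, the word-boundary anchors, the placeholder format, and the isFerpaFilterEnabled helper are illustrative assumptions, not existing OpenMAIC code.

```typescript
// Hypothetical sketch of lib/ferpa/ferpa-filter.ts (not existing OpenMAIC code).

export interface FerpaPattern {
  label: string;
  regex: RegExp; // only the source is used; a fresh global copy is built per scan
}

// Only the SSN-style pattern is taken from the proposal. The \b anchors are an
// added assumption to avoid matching inside longer digit runs.
export const FERPA_PATTERNS: FerpaPattern[] = [
  { label: "SSN", regex: /\b\d{3}-\d{2}-\d{4}\b/ },
];

export interface FerpaFinding {
  label: string;
  index: number; // character offset of the match in the original text
}

export interface RedactionResult {
  text: string;
  findings: FerpaFinding[];
}

// Opt-in gate: enabled only when ENABLE_FERPA_FILTER is exactly "true".
// The environment is passed in so the function stays pure and easy to test.
export function isFerpaFilterEnabled(
  env: Record<string, string | undefined>,
): boolean {
  return env["ENABLE_FERPA_FILTER"] === "true";
}

// Warn-only scan: reports matches without modifying the message.
export function scanChatMessageForFerpaPii(content: string): FerpaFinding[] {
  const findings: FerpaFinding[] = [];
  for (const { label, regex } of FERPA_PATTERNS) {
    // Fresh global copy so lastIndex state never leaks between calls.
    const re = new RegExp(regex.source, "g");
    let match: RegExpExecArray | null;
    while ((match = re.exec(content)) !== null) {
      findings.push({ label, index: match.index });
    }
  }
  return findings;
}

// Replaces every match with a labeled placeholder and returns what was found.
export function redactFerpaContent(content: string): RedactionResult {
  const findings = scanChatMessageForFerpaPii(content);
  let text = content;
  for (const { label, regex } of FERPA_PATTERNS) {
    text = text.replace(new RegExp(regex.source, "g"), `[REDACTED:${label}]`);
  }
  return { text, findings };
}
```

Under these assumptions, the route handler would call isFerpaFilterEnabled(process.env) and, when it returns true, pass redactFerpaContent(body.pdfContent).text to runClassroomGenerationJob instead of the raw content. The message converter would only log the result of scanChatMessageForFerpaPii() and leave the message untouched, which keeps the chat path warn-only.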
FERPA-protected fields to detect (regex + heuristic):
\d{3}-\d{2}-\d{4}
Prior Art / Reference Implementation
I've implemented this pattern as an open-source Python package (haystack-ferpa-filter, PyPI v0.1.0) for Haystack AI pipelines, and have deployed a production version of this architecture in a FERPA-compliant enterprise RAG system. The TypeScript implementation proposed here would be a clean-room adaptation of the same pattern for OpenMAIC's Node.js/Next.js stack, not a port of the Python code.
Scope / What I'm NOT Proposing
Questions for Maintainers
Should this live in a lib/ferpa/ module (as proposed) or as a plugin/skill under the skills/ directory?
Happy to open a draft PR with the lib/ferpa/ferpa-filter.ts module and the two insertion points for review before doing anything more.