Skip to content

feat: implement end-to-end secure vault architecture with NoSQL, OTP-auth, and RBAC#94

Open
CodeByRachit wants to merge 24 commits intoKathiraveluLab:devfrom
CodeByRachit:feat-private-vault-auth
Open

feat: implement end-to-end secure vault architecture with NoSQL, OTP-auth, and RBAC#94
CodeByRachit wants to merge 24 commits intoKathiraveluLab:devfrom
CodeByRachit:feat-private-vault-auth

Conversation

@CodeByRachit
Copy link
Copy Markdown

@CodeByRachit CodeByRachit commented Mar 7, 2026

Behavioral Health Vault (BHV) - Core Security & Ingestion Engine

Overview

This Pull Request implements the high-performance, security-first architecture for the Behavioral Health Vault (BHV) as outlined in my GSoC '26 proposal. I have successfully migrated the legacy stack to a modern FastAPI/Flask + MongoDB hybrid engine, focusing on "Privacy by Default," Zero-Knowledge principles, and memory-efficient data handling.


Security & Privacy Suite (Zone 1 & 2)

  • Identity Shield (RBAC): Implemented a multi-tier Role-Based Access Control system (Owner, Admin, and User) using Flask-Login and SQLAlchemy.
  • Zero-Knowledge Vaulting: Integrated AES-256 (Fernet) At-Rest Encryption. Files are encrypted in-memory before storage; static directories contain only unreadable encrypted blobs.
  • Multi-Factor Authentication:
    • Email OTP: 6-digit secure verification for account creation and password resets.
    • TOTP 2FA: Support for Authenticator apps (Google/Microsoft) to secure high-stakes clinic accounts.
  • Hardened Auth: Implemented strict regex-based password validation, CSRF protection across all sensitive routes, and secure environment variable management.
  • Audit Transparency: Internal tracking mechanism to ensure accountability for all vault access (Ingestion/Deletion).

Performance & Engineering

  • Asynchronous Streaming Ingestion: Developed a 64KB chunked streaming pipeline using FastAPI and Motor (Async MongoDB) to bypass framework buffering.
  • Benchmarks (Local Environment):
    • Memory Footprint: Constant ~10MB RAM (Scalable for low-spec clinic hardware).
    • Ingestion Speed: 500MB (Upload + AES Encryption) in ~2.8s (~178 MB/s).
    • Retrieval Speed: 500MB (Download + Decryption) in ~0.46s (>1 GB/s).
  • Cryptographic Integrity: 100% bit-for-bit verification using SHA-256 hashing on ingestion.

UI/UX & Adaptive Design

  • Culturally-Adaptive Interface: System-wide Dark/Light mode support with smooth CSS View Transitions.
  • Dynamic Vault Gallery: Responsive grid layout featuring on-the-fly secure decryption for authorized owners.
  • Enhanced Profile Dashboard: A redesigned two-column management center for security settings and avatar customization.

Technical Stack & Dependencies

  • Backend: Python (Flask/FastAPI), Motor (Async MongoDB), Uvicorn.
  • Database: MongoDB (NoSQL Vault) & SQLite/SQLAlchemy (Relational Auth).
  • Security: Cryptography (Fernet), Authlib (OAuth), itsdangerous (Tokenization), Pillow (Metadata Stripping).
  • Frontend: Jinja2, Vanilla JS (Theme Engine/Previews), CSS3 Variables.

Changelog Summary

  1. Refactor: Migrated Relational User logic to link with NoSQL RecoveryEntries for data isolation.
  2. Feature: Added serve_file route for secure in-memory decryption loop.
  3. Fix: Resolved initial security vulnerabilities (Hardcoded keys, PRNG for OTP, CSRF gaps).
  4. Optimization: Implemented paginated Search API for metadata filtering without RAM spikes.
  5. Infrastructure: Enhanced .gitignore and .env.example for secure deployment.

Note for Mentors: As suggested by @mdxabu, I have consolidated my work into this single PR to provide a unified view of the BHV prototype development for GSoC evaluation.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Behavioral Health Vault (BHV) by introducing a comprehensive security and privacy suite. It establishes a robust user authentication system, ensuring that each user's sensitive data is securely isolated within their private vault. The changes also streamline account management through email-based OTP for password resets and email verification, alongside integrating Google OAuth for convenient and secure sign-ins. Furthermore, the user interface has been upgraded to provide a more secure and intuitive experience, complemented by crucial infrastructure updates to safeguard sensitive configuration files.

Highlights

  • Private User Vaults: Implemented user-specific data isolation by linking RecoveryEntry records to individual User accounts, ensuring users only access their own data.
  • Email OTP Verification: Introduced a 6-digit OTP system for secure password resets and email verification during the signup process.
  • Secure Login & Authentication: Added local email/password authentication with robust hashing, email verification, Google OAuth integration, password visibility toggles, and confirm password validation.
  • Infrastructure Security: Enhanced the .gitignore file to prevent sensitive environment variables (.env) and local database files (vault_core.db) from being committed to the repository.
  • Improved UI/UX: Developed new HTML templates for login, signup, upload, gallery, and a base layout, incorporating features like theme toggling, auto-dismissing alerts, and client-side file preview.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • .gitignore
    • Added entries for .env, .venv/, venv/, pycache/, *.pyc, instance/, .pytest_cache/, vault_core.db, and .DS_Store to prevent sensitive files and build artifacts from being tracked.
  • app.py
    • Added a new Flask application file containing all core logic.
    • Configured SQLAlchemy, Flask-Login, Authlib, and itsdangerous for authentication.
    • Implemented user registration, login, logout, email verification, OTP-based password reset, and Google OAuth.
    • Integrated file upload and deletion logic, ensuring user-specific data access.
    • Included image processing for cleaning image metadata and SHA256 hashing for file integrity.
    • Defined routes for welcome, login, signup, verify email, logout, send OTP, reset password, Google login/authorize, upload page, gallery page, ingest record, and delete record.
  • models.py
    • Defined a new User SQLAlchemy model with fields for name, email, password_hash, is_verified, and created_at.
    • Established a one-to-many relationship between User and RecoveryEntry to link entries to specific users.
    • Made password_hash nullable to support Google OAuth users.
  • requirements.txt
    • Added new Python package dependencies including Authlib, Flask-Login, Flask-SQLAlchemy, itsdangerous, pillow, and python-dotenv.
  • static/img/js/main.js
    • Added JavaScript for theme toggling (light/dark mode) with view transitions.
    • Implemented auto-dismissing alerts functionality.
    • Included client-side modal logic for confirming record deletions.
  • static/img/js/upload.js
    • Added JavaScript to handle client-side file selection and display previews for both images and generic files on the upload page.
    • Implemented functionality to clear the selected file and reset the preview.
  • templates/base.html
    • Created a new base HTML template with a responsive header, navigation links, user authentication status display (Login/Logout), and a theme toggle button.
    • Integrated a global alert display system for flash messages.
    • Included a reusable modal for delete confirmations.
    • Defined global CSS styles for consistent look and feel, including light/dark theme variables.
  • templates/gallery.html
    • Created a new HTML template to display user-specific RecoveryEntry records in a grid format.
    • Included features for displaying file previews (images vs. generic files), narrative text, file hash snippets, and creation dates.
    • Added download buttons and delete buttons with modal confirmation for each entry.
    • Applied CSS for card animations and responsive grid layout.
  • templates/login.html
    • Created a new HTML template for a dynamic authentication form supporting login, signup, and password reset workflows.
    • Implemented client-side JavaScript for toggling password visibility, validating password confirmation, and handling OTP sending via AJAX.
    • Included form fields for name, email, password, confirm password, and OTP.
  • templates/upload.html
    • Created a new HTML template for the file upload interface.
    • Designed a drag-and-drop-like upload zone with file preview capabilities.
    • Included a textarea for narrative input and a submit button to ingest records.
    • Integrated client-side JavaScript for file selection, preview display, and clearing the selected file.
  • templates/welcome.html
    • Created a new HTML template for the application's welcome page.
    • Features a hero section with a title and description.
    • Provides action cards linking to the upload and gallery pages.
  • validators.py
    • Added a new Python file containing utility functions for file validation and anonymization.
    • Implemented is_authorized_upload to check file extensions and size against defined limits (100MB).
    • Implemented anonymize_filename to generate secure, unique filenames using UUIDs.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant new features for user authentication, private data vaults, and security enhancements, covering email/password signup, Google OAuth, OTP-based password reset, and secure file handling. However, it introduces several critical and high-severity security vulnerabilities, including a hardcoded fallback for the SECRET_KEY that compromises session management, a complete lack of CSRF protection on sensitive POST routes, and the use of a non-cryptographically secure PRNG for OTP generation. The absence of rate limiting on the password reset endpoint further exacerbates the risk of brute-force attacks. Additionally, there are issues such as a merge conflict in .gitignore, a broken dependency in requirements.txt, duplicated JavaScript code in templates, and opportunities to make SMTP server configurations more flexible.

Comment thread .gitignore Outdated
Comment thread requirements.txt
Comment thread app.py Outdated
Comment thread .gitignore Outdated
Comment thread app.py
Comment thread app.py Outdated
Comment thread static/img/js/main.js
Comment thread templates/base.html
Comment thread templates/gallery.html Outdated
Comment thread templates/login.html Outdated
@mdxabu mdxabu added the on hold Not merging this PR now. label Mar 8, 2026
@CodeByRachit
Copy link
Copy Markdown
Author

@pradeeban @mdxabu Can you please review this pr

@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu Hi Sir, as suggested, I have closed the earlier PRs. I will continue pushing all further commits here. Please let me know if any changes are required. Thank you!

@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Added some new security features and UI updates to this PR!

🔐 Security Updates

  • 2FA (TOTP): Added a secure Two-Factor Authentication setup and login flow.
  • Password Management: Users can now update their passwords directly from the profile page.

🎨 UI/UX Improvements

  • New Profile Dashboard: Redesigned with a clean, two-column layout.
  • Smart Avatars: Automatically generates an initial-based avatar if the user hasn't uploaded a custom one.

Open to any feedback or suggestions!

@CodeByRachit
Copy link
Copy Markdown
Author

CodeByRachit commented Mar 10, 2026

@mdxabu @pradeeban Sir Today I implemented
Successfully implemented AES-256 At-Rest Encryption for the Vault!

🔑 What's New:

  • Cryptographic Locking: All uploaded files are now encrypted using the cryptography (Fernet) library before they touch the disk.
  • Encrypted Storage: If anyone accesses the static/img folder directly, the files are completely unreadable and scrambled.
  • Secure In-Memory Decryption: Added a new serve_file route that decrypts data on-the-fly only for the authenticated owner, ensuring data is never stored in its plain state.
  • Dependency Management: Updated requirements.txt to include the new security libraries.

The vault is now significantly more secure against server-side data breaches!

@CodeByRachit CodeByRachit requested a review from mdxabu March 11, 2026 11:53
@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Implemented a FastAPI-WSGI gateway with an asynchronous Motor/MongoDB layer for high-performance, non-blocking storage of unstructured narratives.
Secured the ingestion pipeline with AES-256 (Fernet) encryption, ensuring patient data is processed strictly in-memory and vaulted as encrypted blobs to fulfill Zero-Knowledge privacy standards.

@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Updated the core architecture to FastAPI/MongoDB with AES-256 encryption to complete the Zero-Knowledge privacy milestone; please review.

…ents

Implemented disk-spooling with explicit GC to maintain a 10MB RAM footprint during 100MB file uploads. Verified throughput at ~182MB/s.
Refactored ingestion to use a 64KB async generator. Verified 1GB throughput at ~185MB/s with 10MB RAM ceiling. Bypasses framework buffering to ensure stability on resource-constrained hardware.
Verified 500MB round-trip processing at >1GB/s with zero memory bloat and matching SHA-256 hashes.
@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban
I've pushed a major architectural update to this PR. I implemented a memory-efficient streaming pipeline to handle large file ingestion and retrieval without crashing the server.

Here are the benchmark results for the new end-to-end flow:
Memory Footprint: Constant ~10MB (bypassing framework buffering)
Ingestion Speed (Upload + AES-256): 500MB in ~2.8 seconds (~178 MB/s)
Retrieval Speed (Download + AES Decryption): 500MB in ~0.46 seconds (>1 GB/s)
Data Integrity: 100% bit-for-bit cryptographic verification (SHA-256)

Could you please re-review this PR when you have a moment? I made several core changes to support this streaming model.

@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Implemented a memory-safe Search API with database-level pagination to efficiently filter patient vault records without overloading server RAM.

@pradeeban
Copy link
Copy Markdown
Member

Good work. Tagged "on hold" per our BHV convention.

@CodeByRachit
Copy link
Copy Markdown
Author

CodeByRachit commented Mar 15, 2026

Good work. Tagged "on hold" per our BHV convention.

Thanks, @pradeeban I recall the 'on hold' convention from the recent BHV discussions.
Just to confirm: since @mdxabu previously asked me to consolidate all my work into this single PR, is it still okay for me to continue pushing my upcoming proof-of-concept commits here so you both can review them?

@CodeByRachit CodeByRachit changed the title feat: implement private user vaults, OTP reset, and secure login feat: implement end-to-end secure vault architecture with NoSQL, OTP-auth, and RBAC Mar 18, 2026
@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Added a secure email update flow to prevent Account Takeovers (ATO). > Users must now verify a 6-digit OTP sent to their new email address before the system officially updates their profile in the database.

@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Hardened the AES-CTR streaming pipeline with fail-fast key validation and an 11-case cryptographic Pytest suite.

@CodeByRachit
Copy link
Copy Markdown
Author

@pradeeban @mdxabu Following up on the authentication discussion #111 : you mentioned preferring a minimal, zero-bloat approach like Google OAuth. Just to clarify, does this mean we should completely discard a traditional email/password registration flow? Building an email confirmation loop would require integrating an SMTP service, so relying strictly on OAuth would definitely keep the deployment much lighter. Happy to implement whichever direction you prefer!

@pradeeban
Copy link
Copy Markdown
Member

@CodeByRachit Email/Password should be ok. What we don't want is another Clerk-style auth. A local API gateway solution would be far easier to manage.

@CodeByRachit
Copy link
Copy Markdown
Author

CodeByRachit commented Mar 26, 2026

@CodeByRachit Email/Password should be ok. What we don't want is another Clerk-style auth. A local API gateway solution would be far easier to manage.

@pradeeban Understood completely. I'll drop the external OAuth/third-party dependencies and stick to a self-contained Email/Password system using our own database and JWTs for the local API gateway. Does that align exactly with what you're looking for?

@pradeeban
Copy link
Copy Markdown
Member

We leave that up to you. :) Remember, we are evaluating the project proposal for an empty repository.

Please also refer to Acceptable-and-Ethical-AI-Use-Policy.md when drafting your GSoC proposal or GitHub conversations on the use of AI. We keep receiving AI-drafted proposals that we are ranking much lower than human-drafted proposals.

@CodeByRachit
Copy link
Copy Markdown
Author

@pradeeban Thanks for that information; I will use the local JWT and email/password method since I think that one makes the most sense for air-gapped clinics as well. I completely understand the AI policy and will write my final proposal with 100% original wording and an architecturally view. Thank you for your help.

@CodeByRachit
Copy link
Copy Markdown
Author

CodeByRachit commented Apr 1, 2026

@pradeeban @mdxabu I have submitted my idea proposal and kept only one single pr(this one ) so that it is easy for you to consolidating to see the prototype (recommended by @mdxabu ) Thank you so much for your guidance

@CodeByRachit
Copy link
Copy Markdown
Author

Hi @pradeeban @mdxabu , is it okay if I continue making small updates to refine the prototype while we wait for the results?

@pradeeban
Copy link
Copy Markdown
Member

@CodeByRachit, of course. Open source projects always appreciate volunteer contributions (although BHV is in an awkward spot. We had a similar situation in the Beehive in the past when it was an empty repository. But that time, the volume was more manageable. Now with BHV, there are several PRs on-hold. Going to be tricky).

@CodeByRachit
Copy link
Copy Markdown
Author

sure @pradeeban Since this is already quite large PR , what is the best way forward to avoid adding to the review bottleneck? Should I keep updating it, or start a new, smaller PR?

@pradeeban
Copy link
Copy Markdown
Member

@CodeByRachit You can keep updating this PR.

@CodeByRachit
Copy link
Copy Markdown
Author

Understood @pradeeban

@CodeByRachit
Copy link
Copy Markdown
Author

@pradeeban @mdxabu Implemented and passed a 39-case security test suite validating our core crypto engine, auth flows, and RBAC logic.

Copilot AI review requested due to automatic review settings April 7, 2026 15:56
@CodeByRachit CodeByRachit force-pushed the feat-private-vault-auth branch from 1c9b2f1 to 94befc5 Compare April 7, 2026 15:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a security-focused “Behavioral Health Vault” prototype that combines Flask (auth/RBAC/UI) with FastAPI (streaming ingestion/download) and MongoDB (vault storage), plus a refreshed UI for upload/gallery/admin dashboards.

Changes:

  • Added Flask auth system features (email verification, OTP reset, TOTP 2FA) and RBAC dashboards (admin/owner), plus secure upload/gallery routes.
  • Implemented crypto utilities for Fernet at-rest encryption and AES-CTR streaming encryption/decryption with integrity hashing.
  • Added extensive UI templates and basic pytest suites for auth and crypto behaviors, plus dependency and ignore-file updates.

Reviewed changes

Copilot reviewed 19 out of 22 changed files in this pull request and generated 26 comments.

Show a summary per file
File Description
app.py Core Flask + FastAPI hybrid app: auth, RBAC, OTP/TOTP flows, ingestion/gallery/vault endpoints, and security headers
models.py SQLAlchemy user/entry models + Motor-based MongoDB persistence and GridFS streaming helpers
crypto.py Fernet + AES-CTR streaming crypto helpers and chunked ingestion helper
validators.py File upload validation and filename anonymization
requirements.txt Pinned Python dependencies for Flask/FastAPI/Mongo/crypto stack
test_auth.py Flask auth integration/security-oriented tests
test_crypto.py Crypto integrity and security regression tests
templates/base.html Base layout, navigation, theme toggle, and delete modal logic
templates/welcome.html Landing page UI
templates/login.html Login/signup/forgot-password UI with client-side validation and OTP request
templates/login_2fa.html TOTP prompt UI
templates/profile.html Profile dashboard, avatar upload, password change, and email-update OTP flow
templates/setup_2fa.html 2FA enrollment UI (QR + secret)
templates/upload.html Upload UI with file preview and narrative entry
templates/gallery.html Gallery UI with download/delete controls
templates/admin_dashboard.html Admin metrics + patient directory UI
templates/admin_view_vault.html Admin read-only view into a patient’s records
templates/owner_dashboard.html Owner system dashboard UI
static/img/js/upload.js Duplicated upload preview JS logic extracted to static file
static/img/js/main.js Duplicated base template JS logic extracted to static file
.gitignore Added local env/venv/cache/db ignores

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread app.py Outdated
Comment thread app.py
Comment thread app.py Outdated
Comment thread app.py
Comment thread app.py
Comment thread templates/base.html
Comment thread app.py
Comment thread app.py
Comment thread models.py
Comment thread app.py
@CodeByRachit
Copy link
Copy Markdown
Author

@pradeeban @mdxabu Sorry about the Copilot auto-review noise—an unexpected setting triggered it. I've implemented rate limiting for brute-force protection and hardened the authentication test suite with 20+ passing cases

@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Can you please review this pr again with all the changes made

@CodeByRachit
Copy link
Copy Markdown
Author

@mdxabu @pradeeban Strict backend rate-limiting to block brute-force attacks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

on hold Not merging this PR now.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants