MedAgentShield is an open-source evaluation and governance toolkit for healthcare LLM agents. It combines regulatory evidence mapping with runtime safety testing for prompt injection, PHI leakage, unsafe tool use, and canary-token exposure.
Healthcare LLM systems sit at the intersection of clinical risk, privacy, security, and AI governance. A deployment may need to reason about HIPAA technical safeguards, FDA expectations for AI-enabled medical devices, NIST AI RMF risk management, ISO/IEC 42001 management-system controls, EU AI Act high-risk AI obligations, and SOC 2 evidence requirements.
MedAgentShield provides a reference implementation for turning those concerns into auditable engineering artifacts:
- a healthcare LLM reference architecture;
- a STRIDE-AI threat model;
- a multi-framework control mapping;
- a NIST SP 800-30 style risk register;
- a Python CLI scanner for deployment-governance evidence;
- a runtime sandbox for LLM-agent safety evaluation.
The package and CLI are currently named healthguard for compatibility with
the existing Python module layout.
The static scanner reads a YAML deployment profile and emits a Markdown evidence report. The report groups findings by NIST AI RMF function and includes cross-references to relevant healthcare, security, and AI-governance frameworks.
The scanner includes 20 implemented checks covering:
- PHI redaction posture;
- prompt-injection canary configuration;
- IAM least privilege;
- encryption in transit and at rest;
- audit-log integrity;
- model provenance;
- output PII leakage controls;
- toxicity, bias, and red-team evidence;
- human oversight and incident response;
- data lineage and model-card documentation;
- vendor risk, change management, breach notification, and access review.
The sandbox evaluates agent behavior under healthcare-specific runtime risks:
- indirect prompt injection from retrieved clinical context;
- unsafe tool use, including external egress attempts;
- PHI-like identifiers in tool arguments or final answers;
- canary-token leakage;
- excessive tool-call attempts;
- malformed security policy.
It supports three adapter modes:
replay: evaluate declared scenario traces;heuristic: use a deterministic unsafe baseline agent for CI-safe tests;openai-compatible: call a live OpenAI-compatible chat-completions endpoint and evaluate the model-generated tool calls through the same policy monitor.
The repository uses a hypothetical healthcare LLM system called MedAssist as the reference system. It is modeled as a clinician-facing assistant that processes de-identified patient summaries, invokes backend services, and generates clinician-facing output.
flowchart LR
Clinician["Clinician UI"] -->|"OIDC + WAF"| APIGW["API Gateway"]
APIGW --> Lambda["LLM Orchestrator"]
Lambda --> Redactor["PHI Redaction Service"]
Redactor --> Model["Foundation Model"]
Model --> Lambda
Lambda --> Logs["KMS-encrypted Audit Logs"]
Lambda --> Evidence["Versioned Evidence Store"]
Lambda --> Clinician
The architecture is intentionally realistic rather than product-specific. It is designed to make security, privacy, and governance controls concrete enough to test and document.
MedAgentShield/
README.md
CHANGELOG.md
CONTRIBUTING.md
LICENSE
pyproject.toml
docs/
architecture.md
threat_model.md
control_mapping.md
risk_register.md
agent_sandbox.md
research_protocol.md
examples/
sample_config.yaml
system_prompt.md
agent_scenario_prompt_injection.yaml
sandbox_scenarios/
healthguard/
cli.py
config.py
report.py
checks/
sandbox/
tests/
git clone https://github.com/EM0V0/MedAgentShield.git
cd MedAgentShield
pip install -e ".[dev]"Run the sample governance scan:
healthguard scan examples/sample_config.yaml -o evidence_report.mdList registered checks:
healthguard list-checksThe generated report contains:
- executive summary;
- per-check PASS/FAIL status;
- severity;
- evidence details;
- remediation guidance;
- framework references.
Replay a declared trace without executing tools:
healthguard sandbox run examples/agent_scenario_prompt_injection.yaml \
-o sandbox_report.mdExecute only policy-approved mock tools:
healthguard sandbox execute examples/agent_scenario_prompt_injection.yaml \
-o sandbox_execution_report.mdRun the deterministic benchmark:
healthguard sandbox benchmark examples/sandbox_scenarios \
--adapter heuristic \
-o sandbox_benchmark_report.mdRun a bounded multi-turn loop:
healthguard sandbox loop examples/sandbox_scenarios/benign_clinician_notify.yaml \
--adapter heuristic \
--max-steps 4 \
-o sandbox_loop_report.mdMedAgentShield can evaluate live model behavior through an OpenAI-compatible chat-completions endpoint. API keys are read from environment variables and must not be committed to the repository.
PowerShell example:
$env:HEALTHGUARD_LLM_MODEL="openai/gpt-4.1-mini"
$env:HEALTHGUARD_LLM_BASE_URL="https://openrouter.ai/api/v1"
$env:HEALTHGUARD_LLM_API_KEY="your-api-key"
$env:HEALTHGUARD_LLM_TIMEOUT_SECONDS="60"
healthguard sandbox benchmark examples\sandbox_scenarios `
--adapter openai-compatible `
-o sandbox_live_benchmark.mdRemove local credentials after testing:
Remove-Item Env:HEALTHGUARD_LLM_API_KEY -ErrorAction SilentlyContinue
Remove-Item Env:HEALTHGUARD_LLM_MODEL -ErrorAction SilentlyContinue
Remove-Item Env:HEALTHGUARD_LLM_BASE_URL -ErrorAction SilentlyContinue
Remove-Item Env:HEALTHGUARD_LLM_TIMEOUT_SECONDS -ErrorAction SilentlyContinueThe live adapter asks the model to emit structured JSON with proposed tool calls and a final response. The model is not trusted to enforce policy. All proposed actions are evaluated by the sandbox monitor before any mock tool is executed.
pytest --cov-fail-under=95
ruff check
mypy healthguardRecent local validation:
- 157 pytest cases passed;
- total coverage above 95%;
- ruff passed;
- mypy passed.
Key project documents:
docs/architecture.md: reference healthcare LLM architecture;docs/threat_model.md: STRIDE-AI threat model;docs/control_mapping.md: multi-framework control matrix;docs/risk_register.md: NIST SP 800-30 style risk register;docs/agent_sandbox.md: sandbox design and roadmap;docs/research_protocol.md: reproducible runtime-safety evaluation protocol.
MedAgentShield is a research and reference implementation. It is suitable for studying governance evidence workflows and LLM-agent runtime safety patterns, but it is not a certified compliance product or a clinical validation system.
Current limitations:
- PHI detection is conservative and regex-based;
- sandbox tools are deterministic mocks, not real EHR/FHIR integrations;
- the bundled benchmark suite is a seed suite, not a comprehensive safety benchmark;
- live LLM results are model-dependent and may vary across runs;
- framework mappings require deployment-specific legal and audit review before production use.
| Project | Focus | Relationship |
|---|---|---|
| microsoft/agent-governance-toolkit | AI-agent governance and OWASP Agentic Top 10 | Complementary general-purpose agent governance toolkit. |
| emmanuelgjr/GenAI-Security-Crosswalk | GenAI security risk crosswalk | Related risk-mapping work; MedAgentShield is healthcare- and evidence-workflow focused. |
| The-Swarm-Corporation/MedGuard | HIPAA-oriented runtime library for LLM agents | Runtime protection layer; MedAgentShield focuses on evaluation and audit evidence. |
| microsoft/presidio | PII/PHI detection | Candidate backend for stronger PHI detection. |
| intuitem/ciso-assistant-community | GRC platform | Potential downstream destination for generated evidence artifacts. |
MedAgentShield is a reference toolkit. It does not provide legal advice, clinical safety validation, HIPAA certification, FDA clearance, SOC 2 attestation, ISO/IEC 42001 certification, or EU AI Act conformity assessment. Use qualified legal, clinical, security, and audit professionals for production deployments.