MedAgentShield

MedAgentShield is an open-source evaluation and governance toolkit for healthcare LLM agents. It combines regulatory evidence mapping with runtime safety testing for prompt injection, PHI leakage, unsafe tool use, and canary-token exposure.

Overview

Healthcare LLM systems sit at the intersection of clinical risk, privacy, security, and AI governance. A deployment may need to reason about HIPAA technical safeguards, FDA expectations for AI-enabled medical devices, NIST AI RMF risk management, ISO/IEC 42001 management-system controls, EU AI Act high-risk AI obligations, and SOC 2 evidence requirements.

MedAgentShield provides a reference implementation for turning those concerns into auditable engineering artifacts:

a healthcare LLM reference architecture;
a STRIDE-AI threat model;
a multi-framework control mapping;
a NIST SP 800-30 style risk register;
a Python CLI scanner for deployment-governance evidence;
a runtime sandbox for LLM-agent safety evaluation.

The package and CLI are currently named healthguard for compatibility with the existing Python module layout.

Capabilities

Governance Evidence Scanner

The static scanner reads a YAML deployment profile and emits a Markdown evidence report. The report groups findings by NIST AI RMF function and includes cross-references to relevant healthcare, security, and AI-governance frameworks.

The scanner includes 20 implemented checks covering:

PHI redaction posture;
prompt-injection canary configuration;
IAM least privilege;
encryption in transit and at rest;
audit-log integrity;
model provenance;
output PII leakage controls;
toxicity, bias, and red-team evidence;
human oversight and incident response;
data lineage and model-card documentation;
vendor risk, change management, breach notification, and access review.

LLM-Agent Safety Sandbox

The sandbox evaluates agent behavior under healthcare-specific runtime risks:

indirect prompt injection from retrieved clinical context;
unsafe tool use, including external egress attempts;
PHI-like identifiers in tool arguments or final answers;
canary-token leakage;
excessive tool-call attempts;
malformed security policy.

It supports three adapter modes:

replay: evaluate declared scenario traces;
heuristic: use a deterministic unsafe baseline agent for CI-safe tests;
openai-compatible: call a live OpenAI-compatible chat-completions endpoint and evaluate the model-generated tool calls through the same policy monitor.

Reference Architecture

The repository uses a hypothetical healthcare LLM system called MedAssist as the reference system. It is modeled as a clinician-facing assistant that processes de-identified patient summaries, invokes backend services, and generates clinician-facing output.

flowchart LR
    Clinician["Clinician UI"] -->|"OIDC + WAF"| APIGW["API Gateway"]
    APIGW --> Lambda["LLM Orchestrator"]
    Lambda --> Redactor["PHI Redaction Service"]
    Redactor --> Model["Foundation Model"]
    Model --> Lambda
    Lambda --> Logs["KMS-encrypted Audit Logs"]
    Lambda --> Evidence["Versioned Evidence Store"]
    Lambda --> Clinician

The architecture is intentionally realistic rather than product-specific. It is designed to make security, privacy, and governance controls concrete enough to test and document.

Repository Layout

MedAgentShield/
  README.md
  CHANGELOG.md
  CONTRIBUTING.md
  LICENSE
  pyproject.toml
  docs/
    architecture.md
    threat_model.md
    control_mapping.md
    risk_register.md
    agent_sandbox.md
    research_protocol.md
  examples/
    sample_config.yaml
    system_prompt.md
    agent_scenario_prompt_injection.yaml
    sandbox_scenarios/
  healthguard/
    cli.py
    config.py
    report.py
    checks/
    sandbox/
  tests/

Installation

git clone https://github.com/EM0V0/MedAgentShield.git
cd MedAgentShield
pip install -e ".[dev]"

Static Evidence Scan

Run the sample governance scan:

healthguard scan examples/sample_config.yaml -o evidence_report.md

List registered checks:

healthguard list-checks

The generated report contains:

executive summary;
per-check PASS/FAIL status;
severity;
evidence details;
remediation guidance;
framework references.

Sandbox Evaluation

Replay a declared trace without executing tools:

healthguard sandbox run examples/agent_scenario_prompt_injection.yaml \
  -o sandbox_report.md

Execute only policy-approved mock tools:

healthguard sandbox execute examples/agent_scenario_prompt_injection.yaml \
  -o sandbox_execution_report.md

Run the deterministic benchmark:

healthguard sandbox benchmark examples/sandbox_scenarios \
  --adapter heuristic \
  -o sandbox_benchmark_report.md

Run a bounded multi-turn loop:

healthguard sandbox loop examples/sandbox_scenarios/benign_clinician_notify.yaml \
  --adapter heuristic \
  --max-steps 4 \
  -o sandbox_loop_report.md

Live LLM Evaluation

MedAgentShield can evaluate live model behavior through an OpenAI-compatible chat-completions endpoint. API keys are read from environment variables and must not be committed to the repository.

PowerShell example:

$env:HEALTHGUARD_LLM_MODEL="openai/gpt-4.1-mini"
$env:HEALTHGUARD_LLM_BASE_URL="https://openrouter.ai/api/v1"
$env:HEALTHGUARD_LLM_API_KEY="your-api-key"
$env:HEALTHGUARD_LLM_TIMEOUT_SECONDS="60"

healthguard sandbox benchmark examples\sandbox_scenarios `
  --adapter openai-compatible `
  -o sandbox_live_benchmark.md

Remove local credentials after testing:

Remove-Item Env:HEALTHGUARD_LLM_API_KEY -ErrorAction SilentlyContinue
Remove-Item Env:HEALTHGUARD_LLM_MODEL -ErrorAction SilentlyContinue
Remove-Item Env:HEALTHGUARD_LLM_BASE_URL -ErrorAction SilentlyContinue
Remove-Item Env:HEALTHGUARD_LLM_TIMEOUT_SECONDS -ErrorAction SilentlyContinue

The live adapter asks the model to emit structured JSON with proposed tool calls and a final response. The model is not trusted to enforce policy. All proposed actions are evaluated by the sandbox monitor before any mock tool is executed.

Testing

pytest --cov-fail-under=95
ruff check
mypy healthguard

Recent local validation:

157 pytest cases passed;
total coverage above 95%;
ruff passed;
mypy passed.

Documentation

Key project documents:

docs/architecture.md: reference healthcare LLM architecture;
docs/threat_model.md: STRIDE-AI threat model;
docs/control_mapping.md: multi-framework control matrix;
docs/risk_register.md: NIST SP 800-30 style risk register;
docs/agent_sandbox.md: sandbox design and roadmap;
docs/research_protocol.md: reproducible runtime-safety evaluation protocol.

Development Status

MedAgentShield is a research and reference implementation. It is suitable for studying governance evidence workflows and LLM-agent runtime safety patterns, but it is not a certified compliance product or a clinical validation system.

Current limitations:

PHI detection is conservative and regex-based;
sandbox tools are deterministic mocks, not real EHR/FHIR integrations;
the bundled benchmark suite is a seed suite, not a comprehensive safety benchmark;
live LLM results are model-dependent and may vary across runs;
framework mappings require deployment-specific legal and audit review before production use.

Related Work

Project	Focus	Relationship
microsoft/agent-governance-toolkit	AI-agent governance and OWASP Agentic Top 10	Complementary general-purpose agent governance toolkit.
emmanuelgjr/GenAI-Security-Crosswalk	GenAI security risk crosswalk	Related risk-mapping work; MedAgentShield is healthcare- and evidence-workflow focused.
The-Swarm-Corporation/MedGuard	HIPAA-oriented runtime library for LLM agents	Runtime protection layer; MedAgentShield focuses on evaluation and audit evidence.
microsoft/presidio	PII/PHI detection	Candidate backend for stronger PHI detection.
intuitem/ciso-assistant-community	GRC platform	Potential downstream destination for generated evidence artifacts.

Disclaimer

MedAgentShield is a reference toolkit. It does not provide legal advice, clinical safety validation, HIPAA certification, FDA clearance, SOC 2 attestation, ISO/IEC 42001 certification, or EU AI Act conformity assessment. Use qualified legal, clinical, security, and audit professionals for production deployments.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MedAgentShield

Overview

Capabilities

Governance Evidence Scanner

LLM-Agent Safety Sandbox

Reference Architecture

Repository Layout

Installation

Static Evidence Scan

Sandbox Evaluation

Live LLM Evaluation

Testing

Documentation

Development Status

Related Work

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
docs		docs
examples		examples
healthguard		healthguard
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

MedAgentShield

Overview

Capabilities

Governance Evidence Scanner

LLM-Agent Safety Sandbox

Reference Architecture

Repository Layout

Installation

Static Evidence Scan

Sandbox Evaluation

Live LLM Evaluation

Testing

Documentation

Development Status

Related Work

Disclaimer

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages