Feature/303 shielded training example #304

Merged
LorenzzoQM merged 2 commits into develop from feature/303-shielded-training-example
Aug 25, 2025

Conversation

@LorenzzoQM
Contributor

Description

Closes #303

Adds an example script showing how to train with a shield, using either action replacement or action masking.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How should this pull request be reviewed?

  • By commit
  • All changes at once

How Has This Been Tested?

Not applicable.

Checklist

  • I have performed a self-review of my code
  • I have commented my code in hard-to-understand areas
  • I have made corresponding changes to the documentation and release notes
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works

@LorenzzoQM LorenzzoQM self-assigned this Aug 22, 2025
Copilot AI review requested due to automatic review settings August 22, 2025 22:56

Copilot AI left a comment

Pull Request Overview

This PR adds a comprehensive example demonstrating shielded training for reinforcement learning in satellite tasking environments. The implementation shows how to apply safety constraints during both training and testing phases using action replacement and action masking techniques.

  • Introduces a complete Jupyter notebook example with satellite environment configuration, shield implementation, and training/testing workflows
  • Implements wrapper classes for action logging, post-posed shielding (action replacement), and action masking
  • Provides a handmade shield example based on power and reaction wheel constraints for satellite safety
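The two shielding modes described above can be sketched as follows. This is a minimal illustration on a toy model, not the notebook's code: `ToySatellite`, `safe_actions`, `shield_replace`, `action_mask`, the action indices, and the thresholds are all hypothetical names and values chosen for this sketch. It assumes a shield that looks only at battery level and reaction wheel speed.

```python
import random
from dataclasses import dataclass

# Hypothetical action indices for a toy satellite (not the notebook's action set).
CHARGE, DESAT, IMAGE = 0, 1, 2
ACTIONS = (CHARGE, DESAT, IMAGE)


@dataclass
class ToySatellite:
    """Stand-in environment: battery and wheel speed as fractions in [0, 1]."""
    battery: float = 1.0
    wheel_speed: float = 0.0

    def step(self, action: int) -> float:
        """Apply an action and return a reward (only imaging is rewarded)."""
        if action == CHARGE:
            self.battery = min(1.0, self.battery + 0.3)
            return 0.0
        if action == DESAT:
            self.wheel_speed = max(0.0, self.wheel_speed - 0.4)
            self.battery = max(0.0, self.battery - 0.05)
            return 0.0
        # IMAGE drains the battery and spins up the wheels.
        self.battery = max(0.0, self.battery - 0.2)
        self.wheel_speed = min(1.0, self.wheel_speed + 0.2)
        return 1.0


def safe_actions(sat: ToySatellite, min_battery=0.3, max_wheel=0.7) -> list:
    """Handmade shield: return the actions allowed in the current state."""
    if sat.battery < min_battery:
        return [CHARGE]
    if sat.wheel_speed > max_wheel:
        return [DESAT]
    return list(ACTIONS)


def shield_replace(sat: ToySatellite, proposed: int) -> int:
    """Action replacement: keep a safe proposal, otherwise substitute a safe action."""
    allowed = safe_actions(sat)
    return proposed if proposed in allowed else allowed[0]


def action_mask(sat: ToySatellite) -> list:
    """Action masking: one boolean per action, True where the shield allows it."""
    allowed = safe_actions(sat)
    return [a in allowed for a in ACTIONS]


def rollout(mode: str, steps: int = 50, seed: int = 0):
    """Run a random policy under one shielding mode and log what happened."""
    rng = random.Random(seed)
    sat = ToySatellite()
    replaced = 0
    min_battery, max_wheel = sat.battery, sat.wheel_speed
    for _ in range(steps):
        if mode == "mask":
            # The policy only ever samples from actions the mask permits.
            mask = action_mask(sat)
            action = rng.choice([a for a in ACTIONS if mask[a]])
        else:
            # The policy proposes freely; the shield overrides unsafe choices.
            proposed = rng.choice(ACTIONS)
            action = shield_replace(sat, proposed)
            replaced += int(action != proposed)
        sat.step(action)
        min_battery = min(min_battery, sat.battery)
        max_wheel = max(max_wheel, sat.wheel_speed)
    return min_battery, max_wheel, replaced


if __name__ == "__main__":
    for mode in ("replace", "mask"):
        print(mode, rollout(mode))
```

The difference between the two modes is where the shield acts: with replacement the policy still proposes unsafe actions and the override count can be logged, while with masking the unsafe actions are never available to sample.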

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| examples/training_with_shield.ipynb | Complete example notebook demonstrating shielded training with action replacement and masking |
| examples/_default.rst | Adds the new example to the documentation index |
| docs/source/release_notes.rst | Documents the new shielded training example feature |

@LorenzzoQM LorenzzoQM force-pushed the feature/303-shielded-training-example branch from dc09012 to 6c1a181 Compare August 22, 2025 23:04
@LorenzzoQM LorenzzoQM requested a review from Copilot August 22, 2025 23:09

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@LorenzzoQM LorenzzoQM requested a review from Mark2000 August 25, 2025 17:06
Contributor

@Mark2000 Mark2000 left a comment

Looks good.

@LorenzzoQM LorenzzoQM merged commit ce9358e into develop Aug 25, 2025
5 checks passed
@LorenzzoQM LorenzzoQM deleted the feature/303-shielded-training-example branch August 25, 2025 18:08
@Mark2000 Mark2000 mentioned this pull request Sep 30, 2025
Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add example script on how to train with shields

3 participants