Merged
11 changes: 11 additions & 0 deletions .flake8
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
[flake8]
max-line-length = 79
extend-ignore = E203,W503
exclude =
    .git,
    __pycache__,
    .venv,
    venv,
    build,
    dist,
    *.egg-info
39 changes: 25 additions & 14 deletions .gitignore
@@ -52,7 +52,6 @@ coverage.xml
# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
@@ -66,21 +65,33 @@ dmypy.json
# Pyre type checker
.pyre/

# IDE
.vscode/
# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be added to the global gitignore or merged into this project gitignore. For a PyCharm
# project, it is recommended to ignore the whole idea folder.
.idea/
*.swp
*.swo
*~

# OS
# VS Code
.vscode/

# macOS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db

# Windows
Thumbs.db
ehthumbs.db
Desktop.ini

# Docker
.dockerignore

# Logs
*.log
# Exclude anything containing "claude" (case-insensitive)
*claude*
*Claude*
*CLAUDE*
197 changes: 195 additions & 2 deletions README.md
@@ -1,2 +1,195 @@
# envtorch
An environment library for RL and beyond
# EnvTorch: Agentic Execution Environments

A unified framework for CodeAct environments that supports both agent execution and RL training, built on Gym/Gymnasium APIs with PyTorch/HuggingFace integration patterns.

## Overview

EnvTorch provides a standard for agentic execution environments following the CodeAct paradigm, where actions are arbitrary Python code that can chain multiple tool calls. The framework bridges traditional RL environments with modern agent capabilities.

### Key Features

- **CodeAct Execution**: Actions are Python code strings executed in persistent contexts
- **State Persistence**: Variables and functions persist across steps within episodes
- **Tool Integration**: MCP (Model Context Protocol) support for external capabilities
- **RL Compatibility**: Transform system for reward computation and training
- **Error Handling**: Exceptions become observations for agent learning
- **Clean APIs**: Minimal, opinionated design following KISS principles
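
The persistence and error-handling features listed above can be illustrated with a minimal sketch. This is not EnvTorch's implementation: `MiniExecutor` and `MiniResult` are hypothetical names, and the sketch assumes plain in-process `exec` with a shared namespace instead of a sandbox.

```python
import ast
import contextlib
import io
import traceback
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MiniResult:
    """Hypothetical stand-in for an execution result."""

    stdout: str = ""
    return_value: Any = None
    success: bool = True
    error: Optional[str] = None


class MiniExecutor:
    """Hypothetical executor: one namespace is shared by all steps of an episode."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def reset(self) -> None:
        # A new episode starts with an empty namespace.
        self.namespace = {}

    def step(self, code: str) -> MiniResult:
        buffer = io.StringIO()
        try:
            tree = ast.parse(code)
            # If the final statement is a bare expression, split it off so
            # its value can be returned, similar to a REPL.
            last_expr = None
            if tree.body and isinstance(tree.body[-1], ast.Expr):
                last_expr = ast.Expression(tree.body.pop().value)
            with contextlib.redirect_stdout(buffer):
                exec(compile(tree, "<action>", "exec"), self.namespace)
                value = None
                if last_expr is not None:
                    value = eval(
                        compile(last_expr, "<action>", "eval"), self.namespace
                    )
            return MiniResult(stdout=buffer.getvalue(), return_value=value)
        except Exception:
            # The exception is reported as data instead of being raised.
            return MiniResult(
                stdout=buffer.getvalue(),
                success=False,
                error=traceback.format_exc(),
            )


executor = MiniExecutor()
executor.step("data = [1, 2, 3, 4, 5]")
result = executor.step("sum(data) / len(data)")  # reuses `data` from step 1
failed = executor.step("1 / 0")  # success=False, traceback stored in `error`
```

Here `result.return_value` is `3.0`, and `failed.error` holds the `ZeroDivisionError` traceback so that a caller could surface it as an observation.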

## Quick Start

```python
from src import create_codeact_env, CodeAction

# Create environment
env = create_codeact_env()
obs = env.reset()

# Execute Python code
action = CodeAction(code="""
x = 10
y = 20
result = x * y
print(f"Result: {result}")
result # Return value
""")

obs = env.step(action)
print(f"Output: {obs.execution_result.stdout}")
print(f"Return: {obs.execution_result.return_value}")
```

## Core Components

### Actions and Observations

```python
# Actions contain arbitrary Python code
action = CodeAction(code="math.sqrt(16)")

# Observations include execution results
obs = env.step(action)
print(obs.execution_result.return_value) # 4.0
print(obs.execution_result.success) # True
print(obs.execution_result.stdout) # Any print output
```

### Tool Integration

```python
from src import create_mcp_environment

# Environment with MCP tools
env = create_mcp_environment()
obs = env.reset()

# Tools available as Python objects
action = CodeAction(code="""
content = "Hello, world!"
file_write("/tmp/hello.txt", content)
result = file_read("/tmp/hello.txt")
print(f"File contents: {result}")
""")

obs = env.step(action)
```

### RL Training with Transforms

```python
from src import create_math_env_transform

# Environment that rewards correct math solutions
transform = create_math_env_transform(expected_answer=42)
env = create_codeact_env()
env.transform = transform

# Agent gets rewarded for correct answers
action = CodeAction(code="21 * 2") # Correct answer
obs = env.step(action)
print(obs.reward) # 1.0 (success) + quality bonuses
```

## Architecture

### Type System
- `Action` / `CodeAction`: Base and concrete action types
- `Observation` / `CodeObservation`: Base and concrete observation types
- `State` / `CodeState`: Environment state with execution context
- `ExecutionResult`: Detailed code execution results

### Core Classes
- `Environment`: Base class following Gym API
- `CodeActEnvironment`: Main environment for code execution
- `Transform`: Base class for observation modification
- `ToolRegistry`: Manages available tools and functions
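
As a rough illustration of how an environment and a transform might fit together, the sketch below applies an optional transform to each step observation. All names (`SketchEnvironment`, `reward_even`) and the dictionary-based observation are assumptions made for this example and are not taken from the library.

```python
from typing import Any, Callable, Dict, Optional

Observation = Dict[str, Any]
TransformFn = Callable[[Observation], Observation]


class SketchEnvironment:
    """Hypothetical Gym-style environment with an optional transform hook."""

    def __init__(self, transform: Optional[TransformFn] = None) -> None:
        self.transform = transform
        self.step_count = 0

    def reset(self) -> Observation:
        self.step_count = 0
        return {"step": 0, "value": None, "reward": 0.0}

    def step(self, value: Any) -> Observation:
        self.step_count += 1
        obs = {"step": self.step_count, "value": value, "reward": 0.0}
        # Assumption: the transform runs on every step observation.
        return self.transform(obs) if self.transform else obs


def reward_even(obs: Observation) -> Observation:
    """Hypothetical transform: reward 1.0 when the value is an even integer."""
    is_even = isinstance(obs["value"], int) and obs["value"] % 2 == 0
    return {**obs, "reward": 1.0 if is_even else 0.0}
```

Assigning `env.transform = reward_even` after construction mirrors the `env.transform = transform` pattern used in the README examples.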

### Transform Examples
- `CodeSafetyTransform`: Penalizes unsafe code patterns
- `MathProblemTransform`: Rewards correct numerical answers
- `CodeQualityTransform`: Evaluates code quality metrics
- `CompositeTransform`: Combines multiple transforms
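
One way to picture how several transforms could be combined is plain function composition. The sketch below is an assumption-laden illustration: `SketchObservation`, `math_reward`, `safety_penalty`, and `compose` are hypothetical names, and the scoring rules are invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class SketchObservation:
    """Hypothetical observation: executed code, its return value, a reward."""

    code: str
    return_value: object = None
    reward: float = 0.0


TransformFn = Callable[[SketchObservation], SketchObservation]


def math_reward(expected: float) -> TransformFn:
    """Add 1.0 to the reward when the return value matches `expected`."""

    def apply(obs: SketchObservation) -> SketchObservation:
        bonus = 1.0 if obs.return_value == expected else 0.0
        return replace(obs, reward=obs.reward + bonus)

    return apply


def safety_penalty(
    banned: Sequence[str] = ("os.system", "subprocess"),
) -> TransformFn:
    """Subtract 1.0 for each banned substring found in the code."""

    def apply(obs: SketchObservation) -> SketchObservation:
        penalty = sum(1.0 for pattern in banned if pattern in obs.code)
        return replace(obs, reward=obs.reward - penalty)

    return apply


def compose(*transforms: TransformFn) -> TransformFn:
    """Apply the given transforms from left to right."""

    def apply(obs: SketchObservation) -> SketchObservation:
        for transform in transforms:
            obs = transform(obs)
        return obs

    return apply


pipeline = compose(safety_penalty(), math_reward(expected=42))
scored = pipeline(SketchObservation(code="21 * 2", return_value=42))
```

Because each transform returns a new observation, the order of composition only matters when a later transform reads a field that an earlier one changed.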

## File Structure

```
src/
├── types.py # Core type definitions
├── interfaces.py # Abstract base classes
├── environment.py # Main CodeAct environment
├── transforms.py # Transform implementations
├── mcp.py # MCP integration
└── __init__.py # Clean exports
```

## Usage Patterns

### Agent Exploration
```python
env = create_codeact_env()
obs = env.reset()

# Multi-step problem solving
action1 = CodeAction(code="data = [1, 2, 3, 4, 5]")
obs = env.step(action1)

action2 = CodeAction(code="mean = sum(data) / len(data); mean")
obs = env.step(action2) # Uses persistent data from step 1
```

### RL Training Loop
```python
# Create environment with reward function
transform = create_safe_env_transform()
env = create_codeact_env()
env.transform = transform

for episode in range(100):
    obs = env.reset()
    action = generate_action()  # From your policy
    obs = env.step(action)

    reward = obs.reward  # Computed by transforms
    # Update policy based on reward
```

### Hybrid Agent + RL
```python
# Phase 1: Agent exploration
env = create_codeact_env()
# Agent explores different solution approaches

# Phase 2: RL optimization
env.transform = optimization_transform
# Train to optimize based on exploration insights
```

## Design Principles

- **KISS Approach**: Minimal, opinionated design
- **Single Way**: One clear way to accomplish tasks
- **Pythonic**: Follows PyTorch/HuggingFace patterns
- **No Inline Comments**: Code should be self-explanatory
- **Functional Composition**: Private functions explain complex logic

## Testing

Run the test suite:
```bash
python test_unified.py
```

Run examples:
```bash
python example.py
```

## Requirements

See `requirements.txt` for dependencies. Core requirements:
- Python 3.9+
- PyTorch 2.0+
- HuggingFace datasets

## License

BSD 3-Clause License (see LICENSE file)
131 changes: 131 additions & 0 deletions example.py
@@ -0,0 +1,131 @@
#!/usr/bin/env python3
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

"""
Simple example demonstrating EnvTorch environment usage.

This shows the minimal steps to get started with code execution environments.
"""

from src import CodeAction, CodeExecutionEnvironment, CodingEnv, Transform


def basic_code_execution_example():
    """Basic example using CodeExecutionEnvironment."""
    print("=== Basic Code Execution Example ===")

    # Create basic code execution environment
    env = CodeExecutionEnvironment()

    print("Note: This example shows the interface but requires Docker to actually run")
    print("Environment created successfully!")

    # Create an action to calculate compound interest
    action = CodeAction(
        code="""
# Calculate compound interest
principal = 1000
rate = 0.05
time = 3

final_amount = principal * (1 + rate) ** time
interest_earned = final_amount - principal

print(f"Principal: ${principal}")
print(f"Rate: {rate*100}%")
print(f"Time: {time} years")
print(f"Final amount: ${final_amount:.2f}")
print(f"Interest earned: ${interest_earned:.2f}")

final_amount
"""
    )

    print(f"Created action with code length: {len(action.code)} characters")
    print()


def coding_environment_example():
    """Example using CodingEnv with safety and quality transforms."""
    print("=== Coding Environment Example ===")

    # Create coding environment with built-in transforms
    env = CodingEnv()

    print("CodingEnv created with safety and quality transforms!")
    print("This environment includes:")
    print("• Code safety checks")
    print("• Code quality analysis")
    print("• Composite transform system")

    # Example of safe code
    safe_action = CodeAction(
        code="""
# Safe mathematical calculation
import math

def calculate_fibonacci(n):
    if n <= 1:
        return n
    return calculate_fibonacci(n-1) + calculate_fibonacci(n-2)

# Calculate first 10 Fibonacci numbers
fib_sequence = [calculate_fibonacci(i) for i in range(10)]
print(f"First 10 Fibonacci numbers: {fib_sequence}")
fib_sequence
"""
    )

    print(f"Created safe action with code length: {len(safe_action.code)} characters")
    print()


def transform_system_example():
    """Example showing how to create custom transforms."""
    print("=== Transform System Example ===")

    # Example custom transform
    class RewardTransform(Transform):
        """Transform that adds rewards based on code execution results."""

        def __call__(self, observation):
            # This is just an example - actual implementation would need
            # a proper observation object with execution results
            print("Custom transform would analyze execution results here")
            print("and add rewards based on success criteria")
            return observation

    transform = RewardTransform()
    print("Created custom RewardTransform")

    print("Transform system allows:")
    print("• Chaining multiple transforms")
    print("• Adding rewards for RL training")
    print("• Custom observation processing")
    print("• Safety and quality checks")
    print()


if __name__ == "__main__":
    print("EnvTorch Environment Examples")
    print("=" * 40)
    print()

    basic_code_execution_example()
    coding_environment_example()
    transform_system_example()

    print("=" * 40)
    print("Examples complete! 🎉")
    print()
    print("Key takeaways:")
    print("• CodeAction(code='...') for arbitrary Python execution")
    print("• CodeExecutionEnvironment provides base functionality")
    print("• CodingEnv adds safety and quality transforms")
    print("• Transform system enables customization and RL training")
    print("• Docker integration provides sandboxed execution")
    print("=" * 40)