Add BrowserGym Environment Integration (MiniWoB + WebArena +more) #116
This PR integrates WebArena, a realistic web environment for building autonomous agents, into the OpenEnv framework following the established integration patterns.

Key components:
- Models: WebArenaAction, WebArenaObservation, WebArenaState dataclasses
- Server: WebArenaEnvironment wrapping browser_env.ScriptBrowserEnv
- Client: HTTPEnvClient implementation for remote interaction
- Docker: Containerized deployment with Playwright and Chromium
- Documentation: Comprehensive README with examples and configuration

Features:
- Browser-based interaction using Playwright
- Support for accessibility tree and HTML observations
- Rich action space (click, type, navigate, scroll, etc.)
- JSON-based task configuration
- Gymnasium-compatible API
- GitHub Actions integration for automated builds

The implementation follows OpenEnv patterns from existing environments (echo_env, textarena_env, etc.) and provides a consistent interface for web-based agent evaluation.
This commit refactors the integration from WebArena-only to the comprehensive BrowserGym framework, which provides access to multiple web navigation benchmarks under a unified API.

Key improvements over WebArena-only integration:

**Training + Evaluation in One Package**
- MiniWoB++: 100+ simple tasks for training (no setup required!)
- WebArena: 812 realistic tasks for evaluation (requires backend)
- VisualWebArena: Visual navigation tasks
- WorkArena: Enterprise automation tasks

**Why This is Better**
1. MiniWoB tasks work out-of-the-box without external infrastructure
2. Users can train agents (MiniWoB) before evaluating (WebArena)
3. Unified API across all benchmarks
4. Future-proof as new benchmarks are added to BrowserGym
5. Maintained by ServiceNow Research (actively developed)

**Complete Training Pipeline**
- Train on MiniWoB (fast, randomized, dense rewards)
- Evaluate on WebArena (realistic, complex, sparse rewards)
- All within the same OpenEnv interface

Changes:
- Replaced webarena_env/ with browsergym_env/
- Updated models for multiple benchmark support
- Environment wrapper now uses gym.make() for BrowserGym
- Client supports benchmark selection via environment variables
- Dockerfile installs BrowserGym + MiniWoB + WebArena packages
- README highlights MiniWoB for training, WebArena for evaluation
- GitHub Actions workflow updated to build browsergym-env

This makes OpenEnv a complete platform for web agent development: training (MiniWoB) → evaluation (WebArena/VisualWebArena).
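As a rough illustration of the "benchmark selection via environment variables" change, the sketch below maps two variables onto a BrowserGym-style environment ID of the form `browsergym/<benchmark>.<task>`. The variable names `BROWSERGYM_BENCHMARK` and `BROWSERGYM_TASK_NAME`, the `miniwob` default, and the helper `resolve_env_id` are assumptions made for this example; the commit message does not name them.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names: the commit only says selection happens
# "via environment variables" and does not spell them out.
BENCHMARK_VAR = "BROWSERGYM_BENCHMARK"
TASK_VAR = "BROWSERGYM_TASK_NAME"


def resolve_env_id(environ: Optional[Mapping[str, str]] = None) -> str:
    """Build an environment ID such as 'browsergym/miniwob.click-test'."""
    env = os.environ if environ is None else environ
    # Assumed default: fall back to MiniWoB, which needs no backend.
    benchmark = env.get(BENCHMARK_VAR, "miniwob").strip().lower()
    task = env.get(TASK_VAR, "").strip()
    if not task:
        raise ValueError(f"{TASK_VAR} must name a task, e.g. 'click-test'")
    return f"browsergym/{benchmark}.{task}"


if __name__ == "__main__":
    print(resolve_env_id({BENCHMARK_VAR: "miniwob", TASK_VAR: "click-test"}))
```

The resulting string is the kind of ID that would be handed to `gym.make()` once the matching BrowserGym benchmark package has been imported so that its environments are registered.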
- test_browsergym_manual.py: Manual testing script with no external dependencies
- tests/envs/test_browsergym_models.py: Unit tests for models (pytest)
- tests/envs/test_browsergym_environment.py: Integration tests for server (pytest)
- src/envs/browsergym_env/TESTING.md: Complete testing guide

Tests verify:
- Model creation and validation (BrowserGymAction, Observation, State)
- Client import and methods
- Environment initialization (requires browsergym)
- Server endpoints and HTTP communication (requires browsergym + server)
- Docker integration (requires Docker)

Manual tests passed: 2/2 (models + client import)
BrowserGym tests skipped: requires 'pip install browsergym browsergym-miniwob'
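To give a feel for what "model creation and validation" tests can look like, here is a minimal sketch using stand-in dataclasses. The classes `ActionSketch` and `StateSketch`, their fields (`action_str`, `benchmark`, `step_count`), and the validation rule are hypothetical; they are not the PR's actual `BrowserGymAction` or `BrowserGymState` definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ActionSketch:
    """Stand-in for an action model; the field names are assumptions."""
    action_str: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed validation rule: reject empty action strings early.
        if not self.action_str.strip():
            raise ValueError("action_str must be a non-empty string")


@dataclass
class StateSketch:
    """Stand-in for an episode state model; the field names are assumptions."""
    benchmark: str = "miniwob"
    step_count: int = 0


def test_action_creation() -> None:
    action = ActionSketch(action_str="click('13')")
    assert action.action_str == "click('13')"
    assert action.metadata == {}


def test_action_rejects_empty_string() -> None:
    try:
        ActionSketch(action_str="   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty action")


def test_metadata_is_not_shared_between_instances() -> None:
    first, second = ActionSketch("noop()"), ActionSketch("noop()")
    first.metadata["seen"] = True
    assert second.metadata == {}


def test_state_defaults() -> None:
    state = StateSketch()
    assert (state.benchmark, state.step_count) == ("miniwob", 0)
```

Tests of this shape need neither a browser nor the browsergym packages, which is consistent with the model tests passing while the BrowserGym-dependent tests were skipped.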
**Key Advantage**: MiniWoB tasks work out-of-the-box with no external infrastructure needed!

## Quick Start - Training (MiniWoB)
It's also a bit confusing as to what tasks might be exposed under these environments. Should we add a table explaining these to the README?
Good point! Added a breakdown.
Pull Request Overview
This PR adds BrowserGym environment integration to OpenEnv, providing unified access to multiple web-based agent benchmarks (MiniWoB++, WebArena, VisualWebArena, WorkArena) through a Gymnasium-compatible API.
Key Changes
- Implemented complete BrowserGym environment wrapper with HTTP client/server architecture
- Added comprehensive unit tests for models and environment server
- Configured Docker build pipeline for BrowserGym environment
- Included extensive documentation with training and evaluation examples
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/envs/browsergym_env/models.py | Defines BrowserGymAction, BrowserGymObservation, and BrowserGymState dataclasses |
| src/envs/browsergym_env/client.py | Implements HTTP client for BrowserGym environment communication |
| src/envs/browsergym_env/server/browsergym_environment.py | Wraps BrowserGym Gymnasium environment for OpenEnv compatibility |
| src/envs/browsergym_env/server/app.py | FastAPI server application with environment configuration |
| src/envs/browsergym_env/server/Dockerfile | Container specification with Playwright and browser dependencies |
| src/envs/browsergym_env/server/requirements.txt | Python dependencies for BrowserGym and related packages |
| src/envs/browsergym_env/`__init__.py` | Module exports and usage examples |
| src/envs/browsergym_env/README.md | Comprehensive documentation for environment usage |
| tests/envs/test_browsergym_models.py | Unit tests for BrowserGym data models |
| tests/envs/test_browsergym_environment.py | Integration tests for BrowserGym environment server |
| .github/workflows/docker-build.yml | Adds browsergym-env to Docker build matrix |
Co-authored-by: Copilot <[email protected]>
@HamidShojanazeri I've opened a new pull request, #119, to work on those changes. Once the pull request is ready, I'll request review from you.
Addresses feedback about unclear task availability.

Added detailed tables showing:
- MiniWoB tasks organized by category (click, text entry, navigation, visual, email/social)
- All 100+ tasks listed with descriptions and difficulty ratings
- WebArena tasks grouped by website (~812 total)
- Task examples and usage patterns
- Links to full task lists for each benchmark

Makes it much clearer what tasks are available and how to use them.
Expanded 'Why BrowserGym?' section to better explain:
- MiniWoB: synthetic web tasks (click buttons, forms) in isolated browsers
- WebArena: real websites with multi-step tasks (e-commerce, GitLab)
- VisualWebArena: requires visual understanding
- WorkArena: enterprise software automation

Added concrete examples and emphasized training vs evaluation use cases.
Remove module-level sys.path manipulation from test files:
- test_browsergym_models.py: removed entirely (pytest handles imports)
- test_browsergym_environment.py: moved into server fixture where it's actually needed for subprocess environment

This is cleaner and follows pytest best practices.
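One way a server fixture can give a subprocess the right import path without touching `sys.path` at module level is to build the child's environment explicitly. The sketch below uses `PYTHONPATH` for this, which is a swapped-in technique and not necessarily what the PR's fixture does; the helper name `build_server_env` and the example directories are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def build_server_env(
    src_dir: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with src_dir prepended to PYTHONPATH.

    Only the child process sees the change; the test process's own
    sys.path and os.environ are left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = src_dir + (os.pathsep + existing if existing else "")
    return env


if __name__ == "__main__":
    # Hypothetical paths, purely for illustration.
    print(build_server_env("/repo/src", {"PYTHONPATH": "/other"})["PYTHONPATH"])
```

Inside a pytest fixture, the returned dictionary would be passed as the `env` argument of `subprocess.Popen` when launching the server, so the path adjustment lives only where the subprocess needs it.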
For VisualWebArena and advanced training scenarios, users may need access to additional data like timestamps, browser state, or custom fields that BrowserGym provides.

Changes:
- Store full 'obs' and 'info' dicts from BrowserGym in observation.metadata
- Accessible via metadata['browsergym_obs'] and metadata['browsergym_info']
- Common fields still extracted to top-level for convenience
- Added documentation showing how to access timestamps, DOM objects, etc.

This ensures no information loss while maintaining a clean API. The environment exposes everything BrowserGym provides - users can access whatever they need for training.
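The pattern described above (convenience fields at the top level, full dictionaries preserved in metadata) can be sketched as follows. The metadata keys `browsergym_obs` and `browsergym_info` come from the commit message; the class `ObservationSketch`, the function `wrap_observation`, and the choice of `url` and `goal` as the extracted fields are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ObservationSketch:
    """Stand-in observation; 'url' and 'goal' are assumed convenience fields."""
    url: str = ""
    goal: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


def wrap_observation(obs: Dict[str, Any], info: Dict[str, Any]) -> ObservationSketch:
    """Extract a few common fields and keep the raw dicts for everything else."""
    return ObservationSketch(
        url=str(obs.get("url", "")),
        goal=str(obs.get("goal", "")),
        # Keys named in the commit message; the full dicts are stored as-is.
        metadata={"browsergym_obs": obs, "browsergym_info": info},
    )


if __name__ == "__main__":
    # Made-up sample values, not real BrowserGym output.
    raw_obs = {"url": "http://localhost/task", "goal": "Click the button", "extra": 1}
    wrapped = wrap_observation(raw_obs, {"task_info": {}})
    # Anything not promoted to a top-level field is still reachable.
    print(wrapped.metadata["browsergym_obs"]["extra"])
```

Storing the original dictionaries by reference keeps the wrapper cheap, at the cost that later mutation of the raw dict is visible through the metadata; a copy could be taken instead if that matters.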
Hey @HamidShojanazeri, great PR! While building an example with it I ran into errors, so I opened this PR on your branch: #140. Basically:
[ENHANCEMENT] Fix browsergym environment
Thanks for fixing the comments!
This PR adds BrowserGym integration to OpenEnv. BrowserGym is a unified framework that wraps multiple web navigation benchmarks (MiniWoB, WebArena, VisualWebArena, etc.) under a single Gymnasium API.
MiniWoB tasks work out of the box without any external infrastructure. WebArena is still available for evaluation but requires the usual backend setup.