@HamidShojanazeri (Contributor)

This PR adds BrowserGym integration to OpenEnv. BrowserGym is a unified framework that wraps multiple web navigation benchmarks (MiniWoB, WebArena, VisualWebArena, etc.) under a single Gymnasium API.
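Because every BrowserGym benchmark sits behind the same Gymnasium interface, a single episode loop can in principle drive any of them. The stdlib-only sketch below illustrates that contract: `GymLikeEnv`, `ToyTaskEnv` and `run_episode` are hypothetical names, and `ToyTaskEnv` merely stands in for a real environment created with `gym.make(...)`. It follows Gymnasium's convention that `reset(seed=...)` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`.

```python
from __future__ import annotations

from typing import Any, Callable, Protocol


class GymLikeEnv(Protocol):
    """Structural type for the Gymnasium reset/step contract."""

    def reset(self, *, seed: int | None = None) -> tuple[Any, dict]: ...

    def step(self, action: Any) -> tuple[Any, float, bool, bool, dict]: ...


class ToyTaskEnv:
    """Hypothetical stand-in env: the task is solved when the agent sends "submit"."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, *, seed: int | None = None) -> tuple[str, dict]:
        self.steps = 0
        return "goal: click the submit button", {}

    def step(self, action: str) -> tuple[str, float, bool, bool, dict]:
        self.steps += 1
        terminated = action == "submit"  # task solved
        truncated = not terminated and self.steps >= self.max_steps  # step budget used up
        reward = 1.0 if terminated else 0.0
        return f"after {action}", reward, terminated, truncated, {}


def run_episode(env: GymLikeEnv, policy: Callable[[Any], Any], seed: int = 0) -> float:
    """Run one episode on any Gymnasium-style env and return the summed reward."""
    obs, _info = env.reset(seed=seed)
    total = 0.0
    while True:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            return total
```

Since `run_episode` only relies on the reset/step contract, swapping `ToyTaskEnv()` for another Gymnasium-compatible environment should leave the loop unchanged.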

MiniWoB tasks work out of the box without any external infrastructure; WebArena is still available for evaluation but requires the usual backend setup.

This PR integrates WebArena, a realistic web environment for building autonomous
agents, into the OpenEnv framework following the established integration patterns.

Key components:
- Models: WebArenaAction, WebArenaObservation, WebArenaState dataclasses
- Server: WebArenaEnvironment wrapping browser_env.ScriptBrowserEnv
- Client: HTTPEnvClient implementation for remote interaction
- Docker: Containerized deployment with Playwright and Chromium
- Documentation: Comprehensive README with examples and configuration

Features:
- Browser-based interaction using Playwright
- Support for accessibility tree and HTML observations
- Rich action space (click, type, navigate, scroll, etc.)
- JSON-based task configuration
- Gymnasium-compatible API
- GitHub Actions integration for automated builds

The implementation follows OpenEnv patterns from existing environments
(echo_env, textarena_env, etc.) and provides a consistent interface for
web-based agent evaluation.

This commit refactors the integration from WebArena-only to the comprehensive
BrowserGym framework, which provides access to multiple web navigation benchmarks
under a unified API.

Key improvements over WebArena-only integration:

**Training + Evaluation in One Package**
- MiniWoB++: 100+ simple tasks for training (no setup required!)
- WebArena: 812 realistic tasks for evaluation (requires backend)
- VisualWebArena: Visual navigation tasks
- WorkArena: Enterprise automation tasks

**Why This is Better**
1. MiniWoB tasks work out-of-the-box without external infrastructure
2. Users can train agents (MiniWoB) before evaluating (WebArena)
3. Unified API across all benchmarks
4. Future-proof as new benchmarks are added to BrowserGym
5. Maintained by ServiceNow Research (actively developed)

**Complete Training Pipeline**
- Train on MiniWoB (fast, randomized, dense rewards)
- Evaluate on WebArena (realistic, complex, sparse rewards)
- All within the same OpenEnv interface

Changes:
- Replaced webarena_env/ with browsergym_env/
- Updated models for multiple benchmark support
- Environment wrapper now uses gym.make() for BrowserGym
- Client supports benchmark selection via environment variables
- Dockerfile installs BrowserGym + MiniWoB + WebArena packages
- README highlights MiniWoB for training, WebArena for evaluation
- GitHub Actions workflow updated to build browsergym-env

This makes OpenEnv a complete platform for web agent development:
training (MiniWoB) → evaluation (WebArena/VisualWebArena).
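Benchmark selection via environment variables can be sketched as a small config step that builds a BrowserGym environment id. This is a stdlib-only sketch under stated assumptions: the variable names `BROWSERGYM_BENCHMARK` and `BROWSERGYM_TASK_NAME`, the `miniwob`/`click-test` defaults, and the helper `resolve_env_id` are hypothetical, and the `browsergym/<benchmark>.<task>` id format is assumed from BrowserGym's registration convention (e.g. `browsergym/miniwob.click-test`). In a real server the resulting id would be passed to `gym.make()` after the matching benchmark package has been imported.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable names; the actual server may use different ones.
BENCHMARK_VAR = "BROWSERGYM_BENCHMARK"
TASK_VAR = "BROWSERGYM_TASK_NAME"

# Benchmarks named in this PR (WebArena-style ones also need their own backends).
KNOWN_BENCHMARKS = {"miniwob", "webarena", "visualwebarena", "workarena"}


def resolve_env_id(environ: Optional[Mapping[str, str]] = None) -> str:
    """Build a BrowserGym-style env id ('browsergym/<benchmark>.<task>') from env vars."""
    env = os.environ if environ is None else environ
    benchmark = env.get(BENCHMARK_VAR, "miniwob").strip().lower()
    task = env.get(TASK_VAR, "click-test").strip()
    if benchmark not in KNOWN_BENCHMARKS:
        raise ValueError(
            f"unknown benchmark {benchmark!r}; expected one of {sorted(KNOWN_BENCHMARKS)}"
        )
    if not task:
        raise ValueError(f"{TASK_VAR} must not be empty")
    return f"browsergym/{benchmark}.{task}"
```

Accepting a plain mapping instead of always reading `os.environ` keeps the helper easy to unit-test.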

- test_browsergym_manual.py: Manual testing script with no external dependencies
- tests/envs/test_browsergym_models.py: Unit tests for models (pytest)
- tests/envs/test_browsergym_environment.py: Integration tests for server (pytest)
- src/envs/browsergym_env/TESTING.md: Complete testing guide

Tests verify:
- Model creation and validation (BrowserGymAction, Observation, State)
- Client import and methods
- Environment initialization (requires browsergym)
- Server endpoints and HTTP communication (requires browsergym + server)
- Docker integration (requires Docker)
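As a hedged illustration of the kind of "creation and validation" these model tests might exercise, here are standalone dataclass sketches of the three models. The field names (`action_str`, `goal`, `url`, `reward`, `done`, `metadata`, `benchmark`, `task_name`, `step_count`, `cum_reward`) and the validation rules are assumptions for illustration, not the PR's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class BrowserGymAction:
    # Assumed field: a BrowserGym-style action string such as "click('13')".
    action_str: str

    def __post_init__(self) -> None:
        if not isinstance(self.action_str, str) or not self.action_str.strip():
            raise ValueError("action_str must be a non-empty string")


@dataclass
class BrowserGymObservation:
    goal: str = ""
    url: str = ""
    reward: float = 0.0
    done: bool = False
    # default_factory gives every observation its own dict (no shared mutable default).
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class BrowserGymState:
    benchmark: str
    task_name: str
    step_count: int = 0
    cum_reward: float = 0.0

    def __post_init__(self) -> None:
        if self.step_count < 0:
            raise ValueError("step_count must be non-negative")
```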

Manual tests passed: 2/2 (models + client import)
BrowserGym tests skipped: requires 'pip install browsergym browsergym-miniwob'
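One stdlib way to express the "skip when BrowserGym is not installed" guard is sketched below; `module_available`, `BROWSERGYM_AVAILABLE` and the test class are hypothetical names. In a pytest suite, `pytest.importorskip("browsergym")` achieves a similar effect, and pytest also honors `unittest` skip decorators on `TestCase` classes.

```python
import importlib.util
import unittest


def module_available(name: str) -> bool:
    """Return True if `name` can be found on the import path.

    For dotted names, find_spec imports the parent package first and raises
    ModuleNotFoundError when the parent is missing, so treat that as absent.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False


BROWSERGYM_AVAILABLE = module_available("browsergym") and module_available("browsergym.miniwob")


@unittest.skipUnless(BROWSERGYM_AVAILABLE, "requires 'pip install browsergym browsergym-miniwob'")
class BrowserGymImportTest(unittest.TestCase):
    def test_miniwob_package_imports(self) -> None:
        # Only runs when the guard above found the benchmark package.
        import browsergym.miniwob  # noqa: F401
```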

@meta-cla bot added the CLA Signed label on Oct 31, 2025.

**Key Advantage**: MiniWoB tasks work out-of-the-box with no external infrastructure needed!

## Quick Start - Training (MiniWoB)
Contributor

It's also a bit confusing which tasks are exposed under these environments. Should we add a table to the README explaining them?

Contributor Author

Good point! Added a breakdown.

@init27 requested a review from Copilot on November 1, 2025 03:40

Copilot AI left a comment

Pull Request Overview

This PR adds BrowserGym environment integration to OpenEnv, providing unified access to multiple web-based agent benchmarks (MiniWoB++, WebArena, VisualWebArena, WorkArena) through a Gymnasium-compatible API.

Key Changes

  • Implemented complete BrowserGym environment wrapper with HTTP client/server architecture
  • Added comprehensive unit tests for models and environment server
  • Configured Docker build pipeline for BrowserGym environment
  • Included extensive documentation with training and evaluation examples

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| `src/envs/browsergym_env/models.py` | Defines `BrowserGymAction`, `BrowserGymObservation`, and `BrowserGymState` dataclasses |
| `src/envs/browsergym_env/client.py` | Implements HTTP client for BrowserGym environment communication |
| `src/envs/browsergym_env/server/browsergym_environment.py` | Wraps BrowserGym Gymnasium environment for OpenEnv compatibility |
| `src/envs/browsergym_env/server/app.py` | FastAPI server application with environment configuration |
| `src/envs/browsergym_env/server/Dockerfile` | Container specification with Playwright and browser dependencies |
| `src/envs/browsergym_env/server/requirements.txt` | Python dependencies for BrowserGym and related packages |
| `src/envs/browsergym_env/__init__.py` | Module exports and usage examples |
| `src/envs/browsergym_env/README.md` | Comprehensive documentation for environment usage |
| `tests/envs/test_browsergym_models.py` | Unit tests for BrowserGym data models |
| `tests/envs/test_browsergym_environment.py` | Integration tests for BrowserGym environment server |
| `.github/workflows/docker-build.yml` | Adds browsergym-env to Docker build matrix |


Copilot AI commented Nov 1, 2025

@HamidShojanazeri I've opened a new pull request, #119, to work on those changes. Once the pull request is ready, I'll request review from you.

HamidShojanazeri and others added 6 commits October 31, 2025 21:13
Addresses feedback about unclear task availability. Added detailed tables showing:

- MiniWoB tasks organized by category (click, text entry, navigation, visual, email/social)
- All 100+ tasks listed with descriptions and difficulty ratings
- WebArena tasks grouped by website (~812 total)
- Task examples and usage patterns
- Links to full task lists for each benchmark

Makes it much clearer what tasks are available and how to use them.

Expanded 'Why BrowserGym?' section to better explain:
- MiniWoB: synthetic web tasks (click buttons, forms) in isolated browsers
- WebArena: real websites with multi-step tasks (e-commerce, GitLab)
- VisualWebArena: requires visual understanding
- WorkArena: enterprise software automation

Added concrete examples and emphasized training vs evaluation use cases.

Remove module-level sys.path manipulation from test files:
- test_browsergym_models.py: removed entirely (pytest handles imports)
- test_browsergym_environment.py: moved into server fixture where it's
  actually needed for subprocess environment

This is cleaner and follows pytest best practices.
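Moving the path setup into the server fixture means the import root reaches the server subprocess through its environment rather than by mutating `sys.path` in the test process. A minimal stdlib sketch of that idea follows; `build_server_env` and the `src` directory layout are assumptions, and a real fixture would pass the returned dict as `env=` to `subprocess.Popen` when launching the server.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def build_server_env(src_dir: Path, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with src_dir prepended to PYTHONPATH.

    The test process's own sys.path is left untouched; only a child process
    started with env=<this dict> sees the extra import root.
    """
    env = dict(os.environ if base is None else base)  # copy, never mutate the caller's mapping
    existing = env.get("PYTHONPATH", "")
    parts = [str(src_dir)] + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env
```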

For VisualWebArena and advanced training scenarios, users may need access
to additional data like timestamps, browser state, or custom fields that
BrowserGym provides.

Changes:
- Store full 'obs' and 'info' dicts from BrowserGym in observation.metadata
- Accessible via metadata['browsergym_obs'] and metadata['browsergym_info']
- Common fields still extracted to top-level for convenience
- Added documentation showing how to access timestamps, DOM objects, etc.

This ensures no information loss while maintaining a clean API.
The environment exposes everything BrowserGym provides; users can access
whatever they need for training.
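Reading the extra BrowserGym data then comes down to looking inside `observation.metadata`. The sketch below uses a hypothetical `SimpleObservation` stand-in and a hypothetical helper `browsergym_field`; only the `browsergym_obs` and `browsergym_info` keys come from this PR, while inner field names such as `elapsed_time` are illustrative assumptions about what BrowserGym places in those dicts.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleObservation:
    """Hypothetical stand-in for BrowserGymObservation; only metadata matters here."""

    metadata: Dict[str, Any] = field(default_factory=dict)


def browsergym_field(obs: SimpleObservation, source: str, key: str, default: Any = None) -> Any:
    """Look up `key` in the raw BrowserGym 'obs' or 'info' dict stored in metadata.

    source must be "obs" or "info"; a missing dict or key yields `default`.
    """
    if source not in ("obs", "info"):
        raise ValueError("source must be 'obs' or 'info'")
    raw = obs.metadata.get(f"browsergym_{source}") or {}
    return raw.get(key, default)
```

Returning a default instead of raising keeps agent code robust when a benchmark does not populate a given field.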

@burtenshaw (Collaborator)

Hey @HamidShojanazeri, great PR! I've been trying it out whilst building an example and ran into errors, so I opened this PR on your branch: #140. Basically:

  • some font libraries have changed names
  • the benchmark modules need to be imported for the Docker container to build

[ENHANCEMENT] Fix browsergym environment
@github-actions

github-actions bot commented Nov 4, 2025

⚠️ Deployment failed for browsergym_env

  • Space repo:
  • Live URL:

Please resolve your environment.

You can iterate locally or validate fixes by running `scripts/deploy_to_hf.sh --env "browsergym_env"`.

@init27 (Contributor)

init27 commented Nov 4, 2025

Thanks for fixing the comments!

@init27 merged commit ee0ce46 into main on Nov 4, 2025
2 of 3 checks passed

Labels: CLA Signed, New Environment

5 participants