Skip to content

End to end local Integration test for parseImage function: See issue #1 #15

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 70 commits into
base: feat/tests
Choose a base branch
from

Conversation

gabrielbugarija
Copy link

NOTE: The Test is local, and without an AWS key and endpoint, spinup and spindown could not be implemented.

abrichr and others added 6 commits March 25, 2025 19:09
Introduces a runnable demo (`demo.py`) proving the concept of using an LLM
to plan a UI action based on a user goal and mocked visual elements
(generated by `synthetic_ui.py`).

Includes:
- Core planning logic and prompting (`core.py`)
- Anthropic API integration (`completions.py`)
- Pydantic structured output for LLM response
- Visualization of the target UI element
NOTE: The Test is local, and without an AWS key and endpoint, spinup and spindown could not be implemented.
@gabrielbugarija
Copy link
Author

Overview
This PR adds an integration test suite for the parse_image(...) function. The tests are designed to run entirely on a local machine and simulate end-to-end image parsing behavior using either a real API or a mocked one.

Test Capabilities

  • Uses local test images stored in the project directory
  • Encodes images in base64 and sends simulated API POST requests
  • Validates the structure and content of responses (e.g., checks for "segments")
  • Supports mocked API behavior using unittest.mock
  • Includes visual comparison of UI states using before/after screenshots
  • These tests are self-contained and can run without any cloud or AWS dependency.

Requirements

  • Make sure the following Python libraries are installed:
  • pytest – for running tests
  • Pillow (PIL) – for image handling
  • requests – for HTTP requests
  • unittest.mock – for mocking API responses
  • Test image files placed in the same directory as the script

Updates are welcome :)

abrichr added 23 commits March 29, 2025 14:44
Updates the demo script to loop through planning and simulating actions
on the synthetic UI (type, click). Includes goal completion check via LLM.
- Implements CloudWatch Alarm (CPU-based) for auto-shutdown.
- Fixes deployment errors (gpg, lambda env var).
- Refactors client init to use deployed IP directly.
- Enables successful deployment via test script.
- Refactored omnimcp.py to use OmniParserClient (resolves core test import).
- Renamed test_synthetic_ui.py -> synthetic_ui_helpers.py and updated imports.
- Commented out TestParserDeployment in test_omnimcp.py (TODO: Fix fixture).
- Marked test in test_omniparser_e2e.py as skipped (TODO: Fix connection/logic).
- Marked tests in test_omnimcp_core.py as skipped (TODO: Update mocking).

This allows CI to pass on basic tests and unblocks work on response mapping.
This merges the infrastructure for automated OmniParser server deployment on EC2, including a fix for reliable inactivity-based auto-shutdown using CloudWatch Alarms.

Key additions:
- EC2 instance provisioning, configuration (Docker install), and deployment logic for OmniParser container (`omnimcp/omniparser/server.py`).
- CloudWatch Alarm / Lambda setup for inactivity-based auto-shutdown (replaces previous flawed `rate()` trigger).
- Client logic (`omnimcp/omniparser/client.py`) updated for auto-deployment triggering and reliable initialization after deployment.
- Foundational unit tests for core logic and simulation (`tests/test_core.py`, `tests/synthetic_ui_helpers.py`), which pass.
- End-to-end test structure and files (`tests/test_omnimcp.py`, `tests/test_omniparser_e2e.py`, `tests/test_omnimcp_core.py`).
- Test configuration (`tests/conftest.py`) for managing e2e tests.
- GitHub Actions CI workflow (`.github/workflows/ci.yml`) using `uv` for linting and running passing tests.
- Associated dependency updates in `pyproject.toml`.
- Updated README with badges, demo GIF, setup instructions.

Status:
- Core deployment and client initialization verified via `test_deploy_and_parse.py`.
- Unit tests pass and are checked by CI.
- E2E tests and core tests involving client interaction are included but currently skipped/commented out due to API mismatches requiring refactoring (tracked separately).

Combines and supersedes work from PRs OpenAdaptAI#11 and OpenAdaptAI#12.
@abrichr
Copy link
Member

abrichr commented Apr 3, 2025

Thanks for the contribution, @gabrielbugarija ! Good initiative getting an end-to-end test running locally.

This isn't ready to merge for a few reasons:

  • The test mocks out the actual parse_image logic, so it's not really exercising the integration path.
  • The file should be placed under tests/, following the existing structure.
  • The image diff logic using ImageChops is fragile and wouldn't hold up in more complex scenarios.
  • The naming convention for the file (test_parseImage_local.py) doesn't follow standard Python or project conventions. Stick to lowercase with underscores, e.g. test_parse_image_integration.py.
  • Hardcoding image paths and relying on manually-placed files isn't scalable. Use fixtures or temp files where possible.

Feel free to revise this PR or propose a different issue to tackle after exams. I appreciate the effort and am happy to review future contributions.

@abrichr abrichr closed this Apr 3, 2025
@abrichr abrichr reopened this Apr 3, 2025
abrichr and others added 20 commits April 4, 2025 21:36
- Extracts the core perceive-plan-act loop from the deprecated demo.py into a reusable AgentExecutor class in omnimcp/agent_executor.py.
- Introduces cli.py as the new primary command-line entry point using python-fire.
- Adds unit tests for AgentExecutor using pytest and mocks.
- Removes the platform parameter from core.plan_action_for_ui, simplifying the planner interface.
- Updates README.md to reflect the new architecture, cli usage, and roadmap.
- Adds pyobjc-framework-Cocoa as a conditional dependency for macOS in pyproject.toml.
- Removes the deprecated demo.py script.
Mocks the take_screenshot call within AgentExecutor tests to prevent failures in headless CI environments due to missing $DISPLAY.
- Moves VisualState class to its own file (omnimcp/visual_state.py).
- Renames omnimcp/omnimcp.py to omnimcp/mcp_server.py.
- Refactors mcp_server.py to instantiate FastMCP at module level, use mcp.run(), and require OMNIPARSER_URL via config, fixing connection errors with `mcp dev`.
- Updates imports and README architecture section for file changes.
- Modifies MCP tools (click_element, type_text) to remove second state update/verification to mitigate timeouts (though underlying performance issues remain).
- Moves VisualState class to its own file (omnimcp/visual_state.py).
- Renames omnimcp/omnimcp.py to omnimcp/mcp_server.py.
- Refactors mcp_server.py to instantiate FastMCP at module level, use mcp.run(), and require OMNIPARSER_URL via config, fixing connection errors with `mcp dev`.
- Updates imports and README architecture section for file changes.
- Modifies MCP tools (click_element, type_text) to remove second state update/verification to mitigate timeouts (though underlying performance issues remain).
- Updates tests
…PIC_DEFAULT_MODEL; restore comment; setup_run_logging; update .gitignore
- add ci_mode to cli.py
- save log to run dir
- add config.LOG_DIR/RUN_OUTPUT_DIR/ANTHROPIC_DEFAULT_MODEL
- restore comment
- add setup_run_logging
- update .gitignore
- Adds OMNIPARSER_DOWNSAMPLE_FACTOR config variable to allow scaling
  screenshots before parsing (default 1.0). Implemented via new
  utils.downsample_image function called from visual_state.py.
- Adds LLM_PROVIDER and ANTHROPIC_DEFAULT_MODEL config variables.
- Updates completions.py to use config, add tenacity retries, and
  restore/use format_chat_messages (guarded by new DEBUG_FULL_PROMPTS config flag).
- Fixes test failures in test_visual_state.py related to mock data and patches.

Agent planning sequence for calculator task now appears correct with Sonnet
and downsample_factor=1.0, although perception (~7-11s) and planning
(~5-9s) steps remain slow. Downsampling factor < 1.0 can be enabled via
.env for performance testing, but may affect accuracy.
- Adds OMNIPARSER_DOWNSAMPLE_FACTOR config variable to allow scaling
  screenshots before parsing (default 1.0). Implemented via new
  utils.downsample_image function called from visual_state.py.
- Adds LLM_PROVIDER and ANTHROPIC_DEFAULT_MODEL config variables.
- Updates completions.py to use config, add tenacity retries, and
  restore/use format_chat_messages (guarded by new DEBUG_FULL_PROMPTS config flag).
- Fixes test failures in test_visual_state.py related to mock data and patches.

Agent planning sequence for calculator task now appears correct with Sonnet
and downsample_factor=1.0, although perception (~7-11s) and planning
(~5-9s) steps remain slow. Downsampling factor < 1.0 can be enabled via
.env for performance testing, but may affect accuracy.
- Defines Pydantic models (ElementTrack, ScreenAnalysis, ActionDecision, LoggedStep) in types.py based on Issue OpenAdaptAI#8 design.
- Implements SimpleElementTracker skeleton in tracking.py with fixed update logic for misses/pruning. Matching logic (_match_elements) remains placeholder.
- Adds basic passing unit tests for SimpleElementTracker in tests/test_tracking.py.
- Integrates metrics collection (step times, counts, etc.) and structured JSONL logging (LoggedStep format) into AgentExecutor.
- Adds scipy and numpy as dependencies.

This lays the groundwork for implementing robust element tracking (Issue OpenAdaptAI#8) and addresses the need for improved observability and data logging.
Fixes issues on an already submitted script. Current script allows for more versatile test sets and it overall more applicable.
@gabrielbugarija gabrielbugarija changed the base branch from main to feat/tests June 4, 2025 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants