End to end local Integration test for parseImage function: See issue #1 #15

gabrielbugarija · 2025-03-28T15:44:55Z

NOTE: The Test is local, and without an AWS key and endpoint, spinup and spindown could not be implemented.

Introduces a runnable demo (`demo.py`) proving the concept of using an LLM to plan a UI action based on a user goal and mocked visual elements (generated by `synthetic_ui.py`). Includes: - Core planning logic and prompting (`core.py`) - Anthropic API integration (`completions.py`) - Pydantic structured output for LLM response - Visualization of the target UI element

NOTE: The Test is local, and without an AWS key and endpoint, spinup and spindown could not be implemented.

gabrielbugarija · 2025-03-28T15:49:39Z

Overview
This PR adds an integration test suite for the parse_image(...) function. The tests are designed to run entirely on a local machine and simulate end-to-end image parsing behavior using either a real API or a mocked one.

Test Capabilities

Uses local test images stored in the project directory
Encodes images in base64 and sends simulated API POST requests
Validates the structure and content of responses (e.g., checks for "segments")
Supports mocked API behavior using unittest.mock
Includes visual comparison of UI states using before/after screenshots
These tests are self-contained and can run without any cloud or AWS dependency.

Requirements

Make sure the following Python libraries are installed:
pytest – for running tests
Pillow (PIL) – for image handling
requests – for HTTP requests
unittest.mock – for mocking API responses
Test image files placed in the same directory as the script

Updates are welcome :)

Updates the demo script to loop through planning and simulating actions on the synthetic UI (type, click). Includes goal completion check via LLM.

- Implements CloudWatch Alarm (CPU-based) for auto-shutdown. - Fixes deployment errors (gpg, lambda env var). - Refactors client init to use deployed IP directly. - Enables successful deployment via test script.

- Refactored omnimcp.py to use OmniParserClient (resolves core test import). - Renamed test_synthetic_ui.py -> synthetic_ui_helpers.py and updated imports. - Commented out TestParserDeployment in test_omnimcp.py (TODO: Fix fixture). - Marked test in test_omniparser_e2e.py as skipped (TODO: Fix connection/logic). - Marked tests in test_omnimcp_core.py as skipped (TODO: Update mocking). This allows CI to pass on basic tests and unblocks work on response mapping.

This merges the infrastructure for automated OmniParser server deployment on EC2, including a fix for reliable inactivity-based auto-shutdown using CloudWatch Alarms. Key additions: - EC2 instance provisioning, configuration (Docker install), and deployment logic for OmniParser container (`omnimcp/omniparser/server.py`). - CloudWatch Alarm / Lambda setup for inactivity-based auto-shutdown (replaces previous flawed `rate()` trigger). - Client logic (`omnimcp/omniparser/client.py`) updated for auto-deployment triggering and reliable initialization after deployment. - Foundational unit tests for core logic and simulation (`tests/test_core.py`, `tests/synthetic_ui_helpers.py`), which pass. - End-to-end test structure and files (`tests/test_omnimcp.py`, `tests/test_omniparser_e2e.py`, `tests/test_omnimcp_core.py`). - Test configuration (`tests/conftest.py`) for managing e2e tests. - GitHub Actions CI workflow (`.github/workflows/ci.yml`) using `uv` for linting and running passing tests. - Associated dependency updates in `pyproject.toml`. - Updated README with badges, demo GIF, setup instructions. Status: - Core deployment and client initialization verified via `test_deploy_and_parse.py`. - Unit tests pass and are checked by CI. - E2E tests and core tests involving client interaction are included but currently skipped/commented out due to API mismatches requiring refactoring (tracked separately). Combines and supersedes work from PRs OpenAdaptAI#11 and OpenAdaptAI#12.

abrichr · 2025-04-03T02:10:00Z

Thanks for the contribution, @gabrielbugarija ! Good initiative getting an end-to-end test running locally.

This isn't ready to merge for a few reasons:

The test mocks out the actual parse_image logic, so it's not really exercising the integration path.
The file should be placed under tests/, following the existing structure.
The image diff logic using ImageChops is fragile and wouldn't hold up in more complex scenarios.
The naming convention for the file (test_parseImage_local.py) doesn't follow standard Python or project conventions. Stick to lowercase with underscores, e.g. test_parse_image_integration.py.
Hardcoding image paths and relying on manually-placed files isn't scalable. Use fixtures or temp files where possible.

Feel free to revise this PR or propose a different issue to tackle after exams. I appreciate the effort and am happy to review future contributions.

- Extracts the core perceive-plan-act loop from the deprecated demo.py into a reusable AgentExecutor class in omnimcp/agent_executor.py. - Introduces cli.py as the new primary command-line entry point using python-fire. - Adds unit tests for AgentExecutor using pytest and mocks. - Removes the platform parameter from core.plan_action_for_ui, simplifying the planner interface. - Updates README.md to reflect the new architecture, cli usage, and roadmap. - Adds pyobjc-framework-Cocoa as a conditional dependency for macOS in pyproject.toml. - Removes the deprecated demo.py script.

Mocks the take_screenshot call within AgentExecutor tests to prevent failures in headless CI environments due to missing $DISPLAY.

- Moves VisualState class to its own file (omnimcp/visual_state.py). - Renames omnimcp/omnimcp.py to omnimcp/mcp_server.py. - Refactors mcp_server.py to instantiate FastMCP at module level, use mcp.run(), and require OMNIPARSER_URL via config, fixing connection errors with `mcp dev`. - Updates imports and README architecture section for file changes. - Modifies MCP tools (click_element, type_text) to remove second state update/verification to mitigate timeouts (though underlying performance issues remain).

- Moves VisualState class to its own file (omnimcp/visual_state.py). - Renames omnimcp/omnimcp.py to omnimcp/mcp_server.py. - Refactors mcp_server.py to instantiate FastMCP at module level, use mcp.run(), and require OMNIPARSER_URL via config, fixing connection errors with `mcp dev`. - Updates imports and README architecture section for file changes. - Modifies MCP tools (click_element, type_text) to remove second state update/verification to mitigate timeouts (though underlying performance issues remain). - Updates tests

…PIC_DEFAULT_MODEL; restore comment; setup_run_logging; update .gitignore

- add ci_mode to cli.py - save log to run dir - add config.LOG_DIR/RUN_OUTPUT_DIR/ANTHROPIC_DEFAULT_MODEL - restore comment - add setup_run_logging - update .gitignore

- Adds OMNIPARSER_DOWNSAMPLE_FACTOR config variable to allow scaling screenshots before parsing (default 1.0). Implemented via new utils.downsample_image function called from visual_state.py. - Adds LLM_PROVIDER and ANTHROPIC_DEFAULT_MODEL config variables. - Updates completions.py to use config, add tenacity retries, and restore/use format_chat_messages (guarded by new DEBUG_FULL_PROMPTS config flag). - Fixes test failures in test_visual_state.py related to mock data and patches. Agent planning sequence for calculator task now appears correct with Sonnet and downsample_factor=1.0, although perception (~7-11s) and planning (~5-9s) steps remain slow. Downsampling factor < 1.0 can be enabled via .env for performance testing, but may affect accuracy.

- Defines Pydantic models (ElementTrack, ScreenAnalysis, ActionDecision, LoggedStep) in types.py based on Issue OpenAdaptAI#8 design. - Implements SimpleElementTracker skeleton in tracking.py with fixed update logic for misses/pruning. Matching logic (_match_elements) remains placeholder. - Adds basic passing unit tests for SimpleElementTracker in tests/test_tracking.py. - Integrates metrics collection (step times, counts, etc.) and structured JSONL logging (LoggedStep format) into AgentExecutor. - Adds scipy and numpy as dependencies. This lays the groundwork for implementing robust element tracking (Issue OpenAdaptAI#8) and addresses the need for improved observability and data logging.

Fixes issues on an already submitted script. Current script allows for more versatile test sets and it overall more applicable.

abrichr and others added 6 commits March 25, 2025 19:09

autoshutdown working; discovery warning

40eb170

just use boto3 for discovery

d2361dd

INACTIVITY_TIMEOUT_MINUTES = 60

bdfda9a

add demo_output/login_screen.png, login_screen_highlighted.png

e95203a

End to end local Integration test for parseImage function: See issue #1

ea652f1

NOTE: The Test is local, and without an AWS key and endpoint, spinup and spindown could not be implemented.

abrichr added 23 commits March 29, 2025 14:44

feat(demo): Add multi-step planning and simulation

bbd4ecf

Updates the demo script to loop through planning and simulating actions on the synthetic UI (type, click). Includes goal completion check via LLM.

add demo_output_multistep

4b01bb0

feat(demo): Add dimming and text annotation to highlights

2a936e5

Add tests; ci.yml

aaf7887

ruff

af59011

ci.yml: uv venv

d8b7bbb

pyproject.toml: Only include the main package source directory

5c8a004

uv add ruff

36a1f87

ruff

c7f195e

feat(demo): Add MVP Demo for LLM UI Action Planning

bc1acbc

documentation

41c9230

Merge infrastructure: EC2 deploy + auto-shutdown

f15270c

Merge tests: Unit and E2E tests for deployment and core

13f4461

wip

e3daa78

wip

06b73df

fix(omniparser): Correct deployment, auto-shutdown, and client init

0e2c419

- Implements CloudWatch Alarm (CPU-based) for auto-shutdown. - Fixes deployment errors (gpg, lambda env var). - Refactors client init to use deployed IP directly. - Enables successful deployment via test script.

ruff

18632ea

ruff

8e58552

ruff format, ruff check

189ec97

feat: Add demo GIF, generation script, and update README

0b77c78

update README

1610d22

abrichr added 6 commits April 1, 2025 13:36

fix test_omnimcp_core.py

8771b47

remove test_deploy_screenshot.png

1ce041a

add demo_synthetic.py

ecdc363

replace make_gif.sh with make_gif.py; update README

edcb55e

feat(agent): Implement working multi-step calculator demo loop

35a2bdc

fix omnimcp_demo.gif

1776cdf

abrichr closed this Apr 3, 2025

abrichr reopened this Apr 3, 2025

abrichr and others added 20 commits April 4, 2025 21:36

fix(tests): Mock take_screenshot in AgentExecutor tests for CI

a8c54db

Mocks the take_screenshot call within AgentExecutor tests to prevent failures in headless CI environments due to missing $DISPLAY.

refactor: Introduce AgentExecutor for core loop orchestration

fd224b0

update README.md

0a5a2a9

fix tests

8683e59

fix tests

e16496d

fix cli.py; save log to run dir; config.LOG_DIR/RUN_OUTPUT_DIR/ANTHRO…

8a58cd6

…PIC_DEFAULT_MODEL; restore comment; setup_run_logging; update .gitignore

fix smoke test

2d5b912

delay imports in cli.py; add ci_mode arg

1e745b5

fix(cli): add ci_mode to cli.py

0763da3

- add ci_mode to cli.py - save log to run dir - add config.LOG_DIR/RUN_OUTPUT_DIR/ANTHROPIC_DEFAULT_MODEL - restore comment - add setup_run_logging - update .gitignore

ruff

c716ea9

add DEBUG_FULL_PROMPTS to .env.example

ed8c77f

Merge branch 'OpenAdaptAI:main' into main

6246521

Delete test_parseImage_local.py

2bff659

Updated end-end parse image testing

95d6123

Fixes issues on an already submitted script. Current script allows for more versatile test sets and it overall more applicable.

gabrielbugarija changed the base branch from main to feat/tests June 4, 2025 20:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

End to end local Integration test for parseImage function: See issue #1 #15

End to end local Integration test for parseImage function: See issue #1 #15

Uh oh!

gabrielbugarija commented Mar 28, 2025

Uh oh!

gabrielbugarija commented Mar 28, 2025

Uh oh!

abrichr commented Apr 3, 2025

Uh oh!

Uh oh!

End to end local Integration test for parseImage function: See issue #1 #15

Are you sure you want to change the base?

End to end local Integration test for parseImage function: See issue #1 #15

Uh oh!

Conversation

gabrielbugarija commented Mar 28, 2025

Uh oh!

gabrielbugarija commented Mar 28, 2025

Uh oh!

abrichr commented Apr 3, 2025

Uh oh!

Uh oh!