Update python testing pipeline scripts and services

galalqassas · galalqassas · commit 14df6cdad55a · 2026-03-28T14:32:12.000+02:00
diff --git a/src/extension/pythonTestingPipeline/README.md b/src/extension/pythonTestingPipeline/README.md
@@ -1,105 +1,73 @@
 # Python Automated Testing Pipeline
 
-A three-agent system for automated Python testing, security analysis, and coverage improvement.
+Current docs for the CLI, API, GUI, and extension-facing parts of the pipeline.
 
-## Overview
+## Flow
 
-This pipeline uses three specialized AI agents to ensure code quality:
-
-1.  **Identification Agent**: Finds test scenarios (edge cases, security, critical paths).
-2.  **Implementation Agent**: Generates PyTest scripts with security awareness.
-3.  **Evaluation Agent**: Runs tests, checks coverage (target 90%), and analyzes security.
-
-**Key Features:**
-
-- **Auto-Improvement**: Iteratively generates tests until coverage goals are met.
-- **Security Analysis**: Detects SQLi, XSS, secrets, and more.
-- **Robustness**: Auto-fixes syntax errors, rotates API keys, and handles rate limits.
-
-## Architecture
-
-```
-┌─────────────────────────────────────────────────────────────────────────┐
-│                     Python Testing Pipeline                              │
-├─────────────────────────────────────────────────────────────────────────┤
-│   ┌──────────────────┐    JSON     ┌──────────────────┐                │
-│   │   Identification │ ─────────▶ │  Human Approval  │                 │
-│   │      Agent       │            │     (Review)     │                 │
-│   └──────────────────┘            └────────┬─────────┘                 │
-│            │                               ▼                            │
-│            │                      ┌──────────────────┐                 │
-│            │                      │  Implementation  │                 │
-│            │                      │      Agent       │                 │
-│            │                      └────────┬─────────┘                 │
-│            │                               ▼                            │
-│            │                      ┌──────────────────┐                 │
-│            │                      │    Evaluation    │◀────────────┐   │
-│            │                      │   + Security     │             │   │
-│            │                      └────────┬─────────┘             │   │
-│            │                               │ Coverage < 90%?       │   │
-│            │                               ▼         Yes           │   │
-│            │                      ┌──────────────────┐             │   │
-│            │                      │  Generate More   │─────────────┘   │
-│            │                      │     Tests        │                 │
-│            │                      └──────────────────┘                 │
-└─────────────────────────────────────────────────────────────────────────┘
-```
-
-## Usage
-
-### VS Code Integration
-
-Use the command in Copilot Chat:
-
-```
-@workspace /generatePythonTests ./my_project
-```
-
-### CLI Usage
-
-Run the standalone script:
-
-```bash
-# Basic usage
-python pythonTestingPipeline.py ./my_project
-
-# Common options
-python pythonTestingPipeline.py ./my_project --coverage       # Measure coverage
-python pythonTestingPipeline.py ./my_project --auto-approve   # Skip manual review
-python pythonTestingPipeline.py ./my_project --no-run-tests   # Generate only
+```text
+Identify -> Approve or refine -> Implement -> Run tests -> Evaluate
+                                      ^                         |
+                                      |------ improve loop -----|
+Artifacts: tests, prompts, report, governance, coverage report
 ```
 
-## Configuration
-
-**Requirements:**
-
-- Python 3.10+
-- `pip install pytest pytest-cov openai matplotlib`
-- VS Code + GitHub Copilot (for extension usage)
-
-**LLM Setup:**
-Configure `scripts/llm_config.py` and `scripts/.env`.
-
-- **Keys**: `GROQ_API_KEY`, `GROQ_API_KEY_1`, etc. (auto-rotates on 429 errors).
-- **Models**: Defaults to `openai/gpt-oss-120b`, falls back to `groq/compound`, `llama`, etc.
-
-## Agents & Communication
-
-Agents communicate via JSON.
-
-- **Identification**: Outputs `test_scenarios` (description, priority).
-- **Implementation**: Receives scenarios, outputs raw PyTest code.
-- **Evaluation**: Outputs `execution_summary`, `code_coverage_percentage`, and `security_issues`.
-
-**Security Checks:**
-The pipeline flags **Critical** to **Low** severity issues including:
-
-- SQL/Command Injection & XSS
-- Path Traversal & Data Exposure
-- Weak Authentication & Hardcoded Secrets
-
-## Contributing
-
-1.  Follow existing patterns.
-2.  Add unit tests (`npm run test:unit`).
-3.  Ensure TypeScript compilation passes.
+## Entry Points
+
+- CLI
+  Run:
+  `python src/extension/pythonTestingPipeline/scripts/pythonTestingPipeline.py <codebase_path>`
+  Common options: `--auto-approve`, `--no-run-tests`, `--output-dir`, `--model`.
+  `--coverage` and `--run-tests` are accepted for compatibility only: tests run by default, and coverage is collected whenever they run.
+- API
+  Run:
+  `uvicorn src.extension.api.main:app --reload`
+- GUI
+  Run:
+  `python -m src.extension.GUI.main`
+- VS Code extension
+  The command palette action is `Agentic Testing: Generate Tests`
+  (`agentic-testing.generateTests`).
+  It currently handles folder selection and progress UI, but it does not yet run
+  the full end-to-end pipeline.
+- Internal model tools
+  `generatePythonTests`, `implementPythonTests`, and `evaluatePythonTests` are
+  internal tools, not public slash commands.
+
+## Outputs
+
+Default output location: `<codebase_path>/tests`, unless `--output-dir` is set.
+
+Generated artifacts may include:
+- `test_generated_<timestamp>.py`
+- `prompts_<run_id>.json`
+- `report_<run_id>.md`
+- `governance_<run_id>.json`
+- `coverage_report_<run_id>.json`
+
+## Runtime Notes
+
+- Python 3.10+ is the practical baseline for the code in `src/extension`.
+- Pipeline-related Python dependencies live in `requirements.txt`.
+- `scripts/llm_config.py` is the source of truth for model ordering.
+- Model selection currently tries Ollama-hosted models first, then falls back to
+  Groq-backed models.
+- `GROQ_API_KEY`, `GROQ_API_KEY_1`, and similar variables are rotated when the
+  client needs another key.
+- Safety checks are implemented in `scripts/prompt_safety.py`.
+
+## Current Caveats
+
+- Coverage is collected whenever generated tests are executed; `--coverage` does not change this.
+- API prompt-history discovery can legitimately return no runs.
+- API pipeline status is held in memory only, so it does not survive a restart of the API process.
+- The Python CLI is the most complete execution path today.
+
+## Keep In Sync
+
+When updating docs here, cross-check:
+- `scripts/pythonTestingPipeline.py`
+- `scripts/llm_config.py`
+- `src/extension/api/main.py`
+- `src/extension/api/schemas.py`
+- `package.json`
+- `src/extension.ts`
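
Note on the README's runtime notes: the updated text says `GROQ_API_KEY`, `GROQ_API_KEY_1`, and similar variables are rotated when the client needs another key, but the diff does not show how. The sketch below is one plausible reading, not the code in `scripts/llm_config.py`. The names `collect_groq_keys` and `KeyRotator`, the ordering (unsuffixed key first, then by numeric suffix), and the round-robin policy are all assumptions made for illustration.

```python
import re
from typing import Mapping

# Matches GROQ_API_KEY and GROQ_API_KEY_<n>; the numeric suffix is optional.
_KEY_PATTERN = re.compile(r"^GROQ_API_KEY(?:_(\d+))?$")


def collect_groq_keys(env: Mapping[str, str]) -> list[str]:
    """Return non-empty Groq keys from env, unsuffixed key first (assumed order)."""
    found = []
    for name, value in env.items():
        match = _KEY_PATTERN.match(name)
        if match and value:
            # The bare GROQ_API_KEY sorts ahead of every numbered variant.
            index = int(match.group(1)) if match.group(1) is not None else -1
            found.append((index, value))
    return [value for _, value in sorted(found)]


class KeyRotator:
    """Hypothetical round-robin rotator over a fixed list of keys."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = list(keys)
        self._position = 0

    @property
    def current(self) -> str:
        return self._keys[self._position]

    def rotate(self) -> str:
        """Advance to the next key, wrapping around at the end of the list."""
        self._position = (self._position + 1) % len(self._keys)
        return self.current
```

A caller would typically build the rotator once with `KeyRotator(collect_groq_keys(os.environ))` and call `rotate()` when a request fails in a way that another key might fix. Which failures should trigger rotation is left open here, since the README no longer specifies it.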
diff --git a/src/extension/pythonTestingPipeline/common/pythonTestingPipelineService.ts b/src/extension/pythonTestingPipeline/common/pythonTestingPipelineService.ts
@@ -47,10 +47,11 @@ export interface IPythonTestingPipelineService {
 	identifyTestScenarios(codebasePath: string, targetFiles: readonly string[] | undefined, token: CancellationToken): Promise<ITestScenariosOutput>;
 
 	/**
-	 * Requests human approval for identified scenarios.
+	 * Approval hook for identified scenarios.
+	 * The current node-side implementation is non-interactive and returns the scenarios unchanged.
 	 * @param scenarios The identified test scenarios
 	 * @param token Cancellation token
-	 * @returns Approved scenarios (may be modified by user)
+	 * @returns Approved scenarios
 	 */
 	requestApproval(scenarios: ITestScenariosOutput, token: CancellationToken): Promise<ITestScenariosOutput>;
 
diff --git a/src/extension/pythonTestingPipeline/common/types.ts b/src/extension/pythonTestingPipeline/common/types.ts
@@ -105,7 +105,10 @@ export interface IPythonTestingPipelineOptions {
 	readonly testOutputDir?: string;
 	/** Optional: whether to run tests automatically after generation. Defaults to true. */
 	readonly autoRunTests?: boolean;
-	/** Optional: collect coverage data when running tests. Defaults to true. */
+	/**
+	 * Optional compatibility flag for callers that want to express a coverage preference.
+	 * The current node-side pipeline implementation still collects coverage whenever tests run.
+	 */
 	readonly collectCoverage?: boolean;
 	/** Optional: target coverage percentage (default: 90) */
 	readonly targetCoverage?: number;
diff --git a/src/extension/pythonTestingPipeline/node/pythonTestingPipelineService.ts b/src/extension/pythonTestingPipeline/node/pythonTestingPipelineService.ts
@@ -186,14 +186,14 @@ export class PythonTestingPipelineService implements IPythonTestingPipelineServi
 	}
 
 	/**
-	 * Requests human approval for identified scenarios.
-	 * In a real implementation, this would show a UI for user interaction.
+	 * Placeholder approval step.
+	 * This implementation currently returns the identified scenarios unchanged.
 	 */
 	async requestApproval(
 		scenarios: ITestScenariosOutput,
 		_token: CancellationToken
 	): Promise<ITestScenariosOutput> {
-		// Placeholder - in a real implementation, show UI dialog
+		// Placeholder - interactive approval is not wired into the node service yet.
 		return scenarios;
 	}
 
diff --git a/src/extension/pythonTestingPipeline/scripts/pythonTestingPipeline.py b/src/extension/pythonTestingPipeline/scripts/pythonTestingPipeline.py
@@ -3,11 +3,13 @@
 Python Automated Testing Pipeline
 
 Usage:
-    python pythonTestingPipeline.py <codebase_path> [--coverage] [--auto-approve] \\
-        [--no-run-tests]
+    python pythonTestingPipeline.py <codebase_path> [--auto-approve] \\
+        [--no-run-tests] [--output-dir <dir>] [--model <model>]
 
 Note:
     Generated tests are run by default unless --no-run-tests is supplied.
+    The --coverage flag is accepted for compatibility, but coverage is already
+    collected automatically when tests are executed.
 
 Example:
     python pythonTestingPipeline.py ./my_project --auto-approve
@@ -475,6 +477,8 @@ def run_pipeline(
     ) -> dict:
         """
         Runs the complete testing pipeline.
+
+        Coverage is collected automatically whenever test execution is enabled.
         """
         print("=" * 60)
         print("🚀 Python Automated Testing Pipeline")
@@ -969,7 +973,14 @@ def main():
         action="store_true",
         help="Do not run generated tests",
     )
-    parser.add_argument("--coverage", action="store_true", help="Collect coverage")
+    parser.add_argument(
+        "--coverage",
+        action="store_true",
+        help=(
+            "Compatibility flag; coverage is already collected automatically "
+            "when tests run"
+        ),
+    )
     parser.add_argument(
         "--auto-approve", action="store_true", help="Auto-approve scenarios"
     )
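
Note on the `--coverage` change: the hunks above keep the flag so existing invocations still parse, while the docstrings state that coverage is collected whenever tests run. The sketch below shows that pattern in isolation. It is a trimmed-down illustration, not the script's actual `main()`: `build_parser` and `resolve_run_options` are hypothetical names, and the output-directory default simply follows the README's `<codebase_path>/tests` rule.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser limited to flags named in this commit."""
    parser = argparse.ArgumentParser(prog="pythonTestingPipeline.py")
    parser.add_argument("codebase_path")
    parser.add_argument(
        "--no-run-tests", action="store_true", help="Do not run generated tests"
    )
    parser.add_argument(
        "--coverage",
        action="store_true",
        help=(
            "Compatibility flag; coverage is already collected automatically "
            "when tests run"
        ),
    )
    parser.add_argument(
        "--auto-approve", action="store_true", help="Auto-approve scenarios"
    )
    parser.add_argument("--output-dir", default=None, help="Where to write artifacts")
    return parser


def resolve_run_options(args: argparse.Namespace) -> dict:
    """Derive effective settings; args.coverage is parsed but deliberately unused."""
    run_tests = not args.no_run_tests
    output_dir = (
        Path(args.output_dir) if args.output_dir else Path(args.codebase_path) / "tests"
    )
    return {
        "run_tests": run_tests,
        # Coverage follows test execution, regardless of --coverage.
        "collect_coverage": run_tests,
        "auto_approve": args.auto_approve,
        "output_dir": output_dir,
    }
```

Under these assumptions, passing `--coverage` changes nothing in the resolved options, and `--no-run-tests` is the only flag that disables coverage collection.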