This project is set up using the uv package manager. To run the project, install uv and then run the commands below:

```shell
uv venv
uv sync
```

Before installing any package, make sure the environment is activated. You can do this by running:

macOS / Linux:

```shell
source .venv/bin/activate
```

Windows:

```shell
.venv/Scripts/activate
```
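If you are unsure whether the activation worked, one way to confirm it from Python is to compare the interpreter's prefix paths (a minimal sketch; nothing project-specific is assumed):

```python
import sys

# When a virtual environment is active, sys.prefix points inside the
# environment (e.g. .venv), while sys.base_prefix still points at the
# base Python installation. Outside a venv the two are equal.
in_venv = sys.prefix != sys.base_prefix
print("Virtual environment active:", in_venv)
```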
This section outlines the guidelines for contributing to the Global Classifier project. Please read through these before submitting any changes.
The project is organized into several key directories to maintain clarity and modularity:
- `configs/`: Holds global configuration files essential for different parts of the project.
- `docs/`: Contains all project documentation, including architectural diagrams (e.g., `classifier-architecture.drawio`), setup guides, technical explanations, and usage manuals.
- `DSL/`: Contains components related to DSLs belonging to different BYK stack services.
- `GUI/`: Contains the source code, assets, and build configurations for the project's Graphical User Interface.
- `local-classifier/`: A copy of the local-classifier repo for module re-use purposes. Will be discarded after the initial release.
- `src/`: Contains the core source code for the Global Classifier. This is further divided into modules for specific functionalities:
  - `dataset-generation/`: Scripts and tools for creating and preparing datasets.
  - `inference/`: Code related to running model predictions.
  - `model-training/`: Scripts and notebooks for training machine learning models.
  - `tests/`: Unit, integration, and end-to-end tests for the `src/` components.
Understanding this structure will help you locate relevant files and understand the project's architecture.
We use Ruff for linting Python code to ensure consistency and catch potential errors early. Ruff is an extremely fast Python linter and formatter, written in Rust.
How Ruff Works (Example):
Consider the following Python code snippet, which has a few style issues:

```python
import os,sys  # Multiple imports on one line

def process_data(data, unused_param):  # Unused function parameter
    print ("Processing")  # Print call with extra space
    if data is not None:
        return True
    else:
        return False
```
When you run Ruff on this code (e.g., `ruff check .` or `ruff format . --check`), it will flag these issues:

- An error for multiple imports on one line (`import os,sys`). Ruff would suggest `import os; import sys` or separate lines.
- An error for `unused_param` not being used within the `process_data` function.
- Formatting issues might also be flagged if `ruff format` is used or its rules are enabled in `ruff check`.
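After addressing these findings, a cleaned-up version of the snippet might look like the following (the docstring wording and the decision to drop the unused parameter are illustrative, not a project requirement):

```python
import os  # Note: Ruff would also flag these imports as unused (F401)
import sys  # unless they are actually used elsewhere in the module


def process_data(data):
    """Return True when data is present, False otherwise."""
    print("Processing")
    return data is not None
```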
All Python contributions must be free of Ruff linting errors. You can check your code by running `ruff check .` and `ruff format .` in the relevant directory.
This project uses uv as the primary package manager for Python dependencies. `uv` is a fast Python package installer and resolver, designed as a drop-in replacement for `pip` and `pip-tools`.

You will typically use `uv` to manage virtual environments and install dependencies listed in `requirements.txt` files found within various modules (especially in the `local-classifier/` subdirectories and `src/`).
Example commands to create a virtual environment and install dependencies for a module:

```shell
uv venv                              # Create a virtual environment in .venv
uv pip install -r requirements.txt   # Install dependencies
```
Ensure your development environment is set up using `uv` for consistency. If you have already created your environment with another tool such as `conda` or `venv`, simply create a new `uv` project and copy your existing code into it, making sure no path references are broken.
To maintain a high standard of code quality and ensure project stability, the following practices are enforced:
- Ruff Linting is Mandatory: All submitted Python code must pass Ruff linting checks.
- Build Success: Automated builds (e.g., via GitHub Actions) will only succeed if all checks, including Ruff linting, pass. Pull requests with failing checks will not be merged.
Please run Ruff locally to check your code before pushing changes or creating a pull request. This helps streamline the review process and maintain a clean codebase.
The project follows a three-tier branching workflow to streamline development, testing, and integration.
- wip (work in progress): Primary branch for ongoing work. All new features and fixes are merged here first.
- testing: Integration branch where code from WIP is validated by automated tests and QA.
- dev: Development-ready branch. Code that passes testing is merged here for further staging or release processes.
- Fork the repository and clone it locally.
- Create a new feature/fix branch based off `wip`.
- Make your changes, run Ruff linting and formatting, commit your changes, and ensure all checks pass.
- Push your branch to the remote and open a Pull Request targeting `wip`.
- After review approval, maintainers merge your changes into `testing`.
- Automated tests and QA are executed on `testing`.
- Once testing is successful, maintainers merge `testing` into `dev`.
- From `dev`, code may proceed through further release pipelines or staging environments.
All Python modules in this project require comprehensive unit tests. Follow these guidelines when writing tests:
- Test Framework: Use `pytest` for all Python unit tests.
- Test Location: Place tests in the `src/tests/` directory, mirroring the structure of the module being tested.
- Naming Convention: Name test files with the `test_` prefix (e.g., `test_classifier.py`).
- Coverage: Aim for at least 80% code coverage for all modules.
- Test Isolation: Each test should be independent and not rely on the state of other tests.
Example of a well-structured test:
```python
import pytest
from src.inference.classifier import classify_text


def test_classify_text_empty_input():
    """Test classification behavior with empty input."""
    result = classify_text("")
    assert result == "unknown"


def test_classify_text_valid_input():
    """Test classification with valid sample text."""
    sample = "This is a sample technical query about databases."
    result = classify_text(sample)
    assert result in ["database", "technical"]
```
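When several cases exercise the same function, `pytest.mark.parametrize` keeps the tests isolated while avoiding duplication. A hypothetical sketch, using a local stand-in for `classify_text` only so the example is self-contained:

```python
import pytest


def classify_text(text: str) -> str:
    """Stand-in for the real classifier, used only to make this sketch runnable."""
    return "unknown" if not text else "technical"


# Each (text, expected) pair runs as an independent test case.
@pytest.mark.parametrize(
    ("text", "expected"),
    [
        ("", "unknown"),
        ("Query about databases", "technical"),
    ],
)
def test_classify_text(text, expected):
    assert classify_text(text) == expected
```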
All frontend components in the GUI directory require automated tests using Playwright:

- Test Directory: Place Playwright tests in `GUI/tests/`.
- Coverage Requirements: Tests must cover:
  - All critical user flows
  - Component rendering
  - State management
  - Error handling scenarios
- Multi-browser Testing: Tests should run against at least two major browsers (Chrome and Firefox).
Example Playwright test structure:
```typescript
import { test, expect } from '@playwright/test';

test.describe('Classifier UI', () => {
  test('should display classification results correctly', async ({ page }) => {
    await page.goto('/classifier');
    await page.fill('#input-text', 'Sample query about Azure services');
    await page.click('#classify-button');

    // Check if results appear
    const results = page.locator('.classification-results');
    await expect(results).toBeVisible();

    // Verify correct classification appears
    const category = await page.locator('.category-label').textContent();
    expect(['cloud', 'azure']).toContain(category);
  });
});
```
All tests must pass before PR approval and merge into the `wip` branch.