@llbbl llbbl commented Sep 1, 2025

Set up comprehensive Python testing infrastructure

Summary

This PR establishes a complete testing infrastructure for the ML/NLP projects collection, providing a solid foundation for writing and running tests across all modules (chatbot, embeddings, machine translation, part-of-speech tagging, sentiment analysis, and text generation).

Changes Made

Package Management

  • Poetry Configuration: Set up pyproject.toml with Poetry as the package manager
  • Dependencies: Migrated and organized dependencies (see the pyproject.toml sketch below), including:
    • Production: TensorFlow 2.13+, PyTorch 2.0+, NumPy, PyYAML, Requests
    • Testing: pytest, pytest-cov, pytest-mock as dev dependencies
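
For reference, the Poetry sections of pyproject.toml might look roughly like the sketch below; the version pins and project metadata are illustrative assumptions, not copied from this PR.

# Hypothetical excerpt from pyproject.toml -- pins and metadata are illustrative
[tool.poetry]
name = "ml-nlp-projects"   # assumed project name
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.10"           # assumed interpreter constraint
tensorflow = ">=2.13"
torch = ">=2.0"
numpy = "*"
pyyaml = "*"
requests = "*"

[tool.poetry.group.dev.dependencies]
pytest = "*"
pytest-cov = "*"
pytest-mock = "*"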

Testing Configuration

  • pytest Setup: Comprehensive pytest configuration (sketched after this list) with:
    • Test discovery patterns for test_*.py and *_test.py
    • Coverage reporting with 80% threshold requirement
    • HTML and XML coverage reports (htmlcov/, coverage.xml)
    • Custom markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.slow
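
A pyproject.toml section along the following lines would produce the behavior described above; the marker names match this PR, but the exact values are a sketch, not the committed configuration.

# Hypothetical [tool.pytest.ini_options] section -- values are illustrative
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
addopts = "--cov --cov-report=term-missing --cov-report=html --cov-report=xml --cov-fail-under=80"
markers = [
    "unit: fast, isolated tests",
    "integration: tests spanning multiple modules",
    "slow: long-running tests (deselect with -m 'not slow')",
]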

Directory Structure

tests/
├── __init__.py
├── conftest.py              # Shared fixtures
├── test_infrastructure.py   # Validation tests
├── unit/
│   └── __init__.py
└── integration/
    └── __init__.py

Testing Fixtures (conftest.py)

Comprehensive set of ML/NLP-focused fixtures (two are sketched after this list):

  • File System: temp_dir, temp_checkpoint_dir, sample_text_file
  • ML Frameworks: sample_tensorflow_tensor, sample_torch_tensor, sample_numpy_array
  • Configurations: sample_config, mock_model_config, sample_yaml_config
  • Data: sample_text_data, sample_dataset_info, small_batch_data, mock_tokenizer
  • Reproducibility: reset_random_seeds (auto-applied to all tests)
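
To illustrate the pattern, a minimal sketch of two of these fixtures (the committed conftest.py may differ in detail):

# Hypothetical excerpt from tests/conftest.py
import random

import numpy as np
import pytest


@pytest.fixture
def sample_numpy_array():
    """Small deterministic array for shape/dtype assertions."""
    return np.arange(12, dtype=np.float32).reshape(3, 4)


@pytest.fixture(autouse=True)
def reset_random_seeds():
    """Re-seed RNGs before every test; the real fixture presumably also
    seeds TensorFlow and PyTorch."""
    random.seed(42)
    np.random.seed(42)
    yield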

Additional Improvements

  • Updated .gitignore: Added testing artifacts, Claude settings, model files, and IDE configurations
  • Validation Tests: Created test_infrastructure.py with 16 tests verifying all components work correctly
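
A sketch in the style of those checks (not the file's actual contents), using the hypothetical fixture shown above:

# Hypothetical check in the style of tests/test_infrastructure.py
import numpy as np
import pytest


@pytest.mark.unit
def test_numpy_fixture_shape(sample_numpy_array):
    assert sample_numpy_array.shape == (3, 4)
    assert sample_numpy_array.dtype == np.float32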

Running Tests

Basic Commands

# Run all tests
poetry run pytest

# Run with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_infrastructure.py

# Run with coverage (default behavior)
poetry run pytest --cov

# Run only unit tests
poetry run pytest -m unit

# Run only integration tests  
poetry run pytest -m integration

# Skip slow tests
poetry run pytest -m "not slow"

Coverage Reports

  • Terminal: Coverage summary displayed after test run
  • HTML: Detailed report generated in htmlcov/index.html
  • XML: Machine-readable report in coverage.xml

Verification

The infrastructure has been validated with comprehensive tests covering:

  • ✅ Basic pytest functionality
  • ✅ All fixtures working correctly
  • ✅ TensorFlow, PyTorch, and NumPy integration
  • ✅ Test markers and organization
  • ✅ Temporary file handling
  • ✅ Random seed consistency for reproducible tests
  • ✅ Coverage tracking setup

Next Steps

Developers can now:

  1. Write unit tests in tests/unit/ for individual functions and classes (see the skeleton after this list)
  2. Write integration tests in tests/integration/ for module interactions
  3. Use fixtures from conftest.py for common test data and configurations
  4. Run tests with poetry run pytest to ensure code quality
  5. Monitor coverage to maintain high test coverage standards
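
As a starting skeleton (the module and function below are placeholders, not code from this repo), a first unit test could look like:

# tests/unit/test_example.py -- hypothetical placeholder module
import pytest


def normalize(text: str) -> str:
    """Toy stand-in for real project code under test."""
    return " ".join(text.split()).lower()


@pytest.mark.unit
def test_normalize_collapses_whitespace():
    assert normalize("  Hello   World ") == "hello world"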

Notes

  • All dependencies are managed through Poetry and the lock file is committed
  • Coverage threshold is set to 80%; test runs that fall below it fail the coverage check
  • The infrastructure supports both TensorFlow and PyTorch workflows
  • Random seeds are automatically reset for each test to ensure reproducibility

- Add Poetry package manager with pyproject.toml configuration
- Install core dependencies: TensorFlow, PyTorch, NumPy, PyYAML, Requests
- Add testing dependencies: pytest, pytest-cov, pytest-mock
- Create complete test directory structure (tests/, tests/unit/, tests/integration/)
- Configure pytest with coverage reporting (80% threshold, HTML/XML output)
- Add comprehensive conftest.py with ML-focused fixtures
- Update .gitignore for testing artifacts and ML model files
- Create validation tests to verify infrastructure functionality