
fix: test infrastructure broken + added CI#175

Open
suyashkumar102 wants to merge 3 commits into KathiraveluLab:dev from suyashkumar102:fix/test-infrastructure-and-logging

Conversation

@suyashkumar102

I cloned the repository to run the test suite and noticed that several test files were failing immediately due to import errors. After investigating, it turned out that some dependencies (e.g., Flask, bson, networkx) were not properly included in the test environment.

To address this, I:

  • Split out a requirements-dev.txt to clearly define development/test dependencies
  • Added a conftest.py to centralize shared fixtures used across tests
  • Introduced a basic GitHub Actions workflow to run tests in CI
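As a rough sketch of the shared-fixture approach described above (the `create_app` stand-in and fixture names here are illustrative, not copied from the PR; the real factory lives in the application package):

```python
import pytest
from unittest.mock import MagicMock

def create_app(test_config=None):
    """Stand-in for the project's Flask app factory, so this sketch
    is self-contained; the real one lives in the application code."""
    class App:
        pass
    app = App()
    app.config = dict(test_config or {})
    return app

def make_test_app():
    """Build an app configured for testing, with MongoDB mocked out."""
    app = create_app(test_config={"TESTING": True})
    # Mock MongoDB so tests don't need a running database;
    # individual tests can override this if they need real DB access.
    app.mongo = MagicMock()
    return app

@pytest.fixture
def app():
    yield make_test_app()
```

Placing a fixture like this in `conftest.py` makes it available to every test module in the directory without an explicit import.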

While making these changes, I also noticed that clustering.py was using print() statements instead of structured logging, which isn’t ideal for production code. I’ve replaced those with logging calls.

With these updates, tests now run successfully using:

pip install -r requirements-dev.txt
pytest tests/ -v

I also added a short README in the tests/ directory to document how to run the test suite, since that was previously missing.

Please let me know if this causes any issues on your end or if you’d prefer a different approach.

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request establishes a robust testing framework by introducing development dependencies, comprehensive documentation, and shared pytest fixtures. It also refines the keyword clustering logic with improved logging and cluster analysis. Feedback highlights opportunities to optimize test performance by mocking heavy AI models and database connections during setup, improving computational efficiency with NumPy vector operations, and ensuring cross-platform compatibility in the test documentation.

Comment on lines +42 to +49
app = create_app(test_config=test_config)

# Mock MongoDB to avoid requiring a running MongoDB instance
# Individual tests can override this if they need real DB access
mock_mongo = MagicMock()
app.mongo = mock_mongo

yield app
Contributor

high

The create_app function instantiates DreamsPipeline, which likely loads heavy AI models. Doing this for every test will significantly slow down the test suite. Additionally, MongoClient should be mocked before create_app is called to avoid any attempts to connect to a real database during initialization. Wrapping the setup in a patch context ensures these heavy dependencies are replaced with mocks during testing.

Suggested change

app = create_app(test_config=test_config)

# Mock MongoDB to avoid requiring a running MongoDB instance
# Individual tests can override this if they need real DB access
mock_mongo = MagicMock()
app.mongo = mock_mongo

yield app

with patch('dreamsApp.app.DreamsPipeline'), \
     patch('dreamsApp.app.MongoClient'):
    app = create_app(test_config=test_config)

    # Mock MongoDB to avoid requiring a running MongoDB instance
    # Individual tests can override this if they need real DB access
    mock_mongo = MagicMock()
    app.mongo = mock_mongo

    yield app

# Debug: Log the cluster labels to see how the data is being clustered
logger.debug(f"Cluster labels: {cluster_labels}")
unique_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
noise_count = sum(1 for label in cluster_labels if label == -1)
Contributor

medium

Since cluster_labels is a numpy array (returned by HDBSCAN), using a generator expression with sum() is less efficient than using numpy's built-in vector operations.

Suggested change
noise_count = sum(1 for label in cluster_labels if label == -1)
noise_count = np.count_nonzero(cluster_labels == -1)
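The difference can be illustrated with a toy label array (using HDBSCAN's convention of labelling noise points as -1; the array values here are made up):

```python
import numpy as np

# Toy cluster labels in HDBSCAN's convention: -1 marks noise points
cluster_labels = np.array([0, 0, 1, -1, 2, -1, 1])

# Generator-based count: iterates element by element in Python
noise_slow = sum(1 for label in cluster_labels if label == -1)

# Vectorised count: one comparison over the whole array at C speed
noise_fast = np.count_nonzero(cluster_labels == -1)

# Cluster count excluding the noise label, as in the snippet above
unique_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
```

Both counts agree; the vectorised form simply avoids a Python-level loop, which matters as the number of labels grows.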

logger.debug(f"Cluster labels: {cluster_labels}")
unique_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
noise_count = sum(1 for label in cluster_labels if label == -1)
logger.info(f"HDBSCAN produced {unique_clusters} clusters for user {user_id} ({noise_count} noise points)")
Contributor

medium

Logging per-user clustering results at INFO level can lead to extremely verbose logs in production if the number of users is large. It is generally better to use DEBUG for per-item processing details and reserve INFO for high-level process summaries (like the one at line 60).

Suggested change
logger.info(f"HDBSCAN produced {unique_clusters} clusters for user {user_id} ({noise_count} noise points)")
logger.debug(f"HDBSCAN produced {unique_clusters} clusters for user {user_id} ({noise_count} noise points)")
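The effect of the level choice can be seen with a minimal logging setup (the logger name and messages here are illustrative, not taken from the codebase):

```python
import logging

# A typical production configuration: INFO and above are emitted
logging.basicConfig(level=logging.INFO, force=True)
logger = logging.getLogger("clustering")

# Per-user detail: suppressed under INFO, visible only when DEBUG is enabled
logger.debug("HDBSCAN produced %d clusters for user %s (%d noise points)",
             3, "user-1", 2)

# High-level summary: still emitted at the production level
logger.info("Clustering pass complete: %d users processed", 120)
```

Dropping the per-user line to DEBUG keeps production logs to one summary line per run, while a developer can still re-enable the detail with `level=logging.DEBUG`.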

import pytest
import os
import tempfile
from unittest.mock import MagicMock
Contributor

medium

Add patch to the imports to support mocking heavy dependencies during application setup in the app fixture.

Suggested change
from unittest.mock import MagicMock
from unittest.mock import MagicMock, patch

tests/README.md Outdated

# HTML report (opens in browser)
pytest tests/ -v --cov=dreamsApp --cov-report=html
open htmlcov/index.html
Contributor

medium

The open command is specific to macOS. To make the instructions cross-platform, consider using python -m webbrowser or simply instructing the user to open the file in their browser.

Suggested change
open htmlcov/index.html
python -m webbrowser htmlcov/index.html

@ayusrjn
Member

ayusrjn commented Apr 7, 2026

Hi @suyashkumar102,
Thank you for your contributions. However, the import errors you mentioned have already been fixed by #148, which addresses all the test issues.
We are still in the early stages of the project, and the current tests are just toy implementations, so we don't need a CI pipeline just yet.
Your logger changes are correct, though.
One more thing: please keep your PR directed at the dev branch.

@suyashkumar102 suyashkumar102 changed the base branch from main to dev April 8, 2026 10:55
@suyashkumar102
Author

Hi @ayusrjn,

Thank you for taking the time to review this, I really appreciate the feedback!
My PR focuses on a different layer - the test infrastructure itself:

  • requirements-dev.txt: Separating test dependencies from production
  • conftest.py: Shared pytest fixtures for consistent test setup
  • .github/workflows/test.yml: Automated CI to catch issues early
  • tests/README.md: Documentation for running tests
  • clustering.py: Production-grade logging

Regarding CI, I saw @pradeeban mentioned the codebase is largely a toy implementation right now, which makes sense for a research project. My thinking is that even early-stage projects benefit from automated testing to prevent regressions as contributions grow.

I've retargeted this to the dev branch per your workflow.

Also, @pradeeban - would love to hear your thoughts on priorities. Happy to focus wherever it's most valuable.

@ayusrjn
Member

ayusrjn commented Apr 8, 2026

Hi @suyashkumar102,
The comment was more of a suggestion. The import error fixes I mentioned were based on your PR description, which noted that several tests were failing due to import errors.

Your changes are valid for the current state of the repo. We are still in the research stage, and the current architecture is likely to change; for example, we might move away from Flask completely. Introducing a test setup for the current architecture right now might become additional overhead for the GSoC'26 contributor.

I do agree with your CI point.

I think this can still be merged, as it doesn't introduce any issues, and the CI is useful. It might be worth focusing on making the codebase more flexible and modular first.

@suyashkumar102
Author

@ayusrjn
Thanks so much for the context and support! Really appreciate the insight about focusing on making things more modular first, that's a great point and definitely the right approach for long term flexibility. Will keep that in mind for future contributions!
