Open
79 commits
1c7fff7
Remove telemetry functionality
al-rigazzi Jul 28, 2025
346cbbd
Clean up remaining telemetry references and fix imports
al-rigazzi Jul 28, 2025
9ffd0bf
Remove final telemetry references from codebase
al-rigazzi Jul 28, 2025
78f748c
Fix indirect tests after telemetry removal
al-rigazzi Jul 28, 2025
b5b038d
Remove SmartDashboard integration and references
al-rigazzi Jul 28, 2025
0e50ad5
Fix mypy type annotation errors in CLI plugin system
al-rigazzi Jul 28, 2025
dcfc6d4
Fix remaining test failures and clean up telemetry remnants
al-rigazzi Jul 28, 2025
ce82ba6
Clean up remaining telemetry references in test files
al-rigazzi Jul 28, 2025
90a0f2f
make style
al-rigazzi Jul 28, 2025
f4154c2
Fix test expectations for new output file structure
al-rigazzi Jul 28, 2025
45c40d3
Fix all lint errors to unblock CI/CD
al-rigazzi Jul 28, 2025
811d573
Last fixes
al-rigazzi Jul 28, 2025
58aec22
Fix
al-rigazzi Jul 28, 2025
98b316b
Indirect timestamp functionality added back
al-rigazzi Jul 28, 2025
3a7b22b
Remove indirect entrypoint and corresponding tests
al-rigazzi Jul 28, 2025
5ae411c
Remove spurious files
al-rigazzi Jul 28, 2025
db4c360
Fix test failures and clean up remaining telemetry references
al-rigazzi Jul 28, 2025
4908c50
Remove lingering files
al-rigazzi Jul 28, 2025
26ebfda
Fix lingering output files in test_symlinking and test_output_files
al-rigazzi Jul 28, 2025
65812e5
Refine changelog
al-rigazzi Jul 28, 2025
9f9fd67
Remove unused error class
al-rigazzi Jul 28, 2025
a6c472c
Remove proxyable command
al-rigazzi Jul 28, 2025
7ec4165
Restore step information in dictified model
al-rigazzi Jul 29, 2025
356cbc7
Fix serialize calls
al-rigazzi Jul 29, 2025
ef93676
Remove unused telemetry fixtures from conftest.py
al-rigazzi Jul 29, 2025
b59392d
Remove defensive mkdirs
al-rigazzi Jul 29, 2025
2db93bb
Revert symlinking test
al-rigazzi Jul 29, 2025
4329ab5
Remove obsolete lines
al-rigazzi Jul 29, 2025
a893b34
Implement consistent metadata directory pattern
al-rigazzi Jul 29, 2025
2e86885
Removed unused completion status logic
al-rigazzi Jul 29, 2025
cac1d8f
Reinstate metadata_dir
al-rigazzi Jul 29, 2025
af08e35
Fix style
al-rigazzi Jul 29, 2025
79d1737
Fix lint
al-rigazzi Jul 31, 2025
7fcff0c
Fix metatdata_dir occurrences
al-rigazzi Jul 31, 2025
c2cceb2
Make metadata_dir mandatory in _create_batch_job_step
al-rigazzi Jul 31, 2025
a0b0b30
Make metadata_dir mandatory in _create_job_step
al-rigazzi Jul 31, 2025
d9171bf
Refactor metadata directory management to use LaunchedManifestBuilder
al-rigazzi Jul 31, 2025
5b6aacf
Remove unused pylint pragma
al-rigazzi Jul 31, 2025
2c7d698
Remove redundant mkdirs
al-rigazzi Jul 31, 2025
cc0c2c5
Revert _launch_orchestrator signature to remove metadata_dir parameter
al-rigazzi Jul 31, 2025
541e8a6
Restore entity-type-specific metadata directories
al-rigazzi Jul 31, 2025
df9bdb2
Fix controller
al-rigazzi Jul 31, 2025
a259ab5
Add tests
al-rigazzi Jul 31, 2025
f3e969a
make style
al-rigazzi Jul 31, 2025
8124c5f
Remove useless mkdirs
al-rigazzi Aug 1, 2025
79d374b
Update serialization path
al-rigazzi Aug 1, 2025
f7f67c1
Fix tests
al-rigazzi Aug 1, 2025
a355829
make style
al-rigazzi Aug 1, 2025
87ea2f4
Update metadata_dir structure
al-rigazzi Aug 4, 2025
523f681
Update changelog
al-rigazzi Aug 4, 2025
811f752
make style
al-rigazzi Aug 4, 2025
c9af73d
Revert symlinking test parameterization
al-rigazzi Aug 4, 2025
1678d9a
Revert test_symlink parameterization
al-rigazzi Aug 4, 2025
c02fd61
Use type, not stringified type
al-rigazzi Aug 4, 2025
3df5f66
Fix test
al-rigazzi Aug 4, 2025
0f9610e
Remove hard-coded .smartsim occurrences
al-rigazzi Aug 4, 2025
a92cfe7
Update dragon log dir
al-rigazzi Aug 4, 2025
bf37dcc
Update changelog
al-rigazzi Aug 4, 2025
233cba3
Update smartsim/_core/_cli/cli.py
al-rigazzi Aug 13, 2025
b9a7c79
Address MattToast's code review feedback (items 1-3)
al-rigazzi Aug 13, 2025
9eecc7d
Remove unused run_id from manifest system
al-rigazzi Aug 13, 2025
fabaab8
make style
al-rigazzi Aug 13, 2025
6e60ef4
Merge branch 'develop' of https://github.com/CrayLabs/SmartSim into d…
al-rigazzi Aug 13, 2025
70e1e37
Minor changes to headers
al-rigazzi Aug 13, 2025
4aa8289
Update copyright
al-rigazzi Aug 13, 2025
43cd3f3
Remove LaunchedManifest classes and clean up telemetry code
al-rigazzi Aug 13, 2025
ad33426
Fix orchestrator checkpoint saving
al-rigazzi Aug 14, 2025
540ee02
Changelog refinement
al-rigazzi Aug 14, 2025
1f5098e
Fix database host setup in orchestrator launch
al-rigazzi Aug 14, 2025
57b4cf3
Fix metadata directory uniqueness for multiple model runs
al-rigazzi Aug 14, 2025
88cd1ab
Move TStepLaunchMetaData to controller_utils.py and remove serialize.py
al-rigazzi Aug 14, 2025
1e3319e
Remove unused code
al-rigazzi Aug 14, 2025
63afd5b
Remove unused test file
al-rigazzi Aug 15, 2025
295e3b9
Revert wrong indentation
al-rigazzi Aug 15, 2025
11c511e
Remove comments
al-rigazzi Aug 15, 2025
b91cca2
Remove comments
al-rigazzi Aug 15, 2025
b3f5f3e
Make style
al-rigazzi Aug 15, 2025
0a82a14
Address simple part of MattToast's comments
al-rigazzi Aug 28, 2025
790823e
Fixed lint
al-rigazzi Aug 28, 2025
2 changes: 0 additions & 2 deletions .readthedocs.yaml
@@ -21,13 +21,11 @@ build:
fi
pre_create_environment:
- git clone --depth 1 https://github.com/CrayLabs/SmartRedis.git smartredis
- git clone --depth 1 https://github.com/CrayLabs/SmartDashboard.git smartdashboard
post_create_environment:
- python -m pip install .[dev,docs]
- cd smartredis; python -m pip install .
- cd smartredis/doc; doxygen Doxyfile_c; doxygen Doxyfile_cpp; doxygen Doxyfile_fortran
- ln -s smartredis/examples ./examples
- cd smartdashboard; python -m pip install .
pre_build:
- pip install typing_extensions==4.8.0
- pip install pydantic==1.10.13
140 changes: 0 additions & 140 deletions conftest.py
@@ -26,7 +26,6 @@

from __future__ import annotations

import asyncio
from collections import defaultdict
from dataclasses import dataclass
import json
@@ -43,18 +42,15 @@
import uuid
import warnings
from subprocess import run
import time

import psutil
import pytest

import smartsim
from smartsim import Experiment
from smartsim._core.launcher.dragon.dragonConnector import DragonConnector
from smartsim._core.launcher.dragon.dragonLauncher import DragonLauncher
from smartsim._core.config import CONFIG
from smartsim._core.config.config import Config
from smartsim._core.utils.telemetry.telemetry import JobEntity
from smartsim.database import Orchestrator
from smartsim.entity import Model
from smartsim.error import SSConfigError, SSInternalError
@@ -706,143 +702,7 @@ def config() -> Config:
    return CONFIG


class MockSink:
    """Telemetry sink that writes console output for testing purposes"""

    def __init__(self, delay_ms: int = 0) -> None:
        self._delay_ms = delay_ms
        self.num_saves = 0
        self.args: t.Any = None

    async def save(self, *args: t.Any) -> None:
        """Save all arguments as console logged messages"""
        self.num_saves += 1
        if self._delay_ms:
            # mimic slow collection....
            delay_s = self._delay_ms / 1000
            await asyncio.sleep(delay_s)
        self.args = args


@pytest.fixture
def mock_sink() -> t.Type[MockSink]:
    return MockSink


@pytest.fixture
def mock_con() -> t.Callable[[int, int], t.Iterable[t.Any]]:
    """Generates mock db connection telemetry"""

    def _mock_con(min: int = 1, max: int = 254) -> t.Iterable[t.Any]:
        for i in range(min, max):
            yield [
                {"addr": f"127.0.0.{i}:1234", "id": f"ABC{i}"},
                {"addr": f"127.0.0.{i}:2345", "id": f"XYZ{i}"},
            ]

    return _mock_con


@pytest.fixture
def mock_mem() -> t.Callable[[int, int], t.Iterable[t.Any]]:
    """Generates mock db memory usage telemetry"""

    def _mock_mem(min: int = 1, max: int = 1000) -> t.Iterable[t.Any]:
        for i in range(min, max):
            yield {
                "total_system_memory": 1000 * i,
                "used_memory": 1111 * i,
                "used_memory_peak": 1234 * i,
            }

    return _mock_mem


@pytest.fixture
def mock_redis() -> t.Callable[..., t.Any]:
    def _mock_redis(
        conn_side_effect=None,
        mem_stats=None,
        client_stats=None,
        coll_side_effect=None,
    ):
        """Generate a mock object for the redis.Redis contract"""

        class MockConn:
            def __init__(self, *args: t.Any, **kwargs: t.Any) -> None:
                if conn_side_effect is not None:
                    conn_side_effect()

            async def info(self, *args: t.Any, **kwargs: t.Any) -> t.Dict[str, t.Any]:
                if coll_side_effect:
                    await coll_side_effect()

                if mem_stats:
                    return next(mem_stats)
                return {
                    "total_system_memory": "111",
                    "used_memory": "222",
                    "used_memory_peak": "333",
                }

            async def client_list(
                self, *args: t.Any, **kwargs: t.Any
            ) -> t.Dict[str, t.Any]:
                if coll_side_effect:
                    await coll_side_effect()

                if client_stats:
                    return next(client_stats)
                return {"addr": "127.0.0.1", "id": "111"}

            async def ping(self):
                return True

        return MockConn

    return _mock_redis


class MockCollectorEntityFunc(t.Protocol):
    @staticmethod
    def __call__(
        host: str = "127.0.0.1",
        port: int = 6379,
        name: str = "",
        type: str = "",
        telemetry_on: bool = False,
    ) -> "JobEntity": ...


@pytest.fixture
def mock_entity(test_dir: str) -> MockCollectorEntityFunc:
    def _mock_entity(
        host: str = "127.0.0.1",
        port: int = 6379,
        name: str = "",
        type: str = "",
        telemetry_on: bool = False,
    ) -> "JobEntity":
        test_path = pathlib.Path(test_dir)

        entity = JobEntity()
        entity.name = name if name else str(uuid.uuid4())
        entity.status_dir = str(test_path / entity.name)
        entity.type = type
        entity.telemetry_on = True
        entity.collectors = {
            "client": "",
            "client_count": "",
            "memory": "",
        }
        entity.config = {
            "host": host,
            "port": str(port),
        }
        entity.telemetry_on = telemetry_on
        return entity

    return _mock_entity


class CountingCallable:
2 changes: 0 additions & 2 deletions doc/api/smartsim_api.rst
@@ -27,7 +27,6 @@ Experiment
Experiment.reconnect_orchestrator
Experiment.preview
Experiment.summary
Experiment.telemetry

.. autoclass:: Experiment
:show-inheritance:
@@ -368,7 +367,6 @@ Orchestrator
Orchestrator.set_max_clients
Orchestrator.set_max_message_size
Orchestrator.set_db_conf
Orchestrator.telemetry
Orchestrator.checkpoint_file
Orchestrator.batch

57 changes: 36 additions & 21 deletions doc/changelog.md
@@ -1,20 +1,21 @@
# Changelog

Listed here are the changes between each release of SmartSim,
SmartRedis and SmartDashboard.
Listed here are the changes between each release of SmartSim and SmartRedis.

Jump to:
- {ref}`SmartRedis changelog<smartredis-changelog>`
- {ref}`SmartDashboard changelog<smartdashboard-changelog>`

## SmartSim

To be released at some point in the future

Description

- **BREAKING CHANGE**: Removed telemetry functionality, LaunchedManifest tracking
classes, and SmartDashboard integration
- Update copyright headers from 2021-2024 to 2021-2025 across the entire codebase
- Python 3.12 is now supported; where available, installed TensorFlow version is now 2.16.2, PyTorch is 2.7.1.
- Python 3.12 is now supported; where available, installed TensorFlow version
is now 2.16.2, PyTorch is 2.7.1.
- Drop Python 3.9 support
- Terminate LSF and LSB support
- Implement workaround for Tensorflow that allows RedisAI to build with GCC-14
@@ -23,20 +24,43 @@ Description

Detailed Notes

- Copyright headers have been updated from "2021-2024" to "2021-2025" across 271 files
including Python source files, configuration files, documentation, tests, Docker files,
shell scripts, and other supporting files to reflect the new year.
- **BREAKING CHANGE**: Removed telemetry functionality, LaunchedManifest tracking
system, and SmartDashboard integration.
This includes complete removal of the telemetry monitor and collection system,
telemetry configuration classes (`TelemetryConfiguration`,
`ExperimentTelemetryConfiguration`), all telemetry-related API methods
(`Experiment.telemetry`, `Orchestrator.telemetry`), telemetry collectors and
sinks, and the `watchdog` dependency. Also removed SmartDashboard integration
and CLI plugin, along with the indirect entrypoint launching mechanism.
Additionally removed the `LaunchedManifest`, `_LaunchedManifestMetadata`, and
`LaunchedManifestBuilder` classes that were used for telemetry data collection
during entity launches. Simplified the controller launch workflow by removing
telemetry metadata tracking and launch manifest serialization. Cleaned up the
`serialize.py` module by removing orphaned telemetry functions (80% code
reduction), preserving only essential type definitions. Updated all test files
to remove LaunchedManifest dependencies and deleted obsolete telemetry test
files. The core `Manifest` class for entity organization remains unchanged,
maintaining backward compatibility for entity management while removing the
telemetry overhead. Enhanced the metadata directory system to use a centralized
`.smartsim/metadata/` structure for job output files with entity-specific
subdirectories (`ensemble/{name}`, `model/{name}`, `database/{name}`) and
proper symlink management.
([SmartSim-PR789](https://github.com/CrayLabs/SmartSim/pull/789))
- Copyright headers have been updated from "2021-2024" to "2021-2025" across
271 files including Python source files, configuration files, documentation,
tests, Docker files, shell scripts, and other supporting files to reflect the
new year.
([SmartSim-PR790](https://github.com/CrayLabs/SmartSim/pull/790))
- Python 3.12 is now supported. TensorFlow 2.16.2 and PyTorch 2.7.1 library files
are installed as part of `smart build` process when available. On Mac, ONNX runtime
1.22.0 is now installed, together with ONNX 1.16.
- Python 3.12 is now supported. TensorFlow 2.16.2 and PyTorch 2.7.1 library
files are installed as part of `smart build` process when available. On Mac,
ONNX runtime 1.22.0 is now installed, together with ONNX 1.16.
([SmartSim-PR785](https://github.com/CrayLabs/SmartSim/pull/785))
- Python 3.9 will not be supported anymore, the last stable version of SmartSim
with support for Python 3.9 will be 0.8.
([SmartSim-PR781](https://github.com/CrayLabs/SmartSim/pull/781))
- After the supercomputer Summit was decommissioned, a decision was made to
terminate SmartSim's support of the LSF launcher and LSB scheduler. If
this impacts your work, please contact us.
terminate SmartSim's support of the LSF launcher and LSB scheduler. If this
impacts your work, please contact us.
([SmartSim-PR780](https://github.com/CrayLabs/SmartSim/pull/780))
- Fix typos in the `train_surrogate` tutorial documentation.
([SmartSim-PR758](https://github.com/CrayLabs/SmartSim/pull/758))
@@ -1104,12 +1128,3 @@ Description:
```{include} ../smartredis/doc/changelog.md
:start-line: 2
```

------------------------------------------------------------------------

(smartdashboard-changelog)=
## SmartDashboard

```{include} ../smartdashboard/doc/changelog.md
:start-line: 2
```
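The changelog entry above describes job output metadata living under `.smartsim/metadata/` with entity-specific subdirectories (`ensemble/{name}`, `model/{name}`, `database/{name}`). A minimal sketch of that layout follows. The helper name `metadata_dir`, its signature, and the validation are illustrative assumptions, not SmartSim's API, and the commit list suggests the `.smartsim` directory name is no longer hard-coded in the real code.

```python
from pathlib import Path

# Entity types named in the changelog entry.
ENTITY_TYPES = ("ensemble", "model", "database")


def metadata_dir(exp_path: str, entity_type: str, name: str) -> Path:
    """Hypothetical helper: build <exp_path>/.smartsim/metadata/<type>/<name>.

    The directory names come from the changelog text; the function itself
    is an illustration and does not exist in SmartSim.
    """
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return Path(exp_path) / ".smartsim" / "metadata" / entity_type / name


if __name__ == "__main__":
    # Print one path per entity type for an example experiment directory.
    for kind in ENTITY_TYPES:
        print(metadata_dir("/tmp/exp", kind, "demo").as_posix())
```

The sketch only computes paths; directory creation and symlink management, which the changelog also mentions, are left out.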
6 changes: 0 additions & 6 deletions doc/index.rst
@@ -55,12 +55,6 @@
sr_advanced_topics
api/smartredis_api

.. toctree::
:maxdepth: 2
:caption: SmartDashboard

smartdashboard

.. toctree::
:maxdepth: 2
:caption: Reference
7 changes: 0 additions & 7 deletions doc/smartdashboard.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docker/docs/dev/Dockerfile
@@ -48,12 +48,6 @@ RUN git clone https://github.com/CrayLabs/SmartRedis.git --branch develop --dept
&& python -m pip install . \
&& rm -rf ~/.cache/pip

# Install smartdashboard
RUN git clone https://github.com/CrayLabs/SmartDashboard.git --branch develop --depth=1 smartdashboard \
&& cd smartdashboard \
&& python -m pip install . \
&& rm -rf ~/.cache/pip

# Install docs dependencies and SmartSim
RUN NO_CHECKS=1 SMARTSIM_SUFFIX=dev python -m pip install .[docs]

1 change: 0 additions & 1 deletion setup.py
@@ -176,7 +176,6 @@ class BuildError(Exception):
"GitPython<=3.1.43",
"protobuf<=3.20.3",
"jinja2>=3.1.2",
"watchdog>4,<5",
"pydantic>2",
"pyzmq>=25.1.2",
"pygithub>=2.3.0",
4 changes: 3 additions & 1 deletion smartsim/_core/_cli/cli.py
@@ -62,7 +62,9 @@ def __init__(self, menu: t.List[MenuItemConfig]) -> None:
        )

        self.register_menu_items(menu)
        self.register_menu_items([plugin() for plugin in plugins])
        # Register plugin menu items (currently empty since all plugins were removed)
        plugin_items = [plugin() for plugin in plugins]
        self.register_menu_items(plugin_items)

    def execute(self, cli_args: t.List[str]) -> int:
        if len(cli_args) < 2:
17 changes: 2 additions & 15 deletions smartsim/_core/_cli/plugin.py
@@ -38,18 +38,5 @@ def process_execute(
    return process_execute


def dashboard() -> MenuItemConfig:
    return MenuItemConfig(
        "dashboard",
        (
            "Start the SmartSim dashboard to monitor experiment output from a "
            "graphical user interface. This requires that the SmartSim Dashboard "
            "Package be installed. For more infromation please visit "
            "https://github.com/CrayLabs/SmartDashboard"
        ),
        dynamic_execute("smartdashboard", "Dashboard"),
        is_plugin=True,
    )


plugins = (dashboard,)
# No plugins currently available
plugins: t.Tuple[t.Callable[[], MenuItemConfig], ...] = ()
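
The explicit annotation on the now-empty `plugins` tuple matters because a bare `()` gives a type checker nothing to infer the element type from, which is plausibly what the "Fix mypy type annotation errors in CLI plugin system" commit addresses. The sketch below shows the pattern in isolation. `MenuItemConfig` here is a simplified stand-in dataclass, and `collect_plugin_items` and `example_plugin` are hypothetical names that do not appear in SmartSim.

```python
import typing as t
from dataclasses import dataclass


@dataclass
class MenuItemConfig:
    """Simplified stand-in for SmartSim's MenuItemConfig (assumption)."""

    cmd: str
    description: str
    is_plugin: bool = False


# Same shape as the new declaration in plugin.py: a variable-length tuple
# of zero-argument factories, currently empty.
plugins: t.Tuple[t.Callable[[], MenuItemConfig], ...] = ()


def collect_plugin_items(
    factories: t.Tuple[t.Callable[[], MenuItemConfig], ...],
) -> t.List[MenuItemConfig]:
    """Call each factory once; an empty tuple yields an empty list."""
    return [factory() for factory in factories]


def example_plugin() -> MenuItemConfig:
    """Hypothetical plugin factory, used only to show the non-empty case."""
    return MenuItemConfig("example", "An illustrative plugin entry", is_plugin=True)


if __name__ == "__main__":
    print(collect_plugin_items(plugins))
    print(collect_plugin_items((example_plugin,)))
```

With the tuple empty, the comprehension in `cli.py` produces an empty list, so registering plugin menu items becomes a no-op without needing a special case.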
1 change: 0 additions & 1 deletion smartsim/_core/_cli/validate.py
@@ -150,7 +150,6 @@ def test_install(
    with_onnx: bool,
) -> None:
    exp = Experiment("ValidationExperiment", exp_path=location, launcher="local")
    exp.telemetry.disable()
    port = find_free_port() if port is None else port

    with _make_managed_local_orc(exp, port) as client:
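
Because `Experiment.telemetry` is removed by this change, downstream scripts that call `exp.telemetry.disable()` would raise `AttributeError` after upgrading. One possible way to keep a script working across both versions is to guard the call, as sketched here. The helper name `disable_telemetry_if_present` is hypothetical, and the objects in the demo are plain stand-ins rather than real `Experiment` instances.

```python
import typing as t
from types import SimpleNamespace


def disable_telemetry_if_present(exp: t.Any) -> bool:
    """Hypothetical compatibility helper (not part of SmartSim).

    Calls exp.telemetry.disable() when the attribute exists and returns
    True; returns False when the attribute is absent, as it would be
    once telemetry has been removed.
    """
    telemetry = getattr(exp, "telemetry", None)
    if telemetry is None:
        return False
    telemetry.disable()
    return True


if __name__ == "__main__":
    calls: t.List[str] = []
    # Stand-in for an older Experiment that still exposes telemetry.
    old_style = SimpleNamespace(
        telemetry=SimpleNamespace(disable=lambda: calls.append("disabled"))
    )
    # Stand-in for an Experiment after telemetry removal.
    new_style = SimpleNamespace()

    print(disable_telemetry_if_present(old_style), calls)
    print(disable_telemetry_if_present(new_style))
```

Scripts that only target the new release can simply delete the call, which is what the `validate.py` hunk does.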