🚀 feat(inspect): Add progressbar #3045

ashwinvaidya17 · 2025-10-23T06:07:37Z

Signed-off-by: Ashwin Vaidya <[email protected]>

maxxgx

Thanks for the PR @ashwinvaidya17. Looks good overall, left some comments

Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot

Pull Request Overview

This PR adds a progress bar feature to the inspect interface that displays real-time training progress with the ability to cancel jobs. The implementation includes both frontend UI components and backend progress tracking infrastructure.

Key changes:

Adds a new status bar component with integrated progress display for training jobs
Implements real-time progress tracking using Server-Sent Events (SSE) and Lightning callbacks
Introduces job cancellation functionality with corresponding API endpoints

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

File	Description
application/ui/src/routes/inspect/inspect.tsx	Adds StatusBar component to the inspect layout
application/ui/src/features/inspect/statusbar/statusbar.component.tsx	Creates status bar container component
application/ui/src/features/inspect/statusbar/items/progressbar.component.tsx	Implements progress bar with job polling and cancellation
application/backend/src/utils/callbacks.py	Adds Lightning callback for progress synchronization
application/backend/src/services/training_service.py	Integrates progress tracking into training workflow
application/backend/src/services/job_service.py	Adds progress streaming and job cancellation methods
application/backend/src/pydantic_models/job.py	Extends job model with stage field and cancellation response
application/backend/src/db/schema.py	Adds stage column to job database schema
application/backend/src/api/endpoints/job_endpoints.py	Adds progress streaming and cancellation endpoints

Files not reviewed (1)

application/ui/package-lock.json: Language not supported

Copilot · 2025-10-31T08:45:44Z

application/backend/src/utils/callbacks.py

+    def setup(self, trainer: Trainer, pl_module: LightningModule, stage: str) -> None:
+        self._send_progress(0, stage)


The stage parameter should be of type RunningStage to match the _send_progress method signature, but it's declared as str. This will cause a type error.

Copilot · 2025-10-31T08:45:45Z

application/backend/src/utils/callbacks.py

+    def teardown(self, trainer: Trainer, pl_module: LightningModule, stage: RunningStage) -> None:
+        self._send_progress(1.0, stage)


The teardown method signature uses RunningStage type but the setup method uses str type. These should be consistent, and both should use RunningStage type.

Copilot · 2025-10-31T08:45:46Z

application/backend/src/services/job_service.py

+                    cached_still_running = await cls.is_job_still_running(job_id=job_id)
+                    last_status_check = now
+                still_running = cached_still_running
+                yield json.dumps({"progress": job.progress, "stage": job.stage})


The job object is fetched once outside the loop but never refreshed. Progress and stage values will remain static throughout the streaming. The job should be refetched from the database in each iteration.

Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot

Pull Request Overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Co-authored-by: Copilot <[email protected]> Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot

Pull Request Overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

maxxgx · 2025-10-31T16:06:08Z

application/backend/src/db/schema.py

    id: Mapped[str] = mapped_column(primary_key=True, default=lambda: str(uuid4()))
    project_id: Mapped[str] = mapped_column(ForeignKey("projects.id"))
    type: Mapped[str] = mapped_column(String(64), nullable=False)
+    stage: Mapped[JobStage] = mapped_column(Enum(JobStage), nullable=False)


What's the advantage of declaring an Enum column in the DB schema?

What values are actually stored in the DB, e.g. int or str?

I wonder what happens if you add/remove job stages, will queries still work?

Hmm not sure how it will respond when we modify it. I was following Mark's suggestion #3045 (comment). Maybe we can achieve this in a different manner?

I've run some tests with Enum columns. It seems to be safe: at least with StrEnum, it saves the enum as a string, and adding or removing values in the code does not seem to require the DB to be updated.

maxxgx · 2025-10-31T16:12:16Z

application/backend/src/pydantic_models/job.py

+class JobStage(StrEnum):
+    """Job stages follow PyTorch Lightning stages with the addition of idle stage.
+
+    See ``lightning.pytorch.trainer.states.RunningStage`` for more details.
+    """
+
+    IDLE = "idle"
+    TRAINING = "train"
+    SANITY_CHECKING = "sanity_check"
+    VALIDATING = "validate"
+    TESTING = "test"
+    PREDICTING = "predict"


Not sure if we are going to have other types of jobs in the future, but these are only related to training and there is overlap with job.status. Perhaps it's more appropriate to store the training stage as part of the job.message string.

We will have to extract it from the string if we want to show it in the progress bar.

The job message can be reset based on the current stage. So you could just use the whole message.

ATM, it seems you're not setting job.message at all.

In Geti classic, job.step.message was displayed:

Looks like JobStage corresponds to job.step in the image above. However, our job progress is global, so we don't have this mechanism of tracking individual stage progress. Not sure how well this works in terms of UX: users would see progress resetting from 100 to 0 (e.g. from stage "training" to "validate"), but they don't know the total number of stages elapsed or remaining.

maxxgx · 2025-10-31T16:17:19Z

application/backend/src/services/training_service.py

+            logger.debug("Syncing progress with db stopped")
+            synchronization_task.cancel()


You could put these two lines in a finally block.

maxxgx · 2025-10-31T16:20:56Z

application/backend/src/utils/callbacks.py

+    # Test callbacks
+    def on_test_start(self, trainer: Trainer, pl_module: LightningModule) -> None:
+        """Called when testing starts."""
+        self._send_progress(0, JobStage.TESTING)


How come the progress is 0 here?

The idea is to treat each stage as having different progress instead of clubbing train->test as a single progress.
Currently the progressbar will first show the training progress, and then the testing progress. This just makes the progress a bit verbose.

Added a comment above regarding this "staged" progress tracking approach.

Co-authored-by: Max Xiang <[email protected]> Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot

Pull Request Overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 1 comment.

Copilot · 2025-11-03T10:15:24Z

application/backend/src/services/training_service.py

        # Capture pytorch stdout logs into logger
        with redirect_stdout(LoggerStdoutWriter()):  # type: ignore[type-var]
-            engine.train(model=anomalib_model, datamodule=datamodule)
+            engine.fit(model=anomalib_model, datamodule=datamodule)


Changed from engine.train() to engine.fit() - this appears to be an API update but should be verified that this method exists and provides the same functionality.

Signed-off-by: Ashwin Vaidya <[email protected]>

MarkRedeman · 2025-11-03T13:14:03Z

application/ui/src/routes/inspect/inspect.tsx

-                </InferenceProvider>
-            </SelectedMediaItemProvider>
-        </Grid>
+        <div style={{ display: 'flex', flexDirection: 'column', height: '100%' }}>


Let's revert this change, the extra div is not needed.

MarkRedeman · 2025-11-03T13:15:39Z

application/ui/src/routes/inspect/inspect.tsx

-        </Grid>
+        <div style={{ display: 'flex', flexDirection: 'column', height: '100%' }}>
+            <Grid
+                areas={['toolbar sidebar', 'canvas sidebar']}


Suggested change

-                areas={['toolbar sidebar', 'canvas sidebar']}
+                areas={['toolbar sidebar', 'canvas sidebar', 'footer sidebar']}

The Grid component is essentially a wrapper around CSS grid properties, so we can add a new footer grid cell that is used for displaying the status bar.

MarkRedeman · 2025-11-03T13:16:25Z

application/ui/src/routes/inspect/inspect.tsx

+        <div style={{ display: 'flex', flexDirection: 'column', height: '100%' }}>
+            <Grid
+                areas={['toolbar sidebar', 'canvas sidebar']}
+                rows={['size-800', 'minmax(0, 1fr)']}


With the above change, this should make the footer take up a minimal height (do check this though; we might need to tweak it).

Suggested change

-                rows={['size-800', 'minmax(0, 1fr)']}
+                rows={['size-800', 'minmax(0, 1fr)', 'auto']}

MarkRedeman · 2025-11-03T13:17:38Z

application/ui/src/routes/inspect/inspect.tsx

+                        <Sidebar />
+                    </InferenceProvider>
+                </SelectedMediaItemProvider>
+            </Grid>
+            <StatusBar />


Suggested change

-                        <Sidebar />
-                    </InferenceProvider>
-                </SelectedMediaItemProvider>
-            </Grid>
-            <StatusBar />
+                        <Sidebar />
+                        <Footer />
+                    </InferenceProvider>
+                </SelectedMediaItemProvider>
+            </Grid>

Let's move the statusbar into a Footer component and put it inside of the grid.

MarkRedeman · 2025-11-03T13:20:44Z

application/ui/src/features/inspect/statusbar/statusbar.component.tsx

+export const StatusBar = () => {
+    return (
+        <View gridArea={'statusbar'} backgroundColor={'gray-100'} width={'100%'} height={'30px'} overflow={'hidden'}>


Suggested change

-export const StatusBar = () => {
-    return (
-        <View gridArea={'statusbar'} backgroundColor={'gray-100'} width={'100%'} height={'30px'} overflow={'hidden'}>
+export const Footer = () => {
+    return (
+        <View gridArea={'footer'} backgroundColor={'gray-100'} width={'100%'} height={'size-400'} overflow={'hidden'}>

For spacing we try to adhere to Spectrum's styling guidelines. This makes the application's UI more consistent.

MarkRedeman · 2025-11-03T13:22:12Z

application/ui/src/features/inspect/statusbar/items/progressbar.component.tsx

Currently this component feels like very non-idiomatic React code. I will send you a component that should follow more of our guidelines.

Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot

Pull Request Overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.

Copilot · 2025-11-04T10:11:52Z

application/ui/src/features/inspect/jobs/show-job-logs.component.tsx

    return (
        <Flex direction='column' gap='size-25'>
-            {query.data?.map((line, idx) => <Text key={idx}> {line}</Text>)}
+            {query.data?.map((line, idx) => <Text key={idx}> {line.text}</Text>)}


The code attempts to access line.text but the fetchSSE function now yields the entire parsed JSON object. This will cause a runtime error if the object doesn't have a text property.

Suggested change

-            {query.data?.map((line, idx) => <Text key={idx}> {line.text}</Text>)}
+            {query.data?.map((line, idx) => (
+                <Text key={idx}>
+                    {typeof line === 'object' && line !== null && 'text' in line
+                        ? line.text
+                        : JSON.stringify(line)}
+                </Text>
+            ))}

Copilot · 2025-11-04T10:11:53Z

application/backend/src/services/training_service.py

        engine = Engine(
            default_root_dir=model.export_path,
            logger=[trackio, tensorboard],
+            devices=[0],  # Only single GPU training is supported for now


Hardcoded device index [0] should be derived from the device parameter or made configurable. This creates inconsistency with the device parameter handling.

Copilot · 2025-11-04T10:11:53Z

application/backend/src/services/training_service.py

+        finally:
+            logger.debug("Syncing progress with db stopped")
+            synchronization_task.cancel()


The synchronization_task variable is referenced in the finally block but may not be defined if an exception occurs before line 79-83. This will cause an UnboundLocalError.
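
One way to avoid the unbound name is to create the task before entering the `try`, so the `finally` block can always reference it. The sketch below is a hedged illustration under that assumption. `sync_progress` and `run_training` are hypothetical stand-ins for the PR's synchronization coroutine and training routine, and `asyncio.sleep(0)` replaces the actual training call.

```python
import asyncio


async def sync_progress(state: dict) -> None:
    """Hypothetical background loop standing in for the DB progress sync."""
    while True:
        state["ticks"] = state.get("ticks", 0) + 1
        await asyncio.sleep(0)


async def run_training(state: dict, fail: bool = False) -> None:
    # Create the task before entering try, so the name is always bound in finally.
    synchronization_task = asyncio.create_task(sync_progress(state))
    try:
        await asyncio.sleep(0)  # placeholder for the actual training call
        if fail:
            raise RuntimeError("training failed")
    finally:
        synchronization_task.cancel()
        # Await the cancelled task so the cancellation is actually processed.
        await asyncio.gather(synchronization_task, return_exceptions=True)
        state["stopped"] = synchronization_task.cancelled()
```

Awaiting the task after `cancel()` is optional in principle, but it ensures the background loop has really stopped before the function returns.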

Copilot · 2025-11-04T10:11:53Z

application/backend/src/utils/callbacks.py

+
+    def on_test_epoch_end(self, trainer: Trainer, pl_module: LightningModule) -> None:
+        """Called when a test epoch ends."""
+        progress = (trainer.current_epoch + 1) / trainer.max_epochs if trainer.max_epochs else 0.5


Magic number 0.5 (50%) is used as fallback progress when max_epochs is not available. This should be documented or made configurable to clarify the intended behavior.

Copilot · 2025-11-04T10:11:54Z

application/backend/src/utils/callbacks.py

+
+    def on_predict_epoch_end(self, trainer: Trainer, pl_module: LightningModule) -> None:
+        """Called when a prediction epoch ends."""
+        progress = (trainer.current_epoch + 1) / trainer.max_epochs if trainer.max_epochs else 0.5


Duplicated progress calculation logic with magic number 0.5. This should be extracted to a helper method to avoid code duplication and ensure consistency.

Signed-off-by: Ashwin Vaidya <[email protected]>

Add progress

4ff613c

Signed-off-by: Ashwin Vaidya <[email protected]>

ashwinvaidya17 requested review from MarkRedeman and maxxgx October 23, 2025 06:07

ashwinvaidya17 requested a review from samet-akcay as a code owner October 23, 2025 06:07

Merge branch 'feature/geti-inspect' into ashwin/feat/progress_bar_sse

12e5c52

MarkRedeman reviewed Oct 23, 2025

maxxgx reviewed Oct 23, 2025

MarkRedeman reviewed Oct 23, 2025

ashwinvaidya17 added 2 commits October 31, 2025 09:22

Merge branch 'feature/geti-inspect' into ashwin/feat/progress_bar_sse

66b2f46

Merge fixes

f3855a2

Signed-off-by: Ashwin Vaidya <[email protected]>

ashwinvaidya17 marked this pull request as draft October 31, 2025 08:44

Copilot AI review requested due to automatic review settings October 31, 2025 08:44

Copilot AI reviewed Oct 31, 2025

Fix progress bar

12130d2

Signed-off-by: Ashwin Vaidya <[email protected]>

ashwinvaidya17 marked this pull request as ready for review October 31, 2025 13:01

Copilot AI review requested due to automatic review settings October 31, 2025 13:01

Copilot AI reviewed Oct 31, 2025

Update application/backend/src/services/job_service.py

4e05737

Co-authored-by: Copilot <[email protected]> Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot AI review requested due to automatic review settings October 31, 2025 13:13

Copilot AI reviewed Oct 31, 2025

maxxgx reviewed Oct 31, 2025

Update application/backend/src/pydantic_models/job.py

004e880

Co-authored-by: Max Xiang <[email protected]> Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot AI review requested due to automatic review settings November 3, 2025 10:14

Copilot AI reviewed Nov 3, 2025

Use finally block

9776eb4

Signed-off-by: Ashwin Vaidya <[email protected]>

maxxgx linked an issue Nov 3, 2025 that may be closed by this pull request

📋 [TASK] Add training progress #3073

Open

MarkRedeman reviewed Nov 3, 2025

Add Mark's changes

e29cb09

Signed-off-by: Ashwin Vaidya <[email protected]>

Copilot AI review requested due to automatic review settings November 4, 2025 10:10

Copilot AI reviewed Nov 4, 2025

Use job.message for informing training stage

89ebcc8

Signed-off-by: Ashwin Vaidya <[email protected]>
