@anson627 anson627 commented Aug 27, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

In the clusterloader2 Exec measurement, add a parameter called streamOutput. When it is true, the command's output is redirected to stdout/stderr so that progress can be monitored while the command is still running. For example:

config.yaml

...
  - name: Waiting for jobs to be finished
    measurements:
    - Identifier: WaitForFinishedJobs
      Method: Exec
      Params:
        timeout: 30m
        streamOutput: true
        command:
        - "bash"
        - "wait-for-jobs.sh"
        - "100"
...

wait-for-jobs.sh

#!/bin/bash

expect_completed=$1
echo "waiting for $expect_completed Jobs to be completed successfully"

while true; do
    num_completed=0
    num_failed=0
    num_running=0
    num_pending=0
    
    # Print table header
    echo "╭─────────────┬───────────┬─────────┬─────────┬────────╮"
    echo "│ Namespace   │ Completed │ Running │ Pending │ Failed │"
    echo "├─────────────┼───────────┼─────────┼─────────┼────────┤"
    
    for ns in $(kubectl get ns --no-headers -o custom-columns=":metadata.name" | grep '^test-'); do
        # Initialize namespace-specific counters
        ns_completed=0
        ns_running=0
        ns_pending=0
        ns_failed=0
        
        # Get job statuses and count them
        jobs_output=$(kubectl get jobs -n "$ns" --no-headers 2>/dev/null)
        if [[ -n "$jobs_output" ]]; then
            # Process each job line
            while IFS= read -r line; do
                if [[ -n "$line" ]]; then
                    # Parse job fields: NAME STATUS COMPLETIONS DURATION AGE
                    status=$(echo "$line" | awk '{print $2}')
                    completions=$(echo "$line" | awk '{print $3}')
                    
                    # Determine job status based on kubectl output
                    if [[ "$status" == "Complete" ]]; then
                        num_completed=$((num_completed+1))
                        ns_completed=$((ns_completed+1))
                    elif [[ "$status" == "Running" ]]; then
                        # Check if completions field matches pattern X/Y to distinguish running vs pending
                        if [[ "$completions" =~ ^([0-9]+)/([0-9]+)$ ]]; then
                            current=${BASH_REMATCH[1]}
                            total=${BASH_REMATCH[2]}
                            
                            if [[ "$current" -gt 0 && "$current" -lt "$total" ]]; then
                                num_running=$((num_running+1))
                                ns_running=$((ns_running+1))
                            else
                                num_pending=$((num_pending+1))
                                ns_pending=$((ns_pending+1))
                            fi
                        else
                            num_running=$((num_running+1))
                            ns_running=$((ns_running+1))
                        fi
                    elif [[ "$status" == "Failed" ]]; then
                        num_failed=$((num_failed+1))
                        ns_failed=$((ns_failed+1))
                    else
                        num_pending=$((num_pending+1))
                        ns_pending=$((ns_pending+1))
                    fi
                fi
            done <<< "$jobs_output"
        fi
        
        # Print namespace row in table format
        printf "│ %-11s │ %9s │ %7s │ %7s │ %6s │\n" "$ns" "$ns_completed" "$ns_running" "$ns_pending" "$ns_failed"
    done
    
    # Print table footer
    echo "╰─────────────┴───────────┴─────────┴─────────┴────────╯"
    echo

    echo "Job Status Summary:"
    echo "  Completed: $num_completed"
    echo "  Running: $num_running"
    echo "  Pending: $num_pending"
    echo "  Failed: $num_failed"

    if [[ "$num_completed" -ge "$expect_completed" ]]; then
        break
    fi

    sleep 30
done
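
The per-job classification inside the loop above could also be factored into a small function, which makes the rules easy to check without a cluster. This is only a sketch: `classify_job_line` is a hypothetical name that is not part of this PR, and it assumes the same `NAME STATUS COMPLETIONS DURATION AGE` column layout the script already relies on.

```shell
#!/bin/bash
# classify_job_line is a hypothetical helper, not part of the PR.
# It maps one line of `kubectl get jobs --no-headers` output to one of:
# completed, running, pending, failed. The rules mirror the loop in
# wait-for-jobs.sh and assume columns NAME STATUS COMPLETIONS DURATION AGE.
classify_job_line() {
    local line=$1 name status completions
    # Split on whitespace; fields after COMPLETIONS are ignored.
    read -r name status completions _ <<< "$line"

    case "$status" in
        Complete) echo "completed" ;;
        Failed)   echo "failed" ;;
        Running)
            # A COMPLETIONS value of the form X/Y separates jobs that have
            # made partial progress (running) from the rest (pending).
            if [[ "$completions" =~ ^([0-9]+)/([0-9]+)$ ]]; then
                local current=${BASH_REMATCH[1]} total=${BASH_REMATCH[2]}
                if [[ "$current" -gt 0 && "$current" -lt "$total" ]]; then
                    echo "running"
                else
                    echo "pending"
                fi
            else
                echo "running"
            fi
            ;;
        *) echo "pending" ;;
    esac
}

classify_job_line "job-a Complete 1/1 12s 3m"   # prints: completed
classify_job_line "job-b Running 2/5 40s 1m"    # prints: running
classify_job_line "job-c Running 0/5 5s 10s"    # prints: pending
```

As in the original loop, a job whose status is Running but which has no completions yet is counted as pending.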

This prints test progress across all namespaces in real time, from job creation to completion:

I0827 15:36:12.060017   70282 simple_test_executor.go:162] Step "[step: 03] Create jobs" started
I0827 15:36:19.887239   70282 simple_test_executor.go:183] Step "[step: 03] Create jobs" ended
I0827 15:36:19.887265   70282 simple_test_executor.go:162] Step "[step: 04] Waiting for jobs to be finished" started
I0827 15:36:19.894821   70282 exec.go:81] Running [bash wait-for-jobs.sh 100] with timeout 30m0s, attempt 1
I0827 15:36:19.894966   70282 exec.go:92] Streaming command output to stdout/stderr in real-time
waiting for 100 Jobs to be completed successfully
╭─────────────┬───────────┬─────────┬─────────┬────────╮
│ Namespace   │ Completed │ Running │ Pending │ Failed │
├─────────────┼───────────┼─────────┼─────────┼────────┤
│ test-1      │         0 │       0 │     500 │      0 │
│ test-10     │         0 │       0 │     500 │      0 │
│ test-2      │         0 │       0 │     500 │      0 │
│ test-3      │         0 │       0 │     500 │      0 │
│ test-4      │         0 │       0 │     500 │      0 │
│ test-5      │         0 │       0 │     500 │      0 │
│ test-6      │         0 │       0 │     500 │      0 │
│ test-7      │         0 │       0 │     500 │      0 │
│ test-8      │         0 │       0 │     500 │      0 │
│ test-9      │         0 │       0 │     500 │      0 │
╰─────────────┴───────────┴─────────┴─────────┴────────╯

Job Status Summary:
  Completed: 0
  Running: 0
  Pending: 5000
  Failed: 0

This is very handy when a job test is making slow progress or no progress at all.
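
As a rough shell analogy for what the flag changes (this is not the Go code in exec.go), the sketch below contrasts the two modes. `run_step` is a hypothetical function, and it assumes the non-streaming path collects the command's output and reports it only after the command exits.

```shell
#!/bin/bash
# run_step is a hypothetical illustration, not clusterloader2 code.
# Usage: run_step <true|false> <command> [args...]
#   true:  the command writes straight to this shell's stdout/stderr,
#          so each line is visible as soon as the command prints it.
#   false: stdout and stderr are captured together and printed only
#          after the command has exited (assumed non-streaming behavior).
run_step() {
    local stream=$1
    shift
    if [[ "$stream" == "true" ]]; then
        "$@"
    else
        local output status
        output=$("$@" 2>&1)
        status=$?
        printf '%s\n' "$output"
        return "$status"
    fi
}
```

With a long-running script such as wait-for-jobs.sh, `run_step true bash wait-for-jobs.sh 100` would show each status table as it is printed, while `run_step false ...` would show nothing until the script exits or is killed by a timeout.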

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 27, 2025
@k8s-ci-robot
Contributor

Hi @anson627. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 27, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anson627
Once this PR has been reviewed and has the lgtm label, please assign mborsz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 27, 2025