@tedzhouhk
Contributor

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Signed-off-by: hongkuanz <[email protected]>
Contributor

@hhzhang16 left a comment

Can you include screenshots or a short demo in the PR description?

AIPERF_PREFILL_ATTN_DP_NUM_REQ_RATIO = 4

# Cost calculation defaults
GPU_COST_PER_HOUR = 3.0 # Cost per GPU per hour in dollars

Is this consistent among all our GPUs?
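One way to address this would be to keep the flat rate as a fallback and allow per-GPU overrides. The sketch below is only an illustration: `cost_per_hour`, the `overrides` mapping, and the `"example-gpu"` key with its rate are hypothetical and do not come from the PR; only `GPU_COST_PER_HOUR = 3.0` is taken from the diff.

```python
from typing import Dict, Optional

GPU_COST_PER_HOUR = 3.0  # flat default from the diff, in dollars


def cost_per_hour(
    num_gpus: int,
    gpu_type: Optional[str] = None,
    overrides: Optional[Dict[str, float]] = None,
) -> float:
    """Return the hourly cost for num_gpus GPUs.

    Falls back to GPU_COST_PER_HOUR when gpu_type has no override.
    The overrides mapping is a hypothetical extension, not part of the PR.
    """
    rate = (overrides or {}).get(gpu_type, GPU_COST_PER_HOUR)
    return num_gpus * rate


# Placeholder rate for illustration only, not a real price.
example_overrides = {"example-gpu": 2.0}
flat_cost = cost_per_hour(4)
override_cost = cost_per_hour(4, "example-gpu", example_overrides)
```

With this shape the existing constant keeps working unchanged, and a deployment with mixed hardware could supply its own rates.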

logger.addHandler(console_handler)

# Color palette for chart datasets
CHART_COLORS = [

If p_idx ever exceeds 8, the colors will repeat; is that the expected behavior?
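If repetition is acceptable, making the wrap-around explicit and varying a second visual channel would keep series distinguishable. The following is a minimal sketch, assuming the palette has eight entries: only the final gray value appears in the diff, so the other seven colors, the `DASH_STYLES` list, and the `style_for` helper are all placeholders for illustration.

```python
from typing import Tuple

# Placeholder palette: only the final gray entry appears in the diff;
# the other seven values are assumed for illustration.
CHART_COLORS = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728",
    "#9467bd", "#8c564b", "#e377c2", "#7f7f7f",
]

# Hypothetical line styles used once the palette has been exhausted.
DASH_STYLES = ["solid", "dash", "dot"]


def style_for(p_idx: int) -> Tuple[str, str]:
    """Return (color, dash) for a dataset index, wrapping both lists."""
    n = len(CHART_COLORS)
    color = CHART_COLORS[p_idx % n]  # colors repeat every n datasets
    dash = DASH_STYLES[(p_idx // n) % len(DASH_STYLES)]  # style changes per wrap
    return color, dash
```

Here index 8 reuses the first color but switches to a dashed line, so the repetition is deliberate rather than accidental.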

"#7f7f7f", # gray
]

WEB_UI_SELECTION_TIMEOUT = 3600

This is a long timeout (1 hour); maybe it should be shorter.
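For context, a wait with a configurable timeout could look like the sketch below. It assumes the selection is signalled through a `threading.Event`, which the diff does not confirm; `wait_for_selection` and `selection_made` are hypothetical names.

```python
import threading

WEB_UI_SELECTION_TIMEOUT = 3600  # seconds, value from the diff


def wait_for_selection(event: threading.Event, timeout_s: float) -> bool:
    """Block until the UI signals a selection or timeout_s elapses.

    Returns True if a selection arrived, False if the wait timed out.
    """
    return event.wait(timeout=timeout_s)


selection_made = threading.Event()

# A very short timeout for demonstration; nothing sets the event yet.
timed_out = not wait_for_selection(selection_made, timeout_s=0.01)

# Once the (hypothetical) UI callback sets the event, the wait returns at once.
selection_made.set()
received = wait_for_selection(selection_made, timeout_s=WEB_UI_SELECTION_TIMEOUT)
```

Passing the timeout as a parameter would let callers pick a shorter value without changing the module-level constant.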

Comment on lines +66 to +70
data["prefill"]["chart"]["target_line"]["value"] = args.ttft
data["prefill"]["chart"]["target_line"]["label"] = f"Target TTFT: {args.ttft} ms"

data["decode"]["chart"]["target_line"]["value"] = args.itl
data["decode"]["chart"]["target_line"]["label"] = f"Target ITL: {args.itl} ms"

Is the UI user aware that we fall back to defaults if they do not set these targets?

    parser.add_argument(
        "--ttft",
        type=float,
        default=config.get("sla", {}).get("ttft", 50.0),
        help="target Time To First Token (float, in milliseconds)",
    )
    parser.add_argument(
        "--itl",
        type=float,
        default=config.get("sla", {}).get("itl", 10.0),
        help="target Inter Token Latency (float, in milliseconds)",
    )
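One possible way to surface the fallback to the user is to mark the chart label when the value equals the parser default. This is a sketch, not the PR's code: `target_label` is a hypothetical helper, and the defaults are hard-coded to the 50.0 and 10.0 fallbacks shown above instead of being read from `config`.

```python
import argparse

DEFAULT_TTFT_MS = 50.0  # same fallback values as in the snippet above
DEFAULT_ITL_MS = 10.0


def target_label(name: str, value: float, default: float) -> str:
    """Build a chart label that flags when the target equals the default."""
    suffix = " (default)" if value == default else ""
    return f"Target {name}: {value} ms{suffix}"


parser = argparse.ArgumentParser()
parser.add_argument("--ttft", type=float, default=DEFAULT_TTFT_MS)
parser.add_argument("--itl", type=float, default=DEFAULT_ITL_MS)

# Simulate a user who sets only --itl.
args = parser.parse_args(["--itl", "25"])

ttft_label = target_label("TTFT", args.ttft, parser.get_default("ttft"))
itl_label = target_label("ITL", args.itl, parser.get_default("itl"))
```

Note that a user who explicitly passes the default value would also see the "(default)" marker; distinguishing that case would need `default=None` plus an explicit fallback step.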

logger.info(f"Selection received: {plot_type}, row {row_idx}")

# Store selection for later confirmation
if plot_type == "cost":

Is this stored locally/ephemerally?
