📚 Documentation
The `RateTracker` class in https://github.com/pytorch/xla/blob/fe3f23c62c747da30595cb9906d929b926aae6e4/torch_xla/core/xla_model.py doesn't have a docstring. This class is used in many tests, including one that is referenced from the main documentation, so new PyTorch/XLA users may see it as a natural, supported way to track and report training efficiency metrics.
`RateTracker`'s behavior is subtle and potentially confusing, since tracking throughput can involve measuring data at different granularities (e.g. batches, examples, or, for LLMs, tokens) and reporting per-accelerator, per-host, or globally. Here is my understanding of how it behaves; please correct me where I'm wrong.
Following the examples in those tests (where the batch size is added to the tracker at each training step), I think `rate()` measures examples (not tokens) per second over the most recent interval (specifically, since the last time `rate()` was called), and `global_rate()` measures the same over the whole training run. The expectation, then, is that `global_rate()` will be low at first, but once compilation and other one-time costs are paid it will rise and typically approach the per-batch training rate, though the latter may vary.
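To make the distinction concrete, here is a minimal pure-Python sketch of a tracker with the semantics I described above. Note this is *not* the actual `RateTracker` implementation (which, as far as I can tell, also applies exponential smoothing); it only illustrates the assumed `rate()` vs `global_rate()` difference. The `timer` parameter is my own addition for testability.

```python
import time


class SimpleRateTracker:
    """Sketch of the rate semantics described above (not torch_xla's code)."""

    def __init__(self, timer=time.time):
        self._timer = timer
        self._start = self._timer()
        self._partial_count = 0.0        # count added since the last rate() call
        self._partial_time = self._start
        self._count = 0.0                # total count since construction

    def add(self, count):
        # e.g. tracker.add(batch_size) once per training step
        self._partial_count += count
        self._count += count

    def rate(self):
        # Items/sec since the last rate() call (or since construction).
        now = self._timer()
        r = self._partial_count / max(now - self._partial_time, 1e-9)
        self._partial_count = 0.0
        self._partial_time = now
        return r

    def global_rate(self):
        # Items/sec averaged over the tracker's whole lifetime, so one-time
        # costs (compilation, warmup) drag it down early in training.
        return self._count / max(self._timer() - self._start, 1e-9)
```

Under this reading, `global_rate()` converges toward the steady-state `rate()` as the one-time startup cost is amortized over more steps.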
In terms of what granularity of devices the metrics reflect: for SPMD, I think both are global metrics (for the whole training job), but for other distribution strategies I think they're per-device.
Is that right?
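If my per-device vs. global reading is correct, converting a tracker reading into job-wide throughput would look something like this hypothetical helper (the function and its `spmd` flag are my own illustration, not part of torch_xla):

```python
def global_throughput(tracked_rate, num_devices, spmd=False):
    """Hypothetical conversion of a tracker reading to a job-wide rate.

    Assumption (as stated above): under SPMD the tracker already sees the
    global batch, so its rate is already global; under data-parallel
    strategies each replica tracks only its own shard, so the per-device
    rate must be scaled by the number of devices.
    """
    return tracked_rate if spmd else tracked_rate * num_devices


# e.g. 8 data-parallel devices each reporting 500 examples/sec:
print(global_throughput(500.0, 8))             # 4000.0 examples/sec job-wide
print(global_throughput(4000.0, 8, spmd=True)) # already global: 4000.0
```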