PSA Things not in PSA required to fully implement the INT draft #510

Open
jafingerhut opened this issue Dec 6, 2017 · 8 comments

@jafingerhut
Collaborator

jafingerhut commented Dec 6, 2017

Reference: Draft version of INT spec dated Oct 17, 2017 retrieved here: https://github.com/p4lang/p4-spec/blob/master/applications/telemetry/SPEC_Inband_Network_Telemetry.pdf

In particular, Section 3 "What to monitor"

First, the list of things that are supported by the latest PSA draft as of 2017-Dec-05:

  • switch id - a number that should come from control plane configuration
  • ingress port id - either ingress_port, or if desired, that value mapped to some less platform-specific value, e.g. via a P4 table
  • ingress timestamp - PSA defines ingress and egress timestamps, but they might not be taken at the precise moment that INT specifies. For example, the PSA ingress timestamp is allowed to be taken when the packet begins ingress parsing, and the egress timestamp when the packet begins egress parsing, which might differ significantly from the time the packet leaves the physical port, at least if features like Ethernet pause flow control are in use.
  • egress port id - egress standard metadata value egress_port could be used, or as for ingress port above, could be mapped to some other numeric value via a table.
  • hop latency - egress minus ingress timestamp is close, with same caveats as mentioned above for timestamps
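
To make the "egress minus ingress timestamp" idea concrete, here is a minimal Python sketch of the subtraction, done modulo the timestamp width so that it still gives the right answer if the counter wraps between ingress and egress. The function name and the 48-bit default width are assumptions for illustration only; PSA leaves the timestamp width and unit to the implementation.

```python
def hop_latency(ingress_ts: int, egress_ts: int, ts_bits: int = 48) -> int:
    """Return egress_ts - ingress_ts in timestamp ticks, modulo 2**ts_bits.

    ts_bits is a hypothetical width chosen for this sketch. The modular
    subtraction is only meaningful if the real latency is shorter than
    one full wrap period of the counter.
    """
    mask = (1 << ts_bits) - 1
    return (egress_ts - ingress_ts) & mask


# No wrap between ingress and egress.
print(hop_latency(100, 350))

# The counter wrapped between ingress and egress (8-bit counter for clarity).
print(hop_latency(250, 4, ts_bits=8))
```

The result is in implementation-specific ticks, and it inherits the caveats above about where in the pipeline each timestamp is taken.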

Now the ones that PSA has limited or no capabilities to enable:

  • egress port TX link utilization - Perhaps this could be done in some approximate fashion using a per egress port meter during egress processing, or a register in egress with per egress port state. Detail: What should the TX link utilization be measured as during times when Ethernet pause flow control prevents sending packets? 100%? 0%? Should it 'stick' at the value it had before pause began, and remain there until the pause is over?
  • queue occupancy - Adding such a capability has been brought up in the PSA work group before, but twice has been postponed for later consideration. One could implement it with a Register extern if the same extern instance could be accessed from both ingress and egress (e.g. add to the queue length in ingress, subtract from it in egress), but it is expected that the highest performance PSA implementations will not enable the same extern to be accessed from both ingress and egress. Actually, now that I think on it a bit more, even such a register would not enable you to implement it correctly, because packet drops due to congestion cannot be observed from a PSA P4 program.
  • queue congestion status - Similar to queue occupancy, although if the packet buffer allows dynamic maximum queue lengths that change as different queues grow and shrink, such a value requires more visibility into that dynamic mechanism than PSA defines today.
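
The queue occupancy point can be illustrated with a small Python simulation. It is only a sketch under stated assumptions: a single hypothetical register shared by ingress and egress (which PSA does not guarantee), a tail-drop queue with a packet-count limit, and invented class and method names. It shows why the add-in-ingress, subtract-in-egress scheme drifts as soon as the packet buffer drops a packet that the P4 program cannot observe.

```python
from collections import deque


class SharedRegisterQueueModel:
    """Toy model: ingress adds to a register, egress subtracts from it."""

    def __init__(self, capacity_pkts: int):
        self.capacity_pkts = capacity_pkts
        self.queue = deque()   # packets actually held in the buffer
        self.register = 0      # occupancy in bytes as the P4 program sees it

    def ingress(self, pkt_len: int) -> bool:
        # Ingress processing runs before the buffer decides whether to drop.
        self.register += pkt_len
        if len(self.queue) >= self.capacity_pkts:
            return False       # tail drop, invisible to the P4 program
        self.queue.append(pkt_len)
        return True

    def egress(self) -> None:
        # Only packets that survived the buffer ever reach egress.
        if self.queue:
            self.register -= self.queue.popleft()

    def true_occupancy(self) -> int:
        return sum(self.queue)


model = SharedRegisterQueueModel(capacity_pkts=2)
for _ in range(3):
    model.ingress(100)         # the third packet is dropped
model.egress()
model.egress()
# The queue is now empty, but the register still counts the dropped packet.
print(model.register, model.true_occupancy())
```

Every congestion drop leaves a permanent error in the register, so the estimate gets worse exactly when occupancy matters most.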
@jafingerhut
Collaborator Author

@jklr @mhira1 Just a note to both of you about this issue, and what features are included in the PSA draft today, vs. which are not.

@jafingerhut jafingerhut changed the title from "PSA Things not in PSA required to implement things in INT draft" to "PSA Things not in PSA required to fully implement the INT draft" on Dec 6, 2017
@jklr
Contributor

jklr commented Dec 6, 2017

@jafingerhut Thanks for the great summary. These all make sense.
One question: does PSA define the unit of time (and hence of the timestamp)? Exactly where the ingress/egress timestamps are taken in the end-to-end pipeline would likely differ across architectures, but the timestamp unit and possibly its format are something the arch and apps WGs may discuss together down the road.

@jafingerhut
Collaborator Author

PSA defines that the unit is allowed to vary from one PSA implementation to another. It does recommend that the value available in the P4 program advance at least as fast as once per microsecond, that it advance at a constant rate over time, and that it take at least one hour before it wraps around back to 0 (as any finite-sized representation must, eventually).
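
As a rough worked example of what those recommendations imply for the width of the timestamp, the Python sketch below computes the smallest counter width that takes at least one hour to wrap at a given tick rate. The function name and the example clock rates are assumptions for illustration, not anything PSA specifies.

```python
def min_timestamp_bits(ticks_per_second: int, min_wrap_seconds: int = 3600) -> int:
    """Smallest n such that 2**n ticks last at least min_wrap_seconds."""
    ticks_before_wrap = ticks_per_second * min_wrap_seconds
    # (x - 1).bit_length() is the smallest n with 2**n >= x, for x >= 1.
    return (ticks_before_wrap - 1).bit_length()


# Slowest recommended rate: one tick per microsecond.
print(min_timestamp_bits(1_000_000))

# A hypothetical 1 GHz timestamp clock.
print(min_timestamp_bits(1_000_000_000))
```

So a 32-bit counter is just enough at one tick per microsecond, while faster clocks need wider timestamps to stay above the one-hour wrap time.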

I had started proposing PSA functions for converting this to other units like microseconds or nanoseconds, but that did not go very far, I didn't push for it very hard, and it is not part of the current PSA draft. One difficulty there is how expensive it can be for fast hardware implementations to do multiplication or division of large fixed-point numbers, which timestamps are. Unless you have a clock that just happens to run at 1.000 GHz, getting nanoseconds requires multiplication or division of some kind. It is possible, just maybe not easy to get people to agree on something that they want in the PSA, at least in v1.0.
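
To show the kind of arithmetic involved, here is a Python sketch of one common way to convert ticks to nanoseconds: multiply by a precomputed fixed-point constant and shift right. It is not the function that was proposed for PSA; the 1.25 GHz clock, the 32-bit shift, and the function name are all assumptions made up for this example.

```python
def make_ticks_to_ns(clock_hz: int, shift: int = 32):
    """Build a converter that uses one multiply and one right shift.

    multiplier is (nanoseconds per tick) * 2**shift, rounded to nearest.
    Rounding the constant means results for very large tick counts can be
    off by a small amount; a larger shift reduces that error.
    """
    multiplier = ((1_000_000_000 << shift) + clock_hz // 2) // clock_hz

    def ticks_to_ns(ticks: int) -> int:
        return (ticks * multiplier) >> shift

    return ticks_to_ns


# Hypothetical 1.25 GHz clock: each tick is 0.8 ns.
to_ns = make_ticks_to_ns(1_250_000_000)
print(to_ns(1250))

# A clock at exactly 1 GHz needs no real conversion: multiplier is 2**32.
identity = make_ticks_to_ns(1_000_000_000)
print(identity(12345))
```

Even in this form the data path needs a wide multiplier (timestamp width plus the width of the constant), which is the hardware cost mentioned above.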

@jklr
Contributor

jklr commented Dec 7, 2017

I see. Some architectures may have hardware logic to compensate for the frequency difference between the oscillator and the timestamp clock, and some may not, which is why a P4-level solution was sought. It makes sense to start PSA v1.0 from the most common building blocks and assumptions.

@jklr
Contributor

jklr commented Aug 16, 2018

@jafingerhut
As we discussed, the following queueing-system intrinsic metadata, made available at egress, would be great additions for telemetry and other use cases.

  • queue ID
  • queueing latency
  • enqueue-time queue occupancy
  • dequeue-time queue occupancy

They all provide unique information. For example, queueing latency is not necessarily a direct linear function of enqueue-time occupancy, because the queue could be blocked by congestion in higher-priority queues. And the delta between egress and ingress timestamps may include variable delay added by the parser(s).
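
A toy Python simulation of the point about higher-priority queues follows. It assumes a strict-priority scheduler with two queues that serves one packet per tick; the scheduler model and all names are invented for this sketch and do not describe any particular switch.

```python
from collections import deque


def low_priority_stats(high_arrivals, low_arrivals, horizon):
    """Simulate two strict-priority queues, one packet served per tick.

    Arrival lists hold integer arrival ticks. Returns one
    (enqueue_time_occupancy, queueing_latency) pair per low-priority
    packet served within the horizon.
    """
    high, low = deque(), deque()
    stats = []
    for now in range(horizon):
        for t in high_arrivals:
            if t == now:
                high.append(now)
        for t in low_arrivals:
            if t == now:
                low.append((now, len(low)))  # record occupancy at enqueue
        if high:
            high.popleft()                   # high priority always goes first
        elif low:
            enq_time, occupancy = low.popleft()
            stats.append((occupancy, now - enq_time))
    return stats


# The low-priority packet finds its own queue empty, yet waits 3 ticks
# behind three high-priority packets.
print(low_priority_stats([0, 0, 0], [0], horizon=10))

# Same enqueue-time occupancy, no high-priority traffic: no waiting.
print(low_priority_stats([], [0], horizon=10))
```

Both runs report an enqueue-time occupancy of 0 for the low-priority packet but different queueing latencies, which is why the two values are worth exposing separately.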

Making these metadata also available at ingress is a different story.
Having them as egress intrinsic metadata seems a meaningful start.

@edugrasa

I was wondering if there has been any progress on this issue, especially regarding the metadata on queue occupancy? Thanks a lot!

@jafingerhut
Collaborator Author

There has not been any progress to define capabilities in PSA that would enable these features of the INT specification.

If I understand correctly, INT explicitly allows different switches to use different units for some of these things, e.g. different time units, or different units for buffer occupancy (bytes, '80-byte cells', or '157-byte cells'), with the units conveyed not in every packet but through some control-plane-time mechanism. If that is true, then it significantly simplifies the job of specifying capabilities for this, because the units can be allowed to vary across different implementations. (It is probably still a bit early to get multiple implementers to agree on common units in the fast path for these things.)
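
As a sketch of what that could look like on the receiving side, the Python below normalizes per-switch values at a collector using a unit table that would be learned out of band. The table contents, field names, and function name are entirely hypothetical and are not taken from the INT spec.

```python
# Hypothetical per-switch units, learned via the control plane rather than
# carried in each packet. Tick length is kept in integer picoseconds so
# that a 0.8 ns tick needs no floating point.
SWITCH_UNITS = {
    1: {"ps_per_tick": 1000, "bytes_per_cell": 1},
    2: {"ps_per_tick": 800, "bytes_per_cell": 80},
}


def normalize(switch_id: int, hop_latency_ticks: int, queue_occupancy_cells: int) -> dict:
    """Convert one switch's raw metadata into nanoseconds and bytes."""
    units = SWITCH_UNITS[switch_id]
    return {
        "hop_latency_ns": hop_latency_ticks * units["ps_per_tick"] // 1000,
        "queue_occupancy_bytes": queue_occupancy_cells * units["bytes_per_cell"],
    }


print(normalize(2, hop_latency_ticks=1250, queue_occupancy_cells=3))
```

With this split, the multiplication happens at the collector, and the switch fast path only has to export its native counters.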

@jklr
Contributor

jklr commented Aug 8, 2019

That's right. The Apps WG started a YANG model for each vendor to be able to specify implementation-specific metadata semantics.
https://github.com/p4lang/p4-applications/blob/master/telemetry/code/models/p4-dtel-metadata-semantics.yang

The model may evolve to specify each field format, in addition to semantics.
https://github.com/p4lang/p4-applications/wiki/Meeting-minutes:-Feb-28,-2019#yang-model-revision


4 participants