PSA How to define behavior of tables whose control plane operations are not atomic? #567

jafingerhut · 2018-02-01T20:13:22Z

At the 2018-Jan-31 PSA work group meeting, discussing this PR: #554

I proposed adding an annotation for tables in P4 source code that would indicate to the compiler that control plane operations for this table need not be atomic relative to table apply operations in the data plane. This could enable more P4 programs to be compiled for some targets (e.g. range match_kind fields in TCAMs, maybe bigger keys, bigger number of bits of action parameters, etc.)

As I was starting to write down how to do that, a couple of issues came to mind. My plan is not to include any of this in the PSA spec v1.0, unless we somehow magically come to agreement on these things by Feb. 10 or so. It is easier to add things to a spec later than to take things away that were added earlier.

For the specific example of a range match_kind implementation in TCAMs, a straightforward way of defining "what is the behavior of control plane updates, if it isn't atomic?" is something like "an individual control plane update might be implemented as a sequence of multiple atomic control plane updates, where apply operations for packets can occur between each of them, and in every intermediate state only a subset of the packets that should match the new entry will do so, while the rest continue to match according to the original table state".
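To make the 'sequence of multiple atomic updates' idea concrete, here is a minimal Python sketch. It assumes a TCAM implements one range entry by expanding it into several ternary (value, mask) entries and writing them one at a time. The function names and the 4-bit field are hypothetical and only for illustration, not taken from any real target.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the inclusive range [lo, hi] with ternary (value, mask) entries."""
    entries = []
    while lo <= hi:
        # Largest aligned power-of-2 block that starts at lo and fits in [lo, hi].
        size = lo & -lo if lo else 1 << width
        while size > hi - lo + 1:
            size >>= 1
        mask = ((1 << width) - 1) & ~(size - 1)
        entries.append((lo, mask))
        lo += size
    return entries


def matches(tcam, key):
    """True if any installed ternary entry matches the key."""
    return any((key & mask) == value for value, mask in tcam)


# Installing the single range entry [3, 12] on a 4-bit field takes several writes.
tcam = []
history = []  # set of matching keys after each individual write
for entry in range_to_prefixes(3, 12, 4):
    tcam.append(entry)  # models one atomic hardware write
    history.append({k for k in range(16) if matches(tcam, k)})
```

In this sketch every intermediate set in `history` is a subset of the final set of matching keys, which is the restricted kind of non-atomic behavior described above.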

If we had an annotation on tables that limited the implementation's behavior to that, would that be enough to cover the desires of most implementations? It has the advantage that a control plane can continue to allow a P4 program to do apply operations, and know that if those 'intermediate state' entries that are in transition cannot be matched, then it is OK for some packets to continue using the table safely, e.g. because there is an always-exact-match ingress port field in the key, and the control plane has temporarily steered traffic for that ingress port to avoid that table while its entries are being modified.

Another possible annotation might mean: from the time a control plane operation begins on this table until it is complete, apply operation behavior on this table is completely undefined. That is much easier to meet for an implementation, but means that the control plane needs to steer all packet processing to avoid doing apply operations on that table until it is safe again. That is a fair amount more painful to use in control software.

Between those two points, I am sure there are other intermediate levels of defined-ness that can be imagined, but I don't think we want to imagine them all. Two or three different annotations for this seems like the most I would want to explain to anyone.

P4 Runtime API consequences - Note the phrase above "until it is safe again". For any of the proposals above, there will be a time after the controller initiates an operation on a table during which behavior will be different than it would be for a guaranteed-atomic-operation table. If the control plane needs to steer some traffic away from that table until the table is back in a good state again, the controller needs a way to know when it is safe again, before it makes updates to other state in the program, e.g. typically earlier tables, such that the data plane can start doing apply operations on that table again.

From controller to data plane, we want to enable the possibility of many update operations in flight from controller to data plane, i.e. we do not want to restrict things to the controller doing stop-and-wait on each individual control plane operation. Performance would be unacceptably low.

One possible way: Have 'write barrier' commands in the controller-to-PSA-device stream of commands, similar to barrier instructions used for cache coherency in processors. Suppose table X is the one that steers traffic so that it can do apply operations on table Y, or not. X is guaranteed to have atomic control plane updates, but Y has been annotated so that it does not. The controller would use a sequence of operations like this:

(1) write X to steer traffic away from Y (or maybe only a subset of Y's entries relevant to the update we plan to do on Y)
(2) barrier
(3) do one or more updates to Y that are not atomic
(4) barrier
(5) write X to steer traffic back to Y

The barrier operations do not change anything in the hardware by themselves. They serve only to tell the local agent software for the PSA device: do not reorder table update operations around this barrier.

The barrier numbered (2) would ensure that any writes in step (1) were actually committed into the data plane before proceeding. There could be no "Oh, update (1) is in flight in one command queue in my implementation, and I went ahead with an update for (3) in a different queue, and whoops! One of the update (3) operations actually physically occurred before (1)."

The barrier numbered (4) has the same meaning. It is there to ensure that all non-atomic updates to Y are complete before data plane traffic can start using them, which can happen as soon as (5) is done.
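The five-step sequence can be modeled with a small Python sketch. It assumes a local agent that is free to reorder operations between two barriers but never moves an operation across one. `BARRIER` and `agent_commit_order` are hypothetical names for this illustration and are not part of any existing P4 Runtime API.

```python
import random

BARRIER = object()  # hypothetical marker, not a real P4 Runtime message


def agent_commit_order(stream, rng):
    """Return one order in which a local agent might commit the operations.

    Operations between two barriers may be reordered freely;
    nothing moves across a barrier.
    """
    committed, segment = [], []
    for op in stream + [BARRIER]:  # trailing barrier flushes the last segment
        if op is BARRIER:
            rng.shuffle(segment)
            committed.extend(segment)
            segment = []
        else:
            segment.append(op)
    return committed


stream = [
    "X: steer away from Y",  # (1)
    BARRIER,                 # (2)
    "Y: update a",           # (3)
    "Y: update b",
    "Y: update c",
    BARRIER,                 # (4)
    "X: steer back to Y",    # (5)
]

# Try several agents with different (seeded) reordering behavior.
orders = [agent_commit_order(stream, random.Random(seed)) for seed in range(20)]
```

Whatever reordering the agent picks, the write to X in (1) is committed first and the write to X in (5) is committed last, so no packet is steered to Y while one of Y's non-atomic updates is still in progress.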

jafingerhut · 2018-02-01T20:14:14Z

@antoninbas Has there been any discussion in the P4 Runtime API group about the order that control plane operations actually take effect in P4 devices?

jafingerhut · 2018-02-04T22:28:03Z

Another example of an extern where a single control plane update operation might not actually be atomic in existing P4 devices today, and restricting what a PSA device can do in intermediate states might be worth explicitly mentioning: ActionSelectors.

What control plane operations might not be atomic on ActionSelectors, and why?

Some ActionSelector implementations I have seen and heard of have a data plane implementation that is restricted to a power-of-2 number of elements in a group. In order to support arbitrary sized groups from the control plane, individual elements are duplicated in order to have a nearly-equal number of copies of each element.

Thus adding an element, removing an element, or updating the action parameters of an element, are not actually atomic in the sense that every packet is guaranteed to see the state of the ActionSelector before or after the one control plane operation. The control plane operation is typically implemented as a sequence of multiple operations that are atomic from the data plane perspective, where the intermediate states seen by the data plane can have some of these duplicate elements in the old state, and some in the new. The intermediate states do not last for long, but they can last for hundreds to thousands of packet forwarding times.
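A minimal Python sketch of why this happens, assuming a hypothetical layout in which a group's members are copied round-robin into a power-of-2 number of hardware slots and each slot is rewritten with its own atomic write. Real implementations may lay out the slots differently to reduce the number of writes.

```python
def expand_group(members, slots):
    """Fill `slots` hardware slots (a power of 2) with nearly equal copies of each member."""
    return [members[i % len(members)] for i in range(slots)]


def transition(old, new):
    """Yield each data-plane-visible state while rewriting one slot at a time."""
    state = list(old)
    for i, member in enumerate(new):
        if state[i] != member:
            state[i] = member  # models one atomic slot write
            yield list(state)


old = expand_group(["A", "B", "C"], 8)       # A B C A B C A B
new = expand_group(["A", "B", "C", "D"], 8)  # A B C D A B C D

# One control plane operation ("add member D") becomes several slot writes.
states = list(transition(old, new))
```

Here the single 'add member D' operation turns into five slot writes, and the states in between are neither the old group nor the new one.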

In practice, for the typical use cases, it isn't a big deal. But it isn't truly atomic as defined in the PSA spec right now, either.

antoninbas · 2018-02-05T19:13:42Z

@jafingerhut my understanding is that when the Write RPC completes, all the operations in the Write batch are guaranteed to have been committed to HW. So there is a natural barrier between Write batches. However, the implementation may re-order operations within a batch when possible (this second statement is not set in stone yet, but I think that's what was agreed upon a while back). I have opened p4lang/PI#300 to confirm.

Therefore if you want to guarantee that operation X happens before Y, put them in 2 different batches and wait until X completes before sending Y. Hopefully that answers the question you directed at me.

Regarding the more general question discussed in this issue, I would really like to force targets to ensure that each basic operation (a single operation in a Write batch) is performed atomically with respect to dataplane traffic. Maybe the target's compiler backend should give a warning when this is not achievable (e.g. if this is how range match is implemented) and there should be an annotation / pragma / compiler flag to silence the warning. This seems to be very similar to what you are suggesting here.

jafingerhut · 2018-02-05T20:12:40Z

@antoninbas In the latest PSA spec, it is currently required that every individual control plane operation, i.e. a single table entry add, delete, or modify, is atomic with respect to data plane table 'apply' operations.

It is required that a compiler give an error if it cannot guarantee this for a particular table (e.g. too large a key, too many bits in an action's parameters, etc.)

I fully expect that there will be some tables and/or other externs in P4_16 where sometimes, on some implementations, someone will want to allow table "foo" to violate this assumption. I fully agree that the control plane needs to know about these things, because it can completely change the way you write control plane code if table "foo" is guaranteed atomic in the data plane, vs. not.

This issue exists precisely because I hope that a future version of the PSA spec will define standard annotations one can use on tables/etc. in PSA that are an explicit signal to the compiler "I am OK if this table's control plane operations are not atomic". The part I don't understand well enough to define this yet (and thus why this issue doesn't have a pull request yet) is "what should this annotation mean precisely?", i.e. should the control plane steer the data plane from ever doing apply calls on a table during times when control plane operations might be occurring on it, or can we make finer-grained guarantees like "the table is guaranteed to be in an intermediate state where a subset of the new table entries are matched correctly, but the rest are still matched by the old criteria before the control plane update began". That last case is one that sounds attractive, because it lets a control plane and data plane still use a table that one is doing some restricted kinds of control plane updates on.

jafingerhut · 2018-02-08T16:00:17Z

Here are some proposals for a few kinds of "non-atomic control plane updates, but restricted in how they are non-atomic" that are motivated by non-atomic updates I have seen in hardware implementations, and are restricted in their "bad intermediate" behavior in a way that those writing controller software still have some guarantees of how bad the intermediate behavior will be.

#1: match behavior is atomic, action behavior temporarily undefined

Example, the search key is small enough that the implementation guarantees that the correct table entry will always be matched in all intermediate states, but the action selected, or the action parameters used, may temporarily be wrong.

Reason why a data plane might have this behavior: the action parameters are so large that the implementation cannot update them all atomically, but only in two or more 'chunks'.

What a controller can do about it: steer traffic away from matching any entries being updated in this way, or if it suits your desired behavior, avoid using modify operations on the table. In some use cases, deleting a table entry, then adding a new entry with the same key but different action is good enough.

(implementation detail: It is often easy for a data plane to make adding and deleting table entries atomic, even if the key or action parameters are large, because most will have a single valid bit per entry that can be atomically cleared to remove the entry, or can be set last to make it valid, even if it takes multiple separate atomic writes to fill in the key and/or action parameters.)
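The valid-bit technique can be sketched in Python as follows. This is only an illustration under the assumption that each attribute assignment stands for one atomic hardware write; the `Entry` class and the `observe` callback (which stands in for a packet arriving between two writes) are hypothetical.

```python
class Entry:
    """One hardware table slot; each attribute assignment models one atomic write."""

    def __init__(self):
        self.valid = False
        self.key = None
        self.params = [None, None]  # action parameters too wide for a single write


def lookup(table, key):
    """Data plane lookup: only entries with the valid bit set can match."""
    for e in table:
        if e.valid and e.key == key:
            return tuple(e.params)
    return None  # miss


def add_entry(table, key, params, observe):
    e = Entry()
    table.append(e)
    e.key = key
    observe()
    e.params[0] = params[0]
    observe()
    e.params[1] = params[1]
    observe()
    e.valid = True  # set last, so the entry appears atomically
    observe()


def delete_entry(table, key, observe):
    for e in table:
        if e.valid and e.key == key:
            e.valid = False  # cleared first, so the entry vanishes atomically
            observe()


table, seen = [], []
probe = lambda: seen.append(lookup(table, 0x0A))
add_entry(table, 0x0A, ("port", 7), probe)
delete_entry(table, 0x0A, probe)
```

Every lookup recorded in `seen` is either a miss or the complete entry; a half-written entry is never matched.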

#2: match behavior guaranteed to atomically update in 'subset at a time' fashion, action updates are atomic

The 'subset at a time' fashion is like that described for range matches in a TCAM described in the first comment of this issue.

A controller may be written to allow packet processing to continue matching against such table entries while they are being updated in this less-atomic way, but only if the 'subset of packets match the old entry, subset of packets match the new entry' behavior is acceptable for your use case.

The controller need not steer any traffic away from a table if it would not match the removed/added table entry, because of the 'subset at a time' guarantee of the data plane.

#3: match behavior is undefined, action updates are atomic

This is the 'anything goes' on match behavior during intermediate states. A controller should steer traffic away from doing apply calls on this table completely before making changes to such a table, and not steer traffic back to doing apply calls on it until all such changes are complete in the data plane.

I don't have a particular example in mind of an implementation that might do this, other than perhaps some kind of algorithmic TCAM/range-matching beast that did not use pointer-flipping techniques in its update implementation, leading to the result that no useful guarantees can be made about intermediate matching behavior during updates.

Note that one might have a table that has both of properties #1 and #2, or #1 and #3. #1 is focused on atomicity of actions performed on a match, #2 and #3 are focused on whether the correct table entry will always be matched in intermediate states.

mihaibudiu · 2018-02-08T16:29:08Z

This range of behaviors is so broad that to me it suggests that atomicity should not be part of the PSA definition at all. The definition should say that each target that implements PSA has to specify its behavior in detail. This sounds bad, but networking people are used to inconsistencies. Even if individual updates are atomic, usually network invariants will require multiple atomic updates across multiple devices - which no one hopes to achieve in practice.

jafingerhut · 2018-02-08T17:14:19Z

@mbudiu-vmw I believe the common case will be that most tables used in practice will be implementable as fully atomic (as currently defined in the PSA draft) on most PSA implementations.

These proposals are for what I expect will be less commonly used tables, e.g. range matching in TCAMs, actions with a huge number of bits in action parameters, and a handful of other cases. Maybe I've spent too long thinking about and dealing with these things in the past, so the list of 3 'relaxations' of atomicity seems everyday to me.

If in practice a program like switch.p4 with 50+ tables had one or two with these annotations on them, would it seem so daunting then?

mihaibudiu · 2018-02-08T17:16:12Z

Why do you think that only 2 annotations would be useful for switch.p4?
In general, how do you identify - as a programmer - tables that should be atomically updated?
How are they different from other tables in your application?

jafingerhut · 2018-02-08T17:21:20Z

The intent is you don't need these annotations for fun. You only use them when an implementation can't do updates on a particular table atomically. In switch.p4, if you used, say, range matching only on the ACL (access control list) match tables (which is the most common place I have seen such a match_kind), then you put the annotation for non-atomic variant #2 on there, and now the control plane knows what it can rely upon for that table. If a few implementations can do it atomically, no problem -- you have already written the control plane and the annotation so it can be non-atomic in style #2, and it works just fine on implementations that do better.

The thing is, without such an annotation on those tables, I fully believe that only a small fraction of implementations could actually implement updates on those tables atomically as stated in the PSA spec. By having the annotation, you acknowledge that fact and enable the program to be way more portable.

If you didn't have the annotation, then it would be writing down the restriction on a napkin somewhere.

mihaibudiu · 2018-02-08T17:23:47Z

The problem is that switch.p4 is supposed to be target-independent.
The annotations you are adding reflect capabilities of your target.

You are in fact saying: my target cannot do range tables atomically.
What should a programmer do if they don't know which target switch.p4 will run on?

jafingerhut · 2018-02-08T17:25:09Z

@mbudiu-vmw I tried to address that in an earlier comment that probably crossed in flight with your most recent one -- put the annotation on there because you expect most implementations need it, and if an implementation does better, then your control plane and program works just fine with them, too. You have prepared for the less-atomic behavior, but the implementation did better at it than you needed it to. No problem.

mihaibudiu · 2018-02-08T17:25:35Z

Also, when you write switch.p4, which ones of the tables do you really want atomically updated?

jafingerhut · 2018-02-08T17:26:53Z

When you write switch.p4, you expect almost all of them to be atomically updated, and most PSA implementations should be able to do most of them atomically.

If you want a set of guidelines of red flags to look out for, at least ones I know about, they are in comments above: range matching in tables that also have ternary matching, and action selectors. There may be one or two more.

mihaibudiu · 2018-02-08T17:30:06Z

My question is: from the point of view of your network application, which tables are you OK with being updated non-atomically? This looks like a very difficult question to answer - you either say "all" to be safe, or "none" because you realize you can't have all. In other words, will switch.p4 break if the range tables are updated non-atomically? What happens if they aren't - are you willing to not run switch.p4 at all?

mihaibudiu · 2018-02-08T17:33:57Z

What I am saying is that you may want some kind of documentation in the PSA definition, and not in switch.p4. "Atomic updates" is a property of the target, and not of the program run on the target. The program that runs on a target may only be correct if some updates are atomic, but that sounds very difficult for programmers to think about.

mihaibudiu · 2018-02-08T17:34:37Z

Clarification: not in the generic PSA definition, but in a specific PSA implementation.

jafingerhut · 2018-02-08T17:42:47Z

It is very common in network devices, including configurable-but-not-programmable ones that have been around for decades, to have dependencies between tables, and to want to implement 1700 adds/removes from a later table, in a way that a data packet should either see all of the new 1700 entries, or none of them. There are well known techniques for doing this -- e.g. you put a 'label' or 'pointer' as a result in an earlier table, and make it part of the search key in the later table, and write your control plane software so that the 'label' of the new 1700 entries you are writing one at a time is a new value not in use by any packet in the earlier table. When all 1700 are done, then you atomically update one entry in the earlier table, and the next packet to go through might match on one of those 1700 entries, but no earlier packet could have possibly done so.
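Here is a minimal Python sketch of that label technique, with two entries standing in for the 1700. The table layout, names, and the `observe` callback (which stands in for a packet arriving between two control plane writes) are all hypothetical and only for illustration.

```python
first_table = {"port1": 1}                      # ingress port -> label
second_table = {(1, "10.0.0.1"): "old-action"}  # (label, dst) -> action


def process(port, dst):
    """Data plane: the label from the earlier table is part of the later key."""
    label = first_table[port]
    return second_table.get((label, dst), "miss")


def bulk_replace(port, new_entries, observe):
    old_label = first_table[port]
    new_label = old_label + 1  # assumed not in use by any earlier-table entry
    for dst, action in new_entries.items():
        second_table[(new_label, dst)] = action  # not reachable by any packet yet
        observe()
    first_table[port] = new_label  # the single atomic switch-over
    observe()
    for k in [k for k in second_table if k[0] == old_label]:
        del second_table[k]  # clean up the old generation afterwards


seen = []
probe = lambda: seen.append(
    (process("port1", "10.0.0.1"), process("port1", "10.0.0.2"))
)
bulk_replace("port1", {"10.0.0.1": "new-action", "10.0.0.2": "other"}, probe)
```

Each pair recorded in `seen` comes entirely from the old set of entries or entirely from the new set, never a mixture, even though the later table was written one entry at a time.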

This is bread-and-butter stuff for people writing control plane code in such devices. They spend many waking hours fretting over such things. I have, and probably will again. Bugs happen for such reasons if you don't do it correctly, the kinds that are no fun to track down because they affect the flowing traffic for only a few milliseconds.

That is when all of the tables have atomic updates, which in practice is easily achievable for most tables.

We could drop this issue, and simply keep the PSA as is -- every table update must be atomic. What will that achieve in practice? Violations of the spec, where the violations are few and far between, but they will be there (if people use the features I mentioned in their programs). In practice, is that the end of the world? Certainly not. But I would prefer if we could have a way where someone who knows the red flags in advance could say, "Oh, yeah, this range matching thing is non-atomic on most PSA implementations. I am going to write my control plane code to handle that possibility, add the annotation on the one or two tables that do that, and now I know it should compile on those targets, too, and if there are any other tables it cannot do atomically, the compiler will give an error. Then I will think about those cases and figure out how to handle them."

Some alternatives are:

Don't have the annotation, and such features become useless on many important high speed targets. They will become unusable, or more likely, people will use them anyway, and do whatever the compiler says it needs to compile those, even though the target implements such-and-such a table update non-atomically. You write it down on a napkin somewhere. You move on. The napkin disappears eventually. The non-atomic behavior stays.

Another possibility is to simply acknowledge the red-flag cases that we know about now and write them into the PSA, by saying something like "since some high speed targets will not be able to implement atomic updates on tables with ternary and range match_kind's, those will always be assumed to have non-atomic behavior in style #2 (from my earlier comment, the 'subset at a time' atomic kind). Since some high speed targets will not be able to implement ActionSelector atomic updates, always assume that they might be done in the 'subset at a time' variety as well."

What happens if we come across another that isn't one of those two?

jafingerhut · 2018-02-08T17:44:19Z

Atomicity of control plane operations (or the lack of it) is not automatically incorrect. Correctness/incorrectness here is not a property of the P4 program alone. It is a property of the combined control plane + P4 program.

(Note: this observation is true of just about any property of a system that includes a P4-programmable device. Correctness is determined by the control plane code combined with the P4 program. Some bugs in a system might be fixable by only changing the P4 program, but most are typically in the control plane alone, and some are in the interaction between them).

mihaibudiu · 2018-02-08T19:13:46Z

Yes, your last statement is the key observation. Whatever annotation that may be, it does not really belong either to P4 or to PSA or the PI, but to the whole ensemble. We don't have any such language mechanisms yet. I am not sure if usual annotations are sufficient.

jafingerhut · 2018-02-08T19:38:47Z

The annotations I am proposing above are properties of the behavior of a PSA implementation (plus its local agent software that helps implement control plane updates) alone, independently of the controller.

What the PSA implementation can do atomically, vs. not, may have an effect on how you write controller software in order for the entire system to avoid the transient bad packet processing behavior.

jafingerhut mentioned this issue Feb 1, 2018

PSA Add section on atomicity of control plane operations #554

Merged

mihaibudiu added the portable switch architecture label Nov 30, 2018
