Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 94 additions & 0 deletions docs/source/apis/api_leveling.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
# Llama Stack API Stability Leveling

In order to provide a stable experience in Llama Stack, the various APIs need different stability levels indicating the level of support, backwards compatability, and overall production readiness.

## Different Levels

### v1alpha

- Little to no expectation of support between versions
- Breaking changes are permitted
- Datatypes and parameters can break
- Routes can be added and removed

#### Graduation Criteria

- an API can graduate from `v1alpha` to `v1beta` if the team has identified the extent of the non-optional routes and the shape of their parameters/return types for the API eg. `/v1/openai/chat/completions`. Optional types can change.
- CRUD must stay stable once in `v1beta`. This is a commitment to backward compatibility, guaranteeing that most code you write against the v1beta version will not break during future updates. We may make additive changes (like adding a new, optional field to a response), but we will not make breaking changes (like renaming an existing "modelName" field to "name", changing an ID's data type from an integer to a string, or altering an endpoint URL).
- for OpenAI APIs, a comparison to the OpenAI spec for the specific API can be done to ensure completeness.

### v1beta

- API routes remain consistent between versions
- Parameters and return types are not ensured between versions
- API, besides minor fixes and adjustments, should be _almost_ v1. Changes should not be drastic.

#### Graduation Criteria

- an API can graduate from `v1beta` to `v1` if the API surface and datatypes are complete as identified by the team. The parameters and return types that are mandatory for each route are stable. All aspects of graduating from `v1alpha1` to `v1beta` apply as well.
- Optional parameters, routes, or parts of the return type can be added after graduating to `v1`

### v1 (stable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is v1, how do things become v2? is there a v2alpha process also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v2 is something I think I'd avoid, especially given the openai routes. given the criteria here is rather loose (v1 apis can make breaking changes between Y-streams), I don't see a reason to graduate to v2 unless openai does.

I personally have seen most APIs who use this stability schema stick to v1 unless a major refactor occurs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a section for this outlining how a migration to v2 would go


- Considered stable
- Backwards compatible between Z-streams
- Y-stream breaking changes must go through the proper approval and announcement process.
- Datatypes for a route and its return types cannot change between Z-streams
- Y-stream datatype changes should be sparing, unless the changes are additional net-new parameters
- Must have proper conformance testing as outlined in https://github.com/llamastack/llama-stack/issues/3237

### v2+ (Major Versions)

Introducing a new major version like `/v2` is a significant and disruptive event that should be treated as a last resort. It is reserved for essential changes to a stable `/v1` API that are fundamentally backward-incompatible and cannot be implemented through additive, non-breaking changes or breaking changes across X/Y-Stream releases (x.y.z).

If a `/v2` version is deemed absolutely necessary, it must adhere to the following protocol to ensure a sane and predictable transition for users:

#### Lifecycle Progression

A new major version must follow the same stability lifecycle as `/v1`. It will be introduced as `/v2alpha`, mature to `/v2beta`, and finally become stable as `/v2`.

#### Coexistence:

The new `/v2` API must be introduced alongside the existing `/v1` API and run in parallel. It must not replace the `/v1` API immediately.

#### Deprecation Policy:

When a `/v2` API is introduced, a clear and generous deprecation policy for the `/v1` API must be published simultaneously. This policy must outline the timeline for the eventual removal of the `/v1` API, giving users ample time to migrate.

### API Stability vs. Provider Stability

The leveling introduced in this document relates to the stability of the API and not specifically the providers within the API.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can perhaps add tables which show which providers are considered stable vs. not. at various points we have wanted to do this but never actually managed to run tests and do the maintenance given the surface area. anyhow just a drive-by comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, but this should likely be a separate document about provider stability guarantees.


Providers can iterate as much as they want on functionality as long as they work within the bounds of an API. If they need to change the API, then the API should not be `/v1`, or those breaking changes can only happen on a y-stream release basis.

### Approval and Announcement Process for Breaking Changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably also include something like this to define a protocol for when there is a breaking change - #3260. A PR that is titled a specific way will not fail the oasdiff check.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 it'll make it easier for calling out in the release notes and any sorts of additional announcements (e.g., in discord, email, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so by this you mean: I should add a bullet here describing how the PR title and commit message should include an indicator of a breaking change? I can add that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a section here, luckily conventional commits outlines how to handle this: https://www.conventionalcommits.org/en/v1.0.0/#specification


- **PR Labeling**: Any pull request that introduces a breaking API change must be clearly labeled with `breaking-change`.
- **PR Title/Commit**: Any pull request that introduces a breaking API change must contain `BREAKING CHANGE` in the title and commit footer. Alternatively, the commit can include `!`, eg. `feat(api)!: title goes here` This is outlined in the [conventional commits documentation](https://www.conventionalcommits.org/en/v1.0.0/#specification)
- **Maintainer Review**: At least one maintainer must explicitly acknowledge the breaking change during review by applying the `breaking-change` label. An approval must come with this label or the acknowledgement this label has already been applied.
- **Announcement**: Breaking changes require inclusion in release notes and, if applicable, a separate communication (e.g., Discord, Github Issues, or GitHub Discussions) prior to release.

If a PR has proper approvals, labels, and commit/title hygiene, the failing API conformance tests will be bypassed.


## Enforcement

### Migration of API routes under `/v1alpha`, `/v1beta`, and `/v1`

Instead of placing every API under `/v1`, any API that is not fully stable or complete should go under `/v1alpha` or `/v1beta`. For example, at the time of this writing, `post_training` belongs here, as well as any OpenAI-compatible API whose surface does not exactly match the upstream OpenAI API it mimics.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we place any OpenAI-compatible APIs anywhere other than /v1, that means that clients in the wild won't be able to find that API to use it, right? An OpenAI client or the various frameworks in the wild will expect all of the OpenAI APIs to live at the same place, behind a /v1 URL, and not be in different /v1alpha or similar URLs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I 100% agree, and if we merge this and then find that openai APIs qualify as not v1, I think we should re-evalute this proposal.

I am really just concerned with placing APIs under their proper levels to convey to users our level of confidence in the API. all sorts of new routes, parameters, etc, can be added to a v1 API, just following the proper processes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the OpenAI stuff. Do you consider the extensions of the OpenAI API we add to be kosher or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think openAI apis are all v1 by default for usability concerns, and the net-new additions we make to them are OK as long as clients can upgrade to use them in a non-breaking manner. This is the grey area between concerns about API structure and API implementation (providers).

Generally speaking, any new routes and their implementations (providers) are ok for these APIs as long as between Z-stream releases, users aren't broken.

I think we could write a whole separate doc just about this "implementation" concern, but this doc is primarily about enforcing API structure norms.


This migration is crucial as we get Llama Stack in the hands of users who intend to productize various APIs. A clear view of what is stable and what is actively being developed will enable users to pick and choose various APIs to build their products on.

This migration will be a breaking change for any API moving out of `/v1`. Ideally, this should happen before 0.3.0 and especially 1.0.0.

### `x-stability` tags in the OpenAPI spec for oasdiff

`x-stability` tags allow tools like oasdiff to enforce different rules for different stability levels; these tags should match the routes: [oasdiff stability](https://github.com/oasdiff/oasdiff/blob/main/docs/STABILITY.md)

### Testing

The testing of each stable API is already outlined in [issue #3237](https://github.com/llamastack/llama-stack/issues/3237) and is being worked on. These sorts of conformance tests should apply primarily to `/v1` APIs only, with `/v1alpha` and `/v1beta` having any tests the maintainers see fit as well as basic testing to ensure the routing works properly.

### New APIs going forward

Any subsequently introduced APIs should be introduced as `/v1alpha`