-
Notifications
You must be signed in to change notification settings - Fork 198
Add a formal semver 2.0.0 version type #371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature-PR371-semver2.0
Are you sure you want to change the base?
Add a formal semver 2.0.0 version type #371
Conversation
First crack at adding a formal version type in response to CVEProject#362 (comment) Any others which are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.
I recommend you resubmit the PR with a change in both It will be best to target a JSON schema validation instead of programmatically verifying versions when they are specific like this scenario with a clear semver-2.0.0 compliance being tested. Secondly, we should follow/extend the current schema model and extend it to satisfy this need instead of a completely new JSON schema fields like See the current versions.md document which has some examples https://github.com/CVEProject/cve-schema/blob/main/schema/docs/versions.md
The one we don't current have is the So your Example will actually look like
You need to build a JSON schema validator to work with such data, with versionType frozen with enum as |
Thank for the comment and I can update the json in this PR once we get to consensus 👍 With respect to the range fields themselves, after seeing you rewrite my example I think it makes sense to simplify and create new fields so that a parser doesn't need to implement conditional logic based on the combination of fields present. I think this will make for simpler and more maintainable code long term. Maybe more people can chime in on this point. As for the regex it looks like the one you're suggesting is the second of the two provided on semver.org. Albeit with a leading and trailing For documentation's sake here are the two
|
…for the expressions of "everything under X" or "everything over Y"
Had a thought hit me about one sided ranges, so I added two more examples
Which allow someone to express the idea of |
…-02-20. The status conversation will happen another day
@sei-vsarvepalli where does the |
The field |
Gotcha. Then I guess the difference between the two approaches in schema terms is to add a I've written a pretty simple parser in python for my proposal. It assumes perfect data (validated) and that the data is semver-2.0.0, but I think it gets the point across on the simplicity of parsing. Feel free to play around with it as well by changing the specific parameters in the test. I think I covered all the cases and it can probably be simplified further. import json
test_json_string = """
{
"versionType": "semver-2.0.0",
"status": "affected",
"exclusiveLowerBound": "1.2.3-alpha",
"inclusiveUpperBound": "2.3.4+build17"
}
"""
def parse_decoded_json(json):
if json.get("exactly"):
return f'= {json.get("exactly")}'
if json.get("inclusiveLowerBound"):
lower = f'{">= "+json.get("inclusiveLowerBound")}'
elif json.get("exclusiveLowerBound"):
lower = f'{"> "+json.get("exclusiveLowerBound")}'
else:
lower = ""
if json.get("inclusiveUpperBound"):
upper = f'{"<= "+json.get("inclusiveUpperBound")}'
elif json.get("exclusiveUpperBound"):
upper = f'{"< "+json.get("exclusiveUpperBound")}'
else:
upper = ""
return f'{lower}, {upper}'
the_json = json.loads(test_json_string)
print(parse_decoded_json(the_json)) I initially had lower = f'{">= "+json.get("inclusiveLowerBound") if json.get("inclusiveLowerBound") else "> "+ json.get("exclusiveLowerBound")}'
upper = f'{"<= "+json.get("inclusiveUpperBound") if json.get("inclusiveUpperBound") else "< "+ json.get("exclusiveUpperBound")}' However that doesn't handled one sided ranges and I wanted to get some code up before today's qwg meeting. I also haven't had time to make a complete comparison parser, but translating the section if json.get("exactly"):
return f'= {json.get("exactly")}' results in something that needs to look like if json.get("version") and (not json.get("lessThan") or not json.get("greaterThan") or not json.get("lessThanOrEqual")):
return f'= {json.get("version")}' as the code needs to be sure that the parameter |
@sei-vsarvepalli the new properties are in as of commit 62db169, however I'm not sure how to express the valid combinations of parameters for the semver 2.0.0 version type. Do I need to do something like a
Where the first option in the one of is the entire current payload and the other is the semver 2.0.0? Maybe you know a simpler approach? |
If this is valid then still need to ensure version type is set to semver-2.0.0 for these combinations
I let this stew for a bit and I think 046dadd is in the right direction. I think its possible to only allow those parameter combinations when the version type is semver 2.0.0, but not sure how to encode that yet. |
@sei-vsarvepalli Ok, so I'm trying to run the tests locally and it seems I need to rebuild However that file doesn't seem to reference the CVE schema file that I've been making edits to, so I'm a little confused how this all works for local testing. Am I missing something basic here? Am I editing the wrong file? |
What tests are you running? It looks like the starting point of your repo is Your JSON file is also mangled, the line 323 is missing a comma. When I run test against your branch I get this error
|
Thanks for pointing out the comma. Added that in. I'm trying to run the node validation suite with
Which made me think that the validation is failing to match a case on the versions section and hence looking into |
In hopes that we're moving forward with the RFD process I've gone ahead and added a basic RFD to this PR Given the history and length of this PR I wasn't sure how much to capture in the RFD itself, but I'm happy to keep the conversation going and to add/subtract as people feel is necessary 👍 |
Someone asked in the last QWG meeting what the current status of open issues is for this topic, so I am going to take a crack at summarizing. If I've missed anything, let me know. Issues
|
schema/docs/versions.md
Outdated
|
||
Type identifier: `semver-2.0.0` | ||
Formally specified here at https://semver.org/spec/v2.0.0.html | ||
`semver-2.0.0` is new type introduced to formally specify usage of semantic versioning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "new type" → "type"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Removed in c9fde50
schema/docs/versions.md
Outdated
A complete definition of this version type can be viewed here | ||
https://semver.org/spec/v2.0.0.html#backusnaur-form-grammar-for-valid-semver-versions | ||
|
||
In the interest of simplicity the `semver-2.0.0` version type has two parameters which define a continuous range. `lowerBound` and `upperBound` each must be a valid semver triple with optional pre-release/build extensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this hasn't been rewritten from a prior version of the proposal which used inclusiveLowerBound
et al.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to do this as a review (my apologies), but I've left a few comments with fixups or trying to resolve open conversations.
All good on the delay. I get it. Many thanks and I've commented on each nit with fixes as well as doing a minor update to the rfd text 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, modulo the open question on whether to include the addition of the greaterThan
and greaterThanOrEqual
bounds. I don't have a strong opinion either way on that topic, so I'm marking as approved from me!
I have a different perspective on the remaining concerns about this proposal. To summarize: it makes CVE Records harder to understand, requires immediate action by some types of consumers, reduces interoperability, is based on a JSON schema document that does not have a valid JSON syntax, and - on the chance that it isn't used much and needed to be rolled back - has rollback costs that were not documented in the proposal. It is possible to add strictly validated support for various types of SemVer strings while avoiding these problems (by building on the alternative proposed in 2023 in the #263 and #264 issues). Efforts by people working on the various SemVer specifications have value for CVE consumers and should be used in the right way. Expanding these points: The current schema has a behavior that perhaps isn't universally understood: any field that holds a version number can alternatively hold a string that isn't a version number (such as '0' or '*'). This behavior has been documented for several years and a large number of parties understand it, and it is used in many CVE Records. This behavior can be kept regardless of how strictly the field is validated when it is, in fact, a version number. (For example, '0' and '*' can be special cases in which the string is not validated against any regular expression.) The proposed schema changes the behavior in two ways:
I believe that this is not a resolved issue. It may be better to preserve the '0' and '*' design for all version types (including semver-2.0.0) until CVE Record Format 6.0, where it would be abandoned in favor of a design that is more widely agreed upon. (This would be better both because it reduces consumer confusion in general, and because it may make 2.0.0 adoption faster: it allows
Similarly, programmatic parsing of greaterThan and greaterThanOrEqual would need to be introduced into products immediately, or else they will not correctly interpret some of the newer CVE Records. Version range information in a previously required format will become optional, even when semver-2.0.0 is not used. Support for The proposed schema introduces ambiguity such as:
(which is accepted by the proposed schema). This is clearly supposed to be a version range, but according to schema/docs/versions.md, it is both true that 4.0 is the beginning of the range and 5.0 is the beginning of the range. Perhaps the intent was that a version range starts with Also, it introduces strange asymmetry between lessThan and lessThanOrEqual:
e.g., lessThan can now define a range with no expressed lower bound, but lessThanOrEqual cannot. https://github.com/CVEProject/cve-schema/blob/7ba977b083cec619cb93b810075a7406c6ce9ef2/schema/CVE_Record_Format.json isn't even a valid JSON document, because of the trailing commas here:
and here:
and here:
and here:
The changes array is only affected by introducing a oneOf with one subschema:
This means that non-semver strings such as "2.5" can occur in the changes array when semver-2.0.0 is used, e.g., this is accepted by the proposed schema:
(also, of course, any oneOf with only one subschema is unnecessary) The CVE consumer survey did not confirm (or ask about) demand for semver-2.0.0. This might make it more likely that producers won't adopt it. Because the proposal includes a rollback plan, accepting the proposal commits the CVE Program to the rollback workload. Rollback has significant administrative costs because the CVE Program does not unilaterally change container data without involvement of the container owners. For example, with previous data changes such as 5.0.0 to 5.1.0, there were multiple communications to container owners instructing them to change their own container data to the 5.1.0 format, individualized help to some, ultimately a deadline, and then the CVE Program forced changes so that the entire database complied with the 5.1.0 schema. In other words, when the QWG accepts a data property that might rarely be used, without a broad set of data producers stating that they plan to use it, the QWG is imposing a future administrative burden that needs to be factored into the cost/benefit calculation. |
Seems like the trailing commas can just be deleted. Thanks for catching them @ElectricNroff. To make the issues clearer, it sounds like: For
|
@darakian, would there be a problem with splitting out the introduction of Separately, on the |
I've gone ahead and address some of the trailing commas (line numbers would help for the others 🙇) as well as the asymmetry in parameter requirements. I believe we already discussed
I believe you meant to use If you think cve services should provide range checking I'd love to work with you on building that out 👍
Oh, good catch. For what its worth it looks to me like versions can already mismatch today too. eg.
So, I asked back here #371 (comment) if a reference implementation would be helpful and it seems like maybe it would be. I do wonder if the cve website could simply display the versions as string though as I believe that's how current versions are handled.
I'm not in love with that, but I could be open to it. I'd like to get broader consensus before entertaining the idea. |
Regarding having CVE Services check version bounds, since it's not possible within the schema constraints: in the Package URL proposal we've recently agreed that CVE Services would be responsible for validating Package URLs, since Package URL parsing is too complex to constrain in a regex inside the schema. I think it's fine that some constraints end up in CVE Services when they can't be done in the schema. |
I had intentionally used
but either one is valid in the proposed schema. This isn't an inherited problem. There are now two properties that have the same meaning in this context ( More importantly, a semver-2.0.0 data producer needs to be aware of:
I believe rule 2 is too ridiculous and we shouldn't ship a schema with that behavior, because the support costs would be too high. Rule 3 had also been harmful to data integrity, because it conflates the concepts of "don't know" with "a version named 0.0.0-0 existed and was vulnerable." |
To find the remaining trailing commas without local tools, one can use websites such as jsonlint.com
It only identifies one of the trailing commas at a time. I don't know how many remain (there's at least one). |
I'll get to the rest of the trailing commas later today. Thanks for the tool 👍
To address these point by point
|
First crack at adding a formal version type in response to #362 (comment) Any others which are agreed upon should be spun up in their own PRs so that conversations in the PRs can be kept on topic
Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.
Another thought is that maybe this should be a retroactive definition of the
semver
type. That would likely be breaking for some of the current records though.The goal here is to have strict validation provided by cve services