Add a formal semver 2.0.0 version type #371

Open: wants to merge 57 commits into base: feature-PR371-semver2.0

Conversation


@darakian darakian commented Dec 9, 2024

First crack at adding a formal version type in response to #362 (comment). Any others that are agreed upon should be spun up in their own PRs so that the conversation in each PR stays on topic.

Happy to expand this if people think the full semver spec should be in this repo as well. I went back and forth on that.

Another thought is that maybe this should be a retroactive definition of the semver type. That would likely be breaking for some of the current records though.

The goal here is to have strict validation provided by CVE Services.

@sei-vsarvepalli
Contributor

I recommend you resubmit the PR with a change in both schema/docs/CVE_Record_Format_bundled_adpContainer.json and schema/docs/CVE_Record_Format_bundled_cnaContainer.json focusing on the version field. This PR with change to just example.md will not be useful without a schema based validation, as example.md is only a human friendly markdown.

It would be best to rely on JSON schema validation rather than programmatic verification here, since what is being tested, strict semver-2.0.0 compliance, is so clearly specified.

Secondly, we should follow and extend the current schema model to satisfy this need instead of introducing completely new JSON schema fields like exclusiveUpperBound, which is not really as intuitive as lessThan.

See the current versions.md document which has some examples

https://github.com/CVEProject/cve-schema/blob/main/schema/docs/versions.md

{
  "version": "2.0.0",
  "versionType": "semver",
  "lessThanOrEqual": "2.5.1",
  "status": "affected"
}

The one we don't currently have is the exclusiveLowerBound that you mention. However, the other examples can be mapped onto the current schema. Potentially we can add a greaterThan boolean field which, when present, means the version field should be treated as ">" instead of ">=", which is the current default for the "version" field.

So your Example will actually look like

            {
               "versionType": "semver-2.0.0",
               "version": "1.2.3-alpha",
               "lessThan": "2.3.4+build17"
            }
            {
               "versionType": "semver-2.0.0",
               "version": "3.4.5-beta",
               "greaterThan": true,
               "lessThanOrEqual": "4.5.6+assembly88"
            }
            {
               "versionType": "semver-2.0.0",
               "version": "5.6.7-gamma"
            }
            {
               "versionType": "semver-2.0.0",
               "version": "6.7.8-delta"
            }

You need to build a JSON schema validator to work with such data, with versionType frozen via an enum to semver-2.0.0, and with the "version", "lessThanOrEqual" and "lessThan" fields validated against this regex:
/^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$/
Finally, perhaps provide the additional "greaterThan" boolean field, which would treat version as ">" instead of ">=".

@darakian
Author

darakian commented Feb 13, 2025

Thanks for the comment. I can update the JSON in this PR once we get to consensus 👍

With respect to the range fields themselves, after seeing you rewrite my example I think it makes sense to simplify and create new fields so that a parser doesn't need to implement conditional logic based on the combination of fields present. I think this will make for simpler and more maintainable code long term. Maybe more people can chime in on this point.

As for the regex, it looks like the one you're suggesting is the second of the two provided on semver.org, albeit with a leading and trailing /.

For documentation's sake here are the two

One with named groups for those systems that support them (PCRE [Perl Compatible Regular Expressions, i.e. Perl, PHP and R], Python and Go).

^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
and

one with numbered capture groups instead (so cg1 = major, cg2 = minor, cg3 = patch, cg4 = prerelease and cg5 = buildmetadata) that is compatible with ECMA Script (JavaScript), PCRE (Perl Compatible Regular Expressions, i.e. Perl, PHP and R), Python and Go.

^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
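
For anyone who wants to try the named-group pattern, here is a minimal Python sketch. The function name `parse_semver` is made up for illustration, and the `re.ASCII` flag is my own addition (so that `\d` only matches 0-9); the pattern itself is the one quoted above.

```python
import re

# Named-group pattern from semver.org, split across lines for readability.
# re.ASCII is an added assumption: it restricts \d to the digits 0-9.
SEMVER_2_0_0 = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)


def parse_semver(text):
    """Return the named groups as a dict, or None if text is not valid SemVer 2.0.0."""
    match = SEMVER_2_0_0.fullmatch(text)
    return match.groupdict() if match else None


print(parse_semver("1.2.3-alpha+build17"))
print(parse_semver("2.5"))  # not a full major.minor.patch triple
```

Strings such as `2.5`, `0` and `*` are rejected by this pattern, since each lacks the three numeric segments.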

…for the expressions of "everything under X" or "everything over Y"
@darakian
Author

Had a thought hit me about one sided ranges, so I added two more examples

            {
              "versionType": "semver-2.0.0",
              "exclusiveUpperBound": "1.0.0"
            }
            {
              "versionType": "semver-2.0.0",
              "inclusiveLowerBound": "9.0.0"
            }

Which allow someone to express the idea of everything under X or everything over Y. The former of those two is reasonably common.

…-02-20. The status conversation will happen another day
@darakian
Author

So your Example will actually look like
...

           {
             "versionType": "semver-2.0.0",
             "version": "3.4.5-beta",
             "greaterThan": true,
             "lessThanOrEqual": "4.5.6+assembly88"
           }

@sei-vsarvepalli where does the greaterThan parameter come from? I'm prepping my comparison of the two representations for thursday's QWG and I can't find a reference to this parameter in the docs. Searching the repo for the string brings back only this PR
https://github.com/search?q=repo%3ACVEProject%2Fcve-schema%20greaterThan&type=code
Am I missing something?

@sei-vsarvepalli
Contributor

So your Example will actually look like
...

           {
             "versionType": "semver-2.0.0",
             "version": "3.4.5-beta",
             "greaterThan": true,
             "lessThanOrEqual": "4.5.6+assembly88"
           }

@sei-vsarvepalli where does the greaterThan parameter come from? I'm prepping my comparison of the two representations for thursday's QWG and I can't find a reference to this parameter in the docs. Searching the repo for the string brings back only this PR https://github.com/search?q=repo%3ACVEProject%2Fcve-schema%20greaterThan&type=code Am I missing something?

The field greaterThan does not exist today. It could be an option if you want to maintain the other fields as-is and then add something without having to create entirely new fields. Appending a non-required field greaterThan is a non-breaking change, allowing other versionType fields to adopt something similar as we move towards enforcing stricter schema checks.

@darakian
Author

darakian commented Feb 27, 2025

Gotcha. Then I guess the difference between the two approaches in schema terms is to add a greaterThan parameter vs adding the inclusiveLowerBound, exclusiveLowerBound, inclusiveUpperBound, exclusiveUpperBound, and exactly parameters.

I've written a pretty simple parser in Python for my proposal. It assumes the data has already been validated and is semver-2.0.0, but I think it gets the point across on the simplicity of parsing. Feel free to play around with it as well by changing the specific parameters in the test. I think I covered all the cases and it can probably be simplified further.

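import json

test_json_string = """
{
    "versionType": "semver-2.0.0",
    "status": "affected",
    "exclusiveLowerBound": "1.2.3-alpha",
    "inclusiveUpperBound": "2.3.4+build17"
}
"""

# Note: the parameter is named `json` to match the snippets below; inside the
# function it shadows the json module, which is not needed there.
def parse_decoded_json(json):
    # An exact version stands alone; no bounds need to be inspected.
    if json.get("exactly"):
        return f'= {json.get("exactly")}'

    # Lower bound: inclusive, exclusive, or absent.
    if json.get("inclusiveLowerBound"):
        lower = f'>= {json.get("inclusiveLowerBound")}'
    elif json.get("exclusiveLowerBound"):
        lower = f'> {json.get("exclusiveLowerBound")}'
    else:
        lower = ""

    # Upper bound: inclusive, exclusive, or absent.
    if json.get("inclusiveUpperBound"):
        upper = f'<= {json.get("inclusiveUpperBound")}'
    elif json.get("exclusiveUpperBound"):
        upper = f'< {json.get("exclusiveUpperBound")}'
    else:
        upper = ""

    # Join only the bounds that are present so one-sided ranges have no stray comma.
    return ", ".join(bound for bound in (lower, upper) if bound)

the_json = json.loads(test_json_string)
print(parse_decoded_json(the_json))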

I initially had

lower = f'{">= "+json.get("inclusiveLowerBound") if json.get("inclusiveLowerBound") else "> "+ json.get("exclusiveLowerBound")}'
upper = f'{"<= "+json.get("inclusiveUpperBound") if json.get("inclusiveUpperBound") else "< "+ json.get("exclusiveUpperBound")}'

However, that doesn't handle one-sided ranges and I wanted to get some code up before today's QWG meeting. I also haven't had time to make a complete comparison parser, but translating the section

if json.get("exactly"):
	return f'= {json.get("exactly")}'

results in something that needs to look like

if json.get("version") and not (json.get("lessThan") or json.get("greaterThan") or json.get("lessThanOrEqual")):
	return f'= {json.get("version")}'

as the code needs to be sure that the parameter version stands alone. Having a new parameter with a single function simplifies that logic.
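
For comparison, a rough sketch of what a parser for the other representation might look like, i.e. the existing `version`, `lessThan` and `lessThanOrEqual` fields plus the boolean `greaterThan` suggested earlier in this thread. The function name is hypothetical, and treating `greaterThan: true` with no upper bound as an open-ended "> version" range is an assumption, since the thread does not define that case.

```python
def describe_existing_fields(entry):
    """Render version/lessThan/lessThanOrEqual (+ boolean greaterThan) as a range string."""
    version = entry.get("version")

    # Upper bound comes from whichever of the two existing fields is present.
    if "lessThan" in entry:
        upper = f'< {entry["lessThan"]}'
    elif "lessThanOrEqual" in entry:
        upper = f'<= {entry["lessThanOrEqual"]}'
    else:
        upper = None

    # `version` means "exactly this version" only when nothing else modifies it.
    if upper is None and not entry.get("greaterThan"):
        return f"= {version}"

    # Otherwise `version` is the lower bound; greaterThan flips ">=" to ">".
    # Assumption: greaterThan without an upper bound is an open-ended range.
    operator = ">" if entry.get("greaterThan") else ">="
    lower = f"{operator} {version}"
    return lower if upper is None else f"{lower}, {upper}"


print(describe_existing_fields(
    {"version": "3.4.5-beta", "greaterThan": True, "lessThanOrEqual": "4.5.6+assembly88"}
))
```

The meaning of `version` depends on which other fields are present, which is the conditional logic being discussed.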

@darakian darakian changed the base branch from main to feature-PR371-semver2.0 February 28, 2025 17:17
@darakian
Author

darakian commented Mar 5, 2025

@sei-vsarvepalli the new properties are in as of commit 62db169, however I'm not sure how to express the valid combinations of parameters for the semver 2.0.0 version type. Do I need to do something like a oneOf for the versions block itself? eg.

"versions": {
                    "oneOf": [
                    "type": "array",
                    "description": "Set of product versions or version ranges related to the vulnerability. The versions satisfy the CNA Rules [8.1.2 requirement](https://cve.mitre.org/cve/cna/rules.html#section_8-1_cve_entry_information_requirements). Versions or defaultStatus may be omitted, but not both.",
                    "minItems": 1,
                    "uniqueItems": true,
                    "items": {
                        "type": "object",
                        ...

Where the first option in the oneOf is the entire current payload and the other is the semver 2.0.0 payload? Maybe you know a simpler approach?

If this is valid then still need to ensure version type is set to semver-2.0.0 for these combinations
@darakian
Author

darakian commented Mar 6, 2025

I let this stew for a bit and I think 046dadd is in the right direction. I think it's possible to only allow those parameter combinations when the version type is semver 2.0.0, but I'm not sure how to encode that yet.

@darakian
Author

darakian commented Mar 12, 2025

@sei-vsarvepalli Ok, so I'm trying to run the tests locally and it seems I need to rebuild dist/cve5validator.js. When attempting to do so though I get Error: Cannot find module '../../docs/CVE_JSON_bundled.json'. It looks like that file got renamed here
a3babe8

However that file doesn't seem to reference the CVE schema file that I've been making edits to, so I'm a little confused how this all works for local testing. Am I missing something basic here? Am I editing the wrong file?

@sei-vsarvepalli
Contributor

sei-vsarvepalli commented Mar 12, 2025

@sei-vsarvepalli Ok, so I'm trying to run the tests locally and it seems I need to rebuild dist/cve5validator.js. When attempting to do so though I get Error: Cannot find module '../../docs/CVE_JSON_bundled.json'. It looks like that file got renamed here a3babe8

However that file doesn't seem to reference the CVE schema file that I've been making edits to, so I'm a little confused how this all works for local testing. Am I missing something basic here? Am I editing the wrong file?

What tests are you running? It looks like the starting point of your repo is main, which has diverged quite a bit too. Perhaps start with either the develop branch or feature-144-SSVC, which seem more like the target for 5.2.0, where this semver update is expected to be bundled.

Your JSON file is also mangled: line 323 is missing a comma. When I run the tests against your branch I get this error

ParserError: Error parsing ./cve-schema/schema/cve-schema.json: missed comma between flow collection entries (324:29)

 321 |                                     {"required": ["exclusiv ...
 322 |                                 ]
 323 |                             }
 324 |                             {
-----------------------------------^

@darakian
Author

Thanks for pointing out the comma. Added that in.

I'm trying to run the node validation suite with node validate.js ../tests/valid/semver2-0-0.json. The test is running fwiw. I'm getting the following

jon~/g/!/c/s/s/Node_Validator:add-semver-2.0.0-versionType❯❯❯ node validate.js ../tests/valid/semver2-0-0.json
../tests/valid/semver2-0-0.json is invalid:
[
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/0/maxProperties',
    keyword: 'maxProperties',
    params: { limit: 2 },
    message: 'must NOT have more than 2 properties'
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/1/required',
    keyword: 'required',
    params: { missingProperty: 'version' },
    message: "must have required property 'version'"
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/2/required',
    keyword: 'required',
    params: { missingProperty: 'version' },
    message: "must have required property 'version'"
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf/3/required',
    keyword: 'required',
    params: { missingProperty: 'version' },
    message: "must have required property 'version'"
  },
  {
    instancePath: '/containers/cna/affected/0/versions/0',
    schemaPath: '#/properties/versions/items/oneOf',
    keyword: 'oneOf',
    params: { passingSchemas: null },
    message: 'must match exactly one schema in oneOf'
  },
  {
    instancePath: '/cveMetadata/state',
    schemaPath: '#/properties/state/enum',
    keyword: 'enum',
    params: { allowedValues: [Array] },
    message: 'must be equal to one of the allowed values'
  },
  {
    instancePath: '',
    schemaPath: '#/oneOf',
    keyword: 'oneOf',
    params: { passingSchemas: null },
    message: 'must match exactly one schema in oneOf'
  }
]
Summary: Validation FAILED for 1 out of 1 files!

That made me think the validation is failing to match a case in the versions section, hence my looking into build.js. I could rebase this branch, but it doesn't feel like that's the issue here.

@darakian
Author

In hopes that we're moving forward with the RFD process I've gone ahead and added a basic RFD to this PR
Rendered RFD here

Given the history and length of this PR I wasn't sure how much to capture in the RFD itself, but I'm happy to keep the conversation going and to add/subtract as people feel is necessary 👍

@alilleybrinker

alilleybrinker commented Jul 16, 2025

Someone asked in the last QWG meeting what the current status of open issues is for this topic, so I am going to take a crack at summarizing. If I've missed anything, let me know.


Issues

  • Should the new version type be called "semver-2.0.0"?
    • Resolved: Yes it should. While it might be reasonable to call it "semver-2.0", failing to fully enumerate the segments would make the version associated with the SemVer version type in the CVE Record Format not actually a valid SemVer version, as SemVer requires all three segments are filled out.
  • Should the "semver-2.0.0" type include support for * as a segment value?
    • Resolved: No, it shouldn't. This would make the values permitted with the "semver-2.0.0" version type not be valid SemVer values per the SemVer spec.
  • Should the proposal include the addition of greaterThan and greaterThanOrEqual fields in the versions array object?


Type identifier: `semver-2.0.0`
Formally specified here at https://semver.org/spec/v2.0.0.html
`semver-2.0.0` is new type introduced to formally specify usage of semantic versioning.


Nit: "new type" → "type"

Author

Good call. Removed in c9fde50

A complete definition of this version type can be viewed here
https://semver.org/spec/v2.0.0.html#backusnaur-form-grammar-for-valid-semver-versions

In the interest of simplicity the `semver-2.0.0` version type has two parameters which define a continuous range. `lowerBound` and `upperBound` each must be a valid semver triple with optional pre-release/build extensions.


It looks like this hasn't been rewritten from a prior version of the proposal which used inclusiveLowerBound et al.

Author

Updated in
bc077f5
and
64774b5

@alilleybrinker alilleybrinker left a comment

Forgot to do this as a review (my apologies), but I've left a few comments with fixups or trying to resolve open conversations.

@darakian
Author

All good on the delay. I get it. Many thanks and I've commented on each nit with fixes as well as doing a minor update to the rfd text 👍

@alilleybrinker alilleybrinker left a comment

LGTM, modulo the open question on whether to include the addition of the greaterThan and greaterThanOrEqual bounds. I don't have a strong opinion either way on that topic, so I'm marking as approved from me!

@ElectricNroff

Someone asked in the last QWG meeting what the current status of open issues is for this topic, so I am going to take a crack at summarizing.

I have a different perspective on the remaining concerns about this proposal.

To summarize: it makes CVE Records harder to understand, requires immediate action by some types of consumers, reduces interoperability, is based on a JSON schema document that does not have a valid JSON syntax, and - on the chance that it isn't used much and needed to be rolled back - has rollback costs that were not documented in the proposal. It is possible to add strictly validated support for various types of SemVer strings while avoiding these problems (by building on the alternative proposed in 2023 in the #263 and #264 issues). Efforts by people working on the various SemVer specifications have value for CVE consumers and should be used in the right way.

Expanding these points:

The current schema has a behavior that perhaps isn't universally understood: any field that holds a version number can alternatively hold a string that isn't a version number (such as '0' or '*'). This behavior has been documented for several years and a large number of parties understand it, and it is used in many CVE Records. This behavior can be kept regardless of how strictly the field is validated when it is, in fact, a version number. (For example, '0' and '*' can be special cases in which the string is not validated against any regular expression.) The proposed schema changes the behavior in two ways:

  • '0' and '*' are forbidden in semver-2.0.0 but not forbidden in semver
  • as a replacement for this '0' and '*' design (which, admittedly, quite a few people dislike), there is a replacement design that is partially implemented (see the mention of lessThan and lessThanOrEqual asymmetry below)

I believe that this is not a resolved issue. It may be better to preserve the '0' and '*' design for all version types (including semver-2.0.0) until CVE Record Format 6.0, where it would be abandoned in favor of a design that is more widely agreed upon. (This would be better both because it reduces consumer confusion in general, and because it may make 2.0.0 adoption faster: it allows semver speakers to republish CVE Records declaring their version numbers as semver-2.0.0 compliant, without any changes to how they have expressed version ranges in previously published CVE Records.)

This proposal has been designed to be very low impact. In the base case both record producers and record consumers can simply ignore the new data type. Adoption of the new data type into systems that process CVE records should be quite straight forward as semantic versioning is well supported across many languages. Once records begin to be produced with `semver-2.0.0` values a record consumer will be able to build reliable vulnerability managment automation based on the data.
The RFD text quoted above says "In the base case both record producers and record consumers can simply ignore the new data type." There are important record consumer use cases that cannot ignore it. One example is the cve.org website. Its objective is to present all version information in a human-readable form. It renders lessThan with "before" and lessThanOrEqual with "through" but does not recognize greaterThan or greaterThanOrEqual. (Also, it recognizes additional syntax such as '*' to mean that this is not a version number, but instead is expressing the separate fact that there is no upper bound.) And, of course, data producers will be free to publish records that only use greaterThanOrEqual, and do not have parallel information in the old syntax. A wide variety of similar resources (web-based products and standalone products) may be similarly affected, including commercial products that cannot be updated on demand when the CVE Record Format changes, but instead can only be updated on a pre-defined release schedule (which would be achievable if the CVE Program had a public roadmap). This is another reason to keep the '0' and '*' design for all version types, until a roadmap with significant advance notice can be published.

Similarly, programmatic parsing of greaterThan and greaterThanOrEqual would need to be introduced into products immediately, or else they will not correctly interpret some of the newer CVE Records. Version range information in a previously required format will become optional, even when semver-2.0.0 is not used.

Support for greaterThan does more harm than good. Unlike lessThan, lessThanOrEqual, and greaterThanOrEqual, there is a broader set of cases where greaterThan information becomes wrong as new versions are introduced. Also, greaterThan cannot be programmatically converted to the OSV concept of introduced (the concept of introduced is highly valued by many non-OSV users as well). Admittedly, there could, in theory, be cases where greaterThan is the only known attribute of what is affected. Instances of this seem negligible, however. (I looked at many of the NVD instances of "versionStartExcluding" and every one I saw was a data-entry error, where typically versionStartIncluding was the intended property.)

The proposed schema introduces ambiguity such as:

         "versions": [
            {
              "version": "4.0",
              "status": "affected",
              "greaterThanOrEqual": "5.0",
              "versionType": "semver"
            }

(which is accepted by the proposed schema). This is clearly supposed to be a version range, but according to schema/docs/versions.md, it is both true that 4.0 is the beginning of the range and 5.0 is the beginning of the range. Perhaps the intent was that a version range starts with version if there is an upper bound, but starts with greaterThanOrEqual if there is no upper bound; however, this is not implemented. Even if it were implemented, it is unclear why consumers would benefit by having two different terms that mean the start of a version range.
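
To make that ambiguity concrete, here is a small Python sketch of a check that flags entries carrying more than one field that could be read as the start of a range. The function name and the field tuple are illustrative assumptions based on the fields discussed in this thread, not part of any existing validator.

```python
# Fields that, per the discussion above, can each be read as a lower bound.
LOWER_BOUND_FIELDS = ("version", "greaterThan", "greaterThanOrEqual")


def competing_lower_bounds(entry):
    """Return the lower-bound fields present in one versions entry if there is more than one."""
    present = [name for name in LOWER_BOUND_FIELDS if name in entry]
    return present if len(present) > 1 else []


ambiguous = {
    "version": "4.0",
    "status": "affected",
    "greaterThanOrEqual": "5.0",
    "versionType": "semver",
}
print(competing_lower_bounds(ambiguous))
```

A check like this only detects that two candidate lower bounds exist; deciding which one should win is a policy question the sketch does not answer.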

Also, it introduces strange asymmetry between lessThan and lessThanOrEqual:

     {
         "required": ["status", "versionType", "lessThan"]
     },
     {
         "required": ["version", "status", "versionType", "lessThanOrEqual"]
     }

e.g., lessThan can now define a range with no expressed lower bound, but lessThanOrEqual cannot.

https://github.com/CVEProject/cve-schema/blob/7ba977b083cec619cb93b810075a7406c6ce9ef2/schema/CVE_Record_Format.json isn't even a valid JSON document, because of the trailing commas here:

  {
    "required": ["status", "versionType", "greaterThanOrEqual"]
  },
],

and here:

  "greaterThanOrEqual": { "$ref": "#/definitions/semver-2.0.0-version" },
}

and here:

    {"$ref": "#/definitions/version"},
]

and here:

          },
       },
       "additionalProperties": false

The changes array is only affected by introducing a oneOf with one subschema:

   "oneOf": [
      {"$ref": "#/definitions/version"}
   ]

This means that non-semver strings such as "2.5" can occur in the changes array when semver-2.0.0 is used, e.g., this is accepted by the proposed schema:

"versions": [
  {
    "version": "1.0.0",
    "status": "affected",
    "lessThan": "3.0.4",
    "versionType": "semver-2.0.0",
    "changes": [{"at": "2.5", "status": "unknown"}]
  }
]

(also, of course, any oneOf with only one subschema is unnecessary)
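
As an illustration of the gap, a minimal Python sketch that lists `changes[].at` values which are not valid SemVer 2.0.0 strings when the entry declares `semver-2.0.0`. The function name is hypothetical, the pattern is the numbered-group regex from semver.org, and `re.ASCII` is an added assumption so that `\d` matches only 0-9.

```python
import re

# Numbered-group pattern from semver.org; re.ASCII is an added assumption.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)


def non_semver_change_points(entry):
    """Return `changes[].at` values that fail SemVer 2.0.0 validation."""
    if entry.get("versionType") != "semver-2.0.0":
        return []  # other version types are out of scope for this check
    bad = []
    for change in entry.get("changes", []):
        at = change.get("at", "")
        if not SEMVER.fullmatch(at):
            bad.append(at)
    return bad


entry = {
    "version": "1.0.0",
    "status": "affected",
    "lessThan": "3.0.4",
    "versionType": "semver-2.0.0",
    "changes": [{"at": "2.5", "status": "unknown"}],
}
print(non_semver_change_points(entry))
```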

The CVE consumer survey did not confirm (or ask about) demand for semver-2.0.0. This might make it more likely that producers won't adopt it. Because the proposal includes a rollback plan, accepting the proposal commits the CVE Program to the rollback workload. Rollback has significant administrative costs because the CVE Program does not unilaterally change container data without involvement of the container owners. For example, with previous data changes such as 5.0.0 to 5.1.0, there were multiple communications to container owners instructing them to change their own container data to the 5.1.0 format, individualized help to some, ultimately a deadline, and then the CVE Program forced changes so that the entire database complied with the 5.1.0 schema. In other words, when the QWG accepts a data property that might rarely be used, without a broad set of data producers stating that they plan to use it, the QWG is imposing a future administrative burden that needs to be factored into the cost/benefit calculation.

@alilleybrinker

Seems like the trailing commas can just be deleted. Thanks for catching them @ElectricNroff.
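
One way to locate the remaining trailing commas is to run the schema text through a strict JSON parser. The sketch below uses Python's standard `json` module, which rejects trailing commas; the helper name `first_json_error` is made up for illustration, and the exact error message wording varies between Python versions.

```python
import json


def first_json_error(text):
    """Return 'line N, column M: message' for the first syntax error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# A fragment shaped like the ones quoted earlier, with a trailing comma.
broken = """
{
  "oneOf": [
    {"required": ["status", "versionType", "greaterThanOrEqual"]},
  ]
}
"""
print(first_json_error(broken))
print(first_json_error(broken.replace("},", "}")))
```

Since the parser stops at the first error, the check would need to be rerun after each fix until it returns nothing.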

To make the issues clearer, it sounds like:

For greaterThan and greaterThanOrEqual

  • You strongly object to the introduction of greaterThan and greaterThanOrEqual.
    • If greaterThan and greaterThanOrEqual are introduced, it seems there are some cases missed in current constraints which permit incoherent bounds, which should be fixed.
  • You are concerned about CVE consumers needing to update their version-bound logic to handle the new cases introduced by these fields.
    • You specifically raise the need to give substantial notice if these fields are added.

For the semver-2.0.0 version type

  • You strongly want to preserve support for * and 0 in all version types, regardless of whether the specifications on which those version types are based permit those values.
  • You are concerned that CVE consumers will need to update their version-handling logic to handle version types specified in the new version type.

@alilleybrinker

@darakian, would there be a problem with splitting out the introduction of greaterThan and greaterThanOrEqual into a separate proposal, leaving this one to only add the new semver-2.0.0 version type?

Separately, on the semver-2.0.0 issues raised by @ElectricNroff: I strongly oppose supporting * or 0 as valid values in version fields when the semver-2.0.0 type is used. While these may be valid values in existing version types in the CVE Record Format, they are not valid values in the SemVer specification. The central goal of this new type is to exactly implement the SemVer specification, so permitting spec-invalid values violates the core premise of the new type. If CNAs do not want to give up their use of * and 0 in version bounds, they can continue to use the existing type.

@darakian
Author

darakian commented Jul 17, 2025

I've gone ahead and addressed some of the trailing commas (line numbers would help for the others 🙇) as well as the asymmetry in parameter requirements. I believe we already discussed * and 0 back here.
#371 (comment)
My position has not changed and Andy sums it up well: the goal is to be semver compliant, and special cases break that.

The proposed schema introduces ambiguity such as:

         "versions": [
            {
              "version": "4.0",
              "status": "affected",
              "greaterThanOrEqual": "5.0",
              "versionType": "semver"
            }

I believe you meant to use semver-2.0.0 as the type there, but either way it's also possible to input invalid data with the current version types. Change greaterThanOrEqual: 5.0 to lessThan: 0.1 and you have the same problem. You can't guard against that with schema validation, so we would need CVE Services to enforce that if it's desired. I know we talked about this in some of the QWG meetings, but it is also touched on a bit here
#371 (comment)
So, this is not ambiguity which is introduced, but rather inherited.

If you think cve services should provide range checking I'd love to work with you on building that out 👍

This means that non-semver strings such as "2.5" can occur in the changes array when semver-2.0.0 is used, e.g., this is accepted by the proposed schema:

Oh, good catch. For what it's worth, it looks to me like versions can already mismatch today too, e.g. semver could be used as the type for affected with Custom in the changes (or vice versa). I can certainly make a semver-2.0.0 specific fix, but maybe the changes array should be made more strict generally.

"In the base case both record producers and record consumers can simply ignore the new data type." There are important record consumer use cases that cannot ignore it. One example is the cve.org website. Its objective is to present all version information in a human-readable form.

So, I asked back here #371 (comment) if a reference implementation would be helpful and it seems like maybe it would be. I do wonder if the CVE website could simply display the versions as strings though, as I believe that's how current versions are handled.


@alilleybrinker

@darakian, would there be a problem with splitting out the introduction of greaterThan and greaterThanOrEqual into a separate proposal, leaving this one to only add the new semver-2.0.0 version type?

I'm not in love with that, but I could be open to it. I'd like to get broader consensus before entertaining the idea.

@alilleybrinker

Regarding having CVE Services check version bounds, since it's not possible within the schema constraints: in the Package URL proposal we've recently agreed that CVE Services would be responsible for validating Package URLs, since Package URL parsing is too complex to constrain in a regex inside the schema.

I think it's fine that some constraints end up in CVE Services when they can't be done in the schema.

@ElectricNroff

ElectricNroff commented Jul 17, 2025

I believe you meant to use semver-2.0.0 as the type there, but either way it's also possible to input invalid data with the current version types. Change greaterThanOrEqual: 5.0 to lessThan: 0.1 and you have the same problem.

I had intentionally used semver (not semver-2.0.0) when writing:

         "versions": [
            {
              "version": "4.0",
              "status": "affected",
              "greaterThanOrEqual": "5.0",
              "versionType": "semver"
            }

but either one is valid in the proposed schema. This isn't an inherited problem. There are now two properties that have the same meaning in this context (version and greaterThanOrEqual) and they can have different values. I agree that it's slightly similar to other data-inconsistency problems that are inherited.

More importantly, a semver-2.0.0 data producer needs to be aware of:

  1. if there is no fixed version, then you should use greaterThanOrEqual with the value of the earliest affected version
  2. if there is a fixed version, then you should specify both the fixed version and the earliest affected version in the same array element. The fixed version goes in the lessThan field. The earliest affected version is also entered, just like before, except that you need to spell greaterThanOrEqual differently. You need to spell it version or else it won't work.
  3. if you know the last affected version (e.g., "lessThanOrEqual": "1.2") but don't know the earliest affected version, and want to capture the largest possible range, then you need to enter 0.0.0-0 because that is ordered before all other versions [THIS IS ALREADY ADDRESSED IN TODAY'S COMMITS]

I believe rule 2 is too ridiculous and we shouldn't ship a schema with that behavior, because the support costs would be too high.

Rule 3 had also been harmful to data integrity, because it conflates the concepts of "don't know" with "a version named 0.0.0-0 existed and was vulnerable."
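The claim in rule 3 that 0.0.0-0 is ordered before all other versions follows from the semver 2.0.0 precedence rules. A minimal sketch of those rules as a Python sort key is shown below; `precedence_key` is a hypothetical helper, and its pattern is simplified (it does not reject leading zeros in numeric pre-release identifiers).

```python
import re

# Simplified pattern: core version, optional pre-release, optional build metadata.
_SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?",
    re.ASCII,
)


def precedence_key(version):
    """Sort key following semver 2.0.0 precedence (build metadata is ignored)."""
    m = _SEMVER.fullmatch(version)
    if m is None:
        raise ValueError(f"not a semver 2.0.0 string: {version!r}")
    core = tuple(int(x) for x in m.group(1, 2, 3))
    pre = m.group(4)
    if pre is None:
        # A normal release sorts after every pre-release of the same core version.
        return core + ((1,),)
    # Numeric identifiers compare as integers and sort before alphanumeric ones.
    ids = tuple(
        (0, int(p), "") if p.isdigit() else (1, 0, p) for p in pre.split(".")
    )
    return core + ((0, ids),)


versions = ["1.0.0", "0.0.0", "0.0.1-0", "0.0.0-alpha", "0.0.0-0"]
print(sorted(versions, key=precedence_key))
```

Under this ordering, a lower bound of 0.0.0 excludes pre-releases of 0.0.0, whereas 0.0.0-0 does not exclude any valid version.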

@ElectricNroff

Seems like the trailing commas can just be deleted.

To find the remaining trailing commas without local tools, one can use websites such as jsonlint.com

Invalid JSON!
Error: Parse error on line 386:
...                    ]                  
-----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '[', got ']'

It only identifies one of the trailing commas at a time. I don't know how many remain (there's at least one).
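Because a strict parser stops at the first error, a quick heuristic scan can list every candidate at once. The sketch below is an assumption-laden helper rather than a real JSON parser: `trailing_comma_lines` is a hypothetical name, and the pattern will also flag a comma followed by `]` or `}` inside a string literal.

```python
import re

# A comma followed only by whitespace before a closing bracket or brace.
TRAILING_COMMA = re.compile(r",\s*(?=[\]}])")


def trailing_comma_lines(text):
    """Return 1-based line numbers of commas that directly precede ']' or '}'."""
    return [
        text.count("\n", 0, match.start()) + 1
        for match in TRAILING_COMMA.finditer(text)
    ]


sample = '{\n  "a": [1, 2,\n  ],\n  "b": 3,\n}'
print(trailing_comma_lines(sample))
```

Once the reported lines are cleaned up, running the file through a strict parser such as Python's `json.loads` confirms whether anything remains.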

@darakian
Author

darakian commented Jul 18, 2025

I'll get to the rest of the trailing commas later today. Thanks for the tool 👍

More importantly, a semver-2.0.0 data producer needs to be aware of:

1. if there is no fixed version, then you should use `greaterThanOrEqual` with the value of the earliest affected version

2. if there is a fixed version, then you should specify both the fixed version and the earliest affected version in the same array element. The fixed version goes in the `lessThan` field. The earliest affected version is also entered, just like before, except that you need to spell `greaterThanOrEqual` differently. You need to spell it `version` or else it won't work.

3. if you know the last affected version (e.g., `"lessThanOrEqual": "1.2"`) but don't know the earliest affected version, and want to capture the largest possible range, then you need to enter `0.0.0-0` because that is ordered before all other versions [THIS IS ALREADY ADDRESSED IN TODAY'S COMMITS]

To address these point by point:

  1. The schema cannot know of the existence of a fixed version for a piece of software and so cannot enforce behavior dependent on the existence of a fixed version. We can at most document best practice, which I'm fine with doing.

  2. If I'm consuming a record to see if it applies to me, my task is to do an intersection between the set of versions I use and the set of versions labeled as vulnerable/broken/bad. Intersecting my set 1.2.3 && 1.7.19 && 0.7.3 && 19.1.1 with >= 0.0.0, < 13.3.7 and with < 13.3.7 gives the same results. This is an aesthetic difference. If you want to say that we don't allow for ranges which are unbounded from below then I'm ok with that. I believe we discussed this in a QWG meeting and Chris commented on using 0.0.0 as a lower bound back here
    Add a formal semver 2.0.0 version type #371 (comment)
    This might be a case where enforcement/normalization can be done via cve services.

  3. I agree that the current construction is a bit awkward. How would you feel about a construction where lower bounds always use lessThan/lessThanOrEqual and upper bounds always use greaterThan/greaterThanOrEqual?
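The intersection argument in the second point can be sketched as follows. This is only an illustration: `key` is a hypothetical helper that handles plain MAJOR.MINOR.PATCH strings and ignores pre-release identifiers.

```python
def key(version):
    """Hypothetical helper: '1.7.19' -> (1, 7, 19). No pre-release support."""
    return tuple(int(part) for part in version.split("."))


mine = ["1.2.3", "1.7.19", "0.7.3", "19.1.1"]

# Range with an explicit lower bound: >= 0.0.0, < 13.3.7
bounded = [v for v in mine if key("0.0.0") <= key(v) < key("13.3.7")]

# Range that is unbounded from below: < 13.3.7
unbounded = [v for v in mine if key(v) < key("13.3.7")]

print(bounded)
print(unbounded)
```

For release versions like these the two ranges select the same set. They would differ only for pre-releases of 0.0.0 (such as 0.0.0-0), which semver orders before 0.0.0 and which this simplified key does not parse.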


7 participants