Validate OpenAPI schema references #2459

Merged

@martincostello (Contributor) commented Aug 8, 2025

Add validation rule for OpenAPI document schema references.

Resolves #2453.


Initial draft for now based on testing this approach with dotnet/aspnetcore#63095. Needs tests, plus rebasing after #2460 is merged.

@martincostello force-pushed the gh-2453-validate-schema-references branch from c461489 to 45e3cac August 15, 2025 12:34
@martincostello marked this pull request as ready for review August 15, 2025 12:36
Copilot AI review requested due to automatic review settings August 15, 2025 12:36
@martincostello requested a review from a team as a code owner August 15, 2025 12:36

Copilot AI left a comment

Pull Request Overview

This PR adds validation for OpenAPI schema references to ensure they point to existing schemas in the document. The validation rule detects both invalid references and circular reference patterns.

  • Adds a new validation rule OpenApiDocumentReferencesAreValid that checks schema reference validity
  • Implements detection of circular references with appropriate error messaging
  • Updates test counts to reflect the addition of the new validation rule
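The overview says the rule checks that references point at schemas that exist and copes with circular references. The sketch below illustrates that idea in isolation; it is not the Microsoft.OpenApi implementation (which walks the document with OpenApiWalker and OpenApiSchemaReferenceVisitor). It assumes a hypothetical model in which each component schema is either concrete or a pure alias whose only content is a `$ref` to another component, and `Resolve` is a hypothetical helper. Recursive schemas that refer to themselves through properties are legal in OpenAPI, but a cycle made only of aliases can never reach a concrete schema, so this sketch reports it; the visited set is what stops the walk from looping forever. Whether and how the real rule reports cycles is not shown here.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical model: a null value means "concrete schema"; a non-null value
// is the name of the component that this schema aliases via $ref.
var components = new Dictionary<string, string?>
{
    ["Pet"] = null,          // concrete schema
    ["PetAlias"] = "Pet",    // $ref: '#/components/schemas/Pet'
    ["Broken"] = "Missing",  // points at a component that does not exist
    ["Loop1"] = "Loop2",     // Loop1 -> Loop2 -> Loop1
    ["Loop2"] = "Loop1",
};

// Follows a chain of aliases to a concrete schema.
// Returns the target name on success, or an error message on failure.
(string? Target, string? Error) Resolve(string name)
{
    var visited = new HashSet<string>();
    string current = name;
    while (true)
    {
        // Seeing the same name twice means the chain never reaches a concrete schema.
        if (!visited.Add(current))
        {
            return (null, $"circular reference involving '{current}'");
        }

        // A reference to an unknown component is invalid.
        if (!components.TryGetValue(current, out var next))
        {
            return (null, $"reference to missing schema '{current}'");
        }

        if (next is null)
        {
            return (current, null);
        }

        current = next;
    }
}

foreach (var name in components.Keys)
{
    var (target, error) = Resolve(name);
    Console.WriteLine(error is null ? $"{name} -> {target}" : $"{name}: {error}");
}
```

Running it prints one line per component, for example `PetAlias -> Pet` and `Loop1: circular reference involving 'Loop1'`.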

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/Microsoft.OpenApi/Validations/Rules/OpenApiDocumentRules.cs | Implements the main validation logic with schema reference visitor |
| src/Microsoft.OpenApi/Properties/SRResource.resx | Adds error message for invalid schema references |
| test/Microsoft.OpenApi.Tests/Validations/OpenApiDocumentValidationTests.cs | Comprehensive test coverage for valid, invalid, and circular references |
| test/Microsoft.OpenApi.Tests/Validations/ValidationRuleSetTests.cs | Updates expected rule count from 19 to 20 |
| test/Microsoft.OpenApi.Tests/PublicApi/PublicApi.approved.txt | Adds new public API for the validation rule |
Files not reviewed (1)
  • src/Microsoft.OpenApi/Properties/SRResource.Designer.cs: Language not supported


@baywet (Member) left a comment

Thanks for the contribution!

Commits on the branch:

Add validation rule for OpenAPI document schema references.

Resolves microsoft#2453.
Add two basic unit tests and a fast path for when the components are registered.
- Improve handling of circular references.
- Improve the path.
Add a test for circular schema references.
Copy-pasted into the wrong place during a refactor.
Avoid allocating the segment for the context if the reference is valid.
Parse the document invariantly.
Remove reparsing of the document and just validate the in-memory `OpenApiDocument` instead.
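One of the commit messages above avoids allocating the path segment for the validation context when a reference is valid, and another adds a fast path for when the components are registered. A minimal sketch of that pattern follows, assuming a hypothetical `CheckReference` helper and a plain list of path segments rather than the library's validation context API: the lookup happens first, and the JSON Pointer describing the location is only built when there is an error to report.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical set of component schema names registered in the document.
var registered = new HashSet<string> { "Pet", "Owner" };

// Segments of the location currently being visited, e.g. components/schemas/Pet.
var segments = new List<string>();
var errors = new List<string>();

// Validates one schema reference. Valid references return before any
// string is built; only an invalid reference pays for formatting the pointer.
void CheckReference(string schemaId)
{
    if (registered.Contains(schemaId))
    {
        return; // fast path: the referenced component is registered
    }

    // Slow path. A real JSON Pointer would also escape '~' and '/' in segments.
    string pointer = "#/" + string.Join("/", segments);
    errors.Add($"{pointer}: schema '{schemaId}' was not found");
}

segments.AddRange(new[] { "components", "schemas", "Pet" });
CheckReference("Owner");   // valid: no error message is formatted
CheckReference("Address"); // invalid: an error is recorded with its location

foreach (var error in errors)
{
    Console.WriteLine(error);
}
```

The design choice is simply to order the work so the common case (a valid reference) does the cheapest possible thing.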
@martincostello force-pushed the gh-2453-validate-schema-references branch from 39c040a to 216d62b August 18, 2025 14:40
@martincostello requested a review from baywet August 19, 2025 10:58

@baywet (Member) left a comment

Thank you for making the changes!

@baywet enabled auto-merge (squash) August 19, 2025 13:09
@baywet merged commit 90b3966 into microsoft:main Aug 19, 2025
8 of 9 checks passed
@martincostello deleted the gh-2453-validate-schema-references branch August 19, 2025 13:16

@baywet (Member) commented Aug 19, 2025

@martincostello this change resulted in a 15% memory allocation increase for smaller descriptions
https://github.com/microsoft/OpenAPI.NET/actions/runs/17043844187/job/48398121099
As far as I can tell, the only allocations being made are here.

Would you mind looking into it to see if you can reduce allocations, or else follow through with an update to the baseline?

@martincostello (Contributor Author) commented Aug 19, 2025

I'll have a quick look later, but I would imagine it'll be a baseline change.

My guess would be that it's paying for the enumerators used to walk the whole document tree, but that's the point of the validator (and it's leveraging the existing infrastructure to do it).

Did your baseline/performance test just never touch the walker before?
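The guess above is that the extra cost comes from enumerators used while walking the document tree. As a general .NET illustration, and not a claim about how OpenApiWalker is written, `foreach` over a concrete `List<T>` uses its struct enumerator directly, whereas `foreach` over the same non-empty list typed as `IEnumerable<T>` receives that enumerator as an `IEnumerator<T>`, which boxes it onto the heap unless the JIT can optimise the box away. `GC.GetAllocatedBytesForCurrentThread()` is used only to print a rough comparison; the exact numbers depend on the runtime version and JIT tier, so none are claimed here.

```csharp
using System;
using System.Collections.Generic;

var items = new List<int> { 1, 2, 3 };

// Binds to List<int>.GetEnumerator(), which returns the List<int>.Enumerator struct.
static int SumConcrete(List<int> list)
{
    int sum = 0;
    foreach (var item in list) sum += item;
    return sum;
}

// Binds to IEnumerable<int>.GetEnumerator(), so the struct enumerator is
// returned through the IEnumerator<int> interface, which requires boxing it.
static int SumViaInterface(IEnumerable<int> source)
{
    int sum = 0;
    foreach (var item in source) sum += item;
    return sum;
}

// The object behind the interface reference is a boxed value type.
IEnumerable<int> asInterface = items;
using IEnumerator<int> enumerator = asInterface.GetEnumerator();
Console.WriteLine($"{enumerator.GetType().Name} is a value type: {enumerator.GetType().IsValueType}");

// Rough allocation comparison for a single call of each method.
long start = GC.GetAllocatedBytesForCurrentThread();
int concreteSum = SumConcrete(items);
long middle = GC.GetAllocatedBytesForCurrentThread();
int interfaceSum = SumViaInterface(items);
long end = GC.GetAllocatedBytesForCurrentThread();

Console.WriteLine($"concrete: {middle - start} bytes, interface: {end - middle} bytes");
Console.WriteLine($"sums: {concreteSum}, {interfaceSum}");
```

In a tree walker that visits many small collections, a per-loop box like this is small individually but adds up across a large document.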

@baywet (Member) commented Aug 19, 2025

I haven't checked in detail, but I think it did, since the walker is used by other default validation rules.

@martincostello (Contributor Author)

Initial findings:

  1. The OpenApiWalker was already covered by the benchmarks because loading a document invokes this method, which uses the walker:
    document.SetReferenceHostDocument();
  2. The new code is covered by the benchmark because loading an OpenAPI document with the default settings implicitly validates it against all the built-in rules (something I didn't know it did), here:
    var openApiErrors = document.Validate(settings.RuleSet);
  3. The checked-in benchmarks use a different version of BenchmarkDotNet and are several months old, so it's possible that this PR made just enough changes to tip the results over the failure threshold. I'll verify whether that's the case by checking out 10b46b9 and running the benchmarks on my system to get a comparable baseline, then running them again with 90b3966 and comparing the results.

@martincostello (Contributor Author)

You might also find this interesting: Continuous Benchmarks on a Budget

This is something I set up for my own projects' benchmarks that lets me track trends in the benchmark results over time and visualise them.

martincostello added a commit to martincostello/OpenAPI.NET that referenced this pull request Aug 19, 2025
Update benchmarks for microsoft#2459 investigation.
martincostello added a commit to martincostello/OpenAPI.NET that referenced this pull request Aug 19, 2025
Update benchmarks for microsoft#2459 investigation after changes.

@martincostello (Contributor Author)

Having done item 3, there's definitely a difference before and after #2459.

The TL;DR for each benchmark is:

| Method | Mean Before | Mean After | Mean Ratio | Memory Before | Memory After | Memory Ratio |
|---|---|---|---|---|---|---|
| PetStoreYaml | 265.6 μs | 322.6 μs | 1.21 | 387.12 KB | 445.71 KB | 1.15 |
| PetStoreJson | 106.6 μs | 151.7 μs | 1.42 | 249.26 KB | 307.85 KB | 1.23 |
| GHESYaml | 774,833.9 μs | 829,977.0 μs | 1.07 | 400088.73 KB | 422174.19 KB | 1.05 |
| GHESJson | 364,114.2 μs | 397,127.5 μs | 1.09 | 261558.87 KB | 283644.3 KB | 1.08 |
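The ratio columns can be reproduced as After ÷ Before. The values appear to be truncated rather than rounded to two decimal places (for example, 307.85 ÷ 249.26 ≈ 1.235 is shown as 1.23); that is an inference from the numbers, not something stated in the thread. For the GHES rows the ratios only reproduce with the means from the raw "After" output (829,977.0 μs and 397,127.5 μs), which is what this sketch uses. `Ratio` is a hypothetical helper.

```csharp
using System;
using System.Globalization;

// Means (μs) and allocations (KB) copied from the raw "Before" and "After" tables.
var rows = new (string Name, double MeanBefore, double MeanAfter, double AllocBefore, double AllocAfter)[]
{
    ("PetStoreYaml", 265.6, 322.6, 387.12, 445.71),
    ("PetStoreJson", 106.6, 151.7, 249.26, 307.85),
    ("GHESYaml", 774_833.9, 829_977.0, 400_088.73, 422_174.19),
    ("GHESJson", 364_114.2, 397_127.5, 261_558.87, 283_644.3),
};

// After / Before, truncated (not rounded) to two decimal places.
static string Ratio(double before, double after) =>
    (Math.Truncate(after / before * 100) / 100).ToString("F2", CultureInfo.InvariantCulture);

foreach (var r in rows)
{
    Console.WriteLine($"{r.Name}: mean ratio {Ratio(r.MeanBefore, r.MeanAfter)}, memory ratio {Ratio(r.AllocBefore, r.AllocAfter)}");
}
```

With these inputs it prints the same eight ratios shown in the summary table.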

As you've noted, there's not much actively being allocated by the rule in and of itself (just OpenApiSchemaReferenceVisitor and OpenApiWalker).

My hunch is that it's either something to do with calls into OpenApiSchemaReference.RecursiveTarget, or that errors are being found while walking the documents, so there are just a tonne of allocations from warnings being created. I'll look at that next.

Before


BenchmarkDotNet v0.15.2, Windows 11 (10.0.26100.4946/24H2/2024Update/HudsonValley)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.413
  [Host]   : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| PetStoreYaml | 265.6 μs | 53.73 μs | 2.94 μs | 31.2500 | 7.8125 | - | 387.12 KB |
| PetStoreJson | 106.6 μs | 39.80 μs | 2.18 μs | 20.0195 | 5.3711 | - | 249.26 KB |
| GHESYaml | 774,833.9 μs | 155,894.04 μs | 8,545.08 μs | 36000.0000 | 19000.0000 | 4000.0000 | 400088.73 KB |
| GHESJson | 364,114.2 μs | 174,868.35 μs | 9,585.12 μs | 22000.0000 | 12000.0000 | 2000.0000 | 261558.87 KB |

After


BenchmarkDotNet v0.15.2, Windows 11 (10.0.26100.4946/24H2/2024Update/HudsonValley)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.413
  [Host]   : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| PetStoreYaml | 322.6 μs | 251.77 μs | 13.80 μs | 35.1563 | 7.8125 | - | 445.71 KB |
| PetStoreJson | 151.7 μs | 56.10 μs | 3.07 μs | 24.4141 | 5.8594 | - | 307.85 KB |
| GHESYaml | 829,977.0 μs | 127,873.81 μs | 7,009.19 μs | 38000.0000 | 19000.0000 | 4000.0000 | 422174.19 KB |
| GHESJson | 397,127.5 μs | 576,195.19 μs | 31,583.20 μs | 23000.0000 | 12000.0000 | 2000.0000 | 283644.3 KB |

@baywet (Member) commented Aug 19, 2025

Thank you for the additional information.

The article is great! I had not come across it before.

I think so far we've established that the memory increase is due to the code change. Would you agree with that statement?

@martincostello (Contributor Author) commented Aug 19, 2025

I haven't found any particular smoking gun introduced by the new code other than "the extra validation rule means more work gets done".

I have, however, found various things in OpenApiWalker that can be refactored to reduce allocations, but I won't get far enough with that local refactoring today to open a PR, so I'll continue with it tomorrow.

Here's an example of where I'm up to so far (re-run benchmarks from main diffed against my local changes):

[Image not preserved: benchmark diff of main against local changes]

@martincostello (Contributor Author)

With the changes in ce6497e (which need further cleanup before a PR), I get these numbers compared to the re-run baseline from before #2459:

| Method | Mean Before | Mean After | Mean Ratio | Memory Before | Memory After | Memory Ratio |
|---|---|---|---|---|---|---|
| PetStoreYaml | 265.6 μs | 311.7 μs | 1.17 | 387.12 KB | 434.34 KB | 1.12 |
| PetStoreJson | 106.6 μs | 142.7 μs | 1.33 | 249.26 KB | 296.48 KB | 1.18 |
| GHESYaml | 774,833.9 μs | 796,230.8 μs | 1.02 | 400088.73 KB | 404377.45 KB | 1.01 |
| GHESJson | 364,114.2 μs | 359,560.4 μs | 0.98 | 261558.87 KB | 265847.63 KB | 1.01 |

This reduces the ratios, and I think it reinforces my view that the "regression" is just "does more work" rather than being due to any deficiencies in the change made (plus GHESJson is now ~1% faster despite doing more work).


BenchmarkDotNet v0.15.2, Windows 11 (10.0.26100.4946/24H2/2024Update/HudsonValley)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.413
  [Host]   : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| PetStoreYaml | 311.7 μs | 90.14 μs | 4.94 μs | 35.1563 | 7.8125 | - | 434.34 KB |
| PetStoreJson | 142.7 μs | 28.80 μs | 1.58 μs | 23.4375 | 6.8359 | - | 296.48 KB |
| GHESYaml | 796,230.8 μs | 296,583.76 μs | 16,256.76 μs | 37000.0000 | 19000.0000 | 4000.0000 | 404377.45 KB |
| GHESJson | 359,560.4 μs | 122,198.31 μs | 6,698.10 μs | 22000.0000 | 12000.0000 | 2000.0000 | 265847.63 KB |

@baywet (Member) commented Aug 19, 2025

This is great! Yes, that's what I meant, as opposed to "a random change in how BenchmarkDotNet counts things up".

@martincostello (Contributor Author)

Opening a PR shortly, but with the latest round of tweaks I get these results:

| Method | Mean Before | Mean After | Mean Ratio | Memory Before | Memory After | Memory Ratio |
|---|---|---|---|---|---|---|
| PetStoreYaml | 265.6 μs | 292.4 μs | 1.10 | 387.12 KB | 421.22 KB | 1.08 |
| PetStoreJson | 106.6 μs | 142.3 μs | 1.33 | 249.26 KB | 283.36 KB | 1.13 |
| GHESYaml | 774,833.9 μs | 792,979.6 μs | 1.02 | 400088.73 KB | 390824.4 KB | 0.97 |
| GHESJson | 364,114.2 μs | 368,942.1 μs | 1.01 | 261558.87 KB | 252294.51 KB | 0.96 |

BenchmarkDotNet v0.15.2, Windows 11 (10.0.26100.4946/24H2/2024Update/HudsonValley)
13th Gen Intel Core i7-13700H 2.90GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 8.0.413
  [Host]   : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.19 (8.0.1925.36514), X64 RyuJIT AVX2

Job=ShortRun  IterationCount=3  LaunchCount=1  
WarmupCount=3  

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|
| PetStoreYaml | 292.4 μs | 70.22 μs | 3.85 μs | 33.2031 | 7.8125 | - | 421.22 KB |
| PetStoreJson | 142.3 μs | 23.19 μs | 1.27 μs | 22.4609 | 4.8828 | - | 283.36 KB |
| GHESYaml | 792,979.6 μs | 49,213.49 μs | 2,697.56 μs | 35000.0000 | 19000.0000 | 4000.0000 | 390824.4 KB |
| GHESJson | 368,942.1 μs | 134,561.89 μs | 7,375.79 μs | 21000.0000 | 12000.0000 | 2000.0000 | 252294.51 KB |

@martincostello (Contributor Author)

#2470

Development

Successfully merging this pull request may close these issues.

Add default Validation Rule(s) for schema reference validity
2 participants