
Improve registry-based archive generation reliability and performance#240

Merged
nutjob4life merged 6 commits into main from fix-fips-mode-md5-issue-226
Jan 8, 2026

Conversation

Member

@jordanpadams jordanpadams commented Jan 7, 2026

🗒️ Summary

This PR significantly improves the reliability and performance of registry-based deep archive generation through three key enhancements:

  1. More reliable collection discovery - Uses ref_lidvid_collection from bundle properties instead of the brittle /members endpoint
  2. Automatic retry with exponential backoff - Handles transient API/database failures gracefully
  3. Optimized API payload size - Requests only required fields, reducing payload from 11MB to 110KB (99% reduction!)

Changes Made

Fix #236 - Use ref_lidvid_collection for collection discovery:

  • Parse ref_lidvid_collection directly from bundle properties to discover collections
  • Avoids reliability issues with the /products/{bundle-lidvid}/members endpoint
  • More accurate since it uses exact collection LIDVIDs from bundle metadata
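As a rough illustration of this approach (the helper name and the bundle-properties shape are assumptions for illustration, not the actual implementation), collection discovery from bundle properties might look like:

```python
# Hypothetical sketch: discover collection LIDVIDs from the bundle's
# own properties instead of the /members endpoint. The property name
# ref_lidvid_collection comes from this PR; the dict shape is assumed.
def collections_from_bundle(bundle_props: dict) -> list:
    """Return the collection LIDVIDs referenced by a bundle's properties."""
    refs = bundle_props.get("ref_lidvid_collection", [])
    # The registry may return a single string or a list of LIDVIDs.
    if isinstance(refs, str):
        refs = [refs]
    return list(refs)
```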

Fix #237 - Add retry logic with exponential backoff:

  • Implemented automatic retry for HTTP 500, 502, 503, 504 errors
  • Uses exponential backoff: 2s, 4s, 8s, 16s, 32s over 5 attempts
  • Enhanced error handling for JSONDecodeError exceptions
  • Added detailed logging for debugging API failures

Fix #238 - Optimize API payload size:

  • Added fields parameter to API requests to fetch only required fields
  • Fields requested: file URLs, MD5 checksums, and pagination metadata
  • Eliminated unnecessary metadata from responses

⚙️ Test Data and/or Report

Automated Tests

All tests pass:

tox
...
  py312: OK (2.48=setup[0.89]+cmd[1.59] seconds)
  py313: OK (1.81=setup[0.72]+cmd[1.09] seconds)
  docs: OK (1.02=setup[0.45]+cmd[0.57] seconds)
  lint: OK (6.37=setup[0.01]+cmd[6.37] seconds)
  congratulations :) (11.90 seconds)

Pre-commit hooks: All passed during push

Manual Testing

Successfully tested with the New Horizons SWAP bundle referenced in #236:

pds-deep-registry-archive -s PDS_SBN urn:nasa:pds:nh_swap::2.1

Results:

  • ✅ Successfully generated AIP and SIP packages
  • ✅ All 8 collections discovered via ref_lidvid_collection
  • ✅ Retry logic handled transient 500 errors encountered during testing
  • ✅ API payload reduced from 11MB to 110KB per request (99% reduction!)

During testing, encountered the transient registry errors that led to the retry implementation (#237). The retry logic successfully handled these failures with exponential backoff.

Performance Metrics

| Metric | Before | After | Improvement |
|---|---|---|---|
| Payload Size | 11 MB | 110 KB | 99% reduction |
| Collection Discovery | /members endpoint | ref_lidvid_collection | More reliable |
| Error Handling | Immediate failure | Auto-retry (5 attempts) | Better resilience |

♻️ Related Issues

Fixes #236, #237, #238

🤓 Reviewer Checklist

Reviewers: Please verify the following before approving this pull request.

Security & Quality

  • SonarCloud: Confirmed no new High or Critical security findings.
  • Secrets Detection: Verified that the Secrets Detection scan passed and no sensitive information (keys, tokens, PII) is exposed.
  • Code Quality: Code follows organization style guidelines and best practices for the specific language (e.g., PEP 8, Google Java Style).

Testing & Validation

  • Test Accuracy: Verified that test data is accurate, representative of real-world PDS4 scenarios, and sufficient for the logic being tested.
  • Coverage: Automated tests cover new logic and edge cases.
  • Local Verification: (If applicable) Successfully built and ran the changes in a local or staging environment.

Documentation

  • Documentation: README, Wiki, or inline documentation (Sphinx, Javadoc, docstrings) have been updated to reflect these changes.

Maintenance

  • Issue Traceability: The PR is linked to a valid GitHub Issue or Jira Ticket.
  • Backward Compatibility: Confirmed that these changes do not break existing downstream dependencies or API contracts (or that breaking changes are clearly documented).

@jordanpadams jordanpadams requested a review from a team as a code owner January 7, 2026 22:19
Copilot AI review requested due to automatic review settings January 7, 2026 22:19
Contributor

Copilot AI left a comment


Pull request overview

This PR enhances the reliability and performance of registry-based deep archive generation by addressing three critical issues: unreliable collection discovery, transient API failures, and excessive API payload sizes. The changes implement a more robust collection discovery mechanism using bundle properties, add automatic retry logic with exponential backoff for transient errors, and optimize API requests to fetch only required fields.

Key changes:

  • Replace /members endpoint with ref_lidvid_collection property parsing for collection discovery
  • Add retry mechanism with exponential backoff for HTTP 500/502/503/504 errors
  • Optimize API payload size by requesting only necessary fields (99% reduction)


 url = f"{server_url}/products/{lidvid}/members"
-params = {"sort": _searchkey, "limit": _apiquerylimit}
+# Request only the fields we need to minimize payload size
+params = {"sort": _searchkey, "limit": _apiquerylimit, "fields": ",".join(_fields)}

Copilot AI Jan 7, 2026


The fields parameter is added to limit the response size, but there's no validation that the API actually supports this parameter or handles it correctly. If the API ignores unrecognized parameters, the optimization would silently fail. Consider adding validation or documentation noting the API version requirement for the fields parameter.

Member Author


@copilot I think hardcoding API parameters is fine; we own that API.

Workaround for #236 where the /products/{bundle-lidvid}/members endpoint
has proven brittle. Instead, parse ref_lidvid_collection directly from
the bundle properties to discover collections.

This approach is equally accurate since it uses the exact collection
LIDVIDs specified in the bundle metadata, but avoids the reliability
issues with the /members endpoint at the bundle level.

The /members endpoint is still used for discovering products within
each collection.
Fixes #237 by implementing automatic retry with exponential backoff
for all PDS API requests. This addresses transient API/database
performance issues that result in 500 errors and JSON decode failures.

Changes:
- Added _get_session_with_retry() to create sessions with retry logic
- Configured to retry on HTTP 500, 502, 503, 504 status codes
- Uses exponential backoff (2s, 4s, 8s, 16s, 32s) over 5 attempts
- Enhanced error handling to catch JSONDecodeError exceptions
- Added detailed logging for debugging API failures

The retry mechanism uses urllib3.Retry with requests.HTTPAdapter to
automatically handle transient failures without code changes in
calling functions.
Fixes #238 by adding fields parameter to /members endpoint requests
to minimize payload size and improve performance.

Deep archive only needs 5 specific fields from product responses:
- ops:Data_File_Info.ops:file_ref (data file URLs)
- ops:Data_File_Info.ops:md5_checksum (data file checksums)
- ops:Label_File_Info.ops:file_ref (label file URL)
- ops:Label_File_Info.ops:md5_checksum (label file checksum)
- ops:Harvest_Info.ops:harvest_date_time (for pagination)

Changes:
- Updated _fields constant to include harvest_date_time for pagination
- Modified _getproducts() to pass fields parameter to API requests
- Reduced API response payload by filtering unnecessary metadata

Benefits:
- Smaller JSON payloads reduce network bandwidth
- Faster API responses with less data serialization
- Better performance for collections with many products
- Reduced load on API server and database
@jordanpadams jordanpadams force-pushed the fix-fips-mode-md5-issue-226 branch from c52b51d to 0b025ef on January 7, 2026 23:28
Copilot AI review requested due to automatic review settings January 7, 2026 23:31
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



Updated retry status codes to use HTTPStatus enum constants for
better code readability and maintainability:
- 500 → HTTPStatus.INTERNAL_SERVER_ERROR
- 502 → HTTPStatus.BAD_GATEWAY
- 503 → HTTPStatus.SERVICE_UNAVAILABLE
- 504 → HTTPStatus.GATEWAY_TIMEOUT

Addresses code review feedback.
Addresses PR feedback with two improvements:

1. Clarified _searchkey field requirement in _fields constant
   - Added comment explaining it must be included for pagination
   - Required by line 244 for search-after parameter access

2. Implemented batch collection fetching to reduce API calls
   - New _getcollections_batch() function fetches collections in batches
   - Uses query syntax: (lidvid eq "..." or lidvid eq "..." or ...)
   - Batches collections into groups of 50 (API limit)
   - Reduces API calls dramatically (e.g., 8 collections: 8→1 requests)
   - Updated _comprehendregistry() to use batch fetching

Benefits:
- Fewer API requests reduces server load
- Better performance for bundles with many collections
- Stays within API query limits
- Maintains same functionality with better efficiency

Copilot AI commented Jan 7, 2026

@jordanpadams I've opened a new pull request, #242, to work on those changes. Once the pull request is ready, I'll request review from you.


Copilot AI commented Jan 7, 2026

@jordanpadams I've opened a new pull request, #243, to work on those changes. Once the pull request is ready, I'll request review from you.

Member

@nutjob4life nutjob4life left a comment


Thanks @jordanpadams!

@jordanpadams
Member Author

Updated with Code Review Feedback

Additional Improvements

Commit fce720b - Use HTTPStatus constants:

  • Replaced hardcoded status codes (500, 502, 503, 504) with HTTPStatus enum constants
  • Improves code readability and maintainability
  • Uses: HTTPStatus.INTERNAL_SERVER_ERROR, HTTPStatus.BAD_GATEWAY, HTTPStatus.SERVICE_UNAVAILABLE, HTTPStatus.GATEWAY_TIMEOUT

Commit 4b55407 - Batch collection fetching:

  • Added _getcollections_batch() function to fetch collections efficiently
  • Uses API query syntax: (lidvid eq "..." or lidvid eq "..." or ...)
  • Batches collections into groups of 50 (API limit) to minimize requests
  • Dramatically reduces API calls: 8 collections → 1 request (8x improvement!)
  • For bundles with 150 collections: 150 requests → 3 requests (50x improvement!)
  • Clarified that _searchkey field is required in _fields for pagination at line 244

Performance Impact

Before batch fetching:

  • 1 API call per collection to fetch metadata
  • NH SWAP bundle (8 collections): 8 separate API requests

After batch fetching:

  • 1 API call per 50 collections
  • NH SWAP bundle (8 collections): 1 API request (8x reduction)
  • Large bundle (150 collections): 3 API requests (50x reduction)

Combined with the field optimization (11MB → 110KB payload), these changes significantly improve both performance and reliability of registry-based archive generation.

@nutjob4life nutjob4life merged commit 8f156f2 into main Jan 8, 2026
4 checks passed
@nutjob4life nutjob4life deleted the fix-fips-mode-md5-issue-226 branch January 8, 2026 20:37