Implement file compression #1429

Open
wants to merge 26 commits into
base: main

bennavapbc
Collaborator

@bennavapbc bennavapbc commented Jan 13, 2025

🎫 Ticket

https://jira.cms.gov/browse/AB2D-6462

🛠 Changes

  • Add utilities for compressing job output files to gzip format
  • Allow bulk download API to return compressed or uncompressed files (per request header)
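As an illustration of the first change, a gzip utility for job output files might look roughly like the sketch below. It is written in Python purely for illustration and is not the code in this PR; the function name `compress_file` and the `remove_original` flag are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_file(src: Path, remove_original: bool = False) -> Path:
    """Write a gzip copy of `src` next to it and return the new path.

    Hypothetical helper: the name and the `remove_original` flag are
    assumptions for this sketch, not taken from the PR.
    """
    dest = src.with_name(src.name + ".gz")
    # Stream in chunks so a large output file is never fully loaded into memory.
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if remove_original:
        src.unlink()
    return dest
```

Calling `compress_file(Path("Z0000_0001.ndjson"))` would produce `Z0000_0001.ndjson.gz` in the same directory, which matches the file name seen in the download response later in this description.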

ℹ️ Context

  • Compress job output files to reduce file size by ~90% and (slightly) improve transfer times for clients requesting compressed files
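The size reduction depends heavily on how repetitive the data is. A rough way to measure it on a sample is sketched below; the records are synthetic and made up for this example, so the ratio it prints says nothing about real job output.

```python
import gzip
import json


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(payload)) / len(payload)


# Synthetic, made-up records: real claims data will compress differently.
records = [
    {"resourceType": "ExplanationOfBenefit", "id": f"example-{i}", "status": "active"}
    for i in range(5000)
]
ndjson = "".join(json.dumps(r) + "\n" for r in records).encode("utf-8")
ratio = compression_ratio(ndjson)
print(f"original={len(ndjson)} bytes, compressed/original={ratio:.3f}")
```

NDJSON repeats the same keys on every line, which is why gzip tends to do well on it.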

SonarQube

image

🧪 Validation

  • Added unit tests and integration tests in TestRunner
  • Verified that output files generated and stored in uncompressed NDJSON format can still be downloaded
  • Deployed to IMPL and verified that the (a) status and (b) file download APIs work as expected for both /v1 and /v2
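The backward-compatibility check above (older files stored as plain NDJSON remain downloadable) implies the reader has to handle both formats. One way to do that, sketched here as an assumption rather than a description of this PR, is to sniff the gzip magic bytes instead of trusting the file extension. The names `is_gzip` and `open_output` are hypothetical.

```python
import gzip
from pathlib import Path
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def is_gzip(path: Path) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with path.open("rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_output(path: Path) -> BinaryIO:
    """Open a stored output file so that reads always yield uncompressed bytes."""
    return gzip.open(path, "rb") if is_gzip(path) else path.open("rb")
```

With this approach a caller reads the same NDJSON bytes whether the file on disk was written before or after compression was introduced.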

Verified that file length and checksum match the downloaded files after decompression
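A check like this can be scripted. The sketch below compares length and digest of a stored uncompressed file against a downloaded `.gz` file after decompression. SHA-256 is an assumption here, since the description does not say which checksum was used, and `digest_and_length` is a hypothetical helper name.

```python
import gzip
import hashlib
from pathlib import Path
from typing import Tuple


def digest_and_length(path: Path, compressed: bool) -> Tuple[str, int]:
    """Return (sha256 hex digest, byte length) of the uncompressed content."""
    opener = gzip.open if compressed else open
    sha = hashlib.sha256()  # assumption: the PR does not name the checksum algorithm
    total = 0
    with opener(path, "rb") as f:
        # Read in 64 KiB chunks to keep memory use flat for large files.
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            sha.update(chunk)
            total += len(chunk)
    return sha.hexdigest(), total
```

The two files match when `digest_and_length(original, compressed=False)` equals `digest_and_length(download, compressed=True)`.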


Verified that a curl request with Accept-Encoding: gzip returns compressed data along with a content-encoding: gzip response header

curl -v -w "@curl-format.txt" -X 'GET' \
  'https://impl.ab2d.cms.gov/api/v1/fhir/Job/f1354d5c-1111-bbbb-cccc-dddddddddddd/file/Z0000_0001.ndjson' \
  -H 'Accept-Encoding: gzip' \
  -H 'accept: application/fhir+ndjson' \
  -H "Authorization: Bearer $JWT" -o Z0000_0001-speed-test.ndjson.gz
Note: Unnecessary use of -X or --request, GET is already inferred.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 100.64.1.18:443...
* Connected to impl.ab2d.cms.gov (100.64.1.18) port 443 (#0)
* ALPN: offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [322 bytes data]
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [100 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [4841 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN: server accepted h2
* Server certificate:
*  subject: C=US; ST=District of Columbia; L=Washington; O=US Department of Health and Human Services; CN=impl.ab2d.cms.gov
*  start date: Sep 11 15:43:30 2024 GMT
*  expire date: Oct 11 15:42:30 2025 GMT
*  subjectAltName: host "impl.ab2d.cms.gov" matched cert's "impl.ab2d.cms.gov"
*  issuer: C=US; O=IdenTrust; OU=HydrantID Trusted Certificate Service; CN=HydrantID Server CA O1
*  SSL certificate verify ok.
* using HTTP/2
* h2 [:method: GET]
* h2 [:scheme: https]
* h2 [:authority: impl.ab2d.cms.gov]
* h2 [:path: /api/v1/fhir/Job/f1354d5c-1111-bbbb-cccc-dddddddddddd/file/Z0000_0001.ndjson]
* h2 [user-agent: curl/8.1.2]
* h2 [accept-encoding: gzip]
* h2 [accept: application/fhir+ndjson]
* h2 [authorization: Bearer ???]
* Using Stream ID: 1 (easy handle 0x13700a800)
> GET /api/v1/fhir/Job/f1354d5c-1111-bbbb-cccc-dddddddddddd/file/Z0000_0001.ndjson HTTP/2
> Host: impl.ab2d.cms.gov
> User-Agent: curl/8.1.2
> Accept-Encoding: gzip
> accept: application/fhir+ndjson
> Authorization: Bearer ???
>
< HTTP/2 200
< date: Mon, 27 Jan 2025 21:40:22 GMT
< content-type: application/fhir+ndjson
< content-encoding: gzip
< content-disposition: inline; swaggerDownload="attachment"; filename="Z0000_0001.ndjson.gz"
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< cache-control: no-cache, no-store, max-age=0, must-revalidate
< pragma: no-cache
< expires: 0
< strict-transport-security: max-age=31536000 ; includeSubDomains
< x-frame-options: DENY
<
{ [16047 bytes data]
100  578k    0  578k    0     0   510k      0 --:--:--  0:00:01 --:--:--  512k
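The exchange above shows the negotiation from the client side. A simplified sketch of the server-side decision follows: parse the Accept-Encoding request header and choose response headers accordingly. This is an illustration under assumptions, not the PR's implementation; `accepts_gzip` and `download_headers` are hypothetical names, and the Content-Disposition value is reduced compared with the real response shown above.

```python
from typing import Dict, Optional


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding header value permits gzip.

    An explicit `gzip` entry wins over the `*` wildcard, and a quality
    value of 0 means "not acceptable".
    """
    gzip_q: Optional[float] = None
    wildcard_q: Optional[float] = None
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as not acceptable
        if coding == "gzip":
            gzip_q = q
        elif coding == "*":
            wildcard_q = q
    if gzip_q is not None:
        return gzip_q > 0
    return wildcard_q is not None and wildcard_q > 0


def download_headers(file_name: str, accept_encoding: str) -> Dict[str, str]:
    """Build response headers for a file download (simplified sketch)."""
    headers = {"Content-Type": "application/fhir+ndjson"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        headers["Content-Disposition"] = f'inline; filename="{file_name}.gz"'
    else:
        headers["Content-Disposition"] = f'inline; filename="{file_name}"'
    return headers
```

A request without an Accept-Encoding header (an empty string here) falls through to the uncompressed branch, which mirrors the "compressed or uncompressed per request header" behavior described under Changes.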

@bennavapbc bennavapbc marked this pull request as ready for review January 13, 2025 20:40
@bennavapbc bennavapbc requested a review from a team as a code owner January 13, 2025 20:40
@bennavapbc bennavapbc changed the title Implement file compression (Draft) Implement file compression Jan 13, 2025
@bennavapbc bennavapbc removed the request for review from a team January 27, 2025 23:59
@bennavapbc bennavapbc requested a review from a team January 27, 2025 23:59