[SFTP destination] Adds support for fast upload from buffer #3387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Overview
This PR introduces a custom SFTP client wrapper (SFTClientCustom) to replace the standard ssh2-sftp-client for file uploads. The new implementation provides a custom fastPutFromBuffer method with lower-level control over the upload process using the ssh2 library's SFTPWrapper directly.
Key changes:
- New `SFTClientCustom` class that wraps `ssh2-sftp-client` with custom upload logic (see the sketch after this list)
- Updated `uploadSFTP` function to use the new custom client instead of `executeSFTPOperation`
- Custom abort signal handling implementation for upload operations
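To make the chunked-upload idea concrete, here is a minimal sketch of what a buffer upload built directly on ssh2's `SFTPWrapper` could look like. The `open`/`write`/`close` calls are ssh2's SFTP API; the function name, chunk size, and concurrency value are illustrative assumptions, not the PR's actual implementation.

```ts
import type { SFTPWrapper } from 'ssh2'

const CHUNK_SIZE = 32 * 1024 // assumed size of each SFTP write
const CONCURRENCY = 64 // assumed number of writes awaited per batch

// Sketch: write a buffer to a remote path in fixed-size chunks, CONCURRENCY at a time.
export async function fastPutFromBuffer(sftp: SFTPWrapper, buffer: Buffer, remotePath: string): Promise<void> {
  // Open (or create/truncate) the remote file and get a handle.
  const handle: Buffer = await new Promise((resolve, reject) =>
    sftp.open(remotePath, 'w', (err, h) => (err ? reject(err) : resolve(h)))
  )

  try {
    // Each chunk is written at its own absolute offset, so writes can run in parallel.
    const offsets: number[] = []
    for (let pos = 0; pos < buffer.length; pos += CHUNK_SIZE) offsets.push(pos)

    // Issue writes in batches; each batch is fully awaited before the next one starts.
    for (let i = 0; i < offsets.length; i += CONCURRENCY) {
      await Promise.all(
        offsets.slice(i, i + CONCURRENCY).map(
          (pos) =>
            new Promise<void>((resolve, reject) => {
              const length = Math.min(CHUNK_SIZE, buffer.length - pos)
              sftp.write(handle, buffer, pos, length, pos, (err) => (err ? reject(err) : resolve()))
            })
        )
      )
    }
  } finally {
    // Always release the remote file handle, even if a write failed.
    await new Promise<void>((resolve, reject) => sftp.close(handle, (err) => (err ? reject(err) : resolve())))
  }
}
```

The batch-of-`CONCURRENCY` structure in this sketch is also what the review thread below discusses: one slow write gates its whole batch.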
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/destination-actions/src/destinations/sftp/sftp-wrapper.ts | New custom SFTP client class with manual chunked upload implementation |
| packages/destination-actions/src/destinations/sftp/client.ts | Modified uploadSFTP to use custom client and implement new abort handling logic |
Codecov Report
❌ Patch coverage is … — additional details and impacted files below.

@@            Coverage Diff             @@
##              main    #3387      +/-   ##
==========================================
+ Coverage    79.98%   80.38%   +0.40%
==========================================
  Files         1211     1268      +57
  Lines        22356    25306    +2950
  Branches      4407     5229     +822
==========================================
+ Hits         17881    20343    +2462
- Misses        3695     4161     +466
- Partials       780      802      +22

☔ View full report in Codecov by Sentry.
Pull Request Overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.
Inline review comment on `const processWrites = async () => {`:
Alternatively, could we have used the "async" library, which would have further simplified the implementation?
Could you help me understand how the async library would help? It's only 5-6 lines of code anyway.
My main concern is that we execute the first 64 chunks and only then move on to the next 64. If one chunk out of those 64 is stuck, we don't move ahead.
The async library will handle that out of the box, and our code will be much simpler.
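For reference, a sketch of what this suggestion could look like with the `async` library's `eachLimit`, which keeps a rolling window of in-flight writes instead of fixed batches. Only `eachLimit` is the library's real API; the other names here are illustrative.

```ts
import { eachLimit } from 'async'

// Sketch: upload chunks with a sliding concurrency window. As soon as one of the
// `concurrency` in-flight writes finishes, the next offset starts, so a single slow
// chunk does not stall an entire batch the way Promise.all over fixed batches does.
async function uploadWithSlidingWindow(
  chunkOffsets: number[],                            // illustrative: byte offsets of each chunk
  writeChunk: (offset: number) => Promise<void>,     // illustrative: writes one chunk at that offset
  concurrency = 64
): Promise<void> {
  await eachLimit(chunkOffsets, concurrency, async (offset) => {
    await writeChunk(offset)
  })
}
```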
- Can we add retries for common errors like "Connection reset by peer"?
That's a good point. However, I do want to point out that Centrifuge should be the source of truth for any sort of retries in our platform. So we should leave that to Centrifuge by indicating the delivery failed with a retryable status code. Retries within destination code could increase processing time and lead to timeout issues.
- There is no library that could handle this low-level chunking detail. Even the sftp/ssh2 library that we use has a [fastput](https://github.com/mscdex/ssh2/blob/844f1edfc41589737671f96a4f4e76afdf46abd4/lib/protocol/SFTP.js#L508) method that does the same concurrent upload. I don't think the SFTP protocol or any library has this feature OOTB.
- This is consistent with how our platform works today: all deliveries are retried as a whole, even for HTTP or S3 uploads. The current approach (even with the old synchronous put method) can result in incomplete uploads to SFTP. The technique generally followed in this scenario is to first upload the file under a different extension/name and then rename it once all chunks are uploaded. This is something we plan to do in subsequent PRs.
- "Connection reset by peer" will be retried by Centrifuge because we currently throw an error with a 500 (retryable) status for any error other than a file-not-found error. A rough sketch of that mapping follows below.
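Sketched here under the assumption of `@segment/actions-core`'s `IntegrationError` and `RetryableError` error classes; the file-not-found detection and error-code string are illustrative, not the PR's actual code.

```ts
import { IntegrationError, RetryableError } from '@segment/actions-core'

// Sketch: translate an SFTP failure into a retryable vs. non-retryable delivery status.
function mapSFTPError(err: Error & { code?: number | string }): never {
  // Assumption for illustration: missing remote paths surface as "no such file" errors.
  if (/no such file/i.test(err.message)) {
    // Non-retryable: the configured folder path is wrong, so retrying won't help.
    throw new IntegrationError(`Could not find path: ${err.message}`, 'SFTP_PATH_NOT_FOUND', 400)
  }
  // Everything else (e.g. "Connection reset by peer") is surfaced as retryable,
  // so Centrifuge re-delivers the whole batch rather than the destination retrying inline.
  throw new RetryableError(`SFTP upload failed: ${err.message}`)
}
```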
PR deployed
This PR adds a new setting, `uploadStrategy`, which is set to `standard` by default. If a customer's SFTP server doesn't support `concurrent`, they can use `standard`. [PS: I might have to do some more research. On digging in further, it seems like fastput/fastget is not really a server support capability.]
From tests, we are seeing a 3x improvement in upload speeds with the fastPutFromBuffer method. A rough sketch of how the setting could route uploads follows below.
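Purely as an illustration of the setting described above: only the `uploadStrategy` values and method names come from this PR description; the client interface and exact signature are assumptions.

```ts
// Assumed shape of the SFTP client for this sketch.
interface UploadClient {
  put(content: Buffer, remotePath: string): Promise<unknown>             // standard ssh2-sftp-client upload
  fastPutFromBuffer(content: Buffer, remotePath: string): Promise<void>  // chunked, concurrent upload from this PR
}

// Sketch: route the upload based on the uploadStrategy setting.
async function uploadSFTP(
  uploadStrategy: 'standard' | 'concurrent',
  client: UploadClient,
  content: Buffer,
  remotePath: string
): Promise<void> {
  if (uploadStrategy === 'concurrent') {
    // ~3x faster in the tests described in this PR, provided the server/config tolerates concurrent writes.
    await client.fastPutFromBuffer(content, remotePath)
    return
  }
  // Default path: single sequential put.
  await client.put(content, remotePath)
}
```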
Video Walkthrough of this change
Testing
New Setting to select upload strategy
Test Authentication continues to work
A 10k-record upload completed successfully in ~6 seconds with fastput enabled. Previously, the same upload took 25-30 seconds.
Regression tested with concurrent writes set to false and a batch size of 10k; uploads were successful.
