@tus/s3-store: failed part upload crashes entire server #722

Open
mitjap opened this issue Feb 10, 2025 · 3 comments
mitjap commented Feb 10, 2025

When uploadPart or uploadIncompletePart throws (is rejected), the entire server crashes. This happens often with Scaleway Object Storage, as their service fails with an error 500 and the AWS library logs only the non-informative message "An error was encountered in a non-retryable streaming request." After listening for the unhandledRejection event I got some more information.

Unhandled Rejection at: Promise {
  <rejected> S3ServiceException [InternalError]: We encountered an internal error. Please try again.
      at throwDefaultError (/app/node_modules/@smithy/smithy-client/dist-cjs/index.js:867:20)
      at /app/node_modules/@smithy/smithy-client/dist-cjs/index.js:876:5
      at de_CommandError (/app/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4965:14)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async /app/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
      at async /app/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:483:18
      at async /app/node_modules/@smithy/middleware-retry/dist-cjs/index.js:321:38
      at async /app/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/index.js:315:18
      at async /app/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:109:22
      at async /app/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:136:14 {
    '$fault': 'client',
    '$metadata': {
      httpStatusCode: 500,
      requestId: 'txg5c09a7e2fbe14ba89375-0067aa0421',
      extendedRequestId: 'txg5c09a7e2fbe14ba89375-0067aa0421',
      cfId: undefined
    },
    Code: 'InternalError',
    RequestId: 'txg5c09a7e2fbe14ba89375-0067aa0421',
    HostId: 'txg5c09a7e2fbe14ba89375-0067aa0421',
    Resource: '<s3-object-key>.bin'
  },

I believe the problem is that when a deferred promise is created, no rejection handler is attached to it. It is only inserted into a list.

promises.push(deferred)

At the end, the promises are aggregated with Promise.all and returned to the caller, where a rejection handler is eventually added.
await Promise.all(promises)

return handler.send(req, res).catch(onError)

During this period, if any of the promises in the list is rejected, the rejection goes unhandled.
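
For illustration, here is a minimal standalone sketch (not the store's actual code) of how such a crash happens: a promise that rejects while it sits in the array, before Promise.all attaches a handler, triggers Node's unhandledRejection and, with the default settings, terminates the process.

// Standalone sketch: the rejection happens before any handler is attached,
// so Node reports it as unhandled even though Promise.all would handle it later.
const promises = []

const deferred = new Promise((resolve, reject) => {
  // simulate a part upload failing shortly after being started
  setTimeout(() => reject(new Error('part upload failed')), 10)
})
promises.push(deferred) // no .catch() attached yet

setTimeout(async () => {
  // with Node's default unhandled-rejections behavior, the process has
  // already crashed before this handler is ever attached
  await Promise.all(promises).catch((err) => console.error('handled too late:', err))
}, 1000)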

Steps to reproduce

Use an S3 server that sometimes throws errors (e.g. Scaleway Object Storage) or throw an error manually. I modified uploadPart(metadata, readStream, partNumber) to randomly reject by adding the snippet below at the top of the function. This results in a server crash.

// reject roughly 20% of calls after a 1 second delay
const errorProbability = Math.random()
if (errorProbability < 0.2) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      reject(new Error('test'));
    }, 1000);
  })
}

Expected behavior

The server should not crash. A failed part upload should end the current upload with an HTTP 500 Internal Server Error status code.

Observation

Promises returned by acquiredPermit?.release() and permit?.release() are also not handled; a rejection there would also crash the server.
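
As a sketch of how those could be guarded (names are illustrative, not necessarily the store's exact API), awaiting the release with a rejection handler keeps a failed release from becoming an unhandled rejection:

// Illustrative helper, assuming release() returns a promise as in the semaphore
// the store uses; a failed release is logged instead of crashing the process.
async function safeRelease(permit) {
  try {
    await permit?.release()
  } catch (err) {
    console.error('failed to release permit', err)
  }
}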

mitjap added the bug label Feb 10, 2025
mitjap commented Feb 10, 2025

I think this issue could be "managed" with a simple modification: when a deferred promise is rejected, destroy the streamSplitter so it stops accepting more data.

promises.push(deferred.catch((error) => {
  splitterStream.destroy(error);
}))

The error is then properly propagated as part of the pipeline, a 500 error is sent to the client, and the server does not crash.

try {
  await streamProm.pipeline(readStream, splitterStream)
} catch {
  // ...
}

The next issue is that await Promise.all(promises) settles too soon: it rejects as soon as the first promise is rejected, while other parts are still being uploaded, leaving them dangling. We need to wait for all promises to settle before proceeding.

This could be solved with the following change: first wait for all promises to settle (successfully or not), and only then reject with an error if necessary.

await Promise.allSettled(promises); // wait for all to settle
await Promise.all(promises); // reject if any promise rejected
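
Equivalently, this could be wrapped in a small helper (a sketch, not existing code in the store) that waits for every part upload to settle and then rethrows the first rejection:

// Sketch: settle everything first, then surface the first failure to the caller.
async function allSettledOrThrow(promises) {
  const results = await Promise.allSettled(promises)
  const firstRejected = results.find((result) => result.status === 'rejected')
  if (firstRejected) {
    throw firstRejected.reason
  }
  return results.map((result) => result.value)
}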

The next problem is with CancellationContext, which is aborted as soon as streamSplitter is destroyed (this ends the incoming request) and releases the lock, allowing the user/client to make a HEAD request and then a PATCH request to a resource that is still being processed. Any lock should be kept locked until all processing is done (until all promises settle). An AbortController can be used to cancel pending promises faster, but not all of them can be aborted (e.g. uploadPart, uploadIncompletePart); those need to be waited for. A rough sketch of that ordering is shown below.
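
The sketch below only illustrates the proposed ordering; the lock and context names are assumptions, not the actual handler code: abort what can be aborted, wait for every pending part upload to settle, and only then release the lock.

// Illustrative only: names and shapes are assumptions, not the real tus-node-server API.
async function finishUpload(lock, abortController, pendingParts) {
  abortController.abort() // cancels abortable work; uploadPart/uploadIncompletePart cannot be aborted
  await Promise.allSettled(pendingParts) // wait for in-flight parts to finish or fail
  await lock.unlock() // release the lock only after all processing is done
}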

mitjap changed the title from "@tus/s3-server: failed part upload crashes entire server" to "@tus/s3-store: failed part upload crashes entire server" Feb 11, 2025
Murderlon (Member) commented

I think it would be nice to fix and release this on 1.x before doing the major release. I don't like bug fixes on major releases.

Since you wrote the plan down already, do you also want to work on it?

mitjap commented Feb 12, 2025

I can solve the issues with the S3 store regarding how it handles errors from the S3 provider.

Issues with the locking mechanism and CancellationContext are much larger and would need a separate task and discussion. Fixing them would most probably require a major version bump in the datastores, as all data stores would need to be aware of CancellationContext to abort whatever they are doing.
