Support splice in http blind tunnel #11890

YIHONG-JIN · 2024-12-04T08:43:18Z

Add TS_USE_LINUX_SPLICE as a compilation option.
Make MIOBuffer and MIOBufferReader polymorphic classes by declaring their member functions virtual.
Create PipeIOBuffer and PipeIOBufferReader as derived classes that encapsulate a Linux pipe.
Use dynamic_cast to switch logic in state machines and continuations.

Documentation:
ATS_splice_runbook.md
ATS Performance Benchmark.pdf
ATS_splice_design_doc.pdf
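
To make the shape of the change easier to picture, here is a minimal sketch of the approach described above: a polymorphic base buffer, a derived class that owns a Linux pipe, and a dynamic_cast used to pick the read path. The names BufferBase, MemoryBuffer, PipeBuffer, ReadPath and choose_read_path are hypothetical stand-ins for illustration, not the classes in this PR, and the sketch assumes Linux with glibc.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for MIOBuffer: polymorphic because it has virtual members.
class BufferBase
{
public:
  virtual ~BufferBase() = default;
  virtual int64_t capacity() const = 0;
};

// Ordinary in-memory buffer (the existing copy path).
class MemoryBuffer : public BufferBase
{
public:
  explicit MemoryBuffer(int64_t bytes) : bytes_(bytes) {}
  int64_t capacity() const override { return bytes_; }

private:
  int64_t bytes_;
};

// Hypothetical stand-in for PipeIOBuffer: owns both ends of a Linux pipe.
class PipeBuffer : public BufferBase
{
public:
  PipeBuffer()
  {
    if (pipe2(fds_, O_NONBLOCK | O_CLOEXEC) != 0) {
      fds_[0] = fds_[1] = -1; // creation failed; ok() reports false
    }
  }
  ~PipeBuffer() override
  {
    if (fds_[0] >= 0) close(fds_[0]);
    if (fds_[1] >= 0) close(fds_[1]);
  }
  PipeBuffer(const PipeBuffer &)            = delete;
  PipeBuffer &operator=(const PipeBuffer &) = delete;

  bool ok() const { return fds_[0] >= 0; }
  int read_fd() const { return fds_[0]; }
  int write_fd() const { return fds_[1]; }
  // Capacity currently granted by the kernel, in bytes.
  int64_t capacity() const override { return ok() ? fcntl(fds_[1], F_GETPIPE_SZ) : 0; }

private:
  int fds_[2] = {-1, -1};
};

enum class ReadPath { Copy, Splice };

// Runtime type check: only a usable pipe-backed buffer takes the splice path.
ReadPath choose_read_path(BufferBase *buf)
{
  auto *pipe_buf = dynamic_cast<PipeBuffer *>(buf);
  return (pipe_buf != nullptr && pipe_buf->ok()) ? ReadPath::Splice : ReadPath::Copy;
}
```

A caller would pass whatever buffer the VIO holds to choose_read_path and branch on the result; buffers that are not pipe-backed fall through to the existing copy path.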

YIHONG-JIN · 2024-12-04T22:20:17Z

Passes unit tests and manual integration tests on Debian 12. Performance benchmarks show that splice can improve maximum blind tunnel throughput from 300 MB/s to 575 MB/s and reduce latency by 40% for MB-level payloads on a C6in.large EC2 instance. Feel free to benchmark this CR.

moonchen

Thank you for the PR. Overall I think this is a clever idea to introduce zero-copy in a way that fits the existing architecture. I found a few small issues that I hope you can address before merging.

moonchen · 2025-01-08T00:33:04Z

src/proxy/http/HttpSM.cc

@@ -7274,6 +7274,32 @@ HttpSM::setup_blind_tunnel(bool send_response_hdr, IOBufferReader *initial)
  //  header buffer into new buffer
  client_request_body_bytes += from_ua_buf->write(_ua.get_txn()->get_remote_reader());

+#if TS_USE_LINUX_SPLICE
+  MIOBuffer *from_ua_pipe_buf = new_PipeIOBuffer(BUFFER_SIZE_INDEX_32K);


BUFFER_SIZE_INDEX_32K is equal to 8. Is this the capacity of the pipe that we're requesting from the kernel?

Yes, it is the same as the MIOBuffer size we use for the blind tunnel. The default Linux pipe size is 16 pages, so I am trying to save memory by requesting only 8 pages here. However, the system admin still needs to raise the pipe-user-pages-soft limit to avoid exceptions.
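
For readers unfamiliar with pipe sizing, the following is a rough sketch of how a pipe capacity can be requested on Linux with fcntl(F_SETPIPE_SZ). The helper name open_pipe_with_capacity is hypothetical and is not taken from the PR; 32 KiB corresponds to 8 pages only when the page size is 4 KiB. As far as I understand pipe(7), the kernel rounds the request up, and it can refuse to grow a pipe once the user is over pipe-user-pages-soft, which is presumably the failure mode mentioned above.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Creates a non-blocking pipe and asks the kernel for at least `bytes` of capacity.
// Returns the capacity actually granted (the kernel may round up), or -1 on failure,
// in which case both descriptors are already closed.
long open_pipe_with_capacity(int fds[2], int bytes)
{
  if (pipe2(fds, O_NONBLOCK | O_CLOEXEC) != 0) {
    return -1;
  }
  // May fail with EPERM when growing past the per-user or system-wide limits.
  if (fcntl(fds[1], F_SETPIPE_SZ, bytes) < 0) {
    close(fds[0]);
    close(fds[1]);
    fds[0] = fds[1] = -1;
    return -1;
  }
  return fcntl(fds[1], F_GETPIPE_SZ);
}
```

Reading the value back with F_GETPIPE_SZ is worth doing because the granted capacity may be larger than what was requested.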

moonchen · 2025-01-08T00:34:15Z

src/iocore/net/UnixNetVConnection.cc

+
+      if (r <= 0) {
+        // Temporary Unavailable, Non-Blocking I/O
+        if (r == -EAGAIN || r == -ENOTCONN) {


Is it possible that we get EAGAIN here because the pipe is at capacity? How do we handle that case?

It is actually impossible to get EAGAIN here because of a full pipe. We disable the read VIO after each successful read and only re-enable it when its corresponding pipe is empty again.

However, it is possible to get EAGAIN because the socket is unavailable for some other reason. In that case, we wait for the next epoll edge trigger, the same as the logic without zero copy.
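
The gating described in this reply can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: TunnelRead, SpliceResult, on_socket_readable and on_pipe_drained are hypothetical names, and the sketch checks errno directly, whereas the diff compares a negated errno value (r == -EAGAIN).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum class SpliceResult { Moved, WouldBlock, Closed, Error };

// Hypothetical per-direction state for one side of the tunnel.
struct TunnelRead {
  int sock_fd;
  int pipe_wr_fd;
  size_t in_pipe    = 0;    // bytes spliced in but not yet drained
  bool read_enabled = true; // mirrors "read VIO enabled"
};

// Called on a read-ready event. Reads are only attempted while the pipe is empty,
// so an EAGAIN seen here should come from the socket, not from a full pipe.
SpliceResult on_socket_readable(TunnelRead &t, size_t len)
{
  if (!t.read_enabled) {
    return SpliceResult::WouldBlock;
  }
  ssize_t r = splice(t.sock_fd, nullptr, t.pipe_wr_fd, nullptr, len, SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
  if (r > 0) {
    t.in_pipe      = static_cast<size_t>(r);
    t.read_enabled = false; // wait until the writer side drains the pipe
    return SpliceResult::Moved;
  }
  if (r == 0) {
    return SpliceResult::Closed;
  }
  if (errno == EAGAIN || errno == ENOTCONN) {
    return SpliceResult::WouldBlock; // wait for the next edge-triggered event
  }
  return SpliceResult::Error;
}

// Called once the other side has spliced everything out of the pipe.
void on_pipe_drained(TunnelRead &t)
{
  t.in_pipe      = 0;
  t.read_enabled = true;
}
```

The sketch assumes the socket itself is non-blocking; whether SPLICE_F_NONBLOCK alone is enough for a given socket type is something I would verify against splice(2) rather than rely on.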

include/iocore/eventsystem/IOBuffer.h

moonchen · 2025-01-22T15:33:14Z

src/iocore/net/UnixNetVConnection.cc

@@ -508,6 +508,81 @@ UnixNetVConnection::net_read_io(NetHandler *nh)
    read_disable(nh, this);
    return;
  }
+#if TS_USE_LINUX_SPLICE


This function is getting quite long, with two independent code paths to read from socket to buffer. I would prefer using polymorphism to handle the different socket-to-buffer copies, but at least we should split this function up so that it's more readable.

I considered using polymorphism here, but it looks impossible without significant refactoring because this function exposes low-level I/O operations (we would need a new member function in both MIOBuffer and PipeIOBuffer). I will split this function up for now.

moonchen · 2025-01-22T15:34:39Z

src/iocore/eventsystem/P_IOBuffer.h

@@ -656,6 +656,15 @@ new_MIOBuffer_internal(const char *location, int64_t size_index)
 TS_INLINE void
 free_MIOBuffer(MIOBuffer *mio)
 {
+#if TS_USE_LINUX_SPLICE
+  // check if mio is PipeIOBuffer using dynamic_cast
+  PipeIOBuffer *pipe_mio = dynamic_cast<PipeIOBuffer *>(mio);


Relying on a runtime type check with dynamic_cast to select the proper deallocation mechanism is generally considered an anti-pattern. Instead, ClassAllocator has a Destruct_on_free_ parameter that allows you to use a virtual destructor to clean up properly depending on the underlying type.

The PipeIOBuffer is managed by pipeIOAllocator instead of ioAllocator. I introduced pipeIOAllocator because it is a new class with additional attributes. The ClassAllocator appears to be designed for concrete classes rather than abstract classes or interfaces.

I didn’t put the cleanup logic in the virtual destructor because the existing logic for MIOBuffer doesn’t use MIOBuffer's destructor. Instead, it manually handles cleanup in free_MIOBuffer with the following steps:

mio->_writer = nullptr; mio->dealloc_all_readers();

Interestingly, the destructor is designed to perform this same cleanup, but for some reason, the existing code avoids relying on it. If anyone knows why, I’d appreciate the insight.

Even if we moved the cleanup logic to the virtual destructor, we would still need a runtime type check since real polymorphism is not yet in place. For example, when dealing with a PipeIOBuffer, we would need to use

THREAD_FREE(mio, pipeIOAllocator, this_thread());

instead of

THREAD_FREE(mio, ioAllocator, this_thread());

to make sure the correct allocator is used.

I will keep it as is until I figure out how to implement real polymorphism here. More background information is needed.
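
One possible direction for the "real polymorphism" question, sketched under assumptions: give the base class a virtual release() that each concrete type overrides to return itself to its own pool, so the free function needs no dynamic_cast. CountingPool is a toy counter standing in for ioAllocator and pipeIOAllocator; it is not the ClassAllocator API, and the delete this calls stand in for THREAD_FREE.

```cpp
#include <cassert>

// Toy pool that only counts; a stand-in for an ATS allocator, not its real API.
struct CountingPool {
  int allocated = 0;
  int freed     = 0;
};

static CountingPool memory_pool; // plays the role of ioAllocator
static CountingPool pipe_pool;   // plays the role of pipeIOAllocator

class BufferBase
{
public:
  virtual ~BufferBase() = default;
  // Each concrete type returns itself to the pool it was created from.
  virtual void release() = 0;

protected:
  // Cleanup shared by all buffer types (loosely, "_writer = nullptr; dealloc_all_readers()").
  void reset_common() { writer_ = nullptr; }
  void *writer_ = nullptr;
};

class MemoryBuffer : public BufferBase
{
public:
  static MemoryBuffer *create()
  {
    ++memory_pool.allocated;
    return new MemoryBuffer;
  }
  void release() override
  {
    reset_common();
    ++memory_pool.freed;
    delete this; // stand-in for THREAD_FREE(this, ioAllocator, ...)
  }
};

class PipeBuffer : public BufferBase
{
public:
  static PipeBuffer *create()
  {
    ++pipe_pool.allocated;
    return new PipeBuffer;
  }
  void release() override
  {
    reset_common();
    ++pipe_pool.freed;
    delete this; // stand-in for THREAD_FREE(this, pipeIOAllocator, ...)
  }
};

// The free function no longer needs to know the dynamic type.
void free_buffer(BufferBase *buf)
{
  if (buf != nullptr) {
    buf->release();
  }
}
```

Whether this fits the existing allocator macros is an open question; the sketch only shows that the allocator choice can live in a virtual member rather than in a type check at the call site.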

YIHONG-JIN · 2025-01-22T23:21:41Z

Thanks for the comments @moonchen. I will start resolving them.

1. Add TS_USE_LINUX_SPLICE as a compilation option.
2. Make MIOBuffer and MIOBufferReader polymorphic classes by declaring their member functions virtual.
3. Create PipeIOBuffer and PipeIOBufferReader as derived classes that encapsulate a Linux pipe.
4. Use dynamic_cast to switch logic in state machines and continuations.

YIHONG-JIN · 2025-01-27T07:13:32Z

Slightly reorganized the file structure to decouple PipeIOBuffer from the original IOBuffer.

masaori335 · 2025-01-30T04:52:32Z

src/iocore/eventsystem/P_PipeIOBuffer.h

+TS_INLINE char *
+PipeIOBufferReader::start()
+{
+  throw std::runtime_error("Not applicable for PipeIOBufferReader");


Hmm, this PR is pretty interesting as a concept. However, overriding a bunch of methods and throwing runtime errors like this indicates that something is wrong with the class design. (Also, please note that ATS doesn't use exceptions in this fundamental code.)
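
To illustrate the design concern, here is one hedged alternative: keep only the operations every reader can honour in the shared base, and put direct memory access such as start() on the memory-backed type alone, so misuse becomes a compile error instead of a runtime throw. ReaderBase, MemoryReader, PipeReader, total_avail and first_byte are hypothetical names for this sketch, not the classes in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Narrow interface: only what both kinds of reader can implement meaningfully.
class ReaderBase
{
public:
  virtual ~ReaderBase() = default;
  virtual int64_t read_avail() const = 0;
};

// Memory-backed reader: the only type that exposes a data pointer.
class MemoryReader : public ReaderBase
{
public:
  MemoryReader(const char *data, int64_t len) : data_(data), len_(len) {}
  int64_t read_avail() const override { return len_; }
  const char *start() const { return data_; }

private:
  const char *data_;
  int64_t len_;
};

// Pipe-backed reader: exposes a file descriptor and has no start() at all.
class PipeReader : public ReaderBase
{
public:
  PipeReader(int read_fd, int64_t buffered) : fd_(read_fd), buffered_(buffered) {}
  int64_t read_avail() const override { return buffered_; }
  int read_fd() const { return fd_; }

private:
  int fd_;
  int64_t buffered_;
};

// Works with any reader.
int64_t total_avail(const ReaderBase &r)
{
  return r.read_avail();
}

// Needs byte access, so it accepts only MemoryReader; passing a PipeReader will not compile.
char first_byte(const MemoryReader &r)
{
  return r.read_avail() > 0 ? r.start()[0] : '\0';
}
```

This avoids exceptions entirely, at the cost of callers having to state which kind of reader they need.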

YIHONG-JIN marked this pull request as ready for review December 4, 2024 22:11

YIHONG-JIN marked this pull request as draft December 4, 2024 22:18

randall assigned YIHONG-JIN Dec 9, 2024

bryancall requested a review from moonchen December 9, 2024 23:20

bryancall added this to the 10.1.0 milestone Dec 9, 2024

bryancall added HTTP Tunneling IOBuffer labels Dec 9, 2024

YIHONG-JIN force-pushed the zero-copy-pr branch from 339f013 to 5a1b3ee Compare December 20, 2024 02:07

YIHONG-JIN marked this pull request as ready for review December 20, 2024 02:07

moonchen requested changes Jan 22, 2025

YIHONG-JIN force-pushed the zero-copy-pr branch from 5a1b3ee to 0f7fc70 Compare January 27, 2025 07:00

masaori335 reviewed Jan 30, 2025
