Fast blob transfer #1668

Merged
merged 91 commits into from
Jun 18, 2022

@pludov
Contributor

pludov commented Jun 15, 2022

This MR adds support for local connections and fast memory-buffer exchange to the INDI protocol.

With this change, BLOB data (FITS, stream, ...) no longer needs to be copied or base64-encoded: the driver shares the same memory directly with the client. This is an obvious win for CPU usage and latency, especially on low-end hardware (RPi).

This works only when client and server run on the same Unix/macOS host. In that case, BLOBs are written into buffers (shm or memfd) that are exchanged by reference, then shared and mmapped in the client. This is very lightweight compared to the existing base64 transfer.
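The exchange "by reference" described above can be sketched as follows. This is a minimal illustration, not the code from this MR: it assumes Linux (memfd_create), passes the file descriptor as SCM_RIGHTS ancillary data over a Unix socket pair, and maps it read-only on the receiving side. All function names (makeSharedBuffer, sendFd, recvFd, readShared, roundTrip) are hypothetical.

```cpp
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cstring>
#include <string>

// Create an anonymous in-memory file holding `data`; returns its fd or -1.
int makeSharedBuffer(const std::string &data)
{
    int fd = memfd_create("blob_sketch", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, static_cast<off_t>(data.size())) != 0 ||
        write(fd, data.data(), data.size()) != static_cast<ssize_t>(data.size()))
    {
        close(fd);
        return -1;
    }
    return fd;
}

// Send one payload byte plus `fd` as SCM_RIGHTS ancillary data.
bool sendFd(int sock, int fd)
{
    char payload = 'B';
    iovec iov{&payload, 1};
    alignas(struct cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);
    cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1;
}

// Receive a descriptor sent by sendFd(); returns -1 on failure.
int recvFd(int sock)
{
    char payload = 0;
    iovec iov{&payload, 1};
    alignas(struct cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
    msghdr msg{};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd = -1;
    std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

// Map the received fd read-only. The copy into a std::string exists only so
// the sketch can return a value; a real client would read the mapping in place.
std::string readShared(int fd)
{
    struct stat st{};
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return {};
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void *p = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return {};
    std::string out(static_cast<const char *>(p), size);
    munmap(p, size);
    return out;
}

// Sender and receiver in one process, connected by a socket pair.
std::string roundTrip(const std::string &data)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return {};
    std::string out;
    int fd = makeSharedBuffer(data);
    if (fd >= 0 && sendFd(sv[0], fd))
    {
        int rfd = recvFd(sv[1]);
        if (rfd >= 0)
        {
            out = readShared(rfd);
            close(rfd);
        }
    }
    if (fd >= 0)
        close(fd);
    close(sv[0]);
    close(sv[1]);
    return out;
}
```

Only the descriptor crosses the socket; the payload itself stays in the kernel-backed buffer. The memory is released once every descriptor and every mapping that refers to it has been closed.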

Remote clients are still supported over TCP, unchanged. However, shared buffers are still used between driver and server, to eliminate handling there. In that case, the server handles the base64 encoding on a dedicated worker thread.
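To make the remaining cost for TCP clients concrete, here is a minimal base64 encoder sketch. It is not the encoder used by indiserver, and base64Encode is a hypothetical name. It runs on the calling thread, whereas the text above describes doing this work on a dedicated worker thread.

```cpp
#include <cstdint>
#include <string>

// Standard base64 (RFC 4648 alphabet, '=' padding). Hypothetical helper.
std::string base64Encode(const std::string &in)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    auto byteAt = [&in](std::size_t i) {
        return static_cast<std::uint32_t>(static_cast<unsigned char>(in[i]));
    };

    std::string out;
    out.reserve((in.size() + 2) / 3 * 4); // 4 output chars per 3 input bytes

    std::size_t i = 0;
    for (; i + 2 < in.size(); i += 3)
    {
        std::uint32_t v = (byteAt(i) << 16) | (byteAt(i + 1) << 8) | byteAt(i + 2);
        out += table[(v >> 18) & 63];
        out += table[(v >> 12) & 63];
        out += table[(v >> 6) & 63];
        out += table[v & 63];
    }

    const std::size_t remaining = in.size() - i;
    if (remaining == 1)
    {
        std::uint32_t v = byteAt(i) << 16;
        out += table[(v >> 18) & 63];
        out += table[(v >> 12) & 63];
        out += "==";
    }
    else if (remaining == 2)
    {
        std::uint32_t v = (byteAt(i) << 16) | (byteAt(i + 1) << 8);
        out += table[(v >> 18) & 63];
        out += table[(v >> 12) & 63];
        out += table[(v >> 6) & 63];
        out += '=';
    }
    return out;
}
```

Every 3 input bytes become 4 output characters, so an encoded BLOB is about a third larger than the raw data, in addition to the CPU time spent in the loop.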

A client that attempts to connect to localhost is redirected to the local Unix domain socket to take advantage of this. A specific Unix socket path can be targeted with the syntax localhost:/path/to/socket (an indiserver argument selects the path it listens on).
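The address rule above could be expressed roughly like this. It is a sketch only: Endpoint, parseServerAddress and kDefaultSocketPath are hypothetical names, and the default path is an assumption taken from the "@/tmp/indiserver" entry in the lsof output further down this thread, not from the client library.

```cpp
#include <string>

// Hypothetical result type: where a client should connect.
struct Endpoint
{
    bool isUnix = false; // true: Unix domain socket, false: TCP
    std::string host;    // TCP host name (empty for Unix sockets)
    std::string path;    // Unix socket path (empty for TCP)
};

// Assumed default socket path (see lead-in); not confirmed by the source.
const char *const kDefaultSocketPath = "/tmp/indiserver";

Endpoint parseServerAddress(const std::string &address)
{
    const std::string prefix = "localhost:";
    Endpoint ep;
    if (address == "localhost")
    {
        // Plain "localhost" is redirected to the local socket.
        ep.isUnix = true;
        ep.path = kDefaultSocketPath;
    }
    else if (address.size() > prefix.size() &&
             address.compare(0, prefix.size(), prefix) == 0 &&
             address[prefix.size()] == '/')
    {
        // "localhost:/path/to/socket" selects an explicit socket path.
        ep.isUnix = true;
        ep.path = address.substr(prefix.size());
    }
    else
    {
        // Anything else is treated as a TCP host.
        ep.host = address;
    }
    return ep;
}
```

For example, parseServerAddress("localhost:/run/indi.sock") yields a Unix endpoint with path "/run/indi.sock", while parseServerAddress("telescope2.local") yields a TCP endpoint.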

On the client side, the existing semantics allow clients to modify BLOB data, which is incompatible with the new mechanism (BLOBs are received read-only). I added a new function that lets the client explicitly accept read-only BLOB data. This removes one more copy of the data:

    camera_client->enableDirectBlobAccess(MYCCD, nullptr);

Further optimisations are possible to avoid more memory copies on the driver side (such as producing the camera frame directly in the memory buffer instead of copying it).

The MR also adds:

  • a conversion of indiserver to C++ with the STL
  • use of libev for an optimized event loop in indiserver
  • some integration/non-regression tests for indiserver

Feedback is welcome, especially for macOS, since I don't have access to that system...

pludov added 30 commits June 11, 2022 21:05
Still needs a big cleanup / file splitting
@pludov
Contributor Author

pludov commented Jun 23, 2022 via email

@rlancaste
Contributor

Yeah, I don't recommend we try raising that limit; that is not user-friendly. But if there is a way to use a FIFO file or a memory-mapped file, I think that would be better.

@sonny486
Contributor

I can confirm the memory leak on Ubuntu 20.04 running indi-server. Any capture results in an increase of memory in htop until it reaches the maximum and crashes.

@pludov
Contributor Author

pludov commented Jun 24, 2022

> I can confirm the memory leak on Ubuntu 20.04 running indi-server. Any capture results in an increase of memory in htop until it reaches the maximum and crashes.

I think I reproduced it: this is indeed a distinct problem, specific to the TCP connection handling of BLOBs. Can you confirm that the client connects to indiserver through TCP in your case? Using "localhost" as the address should transparently route to the Unix domain socket and avoid that bug.

@knro
Contributor

knro commented Jun 24, 2022

I can't reproduce this. In 87e8420 I free the shared BLOB when closing the FITS file. Does this help? I cannot see either the CCD simulator or KStars leaking memory.

@sonny486
Contributor

> > I can confirm the memory leak on Ubuntu 20.04 running indi-server. Any capture results in an increase of memory in htop until it reaches the maximum and crashes.

> I think I reproduced it: this is indeed a distinct problem, specific to the TCP connection handling of BLOBs. Can you confirm that the client connects to indiserver through TCP in your case? Using "localhost" as the address should transparently route to the Unix domain socket and avoid that bug.

Yes, I am running a remote indi-server on Ubuntu. I will check @knro's update to see if that helps. I can't test tonight, but I will try in the early morning, Texas local time.

@sonny486
Contributor

sonny486 commented Jun 24, 2022

@knro Looks like memory usage is still climbing. I will try to get more details about what is increasing; it appears to be on the OS side, as none of the INDI modules are increasing their memory usage.

On my QHY268c it climbs at a rate of about 11-12 MB per image.

I will try and get more details.

@pludov
Contributor Author

pludov commented Jun 24, 2022 via email

@sonny486
Contributor

sonny486 commented Jun 24, 2022

I don't know if I see that on my side.

I ran free -s 1, and it appears that buff/cache is the memory that is increasing.

If I kill indiserver, the buff/cache memory is released.

@pludov
Contributor Author

pludov commented Jun 24, 2022 via email

@sonny486
Contributor

lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
indiserve 1623 telescope cwd DIR 259,2 4096 6029314 /home/telescope
indiserve 1623 telescope rtd DIR 259,2 4096 2 /
indiserve 1623 telescope txt REG 259,2 1798600 33819672 /usr/bin/indiserver
indiserve 1623 telescope DEL REG 0,1 9 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2053 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 8 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2052 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 7 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1029 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1028 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3079 /memfd:shm_anon
indiserve 1623 telescope mem REG 259,2 1369384 33820064 /usr/lib/x86_64-linux-gnu/libm-2.31.so
indiserve 1623 telescope mem REG 259,2 2029592 33820050 /usr/lib/x86_64-linux-gnu/libc-2.31.so
indiserve 1623 telescope mem REG 259,2 104984 33818753 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
indiserve 1623 telescope mem REG 259,2 1956992 33821941 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28
indiserve 1623 telescope mem REG 259,2 71680 33821220 /usr/lib/x86_64-linux-gnu/libev.so.4.0.0
indiserve 1623 telescope mem REG 259,2 157224 33820131 /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
indiserve 1623 telescope mem REG 259,2 191504 33818285 /usr/lib/x86_64-linux-gnu/ld-2.31.so
indiserve 1623 telescope 0r CHR 1,3 0t0 5 /dev/null
indiserve 1623 telescope 1w REG 259,2 4014 4194338 /tmp/indiserver.log
indiserve 1623 telescope 2w REG 259,2 4014 4194338 /tmp/indiserver.log
indiserve 1623 telescope 3u a_inode 0,14 0 14444 [eventpoll]
indiserve 1623 telescope 4u a_inode 0,14 0 14444 [eventfd]
indiserve 1623 telescope 5u IPv4 41961 0t0 TCP *:7624 (LISTEN)
indiserve 1623 telescope 6u unix 0xffff9c23032d8880 0t0 41962 @/tmp/indiserver type=STREAM
indiserve 1623 telescope 7r FIFO 259,2 0t0 4194337 /tmp/indiFIFO
indiserve 1623 telescope 8u IPv4 41969 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62517 (ESTABLISHED)
indiserve 1623 telescope 9u unix 0xffff9c23032d9980 0t0 41964 type=STREAM
indiserve 1623 telescope 10r FIFO 0,13 0t0 41965 pipe
indiserve 1623 telescope 11u unix 0xffff9c23032d8440 0t0 41967 type=STREAM
indiserve 1623 telescope 12r FIFO 0,13 0t0 41968 pipe
indiserve 1623 telescope 13u IPv4 41978 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62518 (ESTABLISHED)
root@telescope2:/home/telescope#

@sonny486
Contributor

lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
indiserve 1623 telescope cwd DIR 259,2 4096 6029314 /home/telescope
indiserve 1623 telescope rtd DIR 259,2 4096 2 /
indiserve 1623 telescope txt REG 259,2 1798600 33819672 /usr/bin/indiserver
indiserve 1623 telescope DEL REG 0,1 2055 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3082 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1030 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3081 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3080 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2054 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 10 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 9 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2053 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 8 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2052 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 7 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1029 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1028 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3079 /memfd:shm_anon
indiserve 1623 telescope mem REG 259,2 1369384 33820064 /usr/lib/x86_64-linux-gnu/libm-2.31.so
indiserve 1623 telescope mem REG 259,2 2029592 33820050 /usr/lib/x86_64-linux-gnu/libc-2.31.so
indiserve 1623 telescope mem REG 259,2 104984 33818753 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
indiserve 1623 telescope mem REG 259,2 1956992 33821941 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28
indiserve 1623 telescope mem REG 259,2 71680 33821220 /usr/lib/x86_64-linux-gnu/libev.so.4.0.0
indiserve 1623 telescope mem REG 259,2 157224 33820131 /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
indiserve 1623 telescope mem REG 259,2 191504 33818285 /usr/lib/x86_64-linux-gnu/ld-2.31.so
indiserve 1623 telescope 0r CHR 1,3 0t0 5 /dev/null
indiserve 1623 telescope 1w REG 259,2 4413 4194338 /tmp/indiserver.log
indiserve 1623 telescope 2w REG 259,2 4413 4194338 /tmp/indiserver.log
indiserve 1623 telescope 3u a_inode 0,14 0 14444 [eventpoll]
indiserve 1623 telescope 4u a_inode 0,14 0 14444 [eventfd]
indiserve 1623 telescope 5u IPv4 41961 0t0 TCP *:7624 (LISTEN)
indiserve 1623 telescope 6u unix 0xffff9c23032d8880 0t0 41962 @/tmp/indiserver type=STREAM
indiserve 1623 telescope 7r FIFO 259,2 0t0 4194337 /tmp/indiFIFO
indiserve 1623 telescope 8u IPv4 41969 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62517 (ESTABLISHED)
indiserve 1623 telescope 9u unix 0xffff9c23032d9980 0t0 41964 type=STREAM
indiserve 1623 telescope 10r FIFO 0,13 0t0 41965 pipe
indiserve 1623 telescope 11u unix 0xffff9c23032d8440 0t0 41967 type=STREAM
indiserve 1623 telescope 12r FIFO 0,13 0t0 41968 pipe
indiserve 1623 telescope 13u IPv4 41978 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62518 (ESTABLISHED)
root@telescope2:/home/telescope#

@sonny486
Contributor

lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
indiserve 1623 telescope cwd DIR 259,2 4096 6029314 /home/telescope
indiserve 1623 telescope rtd DIR 259,2 4096 2 /
indiserve 1623 telescope txt REG 259,2 1798600 33819672 /usr/bin/indiserver
indiserve 1623 telescope DEL REG 0,1 3083 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2058 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2057 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1033 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2056 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 12 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1032 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 11 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1031 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2055 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3082 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1030 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3081 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3080 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2054 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 10 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 9 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2053 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 8 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2052 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 7 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1029 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1028 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3079 /memfd:shm_anon
indiserve 1623 telescope mem REG 259,2 1369384 33820064 /usr/lib/x86_64-linux-gnu/libm-2.31.so
indiserve 1623 telescope mem REG 259,2 2029592 33820050 /usr/lib/x86_64-linux-gnu/libc-2.31.so
indiserve 1623 telescope mem REG 259,2 104984 33818753 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
indiserve 1623 telescope mem REG 259,2 1956992 33821941 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28
indiserve 1623 telescope mem REG 259,2 71680 33821220 /usr/lib/x86_64-linux-gnu/libev.so.4.0.0
indiserve 1623 telescope mem REG 259,2 157224 33820131 /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
indiserve 1623 telescope mem REG 259,2 191504 33818285 /usr/lib/x86_64-linux-gnu/ld-2.31.so
indiserve 1623 telescope 0r CHR 1,3 0t0 5 /dev/null
indiserve 1623 telescope 1w REG 259,2 4926 4194338 /tmp/indiserver.log
indiserve 1623 telescope 2w REG 259,2 4926 4194338 /tmp/indiserver.log
indiserve 1623 telescope 3u a_inode 0,14 0 14444 [eventpoll]
indiserve 1623 telescope 4u a_inode 0,14 0 14444 [eventfd]
indiserve 1623 telescope 5u IPv4 41961 0t0 TCP *:7624 (LISTEN)
indiserve 1623 telescope 6u unix 0xffff9c23032d8880 0t0 41962 @/tmp/indiserver type=STREAM
indiserve 1623 telescope 7r FIFO 259,2 0t0 4194337 /tmp/indiFIFO
indiserve 1623 telescope 8u IPv4 41969 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62517 (ESTABLISHED)
indiserve 1623 telescope 9u unix 0xffff9c23032d9980 0t0 41964 type=STREAM
indiserve 1623 telescope 10r FIFO 0,13 0t0 41965 pipe
indiserve 1623 telescope 11u unix 0xffff9c23032d8440 0t0 41967 type=STREAM
indiserve 1623 telescope 12r FIFO 0,13 0t0 41968 pipe
indiserve 1623 telescope 13u IPv4 41978 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62518 (ESTABLISHED)
root@telescope2:/home/telescope#
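
The three listings above differ mainly in the number of deleted /memfd:shm_anon mappings held by indiserver: 8, then 15, then 24. A small helper for comparing such snapshots might look like the sketch below. countDeletedMemfd is a hypothetical name, and the matching rule (a line containing both " DEL " and "/memfd:") is an assumption based on the output format shown here.

```cpp
#include <sstream>
#include <string>

// Count lsof lines that describe a deleted memfd mapping.
int countDeletedMemfd(const std::string &lsofOutput)
{
    std::istringstream in(lsofOutput);
    std::string line;
    int count = 0;
    while (std::getline(in, line))
    {
        if (line.find(" DEL ") != std::string::npos &&
            line.find("/memfd:") != std::string::npos)
        {
            ++count;
        }
    }
    return count;
}
```

A count that keeps growing between captures while the process stays alive points at buffers that are still mapped and never released.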

@pludov
Contributor Author

pludov commented Jun 24, 2022 via email

@sonny486
Contributor

Glad I could be of some help!

Awesome work by the way.

@pludov
Contributor Author

pludov commented Jun 24, 2022

> I can confirm the memory leak on Ubuntu 20.04 running indi-server. Any capture results in an increase of memory in htop until it reaches the maximum and crashes.

I pushed a fix for the Linux/remote leak in this PR: #1674

@sonny486
Contributor

OK, let me re-download from Git and compile, and I will let you know very shortly.

Thanks!

@sonny486
Contributor

Will have to wait for commit. Will keep an eye on it.

@sonny486
Contributor

I will build your fork and check it out.

@pludov
Contributor Author

pludov commented Jun 26, 2022

> Can GammaLut16 be optimized further? We really need it as sending 16bit frames would be too dark at the client side. Also, very few clients, if any, have support for RGB48 (16bit per channel). The LUT loop is probably a prime target for CPU vector instructions? Maybe this is already done by GCC at -O3?

I've checked the speed of Lut16 and some variations on it. I used a test program (-O3) to test the approach, running against random 4Mb buffers to get timings. The program is single-threaded.

On a good x86 machine, the LUT implementation is fine. The table fits in cache and the conversion runs at ~1.8 G samples/second. I doubt it can be made much more efficient (maybe by using AVX2, which has a dedicated instruction for parallel LUT...).

lut using uint8_t : rate: 1814.676194 Mb/s
lut using uint16_t : rate: 1723.580739 Mb/s
lut on 11bit using uint8_t : rate: 1805.665275 Mb/s
naive float arithmetic: rate: 86.721807 Mb/s

(11 bits means only the upper 11 bits are used. Precision is obviously lost; it should be possible to have a non-linear LUT.)

On smaller hardware (an RPi 2), things are very different. The L1 cache of the CPU is probably the limiting factor here:

lut using uint8_t : rate: 8.791209 Mb/s
lut using uint16_t : rate: 5.882353 Mb/s
lut on 11bit using uint8_t : rate: 28.444444 Mb/s
naive float arithmetic: rate: 0.490497 Mb/s

I submitted a PR to use a vector of uint8_t, which doubles the effectiveness of the cache: #1680
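
For readers unfamiliar with the technique, a 16-bit to 8-bit lookup table with uint8_t entries can be sketched as below. This is not the GammaLut16 code from INDI: the plain power-law curve, the gamma parameter, and the names buildGammaLut and applyLut are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Build a 65536-entry table mapping each 16-bit sample to an 8-bit value.
// Assumption: a simple power-law curve, out = (in / 65535)^(1 / gamma) * 255.
std::vector<std::uint8_t> buildGammaLut(double gamma)
{
    std::vector<std::uint8_t> lut(65536);
    for (std::size_t i = 0; i < lut.size(); ++i)
    {
        const double normalized = static_cast<double>(i) / 65535.0;
        const double mapped = std::pow(normalized, 1.0 / gamma) * 255.0;
        lut[i] = static_cast<std::uint8_t>(std::lround(mapped));
    }
    return lut;
}

// Convert a frame: one table read per sample, no arithmetic in the hot loop.
std::vector<std::uint8_t> applyLut(const std::vector<std::uint8_t> &lut,
                                   const std::vector<std::uint16_t> &frame)
{
    std::vector<std::uint8_t> out(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i)
        out[i] = lut[frame[i]];
    return out;
}
```

With uint8_t entries the full table occupies 64 KiB instead of the 128 KiB needed with uint16_t entries. A table indexed by only the upper 11 bits would have 2048 entries, which is 2 KiB.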

@knro
Contributor

knro commented Jun 26, 2022

How about using NEON/SIMD? Are there any CPU-agnostic libraries that can do this, or libraries that select an AVX/SIMD implementation depending on the underlying CPU architecture?

@pludov
Contributor Author

pludov commented Jun 26, 2022 via email

@knro
Contributor

knro commented Jul 14, 2022

@pludov Any ideas what's causing this test to fail? https://github.com/indilib/indi/runs/7307903893?check_suite_focus=true

@eric-vickery made some changes to the Docker builds as well recently

@TallFurryMan
Contributor

It seems indiserver, which is an external process, might not have the time to consider the driver stopped per the test request, and returns 0 as exit code. Because that's not part of the verification, I suggest the test disregard the exit code. Eventually another test should verify the conditions leading to the unexpected exit code, but it isn't really useful to check that in the situation considered.

@pludov
Contributor Author

pludov commented Jul 14, 2022

Hello!

The verification of the exit code is there to ensure indiserver was not terminated by a signal (such as SIGSEGV...).

However, the code of the test itself is reporting 0 instead of the signal. I've opened a PR here to fix that: #1699

Once merged, we'll know what the signal is, but it's probably nothing good (an uncaught SIGPIPE? SIGSEGV? ...)
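
The distinction between a normal exit and termination by a signal can be illustrated with waitpid. This is a generic POSIX sketch, not the test code from this repository; ChildResult and runChild are hypothetical names.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <csignal>

struct ChildResult
{
    bool signaled; // true if the child was terminated by a signal
    int value;     // signal number if signaled, exit code otherwise (-1 on error)
};

// Run `body` in a forked child and report how the child ended.
ChildResult runChild(void (*body)())
{
    const pid_t pid = fork();
    if (pid < 0)
        return {false, -1};
    if (pid == 0)
    {
        body();
        _exit(0); // body returned normally
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return {false, -1};
    }

    // Check for a signal first: the exit-status field is meaningless then.
    if (WIFSIGNALED(status))
        return {true, WTERMSIG(status)};
    if (WIFEXITED(status))
        return {false, WEXITSTATUS(status)};
    return {false, -1};
}
```

On Linux, reading only the exit-status field of a child that was killed by a signal typically yields 0, so a harness has to test WIFSIGNALED before trusting the exit code.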

@pludov
Contributor Author

pludov commented Jul 14, 2022

I reproduced the failure :-) It's an indiserver segmentation fault that can occur during base64 encoding (it's a race condition).

The fix is here: #1700

@pludov
Contributor Author

pludov commented Oct 11, 2022 via email

7 participants