Fast blob transfer #1668
Still needs a big cleanup / file splitting
How to raise that limit is explained here (not for newbies!):
https://developer.apple.com/forums/thread/669625
On Thu, 23 June 2022 at 21:40, Ludovic Pollet ***@***.***> wrote:
… Memory-mapped files would technically be an option, but they would involve
disk I/O (as far as I know there is no ramdisk on macOS), so they would be no
better than the previous situation.
There is also the option to raise the 4 MB limit through sysctl, but it
seems to be very complicated since Catalina...
On Thu, 23 June 2022 at 21:23, rlancasteAstro ***@***.***> wrote:
> Ok, I don't pretend to know what is best, or the limitations of various
> methods, but is there not an alternative method to do this that will work
> for MacOS such as fifo files, memory mapped files, sockets, or even just
> splitting files for transfer so they don't exceed that 4 MB limit you
> mentioned?
|
Yeah, I don't recommend we try raising that limit, that is not user friendly. But if there is a way to maybe use a fifo file or memory mapped file, I think that would be better |
I can confirm the memory leak on UBUNTU 20.04 running indi-server. Any capture results in an increase of memory on HTOP until it reaches max and crashes. |
I think I reproduced it. This is indeed a distinct problem, specific to the TCP connection handling of BLOBs. Can you confirm that the client connects to indiserver through TCP in your case? Using "localhost" as the address should transparently route to the Unix domain socket and avoid that bug. |
I can't reproduce this. In 87e8420 I free the shared BLOB on closing FITS file. Does this help? But I cannot see CCD simulator nor KStars memory leaking. |
Yes I am running remote indi-server on ubuntu. I will check @knro update to see if that helps, can't test tonight though will try in the early AM local Texas time. |
@knro Looks like memory usage is still climbing. I will try to get more details as to what is increasing; it does appear to be on the OS side, as none of the INDI modules are increasing their memory usage. On my QHY268c it climbs at a rate of about 11-12 MB per image. |
You can check the blobs owned by indiserver with the lsof command (lsof
-p <pid>).
I observed that indiserver does not release them after base64 conversion for
TCP.
Let me know if you have the same problem...
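To make that check less error-prone than eyeballing the listing, here is a minimal sketch that counts the rows of an lsof listing whose NAME column points at an anonymous memfd buffer. The function name `count_shared_buffers` and the sample rows are only illustrative (the rows follow the format lsof prints for indiserver); a count that keeps growing after each capture would be consistent with the leak described here.

```python
def count_shared_buffers(lsof_output: str, marker: str = "/memfd:") -> int:
    """Count lsof rows whose NAME column refers to an anonymous memfd buffer."""
    count = 0
    for line in lsof_output.splitlines():
        fields = line.split()
        # NAME is the last column of every lsof row (the header row ends in "NAME").
        if fields and fields[-1].startswith(marker):
            count += 1
    return count


# Illustrative sample in lsof's column layout (not a complete listing).
sample = """\
COMMAND    PID USER      FD  TYPE DEVICE SIZE/OFF NODE NAME
indiserve 1623 telescope DEL REG  0,1             3083 /memfd:shm_anon
indiserve 1623 telescope DEL REG  0,1             2058 /memfd:shm_anon
indiserve 1623 telescope 0r  CHR  1,3    0t0      5    /dev/null
"""

print(count_shared_buffers(sample))
```

Running it on the output of two successive `lsof -p <pid>` calls, one before and one after a capture, gives a quick before/after comparison.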
|
I don't know if I see that on my side. I did do a free -s 1 and it appears the buff/cache is the memory allocation that is increasing. If I kill indiserver, the buff/cache memory is released. |
What's the output of lsof ?
|
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs |
That confirms you hit the bug I found; I am fixing it.
On Fri, 24 June 2022 at 17:41, Sonny Cavazos ***@***.***> wrote:
… lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
indiserve 1623 telescope cwd DIR 259,2 4096 6029314 /home/telescope
indiserve 1623 telescope rtd DIR 259,2 4096 2 /
indiserve 1623 telescope txt REG 259,2 1798600 33819672 /usr/bin/indiserver
indiserve 1623 telescope DEL REG 0,1 3083 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2058 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2057 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1033 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2056 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 12 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1032 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 11 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1031 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2055 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3082 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1030 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3081 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3080 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2054 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 10 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 9 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2053 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 8 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 2052 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 7 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1029 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 1028 /memfd:shm_anon
indiserve 1623 telescope DEL REG 0,1 3079 /memfd:shm_anon
indiserve 1623 telescope mem REG 259,2 1369384 33820064 /usr/lib/x86_64-linux-gnu/libm-2.31.so
indiserve 1623 telescope mem REG 259,2 2029592 33820050 /usr/lib/x86_64-linux-gnu/libc-2.31.so
indiserve 1623 telescope mem REG 259,2 104984 33818753 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
indiserve 1623 telescope mem REG 259,2 1956992 33821941 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28
indiserve 1623 telescope mem REG 259,2 71680 33821220 /usr/lib/x86_64-linux-gnu/libev.so.4.0.0
indiserve 1623 telescope mem REG 259,2 157224 33820131 /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
indiserve 1623 telescope mem REG 259,2 191504 33818285 /usr/lib/x86_64-linux-gnu/ld-2.31.so
indiserve 1623 telescope 0r CHR 1,3 0t0 5 /dev/null
indiserve 1623 telescope 1w REG 259,2 4926 4194338 /tmp/indiserver.log
indiserve 1623 telescope 2w REG 259,2 4926 4194338 /tmp/indiserver.log
indiserve 1623 telescope 3u a_inode 0,14 0 14444 [eventpoll]
indiserve 1623 telescope 4u a_inode 0,14 0 14444 [eventfd]
indiserve 1623 telescope 5u IPv4 41961 0t0 TCP *:7624 (LISTEN)
indiserve 1623 telescope 6u unix 0xffff9c23032d8880 0t0 41962 @/tmp/indiserver type=STREAM
indiserve 1623 telescope 7r FIFO 259,2 0t0 4194337 /tmp/indiFIFO
indiserve 1623 telescope 8u IPv4 41969 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62517 (ESTABLISHED)
indiserve 1623 telescope 9u unix 0xffff9c23032d9980 0t0 41964 type=STREAM
indiserve 1623 telescope 10r FIFO 0,13 0t0 41965 pipe
indiserve 1623 telescope 11u unix 0xffff9c23032d8440 0t0 41967 type=STREAM
indiserve 1623 telescope 12r FIFO 0,13 0t0 41968 pipe
indiserve 1623 telescope 13u IPv4 41978 0t0 TCP telescope2.local:7624->Telescope-Desktop.local:62518 (ESTABLISHED)
***@***.***:/home/telescope#
|
Glad I could be of some help! Awesome work by the way. |
I pushed a fix for the Linux/Remote leak in this PR : #1674 |
Ok let me re-download the Git and compile and I will let you know very shortly. Thanks! |
Will have to wait for commit. Will keep an eye on it. |
I will build your fork and check it out. |
I've checked the speed of Lut16 and some variations around it. I used a test program (built with -O3) and ran it against random 4 MB buffers to get timings. The program is single-threaded.
On a good x86 CPU, the LUT implementation is fine. The table fits in cache and the conversion runs at ~1.8 G samples/second. I doubt it can be made much more efficient (maybe with AVX2, which has a dedicated instruction for parallel lookups...).
lut using uint8_t : rate: 1814.676194 Mb/s
(11 bits means only the 11 upper bits are used; precision is obviously lost, but it should be possible to have a non-linear LUT.)
On smaller hardware (an RPi 2), things are very different. The L1 cache of the CPU is probably the limiting factor here:
lut using uint8_t : rate: 8.791209 Mb/s
I submitted a PR that uses a vector of uint8_t, which doubles the effectiveness of the cache: #1680 |
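For readers who have not seen the technique, here is a minimal sketch of a lookup-table conversion from 16-bit samples to 8-bit output, indexed by the upper 11 bits only. It is not the benchmark program or the INDI code: the gamma value, the function names `build_lut` and `convert`, and the use of Python are all assumptions made for illustration, and Python will not reach anything like the rates quoted.

```python
from array import array

LUT_BITS = 11            # index the table with the upper 11 bits of each sample
SHIFT = 16 - LUT_BITS    # number of low bits discarded
GAMMA = 2.2              # assumed gamma; the thread does not state a value


def build_lut(gamma: float = GAMMA, bits: int = LUT_BITS) -> bytes:
    """Precompute one 8-bit output value per possible table index."""
    size = 1 << bits
    lut = bytearray(size)
    for i in range(size):
        normalized = i / (size - 1)
        lut[i] = round(255 * normalized ** (1 / gamma))
    return bytes(lut)


def convert(samples: array, lut: bytes, shift: int = SHIFT) -> bytes:
    """Map each 16-bit sample to 8 bits with a single table lookup."""
    return bytes(lut[s >> shift] for s in samples)


lut = build_lut()
frame = array("H", [0, 31, 32, 65535])   # unsigned 16-bit samples
print(list(convert(frame, lut)))
```

With one byte per entry, the 2048-entry table occupies 2 KiB, which is why the element type matters on CPUs with a small L1 cache. The samples 0 and 31 share table index 0, which is the precision loss mentioned above.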
How about using NEON/SIMD? Are there any CPU-agnostic libraries that can do this? Or perhaps libraries that pick an AVX/SIMD implementation depending on the underlying CPU architecture? |
Only AVX2 has an interesting hardware instruction for lookups (available on
not-too-old AMD and Intel CPUs). NEON doesn't.
Anyway, the best option is probably not to apply gamma in the server/driver. If
you are after high FPS, your connection will be local anyway, so passing 8 or
16 bits costs the same. The client can then look at OpenGL for more optimized rendering.
|
@pludov Any ideas what's causing this test to fail? https://github.com/indilib/indi/runs/7307903893?check_suite_focus=true @eric-vickery made some changes to the Docker builds as well recently |
It seems indiserver, which is an external process, might not have the time to consider the driver stopped per the test request, and returns 0 as exit code. Because that's not part of the verification, I suggest the test disregard the exit code. Eventually another test should verify the conditions leading to the unexpected exit code, but it isn't really useful to check that in the situation considered. |
Hello! The exit code is verified to ensure that indiserver was not terminated by a signal (such as SIGSEGV). However, the test code itself reports 0 instead of the signal. I've opened a PR to fix that: #1699. Once merged, we'll know what the signal is, but it's probably nothing good (an uncaught SIGPIPE? SIGSEGV? ...) |
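As an illustration of the distinction the test needs to make, here is a small POSIX-only sketch that separates "exited with a code" from "killed by a signal". It is not the INDI test code: the helper `describe_exit` is hypothetical, and the child is a throwaway Python process that terminates itself with SIGTERM. Python's `subprocess` reports a signal death as a negative return code, the same information that `WIFSIGNALED`/`WTERMSIG` give after `waitpid` in C.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable cause of termination."""
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by signal -returncode.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


# Throwaway child that terminates itself with SIGTERM (stand-in for a crashing server).
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
)

print(describe_exit(child.returncode))
```

A check that only compares the reported code against 0 cannot tell these cases apart if the signal has already been collapsed to 0, which is the problem described above.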
I reproduced the failure :-) It's an indiserver segmentation fault that can occur during base64 encoding (it's a race condition). The fix is here: #1700 |
There is another way of getting shared memory on macOS (POSIX vs. System V) that we can
test (in shm_creat_fd.c).
On Wed, 22 June 2022 at 19:47, Jasem Mutlaq ***@***.***> wrote:
… Yes I can confirm this as well:
***@***.*** ~ % ipcs -M
IPC status from <running system> as of Wed Jun 22 20:45:47 +03 2022
shminfo:
shmmax: 4194304 (max shared memory segment size)
shmmin: 1 (min shared memory segment size)
shmmni: 32 (max number of shared memory identifiers)
shmseg: 8 (max shared memory segments per process)
shmall: 1024 (max amount of shared memory in pages)
So we avoid using sharedBLOB completely on MacOS?
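To put the shmmax value in perspective, here is a small worked example comparing frame sizes against the 4194304-byte limit reported by `ipcs -M` above. The sensor dimensions are hypothetical and chosen only for the arithmetic; they do not describe any camera mentioned in this thread.

```python
SHMMAX_BYTES = 4_194_304   # shmmax from the ipcs -M output (exactly 4 MiB)


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame, ignoring any FITS header overhead."""
    return width * height * bytes_per_pixel


def fits_in_segment(size: int, shmmax: int = SHMMAX_BYTES) -> bool:
    """True if a buffer of this size fits in a single shared memory segment."""
    return size <= shmmax


small = frame_bytes(1920, 1080, 1)   # hypothetical 8-bit video frame
large = frame_bytes(4000, 3000, 2)   # hypothetical 12 MP, 16-bit frame

print(small, fits_in_segment(small))
print(large, fits_in_segment(large))
```

The first frame is 2,073,600 bytes and fits; the second is 24,000,000 bytes, almost six times the limit, so a single segment under these settings cannot hold it.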
|
This MR adds support for local connections and fast memory-buffer exchange in the INDI protocol.
With this mechanism, BLOB data (FITS, stream, ...) no longer needs to be copied or base64-converted. The same memory is shared directly by the driver with the client. This is an obvious win for CPU usage and latency, especially on low-end hardware (RPi).
This works only when client and server are located on the same Unix/macOS host. In that case, BLOBs are written into buffers (shm or memfd) that are then exchanged by reference and shared and mmapped in the client. This is very lightweight compared to the existing base64 transfer.
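A minimal sketch of the general idea, not the INDI implementation: a buffer is created, its file descriptor is sent over a Unix domain socket, and the receiver maps it read-only. It assumes Python 3.9+ for `socket.send_fds`/`socket.recv_fds`; `create_shared_buffer` is a hypothetical helper, and the temporary-file fallback for systems without `memfd_create` is only a stand-in (it is not the shm path this MR uses on macOS).

```python
import mmap
import os
import socket
import tempfile


def create_shared_buffer(payload: bytes) -> int:
    """Return a descriptor for a buffer holding the (non-empty) payload."""
    if hasattr(os, "memfd_create"):
        fd = os.memfd_create("blob_sketch")      # anonymous, memory-backed (Linux)
    else:
        handle = tempfile.TemporaryFile()        # stand-in where memfd is unavailable
        fd = os.dup(handle.fileno())
        handle.close()
    os.ftruncate(fd, len(payload))
    with mmap.mmap(fd, len(payload)) as view:    # shared, writable mapping
        view[:] = payload
    return fd


# Both ends live in one process here; normally they are driver/server and client.
sender, receiver = socket.socketpair()
fd = create_shared_buffer(b"FITS frame bytes")
socket.send_fds(sender, [b"blob"], [fd])         # only the reference travels
os.close(fd)

message, fds, _flags, _address = socket.recv_fds(receiver, 1024, 1)
with mmap.mmap(fds[0], 0, access=mmap.ACCESS_READ) as view:
    received = view[:]                           # copied here only to print it
os.close(fds[0])
sender.close()
receiver.close()

print(message, received)
```

Only the short `b"blob"` message and the descriptor cross the socket; the payload itself stays in the buffer, so the cost of the exchange does not grow with the frame size.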
TCP is still supported for remote clients, unchanged. However, shared buffers are still used between driver and server, to eliminate handling there. In that case, the server handles the base64 encoding on a dedicated worker thread.
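The structure of such a worker can be sketched as follows. This is an assumption-laden illustration, not indiserver's code: the queue-based hand-off, the `None` sentinel and the name `encoder_worker` are all invented for the example, and in Python the thread illustrates the structure rather than any real speed-up.

```python
import base64
import queue
import threading


def encoder_worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Encode blobs taken from jobs until a None sentinel arrives."""
    while True:
        blob = jobs.get()
        if blob is None:
            break
        results.put(base64.b64encode(blob))


jobs: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()
worker = threading.Thread(target=encoder_worker, args=(jobs, results), daemon=True)
worker.start()

blob = bytes(range(256)) * 12          # 3072 bytes of sample data
jobs.put(blob)                         # the main loop hands the blob over and moves on
encoded = results.get(timeout=5)

jobs.put(None)                         # ask the worker to stop
worker.join(timeout=5)

print(len(blob), len(encoded))
```

The 3072 input bytes become 4096 encoded bytes: base64 emits 4 characters for every 3 bytes, which is part of the overhead the shared-buffer path avoids for local clients.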
A client that attempts to connect to localhost will be redirected to the local Unix domain socket to take advantage of this. It is possible to target a specific Unix socket path by using the syntax:
localhost:/path/to/socket
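One way a client might tell this syntax apart from an ordinary host:port address is sketched below. `Endpoint` and `parse_endpoint` are hypothetical names, not part of the INDI client API. The default port 7624 is the one indiserver is shown listening on in the lsof output in this thread. The automatic redirection of a plain "localhost" is deliberately not modelled.

```python
from typing import NamedTuple, Optional

DEFAULT_PORT = 7624   # port indiserver listens on in the lsof output (TCP *:7624)


class Endpoint(NamedTuple):
    kind: str                 # "unix" or "tcp"
    host: Optional[str]
    port: Optional[int]
    path: Optional[str]


def parse_endpoint(address: str, default_port: int = DEFAULT_PORT) -> Endpoint:
    """Split an address into a Unix socket path or a TCP host/port pair."""
    host, sep, rest = address.partition(":")
    if host == "localhost" and rest.startswith("/"):
        # Explicit socket path: localhost:/path/to/socket
        return Endpoint("unix", None, None, rest)
    port = int(rest) if sep and rest else default_port
    return Endpoint("tcp", host, port, None)


print(parse_endpoint("localhost:/path/to/socket"))
print(parse_endpoint("telescope2.local:7624"))
```

Keying on a leading "/" after the colon keeps the check unambiguous, since a port number can never start with that character.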
(An argument to indiserver is available to choose the path it listens on.)
For clients, the existing semantics allow them to modify the BLOB data, which is not compatible with the new mechanism (BLOBs are received read-only), so I added a new function for the client to explicitly allow read-only BLOB data. This removes one more copy of the data:
Further optimisations are possible to avoid more memory copies on the driver side (like producing the camera frame directly in the memory buffer instead of copying it).
The MR also adds :
Feedback is welcome, especially for macOS, since I don't have access to that system...