fix: changed internal architecture to avoid high memory usage#33
fix: changed internal architecture to avoid high memory usage#33gabrik wants to merge 4 commits intohrxi:masterfrom
Conversation
Signed-off-by: Gabriele Baldoni <gabriele.baldoni@gmail.com>
Signed-off-by: Gabriele Baldoni <gabriele.baldoni@gmail.com>
Signed-off-by: Gabriele Baldoni <gabriele.baldoni@gmail.com>
Signed-off-by: Gabriele Baldoni <gabriele.baldoni@gmail.com>
|
So from what I gather from a high-level architecture point of view, this polls the queue at a configured interval and sends all available data to Loki. Can you explain why you chose this design? To my understanding, this also means that if more than 512 log messages ( Previously, the background task sent off logs immediately when they were produced, if nothing else was currently being sent, or immediately after the current send. This means that the data was stored in the ` The PR (currently) doesn't do error checking of the HTTPS request, drops outstanding messages when the quit message is received, removes the exponential backoff, all of which can be fixed once we figure out the correct architecture. I'm going to try to figure out why you see the increased memory usage with the old design… |
|
(Sorry, I seem to have forgotten to press the send button.) |
Yes, we can log that they are dropped but nor more that that, but both channel size and sleep period can be configured, so its the user to figure out its use-case.
If there is an HTTP error and we keep the messages the memory will grow, so sooner or later they need to be drop, so better to drop immediately, an error log can be emitted.
Well seems that messages where never dropped even if they where sent, not sure why tbh. If you are able to fix the memory issues with the current (main) architecture I'm fine to test it. |
Basically, I'm interested why you chose this particular design, to understand whether it's worthwhile changing to it. E.g. I wouldn't change the design solely to fix a bug with the old implementation, I'd prefer to fix the implementation instead. |
It more straight forward compare to current (main) design, that's the only reason. |
Based on the finding form #9.
RAM usage (10minutes, same load on the process, same log level)
before:

logs on loki:

after:
its around 1/10th with this patch, for the same amount of data pushed to Loki.
I replaced
tokio::mpscwithflumebecause the latter does not take&mut selfin thetry_recv().The backoff time is configurable,
I can make also the channel size configurablechannel size configurable, still via the builder.I introduced an API breaking changeIntroducedtask::spawn(task.start()instead oftask::spawn(task), but I think I can make it compatible just a matter of playing a bit with the builder.BackgroudTaskFutureto deal with it.I can provide macOS Instrument traces if needed, did not upload them as they are 100s of MBs.