Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(backend): concurrently load messages for combined lists using child processes #1452

Merged
merged 2 commits into from
Feb 18, 2025

Conversation

mercihabam
Copy link
Member

No description provided.

@mercihabam mercihabam changed the title enh(backend): concurrently load messages for combined lists using child processes perf(backend): concurrently load messages for combined lists using child processes Feb 18, 2025
@mercihabam mercihabam merged commit 423f5af into cypht-org:master Feb 18, 2025
6 of 7 checks passed
@kroky
Copy link
Member

kroky commented Feb 19, 2025

Hi @mercihabam ,

Sorry to bother, the idea here is nice but I see a number of problems with this merged PR:

  1. Introducing a major concept as subprocesses, react event loop and event async/await JS-style promises in a community driven php app should have at least partial discussion before implementing or at least merging. For example, feat(backend): adding idle and polling mechanism to imap #1196 had some major impact on the project and underwent some serious discussion before agreeing on the exact thing to do. It might also have some partial overlap with what you are trying to achieve here.
  2. Historically, cypht achieved parallelism by ajax requests loading each imap source as one call. Provided we sort out the session lock problem, that would give us a better implementation of the same thing - parallel communication with remote services using the existing php server processes. I am not sure when did this change but now we try to load several sources from one php process thus having you go into the direction of using subprocesses. That would have some serious deployment differences compared to original cypht and would at least deserve a setting to turn on or off. Some issues to mention: memory usage, process resource usage (predetermined set of php-fpm processes, for example, would potentially explode with number of sub-processes spawning for each mailbox potentially leading to memory problems), deserailization issues with session and cache (e.g. missing classes for new deployments, changed parent sessions before/during child process spawning, file-based cache locking concurrency problems).
  3. The new message_list worker file you introduced has absolutely no security and can be called from anybody on the internet. Passing the proper arguments, they can achieve something that shouldn't be possible using normal cypht interface.
  4. The worker itself contains code that used to be in getCombinedMessageList which seems wrong to me - it has some internal references to mailbox connection which shouldn't be there. The Mailbox class was introduced to act as a bridge and be transparent to the modules and other classes. If something needs to check if it is an imap or not, it should be in the mailbox class and only then it sends to the proper methods below. I think part of the code needs to be moved to a (possibly new) method in the mailbox class.
  5. Is there really a need for subprocesses when there is only one source in the combined list?

I propose we follow up with these changes:

  1. Add a setting to allow the parallel or previous sequential mailbox access, so users can tweak if they run in a specific environment where this parallelism causes issues.
  2. My preference would really be to leave the parallelism and subprocesses to the browsers with separate ajax calls and make sure our php processes are straightforward and easy to maintain. We could have fixed the IO blocking problems and other lock issues (like session lock), so multiple calls could be executed in parallel. This should have spared the need to maintain subprocesses on the server-side which has its drawbacks. Since this has already been pushed, I propose we keep it but tighten it up - e.g. add security checks to the workers - make sure they can be executed only locally as a subprocess and not via the web.

@mercihabam
Copy link
Member Author

@kroky you're right this is an important change that should be discussed. However, I want to bring to your attention these few points:

  • Historically, Cypht achieved parallelism through Ajax requests sent sequentially, but this did not give enough control over the final output displayed. i.e. the combined pages could not use the same row length as other pages, we could not paginate and see all data, and sorting and filtering could not be done based on the server data but only on the small data available to the client. So we wanted to overcome these limits by having all data originating from one source.
  • While this approach could result in system resource consumption, I think it does not present any difference compared to when sources are loaded sequentially via the main process.
  • Practically, nobody would ever be able to run the message_list worker file with the correct input outside of its current context as long as the cache and session objects are secured (which I believe they are). But ideally, that restriction should be made explicit.
  • When there only is one source in the combined_list, we could choose to occupy the main process and avoid creating a subprocess, but considering this comparison: "leave the main process free and create a subprocess which we occupy vs occupy the main process and do not create a subprocess" I do not see how that impact the workflow rather than adding complexity to the code.

@kroky
Copy link
Member

kroky commented Feb 20, 2025

Yes, I agree, we can continue this way and see how it goes. I assume we will hit some system issues but we can fix. What about the preference suggestion? I do think that is a good idea to keep certain environments happy - e.g. windows host, various docker setups?

@mercihabam
Copy link
Member Author

I don't think a new preference is necessary. This implementation has no issues with any OS, nor do we need the previous method of handling sources sequentially.

@indridieinarsson
Copy link
Contributor

The code from this PR does not seem to work when run standalone (i.e. not part of tiki) in a docker environment. I tried pulling it and building an image and running it. It runs, and single inboxes display messages as usual, but the combined pages are empty. The workers/messages_list.php exits with error code 255 in all cases.

@mercihabam
Copy link
Member Author

@indridieinarsson, this should work in any environment! I assume it's something very basic causing it to malfunction in your container. Could you debug it further?

@indridieinarsson
Copy link
Contributor

@mercihabam :
With the development version of php.ini (in the docker image), I get this:

Warning: require(lib/framework.php): Failed to open stream: No such file or directory in /usr/local/share/cypht/modules/imap/workers/messages_list.php on line 9
cypht-1  | NOTICE: PHP message:  worker exited normally
cypht-1  | Fatal error: Uncaught Error: Failed opening required 'lib/framework.php' (include_path='.:/usr/local/lib/php') in /usr/local/share/cypht/modules/imap/workers/messages_list.php:9
cypht-1  | Stack trace:
cypht-1  | #0 {main}
cypht-1  |   thrown in /usr/local/share/cypht/modules/imap/workers/messages_list.php on line 9
cypht-1  | 127.0.0.1 -  21/Feb/2025:14:16:35 +0000 "POST /index.php" 200
cypht-1  | 172.22.0.12 - - [21/Feb/2025:14:16:35 +0000] "POST /?page=message_list&list_path=combined_inbox HTTP/1.1" 200 56 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"

@mercihabam
Copy link
Member Author

Seems like, it complains because it doesn't recognize the absolute path given. Could you try to prepend the required file with /usr/local/share/cypht on the mentioned line?

@indridieinarsson
Copy link
Contributor

Seems like, it complains because it doesn't recognize the absolute path given. Could you try to prepend the required file with /usr/local/share/cypht on the mentioned line?

@mercihabam : I need to prepend /usr/local/share/cypht to all the require statements and also have
const APP_PATH = '/usr/local/share/cypht/';
then it works, sort of. Still some errors in the logs (haven't checked yet), but at least I see some messages in the combined view.

@mercihabam
Copy link
Member Author

Yes like that.

@indridieinarsson
Copy link
Contributor

@mercihabam The worker process is still finishing with errors in some cases. Getting this now:
OTICE: PHP message: Worker error output: PHP Fatal error: Uncaught Error: Undefined constant "SITE_ID" in /usr/local/share/cypht/lib/cache.php:590
cypht-1 | Stack trace:
cypht-1 | #0 /usr/local/share/cypht/lib/cache.php(516): Hm_Cache->key_hash('imap67b6f947e6b...')
cypht-1 | #1 /usr/local/share/cypht/lib/cache.php(495): Hm_Cache->redis_get('imap67b6f947e6b...', false)
cypht-1 | #2 /usr/local/share/cypht/modules/imap/hm-imap.php(61): Hm_Cache->get('imap67b6f947e6b...')
cypht-1 | #3 /usr/local/share/cypht/modules/imap/hm-imap.php(67): Hm_IMAP_List::get_cache(Object(Hm_Cache), '67b6f947e6b5c')
cypht-1 | #4 /usr/local/share/cypht/modules/imap/workers/messages_list.php(46): Hm_IMAP_List::get_connected_mailbox('67b6f947e6b5c', Object(Hm_Cache))
cypht-1 | #5 {main}

@mercihabam
Copy link
Member Author

Those missing constants have to be defined in the worker file the same way we have declared the APP_PATH.

@mercihabam
Copy link
Member Author

The worker process is still finishing with errors in some cases.

Are there any cases where it succeeds?

@indridieinarsson
Copy link
Contributor

The worker process is still finishing with errors in some cases.

Are there any cases where it succeeds?

well, if you don't use redis cache, then it works, at least sometimtes. I also had some cases without redis, where one mailbox failed, then nothing was displayed.

@mercihabam
Copy link
Member Author

@indridieinarsson, you are the one capable of diagnosing and fixing what's wrong since you are the one reporting problems.

@indridieinarsson
Copy link
Contributor

@indridieinarsson, you are the one capable of diagnosing and fixing what's wrong since you are the one reporting problems.

I can look into some things. But there are still quite a few issues with this, and the proposed fixes are quite hacky. Like declaring const APP_PATH = '/usr/local/share/cypht/'; is obviously not a proper fix, it needs to be declared in the same way as in the rest of the program. And all (or at least some) other global constants need to be propagated into the child process as well.
@mercihabam @kroky Might it perhaps be a good idea to backpedal with this PR, undo it and do proper fixes to the outstanding issues?

@@ -57,7 +57,10 @@
"twbs/bootstrap": "^5.3",
"twbs/bootstrap-icons": "^1.11",
"webklex/composer-info": "^0.0.1",
"zbateson/mail-mime-parser": "^2.4"
"zbateson/mail-mime-parser": "^2.4",
"react/promise": "^3.2.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep this list alphabetically sorted

@mercihabam
Copy link
Member Author

I can look into some things. But there are still quite a few issues with this, and the proposed fixes are quite hacky. Like declaring const APP_PATH = '/usr/local/share/cypht/'; is obviously not a proper fix, it needs to be declared in the same way as in the rest of the program. And all (or at least some) other global constants need to be propagated into the child process as well. @mercihabam @kroky Might it perhaps be a good idea to backpedal with this PR, undo it and do proper fixes to the outstanding issues?

@indridieinarsson, the instructions I provided were not definitive fixes but rather a way to explore possible solutions for your environment. I understand that you might not be able to make it work definitively, which is fine, but please avoid making such suggestions. In cases where you’re unable to move forward, it’s better to request help instead.

@marclaporte
Copy link
Member

@mercihabam @indridieinarsson Can you please find each other https://gitter.im/cypht-org/community and organize a screenshare?

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants