From ae66f7b6cc6ff798e600c2c9c3596ce69d607df2 Mon Sep 17 00:00:00 2001
From: Jeffrey Cordero
Date: Wed, 4 Jun 2025 12:39:18 -0400
Subject: [PATCH 1/7] Initial Repair Services Cron Job Documentation

---
 .../installation/system_customization.md | 21 ++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/_docs/sysadmin/installation/system_customization.md b/_docs/sysadmin/installation/system_customization.md
index 0a197ced..b4e18b50 100644
--- a/_docs/sysadmin/installation/system_customization.md
+++ b/_docs/sysadmin/installation/system_customization.md
@@ -28,10 +28,25 @@ You may want to back up more of `/var/local/submitty` to save configurations and
 
 ## Capture cron error messages
 
-The `submitty_daemon` user runs the [sbin/send_email.py](https://github.com/Submitty/Submitty/blob/master/sbin/send_email.py)
-script. Console output from this script can be emailed to a sysadmin to help ensure that errors can be reported and addressed.
+To ensure the reliability of the various Submitty services, such as the WebSocket server, their health status is monitored and restarted hourly via the [sbin/repair_services.sh](https://github.com/Submitty/Submitty/blob/master/sbin/repair_services.sh) script run by the submitty_daemon user. This script leverages `systemctl` along with various health-check utility scripts to verify the active state of these core services, triggering a restart if an inactive state is detected.
 
-The first line should be set as `MAILTO=` with a valid email address. For example:
+Service failures can occur for various reasons, including unhandled exceptions, memory leaks, port binding issues, or OS-level disruptions such as resource exhaustion. All failures are logged with their relevant timestamp, source, and last output within the `/var/logs/services` directory for the given day in the format `YYYYMMDD.txt`.
+
+To disable this auto-repair mechanism, comment out the relevant line in the source `.setup/submitty_crontab` file within your repository. Since the crontab is auto-generated during installation, any changes must be followed by a re-run of `submitty_install` to persist them.
+
+**Note: This mechanism should only be disabled with caution in production environments.**
+
+```bash
+# In .setup/submitty_crontab, comment out the repair_services.sh line:
+# 0 * * * * submitty_daemon sudo /usr/local/submitty/sbin/repair_services.sh
+
+# Then re-apply the configuration:
+submitty_install
+```
+
+The `submitty_daemon` user runs a variety of other scripts, such as [sbin/send_email.py](https://github.com/Submitty/Submitty/blob/master/sbin/send_email.py) to send pending emails every minute. Console output from these scripts can be emailed to a sysadmin to help ensure that errors can be reported and addressed.
+
+The first line of the relevant script should be set as `MAILTO=` with a valid email address, as shown below.
 ```
 MAILTO=sysadmins@lists.myuniversity.edu
 * * * * * python3 /usr/local/submitty/sbin/send_email.py
From d1d0520a38b12ab052e7c65243c82a2d4303e272 Mon Sep 17 00:00:00 2001
From: Jeffrey Cordero
Date: Wed, 4 Jun 2025 12:43:48 -0400
Subject: [PATCH 2/7] WebSocket Debugging Documentation

---
 _docs/sysadmin/installation/system_customization.md |  2 +-
 _docs/sysadmin/troubleshooting/system_debugging.md  | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/_docs/sysadmin/installation/system_customization.md b/_docs/sysadmin/installation/system_customization.md
index b4e18b50..cc9e6af5 100644
--- a/_docs/sysadmin/installation/system_customization.md
+++ b/_docs/sysadmin/installation/system_customization.md
@@ -30,7 +30,7 @@ You may want to back up more of `/var/local/submitty` to save configurations and
 
 To ensure the reliability of the various Submitty services, such as the WebSocket server, their health status is monitored and restarted hourly via the [sbin/repair_services.sh](https://github.com/Submitty/Submitty/blob/master/sbin/repair_services.sh) script run by the submitty_daemon user. This script leverages `systemctl` along with various health-check utility scripts to verify the active state of these core services, triggering a restart if an inactive state is detected.
 
-Service failures can occur for various reasons, including unhandled exceptions, memory leaks, port binding issues, or OS-level disruptions such as resource exhaustion. All failures are logged with their relevant timestamp, source, and last output within the `/var/logs/services` directory for the given day in the format `YYYYMMDD.txt`.
+Service failures can occur for various reasons, including unhandled exceptions, memory leaks, port binding issues, or OS-level disruptions such as resource exhaustion. All failures are logged with their relevant timestamp, source, and last output within the `/var/log/services` directory for the given day in the format `YYYYMMDD.txt`.
 
 To disable this auto-repair mechanism, comment out the relevant line in the source `.setup/submitty_crontab` file within your repository. Since the crontab is auto-generated during installation, any changes must be followed by a re-run of `submitty_install` to persist them.
 
diff --git a/_docs/sysadmin/troubleshooting/system_debugging.md b/_docs/sysadmin/troubleshooting/system_debugging.md
index 7a09ba28..f7e5e896 100644
--- a/_docs/sysadmin/troubleshooting/system_debugging.md
+++ b/_docs/sysadmin/troubleshooting/system_debugging.md
@@ -47,7 +47,7 @@ redirect_from:
 
  ```
  tail -n 50 /var/local/submitty/site_errors/.log
- ```
+ ```
 
 * Look for errors in the apache log:
 
@@ -62,6 +62,15 @@ redirect_from:
  /var/log/nginx/error.log
  ```
 
+* Look for errors in the daily service outage log
+
+ ```
+ /var/log/services/YYYYMMDD.txt
+ ```
+
+
+
+
 * Check the SSL keys / certificates for apache & nginx.
 
 Look for ssl key & certificate files specified in the enabled `.conf` files for apache & nginx:
From 256c132dd2b99bd7663257721a506fad27b7797e Mon Sep 17 00:00:00 2001
From: Jeffrey Cordero
Date: Wed, 4 Jun 2025 12:56:52 -0400
Subject: [PATCH 3/7] Automated Grading Debugging Documentation

---
 _docs/developer/development_instructions/automated_grading.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/_docs/developer/development_instructions/automated_grading.md b/_docs/developer/development_instructions/automated_grading.md
index 43258f2f..e63aab0f 100644
--- a/_docs/developer/development_instructions/automated_grading.md
+++ b/_docs/developer/development_instructions/automated_grading.md
@@ -124,6 +124,8 @@ To debug new features for autograding, it can be helpful to run
 `submitty_autograding_shipper.py` and `submitty_autograding_worker.py`
 interactively and inspect the output.
 
+_NOTE: A cron job runs hourly to detect autograding shipper/worker outages on both local and remote machines. To avoid interference during debugging, this job should be disabled before proceeding. See [Capture Cron Error Messages](/sysadmin/installation/system_customization#capture-cron-error-messages) for instructions on disabling the script._
+
 To do this:
 
 1. Stop the daemons (on each server, as appropriate)
From 14b08e7e44aaac15eaf45edb46373dfb039e8b13 Mon Sep 17 00:00:00 2001
From: Jeffrey Cordero
Date: Wed, 4 Jun 2025 12:59:55 -0400
Subject: [PATCH 4/7] Improved System Customization Layout

---
 _docs/sysadmin/installation/system_customization.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/_docs/sysadmin/installation/system_customization.md b/_docs/sysadmin/installation/system_customization.md
index cc9e6af5..1e428a05 100644
--- a/_docs/sysadmin/installation/system_customization.md
+++ b/_docs/sysadmin/installation/system_customization.md
@@ -28,14 +28,12 @@ You may want to back up more of `/var/local/submitty` to save configurations and
 
 ## Capture cron error messages
 
-To ensure the reliability of the various Submitty services, such as the WebSocket server, their health status is monitored and restarted hourly via the [sbin/repair_services.sh](https://github.com/Submitty/Submitty/blob/master/sbin/repair_services.sh) script run by the submitty_daemon user. This script leverages `systemctl` along with various health-check utility scripts to verify the active state of these core services, triggering a restart if an inactive state is detected.
+To ensure the reliability of the various Submitty services, such as the WebSocket server, their health status is monitored and restarted hourly via the [sbin/repair_services.sh](https://github.com/Submitty/Submitty/blob/master/sbin/repair_services.sh) script run by the `submitty_daemon` user. This script leverages `systemctl` along with various health-check utility scripts to verify the active state of these services, triggering a restart if an inactive state is detected.
 
 Service failures can occur for various reasons, including unhandled exceptions, memory leaks, port binding issues, or OS-level disruptions such as resource exhaustion. All failures are logged with their relevant timestamp, source, and last output within the `/var/log/services` directory for the given day in the format `YYYYMMDD.txt`.
 
 To disable this auto-repair mechanism, comment out the relevant line in the source `.setup/submitty_crontab` file within your repository. Since the crontab is auto-generated during installation, any changes must be followed by a re-run of `submitty_install` to persist them.
 
-**Note: This mechanism should only be disabled with caution in production environments.**
-
 ```bash
 # In .setup/submitty_crontab, comment out the repair_services.sh line:
 # 0 * * * * submitty_daemon sudo /usr/local/submitty/sbin/repair_services.sh
@@ -44,6 +42,8 @@ To disable this auto-repair mechanism, comment out the relevant line in the sour
 submitty_install
 ```
 
+_Note: This mechanism should only be disabled with caution in production environments._
+
 The `submitty_daemon` user runs a variety of other scripts, such as [sbin/send_email.py](https://github.com/Submitty/Submitty/blob/master/sbin/send_email.py) to send pending emails every minute. Console output from these scripts can be emailed to a sysadmin to help ensure that errors can be reported and addressed.
 
 The first line of the relevant script should be set as `MAILTO=` with a valid email address, as shown below.
From f65c45d725836d63b247880999fcc811488dcb1c Mon Sep 17 00:00:00 2001
From: Jeffrey Cordero
Date: Wed, 4 Jun 2025 13:01:52 -0400
Subject: [PATCH 5/7] Whitespace fix

---
 _docs/sysadmin/troubleshooting/system_debugging.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/_docs/sysadmin/troubleshooting/system_debugging.md b/_docs/sysadmin/troubleshooting/system_debugging.md
index f7e5e896..fab1f1ea 100644
--- a/_docs/sysadmin/troubleshooting/system_debugging.md
+++ b/_docs/sysadmin/troubleshooting/system_debugging.md
@@ -47,7 +47,7 @@ redirect_from:
 
  ```
  tail -n 50 /var/local/submitty/site_errors/.log
- ```
+ ```
 
 * Look for errors in the apache log:
 
@@ -68,9 +68,6 @@ redirect_from:
  /var/log/services/YYYYMMDD.txt
  ```
 
-
-
-
 * Check the SSL keys / certificates for apache & nginx.
 
 Look for ssl key & certificate files specified in the enabled `.conf` files for apache & nginx:
From c8695518c0936753a7372c31dcbe7f30d046d64e Mon Sep 17 00:00:00 2001
From: Jeffrey Cordero
Date: Wed, 4 Jun 2025 13:02:29 -0400
Subject: [PATCH 6/7] More whitespace

---
 _docs/sysadmin/troubleshooting/system_debugging.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_docs/sysadmin/troubleshooting/system_debugging.md b/_docs/sysadmin/troubleshooting/system_debugging.md
index fab1f1ea..db2050aa 100644
--- a/_docs/sysadmin/troubleshooting/system_debugging.md
+++ b/_docs/sysadmin/troubleshooting/system_debugging.md
@@ -47,7 +47,7 @@ redirect_from:
 
  ```
  tail -n 50 /var/local/submitty/site_errors/.log
- ```
+ ```
 
 * Look for errors in the apache log:
 
From 543ea0feda66aeda24795f7a2ad5a35972db590f Mon Sep 17 00:00:00 2001
From: Jeffrey Cordero
Date: Wed, 4 Jun 2025 13:03:57 -0400
Subject: [PATCH 7/7] Refactoring

---
 _docs/sysadmin/installation/system_customization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_docs/sysadmin/installation/system_customization.md b/_docs/sysadmin/installation/system_customization.md
index 1e428a05..21222974 100644
--- a/_docs/sysadmin/installation/system_customization.md
+++ b/_docs/sysadmin/installation/system_customization.md
@@ -44,7 +44,7 @@ submitty_install
 
 _Note: This mechanism should only be disabled with caution in production environments._
 
-The `submitty_daemon` user runs a variety of other scripts, such as [sbin/send_email.py](https://github.com/Submitty/Submitty/blob/master/sbin/send_email.py) to send pending emails every minute. Console output from these scripts can be emailed to a sysadmin to help ensure that errors can be reported and addressed.
+The `submitty_daemon` user runs a variety of other scripts, such as [sbin/send_email.py](https://github.com/Submitty/Submitty/blob/master/sbin/send_email.py) to send pending emails every minute. Console output from these scripts can be emailed to a sysadmin to help ensure that errors can be reported and addressed.
 
 The first line of the relevant script should be set as `MAILTO=` with a valid email address, as shown below.
 ```