docs(gitlab): add comprehensive troubleshooting guide #67
base: master
Changes from all commits
8f1870d
6d9ffb6
f2f9bd1
b007022
d42df70
f721870
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,331 @@ | ||
| # Troubleshooting the GitLab Orka Integration | ||
|
|
||
| This guide covers common issues and solutions when using the GitLab [Custom executor][custom] with [Orka][orka]. | ||
|
|
||
| ## Setting environment variables | ||
|
|
||
| The integration reads configuration from environment variables. There are two ways to set them: | ||
|
|
||
| **Option A — GitLab CI/CD Variables** (available to the build only) | ||
| Go to Settings > CI/CD > Variables and add each variable. This is the right place for variables like `ORKA_TOKEN` and `ORKA_CONFIG_NAME` that the runner scripts need during job execution. | ||
|
|
||
| **Option B — Docker container startup** (available inside the container) | ||
| Pass variables when starting the runner container: | ||
| ``` | ||
| docker run --env ORKA_ENDPOINT="http://10.221.188.20" ... | ||
| ``` | ||
| Use this for variables that need to be present at the container level, like `ORKA_ENDPOINT`. | ||
|
|
||
| For sensitive variables like `ORKA_TOKEN`, enable the [**Masked**][masked-variables] option in GitLab so the value is hidden in job logs. | ||
|
|
||
| For `ORKA_SSH_KEY_FILE`, prefer mounting the key file directly into the runner container (Option B) rather than storing the private key contents as a GitLab variable. | ||
|
|
||
| ## Authentication issues | ||
|
|
||
| ### Error: "unauthorized" or "401" | ||
|
|
||
| **Symptoms:** | ||
| - VM deployment fails with authentication errors | ||
| - `orka3` commands return "unauthorized" | ||
|
|
||
| **Causes:** | ||
| - `ORKA_TOKEN` is invalid or expired | ||
|
|
||
| **Solutions:** | ||
|
|
||
| 1. Generate a new service account token: | ||
| ```bash | ||
| orka3 serviceaccount token <service-account-name> | ||
| ``` | ||
|
|
||
| 2. Update `ORKA_TOKEN` with the new token. See [Setting environment variables](#setting-environment-variables). | ||
|
|
||
| **Note:** Service account tokens are valid for 1 year by default. For a custom duration, use the `--duration` flag. Some Kubernetes control planes (e.g., EKS) do not allow long-lived tokens. In that case, use the `--no-expiration` flag instead. | ||
|
|
||
| ### Error: "config not found" or "no such host" | ||
|
|
||
| **Symptoms:** | ||
| - CLI commands fail before authentication | ||
| - "dial tcp: lookup" errors | ||
|
|
||
| **Causes:** | ||
| - `ORKA_ENDPOINT` is not set or malformed | ||
| - Network connectivity issues to Orka API | ||
|
|
||
| **Solutions:** | ||
|
|
||
| 1. Verify the endpoint format (include protocol, no trailing slash). See [Setting environment variables](#setting-environment-variables). | ||
| ``` | ||
| # Correct format | ||
| http://10.221.188.20 | ||
|
|
||
| # Incorrect formats | ||
| 10.221.188.20 # Missing protocol | ||
| http://10.221.188.20/ # Trailing slash | ||
| ``` | ||
|
|
||
| 2. If the endpoint is correct but commands still fail, see [Runner cannot reach Orka endpoint](#runner-cannot-reach-orka-endpoint) for connectivity troubleshooting. | ||
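The format rules in step 1 can be checked before any job runs. The following is a minimal sketch; `check_endpoint` is a hypothetical helper, not part of the integration, and accepting `https://` alongside `http://` is an assumption.

```bash
# Sketch only: check_endpoint is a hypothetical helper, not part of the integration.
# It applies the two rules above: include a protocol, no trailing slash.
check_endpoint() {
  case "$1" in
    "")                   echo "not set";          return 1 ;;
    */)                   echo "trailing slash";   return 1 ;;
    http://?*|https://?*) echo "ok";               return 0 ;;  # https is an assumption
    *)                    echo "missing protocol"; return 1 ;;
  esac
}

# Usage:
# check_endpoint "${ORKA_ENDPOINT:-}"
```

The check only looks at the string. It does not confirm that the endpoint is reachable.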
|
|
||
| ## VM deployment failures | ||
|
|
||
| ### Error: "VM deployment failed" | ||
|
|
||
| **Symptoms:** | ||
| - `prepare.sh` exits with "VM deployment failed" | ||
| - Deployment attempts exhausted | ||
|
|
||
| **Causes:** | ||
| - `ORKA_CONFIG_NAME` doesn't exist or is misspelled | ||
| - No available nodes with sufficient resources | ||
|
|
||
| **Solutions:** | ||
|
|
||
| 1. If the error says "config does not exist", check the spelling of `ORKA_CONFIG_NAME` in your GitLab CI/CD Variables. Create the config if needed: | ||
| ```bash | ||
| orka3 vm-config create <config-name> --image <image-name> --cpu <count> | ||
| ``` | ||
|
|
||
| 2. Check available node resources: | ||
| ```bash | ||
| orka3 node list | ||
| ``` | ||
|
|
||
| 3. Set `VM_DEPLOYMENT_ATTEMPTS` to your desired retry count. See [Setting environment variables](#setting-environment-variables). | ||
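To see what a retry count means in practice, here is a generic retry loop. It is a sketch under stated assumptions, not the logic the integration's `prepare.sh` actually uses; `retry` is a hypothetical helper and the VM name in the usage line is a placeholder.

```bash
# Sketch only: a generic retry helper, not the integration's own implementation.
# retry ATTEMPTS COMMAND...: run COMMAND until it succeeds or ATTEMPTS runs are used up.
retry() {
  attempts=$1; shift
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n of $attempts failed" >&2
    n=$((n + 1))
  done
  return 1
}

# Usage (assumes orka3 is installed and configured; test-vm is a placeholder name):
# retry "${VM_DEPLOYMENT_ATTEMPTS:-1}" orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json
```

A higher count helps with transient capacity shortages. It does not help when the config name is wrong, because every attempt fails the same way.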
|
|
||
| ### Error: "Invalid ip" or "Invalid port" | ||
|
|
||
| **Symptoms:** | ||
| - VM deploys but connection info extraction fails | ||
| - "Invalid ip: null" in logs | ||
|
|
||
| **Causes:** | ||
| - VM deployment returned unexpected JSON format | ||
| - VM is in a failed state | ||
|
|
||
| **Solutions:** | ||
|
|
||
| This usually indicates a deeper infrastructure issue, not something the troubleshooting steps can directly fix. | ||
|
|
||
| Deploy a VM manually and inspect the JSON output: | ||
| ```bash | ||
| orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json | ||
| ``` | ||
|
|
||
| If the manual deploy also returns unexpected output or fails, contact [MacStadium Support][support] with the full error. | ||
|
|
||
| Delete the test VM after inspection: | ||
| ```bash | ||
| orka3 vm delete test-vm | ||
| ``` | ||
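When inspecting the manual deploy output, it can help to run the extracted values through the same kind of sanity check that produces "Invalid ip" and "Invalid port". The helpers below are a hypothetical approximation, not the integration's own validation, and they assume an IPv4 address.

```bash
# Sketch only: hypothetical helpers that approximate an "Invalid ip" / "Invalid port" check.
is_valid_ip() {
  # Accept dotted-quad IPv4 only; an empty string or the literal "null" fails here.
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
}

is_valid_port() {
  case "$1" in
    ""|*[!0-9]*) return 1 ;;   # empty or non-numeric, e.g. "null"
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Usage, with values copied by hand from the JSON output:
# is_valid_ip "10.221.188.100" && is_valid_port "8822" && echo "connection info looks sane"
```

A value of `null` failing these checks confirms the symptom. The root cause still has to be found in the deployment itself.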
|
|
||
| ## SSH connection issues | ||
|
|
||
| ### Error: "Waited 30 seconds for sshd to start" | ||
|
|
||
| **Symptoms:** | ||
| - VM deploys successfully | ||
| - SSH connection times out after 30 seconds | ||
|
|
||
| **Causes:** | ||
| - SSH is not enabled on the base image | ||
| - SSH key not configured on the VM | ||
| - VM is still booting | ||
|
|
||
| **Solutions:** | ||
|
|
||
| Since the runner automatically deletes failed VMs, deploy a VM manually to troubleshoot: | ||
|
|
||
| 1. Deploy a test VM: | ||
| ```bash | ||
| # Connection details (IP and SSH port) are in the JSON output | ||
| orka3 vm deploy test-debug --config "$ORKA_CONFIG_NAME" -o json | ||
| ``` | ||
|
|
||
| 2. Connect via Screen Sharing or VNC to check: | ||
| - System Preferences > Sharing > Remote Login is enabled (on macOS 13 and later: System Settings > General > Sharing) | ||
| - Your public key is in `~/.ssh/authorized_keys` | ||
|
|
||
| 3. Test SSH manually: | ||
| ```bash | ||
| ssh -i ~/.ssh/orka_deployment_key -p <PORT> admin@<VM_IP> "echo ok" | ||
|
Contributor
The runner deletes the VMs that fail. |
||
| ``` | ||
|
|
||
| 4. Clean up: | ||
| ```bash | ||
| orka3 vm delete test-debug | ||
| ``` | ||
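If the VM is simply still booting, a single manual SSH attempt can be misleading. The sketch below polls a command once per second until it succeeds or a timeout passes. `wait_until` is a hypothetical helper, not the integration's own wait logic, and the `nc -z` usage assumes a netcat build that supports `-z`.

```bash
# Sketch only: wait_until is a hypothetical helper, not the integration's own wait logic.
# wait_until SECONDS COMMAND...: retry COMMAND about once per second until it
# succeeds or roughly SECONDS have passed.
wait_until() {
  remaining=$1; shift
  while [ "$remaining" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 1
}

# Usage (replace the placeholders with the IP and SSH port from the deploy output):
# wait_until 30 nc -z <VM_IP> <PORT> && echo "port is accepting connections"
```

An open port only shows that something is listening. Key and user problems still surface in the manual SSH test from step 3.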
|
|
||
| ### Error: "Permission denied (publickey)" | ||
|
|
||
| **Symptoms:** | ||
| - SSH authentication is rejected | ||
| - "Permission denied" in logs | ||
|
|
||
| **Causes:** | ||
| - SSH key has a passphrase (not supported) | ||
| - Wrong SSH user | ||
| - SSH key not in the VM's `authorized_keys` | ||
|
Contributor
Or the key is wrong |
||
| - SSH key doesn't match the one registered in the VM's `authorized_keys` | ||
|
|
||
| **Solutions:** | ||
|
|
||
| 1. Verify the SSH key has no passphrase: | ||
| ```bash | ||
| # This should NOT prompt for a passphrase | ||
| ssh-keygen -y -f /path/to/key | ||
| ``` | ||
|
|
||
| 2. If the key has a passphrase, generate a new key without one: | ||
| ```bash | ||
| ssh-keygen -t ed25519 -f ~/.ssh/orka_key -N "" | ||
| ``` | ||
|
|
||
| 3. Verify `ORKA_VM_USER` matches the user on the VM (default: `admin`). See [Setting environment variables](#setting-environment-variables). | ||
|
|
||
| 4. Deploy a test VM and verify the public key is in `~/.ssh/authorized_keys`. | ||
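The passphrase check in step 1 is interactive. For scripts, a non-interactive variant can be sketched as follows. `key_has_no_passphrase` is a hypothetical helper; it relies on `ssh-keygen -P ""`, which supplies an empty passphrase so that an encrypted key fails instead of prompting.

```bash
# Sketch only: key_has_no_passphrase is a hypothetical helper.
# Succeeds if the private key at $1 can be read with an empty passphrase.
key_has_no_passphrase() {
  ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1
}

# Usage (the path is an example):
# key_has_no_passphrase /path/to/key || echo "key is encrypted, unreadable, or missing"
```

The function also fails for a missing or malformed key file, so a failure means "not usable as is" rather than strictly "has a passphrase".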
|
|
||
| ## Environment variable issues | ||
|
|
||
| ### Error: "unbound variable" or blank values | ||
|
|
||
| **Symptoms:** | ||
| - Script fails immediately | ||
| - Variables are empty | ||
|
|
||
| **Causes:** | ||
| - Required environment variables not configured in GitLab | ||
|
|
||
| **Solutions:** | ||
|
|
||
| See [Setting environment variables](#setting-environment-variables) for how to configure these. Verify all required variables are set: | ||
|
|
||
| | Variable | Required | Description | | ||
| |----------|----------|-------------| | ||
| | `ORKA_TOKEN` | Yes | Service account token | | ||
| | `ORKA_ENDPOINT` | Yes | Orka API URL | | ||
| | `ORKA_CONFIG_NAME` | Yes | VM config template name | | ||
| | `ORKA_SSH_KEY_FILE` | Yes | Private SSH key contents. Recommended: mount the key file into the runner container rather than storing key contents as a GitLab variable. See [Setting environment variables](#setting-environment-variables). | | ||
| | `ORKA_VM_USER` | No | SSH user (default: `admin`) | | ||
| | `ORKA_VM_NAME_PREFIX` | No | VM name prefix (default: `gl-runner`) | | ||
| | `VM_DEPLOYMENT_ATTEMPTS` | No | Retry count (default: `1`) | | ||
|
|
||
| For sensitive variables stored in GitLab, such as `ORKA_TOKEN`, enable the "Masked" option. For `ORKA_SSH_KEY_FILE`, prefer mounting the key file into the runner container. | ||
|
Contributor
I would not recommend passing the whole file via the GitLab UI, but rather mounting the file inside the container. |
||
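The required variables from the table can be checked in one pass before a job starts. This is a minimal sketch; `check_required_vars` is a hypothetical helper, not part of the integration. It prints only variable names, never values, so secrets stay out of the log.

```bash
# Sketch only: check_required_vars is a hypothetical helper.
# Reports every required variable that is unset or empty; prints names only.
check_required_vars() {
  missing=0
  for name in ORKA_TOKEN ORKA_ENDPOINT ORKA_CONFIG_NAME ORKA_SSH_KEY_FILE; do
    eval "value=\${$name:-}"   # indirect lookup; names come from the fixed list above
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_required_vars || exit 1
```

Reporting all missing names at once avoids fixing one variable per failed job.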
|
|
||
| ## Network and connectivity issues | ||
|
|
||
| ### Runner cannot reach Orka endpoint | ||
|
|
||
| **Symptoms:** | ||
| - "Connection refused" or "Connection timed out" | ||
| - curl to endpoint fails | ||
|
|
||
| **Causes:** | ||
| - Runner is not on the same network as Orka | ||
| - VPN not connected | ||
| - Firewall blocking traffic | ||
|
|
||
| **Solutions:** | ||
|
|
||
| 1. Test connectivity from the runner environment: | ||
| ```bash | ||
| curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/api/v1/cluster-info" | ||
| ``` | ||
|
|
||
| 2. If using VPN, verify your connection using your [IP plan][ip-plan] details. | ||
|
|
||
| 3. For Docker-based runners, ensure the container has network access to the Orka endpoint. | ||
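The `curl` command in step 1 prints only a status code. The sketch below turns that code into a hint. `explain_http_code` is a hypothetical helper and the wording of the hints is an assumption; the one fixed fact is that `curl` reports `000` when it received no HTTP response at all.

```bash
# Sketch only: explain_http_code is a hypothetical helper.
# Maps the status code printed by curl's %{http_code} to a troubleshooting hint.
explain_http_code() {
  case "$1" in
    000)     echo "no response: check network, VPN, and firewall" ;;
    2??)     echo "endpoint reachable" ;;
    401|403) echo "endpoint reachable, request not authorized" ;;
    *)       echo "endpoint reachable, unexpected status $1" ;;
  esac
}

# Usage (--max-time keeps a blocked connection from hanging the check):
# explain_http_code "$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$ORKA_ENDPOINT/api/v1/cluster-info")"
```

Any code other than `000` means the network path works, so the problem lies with the token or the request rather than connectivity.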
|
|
||
| ### IP mapping issues | ||
|
|
||
| **Symptoms:** | ||
| - VM deploys but SSH connects to wrong IP | ||
| - "No route to host" errors | ||
|
|
||
| **Causes:** | ||
| - Private/public IP mismatch | ||
| - `settings.json` not configured for IP mapping | ||
|
|
||
| **Solutions:** | ||
|
|
||
| If your network requires IP mapping, create `/var/custom-executor/settings.json`: | ||
| ```json | ||
| { | ||
| "mappings": [ | ||
| { | ||
| "private_host": "10.221.188.100", | ||
| "public_host": "203.0.113.100" | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
|
|
||
| See [template-settings.md](template-settings.md) for configuration details. | ||
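To confirm that a mapping file contains the entry you expect, you can query it directly. This sketch assumes `jq` is installed and that the file has the shape shown above; `public_host_for` is a hypothetical helper and does not reflect how the integration itself applies the mapping.

```bash
# Sketch only: public_host_for is a hypothetical helper; requires jq.
# Prints the public_host mapped to a private IP, or nothing if there is no entry.
# $1 = private IP, $2 = path to settings.json
public_host_for() {
  jq -r --arg ip "$1" \
    '.mappings[] | select(.private_host == $ip) | .public_host' "$2"
}

# Usage:
# public_host_for 10.221.188.100 /var/custom-executor/settings.json
```

Empty output for the VM's private IP means the file has no mapping for it, which matches the "No route to host" symptom.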
|
|
||
| ## Job execution issues | ||
|
|
||
| ### Build script fails | ||
|
|
||
| **Symptoms:** | ||
| - Job fails during `run.sh` | ||
| - Error is from your CI/CD script, not the integration | ||
|
|
||
| **Note:** The integration distinguishes between: | ||
| - **Build failures**: Your script failed (returns script exit code) | ||
| - **System failures**: Infrastructure failed (returns exit code 1) | ||
|
|
||
| If your build script fails, the issue is in your script, not the integration. Test your script on a standalone Orka VM. | ||
|
|
||
| ### Job hangs or times out | ||
|
|
||
| **Symptoms:** | ||
| - Job runs but never completes | ||
| - GitLab times out the job | ||
|
|
||
| **Causes:** | ||
| - Long-running process without output | ||
| - SSH connection dropped | ||
|
|
||
| **Solutions:** | ||
|
|
||
| 1. For long jobs, add periodic output to prevent GitLab timeout. | ||
|
|
||
| 2. Consider breaking long jobs into smaller stages. | ||
|
|
||
| 3. Increase GitLab job timeout in project settings if needed. | ||
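One way to add periodic output, as suggested in step 1, is to wrap the long-running command in a heartbeat. This is a sketch; `with_heartbeat` is a hypothetical helper and the message format is arbitrary.

```bash
# Sketch only: with_heartbeat is a hypothetical helper.
# with_heartbeat SECONDS COMMAND...: run COMMAND while printing a line every
# SECONDS seconds, then return COMMAND's exit code.
with_heartbeat() {
  interval=$1; shift
  (
    while :; do
      sleep "$interval"
      echo "[heartbeat] still running at $(date -u +%H:%M:%SZ)"
    done
  ) &
  hb_pid=$!
  "$@"
  rc=$?
  kill "$hb_pid" 2>/dev/null   # stop the heartbeat loop
  wait "$hb_pid" 2>/dev/null   # reap it quietly
  return "$rc"
}

# Usage in a job script (the wrapped command is an example):
# with_heartbeat 60 xcodebuild test -scheme MyApp
```

Because the wrapper returns the command's own exit code, a failing build is still reported as a build failure.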
|
|
||
| ## Cleanup issues | ||
|
|
||
| ### Orphaned VMs | ||
|
|
||
| **Symptoms:** | ||
| - VMs remain after job completion | ||
|
|
||
| **Causes:** | ||
| - Runner crashed before cleanup | ||
| - Network issue during cleanup | ||
|
|
||
| **Solutions:** | ||
|
|
||
| Delete orphaned VMs manually: | ||
| ```bash | ||
| orka3 vm delete <vm-name> | ||
| ``` | ||
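When several VMs are left behind, a dry run helps avoid deleting the wrong ones. The sketch below reads VM names, one per line, and prints a delete command for each name that starts with the runner prefix (default `gl-runner`). `print_delete_commands` is a hypothetical helper. How you collect the names, for example by copying them from `orka3 vm list`, is left open, because the output format is not covered here.

```bash
# Sketch only: print_delete_commands is a hypothetical helper.
# Reads VM names on stdin (one per line) and prints a delete command for each
# name that starts with the given prefix. Dry run: nothing is deleted.
print_delete_commands() {
  prefix=$1
  while IFS= read -r name; do
    case "$name" in
      "$prefix"*) echo "orka3 vm delete $name" ;;
    esac
  done
}

# Usage (vm-names.txt is a file you prepare with one VM name per line):
# print_delete_commands "${ORKA_VM_NAME_PREFIX:-gl-runner}" < vm-names.txt
```

Review the printed commands before running them. VMs that belong to jobs still in progress carry the same prefix.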
|
|
||
| ## Getting help | ||
|
|
||
| If you're still experiencing issues: | ||
|
|
||
| 1. Check the [Orka documentation][orka-docs] for platform-specific guidance | ||
| 2. Review GitLab Runner [logs][runner-logs]: `gitlab-runner --debug run` | ||
| 3. Contact [MacStadium Support][support] with: | ||
| - Error messages and logs | ||
| - Environment details (Runner version, Orka version) | ||
| - Steps to reproduce | ||
|
|
||
| [custom]: https://docs.gitlab.com/runner/executors/custom.html | ||
| [orka]: https://support.macstadium.com/hc/en-us/articles/29904434271387-Orka-Overview | ||
| [orka-docs]: https://support.macstadium.com/hc/en-us | ||
| [ip-plan]: https://support.macstadium.com/hc/en-us/articles/28230867289883-IP-Plan | ||
| [masked-variables]: https://docs.gitlab.com/ee/ci/variables/#mask-a-cicd-variable | ||
| [runner-logs]: https://docs.gitlab.com/runner/faq/#how-can-i-get-a-debug-log | ||
| [support]: https://support.macstadium.com/ | ||
There isn't really a solution for this as it is just a symptom. So one needs to be able to find the root cause and fix it.
Usually if the VM information cannot be extracted, this means there are bigger issues.
It is a good suggestion to ask people to try to deploy a VM manually, but we do not give them anything actionable that can fix the issue.