2 changes: 2 additions & 0 deletions GitLab/README.md
@@ -3,4 +3,6 @@
The Orka GitLab integration enables you to use [Orka by MacStadium][orka] in your GitLab CI/CD pipelines.
Learn how to configure the [GitLab Shell executor](shell-executor.md) or the [GitLab Custom executor](custom-executor.md).

For common issues and solutions, see the [Troubleshooting guide](troubleshooting.md).

[orka]: https://www.macstadium.com/orka
331 changes: 331 additions & 0 deletions GitLab/troubleshooting.md
@@ -0,0 +1,331 @@
# Troubleshooting the GitLab Orka Integration

This guide covers common issues and solutions when using the GitLab [Custom executor][custom] with [Orka][orka].

## Setting environment variables

The integration reads configuration from environment variables. There are two ways to set them:

**Option A — GitLab CI/CD Variables** (available to the build only)
Go to Settings > CI/CD > Variables and add each variable. This is the right place for variables like `ORKA_TOKEN` and `ORKA_CONFIG_NAME` that the runner scripts need during job execution.

**Option B — Docker container startup** (available inside the container)
Pass variables when starting the runner container:
```bash
docker run --env ORKA_ENDPOINT="http://10.221.188.20" ...
```
Use this for variables that need to be present at the container level, like `ORKA_ENDPOINT`.

For sensitive variables like `ORKA_TOKEN`, enable the [**Masked**][masked-variables] option in GitLab so the value is hidden in job logs.

For `ORKA_SSH_KEY_FILE`, prefer mounting the key file directly into the runner container (Option B) rather than storing the private key contents as a GitLab variable.

## Authentication issues

### Error: "unauthorized" or "401"

**Symptoms:**
- VM deployment fails with authentication errors
- `orka3` commands return "unauthorized"

**Causes:**
- `ORKA_TOKEN` is invalid or expired

**Solutions:**

1. Generate a new service account token:
```bash
orka3 serviceaccount token <service-account-name>
```

2. Update `ORKA_TOKEN` with the new token. See [Setting environment variables](#setting-environment-variables).

**Note:** Service account tokens are valid for 1 year by default. For a custom duration, use the `--duration` flag. Some Kubernetes control planes (e.g., EKS) do not allow long-lived tokens. In that case, use the `--no-expiration` flag instead.

### Error: "config not found" or "no such host"

**Symptoms:**
- CLI commands fail before authentication
- "dial tcp: lookup" errors

**Causes:**
- `ORKA_ENDPOINT` is not set or malformed
- Network connectivity issues to Orka API

**Solutions:**

1. Verify the endpoint format (include protocol, no trailing slash). See [Setting environment variables](#setting-environment-variables).
```
# Correct format
http://10.221.188.20

# Incorrect formats
10.221.188.20 # Missing protocol
http://10.221.188.20/ # Trailing slash
```

2. If the endpoint is correct but commands still fail, see [Runner cannot reach Orka endpoint](#runner-cannot-reach-orka-endpoint) for connectivity troubleshooting.
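
The format rules above can be checked before any CLI call is made. The following is a minimal sketch, not part of the integration: `check_endpoint_format` is a hypothetical helper name, and it only checks the protocol prefix and the trailing slash described in step 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the integration): reports why an
# endpoint string does not match the expected format, or prints "ok".
check_endpoint_format() {
  local endpoint="${1:-}"
  if [[ -z "$endpoint" ]]; then
    echo "empty"
    return 1
  fi
  # The value must start with an explicit protocol.
  if [[ ! "$endpoint" =~ ^https?:// ]]; then
    echo "missing protocol"
    return 1
  fi
  # The value must not end with a slash.
  if [[ "$endpoint" == */ ]]; then
    echo "trailing slash"
    return 1
  fi
  echo "ok"
}
```

Calling `check_endpoint_format "$ORKA_ENDPOINT"` in the runner environment gives a quick hint about which of the two format problems applies, if any.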

## VM deployment failures

### Error: "VM deployment failed"

**Symptoms:**
- `prepare.sh` exits with "VM deployment failed"
- Deployment attempts exhausted

**Causes:**
- `ORKA_CONFIG_NAME` doesn't exist or is misspelled
- No available nodes with sufficient resources

**Solutions:**

1. If the error says "config does not exist", check the spelling of `ORKA_CONFIG_NAME` in your GitLab CI/CD Variables. Create the config if needed:
```bash
orka3 vm-config create <config-name> --image <image-name> --cpu <count>
```

2. Check available node resources:
```bash
orka3 node list
```

3. Set `VM_DEPLOYMENT_ATTEMPTS` to your desired retry count. See [Setting environment variables](#setting-environment-variables).
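
To illustrate what a retry count means in practice, here is a minimal sketch of a retry loop. It is an assumption about the general pattern, not the integration's actual `prepare.sh` logic, and `retry` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times and stop at the first
# success. Returns 1 if every attempt fails.
retry() {
  local attempts="$1"
  shift
  local n
  for (( n = 1; n <= attempts; n++ )); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${n}/${attempts} failed" >&2
  done
  return 1
}

# Example (assumes the orka3 CLI is installed and configured):
#   retry "${VM_DEPLOYMENT_ATTEMPTS:-1}" \
#     orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json
```

With the default of `1`, a single failed deployment fails the job, so a higher value mainly helps when node capacity is only temporarily exhausted.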

### Error: "Invalid ip" or "Invalid port"

**Symptoms:**
- VM deploys but connection info extraction fails
- "Invalid ip: null" in logs

**Causes:**
- VM deployment returned unexpected JSON format
- VM is in a failed state

**Solutions:**

> **Review comment (Contributor):** There isn't really a solution for this, as it is only a symptom; you need to find and fix the root cause. If the VM information cannot be extracted, there are usually bigger issues. Suggesting a manual VM deployment is good, but the guide does not give readers anything actionable that fixes the issue.

This usually indicates a deeper infrastructure issue, not something the troubleshooting steps can directly fix.

Deploy a VM manually and inspect the JSON output:
```bash
orka3 vm deploy test-vm --config "$ORKA_CONFIG_NAME" -o json
```

If the manual deploy also returns unexpected output or fails, contact [MacStadium Support][support] with the full error.

Delete the test VM after inspection:
```bash
orka3 vm delete test-vm
```
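
When inspecting the JSON output by hand, it can help to apply the same kind of sanity check that produces messages such as "Invalid ip: null". The sketch below is an assumption about what such a check might look like, not the integration's code; `is_valid_ip` and `is_valid_port` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical validators for the connection details copied from the
# deploy output. Both return 0 for a plausible value and 1 otherwise.
is_valid_ip() {
  local ip="${1:-}" octet
  # Four dot-separated groups of 1 to 3 digits.
  [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so that a leading zero is not read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
}

is_valid_port() {
  local port="${1:-}"
  [[ "$port" =~ ^[0-9]{1,5}$ ]] || return 1
  (( 10#$port >= 1 && 10#$port <= 65535 ))
}
```

A literal `null` fails both checks, which points at the deployment itself rather than at the parsing step.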

## SSH connection issues

### Error: "Waited 30 seconds for sshd to start"

**Symptoms:**
- VM deploys successfully
- SSH connection times out after 30 seconds

**Causes:**
- SSH is not enabled on the base image
- SSH key not configured on the VM
- VM is still booting

**Solutions:**

Since the runner automatically deletes failed VMs, deploy a VM manually to troubleshoot:

1. Deploy a test VM:
```bash
# Connection details (IP and SSH port) are in the JSON output
orka3 vm deploy test-debug --config "$ORKA_CONFIG_NAME" -o json
```

2. Connect via Screen Sharing or VNC to check:
- Remote Login is enabled (System Settings > General > Sharing on macOS 13 and later; System Preferences > Sharing on earlier versions)
- Your public key is in `~/.ssh/authorized_keys`

3. Test SSH manually:
```bash
ssh -i ~/.ssh/orka_deployment_key -p <PORT> admin@<VM_IP> "echo ok"
```

> **Review comment (Contributor):** The runner deletes VMs that fail, so this section needs to suggest deploying a VM manually and trying to connect to it.

4. Clean up:
```bash
orka3 vm delete test-debug
```
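
Because a VM that is still booting can look the same as one with SSH disabled, it may help to poll for a while before drawing conclusions. The following is a minimal sketch of a polling loop with a timeout, similar in spirit to the 30-second wait named in the error; `wait_until` is a hypothetical helper, not the integration's implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-run a probe command once per second until it
# succeeds or the timeout (in seconds) is reached.
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  until "$@"; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 0
}

# Example probe (replace the placeholders with values from the deploy output;
# BatchMode and ConnectTimeout are standard OpenSSH client options):
#   wait_until 30 ssh -i ~/.ssh/orka_deployment_key -p PORT \
#     -o BatchMode=yes -o ConnectTimeout=5 admin@VM_IP "echo ok"
```

If the probe succeeds only after a longer wait, the base image is likely slow to boot rather than misconfigured.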

### Error: "Permission denied (publickey)"

**Symptoms:**
- SSH authentication is rejected
- "Permission denied" in logs

**Causes:**
- SSH key has a passphrase (not supported)
- Wrong SSH user
- SSH key not in VM's `authorized_keys`
- SSH key doesn't match the one registered in the VM's `authorized_keys`

> **Review comment (Contributor):** Or the key is wrong.

**Solutions:**

1. Verify the SSH key has no passphrase:
```bash
# This should NOT prompt for a passphrase
ssh-keygen -y -f /path/to/key
```

2. If the key has a passphrase, generate a new one without:
```bash
ssh-keygen -t ed25519 -f ~/.ssh/orka_key -N ""
```

3. Verify `ORKA_VM_USER` matches the user on the VM (default: `admin`). See [Setting environment variables](#setting-environment-variables).

4. Deploy a test VM and verify the public key is in `~/.ssh/authorized_keys`.
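
To check for a key mismatch, the public key derived from the runner's private key can be compared with the entry on the VM. The sketch below compares two public key lines by key type and key data and ignores the trailing comment. `same_public_key` is a hypothetical helper, and it assumes the `authorized_keys` entry has no leading options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when two OpenSSH public key lines have the
# same key type and key data. The optional comment field is ignored.
same_public_key() {
  local a_type a_blob b_type b_blob _rest
  read -r a_type a_blob _rest <<< "$1"
  read -r b_type b_blob _rest <<< "$2"
  [[ -n "$a_blob" && "$a_type" == "$b_type" && "$a_blob" == "$b_blob" ]]
}

# Example: derive the public key locally, then compare it with the line
# copied from ~/.ssh/authorized_keys on the test VM.
#   same_public_key "$(ssh-keygen -y -f /path/to/key)" "ssh-ed25519 AAAA... admin@vm"
```

A mismatch here means the runner is presenting a different key from the one the image trusts, even if both files exist.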

## Environment variable issues

### Error: "unbound variable" or blank values

**Symptoms:**
- Script fails immediately
- Variables are empty

**Causes:**
- Required environment variables not configured in GitLab

**Solutions:**

See [Setting environment variables](#setting-environment-variables) for how to configure these. Verify all required variables are set:

| Variable | Required | Description |
|----------|----------|-------------|
| `ORKA_TOKEN` | Yes | Service account token |
| `ORKA_ENDPOINT` | Yes | Orka API URL |
| `ORKA_CONFIG_NAME` | Yes | VM config template name |
| `ORKA_SSH_KEY_FILE` | Yes | Private SSH key contents. Recommended: mount the key file into the runner container rather than storing key contents as a GitLab variable. See [Setting environment variables](#setting-environment-variables). |
| `ORKA_VM_USER` | No | SSH user (default: `admin`) |
| `ORKA_VM_NAME_PREFIX` | No | VM name prefix (default: `gl-runner`) |
| `VM_DEPLOYMENT_ATTEMPTS` | No | Retry count (default: `1`) |

For sensitive variables like `ORKA_TOKEN` and `ORKA_SSH_KEY_FILE`, enable the "Masked" option.
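
A quick way to find which required variable is missing is to check all four at once. This is a minimal sketch under the assumption that it runs in the same environment as the runner scripts; `check_required_vars` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints every required variable that is unset or
# empty and returns 1 if at least one is missing.
check_required_vars() {
  local name missing=0
  for name in ORKA_TOKEN ORKA_ENDPOINT ORKA_CONFIG_NAME ORKA_SSH_KEY_FILE; do
    # ${!name:-} reads the variable whose name is stored in $name.
    if [[ -z "${!name:-}" ]]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The helper prints only variable names, never values, so its output is safe to leave in job logs.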

> **Review comment (Contributor):** I would not recommend passing the whole file via the GitLab UI; mount the file inside the container instead.

## Network and connectivity issues

### Runner cannot reach Orka endpoint

**Symptoms:**
- "Connection refused" or "Connection timed out"
- curl to endpoint fails

**Causes:**
- Runner is not on the same network as Orka
- VPN not connected
- Firewall blocking traffic

**Solutions:**

1. Test connectivity from the runner environment:
```bash
curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/api/v1/cluster-info"
```

2. If using VPN, verify your connection using your [IP plan][ip-plan] details.

3. For Docker-based runners, ensure the container has network access to the Orka endpoint.
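
The `curl` command in step 1 prints only a status code, which can be hard to interpret on its own. The sketch below maps that code to a rough diagnosis. The wording of the messages is an assumption, and `describe_http_code` is a hypothetical helper; the one firm fact is that `curl` reports `000` when it receives no HTTP response at all.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turns the value printed by curl's %{http_code}
# into a short hint about where to look next.
describe_http_code() {
  case "${1:-}" in
    000) echo "no HTTP response: check network path, VPN, or firewall" ;;
    401|403) echo "endpoint reachable, request not authorized" ;;
    2??|3??) echo "endpoint reachable" ;;
    *) echo "endpoint reachable, unexpected status ${1:-}" ;;
  esac
}

# Example (uses the same curl call as step 1):
#   describe_http_code "$(curl -s -o /dev/null -w "%{http_code}" "$ORKA_ENDPOINT/api/v1/cluster-info")"
```

Any code other than `000` means the runner reached a server, so the remaining problem is more likely authentication or configuration than networking.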

### IP mapping issues

**Symptoms:**
- VM deploys but SSH connects to wrong IP
- "No route to host" errors

**Causes:**
- Private/public IP mismatch
- `settings.json` not configured for IP mapping

**Solutions:**

If your network requires IP mapping, create `/var/custom-executor/settings.json`:
```json
{
  "mappings": [
    {
      "private_host": "10.221.188.100",
      "public_host": "203.0.113.100"
    }
  ]
}

See [template-settings.md](template-settings.md) for configuration details.

## Job execution issues

### Build script fails

**Symptoms:**
- Job fails during `run.sh`
- Error is from your CI/CD script, not the integration

**Note:** The integration distinguishes between:
- **Build failures**: Your script failed (returns script exit code)
- **System failures**: Infrastructure failed (returns exit code 1)

If your build script fails, the issue is in your script, not the integration. Test your script on a standalone Orka VM.

### Job hangs or times out

**Symptoms:**
- Job runs but never completes
- GitLab times out the job

**Causes:**
- Long-running process without output
- SSH connection dropped

**Solutions:**

1. For long jobs, add periodic output to prevent GitLab timeout.

2. Consider breaking long jobs into smaller stages.

3. Increase GitLab job timeout in project settings if needed.
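
One way to add periodic output, as step 1 suggests, is to wrap the quiet command in a background heartbeat. This is a minimal sketch rather than something the integration provides; `with_heartbeat` is a hypothetical helper, and the message text is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: runs a command while a background loop prints a
# line every INTERVAL seconds, then returns the command's exit status.
with_heartbeat() {
  local interval="$1"
  shift
  (
    # On TERM, stop the pending sleep so no stray process is left behind.
    trap 'kill "$sleep_pid" 2>/dev/null; exit 0' TERM
    while true; do
      sleep "$interval" &
      sleep_pid=$!
      wait "$sleep_pid"
      echo "[heartbeat] job still running at $(date -u +%H:%M:%S)"
    done
  ) &
  local hb_pid=$!
  local status=0
  "$@" || status=$?
  kill "$hb_pid" 2>/dev/null
  wait "$hb_pid" 2>/dev/null
  return "$status"
}

# Example in a job script (the build command is a placeholder):
#   with_heartbeat 60 ./long-running-build.sh
```

The wrapper forwards the wrapped command's exit code, so a failing build still fails the job.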

## Cleanup issues

### Orphaned VMs

**Symptoms:**
- VMs remain after job completion

**Causes:**
- Runner crashed before cleanup
- Network issue during cleanup

**Solutions:**

Delete orphaned VMs manually:
```bash
orka3 vm delete <vm-name>
```
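
When several VMs are left behind, it can be safer to generate the delete commands first and review them before running anything. The sketch below reads VM names from standard input, one per line, and prints a delete command for each name that starts with a given prefix. `print_delete_commands` is a hypothetical helper, and how you obtain the list of names is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads VM names on stdin and prints (does not run)
# an "orka3 vm delete" command for every name starting with PREFIX.
print_delete_commands() {
  local prefix="$1" name
  while IFS= read -r name; do
    if [[ "$name" == "$prefix"* ]]; then
      printf 'orka3 vm delete %s\n' "$name"
    fi
  done
}

# Example, using the default ORKA_VM_NAME_PREFIX and a file of VM names
# that you prepared (vm-names.txt is a placeholder):
#   print_delete_commands "gl-runner" < vm-names.txt
```

A prefix match also catches VMs that belong to jobs still running, so review the printed commands before executing them.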

## Getting help

If you're still experiencing issues:

1. Check the [Orka documentation][orka-docs] for platform-specific guidance
2. Review GitLab Runner [logs][runner-logs]: `gitlab-runner --debug run`
3. Contact [MacStadium Support][support] with:
- Error messages and logs
- Environment details (Runner version, Orka version)
- Steps to reproduce

[custom]: https://docs.gitlab.com/runner/executors/custom.html
[orka]: https://support.macstadium.com/hc/en-us/articles/29904434271387-Orka-Overview
[orka-docs]: https://support.macstadium.com/hc/en-us
[ip-plan]: https://support.macstadium.com/hc/en-us/articles/28230867289883-IP-Plan
[masked-variables]: https://docs.gitlab.com/ee/ci/variables/#mask-a-cicd-variable
[runner-logs]: https://docs.gitlab.com/runner/faq/#how-can-i-get-a-debug-log
[support]: https://support.macstadium.com/