Unit Test Environment Isolation Gaps #3480

@brianaydemir

Tests were run inside the dev container with two non-default characteristics:

  1. /etc/pelican/pelican.yaml is configured for a local test federation:

    Federation:
      DiscoveryUrl: https://discovery:8444
      DirectorUrl:  https://director:8444
      RegistryUrl:  https://registry:8444
    Server:
      TLSKey:              /certs/tls.key
      TLSCertificateChain: /certs/tls.crt
  2. The container runs as a non-root user (the dev container image default is root).

Tests that do not override these settings inherit ambient values and fail. Most failures are isolation problems; TestUpdateCert also contains a real test bug.


1. Server.Hostname = "dev" (container hostname)

Affects:
All tests that call config.InitServer or fed_test_utils.NewFedTest without overriding Server.Hostname.

Symptom:

x509: certificate is valid for localhost, …, not dev

Server.Hostname is not set in any config file. Pelican defaults to the system hostname, which is dev. The test TLS certificate does not cover dev.

Fix:
Set Server.Hostname = "localhost" in the shared initServerForTest helper (director/test_helpers_test.go) and in NewFedTest (fed_test_utils/fed.go).
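The mismatch behind this symptom can be reproduced with only the Go standard library. The sketch below is illustrative and does not use Pelican's test certificate: it creates a throwaway self-signed certificate whose only DNS name is localhost and checks it against both hostnames. The helper name newCertFor is hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCertFor returns a parsed, self-signed certificate whose only DNS
// subject alternative name is name. (Hypothetical helper for illustration.)
func newCertFor(name string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: name},
		DNSNames:     []string{name},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	// Self-signed: the template is also the parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newCertFor("localhost")
	if err != nil {
		panic(err)
	}
	// The first check succeeds (nil error); the second returns a
	// hostname-mismatch error, as in the symptom above.
	fmt.Println("localhost:", cert.VerifyHostname("localhost"))
	fmt.Println("dev:      ", cert.VerifyHostname("dev"))
}
```

Pinning Server.Hostname to a name the test certificate covers sidesteps the second case regardless of what the container's hostname is.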


2. Federation URLs from /etc/pelican/pelican.yaml

Affects:

  • TestCompareMetadata/disabled-when-no-discovery-url (director/metadata_comparison_test.go)
  • TestParseRemoteAsPUrl/test_valid_path_that_falls_back_to_configured_director_for_discovery (client/main_test.go)
  • TestGetCacheHostnameFromToken (broker/token_utils_test.go)
  • TestInitServerUrl (config/config_test.go)

Symptoms:

dial tcp: lookup discovery on …:53: no such host
Token issuer https://your-registry.com/… doesn't start with https://registry:8444/…
expected: "https://example.com"  actual: "https://director:8444"

pelican.yaml sets Federation.DiscoveryUrl, DirectorUrl, and RegistryUrl to the test-federation hosts. Tests that don't clear or override these params pick them up from Viper. The hostnames discovery, director, and registry are not resolvable from within the dev container.

Fix:
Clear or override all Federation.*Url params in each test (or in a shared setup function) before calling InitFederation or any function that reads them from Viper.


3. TLS paths from /etc/pelican/pelican.yaml (non-root)

Affects:

  • TestS3OriginConfig (xrootd/origin_test.go)
  • TestCopyCertificates (xrootd/xrootd_config_test.go)

Symptoms:

RSA type private key in PKCS #8 form is not allowed for /certs/tls.key.
Use an ECDSA key instead.
rename /certs/tls.crt /certs/tls.crt.orig: permission denied

pelican.yaml points Server.TLSKey and Server.TLSCertificateChain at /certs/. When the tests run as non-root, those files are not writable. In addition, /certs/tls.key is an RSA key, but the code requires ECDSA.

Fix:
Generate a temporary ECDSA key/cert pair in t.TempDir() and override both Server.TLSKey and Server.TLSCertificateChain before calling InitServer.
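A minimal sketch of the key/cert generation step, using only the standard library. The helper names writeECDSAPair and demo are hypothetical, and the key is written in SEC 1 ("EC PRIVATE KEY") form, which is an assumption; the encoding Pelican expects should be checked against its own key-loading code. In a real test the directory would come from t.TempDir(), and the returned paths would then be used to override Server.TLSKey and Server.TLSCertificateChain.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"path/filepath"
	"time"
)

// writeECDSAPair writes a self-signed ECDSA key and certificate for
// "localhost" into dir and returns the key path and the certificate path.
// (Hypothetical helper; in a real test, dir would be t.TempDir().)
func writeECDSAPair(dir string) (string, string, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return "", "", err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "localhost"},
		DNSNames:     []string{"localhost"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	certDER, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return "", "", err
	}
	// Assumption: SEC 1 encoding is acceptable to the code under test.
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return "", "", err
	}
	keyPath := filepath.Join(dir, "tls.key")
	certPath := filepath.Join(dir, "tls.crt")
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: certDER})
	if err := os.WriteFile(keyPath, keyPEM, 0o600); err != nil {
		return "", "", err
	}
	if err := os.WriteFile(certPath, certPEM, 0o644); err != nil {
		return "", "", err
	}
	return keyPath, certPath, nil
}

// demo generates a pair in a throwaway directory and confirms that the
// standard library can load it back as a matching key/cert pair.
func demo() error {
	dir, err := os.MkdirTemp("", "pelican-test-tls")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	keyPath, certPath, err := writeECDSAPair(dir)
	if err != nil {
		return err
	}
	_, err = tls.LoadX509KeyPair(certPath, keyPath)
	return err
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("generated and loaded a temporary ECDSA key/cert pair")
}
```

Because the files live in a per-test directory owned by the current user, the rename in TestCopyCertificates no longer depends on write access to /certs/.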


4. TestMultiuserFileSystem_BasicOperations requires CAP_SETGID

Symptom:

failed to set supplementary groups: setgroups(0): operation not permitted

runAsUser unconditionally calls threadSetgroups, even when secondaryGIDs is nil (converted to []uint32{}), making a setgroups(0) syscall that requires CAP_SETGID. The test's os.Getuid() == 0 guard does not prevent this: when running as non-root, the non-root code path still reaches runAsUser.

Fix:
Probe for CAP_SETGID (or attempt a dry-run setgroups) at test start and call t.Skip if the capability is absent.
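One possible shape for the capability probe, sketched with only the standard library: read the CapEff bitmask from /proc/self/status and test the CAP_SETGID bit (capability number 6 on Linux). The names hasEffectiveCap and capSetgid are hypothetical. Note the assumption that the process-level mask in /proc/self/status matches the thread that ends up calling setgroups; a test could read /proc/thread-self/status instead if that matters.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capSetgid is the Linux capability number for CAP_SETGID.
const capSetgid = 6

// hasEffectiveCap reports whether the CapEff line in the given
// /proc/<pid>/status text has the bit for capability number capNum set.
func hasEffectiveCap(status string, capNum uint) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "CapEff:") {
			continue
		}
		// The value is a hexadecimal bitmask, e.g. "0000000000000040".
		field := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
		mask, err := strconv.ParseUint(field, 16, 64)
		if err != nil {
			return false, err
		}
		return mask&(uint64(1)<<capNum) != 0, nil
	}
	return false, errors.New("no CapEff line found")
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	ok, err := hasEffectiveCap(string(data), capSetgid)
	fmt.Println("CAP_SETGID effective:", ok, "err:", err)
}
```

In the test itself, a false result (or an error) would lead to t.Skip with a message naming the missing capability, so the skip reason is visible in the test output.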


5. TestUpdateCert: isolation gap + real test bug

Symptom:
The entire web_ui package times out after 10 minutes.

Two independent problems:

  1. Isolation gap: same as issue 3. pelican.yaml points at /certs/tls.crt, which is not writable under non-root, so the cert-update path fails. Fix: override both TLS params to writable temp files.

  2. Real test bug: after the cert-path failure, egrp.Wait() blocks indefinitely because a goroutine is stuck sending on an unbuffered doneChan that no one drains. Fix: buffer doneChan (capacity 1), or use close-based signaling.
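The hang in the second problem can be shown in isolation. This sketch is not the web_ui test code: it substitutes sync.WaitGroup for the errgroup, and the names waitForWorker and demoTimeout are hypothetical. A goroutine sends once on doneChan while nothing ever receives; with capacity 0 the wait never completes, and with capacity 1 the send lands in the buffer and the goroutine exits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// demoTimeout bounds how long the demo waits before declaring a hang.
const demoTimeout = 200 * time.Millisecond

// waitForWorker starts a goroutine that sends one value on doneChan, never
// receives from the channel, and reports whether the goroutine finished
// before the timeout elapsed.
func waitForWorker(capacity int, timeout time.Duration) bool {
	doneChan := make(chan struct{}, capacity)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// With capacity 0 this send blocks until someone receives,
		// which never happens here.
		doneChan <- struct{}{}
	}()

	// Turn wg.Wait into something a select can time out on.
	finished := make(chan struct{})
	go func() {
		wg.Wait()
		close(finished)
	}()

	select {
	case <-finished:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	fmt.Println("unbuffered finished:", waitForWorker(0, demoTimeout)) // false
	fmt.Println("buffered finished:  ", waitForWorker(1, demoTimeout)) // true
}
```

A capacity of 1 is enough only when the goroutine sends at most once; close-based signaling avoids that constraint because closing a channel never blocks.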


Brian A: With a tip of the hat to Copilot.

Labels: bug, test
