Fix correctness bugs across roles #40

Merged
Oddly merged 2 commits into main from fix/role-bugs on Mar 4, 2026

Conversation

Oddly commented Mar 1, 2026

Fixes several correctness issues found during a deep audit of the role tasks and templates.

- The rolling upgrade trigger checked ansible_facts.packages['elasticsearch'][0].version without first verifying that the elasticsearch key exists, causing a Jinja2 KeyError on fresh installs.
- The repos role used string comparison for distribution_major_version, which evaluates "10" >= "9" as false. Switched to | int.
- The logstash noauto handler was missing the freshstart guard present in all other role handlers.
- The logstash_ident mutate block was duplicated in both the filter and output templates with inconsistent hostname sources. Removed the duplicate from 90-output.conf.j2.
- All three beat templates hardcoded ssl.verification_mode: none. This is now controlled by beats_ssl_verification_mode, which defaults to certificate.
- The ES audit log appender in log4j2.properties was emitted unconditionally, even with security disabled. It is now guarded by elasticsearch_logging_audit and elasticsearch_security.

Closes #33

Oddly force-pushed the fix/role-bugs branch 10 times, most recently from 2fedb00 to 565aa2d on March 4, 2026 15:09
Oddly added 2 commits March 4, 2026 20:06
The rolling upgrade trigger in elasticsearch/tasks/main.yml accessed
ansible_facts.packages['elasticsearch'][0].version without first
checking that the 'elasticsearch' key exists, which throws a Jinja2
KeyError on fresh installs. Added the missing guard to match the
pre-upgrade block at line 168.
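
The guard described above can be illustrated outside Ansible. The following is a minimal Python sketch, assuming the usual package facts shape (a dict mapping package names to lists of entries that carry a version field). The function name installed_version and the sample data are hypothetical, and the role itself expresses this check as a Jinja2 condition rather than Python.

```python
from typing import Optional


def installed_version(packages: dict, name: str = "elasticsearch") -> Optional[str]:
    """Return the first recorded version of `name`, or None if it is not installed."""
    # .get() avoids the failure that packages[name] raises when the key is absent.
    entries = packages.get(name)
    if not entries:
        return None
    return entries[0].get("version")


# Hypothetical facts: a fresh host without the package, and a host that has it.
fresh_host = {"openssl": [{"version": "3.0.7"}]}
upgraded_host = {"elasticsearch": [{"version": "8.12.0"}]}

print(installed_version(fresh_host))
print(installed_version(upgraded_host))
```

On a fresh install the lookup yields None instead of raising, so a caller can skip the rolling upgrade decision entirely.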

The repos role used string comparison for distribution_major_version
which breaks on Rocky Linux 10 ("10" < "9" lexicographically). Switched
to int comparison. Fixed the same pattern in the elasticstack_default
molecule converge.
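
Jinja2 comparisons follow Python semantics here, so the failure mode can be reproduced in plain Python. This is a small sketch, not the role's code; the helper name at_least is hypothetical.

```python
def at_least(major_version: str, minimum: int) -> bool:
    """Compare a major version numerically, as the `| int` filter does in the role."""
    return int(major_version) >= minimum


# String comparison is character by character: "1" sorts before "9".
print("10" >= "9")        # False
# Converting to int first gives the intended numeric result.
print(at_least("10", 9))  # True
```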

The logstash "Restart Logstash noauto" handler was missing the
freshstart guard that all other role handlers have, which could cause
restarts during initial installation.

The logstash_ident mutate block was emitted in both 50-filter.conf and
90-output.conf with inconsistent hostname values (inventory_hostname vs
ansible_facts.hostname). Removed the duplicate from 90-output.conf.

All three beat templates hardcoded ssl.verification_mode: none, silently
disabling certificate validation even when a CA is deployed. Replaced
with a configurable beats_ssl_verification_mode variable defaulting to
"certificate".
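
The defaulting behaviour can be sketched in Python as follows. Only the variable name beats_ssl_verification_mode and its default come from this change; the function name, the role_vars dict, and the validation step are assumptions added for illustration. The list of accepted modes reflects the values Beats documents for ssl.verification_mode.

```python
# Modes documented by Beats for ssl.verification_mode.
VALID_MODES = ("full", "strict", "certificate", "none")


def resolve_verification_mode(role_vars: dict) -> str:
    """Return the configured mode, falling back to "certificate" when unset."""
    mode = role_vars.get("beats_ssl_verification_mode", "certificate")
    # Validation is an illustrative addition, not something this change claims to do.
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported ssl.verification_mode: {mode!r}")
    return mode


print(resolve_verification_mode({}))
print(resolve_verification_mode({"beats_ssl_verification_mode": "none"}))
```

With this shape, disabling verification becomes an explicit opt-in rather than a silent default.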

The ES audit log appender in log4j2.properties was emitted
unconditionally even when elasticsearch_security is false. Wrapped it in
a guard and added an elasticsearch_logging_audit default.

The setup-passwords command makes HTTP calls to ES but had no retry
logic. If ES briefly becomes unavailable between the cluster health
check and the setup-passwords call (e.g. during cert reload), the
whole converge fails. Now retries up to 10 times with 15s delay.

Also writes to a temp file first and moves on success, so a partial
failure doesn't leave a corrupt passwords file that would cause the
creates guard to skip retries.
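
The retry and write-then-move pattern described above can be sketched in Python. This is an illustration under assumptions, not the role's implementation (which is an Ansible task): the function name write_with_retry and the produce callable are hypothetical, and only the 10 retries and 15s delay come from the commit message.

```python
import os
import tempfile
import time
from typing import Callable


def write_with_retry(produce: Callable[[], str], dest: str,
                     retries: int = 10, delay: float = 15.0) -> None:
    """Call produce() until it succeeds, then publish its output to dest atomically."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    last_error = None
    for attempt in range(1, retries + 1):
        # The temp file lives next to dest so os.replace stays on one filesystem.
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
        try:
            with os.fdopen(fd, "w") as tmp:
                tmp.write(produce())
            # dest only appears once the output is complete.
            os.replace(tmp_path, dest)
            return
        except Exception as exc:  # broad on purpose: this is only a sketch
            last_error = exc
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            if attempt < retries:
                time.sleep(delay)
    raise RuntimeError(f"gave up after {retries} attempts") from last_error
```

Because dest never exists in a partial state, a check for the file's existence (like the creates guard) cannot mistake a failed run for a finished one.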
Oddly merged commit 8785acf into main Mar 4, 2026
30 of 32 checks passed
Oddly deleted the fix/role-bugs branch March 4, 2026 19:11

Development

Successfully merging this pull request may close these issues.

Fix bugs found in role tasks and templates