Role: elasticsearch
Ansible role for installing, configuring, and managing Elasticsearch. Handles cluster formation, TLS certificate management, security setup (users, passwords, HTTPS), rolling upgrades (8.x to 9.x), JVM tuning, and systemd service management.
In a full-stack deployment, this role should run second (after repos). It initializes the certificate authority, generates passwords for the elastic superuser, and creates the cluster that all other roles depend on. The first host in the elasticsearch inventory group becomes the CA host and the initial master-eligible node.
```mermaid
graph TD
    PRE[Gather package facts] --> CHK{Targeting 9.x<br/>with 8.x < 8.19?}
    CHK -->|Yes| FAIL([FAIL: upgrade path<br/>violation])
    CHK -->|No| A[Include shared defaults]
    A --> B[Install package]
    B --> C{First install?}
    C -->|Yes| D[Generate CA + node certs]
    C -->|No| E{Cert expiring?}
    E -->|Yes| D
    E -->|No| F[Write elasticsearch.yml]
    D --> F
    F --> G[Configure JVM heap + options]
    G --> H[Start service]
    H --> I{Security enabled?}
    I -->|Yes & first run| J[Setup passwords]
    I -->|No| K[Wait for cluster health]
    J --> K
    K --> L{Rolling upgrade?}
    L -->|Yes| M[Upgrade node-by-node]
    L -->|No| N([Done])
    M --> N
    style FAIL fill:#f44336,stroke:#333,color:#fff
    style CHK fill:#2196f3,stroke:#333,color:#fff
    style D fill:#ffd700,stroke:#333
    style J fill:#e8478b,stroke:#333,color:#fff
    style M fill:#ff9800,stroke:#333,color:#fff
```
- Minimum Ansible version: 2.18
- The `repos` role must run first to configure package repositories
Whether to enable and start the Elasticsearch service.

```yaml
elasticsearch_enable: true # default
```

Let the role manage elasticsearch.yml. Set to false if you manage the configuration file externally (e.g. through a separate template or config management tool).

```yaml
elasticsearch_manage_yaml: true # default
```

Create a timestamped backup of elasticsearch.yml before overwriting it.

```yaml
elasticsearch_config_backup: false # default
```

Name of the Elasticsearch cluster. All nodes with the same cluster name will join the same cluster.

```yaml
elasticsearch_clustername: elasticsearch # default
```

Enable machine learning features on this node. Set to false on dedicated data-only or coordinating-only nodes, or if ML is not licensed.

```yaml
elasticsearch_ml_enabled: true # default
```

Hostname or IP used for Elasticsearch API health checks during convergence. The role polls this address to verify the node is up before proceeding.

```yaml
elasticsearch_api_host: localhost # default
```

Protocol for the Elasticsearch HTTP API. The role automatically overrides this to `https` when security is enabled, so you typically don't set this manually.
```yaml
elasticsearch_http_protocol: http # default
```

Filesystem path for Elasticsearch data storage (indices, shards).

```yaml
elasticsearch_datapath: /var/lib/elasticsearch # default
```

Create the data directory if it does not exist. Enable this when using a non-default data path that might not be pre-created by the package.

```yaml
elasticsearch_create_datapath: false # default
```

Filesystem path for Elasticsearch log files.

```yaml
elasticsearch_logpath: /var/log/elasticsearch # default
```

Create the log directory if it does not exist.

```yaml
elasticsearch_create_logpath: false # default
```

Path to the Elasticsearch configuration directory.

```yaml
elasticsearch_conf_dir: /etc/elasticsearch/ # default
```

Linux group that owns Elasticsearch files.
```yaml
elasticsearch_group: elasticsearch # default
```

JVM heap size in GB. Auto-calculated as half of system RAM, capped at 30 GB, with a minimum of 1 GB. Override this to set an explicit value (e.g. "4" for 4 GB).

```yaml
# default: auto-calculated
elasticsearch_heap: "{{ [[(ansible_facts.memtotal_mb // 1024) // 2, 30] | min, 1] | max }}"
```

```yaml
# explicit override
elasticsearch_heap: 16
```

Enable heap calculation verification: prints the calculated heap value during the run for debugging.
```yaml
elasticsearch_check_calculation: false # default
```

Directory for JVM heap dump files on OutOfMemoryError.

```yaml
elasticsearch_heap_dump_path: /var/lib/elasticsearch # default
```

Additional JVM parameters appended to jvm.options.d/. Use for GC tuning, debug flags, or system property overrides. Each line becomes a separate JVM option.

```yaml
elasticsearch_jvm_custom_parameters: '' # default
```

Example:

```yaml
elasticsearch_jvm_custom_parameters: |
  -XX:+HeapDumpOnOutOfMemoryError
  -Djava.io.tmpdir=/var/tmp/elasticsearch
```

Configure PAM limits for the elasticsearch user (open files, max processes). Required for production deployments.
```yaml
elasticsearch_pamlimits: true # default
```

Apply a JNA temporary directory workaround for systems where /tmp is mounted with noexec. Sets jna.tmpdir to a directory under the Elasticsearch data path.

```yaml
elasticsearch_jna_workaround: false # default
```

Enable Elasticsearch security features: TLS for transport and HTTP, user authentication, and RBAC. This is the main security toggle. When it is enabled, the role generates certificates, initializes passwords, and configures HTTPS.

```yaml
elasticsearch_security: true # default
```

Enable TLS on the Elasticsearch HTTP interface (port 9200). Only relevant when elasticsearch_security is also true.

```yaml
elasticsearch_http_security: true # default
```

Bootstrap password for the elastic superuser during initial cluster setup. Only used once, during the very first security initialization. After that, the generated password (stored in elasticstack_initial_passwords) takes over.

```yaml
elasticsearch_bootstrap_pw: PleaseChangeMe # default
```

TLS verification mode for inter-node transport communication. Use `full` to verify both the certificate chain and the hostname, `certificate` to verify only the chain, or `none` to disable verification (not recommended).

```yaml
elasticsearch_ssl_verification_mode: full # default
```

Passphrase protecting the Elasticsearch node's TLS private key. Each node should ideally have a unique passphrase in production.

```yaml
elasticsearch_tls_key_passphrase: PleaseChangeMeIndividually # default
```

Marker file that indicates the cluster has been initialized. The role checks for this file to avoid re-running security setup on subsequent runs.
```yaml
elasticsearch_initialized_file: "{{ elasticstack_initial_passwords | dirname }}/cluster_initialized" # default
```

Validity period in days for generated node TLS certificates. Default is 3 years (1095 days).

```yaml
elasticsearch_cert_validity_period: 1095 # default
```

Number of days before certificate expiry at which the role triggers automatic renewal.

```yaml
elasticsearch_cert_expiration_buffer: 30 # default
```

Internal flag set by the role when certificates are within the expiration buffer. Do not set this manually.

```yaml
elasticsearch_cert_will_expire_soon: false # default
```

The role validates the upgrade path before any work begins. When elasticstack_release is 9 or higher and Elasticsearch is currently installed, the role checks that the installed version is at least 8.19.0. If it finds an older 8.x version, the play fails immediately, and you must upgrade to 8.19.x first. This matches Elastic's official upgrade requirements.
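The version gate above can be expressed as a small predicate. This is a hypothetical Python sketch, not the role's task code. It assumes that installed versions are plain dotted numbers such as `8.18.3` and that `None` means Elasticsearch is not installed. It parses versions into integer tuples because plain string comparison gets the order wrong: `"8.9.0" > "8.19.0"` is true lexicographically.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "8.18.3" into (8, 18, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def upgrade_path_allowed(target_release: int, installed_version: Optional[str]) -> bool:
    """Return False when a 9.x (or later) target would skip the 8.19.x stepping stone."""
    if target_release < 9 or installed_version is None:
        # Older target, or nothing installed yet: no stepping stone is required.
        return True
    # Anything at or above 8.19.0 (including an existing 9.x) may proceed.
    return parse_version(installed_version) >= (8, 19, 0)


if __name__ == "__main__":
    for installed in (None, "8.9.0", "8.18.3", "8.19.0", "9.0.1"):
        print(installed, upgrade_path_allowed(9, installed))
```

Inside a playbook, the same comparison is normally written with Ansible's built-in `version` test rather than custom code.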
Skip rolling upgrade safety checks (shard allocation disable, synced flush, green health wait) and restart all nodes simultaneously. Only use this in non-production environments where you don't care about data availability during the upgrade.
```yaml
elasticsearch_unsafe_upgrade_restart: false # default
```

The following variables are used internally by the role. Do not set them in your inventory.

Tracks whether this is a fresh installation (package was just installed for the first time).

```yaml
elasticsearch_freshstart:
  changed: false
```

Tracks whether security was just initialized on this run.

```yaml
elasticsearch_freshstart_security:
  changed: false
```

The role validates that you have an odd number of master-eligible nodes. An even number makes split-brain possible. If you define elasticsearch_node_types and the resulting master count is even, the play fails with an error.
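The odd-master rule can be illustrated with a small check over a host-to-roles mapping. This is a hypothetical Python sketch. The dictionary shape is an assumption, and so is the treatment of hosts without elasticsearch_node_types (mapped to `None` here). Those hosts are counted as master-eligible because Elasticsearch assigns all roles, including `master`, when `node.roles` is not set. The role itself works on inventory variables, not on this structure.

```python
from typing import Dict, List, Optional

# Hypothetical input: host name -> elasticsearch_node_types, or None when unset.
NodeTypes = Dict[str, Optional[List[str]]]


def count_master_eligible(node_types: NodeTypes) -> int:
    """Count hosts that can vote in master elections.

    Assumption: a host with no node types keeps Elasticsearch's default roles,
    which include "master".
    """
    return sum(1 for roles in node_types.values() if roles is None or "master" in roles)


def validate_master_count(node_types: NodeTypes) -> int:
    """Raise if the number of master-eligible hosts is even, mirroring the play failure."""
    masters = count_master_eligible(node_types)
    if masters % 2 == 0:
        raise ValueError(f"{masters} master-eligible nodes defined; an odd number is required")
    return masters
```

For example, three hosts tagged `master` pass. Two `master` hosts plus one `data` host fail, just as the play would.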
The default heap formula is min(max(memtotal_mb / 1024 / 2, 1), 30) — half of system RAM in GB, floored at 1 GB, capped at 30 GB. The 30 GB cap follows Elastic's recommendation to stay below the compressed ordinary object pointers (oops) threshold. Set elasticsearch_check_calculation: true to print the calculated value during a run without making changes.
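To check which heap a given host will receive, the Jinja default for elasticsearch_heap can be translated directly into Python. This sketch exists only to reason about the numbers; the role does not run it. The Jinja expression applies the 30 GB cap before the 1 GB floor, while the formula above applies them in the opposite order. The results are identical because the floor is below the cap.

```python
def default_heap_gb(memtotal_mb: int) -> int:
    """Python translation of the role's default Jinja expression:

    [[(memtotal_mb // 1024) // 2, 30] | min, 1] | max
    """
    half_ram_gb = (memtotal_mb // 1024) // 2  # both integer divisions round down
    capped = min(half_ram_gb, 30)  # stay below the compressed-oops threshold
    return max(capped, 1)  # never go below 1 GB


if __name__ == "__main__":
    for mem_mb in (1536, 3072, 16384, 65536):
        print(f"{mem_mb} MB RAM -> {default_heap_gb(mem_mb)} GB heap")
```

Because both divisions round down, a 3 GB host ends up with a 1 GB heap, and any host with 60 GB or more of RAM hits the 30 GB cap.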
The role sets nofile=65535 for the elasticsearch user via PAM (/etc/security/limits.d/). This is required for production but was historically unreliable in the RPM post-install scripts. Controlled by elasticsearch_pamlimits (default true).
On systems where /tmp is mounted with noexec, Java Native Access fails to load native libraries. Set elasticsearch_jna_workaround: true to redirect JNA's temp directory to {{ elasticsearch_datapath }}/tmp via the sysconfig file (/etc/default/elasticsearch on Debian, /etc/sysconfig/elasticsearch on RedHat).
Elasticsearch 8.19+ and 9.x use Type=notify in their systemd unit, relying on a systemd-entrypoint binary to send READY=1. In container environments (Docker-in-Docker, some LXC setups), the sd_notify socket doesn't work — systemd never receives the ready signal, waits 900 seconds, then kills Elasticsearch even though it's fully operational.
The role detects container environments (virtualization_type in container, docker) and drops in a systemd override that changes Type=exec, bypassing sd_notify entirely. The role's own health-check retries handle readiness detection instead.
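The container workaround comes down to one decision and a two-line drop-in. This is a hypothetical Python sketch of that decision. The detection set is copied from the paragraph above. The drop-in file name is an assumption, since the role's actual file name is not stated here; systemd reads any `*.conf` file in the unit's `.d` directory.

```python
from typing import Optional

# Detection list taken from the paragraph above; the role's exact check may differ.
CONTAINER_VIRT_TYPES = {"container", "docker"}

# Hypothetical drop-in file name inside the standard unit override directory.
DROPIN_PATH = "/etc/systemd/system/elasticsearch.service.d/override.conf"


def systemd_override(virtualization_type: str) -> Optional[str]:
    """Return drop-in contents switching the unit to Type=exec, or None outside containers."""
    if virtualization_type not in CONTAINER_VIRT_TYPES:
        return None
    # With Type=exec, systemd treats the service as started once the main binary
    # has been executed, so it no longer waits for an sd_notify READY=1 message.
    return "[Service]\nType=exec\n"


if __name__ == "__main__":
    print(systemd_override("docker"))
    print(systemd_override("kvm"))
```

systemd only picks up a new or changed drop-in after `systemctl daemon-reload`, so the reload has to happen before the service is restarted.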
The role manages the Elasticsearch keystore (/etc/elasticsearch/elasticsearch.keystore), which holds the bootstrap password and the TLS keystore and truststore passphrases:
- Removes the `autoconfiguration.password_hash` key that ES 8.x writes during package install (it conflicts with the role's bootstrap password)
- Sets `bootstrap.password` for initial security setup
- Sets `xpack.security.http.ssl.keystore.secure_password` and `xpack.security.http.ssl.truststore.secure_password` (when HTTP security is enabled)
- Sets `xpack.security.transport.ssl.keystore.secure_password` and `xpack.security.transport.ssl.truststore.secure_password` (when transport security is enabled)
Each key is only written if its value has changed, and removed if the corresponding security feature is disabled.
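The "write only if changed, remove if disabled" rule is a reconciliation between current and desired state. In this hypothetical Python sketch, the keystore is modeled as a plain dictionary, which is an assumption. A real keystore is managed through the `add` and `remove` subcommands of `elasticsearch-keystore`, and how the role detects a changed value is not described here. A desired value of `None` means the setting should be absent, which covers both `autoconfiguration.password_hash` and the passphrases of disabled security features.

```python
from typing import Dict, List, Optional, Tuple


def plan_keystore_changes(
    current: Dict[str, str], desired: Dict[str, Optional[str]]
) -> List[Tuple[str, str]]:
    """Compare the keystore's current entries with the desired ones.

    `desired` maps a setting name to its value, or to None when the setting
    should be absent (for example because its security feature is disabled).
    Returns ("set", key) or ("remove", key) actions; an empty list means no change.
    """
    actions: List[Tuple[str, str]] = []
    for key, value in desired.items():
        if value is None:
            if key in current:
                actions.append(("remove", key))
        elif current.get(key) != value:
            actions.append(("set", key))
    return actions
```

Running the plan a second time against an already reconciled keystore yields an empty list. That is the idempotence the role relies on to avoid needless restarts.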
The security setup includes multiple retry loops to handle the window between Elasticsearch starting and the security subsystem being fully ready:
| Check | Retries | Delay | Total wait |
|---|---|---|---|
| Bootstrap API responsiveness | 5 | 10s | ~50s |
| Bootstrap cluster health | 5 | 10s | ~50s |
| Elastic password API check | 20 | 10s | ~200s |
| Post-watermark cluster health | 20 | 10s | ~200s |
| Wait for port (per node) | — | — | 600s timeout |
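Each row in the table is a poll-with-delay loop. The following is a generic Python sketch of that pattern. It does not reproduce Ansible's exact `until`/`retries`/`delay` semantics, whose attempt counting may differ. The `sleep` parameter is injectable so the loop can be exercised without real waiting. Because the sketch skips the sleep after the last attempt, 5 retries with a 10s delay sleep for at most 40s. Adding the time the checks themselves take brings this close to the table's rounded "~50s".

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    retries: int,
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `retries` times, sleeping `delay` seconds between attempts."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

A call like `wait_until(cluster_is_green, retries=20, delay=10)` corresponds to the "Post-watermark cluster health" row. Here `cluster_is_green` is a hypothetical function that queries the health API.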
In container environments (virtualization_type in container, docker, lxc), the role sets ultra-lenient disk watermarks (low: 97%, high: 98%, flood: 99%) to prevent Elasticsearch from refusing to allocate shards due to limited disk space. This is set both during security initialization and during rolling upgrades. The role also runs rm -rf /var/cache/* to free disk space in containers.
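The watermark override is an ordinary cluster-settings update. This Python sketch builds a request body for `PUT _cluster/settings` using the standard `cluster.routing.allocation.disk.watermark.*` keys and the percentages given above. Using `persistent` rather than `transient` settings is an assumption; the role may apply the values differently.

```python
import json
from typing import Optional

# Virtualization types listed in the paragraph above.
WATERMARK_VIRT_TYPES = {"container", "docker", "lxc"}


def container_watermark_settings(virtualization_type: str) -> Optional[dict]:
    """Build a cluster-settings request body with lenient disk watermarks for containers."""
    if virtualization_type not in WATERMARK_VIRT_TYPES:
        return None
    prefix = "cluster.routing.allocation.disk.watermark"
    return {
        # Assumption: persistent settings; the role may use another mechanism.
        "persistent": {
            f"{prefix}.low": "97%",
            f"{prefix}.high": "98%",
            f"{prefix}.flood_stage": "99%",
        }
    }


if __name__ == "__main__":
    print(json.dumps(container_watermark_settings("lxc"), indent=2))
```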
The "Restart Elasticsearch" handler has four guards that prevent it from firing when a restart would be redundant or harmful:
- `elasticsearch_enable` must be true
- NOT during a fresh install (service already started naturally)
- NOT during security initialization (service already started)
- NOT after a rolling upgrade (upgrade did its own restart)
The handler also triggers a Kibana restart on all Kibana hosts (if elasticstack_full_stack is enabled) since Kibana may need to reconnect after an ES restart. This Kibana restart is skipped during CA renewal.
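The handler's guards combine into a single boolean, and the Kibana follow-up adds two more conditions. This is a hypothetical Python sketch. The `RunState` fields are illustrative names. `fresh_install` loosely corresponds to `elasticsearch_freshstart.changed` and `security_just_initialized` to `elasticsearch_freshstart_security.changed`, but that mapping is an assumption about the role's internals.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    """Hypothetical summary of the facts the handler conditions look at."""

    elasticsearch_enable: bool
    fresh_install: bool
    security_just_initialized: bool
    rolling_upgrade_done: bool


def should_restart_elasticsearch(state: RunState) -> bool:
    # All four guards must pass for the restart to happen.
    return (
        state.elasticsearch_enable
        and not state.fresh_install
        and not state.security_just_initialized
        and not state.rolling_upgrade_done
    )


def should_restart_kibana(es_restarted: bool, full_stack: bool, ca_renewal: bool) -> bool:
    # Kibana follows an Elasticsearch restart only in full-stack mode, never during CA renewal.
    return es_restarted and full_stack and not ca_renewal
```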
The role writes elasticsearch.yml and JVM options twice: once before the rolling upgrade (so the upgrade restart picks up new config in a single restart instead of requiring a second restart afterward), and once after all security initialization is complete (when all facts like elasticsearch_cluster_set_up are known). This prevents a double-restart that would otherwise occur during upgrades.
When the elasticsearch inventory group contains exactly one host, the template writes discovery.type: single-node and omits discovery.seed_hosts and cluster.initial_master_nodes. This avoids bootstrap issues on single-node clusters.
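The single-node branch of the template can be sketched as a function of the `elasticsearch` group's host list. This is a hypothetical Python sketch. The multi-node branch assumes that every host is master-eligible and that its inventory name doubles as its node name. The role's template may instead build `cluster.initial_master_nodes` from master-eligible hosts only.

```python
from typing import Dict, List, Union


def discovery_settings(es_hosts: List[str]) -> Dict[str, Union[str, List[str]]]:
    """Discovery-related elasticsearch.yml keys for a given elasticsearch group."""
    if len(es_hosts) == 1:
        # Single host: no seed hosts or initial master list, just single-node discovery.
        return {"discovery.type": "single-node"}
    # Assumption: every host is master-eligible and its inventory name is its node name.
    return {
        "discovery.seed_hosts": list(es_hosts),
        "cluster.initial_master_nodes": list(es_hosts),
    }
```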
By default, Elasticsearch binds to ["_local_", "_site_"] (localhost and the site-local interface). Override with elasticsearch_network_host for custom binding. The template also supports http.publish_host and http.publish_port for multi-homed hosts.
The initial passwords file at /usr/share/elasticsearch/initial_passwords is generated by elasticsearch-setup-passwords auto -b. The role parses it with grep "PASSWORD <username> " | awk '{print $4}'. Other roles (Kibana, Logstash, Beats) delegate to the CA host to read their service passwords from this file.
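The `grep | awk` extraction can be mirrored in Python to show why it works. The sketch assumes each relevant line has the shape `PASSWORD <username> = <password>`, which makes the password awk's fourth field. The trailing space in the pattern prevents `kibana` from also matching `kibana_system`. Unlike the shell pipeline, which prints the fourth field of every matching line, this sketch returns the first match.

```python
from typing import Optional


def read_password(passwords_text: str, username: str) -> Optional[str]:
    """Return the password for `username`, mimicking
    grep "PASSWORD <username> " | awk '{print $4}'.
    """
    needle = f"PASSWORD {username} "  # trailing space: "kibana" must not match "kibana_system"
    for line in passwords_text.splitlines():
        if needle in line:  # grep matches the pattern anywhere in the line
            fields = line.split()  # awk splits on runs of whitespace
            if len(fields) >= 4:
                return fields[3]  # awk's $4 is the fourth field
    return None
```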
The role enforces that elasticsearch_security must be true for Elasticsearch 8.x and later. Running ES 8+ without security is not supported by Elastic and the role will fail the play if you try.
| Tag | Purpose |
|---|---|
| `certificates` | Run all certificate-related tasks |
| `renew_ca` | Renew the certificate authority (triggers renewal of all dependent certs) |
| `renew_es_cert` | Renew only Elasticsearch node certificates |
License: GPL-3.0-or-later

Author: Netways GmbH