---
title: Configuring Elasticsearch Cross-Cluster Replication for high availability
shortTitle: Elasticsearch Cross-Cluster Replication
intro: 'You can make search more resilient during maintenance, failovers, and upgrades on a high availability deployment by enabling Elasticsearch Cross-Cluster Replication (CCR).'
versions:
  ghes: '>=3.19'
contentType: how-tos
category:
  - Scale your instance
---

## About Elasticsearch Cross-Cluster Replication

{% data variables.product.prodname_ghe_server %} uses Elasticsearch to power search across issues, pull requests, and repositories, as well as the projects and releases pages and the counts shown throughout the web interface. Because search is central to the product, the reliability of Elasticsearch directly affects the day-to-day administration of your instance.

In a high availability (HA) configuration, {% data variables.product.prodname_ghe_server %} uses a leader/follower model. The primary appliance receives all writes and user traffic, and replica appliances stay in sync as read-only standbys that can take over if the primary fails. For more information, see [AUTOTITLE](/admin/monitoring-and-managing-your-instance/configuring-high-availability/about-high-availability-configuration).

In earlier releases, the Elasticsearch deployment on {% data variables.product.prodname_ghe_server %} did not follow this leader/follower model. To replicate search data, {% data variables.product.prodname_ghe_server %} ran a single Elasticsearch cluster that spanned the primary and replica appliances. This approach worked, but it introduced a class of problems: Elasticsearch could move a primary shard (the shard responsible for receiving and validating writes) onto a replica appliance. If that replica was then taken offline for maintenance, the instance could enter a locked state: the replica waited for Elasticsearch to become healthy, while Elasticsearch could not become healthy until the replica rejoined.

Elasticsearch Cross-Cluster Replication (CCR) removes this dependency. Instead of one cluster spanning every appliance, each appliance runs an independent single-node Elasticsearch cluster. CCR then replicates index data between these clusters using Elasticsearch's native leader/follower pattern. Data is copied only after it has been durably persisted to the underlying Lucene segments, so replicas only ever follow data that has been safely written. As a result, a primary shard can no longer end up stranded on a read-only replica.

### Benefits

* **Fewer locked upgrades and maintenance windows.** Removing the circular dependency between the primary and replica appliances reduces the risk of an instance becoming stuck during maintenance.
* **Stronger data protection.** Data is replicated only after it is durably saved, which helps prevent index corruption during failovers.
* **Simpler operations.** The new pattern reduces the need for the manual index repairs that were previously required when maintenance steps were performed out of order.

### Availability

Elasticsearch CCR is supported beginning with {% data variables.product.prodname_ghe_server %} 3.19.1 and is optional. {% data variables.product.company_short %} plans to make CCR the default HA search architecture over the next two years, which gives you time to test it and provide feedback before it becomes the default.

## Requirements

Before you enable CCR, confirm that:

* Your instance runs {% data variables.product.prodname_ghe_server %} 3.19.1 or later.
* Your instance is configured for high availability with at least two appliances: a primary and one or more replicas.
* You have an updated {% data variables.product.prodname_ghe_server %} license that includes the Elasticsearch entitlement required for CCR. To get one, contact {% data variables.contact.contact_enterprise_sales %} or {% data variables.contact.github_support %} and ask for your enterprise to be enabled for the new license, then download the updated license file.
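
If you script your preflight checks, the version requirement can be tested with a small helper. This is a minimal sketch: `version_at_least` is a hypothetical function name, it assumes GNU `sort -V` is available, and it compares version strings that you pass in rather than querying the instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils), which compares dotted version strings
# component by component, so 3.19.10 sorts after 3.19.9.
version_at_least() {
  local current="$1" required="$2"
  # If the required version sorts first (or the two are equal),
  # the current version meets the minimum.
  [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]
}

# Example: replace this value with the version your instance reports.
installed_version="3.19.1"
if version_at_least "$installed_version" "3.19.1"; then
  echo "${installed_version} meets the CCR minimum version"
else
  echo "${installed_version} is older than 3.19.1" >&2
fi
```

A numeric-aware sort is used instead of a plain string comparison because a lexical comparison would place 3.9.5 after 3.19.1.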

{% warning %}

**Warning:** When CCR is enabled, the upgrade preflight check requires a valid CCR-enabled license. If `app.elasticsearch.ccr` is enabled and the license check fails, the upgrade does not proceed. Install your updated license before you enable the feature or upgrade. If you are unsure whether your license includes the Elasticsearch entitlement, contact {% data variables.contact.github_support %}.

{% endwarning %}

## Enabling Elasticsearch Cross-Cluster Replication

{% note %}

**Note:** The migration may take a significant amount of time, depending on the size of your instance, because search data is consolidated onto the primary before replication restarts. Plan to enable CCR during a maintenance window, and test the process in a non-production environment first. For more information, see [AUTOTITLE](/admin/upgrading-your-instance).

{% endnote %}

1. Contact {% data variables.contact.github_support %} and request access to the new HA search architecture. {% data variables.product.company_short %} will enable your enterprise so that you can download the required CCR-enabled license.
1. Download your updated license and upload it to your instance. For more information, see [AUTOTITLE](/admin/overview/managing-your-github-enterprise-license).
1. On the primary appliance, enable the feature.

   ```shell
   ghe-config app.elasticsearch.ccr true
   ```

1. Apply the configuration by starting a configuration run. Alternatively, the setting takes effect the next time you upgrade the instance to 3.19.1 or later.

   ```shell
   ghe-config-apply
   ```

1. When the configuration is applied, {% data variables.product.prodname_ghe_server %} migrates search to the new replication method. The migration consolidates search data onto the primary, removes the Elasticsearch cluster that previously spanned the appliances, and restarts replication using CCR. During the migration, {% data variables.product.prodname_ghe_server %} attaches followers to your existing search indexes and enables an auto-follow rule so that indexes created in the future are followed automatically.
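
If you prefer to script the enable and apply steps above, the following sketch sets the flag, reads it back, and starts a configuration run only if the value was stored. It assumes you run it in the administrative shell on the primary appliance. The function name `enable_ccr` is hypothetical; the script relies only on `ghe-config KEY VALUE` to set a value, `ghe-config KEY` to read it back, and `ghe-config-apply`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the enable and apply steps.
# Run in the administrative shell on the primary appliance.
enable_ccr() {
  # Set the CCR flag.
  ghe-config app.elasticsearch.ccr true || return 1

  # Read the value back before applying, so a failed write is caught early.
  local value
  value="$(ghe-config app.elasticsearch.ccr)" || return 1
  if [ "$value" != "true" ]; then
    echo "app.elasticsearch.ccr is '${value}', expected 'true'" >&2
    return 1
  fi

  # Start the configuration run that applies the setting.
  ghe-config-apply
}
```

Call `enable_ccr` only after the updated license is installed and during a maintenance window, because applying the setting starts the migration.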

## Using Elasticsearch Cross-Cluster Replication

### Verifying replication

After the migration completes, search continues to work normally, and users do not need to change how they search. To confirm replication health, generate a support bundle, which includes CCR status information that you can review or share with {% data variables.contact.github_support %}. For more information, see [AUTOTITLE](/support/contacting-github-support/providing-data-to-github-support).
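
To download a support bundle to your workstation in one step, you can wrap the `ghe-support-bundle -o` command, which writes the bundle to standard output, in a small function. This is a sketch: `collect_support_bundle` and the timestamped file name are hypothetical, and it assumes you can reach the administrative SSH port (122) on the appliance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: downloads a support bundle from an appliance over
# administrative SSH (port 122) and prints the local file name.
collect_support_bundle() {
  local host="$1"
  local out
  out="support-bundle-$(date +%Y%m%d-%H%M%S).tgz"
  # `ghe-support-bundle -o` streams the bundle to stdout; redirect it locally.
  if ssh -p 122 "admin@${host}" -- 'ghe-support-bundle -o' > "$out"; then
    printf '%s\n' "$out"
  else
    # Remove a partial file so it is not mistaken for a complete bundle.
    rm -f "$out"
    return 1
  fi
}
```

For example, `collect_support_bundle HOSTNAME`, where `HOSTNAME` is the hostname of your primary appliance, prints the name of the file it wrote.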

### Failover and disaster recovery

You continue to use the standard high availability replication utilities to manage replicas and to fail over. For more information, see [AUTOTITLE](/admin/monitoring-and-managing-your-instance/configuring-high-availability/initiating-a-failover-to-your-replica-appliance) and [AUTOTITLE](/admin/monitoring-and-managing-your-instance/configuring-high-availability/recovering-a-high-availability-configuration).

After a failover with CCR enabled, the promoted appliance becomes the new leader for search, and replicas re-follow its indexes as part of the standard recovery process. If you encounter errors related to search replication during or after a failover, contact {% data variables.contact.github_support %}.
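
After you complete the recovery steps in the linked articles, you can check each replica from a workstation with the standard `ghe-repl-status` utility, which reports replication status on the replica it runs on. This is a sketch under assumptions: `check_replicas` is a hypothetical function name, it assumes `ghe-repl-status` exits with a non-zero status when it finds a problem, and it does not confirm that `ghe-repl-status` reports on Elasticsearch CCR specifically, so use a support bundle for search-specific details.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs ghe-repl-status on each replica passed as an
# argument and returns non-zero if any replica reports a problem.
check_replicas() {
  local host failed=0
  for host in "$@"; do
    # Assumption: ghe-repl-status exits non-zero when replication is unhealthy.
    if ssh -p 122 "admin@${host}" -- 'ghe-repl-status' > /dev/null; then
      printf 'OK: %s\n' "$host"
    else
      printf 'PROBLEM: %s\n' "$host"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_replicas replica-1.example.com replica-2.example.com` (hypothetical hostnames) prints one line per replica and fails if any of them reports a problem.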

### Disabling Elasticsearch Cross-Cluster Replication

{% warning %}

**Warning:** Do not disable CCR on a production instance without guidance from {% data variables.contact.github_support %}. Disabling CCR is not a routine self-service operation. Turning the feature off can trigger removal of replica Elasticsearch data as part of returning to the previous mode.

{% endwarning %}

If you need to return to the previous search architecture, contact {% data variables.contact.github_support %} before making any changes. {% data variables.product.company_short %} will help you confirm that your license, replication state, and upgrade path are handled safely.

## Further reading

* [AUTOTITLE](/admin/monitoring-and-managing-your-instance/configuring-high-availability/about-high-availability-configuration)
* [AUTOTITLE](/admin/administering-your-instance/administering-your-instance-from-the-command-line/command-line-utilities)
* [How we rebuilt the search architecture for high availability in GitHub Enterprise Server](https://github.blog/engineering/architecture-optimization/how-we-rebuilt-the-search-architecture-for-high-availability-in-github-enterprise-server/) on the {% data variables.product.prodname_blog %}