|
| 1 | +FAILOVER(8) |
| 2 | +========== |
| 3 | + |
| 4 | +NAME |
| 5 | +---- |
| 6 | + |
| 7 | +failover - UPS Failover Driver |
| 8 | + |
| 9 | +SYNOPSIS |
| 10 | +-------- |
| 11 | + |
| 12 | +*failover* -h |
| 13 | + |
| 14 | +*failover* -a 'UPS_NAME' ['OPTIONS'] |
| 15 | + |
| 16 | +NOTE: This man page only documents the specific features of the failover driver. |
| 17 | +For information about the core driver, see linkman:nutupsdrv[8]. |
| 18 | + |
| 19 | +DESCRIPTION |
| 20 | +----------- |
| 21 | + |
| 22 | +The `failover` driver acts as a smart proxy for multiple "real" UPS drivers. It |
| 23 | +connects to and monitors these underlying UPS drivers through their local UNIX |
| 24 | +sockets (or Windows named pipes), continuously evaluating health and suitability |
| 25 | +for "primary" duty according to a set of user-configurable rules and priorities. |
| 26 | + |
| 27 | +At any given time, `failover` designates one UPS driver as the *primary*, and |
| 28 | +presents its commands, variables and status to the outside world as if it were |
| 29 | +directly talking to that UPS. From the perspective of the clients (such as |
| 30 | +linkman:upsmon[8] or linkman:upsc[8]), the `failover` driver behaves like any |
| 31 | +single UPS, abstracting away the underlying redundancy, and allowing for |
| 32 | +seamless transitioning between all monitored UPS drivers and their datasets. |
| 33 | + |
| 34 | +The driver dynamically promotes or demotes the primary UPS driver based on: |
| 35 | + |
| 36 | +- Socket availability and communication status |
| 37 | +- Data freshness and UPS online/offline indicators |
| 38 | +- User-defined status filters (e.g., presence or absence of `OL`, `LB`, ...) |
| 39 | +- Administrative override via control commands (`force.primary`, `force.ignore`) |
| 40 | + |
| 41 | +If the current primary becomes unavailable or no longer meets the criteria, the |
| 42 | +driver automatically fails over to a more suitable driver. During transitions, |
| 43 | +it ensures that all data is switched over instantly, without linkman:upsd[8] |
| 44 | +considering it stale or the clients acting on any previously degraded status. |
| 45 | + |
| 46 | +When no suitable primary is available, a configurable fallback state is entered: |
| 47 | + |
| 48 | +- Keep last primary and declare the data as stale |
| 49 | +- Raise `ALARM` and declare the data as stale |
| 50 | +- Raise `ALARM` and set forced shutdown (`FSD`) |
| 51 | + |
| 52 | +Different communication media can be used to connect to individual UPS drivers |
| 53 | +(e.g., USB, Serial, Ethernet). `failover` communicates directly at the socket |
| 54 | +level and therefore does not rely on linkman:upsd[8] being active. |
| 55 | + |
| 56 | +EXTRA ARGUMENTS |
| 57 | +--------------- |
| 58 | + |
| 59 | +This driver supports the following settings: |
| 60 | + |
| 61 | +*port*='drivername-devicename,drivername2-devicename2,...':: |
| 62 | +Required. Specifies the local sockets (or Windows named pipes) of the underlying |
| 63 | +UPS drivers to be tracked. Entries must either be a path or follow the format |
| 64 | +`drivername-devicename`, as used by NUT's internal socket naming convention |
| 65 | +(e.g. `usbhid-ups-myups`). Multiple entries are comma-separated with no spaces. |
| 66 | + |
| 67 | +*inittime*='seconds':: |
| 68 | +Optional. Sets a grace period after driver startup during which the absence of a |
| 69 | +primary is tolerated. This allows time for underlying drivers to initialize. For |
| 70 | +networked connections or drivers that require "lock-picking" their communication |
| 71 | +protocol, consider increasing this value to accommodate potentially longer delays. |
| 72 | +Defaults to 30 seconds. |
| 73 | + |
| 74 | +*deadtime*='seconds':: |
| 75 | +Optional. Sets a grace period in seconds after which a non-responsive UPS driver |
| 76 | +is considered dead. Defaults to 30 seconds. |
| 77 | + |
| 78 | +*relogtime*='seconds':: |
| 79 | +Optional. Interval at which repeated connection failure logs are emitted |
| 80 | +for a UPS, reducing log spam during unstable conditions. Defaults to 5 seconds. |
| 81 | + |
| 82 | +*noprimarytime*='seconds':: |
| 83 | +Optional. Duration to wait without a suitable primary UPS driver before entering |
| 84 | +the configured fallback mode (`fsdmode`). Defaults to 15 seconds. |
| 85 | + |
| 86 | +*maxconnfails*='count':: |
| 87 | +Optional. Number of consecutive connection failures allowed per UPS driver |
| 88 | +before entering the cooldown period (`coolofftime`). Defaults to 5. |
| 89 | + |
| 90 | +*coolofftime*='seconds':: |
| 91 | +Optional. Cooldown period during which the driver pauses reconnect attempts |
| 92 | +after exceeding `maxconnfails`. Defaults to 15 seconds. |
| 93 | + |
| 94 | +*fsdmode*='0|1|2':: |
| 95 | +Optional. Defines the behavior when no suitable primary UPS driver is found |
| 96 | +after `noprimarytime` has elapsed. Defaults to 0. |
| 97 | + |
| 98 | +- `0`: *Do not demote the last primary, but mark its data as stale.* This is |
| 99 | +similar to how a regular UPS driver would behave when it loses its connection to |
| 100 | +the target UPS device. linkman:upsmon[8] will act on the last known (online or |
| 101 | +not) status, and decide itself whether that UPS should be considered critical. |
| 102 | + |
| 103 | +- `1`: *Demote the primary, raise `ALARM`, and mark the data as stale after an |
| 104 | +additional few seconds have elapsed (ensuring full propagation).* This will |
| 105 | +cause linkman:upsmon[8] to detect that a device previously in an alarm state has |
| 106 | +lost its connection, consider the UPS driver critical, and possibly trigger a |
| 107 | +forced shutdown (`FSD`) due to depletion of `MINSUPPLIES`. |
| 108 | + |
| 109 | +- `2`: *Demote the primary, raise `ALARM`, and immediately set `FSD`.* This will |
| 110 | +set `FSD` from the driver side and preempt linkman:upsmon[8] from raising it |
| 111 | +itself. This mode is for setups where immediate shutdown is warranted |
| 112 | +regardless of anything else, and where `FSD` must reach the clients as quickly |
| 113 | +as possible. |
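| | + |
| | +As a sketch of how the timing and fallback settings above might be combined, |
| | +the hypothetical section below tolerates a missing primary for 60 seconds after |
| | +startup and selects fallback mode `2` once no suitable primary has been found |
| | +for 30 seconds. The device names and values are placeholders, not recommendations: |
| | + |
| | +------ |
| | + [failover] |
| | + driver = failover |
| | + port = usbhid-ups-realups,usbhid-ups-realups2 |
| | + # Grace period after driver startup (default: 30) |
| | + inittime = 60 |
| | + # Time without a suitable primary before the fallback applies (default: 15) |
| | + noprimarytime = 30 |
| | + # Fallback: demote the primary, raise ALARM and set FSD immediately |
| | + fsdmode = 2 |
| | +------ |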
| 114 | + |
| 115 | +*checkruntime*='0|1|2|3':: |
| 116 | +Optional. Controls how `battery.runtime` values are used to break ties between |
| 117 | +non-fully-online UPS devices **at priority 3 or lower**. Has no effect on |
| 118 | +initial priority selection or when `strictfiltering` is enabled. Defaults to 1. |
| 119 | + |
| 120 | +- `0`: *Disabled.* No runtime comparison is done. The first candidate with the |
| 121 | +best priority is selected according to the order of the port argument. |
| 122 | + |
| 123 | +- `1`: *Compare `battery.runtime`.* The UPS with the higher value is preferred. |
| 124 | +If the value is missing or invalid, the UPS cannot win the tie-break. |
| 125 | + |
| 126 | +- `2`: *Compare `battery.runtime.low`.* The UPS with the higher value is |
| 127 | +preferred. If the value is missing or invalid, the UPS cannot win the tie-break. |
| 128 | + |
| 129 | +- `3`: *Compare both variables strictly.* The UPS is preferred only if it has |
| 130 | +both a higher `battery.runtime` and `battery.runtime.low` value. If either is |
| 131 | +missing or invalid, the UPS cannot win the tie-break. |
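| | + |
| | +For instance, assuming both underlying drivers publish `battery.runtime.low`, a |
| | +hypothetical section such as the following would request tie-breaking on that |
| | +variable (device names are placeholders): |
| | + |
| | +------ |
| | + [failover] |
| | + driver = failover |
| | + port = usbhid-ups-realups,usbhid-ups-realups2 |
| | + # Break ties at priority 3 or lower by comparing battery.runtime.low |
| | + checkruntime = 2 |
| | +------ |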
| 132 | + |
| 133 | +*strictfiltering*='0|1':: |
| 134 | +Optional. If set to 1, only UPS drivers matching the configured status filters |
| 135 | +are considered for promotion to primary. If set to 0, the hard-coded default |
| 136 | +logic also applies when no status filters match (see `PRIORITIES`). Defaults to 0. |
| 137 | + |
| 138 | +*status_have_any*='OL,CHRG,...':: |
| 139 | +Optional. If any of these comma-separated tokens are present in a UPS driver's |
| 140 | +`ups.status`, it passes this status filtering criterion. Defaults to unset. |
| 141 | + |
| 142 | +*status_have_all*='OL,CHRG,...':: |
| 143 | +Optional. All listed comma-separated tokens must be present in `ups.status` for |
| 144 | +the UPS driver to pass this status filtering criterion. Defaults to unset. |
| 145 | + |
| 146 | +*status_nothave_any*='OB,OFF,...':: |
| 147 | +Optional. If any of these comma-separated tokens are present in `ups.status`, |
| 148 | +the UPS driver does not pass this status filtering criterion. Defaults to unset. |
| 149 | + |
| 150 | +*status_nothave_all*='OB,LB,...':: |
| 151 | +Optional. If all of these comma-separated tokens are present in `ups.status`, |
| 152 | +the UPS driver does not pass this status filtering criterion. Defaults to unset. |
| 153 | + |
| 154 | +NOTE: The `status_*` arguments are primarily intended to adjust the weighting of |
| 155 | +UPS drivers, allowing some to be prioritized over others based on their status. |
| 156 | +For example, a driver reporting `OL` might be preferred over one reporting |
| 157 | +`ALARM OL`. While `strictfiltering` can be enabled, status filters are most |
| 158 | +effective when used in combination with the default set of connectivity-based |
| 159 | +`PRIORITIES`. For more details, see the respective section further below. |
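| | + |
| | +A minimal sketch of the `OL` versus `ALARM OL` example, with placeholder device |
| | +names: assuming a driver must pass every configured filter, one reporting `OL` |
| | +would pass both filters below, while one reporting `ALARM OL` would fail |
| | +`status_nothave_any` and should then be ranked by the default priorities instead: |
| | + |
| | +------ |
| | + [failover] |
| | + driver = failover |
| | + port = usbhid-ups-realups,usbhid-ups-realups2 |
| | + # Pass drivers whose ups.status contains OL ... |
| | + status_have_any = OL |
| | + # ... but not those whose ups.status contains ALARM |
| | + status_nothave_any = ALARM |
| | + # Keep the default connectivity-based priorities as a fallback (default) |
| | + strictfiltering = 0 |
| | +------ |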
| 160 | + |
| 161 | +IMPLEMENTATION |
| 162 | +-------------- |
| 163 | + |
| 164 | +The `port` argument in linkman:ups.conf[5] should reference the local driver |
| 165 | +sockets (or Windows named pipes) that the "real" UPS drivers are using. A basic |
| 166 | +default setup with multiple drivers could look like this: |
| 167 | + |
| 168 | +------ |
| 169 | + [realups] |
| 170 | + driver = usbhid-ups |
| 171 | + port = auto |
| 172 | + |
| 173 | + [realups2] |
| 174 | + driver = usbhid-ups |
| 175 | + port = auto |
| 176 | + |
| 177 | + [failover] |
| 178 | + driver = failover |
| 179 | + port = usbhid-ups-realups,usbhid-ups-realups2 |
| 180 | +------ |
| 181 | + |
| 182 | +Any linkman:upsmon[8] clients would be set to monitor the `failover` UPS. |
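| | + |
| | +The same pattern should also cover redundant communication paths to a single |
| | +UPS. The sketch below assumes a hypothetical device reachable both over USB and |
| | +over SNMP at the placeholder address `192.0.2.10`; the section names `viausb` |
| | +and `viasnmp` are illustrative, and `inittime` is raised on the assumption that |
| | +the networked driver needs longer to initialize: |
| | + |
| | +------ |
| | + [viausb] |
| | + driver = usbhid-ups |
| | + port = auto |
| | + |
| | + [viasnmp] |
| | + driver = snmp-ups |
| | + port = 192.0.2.10 |
| | + |
| | + [failover] |
| | + driver = failover |
| | + # Socket names follow the drivername-devicename convention |
| | + port = usbhid-ups-viausb,snmp-ups-viasnmp |
| | + inittime = 60 |
| | +------ |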
| 183 | + |
| 184 | +The driver fully supports setting variables and performing instant commands on |
| 185 | +the currently elected primary UPS driver. These are proxied, and end-to-end |
| 186 | +tracking is also possible (linkman:upscmd[8] and linkman:upsrw[8] `-w`). You |
| 187 | +may notice that some variables and commands are prefixed with `upstream.`; this |
| 188 | +clearly separates the upstream commands from those of `failover` itself. |
| 189 | + |
| 190 | +For your convenience, additional administrative commands are exposed to directly |
| 191 | +influence and override the primary election process, e.g. for maintenance: |
| 192 | + |
| 193 | +- `<socketname>.force.ignore [seconds]` prevents the specified UPS driver from |
| 194 | +being selected as primary for the given duration, or permanently if a negative |
| 195 | +value is used. A value of `0` resets this override and re-enables selection. |
| 196 | + |
| 197 | +- `<socketname>.force.primary [seconds]` forces the specified UPS driver to be |
| 198 | +treated with the highest priority for the given duration, or permanently if a |
| 199 | +negative value is used. A value of `0` resets this override. |
| 200 | + |
| 201 | +Calling either command without an argument has the same effect as passing `0`, |
| 202 | +but only for that specific override - it does not affect the other. |
| 203 | + |
| 204 | +PRIORITIES |
| 205 | +---------- |
| 206 | + |
| 207 | +As outlined above, primaries are dynamically elected based on their current |
| 208 | +state and according to a strict set of user-influenceable priorities, which are: |
| 209 | + |
| 210 | +- `0` (highest): UPS driver was forced to the top by administrative command. |
| 211 | +- `1`: UPS driver has passed the user-defined status filters. |
| 212 | +- `2`: UPS driver has fresh data and is online (in status `OL`). |
| 213 | +- `3`: UPS driver has fresh data, but may not be fully online. |
| 214 | +- `4` (lowest): UPS driver is alive, but may not have fresh data. |
| 215 | + |
| 216 | +The UPS driver with the highest calculated priority is chosen as primary; ties |
| 217 | +are resolved by the order of the socket names given in the `port` argument. |
| 218 | + |
| 219 | +For the user-defined status filters, the following internal order is respected: |
| 220 | + |
| 221 | +1. `status_nothave_any` (first) |
| 222 | +2. `status_have_all` |
| 223 | +3. `status_nothave_all` |
| 224 | +4. `status_have_any` (last) |
| 225 | + |
| 226 | +If `strictfiltering` is enabled, priorities 2 to 4 are not applicable. |
| 227 | + |
| 228 | +If no user-defined status filters are set, priority 1 is not applicable. |
| 229 | + |
| 230 | +NOTE: The base requirement for any election is the UPS socket being connectable |
| 231 | +and the UPS driver having published at least one full batch of data during its |
| 232 | +lifetime. UPS drivers not fulfilling that requirement are always disqualified. |
| 233 | + |
| 234 | +RATIONALE |
| 235 | +--------- |
| 236 | + |
| 237 | +In complex power environments, presenting a single, consistent source of UPS |
| 238 | +information to linkman:upsmon[8] is sometimes preferable to monitoring multiple |
| 239 | +independent drivers directly. The `failover` driver serves as a bridge, allowing |
| 240 | +linkman:upsmon[8] to make decisions based on the most suitable available data, |
| 241 | +without having to interpret conflicting inputs or degraded sources. |
| 242 | + |
| 243 | +Originally designed for use cases such as dual-PSU systems or redundant |
| 244 | +communication paths to a single UPS, `failover` also supports more advanced |
| 245 | +setups - for example, when multiple UPSes feed a shared downstream load (via |
| 246 | +STS/ATS switches), or when drivers vary in reliability. In these cases, the |
| 247 | +driver can be combined with external logic or scripting to dynamically adjust |
| 248 | +primary selection and facilitate graceful degradation. Such setups may also |
| 249 | +benefit from further integration with the `clone` family of drivers, such as |
| 250 | +linkman:clone[8] or linkman:clone-outlet[8], for greater granularity and |
| 251 | +monitoring control down to the outlet level. |
| 252 | + |
| 253 | +Additionally, in more niche scenarios, some third-party NUT integrations or |
| 254 | +graphical interfaces may be limited to monitoring a single UPS device. In such |
| 255 | +cases, `failover` can help by exposing only the most relevant or |
| 256 | +highest-priority data source, allowing those tools to operate within their |
| 257 | +constraints without missing critical information. |
| 258 | + |
| 259 | +Ultimately, this driver enables more nuanced power monitoring and control than |
| 260 | +binary online/offline logic alone, allowing administrators to respond to |
| 261 | +degraded conditions early - before they escalate into critical events or require |
| 262 | +linkman:upsmon[8] to take action. |
| 263 | + |
| 264 | +LIMITATIONS |
| 265 | +----------- |
| 266 | + |
| 267 | +When using `failover` for redundancy between multiple UPS drivers connected to |
| 268 | +the same underlying UPS device, data is not multiplexed between the drivers. As |
| 269 | +a result, some data points may be available in some drivers but not in others. |
| 270 | + |
| 271 | +For `checkruntime` considerations, the unit of both `battery.runtime` and |
| 272 | +`battery.runtime.low` is assumed to be **seconds**. UPS drivers that report |
| 273 | +these values using different units are considered non-compliant with the NUT |
| 274 | +variable standards and should be reported to the NUT developers as faulty. |
| 275 | + |
| 276 | +AUTHOR |
| 277 | +------ |
| 278 | + |
| 279 | +Sebastian Kuttnig < [email protected]> |
| 280 | + |
| 281 | +SEE ALSO |
| 282 | +-------- |
| 283 | + |
| 284 | +linkman:upscmd[8], |
| 285 | +linkman:upsrw[8], |
| 286 | +linkman:ups.conf[5], |
| 287 | +linkman:upsc[8], |
| 288 | +linkman:upsmon[8], |
| 289 | +linkman:nutupsdrv[8], |
| 290 | +linkman:clone[8], |
| 291 | +linkman:clone-outlet[8] |
| 292 | + |
| 293 | +Internet Resources: |
| 294 | +~~~~~~~~~~~~~~~~~~~ |
| 295 | + |
| 296 | +The NUT (Network UPS Tools) home page: https://www.networkupstools.org/ |