failover.c - UPS Failover Driver #2962
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,296 @@ | ||
FAILOVER(8) | ||
========== | ||
|
||
NAME | ||
---- | ||
|
||
failover - UPS Failover Driver | ||
|
||
SYNOPSIS | ||
-------- | ||
|
||
*failover* -h | ||
|
||
*failover* -a 'UPS_NAME' ['OPTIONS'] | ||
|
||
NOTE: This man page only documents the specific features of the failover driver. | ||
For information about the core driver, see linkman:nutupsdrv[8]. | ||
|
||
DESCRIPTION | ||
----------- | ||
|
||
The `failover` driver acts as a smart proxy for multiple "real" UPS drivers. It | ||
connects to and monitors these underlying UPS drivers through their local UNIX | ||
sockets (or Windows named pipes), continuously evaluating health and suitability | ||
for "primary" duty according to a set of user-configurable rules and priorities. | ||
|
||
At any given time, `failover` designates one UPS driver as the *primary*, and | ||
presents its commands, variables and status to the outside world as if it were | ||
directly talking to that UPS. From the perspective of the clients (such as | ||
linkman:upsmon[8] or linkman:upsc[8]), the `failover` driver behaves like any | ||
single UPS, abstracting away the underlying redundancy, and allowing for | ||
seamless transitioning between all monitored UPS drivers and their datasets. | ||
|
||
The driver dynamically promotes or demotes the primary UPS driver based on: | ||
|
||
- Socket availability and communication status | ||
- Data freshness and UPS online/offline indicators | ||
- User-defined status filters (e.g., presence or absence of `OL`, `LB`, ...) | ||
- Administrative override via control commands (`force.primary`, `force.ignore`) | ||
|
||
If the current primary becomes unavailable or no longer meets the criteria, the | ||
driver automatically fails over to a more suitable driver. During transitions, | ||
it ensures that data is switched out instantly, without linkman:upsd[8] | ||
considering it stale or the clients acting on any previously degraded status. | ||
|
||
When no suitable primary is available, a configurable fallback state is entered: | ||
|
||
- Keep last primary and declare the data as stale | ||
- Raise `ALARM` and declare the data as stale | ||
- Raise `ALARM` and set forced shutdown (`FSD`) | ||
|
||
Different communication media can be used to connect to individual UPS drivers | ||
(e.g., USB, Serial, Ethernet). `failover` communicates directly at the socket | ||
level and therefore does not rely on linkman:upsd[8] being active. | ||
|
||
EXTRA ARGUMENTS | ||
--------------- | ||
|
||
This driver supports the following settings: | ||
|
||
*port*='drivername-devicename,drivername2-devicename2,...':: | ||
Required. Specifies the local sockets (or Windows named pipes) of the underlying | ||
UPS drivers to be tracked. Entries must either be a path or follow the format | ||
`drivername-devicename`, as used by NUT's internal socket naming convention | ||
(e.g. `usbhid-ups-myups`). Multiple entries are comma-separated with no spaces. | ||
|
||
*inittime*='seconds':: | ||
Optional. Sets a grace period after driver startup during which the absence of a | ||
primary is tolerated. This allows time for underlying drivers to initialize. For | ||
networked connections or drivers that require "lock-picking" their communication | ||
protocol, consider increasing this value to accommodate potentially longer delays. | ||
Defaults to 30 seconds. | ||
|
||
*deadtime*='seconds':: | ||
Optional. Sets a grace period in seconds after which a non-responsive UPS driver | ||
is considered dead. Defaults to 30 seconds. | ||
|
||
*relogtime*='seconds':: | ||
Optional. Interval at which repeated connection failure messages are logged | ||
for a UPS, reducing log spam during unstable conditions. Defaults to 5 seconds. | ||
|
||
*noprimarytime*='seconds':: | ||
Optional. Duration to wait without a suitable primary UPS driver before entering | ||
the configured fallback mode (`fsdmode`). Defaults to 15 seconds. | ||
|
||
*maxconnfails*='count':: | ||
Optional. Number of consecutive connection failures allowed per UPS driver | ||
before entering the cooldown period (`coolofftime`). Defaults to 5. | ||
|
||
*coolofftime*='seconds':: | ||
Optional. Cooldown period during which the driver pauses reconnect attempts | ||
after exceeding `maxconnfails`. Defaults to 15 seconds. | ||
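
The interplay of `maxconnfails` and `coolofftime` can be sketched as follows. This is an illustrative Python model, not the driver's C code: the class `ConnTracker` and its method names are hypothetical, and both the exact threshold comparison ("exceeding" is read as strictly greater) and the counter reset after a cooldown are assumptions.

```python
class ConnTracker:
    """Hypothetical per-UPS connection state (illustration only)."""

    def __init__(self, maxconnfails=5, coolofftime=15):
        self.maxconnfails = maxconnfails
        self.coolofftime = coolofftime
        self.fails = 0              # consecutive connection failures
        self.cooloff_until = None   # time until which reconnects are paused

    def may_connect(self, now):
        """Return True if a reconnect attempt is allowed at time `now`."""
        if self.cooloff_until is not None:
            if now < self.cooloff_until:
                return False
            # Cooldown elapsed. Assumption: the failure counter starts over.
            self.cooloff_until = None
            self.fails = 0
        return True

    def record(self, ok, now):
        """Record the outcome of one connection attempt."""
        if ok:
            self.fails = 0
            return
        self.fails += 1
        # Assumption: the cooldown starts once the count exceeds the limit.
        if self.fails > self.maxconnfails:
            self.cooloff_until = now + self.coolofftime
```

With the defaults, a socket that keeps failing would be retried until the sixth consecutive failure and then left alone for 15 seconds.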
|
||
*fsdmode*='0|1|2':: | ||
Optional. Defines the behavior when no suitable primary UPS driver is found | ||
after `noprimarytime` has elapsed. Defaults to 0. | ||
|
||
- `0`: *Do not demote the last primary, but mark its data as stale.* This is | ||
similar to how a regular UPS driver would behave when it loses its connection to | ||
the target UPS device. linkman:upsmon[8] will act on the last known (online or | ||
not) status, and decide itself whether that UPS should be considered critical. | ||
|
||
- `1`: *Demote the primary, raise `ALARM`, and mark the data as stale after an | ||
additional few seconds have elapsed (ensuring full propagation).* This will | ||
cause linkman:upsmon[8] to detect that a device previously in an alarm state has | ||
lost its connection, consider the UPS driver critical, and possibly trigger a | ||
forced shutdown (`FSD`) due to depletion of `MINSUPPLIES`. | ||
|
||
- `2`: *Demote the primary, raise `ALARM`, and immediately set `FSD`.* This will | ||
set `FSD` from the driver side and preempt linkman:upsmon[8] from raising it | ||
itself. This mode is for setups where immediate shutdown is warranted | ||
regardless of anything else, and `FSD` must reach the clients as fast as | ||
possible. | ||
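
The three fallback modes can be summarized as a small lookup. The sketch below is a Python illustration only; `Fallback`, `MODES` and `fallback_for` are hypothetical names, and the boundary condition (fallback starts once the elapsed time reaches `noprimarytime`) is an assumption.

```python
from typing import NamedTuple, Optional


class Fallback(NamedTuple):
    demote_primary: bool
    raise_alarm: bool
    set_fsd: bool
    mark_stale: bool


# One entry per documented fsdmode value.
MODES = {
    0: Fallback(demote_primary=False, raise_alarm=False, set_fsd=False, mark_stale=True),
    # Mode 1 marks data stale only after a few extra seconds (not modelled here).
    1: Fallback(demote_primary=True, raise_alarm=True, set_fsd=False, mark_stale=True),
    # Staleness is not mentioned for mode 2 in the text, so it is left False.
    2: Fallback(demote_primary=True, raise_alarm=True, set_fsd=True, mark_stale=False),
}


def fallback_for(fsdmode, secs_without_primary, noprimarytime=15) -> Optional[Fallback]:
    """Return the fallback actions, or None while still inside the wait period."""
    if secs_without_primary < noprimarytime:
        return None
    return MODES[fsdmode]
```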
|
||
*checkruntime*='0|1|2|3':: | ||
Optional. Controls how `battery.runtime` values are used to break ties between | ||
non-fully-online UPS devices **at priority 3 or lower**. Has no effect on | ||
initial priority selection or when `strictfiltering` is enabled. Defaults to 1. | ||
|
||
- `0`: *Disabled.* No runtime comparison is done. The first candidate with the | ||
best priority is selected according to the order of the port argument. | ||
|
||
- `1`: *Compare `battery.runtime`.* The UPS with the higher value is preferred. | ||
If the value is missing or invalid, the UPS cannot win the tie-break. | ||
|
||
- `2`: *Compare `battery.runtime.low`.* The UPS with the higher value is | ||
preferred. If the value is missing or invalid, the UPS cannot win the tie-break. | ||
Comment on lines +126 to +127
Not sure how useful this one is for such comparisons, being a fixed value that may be (derived from) a user setting:
It may also be irrelevant if e.g. But for some users their (own or device's) setting may be a measure of UPS reliability, so why not.
Oh, I understood this as: this is the seconds left (timer) before the UPS switches to
I guess it can stay, just needed a second glance at usefulness. This can be a factor for some shutdown scenarios - "this UPS will give me a 5-minute FSD window to shut down gracefully (because that's when it becomes OB+LB)", although for specifically shutdowns with
Makes sense, thanks for the double check. 👍 |
||
|
||
- `3`: *Compare both variables strictly.* The UPS is preferred only if it has | ||
both a higher `battery.runtime` and `battery.runtime.low` value. If either is | ||
missing or invalid, the UPS cannot win the tie-break. | ||
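
The `checkruntime` modes can be read as the comparison sketched below. This is a hedged Python illustration, not the driver's implementation: `parse_runtime` and `wins_tie_break` are hypothetical helpers, treating negative or non-numeric values as invalid is an assumption, and so is letting a candidate with a valid value win against a current choice whose value is missing.

```python
def parse_runtime(value):
    """Return the value as a float, or None if it is missing or invalid."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    # NaN and negative numbers fail this check and count as invalid (assumption).
    return number if number >= 0 else None


# Variables compared in each checkruntime mode.
MODE_KEYS = {
    1: ["battery.runtime"],
    2: ["battery.runtime.low"],
    3: ["battery.runtime", "battery.runtime.low"],
}


def wins_tie_break(candidate, current, checkruntime=1):
    """Return True if `candidate` should be preferred over `current`.

    Both arguments are dicts mapping variable names to raw string values.
    """
    if checkruntime == 0:
        return False  # disabled: order of the port argument decides
    for key in MODE_KEYS[checkruntime]:
        cand = parse_runtime(candidate.get(key))
        if cand is None:
            return False  # missing or invalid: cannot win the tie-break
        cur = parse_runtime(current.get(key))
        if cur is not None and cand <= cur:
            return False
    return True
```

In mode 3 the loop checks both variables, so a candidate that leads on `battery.runtime` but trails on `battery.runtime.low` does not win.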
|
||
*strictfiltering*='0|1':: | ||
Optional. If set to 1, only UPS drivers matching the | ||
configured status filters are considered for promotion to primary. If set to 0, | ||
the hard-coded default logic is also considered when no status filters match | ||
(read more about this in the section `PRIORITIES`). Defaults to 0. | ||
|
||
*status_have_any*='OL,CHRG,...':: | ||
Optional. If any of these comma-separated tokens are present in a UPS driver's | ||
`ups.status`, it passes this status filtering criterion. Defaults to unset. | ||
|
||
*status_have_all*='OL,CHRG,...':: | ||
Optional. All listed comma-separated tokens must be present in `ups.status` for | ||
the UPS driver to pass this status filtering criterion. Defaults to unset. | ||
|
||
*status_nothave_any*='OB,OFF,...':: | ||
Optional. If any of these comma-separated tokens are present in `ups.status`, | ||
the UPS driver does not pass this status filtering criterion. Defaults to unset. | ||
|
||
*status_nothave_all*='OB,LB,...':: | ||
Optional. If all of these comma-separated tokens are present in `ups.status`, | ||
the UPS driver does not pass this status filtering criterion. Defaults to unset. | ||
|
||
NOTE: The `status_*` arguments are primarily intended to adjust the weighting of | ||
UPS drivers, allowing some to be prioritized over others based on their status. | ||
For example, a driver reporting `OL` might be preferred over one reporting | ||
`ALARM OL`. While `strictfiltering` can be enabled, status filters are most | ||
effective when used in combination with the default set of connectivity-based | ||
`PRIORITIES`. For more details, see the respective section further below. | ||
|
||
IMPLEMENTATION | ||
-------------- | ||
|
||
The `port` argument in linkman:ups.conf[5] should reference the local driver | ||
sockets (or Windows named pipes) that the "real" UPS drivers are using. A basic | ||
default setup with multiple drivers could look like this: | ||
|
||
------ | ||
[realups] | ||
driver = usbhid-ups | ||
port = auto | ||
|
||
[realups2] | ||
driver = usbhid-ups | ||
port = auto | ||
|
||
[failover] | ||
driver = failover | ||
port = usbhid-ups-realups,usbhid-ups-realups2 | ||
------ | ||
|
||
Any linkman:upsmon[8] clients would be set to monitor the `failover` UPS. | ||
|
||
The driver fully supports setting variables and performing instant commands on | ||
the currently elected primary UPS driver. These are proxied, and end-to-end | ||
tracking is also possible (linkman:upscmd[1] and linkman:upsrw[1] `-w`). You | ||
may notice that some variables and commands are prefixed with `upstream.`; this | ||
clearly separates the upstream commands from those of `failover` itself. | ||
|
||
For your convenience, additional administrative commands are exposed to directly | ||
influence and override the primary election process, e.g. for maintenance: | ||
|
||
- `<socketname>.force.ignore [seconds]` prevents the specified UPS driver from | ||
being selected as primary for the given duration, or permanently if a negative | ||
value is used. A value of `0` resets this override and re-enables selection. | ||
|
||
- `<socketname>.force.primary [seconds]` forces the specified UPS driver to be | ||
treated with the highest priority for the given duration, or permanently if a | ||
negative value is used. A value of `0` resets this override. | ||
|
||
Calling either command without an argument has the same effect as passing `0`, | ||
but only for that specific override - it does not affect the other. | ||
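
The argument semantics shared by `force.ignore` and `force.primary` (positive duration, negative for permanent, `0` or no argument to reset) can be modelled as below. This Python sketch is illustrative only; the `Override` class and its methods are hypothetical and say nothing about how the driver stores this state.

```python
class Override:
    """Hypothetical model of one override (force.ignore or force.primary)."""

    def __init__(self):
        self.active = False
        self.expires_at = None  # None while active means "permanent"

    def apply(self, now, seconds=0):
        """Apply the command argument; omitting it behaves like passing 0."""
        if seconds == 0:
            self.active, self.expires_at = False, None      # reset
        elif seconds < 0:
            self.active, self.expires_at = True, None       # permanent
        else:
            self.active, self.expires_at = True, now + seconds

    def is_active(self, now):
        """Return True while the override is in effect at time `now`."""
        if self.active and self.expires_at is not None and now >= self.expires_at:
            self.active, self.expires_at = False, None      # duration elapsed
        return self.active
```

Each UPS driver would carry two independent instances, one per command, which matches the statement that resetting one override does not affect the other.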
|
||
PRIORITIES | ||
---------- | ||
|
||
As outlined above, primaries are dynamically elected based on their current | ||
state and according to a strict set of user-influenceable priorities, which are: | ||
|
||
- `0` (highest): UPS driver was forced to the top by administrative command. | ||
- `1`: UPS driver has passed the user-defined status filters. | ||
- `2`: UPS driver has fresh data and is online (in status `OL`). | ||
- `3`: UPS driver has fresh data, but may not be fully online. | ||
- `4` (lowest): UPS driver is alive, but may not have fresh data. | ||
|
||
The UPS driver with the highest calculated priority is chosen as primary; ties | ||
are resolved by the order of the socket names given in the `port` argument. | ||
|
||
For the user-defined status filters, the following internal order is respected: | ||
|
||
1. `status_nothave_any` (first) | ||
2. `status_have_all` | ||
3. `status_nothave_all` | ||
4. `status_have_any` (last) | ||
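
A possible reading of the four filters, evaluated in the order listed, is sketched below in Python. The function `passes_filters` is hypothetical, and combining the filters with a logical AND (a UPS passes only if every configured filter passes) is an assumption that the text does not spell out.

```python
def passes_filters(status, have_any=(), have_all=(), nothave_any=(), nothave_all=()):
    """Check a ups.status string such as "OL CHRG" against the status filters.

    Unset filters are passed as empty sequences and are skipped.
    """
    tokens = set(status.split())
    # 1. status_nothave_any: fail if any listed token is present.
    if nothave_any and tokens & set(nothave_any):
        return False
    # 2. status_have_all: fail unless every listed token is present.
    if have_all and not set(have_all) <= tokens:
        return False
    # 3. status_nothave_all: fail if all listed tokens are present together.
    if nothave_all and set(nothave_all) <= tokens:
        return False
    # 4. status_have_any: fail unless at least one listed token is present.
    if have_any and not tokens & set(have_any):
        return False
    return True
```

For example, `passes_filters("ALARM OL", have_any=["OL"], nothave_any=["ALARM"])` is false, which is how a driver reporting `OL` could be preferred over one reporting `ALARM OL`. When no filters are configured the function returns true, so a caller would still need to skip the filter-based priority in that case.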
|
||
If `strictfiltering` is enabled, priorities 2 to 4 are not applicable. | ||
|
||
If no user-defined status filters are set, priority 1 is not applicable. | ||
|
||
NOTE: The base requirement for any election is the UPS socket being connectable | ||
and the UPS driver having published at least one full batch of data during its | ||
lifetime. UPS drivers not fulfilling that requirement are always disqualified. | ||
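
The priority rules in this section can be condensed into the following Python sketch. It is a simplified model under stated assumptions, not the driver's algorithm: the `Ups` fields, `priority` and `elect` are hypothetical names, the filter result is supplied as a precomputed flag, an active `force.ignore` is assumed to take precedence over `force.primary`, and the `checkruntime` tie-break is left out.

```python
from dataclasses import dataclass


@dataclass
class Ups:
    name: str
    connectable: bool = True      # socket can be connected to
    has_dumped: bool = True       # published at least one full batch of data
    alive: bool = True
    fresh: bool = True            # data is not stale
    status: str = ""              # ups.status, e.g. "OL CHRG"
    forced_primary: bool = False  # force.primary override active
    forced_ignore: bool = False   # force.ignore override active
    passes_filters: bool = False  # result of the user-defined status filters


def priority(ups, filters_set=False, strictfiltering=False):
    """Return 0 (highest) to 4 (lowest), or None if the UPS is disqualified."""
    if not (ups.connectable and ups.has_dumped) or ups.forced_ignore:
        return None
    if ups.forced_primary:
        return 0
    if filters_set and ups.passes_filters:
        return 1
    if strictfiltering:
        return None  # priorities 2 to 4 are not applicable
    if ups.fresh and "OL" in ups.status.split():
        return 2
    if ups.fresh:
        return 3
    if ups.alive:
        return 4
    return None


def elect(candidates, **opts):
    """Pick the best candidate; earlier entries (port order) win ties."""
    best = None
    for ups in candidates:
        prio = priority(ups, **opts)
        if prio is not None and (best is None or prio < best[0]):
            best = (prio, ups)
    return best[1] if best else None
```

Passing the candidates in the order of the `port` argument gives the documented tie-break, because a later entry replaces the current best only when its priority number is strictly lower.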
|
||
RATIONALE | ||
--------- | ||
|
||
In complex power environments, presenting a single, consistent source of UPS | ||
information to linkman:upsmon[8] is sometimes preferable to monitoring multiple | ||
independent drivers directly. The `failover` driver serves as a bridge, allowing | ||
linkman:upsmon[8] to make decisions based on the most suitable available data, | ||
without having to interpret conflicting inputs or degraded sources. | ||
|
||
Originally designed for use cases such as dual-PSU systems or redundant | ||
communication paths to a single UPS, `failover` also supports more advanced | ||
setups - for example, when multiple UPSes feed a shared downstream load (via | ||
STS/ATS switches), or when drivers vary in reliability. In these cases, the | ||
driver can be combined with external logic or scripting to dynamically adjust | ||
primary selection and facilitate graceful degradation. Such setups may also | ||
benefit from further integration with the `clone` family of drivers, such as | ||
linkman:clone[8] or linkman:clone-outlet[8], for greater granularity and | ||
monitoring control down to the outlet level. | ||
|
||
Additionally, in more niche scenarios, some third-party NUT integrations or | ||
graphical interfaces may be limited to monitoring a single UPS device. In such | ||
cases, `failover` can help by exposing only the most relevant or | ||
highest-priority data source, allowing those tools to operate within their | ||
constraints without missing critical information. | ||
|
||
Ultimately, this driver enables more nuanced power monitoring and control than | ||
binary online/offline logic alone, allowing administrators to respond to | ||
degraded conditions early - before they escalate into critical events or require | ||
linkman:upsmon[8] to take action. | ||
|
||
LIMITATIONS | ||
----------- | ||
|
||
When using `failover` for redundancy between multiple UPS drivers connected to | ||
the same underlying UPS device, data is not multiplexed between the drivers. As | ||
a result, some data points may be available in some drivers but not in others. | ||
|
||
For `checkruntime` considerations, the unit of both `battery.runtime` and | ||
`battery.runtime.low` is assumed to be **seconds**. UPS drivers that report | ||
these values using different units are considered non-compliant with the NUT | ||
variable standards and should be reported to the NUT developers as faulty. | ||
|
||
General note for posterity: not all NUT drivers provide a For the purpose here, comparing charges is probably rather useless (unless the UPSes have similar capacity so runtimes would happen to compare similarly); but it may be productive to eventually focus on generalizing the |
||
AUTHOR | ||
------ | ||
|
||
Sebastian Kuttnig <[email protected]> | ||
|
||
SEE ALSO | ||
-------- | ||
|
||
linkman:upscmd[1], | ||
linkman:upsrw[1], | ||
linkman:ups.conf[5], | ||
linkman:upsc[8], | ||
linkman:upsmon[8], | ||
linkman:nutupsdrv[8], | ||
linkman:clone[8], | ||
linkman:clone-outlet[8] | ||
|
||
Internet Resources: | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
The NUT (Network UPS Tools) home page: https://www.networkupstools.org/ |