DocFailure
No special action.
TODO: registered unlinked file handles should be closed.
The update scheduler is notified to revert any in-flight operations to their queued state, since no state updates can happen while I/O servers are offline (from the MDS's perspective). Flags and bandwidth reservations tied to such operations (e.g. for replication) are relinquished.
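The revert step can be sketched as follows. This is a minimal illustration, not the MDS's actual code: `UpdateOp`, `UpdateScheduler`, `on_ios_offline`, and the `reserved_bw` field are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class OpState(Enum):
    QUEUED = auto()
    INFLIGHT = auto()


@dataclass
class UpdateOp:
    op_id: int
    ios: str                          # I/O server the operation targets
    state: OpState = OpState.QUEUED
    reserved_bw: int = 0              # bandwidth held for this op (hypothetical units)


@dataclass
class UpdateScheduler:
    ops: list = field(default_factory=list)

    def on_ios_offline(self, ios: str) -> int:
        """Revert in-flight ops for `ios` to QUEUED and drop their reservations.

        Returns the total bandwidth released.
        """
        released = 0
        for op in self.ops:
            if op.ios == ios and op.state is OpState.INFLIGHT:
                op.state = OpState.QUEUED
                released += op.reserved_bw
                op.reserved_bw = 0
        return released
```

Operations bound for other I/O servers are left untouched, so only work that cannot make progress is requeued.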
As bmap ↔ ION lease assignments (bia) are tracked persistently, the MDS will recover any such lost state on startup.
No special action.
The IONs have no persistent operation log and hence operate purely in memory. When an MDS fails, the ION tries to maintain its state and waits for the connection to the MDS to return. The MDS's bmap ↔ IOS association log tells the MDS which IONs were granted write permission to specific bmaps.
Discuss feasibility of connecting to other MDSes.
So long as the ION has available memory for queued operations to the MDS, it may still accept read and write operations from clients. Read operations may always be handled so long as all parties are satisfied with the state of their read-leased bmaps (i.e. the leases haven't expired).
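A rough sketch of this admission rule, assuming a fixed memory budget for operations waiting on the MDS. The class `MdsOpQueue`, its byte accounting, and `may_serve_read` are hypothetical and only illustrate the idea.

```python
from collections import deque


class MdsOpQueue:
    """Bounded in-memory queue of operations awaiting an MDS connection."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes    # hypothetical memory budget
        self.used = 0
        self.pending = deque()

    def try_enqueue(self, op, nbytes):
        """Queue `op` if it fits in the budget; return False to refuse it."""
        if self.used + nbytes > self.max_bytes:
            return False
        self.pending.append((op, nbytes))
        self.used += nbytes
        return True

    def drain(self):
        """Yield queued ops in arrival order once the MDS is reachable."""
        while self.pending:
            op, nbytes = self.pending.popleft()
            self.used -= nbytes
            yield op


def may_serve_read(lease_expiry, now):
    """Reads need no MDS round trip while the read lease is unexpired."""
    return now < lease_expiry
```

When `try_enqueue` returns `False`, the ION would have to refuse or stall the client request that generated the operation.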
CRC_UPDATE RPCs should be accompanied by transaction numbers so that the MDS and ION can determine the state of the CRC tables without issuing a re-read operation. The transaction number would be stored in the main journal on the MDS with each CRC_UPDATE operation. The ION would keep the transaction ID in its MDS resource structure. Upon reconnection, the MDS will have restored its export structure from the operation journal log, along with the transaction ID stored therein. If the transaction IDs match, no re-read is necessary. If they do not match, we may have to assume that the connection has been compromised (e.g. perhaps the ION rebooted too).
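The reconnection check amounts to comparing the two recorded transaction IDs. The sketch below assumes each side remembers only the most recent ID; `MdsJournal`, `IonMdsResource`, and `reconcile` are illustrative names, not existing structures.

```python
from dataclasses import dataclass


@dataclass
class MdsJournal:
    last_crc_xid: int = 0             # restored from the journal on MDS startup

    def log_crc_update(self, xid: int) -> None:
        """Record the transaction ID alongside a CRC_UPDATE journal entry."""
        self.last_crc_xid = xid


@dataclass
class IonMdsResource:
    last_crc_xid: int = 0             # held in memory only; lost if the ION reboots


def reconcile(journal: MdsJournal, ion: IonMdsResource) -> str:
    """Decide on reconnect whether the CRC tables must be re-read."""
    if journal.last_crc_xid == ion.last_crc_xid:
        return "in-sync"              # both sides agree on the last update
    return "reread"                   # mismatch: do not trust the cached state
```

An ION that rebooted would come back with its in-memory ID reset, which produces the mismatch case.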
This scenario only affects replication activity between ION source and destination peers. Such operations are fully idempotent as they contain the bmap generation numbers.
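One way to picture why generation numbers make a replayed request harmless is shown below. It assumes the destination simply ignores requests whose generation does not match its own; the `ReplicaBmap` class and that policy are assumptions made for the example.

```python
class ReplicaBmap:
    """Destination-side view of one bmap being replicated."""

    def __init__(self, gen):
        self.gen = gen                # current bmap generation number
        self.data = None

    def apply(self, req_gen, payload):
        """Apply a replication request; return True if it was accepted."""
        if req_gen != self.gen:
            return False              # request was built for another generation
        self.data = payload           # re-applying the same payload changes nothing
        return True
```

Replaying a request after a reconnect leaves the replica in the same state as applying it once, and a request prepared against an older generation is dropped rather than overwriting newer data.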
It is the client's responsibility to ensure standing requests are completed, as the ION maintains no operation journal.
The only type of long-standing request would be an asynchronous I/O (AIO) request issued by a client to an archival_fs ION.
Pending replication status inquiries are all failed.
All asynchronous I/O requests are abandoned.
Before any I/O activity, the online availability of IONs is factored into the selection of an IOS. When the requested data resides exclusively on IONs that are unavailable, the client will either block awaiting connectivity or fail immediately, depending on timeout/retry configuration and state.
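The block-or-fail behavior can be sketched as a selection loop with a retry budget. The function `select_ios` and its `retries` and `delay` parameters are hypothetical stand-ins for the client's timeout/retry configuration.

```python
import time


def select_ios(candidates, is_online, retries=0, delay=0.0, sleep=time.sleep):
    """Return the first online I/O server that holds the requested data.

    With retries=0 the call fails immediately when none is online;
    a larger budget makes the caller wait for connectivity instead.
    """
    for attempt in range(retries + 1):
        for ios in candidates:
            if is_online(ios):
                return ios
        if attempt < retries:
            sleep(delay)              # wait before polling availability again
    raise ConnectionError("no I/O server holding the requested data is online")
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.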
Any failures during synchronous I/O activity are handled directly and not in any callback fashion.
Note that the client may still have cached buffers attached to bmap leases associated with a failed ION. Depending on configuration and state, the client will either keep retrying connection establishment until it succeeds, or give up and purge its cache. It can do so as long as the MDS respects its ability to do so, which would only happen if communication with the MDS (from the MDS's perspective) is lost as well.
