Skip to content

Commit ab5645b

Browse files
fdmananajannau
authored andcommitted
btrfs: fix log tree replay failure due to file with 0 links and extents
If we log a new inode (not persisted in a past transaction) that has 0 links and extents, then log another inode with an higher inode number, we end up with failing to replay the log tree with -EINVAL. The steps for this are: 1) create new file A 2) write some data to file A 3) open an fd on file A 4) unlink file A 5) fsync file A using the previously open fd 6) create file B (has higher inode number than file A) 7) fsync file B 8) power fail before current transaction commits Now when attempting to mount the fs, the log replay will fail with -ENOENT at replay_one_extent() when attempting to replay the first extent of file A. The failure comes when trying to open the inode for file A in the subvolume tree, since it doesn't exist. Before commit 5f61b96 ("btrfs: fix inode lookup error handling during log replay"), the returned error was -EIO instead of -ENOENT, since we converted any errors when attempting to read an inode during log replay to -EIO. The reason for this is that the log replay procedure fails to ignore the current inode when we are at the stage LOG_WALK_REPLAY_ALL, our current inode has 0 links and last inode we processed in the previous stage has a non 0 link count. In other words, the issue is that at replay_one_extent() we only update wc->ignore_cur_inode if the current replay stage is LOG_WALK_REPLAY_INODES. Fix this by updating wc->ignore_cur_inode whenever we find an inode item regardless of the current replay stage. This is a simple solution and easy to backport, but later we can do other alternatives like avoid logging extents or inode items other than the inode item for inodes with a link count of 0. The problem with the wc->ignore_cur_inode logic has been around since commit f2d72f4 ("Btrfs: fix warning when replaying log after fsync of a tmpfile") but it only became frequent to hit since the more recent commit 5e85262 ("btrfs: fix fsync of files with no hard links not persisting deletion"), because we stopped skipping inodes with a link count of 0 when logging, while before the problem would only be triggered if trying to replay a log tree created with an older kernel which has a logged inode with 0 links. A test case for fstests will be submitted soon. Reported-by: Peter Jung <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reported-by: burneddi <[email protected]> Link: https://lore.kernel.org/linux-btrfs/lh4W-Lwc0Mbk-QvBhhQyZxf6VbM3E8VtIvU3fPIQgweP_Q1n7wtlUZQc33sYlCKYd-o6rryJQfhHaNAOWWRKxpAXhM8NZPojzsJPyHMf2qY=@protonmail.com/#t Reported-by: Russell Haley <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Fixes: f2d72f4 ("Btrfs: fix warning when replaying log after fsync of a tmpfile") CC: [email protected] # 5.4+ Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 48eeb90 commit ab5645b

File tree

1 file changed

+30
-18
lines changed

1 file changed

+30
-18
lines changed

fs/btrfs/tree-log.c

Lines changed: 30 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -321,8 +321,7 @@ struct walk_control {
321321

322322
/*
323323
* Ignore any items from the inode currently being processed. Needs
324-
* to be set every time we find a BTRFS_INODE_ITEM_KEY and we are in
325-
* the LOG_WALK_REPLAY_INODES stage.
324+
* to be set every time we find a BTRFS_INODE_ITEM_KEY.
326325
*/
327326
bool ignore_cur_inode;
328327

@@ -2410,32 +2409,45 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb,
24102409

24112410
nritems = btrfs_header_nritems(eb);
24122411
for (i = 0; i < nritems; i++) {
2413-
btrfs_item_key_to_cpu(eb, &key, i);
2412+
struct btrfs_inode_item *inode_item;
24142413

2415-
/* inode keys are done during the first stage */
2416-
if (key.type == BTRFS_INODE_ITEM_KEY &&
2417-
wc->stage == LOG_WALK_REPLAY_INODES) {
2418-
struct btrfs_inode_item *inode_item;
2419-
u32 mode;
2414+
btrfs_item_key_to_cpu(eb, &key, i);
24202415

2421-
inode_item = btrfs_item_ptr(eb, i,
2422-
struct btrfs_inode_item);
2416+
if (key.type == BTRFS_INODE_ITEM_KEY) {
2417+
inode_item = btrfs_item_ptr(eb, i, struct btrfs_inode_item);
24232418
/*
2424-
* If we have a tmpfile (O_TMPFILE) that got fsync'ed
2425-
* and never got linked before the fsync, skip it, as
2426-
* replaying it is pointless since it would be deleted
2427-
* later. We skip logging tmpfiles, but it's always
2428-
* possible we are replaying a log created with a kernel
2429-
* that used to log tmpfiles.
2419+
* An inode with no links is either:
2420+
*
2421+
* 1) A tmpfile (O_TMPFILE) that got fsync'ed and never
2422+
* got linked before the fsync, skip it, as replaying
2423+
* it is pointless since it would be deleted later.
2424+
* We skip logging tmpfiles, but it's always possible
2425+
* we are replaying a log created with a kernel that
2426+
* used to log tmpfiles;
2427+
*
2428+
* 2) A non-tmpfile which got its last link deleted
2429+
* while holding an open fd on it and later got
2430+
* fsynced through that fd. We always log the
2431+
* parent inodes when inode->last_unlink_trans is
2432+
* set to the current transaction, so ignore all the
2433+
* inode items for this inode. We will delete the
2434+
* inode when processing the parent directory with
2435+
* replay_dir_deletes().
24302436
*/
24312437
if (btrfs_inode_nlink(eb, inode_item) == 0) {
24322438
wc->ignore_cur_inode = true;
24332439
continue;
24342440
} else {
24352441
wc->ignore_cur_inode = false;
24362442
}
2437-
ret = replay_xattr_deletes(wc->trans, root, log,
2438-
path, key.objectid);
2443+
}
2444+
2445+
/* Inode keys are done during the first stage. */
2446+
if (key.type == BTRFS_INODE_ITEM_KEY &&
2447+
wc->stage == LOG_WALK_REPLAY_INODES) {
2448+
u32 mode;
2449+
2450+
ret = replay_xattr_deletes(wc->trans, root, log, path, key.objectid);
24392451
if (ret)
24402452
break;
24412453
mode = btrfs_inode_mode(eb, inode_item);

0 commit comments

Comments
 (0)