Commit f135cea

fdmanana authored and kdave committed
btrfs: fix partial loss of prealloc extent past i_size after fsync
When we have an inode with a prealloc extent that starts at an offset
lower than the i_size and there is another prealloc extent that starts at
an offset beyond i_size, we can end up losing part of the first prealloc
extent (the part that starts at i_size) and have an implicit hole if we
fsync the file and then have a power failure.

Consider the following example with comments explaining how and why it
happens.

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our test file with 2 consecutive prealloc extents, each with a
  # size of 128Kb, and covering the range from 0 to 256Kb, with a file
  # size of 0.
  $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo
  $ xfs_io -c "falloc -k 128K 128K" /mnt/foo

  # Fsync the file to record both extents in the log tree.
  $ xfs_io -c "fsync" /mnt/foo

  # Now do a redundant extent allocation for the range from 0 to 64Kb.
  # This will merely increase the file size from 0 to 64Kb. Instead we
  # could also do a truncate to set the file size to 64Kb.
  $ xfs_io -c "falloc 0 64K" /mnt/foo

  # Fsync the file, so we update the inode item in the log tree with the
  # new file size (64Kb). This also ends up setting the number of bytes
  # for the first prealloc extent to 64Kb. This is done by the truncation
  # at btrfs_log_prealloc_extents().
  # This means that if a power failure happens after this, a write into
  # the file range 64Kb to 128Kb will not use the prealloc extent and
  # will result in allocation of a new extent.
  $ xfs_io -c "fsync" /mnt/foo

  # Now set the file size to 256K with a truncate and then fsync the file.
  # Since no changes happened to the extents, the fsync only updates the
  # i_size in the inode item at the log tree. This results in an implicit
  # hole for the file range from 64Kb to 128Kb, something which fsck will
  # complain about when not using the NO_HOLES feature if we replay the
  # log after a power failure.
  $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo

So instead of always truncating the log to the inode's current i_size at
btrfs_log_prealloc_extents(), check first if there's a prealloc extent
that starts at an offset lower than the i_size and with a length that
crosses the i_size - if there is one, just make sure we truncate to a
size that corresponds to the end offset of that prealloc extent, so
that we don't lose the part of that extent that starts at i_size if a
power failure happens.

A test case for fstests follows soon.

Fixes: 31d11b8 ("Btrfs: fix duplicate extents after fsync of file with prealloc extents")
CC: [email protected] # 4.14+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
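
The sequence in the commit message can be replayed in a small user-space model. The sketch below is not kernel code and makes several simplifying assumptions: the log tree is modelled as a flat array, and `struct extent`, `struct tree`, `log_truncate`, `log_copy_beyond`, `first_hole` and `simulate` are hypothetical names introduced only for illustration. It models the second fsync (i_size = 64K) as "truncate the log at some offset, then re-copy the extents that start at or beyond i_size", and then looks for a gap below the final file size of 256K.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KIB 1024ULL
#define MAX_EXTENTS 8

/* Hypothetical, simplified stand-in for a prealloc file extent item. */
struct extent {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical flat model of a tree holding extent items for one inode. */
struct tree {
	struct extent ext[MAX_EXTENTS];
	size_t nr;
};

/* Drop extents starting at or beyond 'offset'; trim one that crosses it. */
static void log_truncate(struct tree *log, uint64_t offset)
{
	size_t kept = 0;

	for (size_t i = 0; i < log->nr; i++) {
		struct extent e = log->ext[i];

		if (e.start >= offset)
			continue;
		if (e.start + e.len > offset)
			e.len = offset - e.start;
		log->ext[kept++] = e;
	}
	log->nr = kept;
}

/* Copy extents that start at or beyond i_size from the fs tree to the log. */
static void log_copy_beyond(struct tree *log, const struct tree *fs,
			    uint64_t i_size)
{
	for (size_t i = 0; i < fs->nr; i++) {
		if (fs->ext[i].start < i_size)
			continue;
		assert(log->nr < MAX_EXTENTS);
		log->ext[log->nr++] = fs->ext[i];
	}
}

/* Return the start of the first gap below 'i_size', or UINT64_MAX if none. */
static uint64_t first_hole(const struct tree *log, uint64_t i_size)
{
	uint64_t pos = 0;

	for (size_t i = 0; i < log->nr && pos < i_size; i++) {
		if (log->ext[i].start > pos)
			return pos;
		pos = log->ext[i].start + log->ext[i].len;
	}
	return pos < i_size ? pos : UINT64_MAX;
}

/*
 * Model the second fsync with i_size = 64K, truncating the log at
 * 'truncate_offset', then report the first hole below the final 256K size.
 */
static uint64_t simulate(uint64_t truncate_offset)
{
	const struct tree fs = {
		.ext = { { 0, 128 * KIB }, { 128 * KIB, 128 * KIB } },
		.nr = 2,
	};
	struct tree log = fs;	/* log contents after the first fsync */
	const uint64_t i_size = 64 * KIB;

	log_truncate(&log, truncate_offset);
	log_copy_beyond(&log, &fs, i_size);
	return first_hole(&log, 256 * KIB);
}
```

In this model, `simulate(64 * KIB)` (truncating at i_size, the behaviour before this commit) returns 64 KiB, the start of the implicit hole, while `simulate(128 * KIB)` (truncating at the end of the extent that crosses i_size) returns `UINT64_MAX`, meaning no hole.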
1 parent 1402d17 commit f135cea

1 file changed, +40 -3 lines changed

fs/btrfs/tree-log.c

Lines changed: 40 additions & 3 deletions
@@ -4226,6 +4226,9 @@ static int btrfs_log_prealloc_extents(struct btrfs_trans_handle *trans,
 	const u64 ino = btrfs_ino(inode);
 	struct btrfs_path *dst_path = NULL;
 	bool dropped_extents = false;
+	u64 truncate_offset = i_size;
+	struct extent_buffer *leaf;
+	int slot;
 	int ins_nr = 0;
 	int start_slot;
 	int ret;
@@ -4240,9 +4243,43 @@ static int btrfs_log_prealloc_extents(struct btrfs_trans_handle *trans,
 	if (ret < 0)
 		goto out;
 
+	/*
+	 * We must check if there is a prealloc extent that starts before the
+	 * i_size and crosses the i_size boundary. This is to ensure later we
+	 * truncate down to the end of that extent and not to the i_size, as
+	 * otherwise we end up losing part of the prealloc extent after a log
+	 * replay and with an implicit hole if there is another prealloc extent
+	 * that starts at an offset beyond i_size.
+	 */
+	ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY);
+	if (ret < 0)
+		goto out;
+
+	if (ret == 0) {
+		struct btrfs_file_extent_item *ei;
+
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		ei = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+		if (btrfs_file_extent_type(leaf, ei) ==
+		    BTRFS_FILE_EXTENT_PREALLOC) {
+			u64 extent_end;
+
+			btrfs_item_key_to_cpu(leaf, &key, slot);
+			extent_end = key.offset +
+				btrfs_file_extent_num_bytes(leaf, ei);
+
+			if (extent_end > i_size)
+				truncate_offset = extent_end;
+		}
+	} else {
+		ret = 0;
+	}
+
 	while (true) {
-		struct extent_buffer *leaf = path->nodes[0];
-		int slot = path->slots[0];
+		leaf = path->nodes[0];
+		slot = path->slots[0];
 
 		if (slot >= btrfs_header_nritems(leaf)) {
 			if (ins_nr > 0) {
@@ -4280,7 +4317,7 @@ static int btrfs_log_prealloc_extents(struct btrfs_trans_handle *trans,
 				ret = btrfs_truncate_inode_items(trans,
 							 root->log_root,
 							 &inode->vfs_inode,
-							 i_size,
+							 truncate_offset,
 							 BTRFS_EXTENT_DATA_KEY);
 			} while (ret == -EAGAIN);
 			if (ret)
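
The decision this patch adds before the `while (true)` loop can be read as a small pure function. The sketch below is a user-space illustration under stated assumptions, not the kernel implementation: `enum extent_type`, `struct prev_extent` and `pick_truncate_offset` are hypothetical names, and the `found` flag stands in for `btrfs_previous_item()` returning 0 (any other non-negative return is treated as "no previous item", as the `else` branch in the diff does).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the on-disk file extent types. */
enum extent_type { EXTENT_INLINE, EXTENT_REG, EXTENT_PREALLOC };

/* Hypothetical summary of the extent item found just before i_size. */
struct prev_extent {
	bool found;		/* models btrfs_previous_item() returning 0 */
	enum extent_type type;
	uint64_t offset;	/* file offset where the extent starts */
	uint64_t num_bytes;	/* length of the extent */
};

/*
 * Pick the offset at which the log tree should be truncated: the end of a
 * prealloc extent that crosses i_size if there is one, otherwise i_size.
 */
static uint64_t pick_truncate_offset(const struct prev_extent *prev,
				     uint64_t i_size)
{
	uint64_t extent_end;

	assert(prev != NULL);

	/* No previous item, or not a prealloc extent: keep the default. */
	if (!prev->found || prev->type != EXTENT_PREALLOC)
		return i_size;

	extent_end = prev->offset + prev->num_bytes;

	/* Only an extent that ends beyond i_size moves the truncation point. */
	return extent_end > i_size ? extent_end : i_size;
}
```

For the example in the commit message (a prealloc extent covering 0 to 128K and an i_size of 64K) this returns 128K. A prealloc extent that ends exactly at i_size, a regular extent, or a missing previous item all leave the offset at i_size, which matches the behaviour before the patch.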
