Skip to content

Fixed some dmu block clone reflink ASSERTs. #14995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

oromenahar
Copy link
Contributor

@oromenahar oromenahar commented Jun 18, 2023

I'm working on reflink support for linux and found some Asserts which are not working as expected.

I'm using cp to reflink the file and than using sha256sum right after it to read it. That results in some strange behaviour.
My cp reflink without adding --reflink as option explicitly.

Motivation and Context

I don't know if this fixes an open issue. I guess there are some issues about reflink support on linux, but not this one exactly.
Right now I'm working on a reflink support for linux and while I'm was working on that I found it. If necessary here is a link to the branch: reflink on linux still WIP

Description

The first one: the list_head(&db->db_dirty_records) can return elements from a previous transaction group.
For example: This can happen, when a file is opened with the O_TRUNC flag and than reflinked.
The free_long_range transaction group is not sync to disk yet, but the clone is already running.
An ASSERT like this would work:

dr = list_head(&db->db_dirty_records);
ASSERT(dr == NULL || (dr->dr_txg < tx->tx_txg));

The second one:
It looks like if you are fast enough to read the data the ASSERT is false. The db->db_state will be DB_READ but dr->dt.dl.dr_brtwrite is not synced to disk yet. So if (db->db_state == DB_READ && dr->dt.dl.dr_brtwrite == B_TRUE) is B_TRUE it is fine to continue. (in debug mode)
It can be repoduced with:

while true;
do
	/usr/bin/cp -fv /tank/test/test.img /tank/test/test.img2 &&
	sleep 1 && sha256sum /tank/test/test.img2;
done

Tested on linux 5.14

How Has This Been Tested?

I have tested on linux 5.14 Rockylinux 9 with custom build ZFS and the zsh. I used the commits from this branch to expose the reflink on linux still WIP interface to the linux kernel and than use cp and sha256sum to read and write it through the VFS. I tested on a virtual machine with a one disk.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation) part of (in the commit and source)

Checklist:

The first one: the `list_head(&db->db_dirty_records)` can return
elements from a previous transaction group.
For example: This can happen, when a file is opened
with the O_TRUNC flag and than reflinked.
The free_long_range transaction group is not sync to disk yet,
but the clone is already running.
An ASSERT like this would work:
```
dr = list_head(&db->db_dirty_records);
ASSERT(dr == NULL || (dr->dr_txg < tx->tx_txg));
```

The second one:
It looks like if you are fast enough to read the data
the ASSERT is false. The db->db_state will be DB_READ
but dr->dt.dl.dr_brtwrite is not synced to disk yet.
So if (db->db_state == DB_READ && dr->dt.dl.dr_brtwrite == B_TRUE)
is B_TRUE it is fine to continue. (in debug mode)
It can be repoduced with:
```
while true;
do /usr/bin/cp -fv /tank/test/test.img /tank/test/test.img2 &&
sleep 1 && sha256sum /tank/test/test.img2;
done
```

Tested on linux 5.14

Signed-off-by: Kay Pedersen <[email protected]>
@rincebrain
Copy link
Contributor

rincebrain commented Jun 18, 2023

Plumbing it into (just) remap_file_range won't work cross-mount, because there's hardcoded checks in everything but copy_file_range, I believe. (I would love to be wrong, dearly, but that was my impression of the state of the world for cross-FS last I looked.)

You may find this and this informative.

Also I'd bet he'll find it anyway, but to speed that up, @pjd .

@oromenahar
Copy link
Contributor Author

oromenahar commented Jun 21, 2023

Oh nice @rincebrain this one is really nice, I'm fairly new to the codebase and the chance to compare other code on the same problem with my own code is really nice. Tanks a lot @rincebrain
I'm going to check it out later this week.

EDIT:
@rincebrain I checked your code for the some behaviour like mine brt poking code, it has the same ASSERT errors.

Plumbing it into (just) remap_file_range won't work cross-mount, because there's hardcoded checks in everything but copy_file_range, I believe. (I would love to be wrong, dearly, but that was my impression of the state of the world for cross-FS last I looked.)

You may find this and this informative.

Also I'd bet he'll find it anyway, but to speed that up, @pjd .

@rincebrain
Copy link
Contributor

I didn't say it didn't have the ASSERTs, I was just pointing to it as another experiment in wiring it up that wasn't just wiring up remap.

@oromenahar
Copy link
Contributor Author

I didn't say it didn't have the ASSERTs, I was just pointing to it as another experiment in wiring it up that wasn't just wiring up remap.

Oh sorry I didn't wanted to point the errors, I only tried to figure out if your copy_file_range has the same ASSERTs like my remap_file_range and provided my test results, more like a hint or information (for me). Didn't want to judge it in any way. Sorry for that.

@behlendorf
Copy link
Contributor

PR #15050 has been opened which adds the Linux integration. @oromenahar it'd be great if you could test out the new PR.

@oromenahar
Copy link
Contributor Author

@behlendorf already working on a review for that pull request. It would be nice if the PR can be merged and can get off the ground.

@robn
Copy link
Member

robn commented Jul 23, 2023

It looks like this is all real. There's more analysis in #15050.

@oromenahar
Copy link
Contributor Author

closed, because added as commits in #15050 (commits: b532ca8 and d41549e)

@oromenahar oromenahar closed this Jul 24, 2023
@oromenahar oromenahar deleted the fix/dmu_brt_clone_dirty_records branch August 3, 2023 22:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants