Describe the feature you would like to see added to OpenZFS
Right now, if a scan (scrub or resilver) concludes that it should look at any DDT for any reason, that DDT is traversed in its entirety. This holds true even if the scan covers only a narrow range of txgs (say, after a short outage, so with a nontrivial minimum scan txg), and any historical autoditto DDT object(s) are always included in every scan (for correctness, this must be true).
However, in the case of a narrow txg scan, we would only need to traverse a subset of each DDT: those parts of it that stand a chance of holding block pointers born in txgs after the scan's minimum. The DDTs are stored in ZAPs, which are ultimately backed by blocks with associated birth txgs, and we have, in practice, an unchecked invariant: a DDT ZAP block born in txg N can (transitively) hold only block pointers to data born in txgs prior to N. (Note that this is one-sided, so there is no utility in considering the scan's maximum txg.) While these block birth txgs exist on disk, the information is not plumbed through the dbuf and ZAP APIs; in particular, the ZAP query/traversal API has no notion of txgs.
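To make the invariant concrete, here is a minimal sketch of the pruning predicate this enables. The helper and its arguments are hypothetical; plumbing the block birth txg up to where such a check could run is exactly the work described in the list below:

```c
/*
 * Hypothetical pruning predicate (sketch only, not existing code).
 * By the invariant above, a DDT ZAP block born in txg N can
 * transitively hold only block pointers to data born prior to N.
 */
static boolean_t
ddt_zap_block_prunable(uint64_t blk_birth_txg, uint64_t scn_min_txg)
{
	/*
	 * One-sided: the scan's maximum txg plays no role. A block
	 * born before the scan's minimum txg can hold only block
	 * pointers born earlier still, so nothing in it (or beneath
	 * it) is in scope for the scan, and the traversal may skip
	 * the whole subtree.
	 */
	return (blk_birth_txg < scn_min_txg);
}
```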
In summary, this would require:

- Exposing some notion of dbuf birth txg, if it is not already exposed, so that the ZAP layer can see it and prune its internal traversal.
- Exposing a txg-indexed ZAP lookup/traversal API, which does not return entries housed in blocks born in txgs prior to a given parameter txg (see the sketch after this list).
- Moving ddt_walk to use that API.
- Moving dsl_scan to the new ddt_walk API.
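For illustration, the new interfaces might look something like the following. Every name here is hypothetical, sketched by analogy with the existing zap_cursor_init() and ddt_walk() signatures, and not an existing OpenZFS symbol:

```c
/*
 * Hypothetical: like zap_cursor_init(), but the cursor's internal
 * traversal never descends into ZAP blocks born in txgs prior to
 * min_txg, so entries housed in such blocks are never returned.
 */
void zap_cursor_init_min_txg(zap_cursor_t *zc, objset_t *os,
    uint64_t zapobj, uint64_t min_txg);

/*
 * Hypothetical: like ddt_walk(), but enumerates only DDT entries
 * reachable through ZAP blocks that could hold block pointers born
 * after min_txg, by using the cursor above internally.
 */
int ddt_walk_min_txg(spa_t *spa, ddt_bookmark_t *ddb,
    ddt_entry_t *dde, uint64_t min_txg);
```

dsl_scan already tracks a minimum scan txg (scn_min_txg in its scan state), which it would pass through here; a full scrub, with a minimum txg of 0, degenerates to today's full traversal.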
How will this feature improve OpenZFS?
This is a more general approach than the one currently taken by dsl_scan.c, which heuristically assumes that narrow resilvers should ignore all DDTs except autoditto DDTs, if any. That heuristic is unfortunate in that it may amplify scan traffic by requiring probes of the DDTs for each deduplicated block pointer encountered while traversing the pool. Given the ability to scan versioned slices of the DDTs, as proposed here, we would instead scan all the DDTs exactly once and then (using the existing logic) skip all deduplicated blocks while traversing pool metadata, avoiding the repeated probing.
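As a rough sketch of that end state (a simplification for illustration, not the actual dsl_scan.c gate, which also weighs DDT classes and the scan's txg bounds), the per-blkptr decision during pool traversal would reduce to:

```c
/*
 * Simplified sketch: once every DDT has been walked exactly once
 * over its versioned slice, any deduplicated block pointer met
 * during pool-metadata traversal was already scanned via its DDT
 * entry, so it can be skipped without probing the DDT again.
 */
static boolean_t
scan_skip_dedup_bp(const blkptr_t *bp, boolean_t ddt_phase_done)
{
	return (ddt_phase_done && BP_GET_DEDUP(bp));
}
```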