FSDP2 tutorial #3358
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3358
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 9e7d160 with merge base 78933b1 ( NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
FSDP2 tutorial is ready for review @AlannaBurke @svekars
The link failure is expected for "https://docs.pytorch.org/tutorials/intermediate/FSDP1_tutorial.html". The link will work once FSDP1_tutorial lands with this PR.
# initialize the process group
dist.init_process_group("nccl", rank=rank, world_size=world_size)
``fully_shard`` register forward/backward hooks to all-gather parameters before computation, and reshard parameters after computation. To overlap all-gathers with computation, FSDP2 offers **implicit prefetching** that works out of the box with the training loop above and **explicit prefetching** for advanced users to control all-gather schedules manually.
register -> registers
reshard -> reshards
done
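For readers following this thread, the hook behaviour described in the quoted ``fully_shard`` sentence can be pictured with a toy, single-process sketch. This is not the FSDP2 implementation and it calls no PyTorch API: `ToyShardedLayer`, `unshard`, `reshard`, and `run_forward` are hypothetical names invented here, and the "all-gather" is just a list concatenation. It only illustrates the ordering (gather before compute, reshard after) and does not model prefetching or communication overlap.

```python
# Toy model of the hook ordering only; every name below is hypothetical.
class ToyShardedLayer:
    """A layer whose weights are split into per-rank shards between uses."""

    def __init__(self, name, shards):
        self.name = name
        self.shards = shards  # one list of weights per simulated rank
        self.full = None      # unsharded weights, alive only during compute

    def unshard(self, log):
        # Stand-in for the all-gather: concatenate every rank's shard.
        self.full = [w for shard in self.shards for w in shard]
        log.append(f"all-gather {self.name}")

    def reshard(self, log):
        # Drop the full weights again; only the shards stay resident.
        self.full = None
        log.append(f"reshard {self.name}")

    def forward(self, x, log):
        self.unshard(log)                # plays the role of a pre-forward hook
        log.append(f"compute {self.name}")
        y = x * sum(self.full)           # toy computation on the full weights
        self.reshard(log)                # plays the role of a post-forward hook
        return y


def run_forward(layers, x):
    """Run the layers in order and return the output plus the event log."""
    log = []
    for layer in layers:
        x = layer.forward(x, log)
    return x, log


layers = [
    ToyShardedLayer("layer0", [[1.0], [2.0]]),
    ToyShardedLayer("layer1", [[0.5], [0.5]]),
]
out, events = run_forward(layers, 2.0)
for event in events:
    print(event)
```

The event log makes the point of the sentence visible: each layer holds its full weights only between its own gather and reshard steps.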
**Author**: `Hamid Shojanazeri <https://github.com/HamidShojanazeri>`__, `Yanli Zhao <https://github.com/zhaojuanmao>`__, `Shen Li <https://mrshenli.github.io/>`__

.. note::
   |edit| FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
|edit| FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
done
@weifengpy Can you post the preview link for this tutorial?
https://docs-preview.pytorch.org/pytorch/tutorials/3358/intermediate/FSDP_tutorial.html
The FSDP2 tutorial replaces the FSDP1 tutorial in place (intermediate_source/FSDP_tutorial.rst).
The FSDP1 tutorial is renamed to intermediate_source/FSDP1_tutorial.rst, and the FSDP2 tutorial links to it.
The code for this tutorial is committed to pytorch examples: https://github.com/pytorch/examples/tree/main/distributed/FSDP2