FSDP2 tutorial #3358

Merged
merged 4 commits into from
May 16, 2025
Conversation

weifengpy
Contributor

@weifengpy weifengpy commented May 13, 2025

The FSDP2 tutorial replaces the FSDP1 tutorial in place (intermediate_source/FSDP_tutorial.rst).

The FSDP1 tutorial is renamed to intermediate_source/FSDP1_tutorial.rst, and the FSDP2 tutorial links to it.

Screenshot 2025-05-13 at 11 04 50 AM

The code for this tutorial is committed to pytorch examples: https://github.com/pytorch/examples/tree/main/distributed/FSDP2

pytorch-bot bot commented May 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3358

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 9e7d160 with merge base 78933b1:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@weifengpy weifengpy marked this pull request as draft May 13, 2025 04:05
@svekars svekars requested a review from AlannaBurke May 13, 2025 15:03
@weifengpy weifengpy marked this pull request as ready for review May 13, 2025 18:06
@weifengpy
Contributor Author

FSDP2 tutorial is ready for review @AlannaBurke @svekars

@weifengpy
Contributor Author

The link failure is expected for "https://docs.pytorch.org/tutorials/intermediate/FSDP1_tutorial.html". The link will work once FSDP1_tutorial lands with this PR.

@weifengpy weifengpy requested review from mori360 and wconstab May 13, 2025 18:09

# initialize the process group
dist.init_process_group("nccl", rank=rank, world_size=world_size)
``fully_shard`` register forward/backward hooks to all-gather parameters before computation, and reshard parameters after computation. To overlap all-gathers with computation, FSDP2 offers **implicit prefetching** that works out of the box with the training loop above and **explicit prefetching** for advanced users to control all-gather schedules manually.

register -> registers
reshard -> reshards

Contributor Author

done
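
To make the prefetching passage reviewed above concrete, here is a toy, pure-Python sketch of the order in which all-gather, compute, and reshard steps are issued. It is not the PyTorch API: `ToyShardedLayer`, `run_forward`, and `num_to_prefetch` are hypothetical names invented for illustration, and the sketch only records issue order. It does not model CUDA streams or asynchronous communication, which is where the actual overlap would come from.

```python
class ToyShardedLayer:
    """Hypothetical stand-in for a module wrapped by fully_shard (not PyTorch API)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.gathered = False  # True while the full parameters are materialized

    def all_gather(self):
        # Issue an all-gather only if the parameters are still sharded.
        if not self.gathered:
            self.log.append(f"all-gather {self.name}")
            self.gathered = True

    def reshard(self):
        # Free the full parameters again after computation.
        self.log.append(f"reshard {self.name}")
        self.gathered = False

    def forward(self, x, prefetch=()):
        self.all_gather()              # plays the role of the pre-forward hook
        for later_layer in prefetch:   # explicit prefetch: issue later all-gathers early
            later_layer.all_gather()
        self.log.append(f"compute {self.name}")
        self.reshard()                 # plays the role of the post-forward hook
        return x + 1                   # placeholder computation


def run_forward(num_layers=3, num_to_prefetch=0):
    """Run a toy forward pass and return (output, ordered event log)."""
    log = []
    layers = [ToyShardedLayer(f"layer{i}", log) for i in range(num_layers)]
    x = 0
    for i, layer in enumerate(layers):
        x = layer.forward(x, prefetch=layers[i + 1 : i + 1 + num_to_prefetch])
    return x, log


if __name__ == "__main__":
    for n in (0, 1):
        _, events = run_forward(num_layers=3, num_to_prefetch=n)
        print(f"num_to_prefetch={n}:")
        for event in events:
            print("  ", event)
```

With `num_to_prefetch=0`, each layer's all-gather is issued immediately before its own compute. With `num_to_prefetch=1`, the next layer's all-gather is issued before the current layer's compute, which is the kind of schedule that lets communication overlap with computation in a real asynchronous setting.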

**Author**: `Hamid Shojanazeri <https://github.com/HamidShojanazeri>`__, `Yanli Zhao <https://github.com/zhaojuanmao>`__, `Shen Li <https://mrshenli.github.io/>`__

.. note::
   |edit| FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
Contributor

Suggested change
- |edit| FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.
+ FSDP1 is deprecated. Please check out `FSDP2 tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`_.

Contributor Author

done

@AlannaBurke
Contributor

@weifengpy Can you post the preview link for this tutorial?

@weifengpy
Contributor Author

> @weifengpy Can you post the preview link for this tutorial?

I got the preview link here
Screenshot 2025-05-14 at 1 36 25 PM

https://docs-preview.pytorch.org/pytorch/tutorials/3358/intermediate/FSDP_tutorial.html

@weifengpy weifengpy merged commit bb88078 into pytorch:main May 16, 2025
18 of 19 checks passed
5 participants