[v1] AttentionMetadata for each layer #17394

heheda12345 · 2025-04-29T14:41:57Z

Should be merged after #17193.

This PR changes ForwardContext.attn_metadata from a single global object to dict[layer_name, AttentionMetadata] to prepare for the hybrid allocator, which allocates different block tables to sliding window layers and full attention layers. We only need to build one attention metadata object for each kv cache group; all layers inside that kv cache group point to that same object.
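As a rough illustration of the mapping described above, here is a minimal sketch. It is not the vLLM implementation: `AttentionMetadata` is a stand-in with made-up fields, and `KVCacheGroup`, `build_per_layer_attn_metadata`, and the layer names are hypothetical names used only for this example. It shows one metadata object being built per kv cache group and every layer in that group pointing to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionMetadata:
    """Stand-in for a backend's attention metadata (fields are assumptions)."""
    block_table: list[list[int]]
    sliding_window: Optional[int] = None


@dataclass
class KVCacheGroup:
    """Hypothetical: a set of layers that share one block table."""
    layer_names: list[str]
    sliding_window: Optional[int] = None


def build_per_layer_attn_metadata(
    groups: list[KVCacheGroup],
    block_tables: list[list[list[int]]],
) -> dict[str, AttentionMetadata]:
    """Build one metadata object per group and map each layer name to it."""
    attn_metadata: dict[str, AttentionMetadata] = {}
    for group, block_table in zip(groups, block_tables):
        # Built once per kv cache group, not once per layer.
        group_metadata = AttentionMetadata(
            block_table=block_table,
            sliding_window=group.sliding_window,
        )
        for layer_name in group.layer_names:
            # Every layer in the group references the same object.
            attn_metadata[layer_name] = group_metadata
    return attn_metadata


# Two groups: full attention layers and a sliding window layer.
groups = [
    KVCacheGroup(["model.layers.0.attn", "model.layers.2.attn"]),
    KVCacheGroup(["model.layers.1.attn"], sliding_window=4096),
]
block_tables = [[[0, 1, 2]], [[7, 8]]]
per_layer = build_per_layer_attn_metadata(groups, block_tables)
```

With this layout a layer looks up its own entry by name, and the cost of building metadata scales with the number of kv cache groups rather than the number of layers.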

github-actions · 2025-04-29T14:42:07Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

WoosukKwon

LGTM. Thanks for the PR!

vllm/v1/spec_decode/eagle.py

vllm/v1/worker/gpu_model_runner.py

vllm/v1/attention/backends/utils.py

heheda12345 · 2025-05-06T06:39:07Z

Thanks for your review! I've updated the PR.

WoosukKwon · 2025-05-06T06:52:29Z

@heheda12345 Thanks! Please merge from main.

heheda12345 · 2025-05-06T07:26:30Z

Merged with main and fixed the previous TPU CI failure.

WoosukKwon · 2025-05-06T07:33:28Z

@heheda12345 Please fix the lint error 😓

heheda12345 added 6 commits April 25, 2025 07:33

remove num_input_tokens from attn_metadata

d35146f

Signed-off-by: Chen Zhang <[email protected]>

fix

20d930b

Signed-off-by: Chen Zhang <[email protected]>

Merge branch 'main' of github.com:vllm-project/vllm into num_input_to…

d17daf5

…kens Signed-off-by: Chen Zhang <[email protected]>

per_layer_attn_metadata

f0636df

Signed-off-by: Chen Zhang <[email protected]>

Merge branch 'num_input_tokens' of github.com:heheda12345/vllm into p…

1e2f970

…er_layer_attn_metadata

updaet comment

dd08b5b

Signed-off-by: Chen Zhang <[email protected]>

heheda12345 requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners April 29, 2025 14:41

mergify bot added the v1 and tpu labels Apr 29, 2025

heheda12345 added 4 commits April 29, 2025 07:45

update tpu code

ab4389e

Signed-off-by: Chen Zhang <[email protected]>

fix kv connector

20a1d22

Signed-off-by: Chen Zhang <[email protected]>

Merge branch 'main' of github.com:vllm-project/vllm into per_layer_at…

e7ffa63

…tn_metadata

Merge branch 'main' of github.com:vllm-project/vllm into per_layer_at…

5816b17

…tn_metadata Signed-off-by: Chen Zhang <[email protected]>

WoosukKwon added the ready label Apr 30, 2025

fix eagle

4679b4c

Signed-off-by: Chen Zhang <[email protected]>

heheda12345 mentioned this pull request Apr 30, 2025

[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders #17483

Merged

heheda12345 added 2 commits May 1, 2025 04:56

fix bug

1fbb06a

Signed-off-by: Chen Zhang <[email protected]>

fix

bb68034

Signed-off-by: Chen Zhang <[email protected]>

WoosukKwon approved these changes May 6, 2025

heheda12345 added 3 commits May 5, 2025 23:09

address review comments

b689523

Signed-off-by: Chen Zhang <[email protected]>

add docstring to CommonAttentionMetadata

e3021c6

Signed-off-by: Chen Zhang <[email protected]>

fix tpu

5b55ca2

Signed-off-by: Chen Zhang <[email protected]>

heheda12345 added 4 commits May 6, 2025 07:12

fix tpu

4acfaff

Signed-off-by: Chen Zhang <[email protected]>

Merge branch 'main' of github.com:vllm-project/vllm into per_layer_at…

737e3a8

…tn_metadata

Merge branch 'per_layer_attn_metadata' of github.com:heheda12345/vllm…

8996842

… into per_layer_attn_metadata

fix tpu

dd1ec7d

Signed-off-by: Chen Zhang <[email protected]>

heheda12345 added 2 commits May 6, 2025 02:59

fix precommit

55692ac

Signed-off-by: Chen Zhang <[email protected]>

Merge branch 'per_layer_attn_metadata' of github.com:heheda12345/vllm…

77d86dd

… into per_layer_attn_metadata

WoosukKwon merged commit cba31c4 into vllm-project:main May 6, 2025
51 checks passed

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[v1] AttentionMetadata for each layer (vllm-project#17394)

e9e6364

Signed-off-by: Chen Zhang <[email protected]> Signed-off-by: Mu Huai <[email protected]>

zixi-qi mentioned this pull request May 12, 2025

[V1][Spec Decode] Support multi-layer eagle draft model #18030

Open

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025

[v1] AttentionMetadata for each layer (vllm-project#17394)

a4084cc

Signed-off-by: Chen Zhang <[email protected]>
