[Model] Add GraniteMoeHybrid 4.0 model #17497
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Force-pushed from 55bc3e7 to 3569fe4.
First pass through looks good and implementation looks clean!
When will the model be available for testing?
Can `ibm-research/granite-4.0-tiny-test` be added to `test_hybrid.py`? It would be good to get a TP test in place.
Hey @tlrmchlsmth, I'm working on the transformers side of this PR. `ibm-research/granite-4.0-tiny-test` won't be the name of the actual model; it's just a placeholder since the name of the model hasn't been decided quite yet 😅
I agree that it would be nice to add a hybrid model test though
(CC @DarkLight1337 since I had just sent a note to him about this PR a little while ago also)
I added the model to `test_hybrid.py`, but since the name is not final and therefore the link to HF is not established yet, the tests are marked as skipped for now. Is that okay?
Thanks @bohnstingl, sounds good! Were you able to run the tests on the upcoming model? I had seen some TP failures in my environment, but not just for this model, so I may have something going on with my machines.
Thanks for opening this! some thoughts
    self.quant_config = vllm_config.quant_config

    super().__init__()
Is there a reason the `super().__init__()` is called in the middle here?
No, there is no reason. The call to `super().__init__()` has been moved so that it is now the first statement.
Cool thanks! I think you may have forgotten to push changes for this file
        hidden_states = hidden_states * self.embedding_multiplier
        residual = None
    else:
        assert intermediate_tensors is not None
Can you remove the asserts and raise errors instead?
The asserts are replaced with `RuntimeError`s.
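The change requested in this thread can be sketched roughly as follows. This is only an illustration, not the code from the PR: `select_inputs`, the `is_first_stage` flag, and the plain-dict `intermediate_tensors` are hypothetical stand-ins. One common reason to prefer an explicit exception is that Python drops `assert` statements when run with `-O`, whereas a `raise` always executes.

```python
from typing import Optional, Tuple


def select_inputs(
    is_first_stage: bool,
    embeddings: Optional[list],
    intermediate_tensors: Optional[dict],
) -> Tuple[list, Optional[list]]:
    """Hypothetical helper: choose the inputs for one stage of the model."""
    if is_first_stage:
        # Assumption: the first stage starts from the embeddings and has no residual yet.
        return embeddings, None
    # Later stages depend on tensors handed over from the previous stage.
    # An explicit raise stays active even when assertions are disabled (python -O).
    if intermediate_tensors is None:
        raise RuntimeError(
            "intermediate_tensors must be provided when not on the first stage")
    return intermediate_tensors["hidden_states"], intermediate_tensors["residual"]
```

Calling `select_inputs(False, None, None)` raises `RuntimeError` regardless of interpreter flags, which is the behavior the review comment asks for.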
    self.position_embedding_type = config.position_embedding_type
    if self.position_embedding_type == "rope":
        self.rotary_emb = get_rope(
Can this be changed to set `self.rotary_emb` to `None` if it's not `rope`? This will avoid potential attribute errors and we can do

    if self.rotary_emb is not None:
        query, key = self.rotary_emb(positions, query, key)

in `forward()` instead.
I think we don't need to store the `position_embedding_type` attribute explicitly. Now we set `self.rotary_emb` directly based on `config.position_embedding_type`, and to `None` if there is no rope being used.
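The pattern agreed on in this thread might look roughly like the sketch below. It is not the PR's implementation: `AttentionSketch` and `fake_rope` are hypothetical names, and `fake_rope` merely tags its inputs instead of applying a real rotary embedding (the PR uses `get_rope`, whose arguments are not shown in this conversation).

```python
from typing import Callable, Optional, Tuple


def fake_rope(positions, query, key) -> Tuple[list, list]:
    """Hypothetical stand-in for a rotary embedding; it only tags its inputs."""
    rotated_query = [("rot", p, q) for p, q in zip(positions, query)]
    rotated_key = [("rot", p, k) for p, k in zip(positions, key)]
    return rotated_query, rotated_key


class AttentionSketch:
    def __init__(self, position_embedding_type: str):
        # Decide once at construction time; the type string itself is not stored.
        self.rotary_emb: Optional[Callable] = (
            fake_rope if position_embedding_type == "rope" else None)

    def forward(self, positions, query, key):
        # The attribute always exists, so this check cannot raise AttributeError.
        if self.rotary_emb is not None:
            query, key = self.rotary_emb(positions, query, key)
        return query, key
```

Because `self.rotary_emb` is always defined, `forward()` needs only a single `is not None` check rather than a string comparison against the config value.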
    else:
        assert intermediate_tensors is not None
        hidden_states = intermediate_tensors["hidden_states"]
        residual = intermediate_tensors["residual"]
This will be overwritten right below?
The initial value of `residual` is no longer fixed to `None`.
Can you move the test based on #17459?
This pull request has merge conflicts that must be resolved before it can be
@DarkLight1337, I moved the
Yes sounds good.
        max_tokens: int,
        num_logprobs: int,
    ) -> None:
        if model == "ibm-research/granite-4.0-tiny-test":
I think for now, we can just comment it out in the hybrid model list instead of skipping like this for these tests
I reverted the changes to `test_hybrid.py` and just added the models as a comment in the `HYBRID_MODELS` list.
    )


    @pytest.mark.parametrize("model", SSM_MODELS[0:1] + HYBRID_MODELS[0:2])
Is this supposed to be `SSM_MODELS[0:1] + HYBRID_MODELS[0:2]` rather than `[SSM_MODELS[0], HYBRID_MODELS[0]]`?
This was just an approach to include the new model in the test, because with `[SSM_MODELS[0], HYBRID_MODELS[0]]` it would not be tested, right? I reverted those changes for the moment, though.
Yup! I think that is on purpose (there is a comment at the top of the file), although I'm not sure about the reason. I assume it's probably because they take a while to run.
Yeah it is to reduce CI cost
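The difference between the two parametrization expressions discussed in this thread can be shown with plain lists. The model names below are placeholders, not the entries in the actual test file.

```python
# Placeholder names; the real lists are defined in the test file.
SSM_MODELS = ["ssm-a", "ssm-b"]
HYBRID_MODELS = ["hybrid-a", "hybrid-b", "hybrid-c"]

# Slices concatenate: the first SSM model plus the first two hybrid models.
sliced = SSM_MODELS[0:1] + HYBRID_MODELS[0:2]

# Indexing picks exactly one model from each list.
indexed = [SSM_MODELS[0], HYBRID_MODELS[0]]

print(sliced)   # ['ssm-a', 'hybrid-a', 'hybrid-b']
print(indexed)  # ['ssm-a', 'hybrid-a']
```

The sliced form runs one more model per test than the indexed form, which is why it matters for CI cost.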
    @@ -0,0 +1,346 @@
    # SPDX-License-Identifier: Apache-2.0
This is the same as `tests/models/decoder_only/language/test_hybrid.py` with the new model added, right? I think this PR has both files.
I apologize, I forgot to commit the deletion of the old file. This should now be fixed.
Signed-off-by: Thomas Ortner <[email protected]>
Force-pushed from 590412a to 0dbb1b9.
@alex-jw-brooks, I apologize, I forgot to push the changes to
Force-pushed from bbff670 to 0d4f3cd.
Signed-off-by: Stanislaw Wozniak <[email protected]>
Co-authored-by: Thomas Ortner <[email protected]> Signed-off-by: Stanislaw Wozniak <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Stanislaw Wozniak <[email protected]>
Head branch was pushed to by a user without write access
Force-pushed from f7de40b to 495fe33.
@DarkLight1337, thank you for fixing the spacing.
The CI only uses the HF transformers version defined in the requirements file
… in CI Signed-off-by: Stanislaw Wozniak <[email protected]>
Head branch was pushed to by a user without write access
…rs in HF Transformers Signed-off-by: Stanislaw Wozniak <[email protected]>
Maybe need to also add
…in HF Transformers Signed-off-by: Stanislaw Wozniak <[email protected]>
I don't know which exact version would include it. To be on the safe side, I've commented it out for now. Update: Apparently I cannot comment it out, as the tests then fail. As suggested, I've used
Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: Stanislaw Wozniak <[email protected]>
Force-pushed from 711dcd4 to ca47978.
Force-pushed from ca47978 to 95ede00.
Two CI tests failed, both unrelated to this PR:
and
I've force-pushed to retrigger the CI checks and see whether this persists. (In general, I'm not sure whether this is the right way to retrigger CI, or whether there is a way to retrigger selected tests only.)
Signed-off-by: Thomas Ortner <[email protected]> Signed-off-by: Stanislaw Wozniak <[email protected]> Co-authored-by: Thomas Ortner <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]>
This PR adds support for the upcoming Granite4.0 models. It is a companion PR to huggingface/transformers#37658, which adds the same model to HF.
Note: Running the model in vLLM depends on having HF Transformers with Granite4.0 support installed; see https://huggingface.co/ibm-granite/granite-4.0-tiny-preview
Note: This re-opens #17461 after fixing a wrong rebase. It is now rebased on a newer main, signed off, and has its conflicts resolved.
@DarkLight1337 - As suggested, the Sampler has been removed from the model.
The HF model is not available in main yet. Model tests are currently marked with 'skip'.
@tdoublep @bohnstingl