[Model] Add GraniteMoeHybrid 4.0 model #17497

Merged
merged 24 commits into from
May 6, 2025

Conversation

s3woz
Contributor

@s3woz s3woz commented Apr 30, 2025

This PR adds support for the upcoming Granite 4.0 models. It is a companion to huggingface/transformers#37658, which adds the same model to HF Transformers.

Note: Running the model in vLLM depends on having HF Transformers with Granite4.0 support installed, see https://huggingface.co/ibm-granite/granite-4.0-tiny-preview

Note: This re-opens #17461 after fixing a bad rebase. The branch is now rebased on a newer main, signed off, and has its conflicts resolved.
@DarkLight1337 - As per suggestion, Sampler removed from the model.

The HF model is not available in main yet. Model tests are currently marked with 'skip'.

@tdoublep @bohnstingl


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@s3woz s3woz force-pushed the granitemoehybrid_clean branch from 55bc3e7 to 3569fe4 Compare April 30, 2025 21:09
@s3woz s3woz marked this pull request as ready for review April 30, 2025 22:56
Collaborator

@tlrmchlsmth tlrmchlsmth left a comment

First pass through looks good and implementation looks clean!

When will the model be available for testing?

Collaborator

Can ibm-research/granite-4.0-tiny-test be added to test_hybrid.py? It would be good to get a TP test in place

Contributor

@alex-jw-brooks alex-jw-brooks Apr 30, 2025

Hey @tlrmchlsmth, I'm working on the transformers side of this PR - ibm-research/granite-4.0-tiny-test won't be the name of the actual model, it's just a placeholder since the name of the model hasn't been decided quite yet 😅

I agree that it would be nice to add a hybrid model test though

(CC @DarkLight1337 since I had just sent a note to him about this PR a little while ago also)

Contributor

I added the model to test_hybrid.py, but since the name is not final and the HF link therefore does not exist yet, the tests are marked as skipped for now. Is that okay?

Contributor

Thanks @bohnstingl sounds good! Were you able to run the tests on the upcoming model? I had seen some tp failures in my environment, but not just for this model, so may have something going on with my machines

Contributor

@alex-jw-brooks alex-jw-brooks left a comment

Thanks for opening this! Some thoughts:


self.quant_config = vllm_config.quant_config

super().__init__()
Contributor

Is there a reason the super().__init__() is called in the middle here?

Contributor

No, the call to super().__init__() has been moved to the first instruction.

Contributor

Cool thanks! I think you may have forgotten to push changes for this file
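
To illustrate why reviewers usually ask for super().__init__() to come first, here is a minimal sketch using a toy base class. ToyModule, InitFirst and InitLate are hypothetical names, not vLLM or PyTorch code; the toy only imitates the way a framework base class may need its internal registry created before child objects are assigned. In the snippet under review only a plain attribute (quant_config) was set beforehand, which would not necessarily fail, so this shows the general hazard rather than a bug in the PR.

```python
class ToyModule:
    """Toy stand-in for a framework base class; NOT torch.nn.Module itself."""

    def __init__(self) -> None:
        # Create the child registry without going through our own __setattr__.
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name: str, value: object) -> None:
        if isinstance(value, ToyModule):
            if "_children" not in self.__dict__:
                raise AttributeError(
                    "cannot assign a child before ToyModule.__init__() call")
            self._children[name] = value
        object.__setattr__(self, name, value)


class InitFirst(ToyModule):
    def __init__(self) -> None:
        super().__init__()        # registry exists from here on
        self.layer = ToyModule()  # registered as a child


class InitLate(ToyModule):
    def __init__(self) -> None:
        self.layer = ToyModule()  # raises: registry not created yet
        super().__init__()
```

Calling InitFirst() succeeds and records layer as a child, while InitLate() raises AttributeError on its first assignment.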

hidden_states = hidden_states * self.embedding_multiplier
residual = None
else:
assert intermediate_tensors is not None
Contributor

Can you remove the asserts and raise errors instead?

Contributor

The asserts are replaced with RuntimeErrors.
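
As a rough sketch of the assert-to-exception change discussed in this thread: select_hidden_states and its parameters are hypothetical and only mirror the shape of the branch shown in the diff context, not the actual model code. One common motivation is that assert statements are removed when Python runs with -O, whereas an explicit raise always executes.

```python
from typing import Optional


def select_hidden_states(
    is_first_rank: bool,
    embeddings: Optional[list],
    intermediate_tensors: Optional[dict],
) -> list:
    """Hypothetical helper mirroring the first-rank / later-rank branch."""
    if is_first_rank:
        if embeddings is None:
            raise RuntimeError("embeddings are required on the first rank")
        return embeddings
    # An `assert` here would be stripped under `python -O`; a raise is not.
    if intermediate_tensors is None:
        raise RuntimeError(
            "intermediate_tensors is required on later pipeline ranks")
    return intermediate_tensors["hidden_states"]
```

The error message also tells the caller which input was missing, which a bare assert does not.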


self.position_embedding_type = config.position_embedding_type
if self.position_embedding_type == "rope":
self.rotary_emb = get_rope(
Contributor

@alex-jw-brooks alex-jw-brooks May 1, 2025

Can this be changed to set self.rotary_emb to None if it's not rope? This will avoid potential attribute errors and we can do

    if self.rotary_emb is not None:
        query, key = self.rotary_emb(positions, query, key)

in forward() instead

Contributor

I think we don't need to store the position_embedding_type attribute explicitly. Now we set self.rotary_emb directly based on config.position_embedding_type and to None if there is no rope being used.
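
The pattern agreed on in this thread can be sketched as follows. ToyAttention and toy_rope are hypothetical placeholders: toy_rope just adds the position to both values and is not a real rotary embedding, and the real model builds its embedding with get_rope. The point is only that the attribute always exists and forward() checks it against None.

```python
from typing import Callable, Optional, Tuple

Rotary = Callable[[int, float, float], Tuple[float, float]]


def toy_rope(position: int, query: float, key: float) -> Tuple[float, float]:
    """Placeholder transform standing in for a rotary embedding."""
    return query + position, key + position


class ToyAttention:
    def __init__(self, position_embedding_type: str) -> None:
        # Always define the attribute; None means "no rotary embedding".
        self.rotary_emb: Optional[Rotary] = (
            toy_rope if position_embedding_type == "rope" else None)

    def forward(self, position: int, query: float,
                key: float) -> Tuple[float, float]:
        if self.rotary_emb is not None:
            query, key = self.rotary_emb(position, query, key)
        return query, key
```

Because rotary_emb is set in both branches, forward() never hits an AttributeError, and there is no need to keep a separate position_embedding_type attribute around.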

else:
assert intermediate_tensors is not None
hidden_states = intermediate_tensors["hidden_states"]
residual = intermediate_tensors["residual"]
Contributor

This will be overwritten right below?

Contributor

The initial value of residual is no longer fixed to None.

@DarkLight1337
Member

Can you move the test based on #17459?

mergify bot commented May 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @s3woz.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 1, 2025
@bohnstingl
Contributor

@DarkLight1337, I moved test_granitemoehybrid.py to tests/models/language/generation/test_granitemoehybrid.py. Is that okay?

@DarkLight1337
Member

Yes sounds good.

max_tokens: int,
num_logprobs: int,
) -> None:
if model == "ibm-research/granite-4.0-tiny-test":
Contributor

I think for now, we can just comment it out in the hybrid model list instead of skipping like this for these tests

Contributor

I reverted the changes to test_hybrid.py and just added the models as a comment in HYBRID_MODELS.

)


@pytest.mark.parametrize("model", SSM_MODELS[0:1] + HYBRID_MODELS[0:2])
Contributor

Is this supposed to be SSM_MODELS[0:1] + HYBRID_MODELS[0:2] rather than [SSM_MODELS[0], HYBRID_MODELS[0]]?

Contributor

This was just an approach to include the new model in the test, because with [SSM_MODELS[0], HYBRID_MODELS[0]], it would not be tested, right? I reverted those changes though for the moment.

Contributor

Yup! I think that is on purpose (there is a comment at the top of the file), although I'm not sure about the reason. I assume it's probably because they take a while to run.

Member

Yeah it is to reduce CI cost
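
For readers following the slicing question above, a small sketch of the two parametrize arguments. The list contents are made-up placeholders, not the real entries in test_hybrid.py.

```python
# Placeholder model names; the real lists live in test_hybrid.py.
SSM_MODELS = ["ssm-a", "ssm-b"]
HYBRID_MODELS = ["hybrid-a", "hybrid-b", "hybrid-c"]

# Slices return lists, so `+` concatenates them: one SSM model, two hybrids.
sliced = SSM_MODELS[0:1] + HYBRID_MODELS[0:2]

# Indexing returns single elements: exactly one model from each list.
indexed = [SSM_MODELS[0], HYBRID_MODELS[0]]
```

With these placeholders, sliced holds three models and indexed holds two, so the sliced form adds one more hybrid model to the test run and therefore to CI cost.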

@@ -0,0 +1,346 @@
# SPDX-License-Identifier: Apache-2.0
Contributor

This is the same as tests/models/decoder_only/language/test_hybrid.py with the new model added right? I think this PR has both files

Contributor

I apologize, I forgot to commit the deletion of the old file. This should now be fixed.

@bohnstingl bohnstingl force-pushed the granitemoehybrid_clean branch from 590412a to 0dbb1b9 Compare May 2, 2025 05:48
@mergify mergify bot removed the needs-rebase label May 2, 2025
@bohnstingl
Contributor

@alex-jw-brooks, I apologize, I forgot to push the changes to vllm/model_executor/models/granitemoehybrid.py. I also rebased onto the latest main so that the new test structure is correct. What do you think?

@s3woz s3woz force-pushed the granitemoehybrid_clean branch 2 times, most recently from bbff670 to 0d4f3cd Compare May 2, 2025 07:57
s3woz and others added 11 commits May 2, 2025 08:42
Signed-off-by: Stanislaw Wozniak <[email protected]>
Signed-off-by: Thomas Ortner <[email protected]>
Signed-off-by: Thomas Ortner <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Co-authored-by: Thomas Ortner <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Co-authored-by: Thomas Ortner <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label May 5, 2025
Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 5, 2025 07:55
auto-merge was automatically disabled May 5, 2025 07:55

Head branch was pushed to by a user without write access

@s3woz s3woz force-pushed the granitemoehybrid_clean branch from f7de40b to 495fe33 Compare May 5, 2025 07:55
@s3woz
Contributor Author

s3woz commented May 5, 2025

@DarkLight1337, thank you for fixing the spacing.
I've also fixed the model URLs, since the model is now uploaded as a preview version.
The HF code is still not in main, so the tests remain skipped.
BTW, does the vLLM CI always pull HF main? That is, is it fine to enable the tests immediately after the HF PR lands in main, or do we need to wait for an HF release?

@DarkLight1337
Member

The CI only uses the HF transformers version defined in the requirements file

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) May 5, 2025 08:08
auto-merge was automatically disabled May 5, 2025 09:32

Head branch was pushed to by a user without write access

…rs in HF Transformers

Signed-off-by: Stanislaw Wozniak <[email protected]>
@DarkLight1337
Member

Maybe need to also add min_transformers_version

@s3woz
Contributor Author

s3woz commented May 5, 2025

Maybe need to also add min_transformers_version

I don't know exactly which version will include it. To be on the safe side, I've commented it out for now.

Update: Apparently I cannot comment out, as tests fail then. As suggested, I've used min_transformers_version and set it to the next minor version 4.52.0, similarly to some other models in registry there.
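
A minimal sketch of what a minimum-version gate such as min_transformers_version amounts to, assuming a simple numeric comparison. This is not vLLM's actual registry code: parse_release, meets_minimum and transformers_is_new_enough are hypothetical helpers, and only the 4.52.0 value comes from the comment above. Real code would more likely rely on packaging.version for full version semantics.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple:
    """Leading numeric components only, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.52' compares equal to '4.52.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def transformers_is_new_enough(minimum: str = "4.52.0") -> bool:
    """False when transformers is missing or older than `minimum`."""
    try:
        return meets_minimum(version("transformers"), minimum)
    except PackageNotFoundError:
        return False
```

Note that this simplified parser ignores pre-release suffixes, so a 4.52.0.dev0 build would count as meeting the 4.52.0 minimum.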

s3woz added 2 commits May 5, 2025 12:31
Co-authored-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
@s3woz s3woz force-pushed the granitemoehybrid_clean branch from 711dcd4 to ca47978 Compare May 5, 2025 17:00
Signed-off-by: Stanislaw Wozniak <[email protected]>
@s3woz s3woz force-pushed the granitemoehybrid_clean branch from ca47978 to 95ede00 Compare May 5, 2025 20:46
@s3woz
Contributor Author

s3woz commented May 5, 2025

Two CI tests failed, both unrelated to this PR:

buildkite/ci/pr/v1-test — Failed (exit status 1)
 v1/engine/test_engine_core_client.py::test_startup_failure
 ...
 Start compiling function <code object forward at 0xda82150, file "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 345>
[2025-05-05T18:30:39Z] +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++

and

buildkite/ci/pr/2-node-tests-4-gpus-in-total — Failed (exit status 1)
 docker run -d --gpus '"device=2,3"' ...
 nvidia-container-cli: device error: 3: unknown device: unknown.

I've force-pushed to retrigger the CI checks and see whether this persists. (In general, I'm not sure whether this is the right way to retrigger, or whether there is a way to retrigger selected tests only.)

@DarkLight1337 DarkLight1337 merged commit 999328b into vllm-project:main May 6, 2025
52 checks passed
robertgshaw2-redhat added a commit to neuralmagic/vllm that referenced this pull request May 6, 2025
* [Model] Add GraniteMoeHybrid 4.0 model (vllm-project#17497)

Signed-off-by: Thomas Ortner <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Co-authored-by: Thomas Ortner <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>

* [easy] Fix logspam on PiecewiseBackend errors (vllm-project#17138)

Signed-off-by: rzou <[email protected]>

* [Bugfix] Fixed prompt length for random dataset (vllm-project#17408)

Signed-off-by: Mikhail Podvitskii <[email protected]>

* [Doc] Update notes for H2O-VL and Gemma3 (vllm-project#17219)

Signed-off-by: DarkLight1337 <[email protected]>

* [Misc] Fix ScalarType float4 naming  (vllm-project#17690)

Signed-off-by: Lucas Wilkinson <[email protected]>

* Fix `dockerfilegraph` pre-commit hook (vllm-project#17698)

Signed-off-by: Harry Mellor <[email protected]>

* [Bugfix] Fix triton import with local TritonPlaceholder (vllm-project#17446)

Signed-off-by: Mengqing Cao <[email protected]>

* [V1] Enable TPU V1 backend by default (vllm-project#17673)

Signed-off-by: mgoin <[email protected]>

* [V1][PP] Support PP for MultiprocExecutor (vllm-project#14219)

Signed-off-by: jiang1.li <[email protected]>
Signed-off-by: jiang.li <[email protected]>

* [v1] AttentionMetadata for each layer (vllm-project#17394)

Signed-off-by: Chen Zhang <[email protected]>

* [Feat] Add deprecated=True to CLI args (vllm-project#17426)

Signed-off-by: Aaron Pham <[email protected]>

* [Docs] Use gh-file to add links to tool_calling.md (vllm-project#17709)

Signed-off-by: windsonsea <[email protected]>

* [v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (vllm-project#17479)

Signed-off-by: Chen Zhang <[email protected]>

* [doc] Add RAG Integration example (vllm-project#17692)

Signed-off-by: reidliu41 <[email protected]>
Co-authored-by: reidliu41 <[email protected]>

* [Bugfix] Fix modality limits in vision language example (vllm-project#17721)

Signed-off-by: DarkLight1337 <[email protected]>

* Make right sidebar more readable in "Supported Models" (vllm-project#17723)

Signed-off-by: Harry Mellor <[email protected]>

* [TPU] Increase block size and reset block shapes (vllm-project#16458)

* [Misc] Add Next Edit Prediction (NEP) datasets support in `benchmark_serving.py` (vllm-project#16839)

Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>

* [Bugfix] Fix for the condition to accept empty encoder inputs for mllama (vllm-project#17732)

Signed-off-by: Gregory Shtrasberg <[email protected]>

* [Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (vllm-project#16828)

Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>

---------

Signed-off-by: Thomas Ortner <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Signed-off-by: rzou <[email protected]>
Signed-off-by: Mikhail Podvitskii <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Mengqing Cao <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: jiang1.li <[email protected]>
Signed-off-by: jiang.li <[email protected]>
Signed-off-by: Chen Zhang <[email protected]>
Signed-off-by: Aaron Pham <[email protected]>
Signed-off-by: windsonsea <[email protected]>
Signed-off-by: reidliu41 <[email protected]>
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Signed-off-by: Gregory Shtrasberg <[email protected]>
Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Co-authored-by: Stan Wozniak <[email protected]>
Co-authored-by: Thomas Ortner <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Richard Zou <[email protected]>
Co-authored-by: Mikhail Podvitskii <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Mengqing Cao <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
Co-authored-by: Aaron Pham <[email protected]>
Co-authored-by: Michael Yao <[email protected]>
Co-authored-by: Reid <[email protected]>
Co-authored-by: reidliu41 <[email protected]>
Co-authored-by: Jevin Jiang <[email protected]>
Co-authored-by: d.transposed <[email protected]>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Thomas Parnell <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Thomas Ortner <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Co-authored-by: Thomas Ortner <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
Signed-off-by: Thomas Ortner <[email protected]>
Signed-off-by: Stanislaw Wozniak <[email protected]>
Co-authored-by: Thomas Ortner <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Labels
documentation Improvements or additions to documentation ready ONLY add when PR is ready to merge/full CI is needed
5 participants