Add GraniteMoeHybrid support for 4.0 #37658
Conversation
cc @ArthurZucker for text models!
class GraniteMoeHybridSdpaAttention(GraniteMoeSharedSdpaAttention):
    pass

GRANITEMOEHYBRID_ATTENTION_CLASSES = {
Just as a heads up, I think it would be nice to use the new attention interface (see #35235 for the original PR). Llama can also provide a good first pointer for this, e.g.
class LlamaAttention(nn.Module):
(Unless I'm missing that this is a more special kind of attention here :D )
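For readers following along, here is a rough, hedged sketch of that interface pattern, loosely modeled on LlamaAttention; the class and attribute names below are illustrative, not the exact code that landed in this PR:

```python
# Hedged sketch of the "new attention interface" pattern; not the code merged in this PR.
from typing import Callable, Optional

import torch
from torch import nn

from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS
from transformers.models.llama.modeling_llama import eager_attention_forward


class SketchAttention(nn.Module):
    """One attention class; the eager/sdpa/flash backend is picked at runtime from the config."""

    def __init__(self, config, layer_idx: int):
        super().__init__()
        self.config = config
        self.layer_idx = layer_idx
        self.head_dim = config.hidden_size // config.num_attention_heads
        self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
        self.scaling = self.head_dim**-0.5
        self.attention_dropout = getattr(config, "attention_dropout", 0.0)
        self.q_proj = nn.Linear(config.hidden_size, config.num_attention_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(config.hidden_size, config.num_key_value_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(config.num_attention_heads * self.head_dim, config.hidden_size, bias=False)

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        **kwargs,
    ):
        bsz, q_len, _ = hidden_states.shape
        hidden_shape = (bsz, q_len, -1, self.head_dim)
        query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
        key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)
        value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)

        # Key idea of the refactor: a single class, with the backend looked up from the config
        # instead of maintaining separate *SdpaAttention / *FlashAttention2 subclasses.
        attention_interface: Callable = eager_attention_forward
        if getattr(self.config, "_attn_implementation", "eager") != "eager":
            attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]

        attn_output, attn_weights = attention_interface(
            self,
            query_states,
            key_states,
            value_states,
            attention_mask,
            dropout=0.0 if not self.training else self.attention_dropout,
            scaling=self.scaling,
            **kwargs,
        )
        attn_output = attn_output.reshape(bsz, q_len, -1).contiguous()
        return self.o_proj(attn_output), attn_weights
```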
Thanks for the heads up @vasqu! We are still cleaning up this branch a bit, will take a look at this once the tests are in a better state 🙂
Thanks for the pointer @vasqu! Refactored this PR to the new attention interface 😄
ccing @molbap for mamba2/bamba (feels like I'm pinging you constantly 😆)
Thanks @ArthurZucker! It's ready for another look when you get the chance!
Very nice use of modular, thanks a lot! 🤗
hidden_states = self.input_layernorm(hidden_states)
self_attn_weights = None
if self.layer_type == "mamba":
I am thinking let's remove the check on the layer type and rely instead on checking whether self.self_attn is not None?
I agree, I also didn't like self.mamba being conditionally undefined. Updated this to define both in __init__, and to run mamba if self.mamba is not None and attention otherwise 🙂
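A minimal sketch of the dispatch described above, with illustrative stand-ins (nn.Linear takes the place of the real mamba mixer and attention module, and config.layers_block_type is an assumed attribute), not the exact PR code:

```python
import torch
from torch import nn


class SketchHybridDecoderLayer(nn.Module):
    """Illustrative only: both sub-modules are always defined, one of them as None."""

    def __init__(self, config, layer_idx: int):
        super().__init__()
        layer_type = config.layers_block_type[layer_idx]  # assumed per-layer type list on the config
        # nn.Linear stands in for the real mamba mixer / attention module.
        self.mamba = nn.Linear(config.hidden_size, config.hidden_size) if layer_type == "mamba" else None
        self.self_attn = nn.Linear(config.hidden_size, config.hidden_size) if layer_type == "attention" else None

    def forward(self, hidden_states: torch.Tensor, attention_mask=None, **kwargs):
        self_attn_weights = None
        # Branch on which sub-module exists rather than on a string layer type.
        if self.mamba is not None:
            hidden_states = self.mamba(hidden_states)
        else:
            hidden_states = self.self_attn(hidden_states)
        return hidden_states, self_attn_weights
```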
else:
    raise ValueError(f"Expected layer type in ['attention', 'mamba'], got {self.layer_type}")
still todo 😉
hidden_states = self.post_attention_layernorm(hidden_states)
moe_hidden_states, router_logits = self.block_sparse_moe(hidden_states)

if self.shared_mlp is None:
I don't know if you answered this already, but are there two different checkpoints being released, one with and one without this?
The models that are about to come out do use it! I think there are likely experiments ongoing without it, but I'm not sure about concrete plans for when they'll be released since I'm not the one training the models 🙂
In that case let's remove what's uncertain! 🤗
Sounds good! Removed the case with 0 experts, I'll open a follow-up PR if it ends up being used in a model to be released 😄
if self.gradient_checkpointing and self.training:
    layer_outputs = self._gradient_checkpointing_func(
        decoder_layer.__call__,
        hidden_states,
        layer_mask,
        past_key_values,
        output_attentions,
        use_cache,
        cache_position,
        output_router_logits,
        position_embeddings,
    )
else:
    layer_outputs = decoder_layer(
        hidden_states,
        attention_mask=layer_mask,
        past_key_value=past_key_values,
        output_attentions=output_attentions,
        use_cache=use_cache,
        cache_position=cache_position,
        output_router_logits=output_router_logits,
        position_embeddings=position_embeddings,
    )
let's use the new GradientCheckpointingLayer, wdyt?
Definitely, that is a lot cleaner! I updated the models in the chain for modular to all use the gradient checkpointing layer (GraniteMoe/GraniteMoeShared/GraniteMoeHybrid)
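Roughly, the change looks like the hedged sketch below; the import path reflects my understanding of where GradientCheckpointingLayer lives in transformers, and the layer body is elided:

```python
# Hedged sketch; not the exact code that landed in this PR.
from transformers.modeling_layers import GradientCheckpointingLayer


class SketchDecoderLayer(GradientCheckpointingLayer):
    """Subclassing the helper removes the manual _gradient_checkpointing_func branch."""

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        # ... real decoder layer body here ...
        return (hidden_states,)


# The caller then always invokes the layer directly; checkpointing is applied inside
# __call__ when gradient checkpointing is enabled during training:
#
#     layer_outputs = decoder_layer(
#         hidden_states,
#         attention_mask=layer_mask,
#         past_key_value=past_key_values,
#         use_cache=use_cache,
#         cache_position=cache_position,
#         output_router_logits=output_router_logits,
#         position_embeddings=position_embeddings,
#     )
```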
if not return_dict:
    return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
we have a @can_return_tuple decorator for the forward
Nice! Added
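For context, a hedged sketch of how the decorator is typically applied (heavily simplified; the real GraniteMoeHybridModel.forward takes many more arguments):

```python
import torch
from torch import nn

from transformers.modeling_outputs import BaseModelOutputWithPast
from transformers.utils import can_return_tuple


class SketchModel(nn.Module):
    @can_return_tuple
    def forward(self, hidden_states: torch.Tensor, **kwargs) -> BaseModelOutputWithPast:
        # ... run embeddings / decoder layers here ...
        # The manual `if not return_dict: return tuple(...)` block goes away:
        # the decorator converts the ModelOutput to a tuple when the caller
        # asks for return_dict=False.
        return BaseModelOutputWithPast(last_hidden_state=hidden_states)
```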
class GraniteMoeHybridModelTester:
can we try to inherit tests from the closest model (so the mamba one), in the same fashion as here:
class Gemma2ModelTester(GemmaModelTester):
Good idea! The closest model for the tests is bamba. Consolidated a bit to use the Bamba tests, should be way easier to look at now 🤞
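A hedged sketch of the resulting test layout; the import path and the overridden attributes are assumptions based on the existing Bamba tests, mirroring how Gemma2ModelTester reuses GemmaModelTester:

```python
# Illustrative only; the real tester presumably overrides more than shown here.
from tests.models.bamba.test_modeling_bamba import BambaModelTester


class GraniteMoeHybridModelTester(BambaModelTester):
    # Inherit the Bamba tester wholesale and only override what differs,
    # e.g. the config/model classes and the hybrid (mamba + attention) layer layout.
    pass
```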
perfect!
transformers/src/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py, line 1199 in 8274d2c:
std = self.config.initializer_range

std is initialized twice - std = self.config.initializer_range appears more than once.
Thanks @berserkr! There were two because of
Thank you very much for the fast review @ArthurZucker! I've made all the changes 🙂
Marvelous! Merging once the build PR passes (should be easy to fix!)
Thanks @ArthurZucker! Added the missing TOC entry and removed the currently unused shared condition for the MLP, should pass now! 🤞
* initial config and MLA layer
* first pass at decoder
* completion of layers
* modeling class
* adding hybrid class to imports
* fix imports granitemoehybrid
* fix granitehybrid imports
* fix granitehybrid import
* fix generated modeling file
* add some comments
* minor fixes in layers
* add sharedMLP layer
* correct layer names
* fixes in mamba config
* fix mamba config
* change name of MLP layer
* fix seq mizer layers
* correct mamba config
* fixes in param names
* enable hybrid model
* update config
* fix config granite hybrid
* fix attention layer
* cleanup to re-use mamba code
* keep layer types
* attention bias cleanup
* update mamba layer name
* first pass at tests
* first pass at tests
* use granite attention
* fix: self attn weights
* pass at making pos_emb optional
* initialize self_attn only as needed
* overwrite forward to create HybridMambaCache
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitmoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits to keep in granitemoe
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits to keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit, align test init, delete more tests, use common layer init with bamba tests, finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs

---------

Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
What does this PR do?
The PR adds support for the upcoming Granite 4.0 models. In terms of model architecture, it is a hybrid class with a shared MLP layer and Bamba layers.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.