[Bug]: Quantization In MambaMixer2 Not Supported when Tensor Parallel is enabled #14618
Labels: bug
### Your current environment
The output of `python collect_env.py`
### 🐛 Describe the bug
The current implementation of TP for Mamba2 is complicated for the `in_proj`, because the gate, projection, state space, and heads are all fused into this one layer. Furthermore, we also need to consider the different possibilities of whether the number of groups divides the number of heads or not, see #13660.

For now the implementation of TP is simplified, covering only:

- the case `num_groups == 1`, and
- the case where `num_groups` does not divide `num_heads`, see #13660 (the layout and divisibility constraints are sketched below).
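For context, here is a rough, hypothetical sketch (not vLLM's actual code) of why the fused `in_proj` is awkward to shard. Its output dimension is the concatenation of several logically different segments (the standard Mamba2 layout is roughly `[z, x, B, C, dt]`), and each segment relates differently to `num_heads` and `num_groups`, so a single column-parallel split, plus the matching split of any quantization scales, does not line up for free. The shape names and the strict divisibility checks below are assumptions for illustration only; the real implementation handles additional cases (e.g. the `num_groups == 1` replication mentioned above).

```python
# Hypothetical sketch (not vLLM's implementation): why the fused Mamba2
# in_proj is hard to shard column-parallel under tensor parallelism.
from dataclasses import dataclass


@dataclass
class Mamba2Shapes:
    intermediate_size: int  # d_inner (= num_heads * head_dim)
    num_heads: int
    n_groups: int
    state_size: int         # d_state


def in_proj_split_sizes(s: Mamba2Shapes) -> dict[str, int]:
    # Standard Mamba2 fused in_proj output layout: [z, x, B, C, dt].
    return {
        "z": s.intermediate_size,        # gate
        "x": s.intermediate_size,        # SSM input (heads * head_dim)
        "B": s.n_groups * s.state_size,  # per-group B
        "C": s.n_groups * s.state_size,  # per-group C
        "dt": s.num_heads,               # per-head time step
    }


def check_tp_shardable(s: Mamba2Shapes, tp_size: int) -> None:
    # Each segment must be sharded consistently with how the heads are
    # sharded, otherwise heads on one rank would need B/C groups that
    # live on another rank.
    if s.num_heads % tp_size != 0:
        raise ValueError("num_heads must divide evenly across TP ranks")
    if s.n_groups == 1:
        # A single group can simply be replicated on every rank.
        return
    if s.num_heads % s.n_groups != 0 or s.n_groups % tp_size != 0:
        # Groups no longer line up with the head shards, so a naive
        # per-segment column split breaks down.
        raise ValueError(
            "n_groups must divide num_heads and be divisible by tp_size "
            "for a straightforward column-parallel shard of in_proj"
        )


if __name__ == "__main__":
    shapes = Mamba2Shapes(intermediate_size=4096, num_heads=64,
                          n_groups=8, state_size=128)
    print(in_proj_split_sizes(shapes))  # per-segment output widths
    check_tp_shardable(shapes, tp_size=4)
```

Under TP, each of those segments has to be split at a different granularity (per head, per group, or replicated), and a quantized layer additionally needs its scales split the same way, which is presumably the complication that currently restricts quantization to TP == 1.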
However, for large models it may be useful to support TP > 1 with quantized layers, even in some special cases of `num_heads` and `num_groups`.

cc: @tlrmchlsmth