Commit d037c8f

compilade authored and hodlen committed
llama : fix non-quantization of expert gating tensors (ggml-org#5754)
This reverts a single line from ggml-org#5475
1 parent 4471e65 · commit d037c8f

1 file changed (+2, -1)

llama.cpp

Lines changed: 2 additions & 1 deletion
```diff
@@ -11162,7 +11162,8 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
         quantize &= !params->only_copy;
 
         // do not quantize expert gating tensors
-        quantize &= name != LLM_TN(model.arch)(LLM_TENSOR_FFN_GATE_INP, "weight");
+        // NOTE: can't use LLM_TN here because the layer number is not known
+        quantize &= name.find("ffn_gate_inp.weight") == std::string::npos;
 
         // do not quantize positional embeddings and token types (BERT)
         quantize &= name != LLM_TN(model.arch)(LLM_TENSOR_POS_EMBD, "weight");
```
