adding Context Length Specialization (CCL) #388

Open
wants to merge 2 commits into
base: main

Conversation

quic-vjanfaza

No description provided.

@@ -1388,6 +1389,9 @@ def from_pretrained(

kv_offload = kwargs.pop("kv_offload", None)

comp_ctx_lengths = kwargs.pop("comp_ctx_lengths", None)
cls.comp_ctx_lengths = comp_ctx_lengths
Contributor

This is not needed. You can pass comp_ctx_lengths=comp_ctx_lengths as a kwarg on line 1407 when instantiating the class.

Author

We can't remove these lines: on line 1396 we call cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs), and if comp_ctx_lengths is left in kwargs it is forwarded to that call, which doesn't recognize the argument and raises an error.
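
For context, a minimal sketch of the pattern under discussion, assuming the wrapper class accepts comp_ctx_lengths in its constructor (the constructor signature here is an assumption, not the exact code from the diff):

```python
from typing import List, Optional

@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: str, *args, **kwargs):
    # Pop the custom kwarg *before* forwarding **kwargs; otherwise the
    # Hugging Face loader receives an unexpected `comp_ctx_lengths` argument.
    comp_ctx_lengths: Optional[List[int]] = kwargs.pop("comp_ctx_lengths", None)

    model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)

    # The popped value can then be handed to the wrapper explicitly, as suggested
    # above, instead of being stored on `cls` (assumed constructor signature).
    return cls(model, comp_ctx_lengths=comp_ctx_lengths, **kwargs)
```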

@@ -1422,7 +1426,7 @@ def model_hash(self) -> str:
def get_model_config(self) -> dict:
return self.model.config.__dict__

def export(self, export_dir: Optional[str] = None) -> str:
def export(self, comp_ctx_lengths: Optional[List[int]] = None, export_dir: Optional[str] = None) -> str:
Contributor

Since comp_ctx_lengths is an instance variable, we wouldn't need an argument here.
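
A minimal sketch of that suggestion, assuming comp_ctx_lengths is set on the instance during construction:

```python
from typing import Optional

def export(self, export_dir: Optional[str] = None) -> str:
    # Sketch only: read the attribute set in __init__ / from_pretrained instead
    # of threading comp_ctx_lengths through the export() signature.
    comp_ctx_lengths = getattr(self, "comp_ctx_lengths", None)
    ...
```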

@@ -1442,10 +1446,12 @@ def export(self, export_dir: Optional[str] = None) -> str:
"input_ids": torch.zeros((bs, seq_len), dtype=torch.int64),
"position_ids": torch.arange(seq_len, dtype=torch.int64).view(1, seq_len).repeat(bs, 1),
"past_key_values": [[] for _ in range(self.num_layers)],
"comp_ctx_lengths": torch.randint(0, 100, (40,), dtype=torch.long),
Contributor

Do we want to add the example inputs and dynamic_axes by default? Shouldn't we check whether comp_ctx_lengths is not None and only add them then?

Author

No, we don't need to do that. I generalized the code so that the same ONNX file is used both for CCL runs and for the default runs without CCL. This reduces the number of changes in the model file and avoids generating separate ONNX files for the with- and without-CCL experiments.
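
To illustrate the idea (names, shapes, and the fallback value are assumptions, not the PR's exact code): when CCL is disabled, the comp_ctx_lengths input can simply degenerate to the full context length, so a single exported graph covers both configurations.

```python
import torch

ctx_len = 2048           # assumed full context length
comp_ctx_lengths = None  # e.g. [512, 1024, 2048] when CCL is enabled

# Single code path: without CCL the bucket list collapses to one entry, the full context.
ccl_buckets = comp_ctx_lengths if comp_ctx_lengths is not None else [ctx_len]

# The ONNX example inputs always contain a comp_ctx_lengths tensor (matching the diff
# above), so no separate export is needed for the no-CCL configuration.
example_inputs = {"comp_ctx_lengths": torch.randint(0, 100, (40,), dtype=torch.long)}
print(ccl_buckets, example_inputs["comp_ctx_lengths"].shape)
```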

max_ccl_id = len(self.comp_ctx_lengths) - 1
max_position_id = np.max(decode_inputs["position_ids"])
ccl_id = 1
for i in range(1, len(self.comp_ctx_lengths)):
Contributor

Can we keep a reversed list and pop the last value when max_position_id < self.comp_ctx_lengths[-1]? That way we could avoid the loop.

Author

Why should we check only against the last element? Each request can finish at a different position_id, so we need to search for the most suitable CCL window to get the best performance. This for loop only runs at the end of a request, and its cost is on the order of len(CCL), which can't be more than a few values because of the compiler's limit on the number of specializations.
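
For reference, a small self-contained sketch of the selection being described, next to a loop-free variant along the lines hinted at above using bisect. The bucket values and positions are made up for illustration, and the loop assumes a break after the first matching bucket, which the truncated diff excerpt does not show.

```python
import bisect

import numpy as np

comp_ctx_lengths = [512, 1024, 2048, 4096]       # assumed sorted CCL buckets
position_ids = np.array([[695, 696, 697, 698]])  # assumed decode position_ids
max_position_id = int(np.max(position_ids))

# Loop as described: pick the smallest bucket (index >= 1) that covers max_position_id.
ccl_id = 1
for i in range(1, len(comp_ctx_lengths)):
    if max_position_id < comp_ctx_lengths[i]:
        ccl_id = i
        break

# Loop-free equivalent for positions below the largest bucket; either way the cost
# is O(len(CCL)), and the list stays short due to the specialization limit.
ccl_id_bisect = max(1, bisect.bisect_right(comp_ctx_lengths, max_position_id))
assert ccl_id == ccl_id_bisect  # both pick the 1024 bucket for max_position_id == 698
```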

ccl_id = 1
for i in range(1, len(self.comp_ctx_lengths)):
if max_position_id < self.comp_ctx_lengths[i]:
ccl_id = i
Contributor

same as above
