Mllama(single + dual) + InternVL(single) + Llava (single) #267
Conversation
1. Mllama single QPC support added 2. Simplified generate inputs for single and dual QPC --------- Signed-off-by: Amit Raj <[email protected]> Co-authored-by: asmigosw <[email protected]> Signed-off-by: Amit Raj <[email protected]>
Added support for Llava model single QPC Signed-off-by: Amit Raj <[email protected]>
Force-pushed from b9c0bc1 to b3a5d22 (Compare)
class QEFFTransformersBase(QEFFBaseModel):
Why is this entire part showing up as a diff? Is it possible to clean up the PR?
Because we moved the QEFFAutoModelForCausalLM class from the top of the file to the bottom, since we wanted to support InternVL via it. As we discussed, should I change it to use QEFFAutoModelForImageTextToText?
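For context, here is a minimal usage sketch of the unified entry point, assuming the class lands as QEFFAutoModelForImageTextToText with a kv_offload flag choosing between the dual- and single-QPC paths; the class name, flag, and example checkpoint are taken from this discussion or assumed, not verified against the final API.

# Sketch only: QEFFAutoModelForImageTextToText and kv_offload are assumed
# from this PR's discussion; the final API may differ.
from QEfficient import QEFFAutoModelForImageTextToText

# kv_offload=True is assumed to select the dual-QPC path (separate vision
# and language QPCs); kv_offload=False the single-QPC path.
model = QEFFAutoModelForImageTextToText.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # example checkpoint, for illustration only
    kv_offload=False,
)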
Force-pushed from 24fad68 to 81cea10 (Compare)
Can we please add a description for the newly added argument, model?
Force-pushed from 2de6982 to f5efdae (Compare)
Force-pushed from f5efdae to ad594d7 (Compare)
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
config._attn_implementation = "eager"
config.vision_config.use_flash_attn = "false"
model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, config, *args, **kwargs)
Why are we explicitly fetching the config and directly calling from_pretrained() on AutoModelForImageTextToText? We could have called super().from_pretrained(), which would have handled this internally.
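A minimal sketch of the refactor being suggested here, assuming the base class's from_pretrained() already applies the common attn_implementation and low_cpu_mem_usage handling; class and attribute names follow the diff above, and this is illustrative rather than the PR's actual code.

from transformers import AutoConfig

class QEFFAutoModelForImageTextToText(QEFFTransformersBase):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        # Keep only the model-specific vision tweak in the derived class...
        config = AutoConfig.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
        config.vision_config.use_flash_attn = "false"
        # ...and pass the config through so the base class can layer the
        # common kwarg normalization on top.
        return super().from_pretrained(pretrained_model_name_or_path, *args, config=config, **kwargs)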
logger.warning("Updating low_cpu_mem_usage=False")

kwargs.update({"attn_implementation": "eager", "low_cpu_mem_usage": False})
model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
Instead of making these changes in the derived class's from_pretrained(), shouldn't we make them in the base class and call that from here? It might be better from a design perspective.
We think we can remove the QEFFTransformersBase class, but we will need a bigger discussion to decide this.
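For reference, a sketch of the base-class centralization being proposed, assuming QEFFTransformersBase keeps the _hf_auto_class hook shown in the diff above and that QEFFBaseModel comes from the source tree; illustrative only, not the PR's final design.

import logging

logger = logging.getLogger(__name__)

class QEFFTransformersBase(QEFFBaseModel):
    _hf_auto_class = None  # each derived auto class sets its HF counterpart

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        # Normalize the kwargs every QEfficient auto class needs, so derived
        # classes only add their model-specific steps before calling super().
        if kwargs.get("attn_implementation", None) not in {None, "eager"}:
            logger.warning("Updating attn_implementation='eager'")
        if kwargs.get("low_cpu_mem_usage", None):
            logger.warning("Updating low_cpu_mem_usage=False")
        kwargs.update({"attn_implementation": "eager", "low_cpu_mem_usage": False})
        model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
        return cls(model)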
pretrained_model_name_or_path,
*args,
**kwargs,
):
    if kwargs.get("attn_implementation", None) not in {None, "eager"}:
Why are we not calling super().from_pretrained() here? It already has this piece of code.
Because we need to change config options in this method that are not needed for the other auto classes.
Adding generalized infrastructure to support VLMs with Dual/single QPC approaches --------- Signed-off-by: Amit Raj <[email protected]> Signed-off-by: Rishin Raj <[email protected]> Signed-off-by: Onkar Chougule <[email protected]> Co-authored-by: Rishin Raj <[email protected]> Co-authored-by: Amit Raj <[email protected]> Co-authored-by: Amit Raj <[email protected]> Co-authored-by: asmigosw <[email protected]> Signed-off-by: Hem Agnihotri <[email protected]>