VLM Pipeline for Model Onboarding through QEff #261
Conversation
Signed-off-by: Dipankar Sarkar <[email protected]>
import torch.nn.functional as F
import torch.utils.checkpoint
import transformers
from einops import rearrange
Not needed, please remove
REVIEW WIP.
remove
remove?
remove?
# if repl_module := cls._module_mapping.get(type(module)):
if repl_module := cls._module_mapping.get(module.__class__.__name__):
    module.__class__ = repl_module
    # Handling the __init__ calls in the models
    if hasattr(module, "__qeff_init__"):
        module.__qeff_init__()
    transformed = True
create a new transform named like:

class ModuleMappingViaStringAndClassMatchTransform:
    _module_mapping_via_class: Dict[Type[nn.Module], Type[nn.Module]]
    _module_mapping_via_string: Dict[str, Type[nn.Module]]

    @classmethod
    def apply(cls, model):
        transformed = False
        for module in model.modules():
            if repl_module := cls._module_mapping_via_class.get(type(module)):
                # replace the class here
                module.__class__ = repl_module
                transformed = True
            elif repl_module := cls._module_mapping_via_string.get(module.__class__.__name__):
                # replace the class here, matched by class name
                module.__class__ = repl_module
                transformed = True
        return transformed

Create two different dicts, basically, and write a test that makes sure the keys of the two dicts don't overlap.
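A minimal sketch of such a test, assuming the transform class and dict names from the suggestion above (not the library's current API):

def test_module_mapping_keys_do_not_overlap():
    # Class names served by the class-keyed mapping
    class_based_names = {
        klass.__name__ for klass in ModuleMappingViaStringAndClassMatchTransform._module_mapping_via_class
    }
    # Class names served by the string-keyed mapping
    string_based_names = set(ModuleMappingViaStringAndClassMatchTransform._module_mapping_via_string)
    # Each module class should be routed through exactly one of the two mappings
    assert class_based_names.isdisjoint(string_based_names)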
def get_num_layers_vlm(config):
    if hasattr(config, "architectures") and "LlavaForConditionalGeneration" in config.architectures:
        num_layers = config.text_config.num_hidden_layers
        return num_layers
Can't we reuse the existing method named get_num_layers_from_config and pass model.config.text_config to it?
In some models it is text_config, in some it is llm_config or txt_config, etc. Hence adding it as a new function for the VLM architectures.
Let's keep it separate, to avoid cluttering the same function with multiple conditions?
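To illustrate the point, a rough sketch of how a VLM-specific helper could fall back across the differently named text sub-configs (the attribute names come from the comment above; this is not the PR's implementation):

def _get_text_config(config):
    # Different VLM architectures expose the language-model config under different names
    for attr in ("text_config", "llm_config", "txt_config"):
        if hasattr(config, attr):
            return getattr(config, attr)
    raise AttributeError("No text/LLM sub-config found on this VLM config")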
def get_padding_shape_vlm(config, batch_size=1):
    if hasattr(config, "architectures") and "LlavaForConditionalGeneration" in config.architectures:
        n_heads = config.text_config.num_key_value_heads
        d_head = config.text_config.hidden_size // config.text_config.num_attention_heads
        padding_shape = [batch_size, n_heads, Constants.CTX_LEN_VLM, d_head]
        return padding_shape
Same comment as above: is this new method required? Can't we reuse the existing one?
# InternVL
"InternVLChatModel": QEffInternVLChatModel,
"InternVisionEmbeddings": QEffInternVisionEmbeddings,
Please create a different transform as mentioned above and separate this dict.
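If the mapping is split as suggested, the InternVL entries would move into the string-keyed dict, roughly like this (a sketch following the reviewer's naming, not the merged code):

_module_mapping_via_string = {
    # InternVL classes live outside transformers, so they are matched by class name
    "InternVLChatModel": QEffInternVLChatModel,
    "InternVisionEmbeddings": QEffInternVisionEmbeddings,
}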
model_config["n_layer_text"] = 1 | ||
model_config["n_layer_vision"] = 1 |
This should not go in library code; it is allowed only in tests.
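For illustration, a rough sketch of keeping the 1-layer override in a test helper instead, assuming a Llava-style config with text_config and vision_config sub-configs (the helper name and usage are hypothetical):

from transformers import AutoConfig

def make_tiny_vlm_config(model_name):
    # Shrink the model inside the test; library code keeps the real layer counts
    config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
    config.text_config.num_hidden_layers = 1
    config.vision_config.num_hidden_layers = 1
    return config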
Signed-off-by: Dipankar Sarkar <[email protected]>
Already addressed in #267
Features Added
1. Original modeling files removed for Intern; generic solution for models that are not part of transformers.
2. Used a model wrapper inside the modeling files to hold the generate_inputs functions; calls are dispatched based on the model being loaded from pretrained (see the sketch after this list).
3. Constants file updated.
4. Removed the PyTorch generate path from modeling_auto.
5. General clean-up of the code.
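A rough sketch of the wrapper idea from item 2, assuming each onboarded VLM supplies its own generate_inputs implementation (class and method names here are illustrative, not the PR's exact code):

class QEffVLMInputWrapper:
    # Thin wrapper kept inside the modeling file; selected based on the model at load time
    def __init__(self, model, processor):
        self.model = model
        self.processor = processor

    def generate_inputs(self, prompt, image):
        # Each VLM builds its own export/compile inputs here
        raise NotImplementedError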
Tested and Verified on
TODO