Finite lorax support #153
Conversation
quic-jouachen commented on Oct 10, 2024 (edited)
- Use case: Users can activate multiple LoRA adapters and compile them with the base model. At runtime, they can specify which prompt should use which adapter, allowing mixed adapter usage within the same batch (refer to examples/lora_models.py for more details; a rough sketch of the flow is shown below).
- These changes support both continuous batching and regular inference scenarios.
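A minimal sketch of the intended flow, assuming the `QEffAutoPeftModelForCausalLM.from_pretrained(..., adapter_name=..., finite_adapters=True)` entry point discussed in this PR; the model/adapter ids and the `load_adapter`/`compile`/`generate` calls with their arguments are illustrative assumptions, not the PR's exact API; see `examples/lora_models.py` for the real usage:

```python
# Sketch only: import path, model/adapter ids, and method signatures are assumptions.
from QEfficient import QEffAutoPeftModelForCausalLM

# Load the base model in finite-adapter mode and register adapters by name.
model = QEffAutoPeftModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",      # hypothetical base model
    adapter_name="adapter_fr",        # hypothetical adapter name
    finite_adapters=True,
)
model.load_adapter("user/french-lora", "adapter_fr")    # hypothetical adapter repos
model.load_adapter("user/python-lora", "adapter_code")

# Compile once with all activated adapters included in the exported model.
model.compile(num_cores=16)

# At runtime, each prompt in the batch picks its own adapter.
model.generate(
    prompts=[
        "Traduis en français: Hello, world!",
        "Write a Python function that reverses a string.",
    ],
    prompt_to_adapter_mapping=["adapter_fr", "adapter_code"],
)
```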
I looked at the way the transformers and peft packages handle loading adapters and running inference with them.
- All the methods for loading and running inference with adapters are implemented in `PeftModel`, which is inherited by `AutoPeftModel`. An instance of `AutoPeftModel` is returned by `AutoPeftModelForCausalLM.from_pretrained`.
- transformers' `AutoModelForCausalLM` inherits `PeftAdapterMixin`, which provides the same methods and in turn uses low-level APIs from the peft package to offer similar utilities for loading adapters and running inference.

I don't think there is a need for a new `QEffAutoLoraModelForCausalLM` class that is unknown to users, as most users will be working with the peft and transformers packages in their day-to-day jobs.
It would be best to keep the interface the same as the peft and transformers packages unless there is an absolute need for a new interface. Let's have a discussion if needed.
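For reference, the two existing entry points described above look roughly like this (a sketch with placeholder model ids, not code from this PR):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM

# peft path: from_pretrained returns a PeftModel-derived object wrapping the base model.
peft_model = AutoPeftModelForCausalLM.from_pretrained("user/llama-lora-adapter")  # placeholder id

# transformers path: AutoModelForCausalLM inherits PeftAdapterMixin, so adapters can be
# attached to the plain model and switched by name.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder id
base.load_adapter("user/llama-lora-adapter", adapter_name="sql")
base.set_adapter("sql")
```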
Edited (Nov. 12): Thanks, Onkar! We have aligned the API changes as suggested and discussed in the email. We would appreciate your review once again.
review is still WIP
Can you please move the lora module inside the peft module so the directory structure is:
peft
|__lora
Moved as requested. I also moved the test file into the tests folder.
Do you want `adapter_name` to be a mandatory argument when `finite_adapters=True`?
If it is not passed, the current code raises an error.
If you want it to be mandatory, please raise `TypeError("required adapter name argument")`. Also, do you always want the user to pass it without a keyword? Can we handle the case where the user passes `QEffAutoPeftModelForCausalLM.from_pretrained("asdfjsdk", adapter_name="---", finite_adapters=True)`?
I've added a check in the …
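For illustration, a hypothetical sketch of the kind of check being discussed (standalone, not the PR's actual code) that requires `adapter_name` whenever `finite_adapters=True` and accepts it either positionally or as a keyword:

```python
# Hypothetical sketch only; the class and method names mirror the discussion above,
# but this is not the PR's actual implementation.
class QEffAutoPeftModelForCausalLMSketch:
    @classmethod
    def from_pretrained(cls, pretrained_name_or_path, *args, **kwargs):
        if kwargs.pop("finite_adapters", False):
            # adapter_name may arrive as the first extra positional argument or as a keyword
            adapter_name = kwargs.pop("adapter_name", args[0] if args else None)
            if not isinstance(adapter_name, str):
                raise TypeError("required adapter name argument")
            # ... continue with finite-adapter initialization using adapter_name ...
        # ... otherwise fall through to the regular PEFT loading path ...
        return cls()

# QEffAutoPeftModelForCausalLMSketch.from_pretrained("asdfjsdk", finite_adapters=True)
# raises: TypeError: required adapter name argument
```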
LGTM
* Initial commit for finite loras implementation
  Signed-off-by: Jou-An Chen <[email protected]>
* Remove set delete adapter, add init assertion, update LinearMultiLoRA
  Signed-off-by: Jou-An Chen <[email protected]>
* Fix base model inference index INTMAX issue
  Signed-off-by: Jou-An Chen <[email protected]>
* Addressed review comments
  Signed-off-by: Jou-An Chen <[email protected]>
* Rebase on PR116 and make API changes
  Signed-off-by: Jou-An Chen <[email protected]>
* Enable init from QEffAutoPeftModelForCausalLM with finite_adapters flag
  Signed-off-by: Jou-An Chen <[email protected]>
* Address review comments
  Signed-off-by: Jou-An Chen <[email protected]>
* allow adapter_name passed as keyword argument, updated all finite lora tests to use single layer models
  Signed-off-by: Onkar Chougule <[email protected]>
* added pytest on_qaic marker for lora test using AI_100 device
  Signed-off-by: Onkar Chougule <[email protected]>

---------

Signed-off-by: Jou-An Chen <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Co-authored-by: Onkar Chougule <[email protected]>