Finite lorax support #153

Merged: 9 commits merged into quic:main on Nov 19, 2024
Conversation

@quic-jouachen (Contributor) commented Oct 10, 2024

  • Use case: users can activate multiple LoRA adapters and compile them with the base model. At runtime, they can specify which prompt should use which adapter, allowing mixed adapter usage within the same batch (refer to examples/lora_models.py for more details; a rough sketch follows this list).
  • These changes support both continuous batching and regular inference scenarios.
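For illustration, here is a rough, hypothetical sketch of the flow described above, not the PR's actual example. The QEffAutoPeftModelForCausalLM.from_pretrained call shape (positional path, adapter_name, finite_adapters=True) matches the one quoted later in this thread; the package import, the compile() call, and the prompt_to_adapter_mapping keyword are assumptions here. See examples/lora_models.py for the authoritative usage.

```python
# Hypothetical sketch, not the PR's actual example; see examples/lora_models.py.
from QEfficient import QEffAutoPeftModelForCausalLM  # package name assumed

# Load the base model with finite-adapter (LoRA) support enabled; the call
# shape mirrors the one quoted later in this thread. Adapter ids and names
# below are placeholders.
model = QEffAutoPeftModelForCausalLM.from_pretrained(
    "predibase/gsm8k",
    adapter_name="gsm8k",
    finite_adapters=True,
)

# Register a second adapter so both get compiled alongside the base model
# (load_adapter follows the peft/transformers convention).
model.load_adapter("predibase/tldr_content_gen", adapter_name="tldr")

# Compile once for the base model plus all registered adapters (assumed API).
model.compile()

# At runtime, each prompt in the batch picks its own adapter; the
# prompt_to_adapter_mapping keyword is an assumption here.
model.generate(
    prompts=["Solve: 12 * 7 = ?", "Summarize the following article ..."],
    prompt_to_adapter_mapping=["gsm8k", "tldr"],
)
```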

@quic-jouachen force-pushed the finiteloras branch 3 times, most recently from 3c3615e to 3c33e99 on October 10, 2024 at 19:20
@quic-jouachen force-pushed the finiteloras branch 2 times, most recently from 24ddfc6 to 5885208 on October 11, 2024 at 18:09
@quic-jouachen force-pushed the finiteloras branch 3 times, most recently from 7e632e2 to 8ed2088 on October 14, 2024 at 20:40
@quic-rishinr requested a review from irajagop on October 16, 2024 at 08:33
@quic-rishinr added the in-review label (Review process is ongoing) on Oct 16, 2024
@quic-jouachen force-pushed the finiteloras branch 4 times, most recently from ad27d0c to c3c00fc on October 18, 2024 at 00:03
@quic-rishinr (Contributor) commented:

@irajagop @ochougul please review

@quic-jouachen force-pushed the finiteloras branch 3 times, most recently from 162711b to da19a3b on October 21, 2024 at 18:01
@quic-jouachen force-pushed the finiteloras branch 2 times, most recently from ce1158f to d3fd512 on November 5, 2024 at 20:34
@ochougul (Contributor) left a comment:

I looked at how the transformers and peft packages handle loading adapters and running inference with them.

  • All the methods for loading and running inference with adapters are implemented in PeftModel, which is inherited by AutoPeftModel. An instance of AutoPeftModel is returned by AutoPeftModelForCausalLM.from_pretrained.
  • transformers' AutoModelForCausalLM inherits PeftAdapterMixin, which provides the same methods and in turn uses low-level APIs from the peft package to offer similar adapter-loading and inference utilities.

I don't think there is a need to introduce a new QEffAutoLoraModelForCausalLM class that is unknown to users, since most users rely on the peft and transformers packages in their day-to-day work.

It would be best to keep the interface the same as the peft and transformers packages unless there is an absolute need for a new one. Let's have a discussion if needed.
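For context, here is a minimal sketch of the standard peft/transformers interfaces the comment refers to; the model and adapter ids below are placeholders:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM

# peft: from_pretrained on AutoPeftModelForCausalLM returns a PeftModel
# (via AutoPeftModel) with the adapter already attached.
peft_model = AutoPeftModelForCausalLM.from_pretrained("some-org/model-with-lora")

# transformers: AutoModelForCausalLM inherits PeftAdapterMixin, so adapters
# can be loaded and switched on a plain transformers model as well.
base_model = AutoModelForCausalLM.from_pretrained("some-org/base-model")
base_model.load_adapter("some-org/lora-adapter", adapter_name="demo")
base_model.set_adapter("demo")
```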

@quic-jouachen (Contributor, Author) commented Nov 6, 2024

> I looked at how the transformers and peft packages handle loading adapters and running inference with them.
>
>   • All the methods for loading and running inference with adapters are implemented in PeftModel, which is inherited by AutoPeftModel. An instance of AutoPeftModel is returned by AutoPeftModelForCausalLM.from_pretrained.
>   • transformers' AutoModelForCausalLM inherits PeftAdapterMixin, which provides the same methods and in turn uses low-level APIs from the peft package to offer similar adapter-loading and inference utilities.
>
> I don't think there is a need to introduce a new QEffAutoLoraModelForCausalLM class that is unknown to users, since most users rely on the peft and transformers packages in their day-to-day work.
>
> It would be best to keep the interface the same as the peft and transformers packages unless there is an absolute need for a new one. Let's have a discussion if needed.

Edited (Nov 12): Thanks, Onkar! We have aligned the API changes as suggested and as discussed over email. We would appreciate your review once again.

@quic-jouachen force-pushed the finiteloras branch 2 times, most recently from 30ba3a9 to 892dfaf on November 12, 2024 at 19:20
@quic-jouachen force-pushed the finiteloras branch 4 times, most recently from 5fd9d2e to b32ba94 on November 13, 2024 at 06:06
@ochougul (Contributor) left a comment:

Review is still WIP.

@quic-jouachen force-pushed the finiteloras branch 2 times, most recently from cdf5f6b to 11393f9 on November 14, 2024 at 22:03
@ochougul (Contributor) left a comment:

Can you please move the lora module inside the peft module, so the directory structure becomes:

peft
|__lora

@quic-jouachen force-pushed the finiteloras branch 2 times, most recently from 8d48e39 to 7893a49 on November 15, 2024 at 19:21
@quic-jouachen (Contributor, Author) replied:

> Can you please move the lora module inside the peft module, so the directory structure becomes:
>
> peft
> |__lora

I have moved it as requested. I also moved the test file into the test folder: test/lora ==> test/peft/lora

@ochougul (Contributor) left a comment:

Do you want adapter_name to be a mandatory argument when finite_adapters=True?
If it isn't passed, the current code raises an error.
If you want it to be a mandatory argument, please raise TypeError("required adapter name argument"). Also, do you always want the user to pass it positionally? Can we handle the case where the user passes QEffAutoPeftModelForCausalLM.from_pretrained("asdfjsdk", adapter_name="---", finite_adapters=True)?

@quic-jouachen (Contributor, Author) commented Nov 18, 2024

> raise TypeError("required adapter name argument")

I've added a check in QEffAutoPeftModelForCausalLM to verify that adapter_name is passed in and that it is a string; otherwise a TypeError is raised. I also added a corresponding check in the unit tests.
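For clarity, here is a minimal sketch of the kind of validation described above; this is an assumption for illustration, not the actual diff from this PR:

```python
# Hypothetical validation helper, not the PR's actual code: require a string
# adapter_name whenever finite_adapters=True.
def _check_adapter_name(adapter_name, finite_adapters: bool) -> None:
    if finite_adapters:
        if adapter_name is None:
            raise TypeError("required adapter name argument")
        if not isinstance(adapter_name, str):
            raise TypeError("adapter_name must be a string")
```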

@ochougul (Contributor) left a comment:

LGTM

@ochougul merged commit 34386ed into quic:main on Nov 19, 2024
4 checks passed
@abukhoy mentioned this pull request on Nov 21, 2024
shubhagr-quic pushed a commit to shubhagr-quic/efficient-transformers that referenced this pull request on Nov 22, 2024
* Initial commit for finite loras implementation

Signed-off-by: Jou-An Chen <[email protected]>

* Remove set delete adapter, add init assertion, update LinearMultiLoRA

Signed-off-by: Jou-An Chen <[email protected]>

* Fix base model inference index INTMAX issue

Signed-off-by: Jou-An Chen <[email protected]>

* Addressed review comments

Signed-off-by: Jou-An Chen <[email protected]>

* Rebase on PR116 and make API changes

Signed-off-by: Jou-An Chen <[email protected]>

* Enable init from QEffAutoPeftModelForCausalLM with finite_adapters flag

Signed-off-by: Jou-An Chen <[email protected]>

* Address review comments

Signed-off-by: Jou-An Chen <[email protected]>

* allow adapter_name passed as keyword argument, updated all finite lora tests to use single layer models

Signed-off-by: Onkar Chougule <[email protected]>

* added pytest on_qaic marker for lora test using AI_100 device

Signed-off-by: Onkar Chougule <[email protected]>

---------

Signed-off-by: Jou-An Chen <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Co-authored-by: Onkar Chougule <[email protected]>