Finite lorax support #153
Conversation
quic-jouachen commented on Oct 10, 2024 (edited)
- Use case: Users can activate multiple LoRA adapters and compile them with the base model. At runtime, they can specify which prompt should use which adapter, allowing mixed adapter usage within the same batch (refer to examples/lora_models.py for more details; a rough sketch of the flow is shown below).
- These changes support both continuous batching and regular inference scenarios.
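A minimal sketch of the intended flow, assuming the `QEffAutoPeftModelForCausalLM.from_pretrained(..., adapter_name=..., finite_adapters=True)` entry point discussed in this PR; the model/adapter ids and the `load_adapter`/`compile`/`generate` calls with their arguments are illustrative assumptions, not the PR's exact API; see `examples/lora_models.py` for the real usage:

```python
# Sketch only: import path, model/adapter ids, and method signatures are assumptions.
from QEfficient import QEffAutoPeftModelForCausalLM

# Load the base model in finite-adapter mode and register adapters by name.
model = QEffAutoPeftModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",      # hypothetical base model
    adapter_name="adapter_fr",        # hypothetical adapter name
    finite_adapters=True,
)
model.load_adapter("user/french-lora", "adapter_fr")    # hypothetical adapter repos
model.load_adapter("user/python-lora", "adapter_code")

# Compile once with all activated adapters included in the exported model.
model.compile(num_cores=16)

# At runtime, each prompt in the batch picks its own adapter.
model.generate(
    prompts=[
        "Traduis en français: Hello, world!",
        "Write a Python function that reverses a string.",
    ],
    prompt_to_adapter_mapping=["adapter_fr", "adapter_code"],
)
```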
I looked at the way the transformers and peft packages handle loading adapters and running inference with them.
- All the methods for loading and running inference with adapters are implemented in `PeftModel`, which is inherited by `AutoPeftModel`. An instance of `AutoPeftModel` is returned by `AutoPeftModelForCausalLM.from_pretrained`.
- transformers' `AutoModelForCausalLM` inherits `PeftAdapterMixin`, which provides the same methods and in turn uses low-level APIs from the peft package to offer similar utilities for loading adapters and running inference.

I don't think there is a need for a new `QEffAutoLoraModelForCausalLM` class that is unknown to users, as most users will be working with the peft and transformers packages in their day-to-day jobs.
It would be best to keep the interface the same as the peft and transformers packages unless there is an absolute need for a new interface. Let's have a discussion if needed.
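For reference, the two existing entry points described above look roughly like this (a sketch with placeholder model ids, not code from this PR):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoModelForCausalLM

# peft path: from_pretrained returns a PeftModel-derived object wrapping the base model.
peft_model = AutoPeftModelForCausalLM.from_pretrained("user/llama-lora-adapter")  # placeholder id

# transformers path: AutoModelForCausalLM inherits PeftAdapterMixin, so adapters can be
# attached to the plain model and switched by name.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder id
base.load_adapter("user/llama-lora-adapter", adapter_name="sql")
base.set_adapter("sql")
```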
Edited (Nov. 12): Thanks, Onkar! We have aligned the API changes as suggested and discussed in the email. We would appreciate your review once again.
review is still WIP
Can you please move the lora module inside the peft module so the directory structure is:
peft
|__lora
Moved as requested. I also moved the test file into the tests folder.
Do you want `adapter_name` to be a mandatory argument when `finite_adapters=True`?
If it is not passed, the current code raises an error.
If you want it to be mandatory, please raise `TypeError("required adapter name argument")`. Also, do you always want the user to pass it without a keyword? Can we handle the case where the user passes `QEffAutoPeftModelForCausalLM.from_pretrained("asdfjsdk", adapter_name="---", finite_adapters=True)`?
I've added a check in the …
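For illustration, a hypothetical sketch of the kind of check being discussed (standalone, not the PR's actual code) that requires `adapter_name` whenever `finite_adapters=True` and accepts it either positionally or as a keyword:

```python
# Hypothetical sketch only; the class and method names mirror the discussion above,
# but this is not the PR's actual implementation.
class QEffAutoPeftModelForCausalLMSketch:
    @classmethod
    def from_pretrained(cls, pretrained_name_or_path, *args, **kwargs):
        if kwargs.pop("finite_adapters", False):
            # adapter_name may arrive as the first extra positional argument or as a keyword
            adapter_name = kwargs.pop("adapter_name", args[0] if args else None)
            if not isinstance(adapter_name, str):
                raise TypeError("required adapter name argument")
            # ... continue with finite-adapter initialization using adapter_name ...
        # ... otherwise fall through to the regular PEFT loading path ...
        return cls()

# QEffAutoPeftModelForCausalLMSketch.from_pretrained("asdfjsdk", finite_adapters=True)
# raises: TypeError: required adapter name argument
```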
LGTM
* Initial commit for finite loras implementation
  Signed-off-by: Jou-An Chen <[email protected]>
* Remove set delete adapter, add init assertion, update LinearMultiLoRA
  Signed-off-by: Jou-An Chen <[email protected]>
* Fix base model inference index INTMAX issue
  Signed-off-by: Jou-An Chen <[email protected]>
* Addressed review comments
  Signed-off-by: Jou-An Chen <[email protected]>
* Rebase on PR116 and make API changes
  Signed-off-by: Jou-An Chen <[email protected]>
* Enable init from QEffAutoPeftModelForCausalLM with finite_adapters flag
  Signed-off-by: Jou-An Chen <[email protected]>
* Address review comments
  Signed-off-by: Jou-An Chen <[email protected]>
* allow adapter_name passed as keyword argument, updated all finite lora tests to use single layer models
  Signed-off-by: Onkar Chougule <[email protected]>
* added pytest on_qaic marker for lora test using AI_100 device
  Signed-off-by: Onkar Chougule <[email protected]>

---------

Signed-off-by: Jou-An Chen <[email protected]>
Signed-off-by: Onkar Chougule <[email protected]>
Co-authored-by: Onkar Chougule <[email protected]>