See vllm-project/vllm#1002, vllm-project/vllm#5191.
We should be able to set `gguf` as the `QUANTIZATION` env var, but we also need to specify the exact quant. I'm thinking of a `MODEL_FILENAME` env var containing the exact filename in the model's repository. The model download logic will need to change; see https://github.com/Isotr0py/vllm/blob/main/examples/gguf_inference.py. A sketch of the idea follows below.
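A minimal sketch of what that env-var plumbing could look like, following the download pattern in the linked `gguf_inference.py` example. Only `QUANTIZATION` and `MODEL_FILENAME` come from this issue; the `MODEL_REPO` env var is a hypothetical name used here for illustration:

```python
import os

from huggingface_hub import hf_hub_download
from vllm import LLM

# MODEL_REPO is hypothetical (not from this issue); MODEL_FILENAME and
# QUANTIZATION are the env vars proposed above.
repo_id = os.environ["MODEL_REPO"]             # e.g. a GGUF repo on the Hub
filename = os.environ["MODEL_FILENAME"]        # exact .gguf file in that repo
quantization = os.environ.get("QUANTIZATION")  # e.g. "gguf"

# Download only the requested quant file rather than a full repo snapshot,
# since a GGUF repo typically ships many quant variants of the same model.
local_path = hf_hub_download(repo_id=repo_id, filename=filename)

# Point vLLM at the single downloaded GGUF file.
llm = LLM(model=local_path, quantization=quantization)
```

This is why a plain model-name env var isn't enough on its own: a GGUF repo holds multiple quant files, so the worker has to know which single file to fetch.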
Yeah, I guess we don't need to set a quantization env var now, but we do need to support it like in that example.
@nerdylive123 How could I implement GGUF for my fork?