Standalone Server #21

Closed · 9 of 12 tasks
abetlen opened this issue Apr 4, 2023 · 9 comments

@abetlen (Owner) commented Apr 4, 2023

Since the server is one of the goals / highlights of this project, I'm planning to move it into a subpackage, e.g. llama-cpp-python[server] or something like that.

Work that needs to be done first:

  • Ensure compatibility with OpenAI
    • Response objects match
    • Request objects match
    • Loaded model appears under /v1/models endpoint
    • Test OpenAI client libraries
    • Unsupported parameters should be silently ignored
  • Ease-of-use
    • Integrate server as a subpackage
    • CLI tool to run the server

Future work

  • Prompt caching to improve latency
  • Support multiple models in the same server
  • Add tokenization endpoints to make it easier for small clients to calculate context window sizes
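
As a rough illustration of the request-compatibility goals above (a sketch only, not this repo's actual code): the field names follow the OpenAI /v1/completions request, and pydantic's extra = "ignore" setting is one way to silently drop unsupported parameters.

```python
# Sketch of an OpenAI-compatible /v1/completions request model (hypothetical,
# not the project's actual code).  Unknown OpenAI parameters are accepted and
# silently dropped via pydantic's `extra = "ignore"` config.
from typing import List, Optional, Union

from pydantic import BaseModel


class CreateCompletionRequest(BaseModel):
    model: Optional[str] = None        # name reported by the /v1/models endpoint
    prompt: Union[str, List[str]] = ""
    max_tokens: int = 16
    temperature: float = 0.8
    top_p: float = 0.95
    stream: bool = False
    stop: Optional[List[str]] = None

    # llama.cpp-specific extras can live alongside the OpenAI fields:
    top_k: int = 40
    repeat_penalty: float = 1.1

    class Config:
        extra = "ignore"  # unsupported parameters are ignored, not rejected
```
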
@MillionthOdin16 (Contributor) commented:

Just a note: I found a package, fastapi-code-generator, that you can feed the OpenAI OpenAPI spec into, and it will generate a server skeleton with the correct models and endpoints. Similarly, there are packages that can create test cases for the endpoints based on the API spec. This might save some time, and we can return a Not Implemented error for endpoints that our server doesn't support.
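
For example, a minimal FastAPI stub along these lines (a sketch only, with made-up endpoint choices; the skeleton generated by fastapi-code-generator would of course look different) could answer unsupported OpenAI paths with 501 until they are implemented:

```python
# Sketch: stub out not-yet-supported OpenAI endpoints with a 501 response.
# Hypothetical, not generated output and not this project's server code.
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI(title="llama-cpp-python server (sketch)")

# Paths from the OpenAI spec that this sketch does not implement.
NOT_IMPLEMENTED_PATHS = ["/v1/edits", "/v1/images/generations", "/v1/moderations"]


def make_stub(path: str):
    async def stub() -> JSONResponse:
        # 501 tells clients "valid endpoint, just not implemented here".
        return JSONResponse(status_code=501, content={"error": f"{path} is not implemented"})
    return stub


for path in NOT_IMPLEMENTED_PATHS:
    app.post(path)(make_stub(path))
```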

abetlen changed the title from "Standalone Server as a Subpackage" to "Standalone Server" on Apr 5, 2023
@abetlen (Owner, Author) commented Apr 5, 2023

With the latest commit, we now handle all the request parameters for the /v1/completions, /v1/chat/completions, and /v1/embeddings endpoints. The server accepts additional llama.cpp-specific parameters and ignores any that we currently don't support.

The last step is really just to bundle this into the PyPI package as a subpackage so it can be installed with pip install llama-cpp-python[server] and then run with python -m llama_cpp.server, or something like that.
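
Roughly, the packaging side could look like the sketch below (hypothetical metadata; the dependency lists are assumptions, not the package's actual setup). With a llama_cpp/server/__main__.py entry point in place, python -m llama_cpp.server then only has to start uvicorn.

```python
# setup.py (sketch, not the project's actual build configuration):
# the FastAPI/uvicorn dependencies only come in via the optional extra,
# i.e. `pip install llama-cpp-python[server]`.
from setuptools import find_packages, setup

setup(
    name="llama-cpp-python",
    packages=find_packages(),
    install_requires=["typing-extensions"],   # assumed base dependency
    extras_require={
        "server": ["fastapi", "uvicorn"],     # assumed server-only dependencies
    },
)
```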

@MillionthOdin16 (Contributor) commented:

Awesome work! Just FYI, llama.cpp got some major bug fixes in the last hour that improve performance. There should no longer be a performance degradation as the context size increases. Hopefully this translates into better performance for us too 🔥

@abetlen (Owner, Author) commented Apr 5, 2023

Awesome, I'll update the package!

@abetlen (Owner, Author) commented Apr 5, 2023

@MillionthOdin16 I've pushed the updated llama.cpp and the standalone server.

Do you mind testing it for me?

Just update from pip and run MODEL=/path/to/model python3 -m llama_cpp.server
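
If it helps, here's a quick way to exercise it from the OpenAI Python client once the server is up (a sketch assuming the pre-1.0 openai package and the server's default address of http://localhost:8000; adjust host/port if yours differ):

```python
# Sketch: point the OpenAI Python client (pre-1.0 API) at the local server
# started with `MODEL=/path/to/model python3 -m llama_cpp.server`.
import openai

openai.api_key = "sk-no-key-needed"           # the local server doesn't check the key
openai.api_base = "http://localhost:8000/v1"  # assumed default host/port

completion = openai.Completion.create(
    model="local-model",                      # model name is informational here
    prompt="Q: Name the planets in the solar system. A:",
    max_tokens=64,
)
print(completion["choices"][0]["text"])
```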

abetlen pinned this issue on Apr 5, 2023
@MillionthOdin16 (Contributor) commented:

@abetlen done! I created #29 with some fixes, especially for Windows. The API is super nice. I did experience significantly slower chat_completion performance compared to the other endpoints (as you previously mentioned), but overall, super cool!

@MillionthOdin16 (Contributor) commented Apr 5, 2023

One extra note on usability: I think it would be nice to pass the model (and eventually a model folder) as an argument to llama_cpp.server instead of using an env var. That would make it more similar to other tools, I think.
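
A possible shape for that (a sketch only; the flag names here are made up, not the server's actual CLI):

```python
# Sketch: accept --model (and later --model-dir) on the command line,
# falling back to the MODEL environment variable.  Hypothetical flags.
import argparse
import os


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="llama_cpp.server")
    parser.add_argument(
        "--model",
        default=os.environ.get("MODEL"),
        help="Path to the ggml model file (falls back to $MODEL).",
    )
    parser.add_argument("--host", default="localhost", help="Interface to bind to.")
    parser.add_argument("--port", type=int, default=8000, help="Port to listen on.")
    args = parser.parse_args()
    if args.model is None:
        parser.error("a model path is required (use --model or set MODEL)")
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"Would start the server on {args.host}:{args.port} with {args.model}")
```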

@abetlen (Owner, Author) commented Apr 6, 2023

Next steps on the server (in no particular order):

  • multiple models selectable by model parameter
  • prompt caching (if possible or maybe just hack this with multiple contexts)
  • server cli options
  • logprobs
  • investigate adding /models/{model} endpoint
  • model aliasing (kind of a hack but could fix some issues)

I'll close this issue and spin these out individually.
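
For the multiple-models and model-aliasing items above, one possible shape is sketched below (not the server's actual design; llama_cpp.Llama is the real class, everything else here is made up):

```python
# Sketch: a registry that maps the request's `model` field (or an alias)
# to a loaded llama.cpp model.  Hypothetical, not the server's design.
from typing import Dict, List, Optional

from llama_cpp import Llama


class ModelRegistry:
    """Maps a requested model name (or alias) to a loaded llama.cpp model."""

    def __init__(self, model_paths: Dict[str, str], aliases: Optional[Dict[str, str]] = None):
        # model_paths: {"llama-7b": "/models/7B/ggml-model.bin", ...}
        self._models = {name: Llama(model_path=path) for name, path in model_paths.items()}
        # Aliases let OpenAI clients keep familiar names, e.g. {"gpt-3.5-turbo": "llama-7b"}.
        self._aliases = aliases or {}

    def get(self, requested: str) -> Llama:
        name = self._aliases.get(requested, requested)
        if name not in self._models:
            raise KeyError(f"unknown model: {requested!r}")
        return self._models[name]

    def list_models(self) -> List[dict]:
        # Shape loosely follows the OpenAI /v1/models response.
        return [{"id": name, "object": "model"} for name in self._models]
```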

abetlen closed this as completed on Apr 6, 2023
abetlen unpinned this issue on Apr 6, 2023
xaptronic pushed a commit to xaptronic/llama-cpp-python that referenced this issue Jun 13, 2023
@riverzhou commented:

I want to save the chat log. What are the best practices for this?
