Standalone Server #21
Just a note: I found a package, fastapi-code-generator, that you can feed the OpenAI OpenAPI spec and it will generate a server skeleton with the correct models and endpoints. Similarly, there are packages that can create test cases for the endpoints based on the API spec. This might save some time, and we can return a "not implemented" error for endpoints that our server doesn't support.
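A stdlib-only sketch of that "not implemented" idea (the real skeleton would be the FastAPI app fastapi-code-generator emits; the endpoint names, handler shape, and payloads here are illustrative assumptions):

```python
# Sketch: route table for an OpenAI-compatible server skeleton.
# Any OpenAI-spec path we haven't implemented returns HTTP 501.

def completion_stub():
    # Placeholder for a real llama.cpp-backed completion handler.
    return 200, {"object": "text_completion", "choices": []}

# Only the endpoints we actually support appear here.
IMPLEMENTED = {
    "/v1/completions": completion_stub,
}

def dispatch(path):
    """Return (status, body); 501 for spec endpoints we don't support."""
    handler = IMPLEMENTED.get(path)
    if handler is None:
        return 501, {"error": {"message": f"{path} is not implemented"}}
    return handler()
```

The same pattern drops into a generated FastAPI skeleton by raising an HTTP 501 exception from each unimplemented route.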
With the latest commit we now handle all the request parameters. The last step is really just to bundle this into the PyPI package as a subpackage so it can be installed with `pip install llama-cpp-python[server]`.
Awesome work! Just FYI, llama.cpp got some major bug fixes in the last hour that improve performance. There should no longer be a performance degradation as the context size increases. Hopefully this translates into better performance for us too 🔥
Awesome, I'll update the package!
@MillionthOdin16 I've pushed the updated llama.cpp and the standalone server. Do you mind testing it for me? Just update from pip and run the `llama_cpp.server` module.
One extra note on usability: I think it would be nice to pass in the model (and eventually a model folder) as an argument to `llama_cpp.server` instead of using an env var. That would make it more similar to other tools, I think.
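A sketch of what that could look like with `argparse` (the `--model` flag name and the `MODEL` env-var fallback are assumptions, not the package's current interface):

```python
# Sketch: take the model path as a CLI argument, falling back to an
# env var for backwards compatibility. --model / MODEL are assumed names.
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="llama_cpp.server")
    parser.add_argument(
        "--model",
        default=os.environ.get("MODEL"),
        help="path to the ggml model file (falls back to $MODEL)",
    )
    return parser.parse_args(argv)
```

Invocation would then look like `python3 -m llama_cpp.server --model ./models/ggml-model.bin`, with the env var still honored when the flag is omitted.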
Next steps on the server (in no particular order):
I'll close this issue and spin these out into individual issues.
I want to save the chat log.
Since the server is one of the goals / highlights of this project, I'm planning to move it into a subpackage, e.g. `llama-cpp-python[server]` or something like that.

Work that needs to be done first:
- `/v1/models` endpoint
- Test OpenAI client libraries

Future work
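For the `/v1/models` endpoint in the list above, the response body the OpenAI client libraries expect is a simple list wrapper; a minimal sketch (the model id and `owned_by` value are placeholders):

```python
# Sketch: build the JSON body for GET /v1/models.
# Model ids would come from whatever models the server has loaded.
def models_response(model_ids):
    return {
        "object": "list",
        "data": [
            {"id": model_id, "object": "model", "owned_by": "user"}
            for model_id in model_ids
        ],
    }
```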