Fixes and Tweaks to Defaults #29
Conversation
…s relating to TypeDict or subclass() if the version is too old or new...
…when n_ctx was missing and n_batch was 2048.
Change the batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects generation quality (it shouldn't, but in case that's still a problem I changed it back to the default). Set the auto-determined number of threads to half the system count: ggml will sometimes lock cores at 100% while doing nothing. That is being addressed upstream, but it makes for a bad user experience if cores are pegged at 100%.
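For reference, the thread-count heuristic described above could be as simple as the following sketch (illustrative only, not the exact code in this PR):

```python
import multiprocessing

# Illustrative sketch: default to half the logical cores so ggml can't peg
# the whole machine, but never go below one thread.
n_threads = max(multiprocessing.cpu_count() // 2, 1)
```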
Thank you! With regards to pydantic, did you install with …
With that command it does properly install pydantic. Will it still get pydantic if I clone the repo and do …
Unfortunately, I think with …
@@ -19,6 +19,7 @@
     entry_points={"console_scripts": ["llama_cpp.server=llama_cpp.server:main"]},
     install_requires=[
         "typing-extensions>=4.5.0",
+        "pydantic==1.10.7",
Revert this for now; we can fix it properly when we migrate from setup.py. For now it should only affect developers who are also working on the server (a very small number of people), versus requiring pydantic for every install (which would affect all build steps, for example).
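One possible way to handle this later (a hypothetical sketch, not the change merged here) is to move the server-only dependency into an optional extra so the base install stays unaffected:

```python
# Hypothetical sketch of a setup.py using an optional extra for server-only
# dependencies; the real package keeps its actual metadata and build setup.
from setuptools import setup

setup(
    name="llama_cpp_python",
    version="0.0.0",  # placeholder version for this sketch
    install_requires=["typing-extensions>=4.5.0"],
    extras_require={"server": ["pydantic>=1.10.7,<2"]},
)
```

With a layout like that, `pip install llama-cpp-python[server]` would pull in pydantic while a plain install would not.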
Wow. It took a while to figure out that mlock was silently causing the model to fail without an exception. 😢 haha
It's awesome now that I've got it working, though. I still need to look into where it puts the built library and where it searches for it. I think it searches in the same directory as the executable, but I may have had to move it around manually before it was happy. Not sure why that is.
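To avoid that silent failure, one option (purely a sketch on my part, not code from this PR) is to reject mlock up front on platforms where it is known not to exist:

```python
import platform


def mlock_available() -> bool:
    # Hypothetical helper: mlock exists on POSIX systems but not on Windows,
    # where enabling it was making the model load fail without an exception.
    return platform.system() in ("Linux", "Darwin")


use_mlock = False  # keep it off by default
if use_mlock and not mlock_available():
    raise RuntimeError("use_mlock was requested but mlock is not available on this platform")
```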
Summary:
- Added `pydantic` as a dependency in `setup.py`. The code was previously throwing errors when `pydantic` was not present. ❗ This might also need to be done for scikit-build; I installed it manually at the very start. You probably know more about the best place to do it.
- Set the default value of `n_batch` to 8 in both `examples/high_level_api/fastapi_server.py` and `llama_cpp/server/__main__.py`. This replaces the previous default value of 2048 (a combined sketch of these default changes appears after this list).
- Reduced the default value of `n_threads` to half of the available CPU count in both `examples/high_level_api/fastapi_server.py` and `llama_cpp/server/__main__.py`. This protects against locking up the system; I usually run only a third of the available threads. Users can always turn it up, but 100 is kind of shocking. More details in the actual code.
- Disabled `mlock` by default in both `examples/high_level_api/fastapi_server.py` and `llama_cpp/server/__main__.py`. The previous setting was causing silent failures on platforms that don't support `mlock`, such as Windows. We could either check whether the platform supports it or let users enable it manually, though some people in the llama.cpp discussions don't recommend it anyway.
- Updated `.gitignore` to ignore the `./idea` folder.
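To make the default changes above concrete, here is a rough, hedged sketch of how they could look together in a pydantic-style settings class (field names and structure are assumptions, not necessarily the exact server code):

```python
import multiprocessing

from pydantic import BaseSettings, Field


class Settings(BaseSettings):
    # Sketch of the defaults described in the list above (assumed field names).
    n_batch: int = Field(default=8, description="llama.cpp default batch size")
    n_threads: int = Field(
        default=max(multiprocessing.cpu_count() // 2, 1),
        description="Half the logical cores, to avoid pegging the machine",
    )
    use_mlock: bool = Field(
        default=False,
        description="Off by default; fails silently on platforms without mlock",
    )
```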
For additional info, and just for future reference, this is how mlock fails on an unsupported platform.