llama : add retrieval example #5692
Comments
I am interested in this.
That's something I have already done in the past, but in another language (not C++). @devilkadabra69 If you want to take this, you can start with a simple C++ program. It's basically the same idea as the LangChain text splitter, but in C++. The maximum number of tokens per chunk will be specified via a CLI argument.
@devilkadabra69 Can you confirm whether you are going to do this?
Would it be interesting to also include generation in the example? Then we would have a complete RAG example.
Just my personal opinion: I don't think it's needed, because the goal of each example is to showcase one feature at a time, not many of them at once. Having one example that can do multiple things may make it difficult to maintain in the long term (for example, when the library introduces a breaking change). Also, if we listed in detail what we want for
@ngxson If you want, you can do it.
@devilkadabra69 I currently don't have time to do that; I'm just asking so that other people who want to take it can start working.
I have made one in ChatLLM.cpp: https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md
@foldl Nice! Do I understand correctly that the ReRanker part is a more advanced way of searching for the top embeddings in the database (compared, for example, to a simple cosine similarity metric)? Btw, for people looking to work on this example: here we are interested only in generating the embeddings and searching in them. The full RAG pipeline will be demonstrated in further examples.
@ggerganov Yes. The ReRanker gives a floating-point score for each (question, text) pair. The higher the score, the more likely the text is to contain the answer to the question.
Since we now support embedding models in llama.cpp, we should add a simple example to demonstrate retrieval functionality. Here is how it should work: