llama : add retrieval example #5692

Closed · ggerganov opened this issue Feb 23, 2024 · 10 comments · Fixed by #6193

ggerganov commented Feb 23, 2024

Since we now support embedding models in llama.cpp, we should add a simple example to demonstrate retrieval functionality. Here is how it should work:

  • load a set of text files (provided from the command line)
  • split the text into chunks of user-configurable size, each chunk ending on a configurable stop string
  • embed all chunks using an embedding model (BERT / SBERT)
  • receive input from the command line, embed it and display the top N most relevant chunks based on cosine similarity between the input and chunk embeddings (see the ranking sketch below)
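
For the last step, a rough C++ sketch of the brute-force cosine-similarity ranking (illustrative only: the chunk struct and print_top_n are made-up names, and in the real example the embeddings would come from the embedding model):

```cpp
// sketch of the ranking step: brute-force cosine similarity over all chunks
// (chunk and print_top_n are illustrative; embeddings come from the model)
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

struct chunk {
    std::string        text;
    std::vector<float> emb; // filled in by the embedding model
};

static float cosine_sim(const std::vector<float> & a, const std::vector<float> & b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i]*b[i];
        na  += a[i]*a[i];
        nb  += b[i]*b[i];
    }
    return dot / (std::sqrt(na)*std::sqrt(nb) + 1e-6f);
}

// print the N chunks most similar to the query embedding
static void print_top_n(const std::vector<chunk> & chunks, const std::vector<float> & query_emb, size_t n) {
    std::vector<std::pair<float, size_t>> scored;
    for (size_t i = 0; i < chunks.size(); ++i) {
        scored.push_back({cosine_sim(query_emb, chunks[i].emb), i});
    }
    n = std::min(n, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + n, scored.end(),
                      [](const auto & x, const auto & y) { return x.first > y.first; });
    for (size_t i = 0; i < n; ++i) {
        printf("%.4f  %s\n", scored[i].first, chunks[scored[i].second].text.c_str());
    }
}
```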
ggerganov added the good first issue (Good for newcomers) label Feb 23, 2024
ggerganov moved this to Todo in ggml : roadmap Feb 23, 2024
@devilkadabra69

I am interested in this. Do I have to make a cpp file to demonstrate the retrieval example, or something else? Can you specify, please?

ngxson commented Feb 23, 2024

That's something I've already done in the past, but in another language (not cpp).

@devilkadabra69 if you want to take this, you can start with a simple cpp program that #include "llama.h", loads the text files (maybe specified by a glob like ./path/to/folder/*.txt), splits them into chunks, then calculates the embedding vectors for them.

It's basically the same idea as the langchain text splitter, but in cpp. The max number of tokens per chunk will be specified via a CLI argument.
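
A rough sketch of that splitting step, using whitespace-separated words as a stand-in for real tokens (the actual example should count tokens with the model's tokenizer; split_into_chunks is an illustrative name):

```cpp
// sketch of the chunking step; whitespace-separated words stand in for real
// tokens (the actual example should count tokens via the model tokenizer)
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// split the contents of one file into chunks of at most max_tokens "tokens"
static std::vector<std::string> split_into_chunks(const std::string & path, size_t max_tokens) {
    std::ifstream fin(path);
    std::stringstream ss;
    ss << fin.rdbuf();

    std::vector<std::string> chunks;
    std::istringstream words(ss.str());
    std::string word, cur;
    size_t n_tokens = 0;

    while (words >> word) {
        if (n_tokens == max_tokens) {
            chunks.push_back(cur);
            cur.clear();
            n_tokens = 0;
        }
        if (!cur.empty()) {
            cur += ' ';
        }
        cur += word;
        ++n_tokens;
    }
    if (!cur.empty()) {
        chunks.push_back(cur);
    }
    return chunks;
}
```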

ngxson commented Feb 25, 2024

@devilkadabra69 Can you confirm whether you are going to do this?

@phymbert

Would it be interesting to also include generation in the example? Then we would have a complete RAG example.

ngxson commented Feb 26, 2024

> Would it be interesting to also include generation in the example? Then we would have a complete RAG example.

Just my personal opinion: I don't think it's needed, because the goal of each example is to showcase one feature at a time, not many of them at once. Having one example that can do multiple things may make it difficult to maintain in the long term (for example, when the library introduces a breaking change).

Also, if we list in detail what we want for retrieval, I think it's already quite a lot of tasks:

  • What about a customizable chunk splitter? For example, in some languages you cannot split a sentence in the middle; it completely alters the meaning of that sentence. Also, if you split by sentence, not all languages use the same period character (for example, Chinese uses 。 instead of .). What about chunk overlapping? ...
  • Do we want to use a vector database implementation? hnswlib is lightweight enough, but do we really need one?
  • What about caching? Do we want to save the embedding vectors somewhere and reuse them later? (see the sketch below)
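
On the caching question, a minimal sketch of what reusing embeddings could look like: dump them to a flat binary file and read them back (the file format and function names here are made up for illustration):

```cpp
// sketch of a trivial embedding cache: a flat binary file holding a header
// (count, dim) followed by count*dim floats; format and names are made up
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

static void save_embeddings(const std::string & path, const std::vector<std::vector<float>> & embs) {
    std::ofstream fout(path, std::ios::binary);
    const uint32_t count = (uint32_t) embs.size();
    const uint32_t dim   = count > 0 ? (uint32_t) embs[0].size() : 0;
    fout.write((const char *) &count, sizeof(count));
    fout.write((const char *) &dim,   sizeof(dim));
    for (const auto & e : embs) {
        fout.write((const char *) e.data(), dim*sizeof(float));
    }
}

static std::vector<std::vector<float>> load_embeddings(const std::string & path) {
    std::ifstream fin(path, std::ios::binary);
    uint32_t count = 0, dim = 0;
    fin.read((char *) &count, sizeof(count));
    fin.read((char *) &dim,   sizeof(dim));
    std::vector<std::vector<float>> embs(count, std::vector<float>(dim));
    for (auto & e : embs) {
        fin.read((char *) e.data(), dim*sizeof(float));
    }
    return embs;
}
```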

@devilkadabra69

@ngxson if you want, you can do it.

ngxson commented Feb 28, 2024

@devilkadabra69 Currently I don't have time to do that; I'm just asking so that other people who want to take it can start working.

foldl commented Mar 2, 2024

I have made one in ChatLLM.cpp:

https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

@ggerganov

@foldl Nice!

Do I understand correctly that the ReRanker part is a more advanced way of searching for the top embeddings in the database (for example, compared to a simple cosine similarity metric)?

Btw, for people looking to work on this example, here we are interested only in generating the embeddings and searching in them. The full RAG will be demonstrated in further examples.

foldl commented Mar 2, 2024

@ggerganov Yes. The ReRanker gives a floating-point score for each question and text pair. The higher the score, the more likely the text is to contain the answer to the question.
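
So a retrieve-then-rerank pipeline could look roughly like this sketch, where rerank_score is a placeholder for whatever cross-encoder model computes the pair score:

```cpp
// sketch of retrieve-then-rerank; rerank_score() is a placeholder for a
// cross-encoder that scores a (question, text) pair, higher = more relevant
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// hypothetical: implemented by whatever reranker model is used
float rerank_score(const std::string & question, const std::string & text);

// take the candidates retrieved by cosine similarity and reorder them
static std::vector<std::string> rerank(const std::string & question, const std::vector<std::string> & candidates) {
    std::vector<std::pair<float, std::string>> scored;
    for (const auto & text : candidates) {
        scored.push_back({rerank_score(question, text), text});
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const auto & a, const auto & b) { return a.first > b.first; });
    std::vector<std::string> out;
    for (auto & s : scored) {
        out.push_back(std::move(s.second));
    }
    return out;
}
```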
