llama : add retrieval example #5692

Closed · ggerganov opened this issue Feb 23, 2024 · 10 comments · Fixed by #6193

ggerganov commented Feb 23, 2024

Since we now support embedding models in llama.cpp, we should add a simple example to demonstrate retrieval functionality. Here is how it should work:

  • load a set of text files (provided from the command line)
  • split the text into chunks of user-configurable size, each chunk ending on a configurable stop string
  • embed all chunks using an embedding model (BERT / SBERT)
  • receive input from the command line, embed it and display the top N most relevant chunks based on cosine similarity between the input and chunk embeddings (see the ranking sketch below)
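
For the last step, a rough C++ sketch of the brute-force cosine-similarity ranking (illustrative only: the chunk struct and print_top_n are made-up names, and in the real example the embeddings would come from the embedding model):

```cpp
// sketch of the ranking step: brute-force cosine similarity over all chunks
// (chunk and print_top_n are illustrative; embeddings come from the model)
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

struct chunk {
    std::string        text;
    std::vector<float> emb; // filled in by the embedding model
};

static float cosine_sim(const std::vector<float> & a, const std::vector<float> & b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i]*b[i];
        na  += a[i]*a[i];
        nb  += b[i]*b[i];
    }
    return dot / (std::sqrt(na)*std::sqrt(nb) + 1e-6f);
}

// print the N chunks most similar to the query embedding
static void print_top_n(const std::vector<chunk> & chunks, const std::vector<float> & query_emb, size_t n) {
    std::vector<std::pair<float, size_t>> scored;
    for (size_t i = 0; i < chunks.size(); ++i) {
        scored.push_back({cosine_sim(query_emb, chunks[i].emb), i});
    }
    n = std::min(n, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + n, scored.end(),
                      [](const auto & x, const auto & y) { return x.first > y.first; });
    for (size_t i = 0; i < n; ++i) {
        printf("%.4f  %s\n", scored[i].first, chunks[scored[i].second].text.c_str());
    }
}
```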
ggerganov added the good first issue (Good for newcomers) label Feb 23, 2024
ggerganov moved this to Todo in ggml : roadmap Feb 23, 2024
@devilkadabra69

I am interested in this. Do I have to make a cpp file to demonstrate the retrieval example, or something else? Can you specify, please?

ngxson commented Feb 23, 2024

That's something I've already done in the past, but in another language (not cpp).

@devilkadabra69 if you want to take this, you can start with a simple cpp program that #include "llama.h", loads the text files (maybe specified by a glob like ./path/to/folder/*.txt), splits them into chunks, then calculates the embedding vectors for them.

It's basically the same idea as the langchain text splitter, but in cpp. The max number of tokens per chunk will be specified via a CLI argument.
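
A rough sketch of that splitting step, using whitespace-separated words as a stand-in for real tokens (the actual example should count tokens with the model's tokenizer; split_into_chunks is an illustrative name):

```cpp
// sketch of the chunking step; whitespace-separated words stand in for real
// tokens (the actual example should count tokens via the model tokenizer)
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// split the contents of one file into chunks of at most max_tokens "tokens"
static std::vector<std::string> split_into_chunks(const std::string & path, size_t max_tokens) {
    std::ifstream fin(path);
    std::stringstream ss;
    ss << fin.rdbuf();

    std::vector<std::string> chunks;
    std::istringstream words(ss.str());
    std::string word, cur;
    size_t n_tokens = 0;

    while (words >> word) {
        if (n_tokens == max_tokens) {
            chunks.push_back(cur);
            cur.clear();
            n_tokens = 0;
        }
        if (!cur.empty()) {
            cur += ' ';
        }
        cur += word;
        ++n_tokens;
    }
    if (!cur.empty()) {
        chunks.push_back(cur);
    }
    return chunks;
}
```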

ngxson commented Feb 25, 2024

@devilkadabra69 Can you confirm whether you are going to do this?

@phymbert

Would it be interesting to also include generation in the example? Then we would have a complete RAG example.

ngxson commented Feb 26, 2024

> Would it be interesting to also include generation in the example? Then we would have a complete RAG example.

Just my personal opinion: I don't think it's needed, because the goal of each example is to showcase one feature at a time, not many of them at once. Having one example that can do multiple things may make it difficult to maintain in the long term (for example, when the library introduces a breaking change).

Also, if we list in detail what we want for retrieval, I think it's already quite a lot of tasks:

  • What about a customizable chunk splitter? For example, in some languages you cannot split a sentence in the middle; it completely alters the meaning of that sentence. Also, if you split by sentence, not all languages use the same period character (for example, Chinese uses 。 instead of .). What about chunk overlapping? ...
  • Do we want to use a vector database implementation? hnswlib is lightweight enough, but do we really need one?
  • What about caching? Do we want to save the embedding vectors somewhere and reuse them later? (see the sketch below)
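
On the caching question, a minimal sketch of what reusing embeddings could look like: dump them to a flat binary file and read them back (the file format and function names here are made up for illustration):

```cpp
// sketch of a trivial embedding cache: a flat binary file holding a header
// (count, dim) followed by count*dim floats; format and names are made up
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

static void save_embeddings(const std::string & path, const std::vector<std::vector<float>> & embs) {
    std::ofstream fout(path, std::ios::binary);
    const uint32_t count = (uint32_t) embs.size();
    const uint32_t dim   = count > 0 ? (uint32_t) embs[0].size() : 0;
    fout.write((const char *) &count, sizeof(count));
    fout.write((const char *) &dim,   sizeof(dim));
    for (const auto & e : embs) {
        fout.write((const char *) e.data(), dim*sizeof(float));
    }
}

static std::vector<std::vector<float>> load_embeddings(const std::string & path) {
    std::ifstream fin(path, std::ios::binary);
    uint32_t count = 0, dim = 0;
    fin.read((char *) &count, sizeof(count));
    fin.read((char *) &dim,   sizeof(dim));
    std::vector<std::vector<float>> embs(count, std::vector<float>(dim));
    for (auto & e : embs) {
        fin.read((char *) e.data(), dim*sizeof(float));
    }
    return embs;
}
```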

@devilkadabra69

@ngxson if you want, you can do it.

ngxson commented Feb 28, 2024

@devilkadabra69 Currently I don't have time to do that; I'm just asking so that other people who want to take it can start working.

foldl commented Mar 2, 2024

I have made one in ChatLLM.cpp:

https://github.com/foldl/chatllm.cpp/blob/master/docs/rag.md

@ggerganov

@foldl Nice!

Do I understand correctly that the ReRanker part is a more advanced way of searching for the top embeddings in the database (for example, compared to a simple cosine similarity metric)?

Btw, for people looking to work on this example, here we are interested only in generating the embeddings and searching in them. The full RAG will be demonstrated in further examples.

foldl commented Mar 2, 2024

@ggerganov Yes. The ReRanker gives a floating-point score for each question and text pair. The higher the score, the more likely the text is to contain the answer to the question.
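
So a retrieve-then-rerank pipeline could look roughly like this sketch, where rerank_score is a placeholder for whatever cross-encoder model computes the pair score:

```cpp
// sketch of retrieve-then-rerank; rerank_score() is a placeholder for a
// cross-encoder that scores a (question, text) pair, higher = more relevant
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// hypothetical: implemented by whatever reranker model is used
float rerank_score(const std::string & question, const std::string & text);

// take the candidates retrieved by cosine similarity and reorder them
static std::vector<std::string> rerank(const std::string & question, const std::vector<std::string> & candidates) {
    std::vector<std::pair<float, std::string>> scored;
    for (const auto & text : candidates) {
        scored.push_back({rerank_score(question, text), text});
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const auto & a, const auto & b) { return a.first > b.first; });
    std::vector<std::string> out;
    for (auto & s : scored) {
        out.push_back(std::move(s.second));
    }
    return out;
}
```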
