[doc] Add RAG Integration example #17692

reidliu41 · 2025-05-06T03:46:35Z

RAG (Retrieval-Augmented Generation) enhances LLMs by retrieving relevant context from external sources,
improving factual accuracy and grounding. It's widely adopted in modern LLM applications.
This PR introduces basic RAG examples using:

vLLM + LangChain + Milvus
vLLM + LlamaIndex + Milvus

Signed-off-by: reidliu41 <[email protected]>

github-actions · 2025-05-06T03:46:45Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

examples/online_serving/retrieval_augmented_generation_with_langchain.py

Signed-off-by: reidliu41 <[email protected]>

docs/source/deployment/frameworks/retrieval_augmented_generation.md

Signed-off-by: reidliu41 <[email protected]>

DarkLight1337 · 2025-05-06T08:59:44Z

cc @hmellor there seem to be some indentation errors even though this PR doesn't change them

hmellor · 2025-05-06T09:07:36Z

Those indentation errors (which should be solved) are harmless for the build.

The main issue is that:

:::{argparse}
:module: examples.online_serving.retrieval_augmented_generation_with_langchain
:func: get_parser
:prog: retrieval_augmented_generation_with_langchain.py
:::

requires the imports from that module to either be mocked (docs/source/conf.py) or installed (requirements/docs.txt).
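
To illustrate the point above: the directive has to import the module before it can call `get_parser`, so every top-level import in that module must resolve in the docs environment. A minimal sketch, using a hypothetical stand-in module (`demo_rag_example`), a hypothetical missing package name, and a made-up `--top-k` flag; it is not the actual example script:

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for the example script: a top-level third-party
# import followed by a get_parser() function.
SOURCE = textwrap.dedent(
    """
    import argparse
    import dependency_not_installed_in_docs_env  # hypothetical package

    def get_parser():
        parser = argparse.ArgumentParser(description="demo")
        parser.add_argument("--top-k", type=int, default=3)
        return parser
    """
)


def try_get_parser(module_name: str = "demo_rag_example") -> str:
    """Import the module, call get_parser(), and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(module_name)
            return module.get_parser().format_usage()
        except ModuleNotFoundError as exc:
            # The top-level import runs before get_parser() is reachable.
            return f"import failed: {exc}"
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)


result = try_get_parser()
print(result)
```

The import fails before `get_parser` is ever reached, which is why the dependency has to be either installed or mocked for the docs build.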

reidliu41 · 2025-05-06T09:15:33Z

@hmellor yeah, that seems to be it, thanks
@DarkLight1337 it seems the module cannot be imported. Maybe roll back to the previous command output? It doesn't seem good to change settings/configs just for the examples.

DarkLight1337 · 2025-05-06T09:24:48Z

We can add langchain and llamaindex to the dependencies to mock inside conf.py
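
As a rough sketch of what mocking a dependency means in practice: a placeholder object is registered under the package name so that `import` statements succeed without the real package. The package names below are hypothetical, and the helper `mock_missing_packages` is illustrative only. Sphinx's autodoc extension exposes a similar idea through its `autodoc_mock_imports` setting; whether that is the exact mechanism meant here is an assumption.

```python
import sys
from unittest.mock import MagicMock

# Hypothetical package names; substitute whatever the example script imports.
PACKAGES_TO_MOCK = ["fake_langchain_pkg", "fake_llama_index_pkg"]


def mock_missing_packages(names):
    """Register a MagicMock for each name that is not already imported."""
    mocked = []
    for name in names:
        if name not in sys.modules:
            sys.modules[name] = MagicMock(name=name)
            mocked.append(name)
    return mocked


mocked = mock_missing_packages(PACKAGES_TO_MOCK)

# These imports now succeed even though neither package is installed.
import fake_langchain_pkg
from fake_llama_index_pkg import VectorStoreIndex  # resolves to a child mock
```

Note that a mock registered this way only covers the top-level name; a dotted submodule import would need its own entry.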

reidliu41 · 2025-05-06T13:05:55Z


[2025-05-06T13:03:22Z] Warning, treated as error:
[2025-05-06T13:03:22Z] /vllm-workspace/test_docs/docs/source/deployment/frameworks/retrieval_augmented_generation.md:42:Failed to import "get_parser" from "examples.online_serving.retrieval_augmented_generation_with_langchain".
[2025-05-06T13:03:22Z] No module named 'examples.online_serving'
[2025-05-06T13:03:45Z] make: *** [Makefile:20: html] Error 2
[2025-05-06T13:03:46Z] 🚨 Error: The command exited with status 2

still failed...

reidliu41 · 2025-05-06T13:16:40Z

@DarkLight1337 maybe simply remove it, or just roll back?

DarkLight1337 · 2025-05-06T13:33:10Z

I prefer removing the help text if you can't get it to work, so we don't have to worry about the help text getting out of sync with the actual code

reidliu41 · 2025-05-06T14:20:14Z

ok, thanks

DarkLight1337

Thanks for your effort and patience!

[doc] Add RAG Integration example

7ab9942

Signed-off-by: reidliu41 <[email protected]>

mergify bot added the "documentation" label (Improvements or additions to documentation) May 6, 2025

DarkLight1337 reviewed May 6, 2025

examples/online_serving/retrieval_augmented_generation_with_langchain.py

update llamaindex with config

dbe5db5

Signed-off-by: reidliu41 <[email protected]>

DarkLight1337 reviewed May 6, 2025

docs/source/deployment/frameworks/retrieval_augmented_generation.md

auto generate help

0306fb6

Signed-off-by: reidliu41 <[email protected]>

reidliu41 force-pushed the add-rag branch from e34397e to 0306fb6 May 6, 2025 08:02

DarkLight1337 added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) May 6, 2025

Merge remote-tracking branch 'upstream/main' into add-rag

d0ba49e

reidliu41 added 3 commits May 6, 2025 17:31

add mock imports

ef033cc

Signed-off-by: reidliu41 <[email protected]>

add missing mock imports

7acaf4a

Signed-off-by: reidliu41 <[email protected]>

correct the name

9370c3d

Signed-off-by: reidliu41 <[email protected]>

remove help text

0ab9508

Signed-off-by: reidliu41 <[email protected]>

DarkLight1337 approved these changes May 6, 2025

DarkLight1337 enabled auto-merge (squash) May 6, 2025 14:22

DarkLight1337 merged commit 7525d5f into vllm-project:main May 6, 2025
32 checks passed

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

[doc] Add RAG Integration example (vllm-project#17692)

87db288

Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]> Signed-off-by: Mu Huai <[email protected]>

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025

[doc] Add RAG Integration example (vllm-project#17692)

8692892

Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]>
