[Bugfix] add qwen3 reasoning-parser fix content is None when disable … #17369
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default; only a small and essential subset of CI tests runs to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Force-pushed from e8e6b42 to a28caaf (compare)
Force-pushed from a28caaf to a4063c0 (compare)
Thanks for adding this. Can you add some tests to verify the fix?
Force-pushed from a4063c0 to 852ca12 (compare)
Thanks for the feedback! I have added tests to verify the fix. Please let me know if you need any additional tests or if there is anything else I should improve.
Thanks a lot! Looking forward to the merge.
[Bugfix] add qwen3 reasoning-parser fix content is None when disable thinking (vllm-project#17357) Signed-off-by: mofanke <[email protected]>
Force-pushed from 852ca12 to 7d4031b (compare)
Thanks, LGTM
[Bugfix] add qwen3 reasoning-parser fix content is None when disable thinking (vllm-project#17369) Signed-off-by: mofanke <[email protected]>
I think there might be an issue with this PR implementation. I used the following test cases.
Server (deepseek_r1 parser): vllm serve Qwen/Qwen3-8B --enable-reasoning --reasoning-parser deepseek_r1 --guided-decoding-backend xgrammar
Server (qwen3 parser): vllm serve Qwen/Qwen3-8B --enable-reasoning --reasoning-parser qwen3 --guided-decoding-backend xgrammar
Client:
from pydantic import BaseModel
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "Bearer skxx"
openai_api_base = "http://localhost:8000/v1"
class Step(BaseModel):
ground_truth_key_ideas: str
system_response_key_ideas: str
discussion: str
recall: float
precision: float
if __name__ == '__main__':
client = OpenAI(
api_key=openai_api_key,
base_url=openai_api_base,
)
# client.chat.completions.create
json_schema = Step.model_json_schema()
chat_response = client.beta.chat.completions.parse(
model="",
messages=[
{'role': 'system',
'content': 'Your input fields are:\n1. `question` (str)\n2. `ground_truth` (str)\n3. `system_response` (str)\n\nYour output fields are:\n1. `ground_truth_key_ideas` (str): enumeration of key ideas in the ground truth\n2. `system_response_key_ideas` (str): enumeration of key ideas in the system response\n3. `discussion` (str): discussion of the overlap between ground truth and system response\n4. `recall` (float): fraction (out of 1.0) of ground truth covered by the system response\n5. `precision` (float): fraction (out of 1.0) of system response covered by the ground truth\n\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\nInputs will have the following structure:\n\n[[ ## question ## ]]\n{question}\n\n[[ ## ground_truth ## ]]\n{ground_truth}\n\n[[ ## system_response ## ]]\n{system_response}\n\nOutputs will be a JSON object with the following fields.\n\n{\n "ground_truth_key_ideas": "{ground_truth_key_ideas}",\n "system_response_key_ideas": "{system_response_key_ideas}",\n "discussion": "{discussion}",\n "recall": "{recall} # note: the value you produce must be a single float value",\n "precision": "{precision} # note: the value you produce must be a single float value"\n}\n\nIn adhering to this structure, your objective is: \n Compare a system\'s response to the ground truth to compute recall and precision of key ideas.\n You will first enumerate key ideas in each response, discuss their overlap, and then report recall and precision.'},
{'role': 'user',
'content': '[[ ## question ## ]]\nWhy is the action space of language modeling particularly large? Is it because of the vocab size? But then, moving in the real world also has a huge action space (degrees of movement).\n\n[[ ## ground_truth ## ]]\nThe action space for language modeling is equal to the vocabulary set of language models. Since the vocabularies are very large (i.e. tens of thousands of possible tokens), the action space is also very large. In general, locomotion in the real world can be condensed to three quantities - moving across X, Y or Z axes, or a linear combination thereof. The authors mention that typical RL problems have an action space that is an order of magnitude smaller, but do not specifically explain how the action spaces for typical problems is modeled or constructed.\n\n[[ ## system_response ## ]]\nThe action space in language modeling is indeed particularly large due to the **vocabulary size**—at each step, the model must choose from thousands of possible words (e.g., 50k+ tokens in large vocabularies), leading to combinatorial explosion in sequence generation. This discrete, high-dimensional choice is distinct from real-world continuous action spaces (e.g., motor control with infinite degrees of movement). However, the challenges differ: language models face **discrete, high-cardinality decisions** with combinatorial complexity, while real-world actions often involve **continuous control**. Techniques like actor-critic methods (e.g., Bahdanau et al. 2016) or action space reduction (e.g., GALAD) address the former by managing variance and exploration in discrete, large vocabularies, whereas real-world control typically uses gradient-based methods for continuous spaces.\n\nRespond with a JSON object in the following order of fields: `ground_truth_key_ideas`, then `system_response_key_ideas`, then `discussion`, then `recall` (must be formatted as a valid Python float), then `precision` (must be formatted as a valid Python float).'}
],
temperature=0.0,
extra_body={"chat_template_kwargs": {"enable_thinking": True}, "guided_json": json_schema},
)
print("Chat response:", chat_response)
s = Step.parse_raw(chat_response.choices[0].message.reasoning_content)
print("-----", s.system_response_key_ideas)
result: deepseek_r1: Chat response: ParsedChatCompletion[NoneType](id='chatcmpl-c8ac33157c6a46aa91adede0f1f36b06', choices=[ParsedChoice[NoneType](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[NoneType](content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=None, reasoning_content='{\n "ground_truth_key_ideas": "1. The action space in language modeling equals the vocabulary size, which is large (tens of thousands of tokens). 2. Real-world locomotion can be condensed to three axes (X, Y, Z) or their combinations. 3. The authors note that typical RL problems have action spaces an order of magnitude smaller than language modeling.",\n "system_response_key_ideas": "1. The action space in language modeling is large due to high vocabulary size (e.g., 50k+ tokens). 2. This leads to combinatorial explosion in sequence generation. 3. Language models face discrete, high-cardinality decisions with combinatorial complexity. 4. Real-world actions involve continuous control (e.g., motor control with infinite degrees of movement). 5. Techniques like actor-critic methods and action space reduction address the challenges in language modeling.",\n "discussion": "The system response aligns with the ground truth on the vocabulary size as the primary reason for the large action space in language modeling. Both mention the combinatorial complexity due to high vocabulary. However, the system response adds details about discrete vs. continuous action spaces and specific techniques to address the challenges, which are not present in the ground truth. The ground truth includes the point about real-world locomotion being condensed to three axes, which the system response does not explicitly mention.",\n "recall": 0.6,\n "precision": 0.75\n}'), stop_reason=None)], created=1746001853, model='Qwen/Qwen3-8B', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=309, prompt_tokens=766, total_tokens=1075, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)
----- 1. The action space in language modeling is large due to high vocabulary size (e.g., 50k+ tokens). 2. This leads to combinatorial explosion in sequence generation. 3. Language models face discrete, high-cardinality decisions with combinatorial complexity. 4. Real-world actions involve continuous control (e.g., motor control with infinite degrees of movement). 5. Techniques like actor-critic methods and action space reduction address the challenges in language modeling. qwen3: Chat response: ParsedChatCompletion[NoneType](id='chatcmpl-7b079ebfa7ef4c9e87779bcb6cfffccd', choices=[ParsedChoice[NoneType](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[NoneType](content='{\n "ground_truth_key_ideas": "1. The action space in language modeling equals the vocabulary size, which is large (tens of thousands of tokens). 2. Real-world locomotion can be condensed to three axes (X, Y, Z) or their combinations. 3. The authors note that typical RL problems have action spaces an order of magnitude smaller than language modeling.",\n "system_response_key_ideas": "1. The action space in language modeling is large due to high vocabulary size (e.g., 50k+ tokens). 2. This leads to combinatorial explosion in sequence generation. 3. Language models face discrete, high-cardinality decisions with combinatorial complexity. 4. Real-world actions involve continuous control (e.g., motor control with infinite degrees of movement). 5. Techniques like actor-critic methods and action space reduction address the challenges in language modeling.",\n "discussion": "The system response aligns with the ground truth on the vocabulary size as the primary reason for the large action space in language modeling. Both mention the combinatorial complexity due to high vocabulary. However, the system response adds details about discrete vs. continuous action spaces and specific techniques to address the challenges, which are not present in the ground truth. The ground truth includes the point about real-world locomotion being condensed to three axes, which the system response does not explicitly mention.",\n "recall": 0.6,\n "precision": 0.75\n}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=None, reasoning_content=None), stop_reason=None)], created=1746002026, model='Qwen/Qwen3-8B', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=309, prompt_tokens=766, total_tokens=1075, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.12/site-packages/pydantic/main.py", line 1187, in parse_raw
obj = parse.load_str_bytes(
^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/pydantic/deprecated/parse.py", line 49, in load_str_bytes
return json_loads(b) # type: ignore
^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/json/__init__.py", line 339, in loads
raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/vllm/test14.py", line 35, in <module>
s = Step.parse_raw(chat_response.choices[0].message.reasoning_content)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/anaconda3/lib/python3.12/site-packages/pydantic/main.py", line 1214, in parse_raw
raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
pydantic_core._pydantic_core.ValidationError: 1 validation error for Step
__root__
the JSON object must be str, bytes or bytearray, not NoneType [type=type_error, input_value=None, input_type=NoneType]
The root cause is that the parser incorrectly assumes the current mode is not reasoning mode, even though I did enable reasoning mode. Because the model's output was constrained into JSON by xgrammar, the think tags never appear, so the qwen3 reasoning parser mistakenly concludes that the request is not in reasoning mode. See vllm/vllm/reasoning/qwen3_reasoning_parser.py, lines 114 to 117 in a39203f.
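The check in question behaves roughly like the sketch below (paraphrased, not the exact vLLM source; the <think>/</think> literals and the (reasoning_content, content) return order are assumptions): if either tag is missing, the whole output is treated as final content.

THINK_START, THINK_END = "<think>", "</think>"

def extract_reasoning_content(model_output: str):
    # When a guided backend such as xgrammar forces pure JSON output, neither
    # tag appears, so this branch concludes "not reasoning mode" even though
    # enable_thinking=True was requested: everything lands in `content` and
    # reasoning_content stays None.
    if THINK_START not in model_output or THINK_END not in model_output:
        return None, model_output  # (reasoning_content, content)

    # Otherwise, text between <think> and </think> becomes reasoning_content
    # and whatever follows </think> becomes content.
    reasoning, _, content = model_output.partition(THINK_END)
    reasoning = reasoning.replace(THINK_START, "", 1)
    return reasoning, content or None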
@DarkLight1337 @mofanke @YorkSu WDYT?
vllm/vllm/model_executor/guided_decoding/xgrammar_decoding.py Lines 345 to 353 in ece5a8b
vllm/vllm/reasoning/deepseek_r1_reasoning_parser.py Lines 46 to 47 in ece5a8b
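By contrast, the DeepSeek-R1 parser referenced above only looks for the end tag. A minimal sketch, under the same assumptions as before, of why the deepseek_r1 run above ended up with the JSON in reasoning_content and content=None:

THINK_END = "</think>"

def extract_reasoning_content_r1(model_output: str):
    # The <think> tag is assumed to be emitted by the chat template, so only
    # the end tag is checked. With guided JSON output there is no </think>,
    # the model is considered to still be "thinking", and the whole output
    # becomes reasoning_content while content stays None.
    if THINK_END not in model_output:
        return model_output, None  # (reasoning_content, content)
    reasoning, _, content = model_output.partition(THINK_END)
    return reasoning, content or None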
However, in the openai entrypoints, the ReasoningParser only checks whether the model output contains … (see vllm/vllm/entrypoints/openai/serving_chat.py, lines 607 to 608 in 1534d38)
vllm/vllm/entrypoints/openai/serving_chat.py Lines 623 to 624 in 1534d38
vllm/vllm/entrypoints/openai/serving_chat.py Lines 684 to 685 in 1534d38
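Roughly, those serving_chat.py call sites hand the generated text to the configured parser and copy its two return values into the response message. The following is a hedged sketch of that assumed flow; ChatMessage and chat_request here are illustrative stand-ins, not the actual serving_chat.py code:

from dataclasses import dataclass

# Illustrative stand-in for the OpenAI-compatible response message fields.
@dataclass
class ChatMessage:
    role: str
    content: str | None
    reasoning_content: str | None

def build_message(reasoning_parser, output_text: str, chat_request) -> ChatMessage:
    # The entrypoint does not consult enable_thinking itself; it simply
    # forwards whatever the reasoning parser returns into the two fields.
    reasoning_content, content = reasoning_parser.extract_reasoning_content(
        output_text, request=chat_request)
    return ChatMessage(role="assistant",
                       content=content,
                       reasoning_content=reasoning_content)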
Try to run some examples with guided_json and set …
Thanks for the PR. The commit copied from my fork looks a little outdated; for example, it still uses regex in the … @chaunceyjiang You might be interested.
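For illustration only (the fork's code is not shown in this thread), a regex-based extraction of the kind being called outdated might look like the sketch below, in contrast to the tag-splitting sketches earlier in the thread; the pattern and function name are assumptions:

import re

# Hypothetical regex-based extraction (the style the comment calls outdated).
REASONING_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def extract_with_regex(text: str):
    match = REASONING_RE.search(text)
    if match is None:
        # No think block found: treat everything as final content.
        return None, text
    reasoning = match.group(1)
    content = text[match.end():].lstrip("\n")
    return reasoning, content or None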
FIX #17357
Add a new reasoning parser: qwen3
Code attribution: gaocegege/vllm and this project's deepseek_r1_reasoning_parser.py; test for request …
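A quick way to exercise the new parser end to end is sketched below; the server command, model name, and enable_thinking chat-template kwarg come from the discussion above, while the API key and port are illustrative:

# Assumes a server started with:
#   vllm serve Qwen/Qwen3-8B --enable-reasoning --reasoning-parser qwen3
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-8B",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
msg = resp.choices[0].message
# With thinking disabled, the fix is meant to keep the answer in `content`
# rather than leaving it None.
print("content:", msg.content)
print("reasoning_content:", getattr(msg, "reasoning_content", None))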