[TPU] Fix the test_sampler #17820
Conversation
Signed-off-by: Jevin Jiang <[email protected]>
Can you PTAL, @mgoin? Thanks!
Cc @NickLucche
LGTM, thanks for the fix, let's see if CI is green!
Yeah, it's strange that there's NaN, given how the kv cache is initialized. The NaN value should be introduced during model execution and then written to the kv cache.
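As a side note (not part of this PR), one quick way to confirm where NaN enters the kv cache is to scan the cache tensors after each model step. The sketch below assumes the caches are available as a list of tensors named `kv_caches`; the name and layout are placeholders.

```python
# Minimal debugging sketch (an assumption, not code from this PR): scan the
# kv-cache tensors for NaN after a model step. `kv_caches` and its layout are
# placeholders; adapt to however the runner actually stores its caches.
import torch

def report_nan_in_kv_cache(kv_caches: list[torch.Tensor]) -> None:
    """Print, per layer, how many kv-cache elements are NaN."""
    for layer_idx, cache in enumerate(kv_caches):
        nan_mask = torch.isnan(cache)
        if nan_mask.any():
            print(f"layer {layer_idx}: {int(nan_mask.sum())} / {cache.numel()} "
                  "elements are NaN")
```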
LGTM, thanks for the fix!
The NaN part is a bit concerning; can we dig into why we're producing NaN values at runtime? I think the kernel assumption is fine, as we can't recover from that situation anyway.
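For digging into where the NaN first appears at runtime, one common approach (sketched below as an illustration, not something from this PR) is to register forward hooks on every submodule and fail fast on the first output that contains NaN.

```python
# Illustrative sketch (not from this PR): register forward hooks on every
# submodule and raise on the first output containing NaN, which points at the
# layer where the bad value is produced.
import torch
import torch.nn as nn

def install_nan_hooks(model: nn.Module) -> list:
    """Attach hooks that raise as soon as any module emits a NaN tensor."""
    handles = []

    def make_hook(name: str):
        def hook(_module, _inputs, output):
            tensors = output if isinstance(output, (tuple, list)) else (output,)
            for t in tensors:
                if isinstance(t, torch.Tensor) and torch.isnan(t).any():
                    raise RuntimeError(f"NaN first observed in module: {name}")
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles
```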
Signed-off-by: 汪志鹏 <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
TL;DR: this PR is a temporary quick fix for the broken CI in tests/v1/tpu/test_sampler.py. A proper kernel fix is WIP.
After #16458, the `test_sampler_different` test in `tests/v1/tpu/test_sampler.py` uses a (32, 32) block shape instead of a (128, 32) block shape. However, this triggers a correctness issue. The root cause is that the kernel assumes `kv_cache` has no NaN values. We should fix that in the kernel! Meanwhile, just to unblock the current TPU CI, we can reduce `max_num_batched_tokens` to make the test pass, and this should not affect correctness testing. (I have no idea why, when `max_num_batched_tokens == 512`, the kv cache contains uninitialized NaN values.)
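For illustration only, the kind of workaround described above (lowering `max_num_batched_tokens` for the test) might look roughly like the sketch below; the model name and the exact values are placeholders, not this PR's actual diff.

```python
# Hypothetical sketch of the described workaround, not the PR's actual change:
# run the test's engine with a smaller max_num_batched_tokens so the kernel's
# no-NaN assumption on the kv cache is not violated.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",   # placeholder model
    max_num_batched_tokens=256,           # reduced from 512 as a workaround
    max_model_len=512,
)
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.8, max_tokens=16),
)
print(outputs[0].outputs[0].text)
```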