Skip to content

debug test sampler inconsistency #16939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

Chenyaaang
Copy link
Contributor

@Chenyaaang Chenyaaang commented Apr 21, 2025

this fails vllm/tests/v1/tpu/test_sampler.py at line 46, basically when it asserts same sampling_params should have same result, it fails.
the results are:
"The robot is a humanoid with a sleek, metallic body and a pair of glowing eyes. It has been programmed to follow orders and perform tasks, but it has always been a machine, devoid of emotions and consciousness. One day, while working in a factory, the robot overhears a conversation between two workers. They are"
vs
"The robot is a humanoid with a sleek, metallic body and a pair of glowing eyes. It has been programmed to perform tasks for humans, but it has always been a machine, devoid of emotions or consciousness. One day, while working in a factory, the robot experiences a strange sensation in its mind. It begins to"

Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added v1 tpu Related to Google TPUs labels Apr 21, 2025
@Chenyaaang Chenyaaang force-pushed the debug-test-sampler branch 3 times, most recently from 1d73364 to 0e80e6b Compare April 21, 2025 19:18
@@ -23,7 +23,7 @@ def test_sampler_different(model_name: str):
different results.
"""
llm = LLM(model_name,
enforce_eager=False,
enforce_eager=True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You run the test as VLLM_USE_V1=1 python tests/v1/tpu/test_sampler.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, VLLM_USE_V1=1 pytest -s -vv tests/v1/tpu/test_sampler.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enabling enforce_eager makes the test faster.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the temperature is 0?

Copy link
Contributor Author

@Chenyaaang Chenyaaang Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temperature is set below, for the test I'm failing, batch size = 16, the temperatures are all 0.1, so it's using the sampler instead of greedy sampling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry updated in description, when it asserts same sampling_params should lead to the same result, it fails.

@mgoin
Copy link
Member

mgoin commented Apr 21, 2025

There are natural numerical instabilities that make it difficult to have deterministic results over many tokens. I think it is reasonable for your test that ~20 tokens match before it starts to diverge.

@NickLucche
Copy link
Contributor

I agree with @mgoin , I think you can safely loosen the checks on this test in your PR #16499 to land it.

My initial intention with this test was actually to test that different results would be produced when significantly different sampling params were provided. I think this particular check was changed after review.
Perhaps we could bring it back to its original logic.

@Chenyaaang Chenyaaang closed this Apr 23, 2025
@Chenyaaang Chenyaaang deleted the debug-test-sampler branch April 23, 2025 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tpu Related to Google TPUs v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants