[Feature]: guided decoding on TPU #11104
Comments
I'm down to work on these features :).
Thanks @carlesoctav - the first step is to enable support for logits processors on TPUs. Do you need a pointer to get started?
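(For context, a logits processor here is just a callable that rewrites the next-step logits before sampling. A minimal sketch, where the class name and wiring are illustrative rather than vLLM's actual interface:)

```python
# Illustrative sketch of a logits processor -- not vLLM's actual interface.
# It receives the token ids generated so far plus the raw logits for the
# next step, and returns adjusted logits.
from typing import List

import torch


class AllowedTokensProcessor:
    """Masks every token except an allowed set (e.g. from a grammar FSM)."""

    def __init__(self, allowed_token_ids: List[int]):
        self.allowed = allowed_token_ids

    def __call__(self, past_token_ids: List[int],
                 logits: torch.Tensor) -> torch.Tensor:
        # Start from -inf everywhere, then re-enable the allowed ids.
        mask = torch.full_like(logits, float("-inf"))
        mask[self.allowed] = 0.0
        return logits + mask
```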
Thanks so much for lending a hand @carlesoctav! Indeed, this is super important :)
Hi, I've been working on making these features viable and concluded with this approach:
However, there are still some missing parameters needed for a […]. Also, here's the diff for the changes I made:
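(The linked commit itself isn't reproduced here. Purely as a hypothetical illustration, the general shape of such a change - returning next-step logits so processors can run before sampling, rather than pre-sampled token ids - might look like this:)

```python
# Hypothetical illustration only -- NOT the linked commit. The general idea:
# have the TPU model runner return next-step logits so guided decoding can
# adjust them before sampling, instead of returning pre-sampled token ids.
import torch


def execute_model_step(model, input_ids, logits_processors):
    logits = model(input_ids)              # [batch, vocab_size]
    # Let each processor (e.g. an outlines-style FSM mask) rewrite the
    # logits. The processor signature here is illustrative.
    for proc in logits_processors:
        logits = proc(input_ids, logits)
    # Sample only after guidance has been applied.
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```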
cc @dyli-google 👍
Sorry for the late reply. Is there any update on this? Is carlesoctav@34703fc still the latest commit? Also, do you want to create a pull request for this? Thanks.
I did some investigation yesterday. carlesoctav's approach is workable, but given the PR to support structured decoding on GPU V1 (#12388), we only need to do the same thing in v1/worker/tpu_model_runner as in v1/worker/gpu_model_runner. I can implement it after that PR is merged.
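(For reference, the V1 approach boils down to applying a per-request grammar bitmask to the logits just before sampling. A self-contained sketch of that masking step, with illustrative names that don't match vLLM's internals exactly:)

```python
# Sketch of the grammar-bitmask step used by V1 structured decoding:
# each request carries a packed bitmask (one bit per vocab token), and
# disallowed tokens get their logits set to -inf before sampling.
# Illustrative only; names do not match vLLM's internals exactly.
import torch


def apply_grammar_bitmask(logits: torch.Tensor,
                          bitmask: torch.Tensor) -> torch.Tensor:
    """logits: [batch, vocab]; bitmask: [batch, ceil(vocab / 32)] int32."""
    batch, vocab = logits.shape
    token_idx = torch.arange(vocab, device=logits.device)
    # Pick the int32 word holding each token's bit, then extract the bit.
    words = bitmask[:, token_idx // 32]       # [batch, vocab]
    bits = (words >> (token_idx % 32)) & 1    # 1 = token is allowed
    return logits.masked_fill(bits == 0, float("-inf"))
```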
I believe this was completed by #16499.
🚀 The feature, motivation and pitch
I'm not sure if this is possible, but right now the `execute_model` function on the `TPUModelRunner` only outputs the predicted token ids, rather than the distribution of tokens that we can sample from with some guidance (e.g., using outlines; see the sketch at the end of this issue). I believe structured output is becoming more common, and most projects that build on LLMs need this feature.

Alternatives
No response
Additional context
No response
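(To make the pitch concrete: guided decoding restricts, at each step, which tokens the guide - e.g., an outlines-style FSM - considers legal, and then samples from the model's distribution over just those tokens. That is only possible if the runner exposes the full logits. A minimal, self-contained sketch; this is illustrative code, not vLLM or outlines:)

```python
# Minimal illustration of guided sampling: honoring the guide requires the
# full logits, not a token id that was already sampled by the runner.
import torch


def guided_sample(logits: torch.Tensor, legal_token_ids: list) -> int:
    """Sample the next token from the model's distribution restricted to
    the grammar-legal tokens. logits: [vocab]."""
    masked = torch.full_like(logits, float("-inf"))
    masked[legal_token_ids] = logits[legal_token_ids]
    probs = torch.softmax(masked, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))


# Example: suppose only tokens 5 and 9 are legal at this step.
logits = torch.randn(32000)
next_token = guided_sample(logits, [5, 9])
assert next_token in (5, 9)
```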