
[Feature]: guided decoding on TPU #11104


Closed

carlesoctav opened this issue Dec 11, 2024 · 9 comments

@carlesoctav

🚀 The feature, motivation and pitch

I'm not sure if this is possible, but right now the execute_model function on TPUModelRunner returns only the predicted token_ids, rather than the token distribution (logits) that we could sample from with some guidance (e.g., using outlines). Structured output is becoming more common, and most projects built on LLMs need it.
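
For illustration, this is roughly what guided decoding looks like at the logits level in the style outlines uses; `allowed_next_tokens` here is a hypothetical stand-in for the grammar/FSM lookup a real library would provide:

```python
import torch

# Sketch only: a logits processor receives the tokens generated so far plus
# the raw logits for the next token, and masks every token the grammar
# disallows to -inf before sampling.
def guided_logits_processor(past_token_ids: list[int],
                            logits: torch.Tensor) -> torch.Tensor:
    allowed = allowed_next_tokens(past_token_ids)  # hypothetical grammar lookup
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed] = 0.0
    return logits + mask
```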

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
carlesoctav added the feature request label on Dec 11, 2024
@carlesoctav (Author)

I'm down to work on these features :).

@robertgshaw2-redhat (Collaborator)

Thanks @carlesoctav - the first step is to enable support for LogitsProcessing on TPUs. Do you need a pointer to get started?

@bvrockwell (Contributor)

Thanks so much for lending a hand, @carlesoctav! Indeed, this is super important :)

@carlesoctav (Author)

Hi, I've been working on making this feature viable and settled on the following approach (a rough sketch follows below):

  1. Extract the logits_processors in the prepare_sample function (which previously output just the n, p, t params).
  2. Pass the logits_processors as one of the parameters to ModelWrapper.forward.
  3. Iteratively apply each logits_processor (similar to the _apply_logits_processor function).

However, a LogitsProcessor still needs some parameters that are missing here, mainly prompt_token_ids or past_token_ids, and extracting those requires sample_indices. How can I get these parameters? Do I need to extract them in the prepare_input function?

Also, here's the diff for the changes I made:
carlesoctav@34703fc
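
And a rough sketch of steps 1-3 (shapes and names are illustrative, not the actual TPUModelRunner code):

```python
import torch

def _apply_logits_processors(
    logits: torch.Tensor,             # [num_seqs, vocab_size]
    logits_processors: list,          # per-sequence lists of processor callables
    past_token_ids: list[list[int]],  # tokens generated so far, per sequence
) -> torch.Tensor:
    # Step 3: apply each sequence's processors to its own row of logits
    # before sampling, mirroring what _apply_logits_processor does elsewhere.
    for i, processors in enumerate(logits_processors):
        for processor in processors or []:
            logits[i] = processor(past_token_ids[i], logits[i])
    return logits
```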

@bvrockwell (Contributor)

cc @dyli-google 👍

@bvrockwell (Contributor)

@Chenyaaang

@dyli-google (Contributor)

@carlesoctav

Sorry for the late reply.

Is there any update on this? Is carlesoctav@34703fc still the latest commit?

Also, do you want to create a pull request for this?

Thanks.

@Chenyaaang (Contributor)

I did some investigation yesterday. carlesoctav's approach is workable, but given the PR adding structured-decoding support on GPU V1 (#12388), we only need to do the same thing in v1/worker/tpu_model_runner as in v1/worker/gpu_model_runner. I can implement it after that PR is merged.
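
Conceptually, the V1 approach reduces to applying a per-request token bitmask to the logits before sampling. A minimal sketch, assuming a boolean mask where True means the token is allowed (names and shapes are illustrative, not the actual model-runner code):

```python
import torch

def apply_grammar_bitmask(
    logits: torch.Tensor,   # [num_reqs, vocab_size], float
    bitmask: torch.Tensor,  # [num_reqs, vocab_size], bool; True = allowed
) -> torch.Tensor:
    # Tokens the grammar disallows get -inf, so sampling can never pick them.
    return logits.masked_fill(~bitmask, float("-inf"))
```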

@russellb (Member)

I believe this was completed by #16499

github-project-automation bot moved this from In progress to Done in Structured Output on Apr 23, 2025