tts : add support for Orpheus #12476
Comments
Howdy! I'd like to give this issue a try.
It would be awesome if we could also get xcodec: we already have plenty of TTS models but no text-to-music yet. Everything is a llama model, and the vocoding (snac / wavtokenizer / xcodec) is all that's missing. #11467
Technically yes, though we have to improve the way these codecs are implemented and supported in general. We would probably want a single GGUF file containing both the LLM and the codec, instead of separate files, and the ability to create separate decoder / vocoder contexts from a single model, or alternatively a combined decoder+vocoder context. At least that's my general idea for supporting multi-modal cases, though I'm still figuring it out.
Mostly yes, but we also have to figure out how to do audio streaming. I am not sure yet how it works, but Orpheus should help us understand it, since it supports real-time streaming.
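To illustrate why streaming should be feasible: in the Python sample implementation linked below (orpheus-tts-local), the model emits SNAC codes in fixed-size frames of 7 codes, so audio can be decoded incrementally as frames complete rather than after generation finishes. A minimal sketch under that frame-size assumption (the actual llama.cpp integration may differ):

```python
# Sketch: incremental frame grouping for a streaming decode loop.
# The 7-codes-per-frame layout is an assumption taken from the
# Python sample implementation, not from llama.cpp itself.

FRAME_SIZE = 7  # SNAC codes per frame (assumed)

def stream_frames(token_iter, frame_size=FRAME_SIZE):
    """Yield complete frames as soon as enough codes have arrived."""
    buf = []
    for tok in token_iter:
        buf.append(tok)
        if len(buf) == frame_size:
            yield buf
            buf = []
    # Incomplete trailing codes are dropped here; a real decoder
    # might flush or pad them instead.

frames = list(stream_frames(range(15)))
```

Each yielded frame could be handed to the codec immediately, so playback can start while the LLM is still generating.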
This works pretty well on my Mac M1, running two instances of llama-server and fastrtc.
HF: https://huggingface.co/collections/canopylabs/orpheus-tts-67d9ea3f6c05a941c06ad9d2
These TTS models seem suitable for supporting. To do that, we need to implement the SNAC audio codec: https://github.com/hubertsiuzdak/snac/
Sample implementation using Python-based inference of SNAC: https://github.com/isaiahbjork/orpheus-tts-local
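For reference, the sample implementation above maps the flat Orpheus token stream back into SNAC's three hierarchical codebooks (1, 2, and 4 codes per frame). A rough Python sketch of that redistribution, following orpheus-tts-local; the per-position offsets of n*4096 are taken from that repo and should be verified against the model:

```python
# Sketch of the code redistribution used by orpheus-tts-local:
# every 7 generated codes form one SNAC frame, split across the
# codec's three codebooks (coarse -> fine). The n*4096 offsets
# are an assumption copied from the sample repo.

def redistribute_codes(code_list):
    layer_1, layer_2, layer_3 = [], [], []
    for i in range(len(code_list) // 7):
        f = code_list[7 * i : 7 * i + 7]
        layer_1.append(f[0])                # 1 code / frame
        layer_2.append(f[1] - 4096)         # 2 codes / frame
        layer_3.append(f[2] - 2 * 4096)     # 4 codes / frame
        layer_3.append(f[3] - 3 * 4096)
        layer_2.append(f[4] - 4 * 4096)
        layer_3.append(f[5] - 5 * 4096)
        layer_3.append(f[6] - 6 * 4096)
    return layer_1, layer_2, layer_3
```

The three resulting code lists would then be passed to the SNAC decoder to produce the waveform.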
Similar model support (OuteTTS): #10784
That implementation can be used as a reference for how to implement this.