mtmd : (WIP) add ultravox audio input #13623

Draft
wants to merge 20 commits into
base: master
from

ngxson
Collaborator

@ngxson ngxson commented May 18, 2025

Supersedes #12745

TODO (to be written up in more detail):

  • The mel filter bank is hard-coded
  • Why I'm using miniaudio.h, and why it's compiled in as a separate lib (but statically linked)
  • Plan to rename some image prefixes to media or buffer
  • Deprecation of some mtmd_ APIs
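
For context on the first TODO item, here is a minimal sketch of how a triangular mel filter bank could be computed at runtime instead of being hard-coded. It is an illustration only: the function name `make_mel_filters`, the HTK-style mel formula, and the lack of per-filter normalization are all assumptions, so the weights it produces are not expected to match the table used in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// HTK-style mel scale (an assumption; other mel variants exist).
static double hz_to_mel(double hz)  { return 2595.0 * std::log10(1.0 + hz / 700.0); }
static double mel_to_hz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Hypothetical helper, not part of the PR.
// Returns n_mel rows, each holding (n_fft / 2 + 1) triangular filter weights.
std::vector<std::vector<float>> make_mel_filters(int n_mel, int n_fft, int sample_rate) {
    assert(n_mel > 0 && n_fft > 0 && sample_rate > 0);
    const int    n_bins  = n_fft / 2 + 1;
    const double mel_max = hz_to_mel(sample_rate / 2.0);

    // n_mel + 2 edge frequencies, evenly spaced on the mel scale from 0 Hz to Nyquist.
    std::vector<double> edges(n_mel + 2);
    for (int i = 0; i < n_mel + 2; ++i) {
        edges[i] = mel_to_hz(mel_max * i / (n_mel + 1));
    }

    std::vector<std::vector<float>> filters(n_mel, std::vector<float>(n_bins, 0.0f));
    for (int m = 0; m < n_mel; ++m) {
        const double lo = edges[m], mid = edges[m + 1], hi = edges[m + 2];
        for (int k = 0; k < n_bins; ++k) {
            const double f    = (double) k * sample_rate / n_fft; // center frequency of FFT bin k
            const double up   = (f - lo) / (mid - lo);             // rising slope of the triangle
            const double down = (hi - f) / (hi - mid);             // falling slope of the triangle
            filters[m][k] = (float) std::max(0.0, std::min(up, down));
        }
    }
    return filters;
}
```

For example, `make_mel_filters(80, 400, 16000)` yields 80 rows of 201 weights each, all in the range [0, 1]. Whether those parameters match what the model expects is not confirmed by this thread.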

@github-actions github-actions bot added examples python python script changes labels May 18, 2025
@ngxson
Collaborator Author

ngxson commented May 18, 2025

OK, somehow it works magically, but the code is still nowhere near finished.

Tested using the first 6 seconds of https://www.youtube.com/watch?v=vP4iY1TtS3s

Comment on lines +3 to +11
#define MINIAUDIO_IMPLEMENTATION
#define MA_NO_ENCODING
#define MA_NO_DEVICE_IO
#define MA_NO_RESOURCE_MANAGER
#define MA_NO_NODE_GRAPH
#define MA_NO_ENGINE
#define MA_NO_GENERATION
#define MA_API static
#include "miniaudio.h"
Collaborator Author

@ggerganov I initially used dr_wav.h here, but I struggled to write my own resampling algorithm to downsample/upsample audio to 16 kHz. I ended up using miniaudio.h, which provides decoding for wav, mp3, flac, etc. and also comes with resampling built in.

The caveat is that this single-header library is 3 MB of code, and most of its components are disabled at compile time, as you can see here.

What do you think about keeping this lib? I think the other components could be useful for TTS, as they would allow us to play the generated audio without an external command.
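
To illustrate the kind of hand-written resampling step mentioned above, here is a minimal linear-interpolation sketch. It is not the code in this PR, which relies on miniaudio's built-in resampler; the helper name `resample_linear` is hypothetical, and the sketch applies no low-pass filter, so downsampling with it can alias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the PR or of miniaudio.
// Resamples mono float PCM from src_rate to dst_rate by linear interpolation.
// No anti-aliasing filter is applied, so this is only a rough illustration.
std::vector<float> resample_linear(const std::vector<float> & src, int src_rate, int dst_rate) {
    assert(src_rate > 0 && dst_rate > 0);
    if (src.empty()) {
        return {};
    }
    const size_t n_out = (size_t) ((double) src.size() * dst_rate / src_rate);
    const double step  = (double) src_rate / dst_rate; // source samples per output sample

    std::vector<float> dst(n_out);
    for (size_t i = 0; i < n_out; ++i) {
        const double pos  = i * step;                            // fractional source position
        const size_t i0   = (size_t) pos;                        // sample at or before pos
        const size_t i1   = i0 + 1 < src.size() ? i0 + 1 : i0;   // next sample, clamped at the end
        const float  frac = (float) (pos - (double) i0);
        dst[i] = src[i0] * (1.0f - frac) + src[i1] * frac;
    }
    return dst;
}
```

Calling `resample_linear(samples, 48000, 16000)` on one second of 48 kHz audio returns 16000 samples. A production resampler would filter before decimating, which is one reason to prefer a library implementation.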

Member

It's ok to use it - we also switched to miniaudio.h for the whisper.cpp examples: ggml-org/whisper.cpp#2759.

Just make sure it will be used only in the examples/tools and that libmtmd will not depend on it to function.
