mtmd : (WIP) add ultravox audio input #13623

ngxson · 2025-05-18T21:15:01Z

Supersedes #12745

TODO: write up more details on:

The mel filter bank is hard-coded
Why I'm using miniaudio.h, and why it's compiled in as a separate lib (but statically linked)
Plan to rename some image prefixes to media or buffer
Deprecation of some mtmd_ APIs
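
For context on the first TODO item, a mel filter bank can also be computed at load time instead of being hard-coded. The block below is only a minimal sketch of that idea, not the code in this PR: the function names `hz_to_mel`, `mel_to_hz` and `make_mel_filters` are hypothetical, and it assumes the HTK-style mel formula with unnormalized triangular filters, which may differ from the scale and normalization the model was trained with.

```cpp
#include <cmath>
#include <vector>

// HTK-style Hz <-> mel conversion (an assumption; other mel scales exist).
double hz_to_mel(double hz)  { return 2595.0 * std::log10(1.0 + hz / 700.0); }
double mel_to_hz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Hypothetical helper: returns n_mel rows of (n_fft/2 + 1) triangular weights.
std::vector<std::vector<float>> make_mel_filters(int n_mel, int n_fft, int sample_rate) {
    const int    n_bins  = n_fft / 2 + 1;
    const double mel_max = hz_to_mel(sample_rate / 2.0);

    // n_mel + 2 edge frequencies in Hz, evenly spaced on the mel scale
    std::vector<double> edges(n_mel + 2);
    for (int i = 0; i < n_mel + 2; ++i) {
        edges[i] = mel_to_hz(mel_max * i / (n_mel + 1));
    }

    std::vector<std::vector<float>> filters(n_mel, std::vector<float>(n_bins, 0.0f));
    for (int m = 0; m < n_mel; ++m) {
        const double lo = edges[m], mid = edges[m + 1], hi = edges[m + 2];
        for (int k = 0; k < n_bins; ++k) {
            const double f    = (double) k * sample_rate / n_fft; // frequency of FFT bin k
            const double up   = (f - lo) / (mid - lo);            // rising slope
            const double down = (hi - f) / (hi - mid);            // falling slope
            const double w    = std::fmin(up, down);
            if (w > 0.0) {
                filters[m][k] = (float) w;
            }
        }
    }
    return filters;
}
```

A call such as `make_mel_filters(80, 400, 16000)` would give 80 filters over 201 FFT bins. Only the 16 kHz rate comes from this thread; the other two values are illustrative.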

ngxson · 2025-05-18T23:10:21Z

OK, somehow it magically works, but the code is still nowhere near finished.

Tested using the first 6 seconds of https://www.youtube.com/watch?v=vP4iY1TtS3s

ngxson · 2025-05-19T20:28:11Z

tools/mtmd/mtmd-audio.cpp

+#define MINIAUDIO_IMPLEMENTATION
+#define MA_NO_ENCODING
+#define MA_NO_DEVICE_IO
+#define MA_NO_RESOURCE_MANAGER
+#define MA_NO_NODE_GRAPH
+#define MA_NO_ENGINE
+#define MA_NO_GENERATION
+#define MA_API static
+#include "miniaudio.h"


@ggerganov I initially used dr_wav.h here, but I struggled to write my own resampling algorithm to downsample/upsample audio to 16 kHz. I ended up using miniaudio.h, which decodes WAV, MP3, FLAC, etc. and also comes with resampling built in.

However, the caveat is that this single-header library is 3 MB of code, and most of its components are disabled at compile time, as you can see here.

What do you think about keeping this lib? I think the other components can be useful for TTS, as they would allow us to play the generated audio without an external command.
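
To illustrate why hand-rolled resampling is awkward, here is a naive linear-interpolation resampler for mono float PCM. It is a sketch only, not what miniaudio does and not code from this PR; `resample_linear` is a hypothetical name. It applies no low-pass filter, so downsampling with it can alias, which is one reason a library resampler is preferable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: naive linear-interpolation resampler (mono, float PCM).
// No anti-aliasing filter is applied, so this is for illustration only.
std::vector<float> resample_linear(const std::vector<float> & in, int rate_in, int rate_out) {
    if (in.empty() || rate_in <= 0 || rate_out <= 0) {
        return {};
    }
    const size_t n_out = (size_t) ((double) in.size() * rate_out / rate_in);
    const double step  = (double) rate_in / rate_out; // input samples per output sample

    std::vector<float> out(n_out);
    for (size_t i = 0; i < n_out; ++i) {
        const double pos  = i * step;                            // fractional input position
        const size_t i0   = (size_t) pos;                        // sample at or before pos
        const size_t i1   = i0 + 1 < in.size() ? i0 + 1 : i0;    // next sample, clamped at the end
        const float  frac = (float) (pos - (double) i0);
        out[i] = in[i0] * (1.0f - frac) + in[i1] * frac;
    }
    return out;
}
```

For example, `resample_linear(samples, 48000, 16000)` would produce one output sample for every three input samples.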

ggerganov · 2025-05-20

It's OK to use it. We also switched to miniaudio.h for the whisper.cpp examples: ggml-org/whisper.cpp#2759.

Just make sure it will be used only in the examples/tools and that libmtmd will not depend on it to function.

ngxson added 9 commits May 4, 2025 17:06

convert ok, load ok

4fa0c27

warmup ok

8b73116

test

4ac7940

still does not work?

4282465

fix padding

45cdb7f

temporary give up

f3605b9

Merge branch 'master' into xsn/mtmd_ultravox

1804fa2

fix merge conflict

bc708b4

build_ultravox()

de20afd

github-actions bot added the examples and python (python script changes) labels May 18, 2025

ngxson added 8 commits May 19, 2025 10:46

rm test

bbe4940

Merge branch 'master' into xsn/mtmd_ultravox

4d44460

fix merge conflict

8d7d75a

add necessary mtmd APIs

dce799d

first working version (only 4s of audio)

f151854

will this monster compile?

9a0dcb6

fix compile

1a90395

please compile

4a8c092

ngxson commented May 19, 2025

ngxson added 3 commits May 19, 2025 22:29

fPIC

6f23ad1

fix windows

cf38b47

various fixes

cf4f5d2


ggerganov May 20, 2025
