ggml: aarch64: Implement SVE F32 kernels for Mamba Model #13602

vineelabhinav · 2025-05-17T09:25:17Z

This PR adds SVE kernels for the F32 data type, targeting the Mamba model on the ARM (aarch64) architecture.
Major code changes:

Add SVE support for the ggml_vec_dot_f32() function.
Add SVE support for the ggml_compute_forward_ssm_scan_f32() function.
Add SVE support for the ggml_vec_mad_f32() function.
Add SVE support for the ggml_vec_scale_f32() function.
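
For readers unfamiliar with SVE, the following is a minimal sketch of what a vector-length-agnostic F32 dot product and multiply-add can look like. It is not the code from this PR: the function names vec_dot_f32_sketch and vec_mad_f32_sketch are hypothetical, the signatures are simplified, and a scalar fallback is used when the compiler does not define __ARM_FEATURE_SVE. The SVE path uses a per-iteration predicate from svwhilelt so that no separate tail loop is needed.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical sketch: returns sum(x[i] * y[i]) for i in [0, n). */
static float vec_dot_f32_sketch(size_t n, const float *x, const float *y) {
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    /* svcntw() is the number of 32-bit lanes per SVE vector at run time. */
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntw()) {
        /* Predicate is true only for lanes with index i + lane < n. */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        /* acc += vx * vy on active lanes; inactive lanes keep acc. */
        acc = svmla_f32_m(pg, acc, vx, vy);
    }
    /* Horizontal sum across all lanes. */
    return svaddv_f32(svptrue_b32(), acc);
#else
    /* Scalar fallback for targets without SVE. */
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
#endif
}

/* Hypothetical sketch: y[i] += x[i] * v for i in [0, n). */
static void vec_mad_f32_sketch(size_t n, float *y, const float *x, float v) {
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t vv = svdup_n_f32(v);
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t)n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_f32_m(pg, vy, vx, vv);
        svst1_f32(pg, y + i, vy);
    }
#else
    for (size_t i = 0; i < n; ++i) {
        y[i] += x[i] * v;
    }
#endif
}
```

Because the loop step is svcntw() rather than a fixed width, the same binary should adapt to whatever SVE vector length the hardware provides. A production kernel would likely unroll the loop with several accumulators; this sketch omits that for clarity.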

Performance

In the benchmarks below, this PR improves prompt-processing throughput by 1.16x to 1.36x (about 1.3x at lower thread counts) compared to the previous NEON-based implementation.
Model: falcon-mamba-7B-F32.gguf
Command: ./build/bin/llama-bench -m falcon-mamba-7B-F32.gguf -t 8,16,32,64 -p 128,1024 -n 0

Task1: Prompt Length: 128 tokens, Generated Tokens: 1 token

Threads	NEON (Tokens/sec)	SVE (Tokens/sec)	Ratio (SVE/NEON)
8	9.21	12.52	1.36
16	17.89	23.85	1.33
32	32.3	41.59	1.29
64	53.08	62.94	1.19

Task2: Prompt Length: 1024 tokens, Generated Tokens: 1 token

Threads	NEON (Tokens/sec)	SVE (Tokens/sec)	Ratio (SVE/NEON)
8	8.95	11.66	1.30
16	17.3	21.97	1.27
32	31.07	38.48	1.24
64	50.73	58.99	1.16

Perplexity

There is no change in model accuracy as a result of this PR.
Command: ./build/bin/llama-perplexity -s 0 -np 128 -t 64 -m falcon-mamba-7B-F32.gguf -c 128 -b 128 --chunks 16 -f scripts/wikitext-2-raw/wiki.test.raw

NEON	SVE
7.6153 +/- 0.66890	7.6153 +/- 0.66890

Contributor: Vineel Abhinav Gottala

cc: @Vithulep

vineelabhinav added 2 commits May 17, 2025 08:58

F32-Mamba-SVE

8581c89

F32-Mamba-SVE

55b5545

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) May 17, 2025

vineelabhinav added 2 commits May 19, 2025 14:51

Resolve test errors-1

b4ab67c

Resolve test errors-2

5cd9e35
