ggml: aarch64: Implement SVE F32 kernels for Mamba Model #13602

Open: 4 commits to be merged into base: master


@vineelabhinav commented May 17, 2025

This PR adds SVE kernels for the F32 data type, specific to the Mamba model, on the ARM (aarch64) architecture.
Major code changes:

  1. Add SVE support for ggml_vec_dot_f32() function.
  2. Add SVE support for ggml_compute_forward_ssm_scan_f32() function.
  3. Add SVE support for ggml_vec_mad_f32() function.
  4. Add SVE support for ggml_vec_scale_f32() function.
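For readers unfamiliar with SVE, here is a minimal sketch of how vector-length-agnostic versions of three of the listed kernels might look. It is not the code from this PR: the `sketch_`-prefixed function names and their signatures are hypothetical, and the SSM scan kernel is not covered. The SVE path is compiled only when `__ARM_FEATURE_SVE` is defined; otherwise a scalar fallback is used, so the block also builds on non-SVE machines.

```c
#include <assert.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical sketch (not the PR's code): returns sum of x[i] * y[i]. */
float sketch_vec_dot_f32(int n, const float *x, const float *y) {
    assert(n >= 0);
#if defined(__ARM_FEATURE_SVE)
    svfloat32_t acc = svdup_n_f32(0.0f);
    const int step = (int) svcntw();            /* F32 lanes per SVE vector */
    for (int i = 0; i < n; i += step) {
        svbool_t pg = svwhilelt_b32_s32(i, n);  /* masks off lanes past n */
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        acc = svmla_f32_m(pg, acc, vx, vy);     /* acc += vx * vy on active lanes */
    }
    return svaddv_f32(svptrue_b32(), acc);      /* horizontal sum of all lanes */
#else
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
#endif
}

/* Hypothetical sketch (not the PR's code): y[i] += x[i] * v. */
void sketch_vec_mad_f32(int n, float *y, const float *x, float v) {
    assert(n >= 0);
#if defined(__ARM_FEATURE_SVE)
    const svfloat32_t vv = svdup_n_f32(v);
    const int step = (int) svcntw();
    for (int i = 0; i < n; i += step) {
        svbool_t pg = svwhilelt_b32_s32(i, n);
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_f32_m(pg, vy, vx, vv);       /* vy += vx * v */
        svst1_f32(pg, y + i, vy);
    }
#else
    for (int i = 0; i < n; ++i) {
        y[i] += x[i] * v;
    }
#endif
}

/* Hypothetical sketch (not the PR's code): y[i] *= v. */
void sketch_vec_scale_f32(int n, float *y, float v) {
    assert(n >= 0);
#if defined(__ARM_FEATURE_SVE)
    const svfloat32_t vv = svdup_n_f32(v);
    const int step = (int) svcntw();
    for (int i = 0; i < n; i += step) {
        svbool_t pg = svwhilelt_b32_s32(i, n);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmul_f32_m(pg, vy, vv);
        svst1_f32(pg, y + i, vy);
    }
#else
    for (int i = 0; i < n; ++i) {
        y[i] *= v;
    }
#endif
}
```

The predicated loop (`svwhilelt` plus `_m` merging operations) handles the tail without a separate scalar remainder loop and works for any hardware vector length. A production kernel would more likely unroll the loop and use several accumulators, which this sketch leaves out for brevity.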

Performance

This PR improves throughput by 1.16x to 1.36x (~1.3x at 8 to 16 threads) compared to the previous NEON-based implementation; the gain narrows as the thread count rises.
Model: falcon-mamba-7B-F32.gguf
Command: ./build/bin/llama-bench -m falcon-mamba-7B-F32.gguf -t 8,16,32,64 -p 128,1024 -n 0

  • Task1: Prompt Length: 128 tokens, Generated Tokens: 1 token

| Threads | NEON (tokens/sec) | SVE (tokens/sec) | Ratio |
|---|---|---|---|
| 8 | 9.21 | 12.52 | 1.36 |
| 16 | 17.89 | 23.85 | 1.33 |
| 32 | 32.3 | 41.59 | 1.29 |
| 64 | 53.08 | 62.94 | 1.19 |

  • Task2: Prompt Length: 1024 tokens, Generated Tokens: 1 token

| Threads | NEON (tokens/sec) | SVE (tokens/sec) | Ratio |
|---|---|---|---|
| 8 | 8.95 | 11.66 | 1.3 |
| 16 | 17.3 | 21.97 | 1.27 |
| 32 | 31.07 | 38.48 | 1.24 |
| 64 | 50.73 | 58.99 | 1.16 |
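
The Ratio column is the SVE throughput divided by the NEON throughput for the same thread count. As a small sketch of that arithmetic (the function name `sketch_speedup` is hypothetical and not part of llama-bench):

```c
#include <assert.h>

/* Hypothetical helper: speedup of SVE over NEON, both in tokens/sec. */
double sketch_speedup(double neon_tps, double sve_tps) {
    assert(neon_tps > 0.0);
    return sve_tps / neon_tps;
}
```

For example, `sketch_speedup(9.21, 12.52)` reproduces the 8-thread Task1 entry once rounded to two decimals.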

Perplexity

There is no change in model accuracy as a result of this PR.
Command: ./build/bin/llama-perplexity -s 0 -np 128 -t 64 -m falcon-mamba-7B-F32.gguf -c 128 -b 128 --chunks 16 -f scripts/wikitext-2-raw/wiki.test.raw

| NEON | SVE |
|---|---|
| 7.6153 +/- 0.66890 | 7.6153 +/- 0.66890 |

Contributor: Vineel Abhinav Gottala

cc: @Vithulep

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on May 17, 2025