Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
SGLang is a fast serving framework for large language models and vision language models.
Mixture-of-Experts for Large Vision-Language Models
MoBA: Mixture of Block Attention for Long-Context LLMs
PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al., https://arxiv.org/abs/1701.06538 (a minimal top-k gating sketch appears after this list)
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
An open-source solution for full-parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts covering training through inference, along with experience and conclusions accumulated in practice.
Chinese Mixtral Mixture-of-Experts large language models (Chinese Mixtral MoE LLMs)
MoH: Multi-Head Attention as Mixture-of-Head Attention
ModuleFormer is an MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feed-forward experts. We have released a collection of ModuleFormer-based language models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Ling is a MoE LLM provided and open-sourced by InclusionAI.
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts", in PyTorch and Zeta
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
🚀 Easy, open-source LLM finetuning with one-line commands, seamless cloud integration, and popular optimization frameworks. ✨
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
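
Several of the repositories above build on sparse expert routing. As a rough illustration only, and not the code of any project listed here, the following is a minimal sketch of a top-k softmax-gated mixture-of-experts layer in PyTorch in the spirit of Shazeer et al. (2017); with k=1 it degenerates to Switch-Transformer-style top-1 routing. The expert shapes, expert count, and the omission of noisy gating and load-balancing losses are simplifying assumptions made for brevity.

# Minimal sparsely-gated MoE layer (illustrative sketch, not taken from any repository above).
# Top-k softmax gating; with k=1 this reduces to Switch-style top-1 routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router producing expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens so each one is routed independently
        tokens = x.reshape(-1, x.size(-1))
        logits = self.gate(tokens)                        # (num_tokens, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)            # renormalise over the chosen experts
        out = torch.zeros_like(tokens)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e                           # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += w[mask] * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    moe = TopKMoE(d_model=64, d_hidden=256, num_experts=4, k=2)
    y = moe(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])

Real MoE layers typically also add a load-balancing auxiliary loss, per-expert capacity limits, and batched expert dispatch; the per-expert Python loop is kept here only for readability.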