Mixture-of-Experts
8 videos • 53 views • by Tunadorable

1. MaskMoE: Forcing rare tokens to only use one expert
2. What happens when you take MoE scaling laws seriously?
3. Multi-Head Mixture-of-Experts
4. Exponentially Faster Language Modeling
5. MoE-Level Performance Without The Added Computation
6. MoE LLMs with Dense Training for Better Performance
7. If early layers don't need tons of experts, can we save compute?
8. Do we really need to use every single transformer layer?