MoE

Jan 12, 2024

Mixture of Experts

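In a mixture-of-experts (MoE) architecture such as Mixtral, each MoE layer contains several feed-forward networks, called experts, which tend to specialize in different kinds of input. For every token, a gating network (the router) scores all experts, selects a fixed number of the highest-scoring ones (two out of eight in Mixtral), and turns their scores into relevance weights. The token is passed only through the selected experts, and their outputs are weighted by relevance and added together.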
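The routing step can be made concrete with a small sketch. The code below is an illustration, not Mixtral's actual implementation: it uses plain Python lists, and the helper names (`matvec`, `make_expert`, `moe_forward`), the toy dimensions, and the ReLU feed-forward experts are all assumptions chosen for readability. It handles a single token: score every expert with the gate, keep the top `k`, normalize those scores with a softmax, and sum the weighted expert outputs.

```python
import math
import random


def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim, hidden):
    # Hypothetical toy expert: a two-layer feed-forward network with ReLU.
    W1 = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
    W2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(dim)]

    def expert(x):
        h = [max(0.0, v) for v in matvec(W1, x)]
        return matvec(W2, h)

    return expert


def moe_forward(x, gate_W, experts, k=2):
    # One gate score (logit) per expert for this token.
    logits = matvec(gate_W, x)
    # Indices of the k highest-scoring experts, best first.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Relevance weights: softmax over the selected logits only.
    weights = softmax([logits[i] for i in top])
    # Only the selected experts are evaluated; outputs are weighted and summed.
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top, weights


rng = random.Random(0)
dim, hidden, num_experts = 4, 8, 8  # toy sizes, assumed for illustration
experts = [make_expert(rng, dim, hidden) for _ in range(num_experts)]
gate_W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
x = [rng.gauss(0, 1) for _ in range(dim)]

out, top, weights = moe_forward(x, gate_W, experts, k=2)
print("selected experts:", top)
print("relevance weights:", weights)
```

Because only `k` of the experts run for each token, the compute per token stays close to that of a much smaller dense model, even though the total parameter count covers all experts.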
