MoE
Mixture of Experts

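In a mixture-of-experts (MoE) architecture such as Mixtral, several feed-forward networks act as experts, each of which can specialize in different kinds of input. For each input token, a gating network (the router) scores every expert, selects a fixed number of them (the top k; Mixtral uses 2 of its 8 experts per token), and assigns each selected expert a weight reflecting its relevance. The input is passed only through the selected experts, and their outputs are multiplied by these weights and summed.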
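The routing described above can be sketched for a single token in plain Python. This is a minimal illustration under stated assumptions, not Mixtral's implementation: `matvec`, `make_ffn` and `moe_forward` are hypothetical helpers. The toy experts are two-layer ReLU networks, whereas Mixtral's experts use a SwiGLU variant. The relevance weights are a softmax taken over the selected experts' gate scores only, as in Mixtral; other MoE designs normalize differently.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def matvec(matrix: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def make_ffn(w1, w2) -> Callable[[Vector], Vector]:
    """Hypothetical toy expert: W2 · ReLU(W1 · x) (Mixtral uses SwiGLU instead)."""
    def ffn(x: Vector) -> Vector:
        hidden = [max(0.0, h) for h in matvec(w1, x)]
        return matvec(w2, hidden)
    return ffn


def moe_forward(x: Vector, gate_weights, experts, top_k: int = 2) -> Vector:
    """Hypothetical MoE layer for one token; experts map d_model -> d_model."""
    # 1. Router: one score (logit) per expert.
    logits = matvec(gate_weights, x)
    # 2. Keep the top_k experts with the highest scores.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # 3. Softmax over the chosen logits only, so the weights sum to 1.
    m = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - m) for i in chosen}
    total = sum(exps.values())
    # 4. Run only the chosen experts and add their outputs, weighted by relevance.
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        out = [o + (exps[i] / total) * yi for o, yi in zip(out, y)]
    return out


# Usage: 4 toy experts on a 2-dimensional input.
identity = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    make_ffn(identity, identity),                 # returns x
    make_ffn(identity, [[3.0, 0.0], [0.0, 3.0]]),  # returns 3x
    make_ffn(identity, [[100.0, 0.0], [0.0, 100.0]]),
    make_ffn(identity, [[-50.0, 0.0], [0.0, -50.0]]),
]
gate = [[1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]]
x = [1.0, 2.0]
print(moe_forward(x, gate, experts, top_k=2))  # [2.0, 4.0]
```

Experts 0 and 1 tie with the highest score, so each gets weight 0.5 and the result is 0.5·x + 0.5·3x = 2x; experts 2 and 3 are never evaluated. Skipping unselected experts is what keeps the per-token compute close to that of a model with only k feed-forward networks, even though all experts' parameters must still be stored.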