List of Large Mixture of Experts (MoE) Models: Architecture, Performance, and Innovations in Scalable AI Solutions

capernaum
Last updated: 2024-11-17 01:24

Mixture of Experts (MoE) models represent a significant breakthrough in machine learning, offering an efficient approach to handling large-scale models. Unlike dense models, where all parameters are active during inference, MoE models activate only a fraction of their parameters for each input. This keeps the compute spent per input well below what the total parameter count would suggest, which makes MoE models attractive for a wide range of use cases. The design introduces trade-offs, including increased architectural complexity, but it gives developers and researchers greater flexibility.
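To make the distinction between total and active parameters concrete, here is a minimal sketch of the arithmetic for a toy MoE model. The function name `moe_param_counts` and all figures are hypothetical and chosen only for illustration; they do not describe any model covered in this article, and real architectures split parameters in more involved ways.

```python
def moe_param_counts(shared: float, per_expert: float,
                     n_experts: int, k_active: int) -> tuple[float, float]:
    """Return (total, active) parameter counts for a toy MoE model.

    shared     -- parameters every input uses (attention, embeddings, ...)
    per_expert -- parameters in one expert
    n_experts  -- number of experts stored in the model
    k_active   -- number of experts actually run per input
    """
    if not 0 < k_active <= n_experts:
        raise ValueError("k_active must be between 1 and n_experts")
    total = shared + n_experts * per_expert
    active = shared + k_active * per_expert
    return total, active


# Hypothetical figures in billions of parameters (illustration only).
total, active = moe_param_counts(shared=10.0, per_expert=5.0,
                                 n_experts=8, k_active=2)
print(f"total={total:.0f}B active={active:.0f}B share={active / total:.0%}")
```

All experts still have to be held in memory, so the saving is in compute per input rather than in storage.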

Let’s explore the largest MoE models released to date, focusing on their architecture, capabilities, and relative performance. These models are all publicly available and exceed 100 billion parameters. The analysis is ordered chronologically by release date, with rankings provided where available from the LMSYS leaderboard as of November 4, 2024.

Google’s Switch-C Transformer is one of the earliest models in the MoE space. Released on Hugging Face in November 2022, it boasts a staggering 1.6 trillion total parameters, supported by 2048 experts. Despite being an early innovator in this domain, Switch-C is now considered outdated, as it is not ranked on modern benchmarks like LMSYS. However, it remains noteworthy as a foundational MoE model and continues to influence subsequent innovations. Smaller variants of the Switch-C Transformer are also available, offering more accessible entry points for experimentation.

In March 2024, xAI released Grok-1, a model with 314 billion total parameters and 86 billion active during inference. Unlike Switch-C, Grok-1 uses a much smaller pool of experts, eight in total, with only two active at a time. Its 8k context length is suitable for moderately long input sequences, though it is not competitive with newer models. While Grok-1 has limited adoption and is not ranked on LMSYS, its successor, Grok-2, has shown promise in preliminary benchmarks. Grok-2, yet to be publicly released, has ranked fifth overall in specific LMSYS tasks, suggesting that future iterations of this model could redefine performance benchmarks in the MoE landscape.

Shortly after Grok-1, Databricks released DBRX in late March 2024. This model features 132 billion total parameters, with 36 billion active, spread across 16 experts. Its 32k context length significantly outpaces many contemporaries, allowing it to process longer input sequences efficiently. DBRX is supported by multiple backends, including llama.cpp, ExLlamaV2, and vLLM, making it a versatile choice for developers. Despite its strong architecture, its LMSYS rankings place it only 90th overall and 78th for hard prompts in English, indicating room for improvement in quality and adoption.

April 2024 saw the release of Mistral AI’s Mixtral 8x22B. This model has 141 billion total parameters, with 39 billion active during inference. It incorporates eight experts, two of which are chosen dynamically based on the input. With a 64k context length, Mixtral is well-suited for tasks requiring extensive input handling. While its LMSYS rankings, 70th overall and 66th on hard prompts, indicate middling performance, its compatibility with multiple backends ensures usability across diverse platforms.
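Choosing two of eight experts "dynamically based on the input" is usually done with a small router that scores every expert and keeps the top k. The following is a generic pure-Python sketch of top-k gating under that assumption, not Mixtral's actual implementation; the toy experts, the router logits, and the names `route_top_k` and `moe_forward` are all made up for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Eight toy "experts": expert i simply scales its input by (i + 1).
experts = [lambda x, s=i + 1: s * x for i in range(8)]

# Hypothetical router scores for one input; a real router computes these
# from the token's hidden state with a learned linear layer.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

chosen = route_top_k(logits, k=2)
print("selected experts and weights:", chosen)
print("layer output for x=1.0:", moe_forward(1.0, logits, experts))
```

Only the two selected experts are evaluated; the other six contribute nothing to the output and cost no compute for this input.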

Another April release was Snowflake’s Arctic, an MoE model with 480 billion total parameters but only 17 billion active during inference. Arctic’s unusual design pairs a dense component (10 billion parameters) with a sparse component (7 billion active parameters) drawn from 128 experts. However, its performance falls short, ranking 99th overall on LMSYS and a notably low 101st for hard prompts. Its limited 4k context length further restricts its applicability, making it a less competitive option despite its innovative architecture.

Skywork joined the MoE space in June 2024 with the release of Skywork-MoE. This model features 146 billion total parameters, of which 22 billion are active, spread across 16 experts. With an 8k context length, it supports moderately lengthy tasks but lacks LMSYS rankings, which suggests limited testing or adoption. The base model is the only available version, as the promised chat variant has yet to be released.

In August 2024, AI21 Labs released Jamba 1.5 Large, a hybrid model that combines MoE with a Mamba-Transformer architecture. With 398 billion total parameters and 98 billion active, Jamba 1.5 Large offers an exceptional 256k context length, making it ideal for tasks requiring extensive input processing. Its LMSYS rankings reflect its high performance, placing 34th overall and 28th for hard prompts. Additionally, Jamba models excel in long-context evaluations, particularly the RULER benchmark, solidifying their reputation for long-context tasks.

DeepSeek V2.5, released in September 2024, currently leads the MoE space in performance. This model incorporates 236 billion total parameters, with 21 billion active during inference. Its architecture includes 160 experts, of which six are dynamically chosen and two are shared, resulting in eight active experts. With a 128k context length, DeepSeek V2.5 demonstrates robust capabilities for long-context tasks. It ranks 18th overall on LMSYS and 6th for hard prompts, outperforming all other available MoE models. Earlier iterations, such as DeepSeek V2, laid the groundwork for its success.
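The combination of shared and dynamically chosen experts can be sketched as follows. This is an illustrative sketch only, not DeepSeek's code: it assumes the two shared experts sit alongside the pool of 160 routed experts and always run, while a router picks the six best-scoring routed experts. The function name, the expert labels, and the placeholder scores are hypothetical.

```python
def select_active_experts(router_scores, n_routed_active, shared_ids):
    """Return the always-on shared experts plus the top-scoring routed experts."""
    if n_routed_active > len(router_scores):
        raise ValueError("cannot activate more routed experts than exist")
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    routed = ranked[:n_routed_active]
    # Shared experts bypass the router entirely; they run for every input.
    return list(shared_ids) + [f"routed_{i}" for i in routed]


N_ROUTED = 160

# Placeholder scores standing in for a learned router's output.
# (i * 37) % 160 is just a deterministic shuffle of 0..159.
scores = [(i * 37) % N_ROUTED for i in range(N_ROUTED)]

active = select_active_experts(scores, n_routed_active=6,
                               shared_ids=["shared_0", "shared_1"])
print(len(active), "active experts:", active)
```

Shared experts give every input a common path through the layer, so the routed experts are free to specialize.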

The most recent addition to the MoE family is Tencent’s Hunyuan Large, released in November 2024. With 389 billion total parameters and 52 billion active, Hunyuan Large employs a distinctive design in which one expert is chosen dynamically and one is shared. This results in two active experts during inference. Its 128k context length matches that of DeepSeek V2.5, positioning it as a strong competitor. While it is not yet ranked on LMSYS, early indications suggest it could rival or surpass DeepSeek’s performance.

Among the MoE models discussed, DeepSeek V2.5 is the most robust option currently available. However, newer models such as Hunyuan Large and the anticipated Grok-2 may soon shift the rankings. Models like Jamba 1.5 Large also highlight the strengths of hybrid architectures, particularly in tasks requiring extensive context handling. The LMSYS rankings, while useful for initial comparisons, do not capture every nuance of model performance, especially for specialized tasks.
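For a side-by-side view, the short script below collects the figures quoted in this article (parameters in billions, context length in thousands of tokens, LMSYS overall rank as of November 4, 2024) and computes each model's active-parameter share. The `MoEModel` class and `by_rank` helper are illustrative names, and the numbers are the article's reported values rather than independently verified ones. Switch-C is left out because no active-parameter count is given for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_b: float                 # total parameters, billions
    active_b: float                # active parameters, billions
    context_k: int                 # context length, thousands of tokens
    lmsys_overall: Optional[int]   # None means not ranked

    @property
    def active_share(self) -> float:
        return self.active_b / self.total_b


# Figures as reported in this article.
MODELS = [
    MoEModel("Grok-1", 314, 86, 8, None),
    MoEModel("DBRX", 132, 36, 32, 90),
    MoEModel("Mixtral 8x22B", 141, 39, 64, 70),
    MoEModel("Snowflake Arctic", 480, 17, 4, 99),
    MoEModel("Skywork-MoE", 146, 22, 8, None),
    MoEModel("Jamba 1.5 Large", 398, 98, 256, 34),
    MoEModel("DeepSeek V2.5", 236, 21, 128, 18),
    MoEModel("Hunyuan Large", 389, 52, 128, None),
]


def by_rank(models):
    """Sort ranked models best-first; unranked models go last."""
    return sorted(models,
                  key=lambda m: (m.lmsys_overall is None, m.lmsys_overall or 0))


for m in by_rank(MODELS):
    rank = m.lmsys_overall if m.lmsys_overall is not None else "unranked"
    print(f"{m.name:<17} {m.total_b:>4.0f}B total  {m.active_b:>3.0f}B active "
          f"({m.active_share:5.1%})  {m.context_k:>3}k ctx  LMSYS: {rank}")
```

Listed this way, the figures show that a small active share does not by itself predict ranking: the two models with the smallest active shares sit at opposite ends of the ranked group.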

In conclusion, MoE models represent a growing frontier in AI, offering scalable and efficient solutions tailored to diverse applications. Developers and researchers are encouraged to explore these models based on specific use cases, leveraging their unique architectures to optimize performance. As the field evolves, the MoE landscape will likely witness further innovations, pushing the boundaries of what these architectures can achieve.


This article is based on this Reddit post. All credit for this research goes to the researchers of this project.

The post List of Large Mixture of Experts (MoE) Models: Architecture, Performance, and Innovations in Scalable AI Solutions appeared first on MarkTechPost.
