Qwen AI Releases Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M: Allowing Deployment with Context Length up to 1M Tokens

By capernaum
Last updated: 2025-01-27 05:55

The advancements in large language models (LLMs) have significantly enhanced natural language processing (NLP), enabling capabilities like contextual understanding, code generation, and reasoning. However, a key limitation persists: the restricted context window size. Most LLMs can only process a fixed amount of text, typically up to 128K tokens, which limits their ability to handle tasks requiring extensive context, such as analyzing lengthy documents or debugging large codebases. These constraints often necessitate workarounds like text chunking, increasing computational complexity. Overcoming these challenges requires models that can extend context lengths efficiently without compromising performance.
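To make the chunking workaround concrete, here is a purely illustrative sketch of fixed-size chunking with overlap; the chunk size and overlap values are arbitrary assumptions, not settings from the article.

```python
# Illustrative only: naive fixed-size chunking with overlap, the kind of
# workaround needed when an input exceeds a model's context window.
# chunk_size and overlap are arbitrary example values.
def chunk_tokens(tokens, chunk_size=128_000, overlap=1_000):
    """Split a token sequence into overlapping chunks that fit a fixed window."""
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        start += step
    return chunks

# Each chunk must then be processed separately and the results merged,
# which is the extra complexity a 1M-token context window avoids.
```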

Qwen AI’s Latest Release

Qwen AI has introduced two new models, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, designed to support context lengths of up to 1 million tokens. Developed by the Qwen team at Alibaba Group, these models also come with an open-source inference framework optimized for handling long contexts. This advancement enables developers and researchers to work with larger datasets in a single pass, offering a practical solution for applications that demand extended context processing. Additionally, the models feature improvements in sparse attention mechanisms and kernel optimization, resulting in faster processing times for extended inputs.
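As a minimal sketch of how the released checkpoints might be used, the snippet below loads Qwen2.5-7B-Instruct-1M through the Hugging Face transformers API. The model identifier follows the naming in the release; the prompt and generation settings are illustrative assumptions, not recommendations from the Qwen team.

```python
# A minimal sketch (assumed usage): loading the 7B checkpoint with the
# Hugging Face transformers API. Prompt and generation settings are
# illustrative, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-1M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the key findings of the report below.\n\n<long document>"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```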

Technical Details and Benefits

The Qwen2.5-1M series retains a Transformer-based architecture, incorporating features like Grouped Query Attention (GQA), Rotary Positional Embeddings (RoPE), and RMSNorm for stability over long contexts. Training involved both natural and synthetic datasets, with tasks like Fill-in-the-Middle (FIM), paragraph reordering, and position-based retrieval enhancing the model’s ability to handle long-range dependencies. Sparse attention methods such as Dual Chunk Attention (DCA) allow for efficient inference by dividing sequences into manageable chunks. Progressive pre-training strategies, which gradually scale context lengths from 4K to 1M tokens, optimize efficiency while controlling computational demands. The models are fully compatible with vLLM’s open-source inference framework, simplifying integration for developers.
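Since the article notes compatibility with vLLM, here is a hedged sketch of serving the 14B variant through vLLM's offline API. The context length, tensor-parallel degree, and chunked-prefill setting are assumptions that depend on available GPU memory, not documented requirements.

```python
# A hedged sketch of long-context serving through vLLM's offline API.
# max_model_len, tensor_parallel_size, and chunked prefill are illustrative
# settings that depend on available GPU memory, not documented requirements.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct-1M",
    max_model_len=1_000_000,      # near the advertised 1M-token limit
    tensor_parallel_size=4,       # assumption: shard across 4 GPUs
    enable_chunked_prefill=True,  # prefill very long prompts in chunks
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["<very long document>\n\nQuestion: ..."], sampling)
print(outputs[0].outputs[0].text)
```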

Results and Insights

Benchmark results demonstrate the capabilities of the Qwen2.5-1M models. In the Passkey Retrieval Test, the 7B and 14B variants successfully retrieved hidden information from 1 million tokens, showcasing their effectiveness in long-context scenarios. In other benchmarks, including RULER and Needle in a Haystack (NIAH), the 14B model outperformed alternatives like GPT-4o-mini and Llama-3. Sparse attention techniques contributed to reduced inference times, achieving speedups of up to 6.7x on Nvidia H20 GPUs. These results highlight the models’ ability to combine efficiency with high performance, making them suitable for real-world applications requiring extensive context.
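To illustrate what a passkey-retrieval style evaluation looks like in practice, the sketch below buries a short passkey inside a long run of filler text and asks the model to recall it. The filler text, prompt wording, and lengths are assumptions for illustration, not the benchmark's exact construction.

```python
# Illustrative construction of a passkey-retrieval style prompt: a short
# "needle" sentence is hidden inside a long haystack of filler text.
# Filler text, wording, and lengths are assumptions, not the benchmark's setup.
import random

def build_passkey_prompt(passkey: str, filler_repeats: int = 100_000, seed: int = 0) -> str:
    random.seed(seed)
    filler = "The grass is green. The sky is blue. The sun is bright."
    needle = f"The pass key is {passkey}. Remember it."
    sentences = [filler] * filler_repeats
    sentences.insert(random.randint(0, len(sentences)), needle)
    return " ".join(sentences) + "\n\nWhat is the pass key?"

prompt = build_passkey_prompt("71432")
# The prompt is fed to the model; retrieval is scored by whether the
# generated answer contains the hidden passkey.
```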

Conclusion

The Qwen2.5-1M series addresses critical limitations in NLP by significantly extending context lengths while maintaining efficiency and accessibility. By overcoming constraints that have long hindered LLMs, these models open new possibilities for applications ranging from analyzing large datasets to processing entire code repositories. With innovations in sparse attention, kernel optimization, and long-context pre-training, Qwen2.5-1M offers a practical and effective tool for tackling complex, context-heavy tasks.


Check out the Paper, Models on Hugging Face and Technical Details. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t forget to join our 70k+ ML SubReddit.

The post Qwen AI Releases Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M: Allowing Deployment with Context Length up to 1M Tokens appeared first on MarkTechPost.
