AI · Data Science · Machine Learning

The Future is in Your Pocket: How to Move AI to Smartphones

By capernaum
Last updated: 2024-11-18 10:44

For years, the promise of truly intelligent, conversational AI has felt out of reach. We’ve marveled at the abilities of ChatGPT, Gemini, and other large language models (LLMs) – composing poems, writing code, translating languages – but these feats have always relied on the vast processing power of cloud GPUs. Now, a quiet revolution is brewing, aiming to bring these incredible capabilities directly to the device in your pocket: an LLM on your smartphone.

This shift isn’t just about convenience; it’s about privacy, efficiency, and unlocking a new world of personalized AI experiences. 

However, shrinking these massive LLMs to fit onto a device with limited memory and battery life presents a unique set of challenges. To understand this complex landscape, I spoke with Aleksei Naumov, Lead AI Research Engineer at Terra Quantum and a leading figure in the field of LLM compression.

Indeed, Naumov recently presented a paper on this subject, ‘TQCompressor: Improving Tensor Decomposition Methods in Neural Networks via Permutations’, at the IEEE International Conference on Multimedia Information Processing and Retrieval (IEEE MIPR 2024), where researchers, scientists, and industry professionals gather to present and discuss the latest advances in multimedia technology. The work has been hailed as a significant innovation in neural network compression.

“The main challenge is, of course, the limited main memory (DRAM) available on smartphones,” Naumov said. “Most models cannot fit into the memory of a smartphone, making it impossible to run them.”

He points to Meta’s Llama 3.1-8B model as a prime example.

“It requires approximately 15 GB of memory,” Naumov said. “However, the iPhone 16 only has 8 GB of DRAM, and the Google Pixel 9 Pro offers 16 GB. Furthermore, to operate these models efficiently, one actually needs even more memory – around 24 GB, which is offered by devices like the NVIDIA RTX 4090 GPU, starting at $1800.”
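
As a sanity check on those figures, a model’s weight footprint is roughly its parameter count times the bytes per parameter; here is a minimal sketch (the extra headroom needed for activations and the KV cache is why comfortable operation demands still more memory):

```python
def model_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough weight footprint: parameter count times bytes per parameter."""
    return num_params * bytes_per_param / 1e9

# An 8-billion-parameter model stored in FP16 (2 bytes per parameter)
print(model_memory_gb(8e9, 2))  # ~16 GB, in line with the ~15 GB cited above
```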

This memory constraint isn’t just about storage; it directly impacts a phone’s battery life.

“The more memory a model requires, the faster it drains the battery,” Naumov said. “An 8-billion parameter LLM consumes about 0.8 joules per token. A fully charged iPhone, with approximately 50 kJ of energy, could only sustain this model for about two hours at a rate of 10 tokens per second, with every 64 tokens consuming around 0.2% of the battery.”
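
Those battery figures can be reproduced using only the numbers from the quote; a quick sketch:

```python
battery_joules = 50_000    # a full charge, ~50 kJ (figure from the quote)
joules_per_token = 0.8     # energy per token for an 8B-parameter LLM
tokens_per_second = 10     # assumed generation speed

hours = battery_joules / (joules_per_token * tokens_per_second) / 3600
print(hours)  # ~1.7 hours of continuous generation
```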

So, how do we overcome these hurdles? Naumov highlights the importance of model compression techniques.

“To address this, we need to reduce model sizes,” Naumov said. “There are two primary approaches: reducing the number of parameters or decreasing the memory each parameter requires.”

He outlines strategies like distillation, pruning, and matrix decomposition to reduce the number of parameters and quantization to decrease each parameter’s memory footprint.
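
To make the parameter-reduction side concrete, here is a minimal sketch of matrix decomposition, the family of techniques Naumov’s TQCompressor work extends. It uses a plain truncated SVD of a single weight matrix in PyTorch, not the permutation-enhanced method from his paper:

```python
import torch

def low_rank_factorize(weight: torch.Tensor, rank: int):
    """Approximate an (m x n) weight matrix by two factors,
    shrinking the parameter count from m*n to rank*(m + n)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # (m x rank), singular values folded in
    B = Vh[:rank, :]            # (rank x n)
    return A, B

W = torch.randn(4096, 4096)              # a typical LLM projection layer
A, B = low_rank_factorize(W, rank=512)
print(W.numel(), A.numel() + B.numel())  # 16,777,216 -> 4,194,304 parameters
```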

“By storing model parameters in INT8 instead of FP16, we can reduce memory consumption by about 50%,” Naumov said.
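
Quantized loading of this kind is available off the shelf; a minimal sketch using Hugging Face transformers with the bitsandbytes backend (the model id is illustrative, and the exact saving varies by model):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Store weights in INT8 rather than FP16, roughly halving memory use.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",  # illustrative model id
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```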

While Google’s Pixel devices, with the TPUs built into their custom Tensor chips, seem like an ideal platform for running LLMs, Naumov cautions that they don’t solve the fundamental problem of memory limitations.

“While the Tensor Processing Units (TPUs) used in Google Pixel devices do offer improved performance when running AI models, which can lead to faster processing speeds or lower battery consumption, they do not resolve the fundamental issue of the sheer memory requirements of modern LLMs, which typically exceed smartphone memory capacities,” Naumov said.

The drive to bring LLMs to smartphones goes beyond mere technical ambition. It’s about reimagining our relationship with AI and addressing the limitations of cloud-based solutions.

“Leading models like ChatGPT-4 have over a trillion parameters,” Naumov said. “If we imagine a future where people depend heavily on LLMs for tasks like conversational interfaces or recommendation systems, it could mean about 5% of users’ daily time is spent interacting with these models. In this scenario, running GPT-4 would require deploying roughly 100 million H100 GPUs. The computational scale alone, not accounting for communication and data transmission overheads, would be equivalent to operating around 160 companies the size of Meta. This level of energy consumption and associated carbon emissions would pose significant environmental challenges.”

The vision is clear: a future where AI is seamlessly integrated into our everyday lives, providing personalized assistance without compromising privacy or draining our phone batteries.

“I foresee that many LLM applications currently relying on cloud computing will transition to local processing on users’ devices,” Naumov said. “This shift will be driven by further model downsizing and improvements in smartphone computational resources and efficiency.”

He paints a picture of a future where the capabilities of LLMs could become as commonplace and intuitive as auto-correct is today. This transition could unlock many exciting possibilities: with local LLMs, imagine enhanced privacy, where your sensitive data never leaves your device.

Picture ubiquitous AI with LLM capabilities integrated into virtually every app, from messaging and email to productivity tools. Think of the convenience of offline functionality, allowing you to access AI assistance even without an internet connection. Envision personalized experiences where LLMs learn your preferences and habits to provide truly tailored support.

For developers eager to explore this frontier, Naumov offers some practical advice.

“First, I recommend selecting a model that best fits the intended application,” Naumov said. “Hugging Face is an excellent resource for this. Look for recent models with 1-3 billion parameters, as these are the only ones currently feasible for smartphones. Additionally, try to find quantized versions of these models on Hugging Face. The AI community typically publishes quantized versions of popular models there.”

He also suggests exploring tools like llama.cpp and bitsandbytes for model quantization and inference.
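
As a starting point, a quantized GGUF checkpoint downloaded from Hugging Face can be run through llama.cpp’s Python bindings; a minimal sketch (the model path is a placeholder):

```python
from llama_cpp import Llama

# Point this at any quantized GGUF checkpoint from Hugging Face.
llm = Llama(model_path="path/to/model-q4_k_m.gguf", n_ctx=2048)

out = llm("Explain on-device LLMs in one sentence:", max_tokens=64)
print(out["choices"][0]["text"])
```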

The journey to bring LLMs to smartphones is still in its early stages, but the potential is undeniable. As researchers like Aleksei Naumov continue to push the boundaries of what’s possible, we’re on the cusp of a new era in mobile AI, one where our smartphones become truly intelligent companions, capable of understanding and responding to our needs in ways we’ve only begun to imagine.
