In a significant step toward making advanced artificial intelligence more accessible, Google has launched Gemma 3n, an on-device AI model capable of performing complex multimodal tasks directly on smartphones without an internet connection. Announced in May 2025, the model is designed to run on devices with limited memory, using as little as 2GB of RAM while processing audio, image, video, and text, bringing powerful AI features to phones and low-power edge devices.
Efficient AI Performance on Low-Power Devices
Gemma 3n is built on a new architecture called MatFormer (Matryoshka Transformer), which nests smaller, fully functional sub-models inside larger ones, letting developers scale the model's performance to match hardware capabilities. It ships in two versions, E2B (running in roughly 2GB of memory) and E4B (requiring about 3GB), which contain 5 billion and 8 billion raw parameters respectively yet consume resources comparable to much smaller models.
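The Matryoshka idea can be illustrated with a toy sketch: one set of trained weights serves both a full-width model and a nested sub-model that simply uses a prefix slice of those weights. All names and sizes below are illustrative, not Gemma 3n's actual internals.

```python
# Toy sketch of the MatFormer (Matryoshka) nesting idea: the smaller
# sub-model is a prefix slice of the full model's feed-forward weights.
# Sizes are toy-scale stand-ins, not Gemma 3n's real dimensions.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 8          # embedding width (toy scale)
FULL_HIDDEN = 16     # hidden width of the full "E4B-like" model
SUB_HIDDEN = 8       # the nested "E2B-like" sub-model uses the first half

# One set of trained weights serves both the full and the nested model.
W_in = rng.standard_normal((D_MODEL, FULL_HIDDEN))
W_out = rng.standard_normal((FULL_HIDDEN, D_MODEL))

def ffn(x, hidden):
    """Run the feed-forward block using only the first `hidden` units."""
    h = np.maximum(x @ W_in[:, :hidden], 0.0)   # ReLU over a weight prefix
    return h @ W_out[:hidden, :]

x = rng.standard_normal(D_MODEL)
y_full = ffn(x, FULL_HIDDEN)   # full-capacity path
y_sub = ffn(x, SUB_HIDDEN)     # cheaper nested path, same weights

print(y_full.shape, y_sub.shape)  # → (8,) (8,)
```

Both paths produce a valid output of the same shape; the sub-model simply does less work, which is the property that lets one checkpoint serve hardware tiers of different capability.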
This efficiency comes from innovations such as Per-Layer Embeddings (PLE), which offload a large share of the model's parameters from the device's accelerator (GPU) to the CPU, so that far less of the model needs to reside in fast accelerator memory. Gemma 3n also introduces KV Cache Sharing, which accelerates the processing of long audio and video inputs, improving response times by up to two times for use cases such as voice assistants and live video analysis on mobile devices.
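The speed-up rests on the general key-value caching technique used in transformer decoding: keys and values for past tokens are computed once and reused, so each new token attends against the cache instead of reprocessing the whole sequence. The sketch below shows that general mechanism at toy scale (it does not model Google's specific sharing scheme), and verifies that caching changes the cost, not the result.

```python
# Minimal sketch of KV caching, the general technique behind speed-ups
# like Gemma 3n's KV Cache Sharing. Toy shapes, illustrative only.
import numpy as np

rng = np.random.default_rng(1)
D = 4  # head dimension (toy)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attend(q, K, V):
    """Single-query scaled dot-product attention."""
    return softmax(q @ K.T / np.sqrt(D)) @ V

tokens = rng.standard_normal((6, D))                    # stand-in token features
Wq, Wk, Wv = (rng.standard_normal((D, D)) for _ in range(3))

# Incremental decoding with a KV cache: append one (k, v) pair per step,
# then attend only against what is already cached.
K_cache, V_cache, cached_out = [], [], []
for t in tokens:
    K_cache.append(t @ Wk)
    V_cache.append(t @ Wv)
    cached_out.append(attend(t @ Wq, np.array(K_cache), np.array(V_cache)))

# Reference: recompute all keys and values from scratch for the final token.
K_full, V_full = tokens @ Wk, tokens @ Wv
full_out = attend(tokens[-1] @ Wq, K_full, V_full)

print(np.allclose(cached_out[-1], full_out))  # → True
```

The cached path and the from-scratch path agree exactly; the saving is that the cached path never recomputes keys or values for earlier audio/video frames or tokens.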
Advanced Speech and Vision Capabilities
For speech-based functionality, Gemma 3n integrates an audio encoder adapted from Google's Universal Speech Model, enabling on-device speech-to-text transcription and speech translation without relying on cloud processing. The model performs especially well translating between English and European languages, including Spanish, French, Italian, and Portuguese.
The vision capabilities of Gemma 3n are powered by MobileNet-V5, Google's new lightweight vision encoder, which can process video streams at up to 60 frames per second on devices like Google Pixel, enabling smooth, real-time video analysis without sacrificing accuracy.
Expanding Developer Access and Impact
Gemma 3n is available to developers through popular AI tools such as Hugging Face Transformers, Ollama, MLX, and llama.cpp, simplifying integration into applications that need offline AI. Alongside the launch, Google has introduced the "Gemma 3n Impact Challenge," encouraging developers to build innovative offline applications with the model, backed by a $150,000 prize pool for winning entries.
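As one illustration of the developer workflow, the model can be pulled and run locally through a tool like Ollama. The commands below are a sketch, and the model tag `gemma3n:e2b` is an assumption; check Ollama's model library for the published name before relying on it.

```shell
# Hypothetical local setup via Ollama -- the tag "gemma3n:e2b" is an
# assumption; consult Ollama's model library for the actual name.
ollama pull gemma3n:e2b

# Run a one-off prompt; inference happens entirely on-device.
ollama run gemma3n:e2b "Translate 'good morning' into Spanish."
```

Equivalent offline workflows exist for the other listed tools, such as loading a GGUF build of the model with llama.cpp.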
The model supports text in more than 140 languages and multimodal understanding of content in 35 of them, all while operating entirely offline. This makes Gemma 3n particularly useful in remote areas lacking internet connectivity, or in privacy-focused scenarios where cloud-based AI is not viable. By delivering advanced AI capabilities on devices with minimal memory, Google's Gemma 3n sets a new benchmark for efficient, scalable on-device AI, expanding the possibilities for developers and users seeking powerful offline AI experiences.
The post "Google unveils Gemma 3n: offline multimodal AI model running on phones with just 2GB RAM" appeared first on CliQ INDIA.