© 2026 Wanture. All rights reserved.
Tech/Gadgets
iPhone 17 Pro streams 400‑billion‑parameter AI model

23 March 2026

—

News

Priya Desai

Anemll engineers ran a 400‑billion‑parameter AI model on an iPhone 17 Pro, streaming weights directly from the device's SSD at 0.6 tokens per second. The team bypassed the phone's 12 GB RAM limit by reading model data on demand. The proof of concept shows massive language models can operate on consumer smartphones without cloud connections. Performance remains too slow for real‑time chat, yet the technique opens a path toward privacy‑first AI apps that keep sensitive data on device.

What's new: Flash‑MoE, a mixture‑of‑experts architecture, activates only a small subset of the model's 400 billion parameters for each token. That selective routing cuts active memory demand. The SSD feeds weights to the GPU in slices, making storage bandwidth the main bottleneck. At 0.6 tokens per second, each token takes roughly 1.7 seconds to generate, a pace that rules out fluid conversation but suits offline drafting or summarization tasks.
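The mixture‑of‑experts routing idea can be sketched in a few lines. This is a minimal toy illustration, not Anemll's implementation; the layer sizes and gating scheme are invented for clarity:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through only the top-k experts.

    x       : (d,) token activation
    gate_w  : (d, n_experts) gating weights
    experts : list of (d, d) expert weight matrices
    """
    scores = x @ gate_w                   # one routing score per expert
    chosen = np.argsort(scores)[-top_k:]  # indices of the top-k experts
    # Softmax over the selected scores only
    w = np.exp(scores[chosen] - scores[chosen].max())
    w /= w.sum()
    # Only the chosen experts' weights are ever touched;
    # the rest could stay in storage.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))

d, n_experts = 8, 16
rng = np.random.default_rng(0)
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts)
print(y.shape)  # (8,)
```

Because only `top_k` of the 16 expert matrices participate in each forward pass, the remaining 14 never need to be resident in memory, which is what makes streaming from storage feasible.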

How it works: A token is the smallest unit of text the model processes. Conventional inference loads the entire model into RAM, which would require over 200 GB. Anemll's method reads only the active parameters from flash storage, so the phone holds just a fraction of the total weights in memory at any moment.
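A toy version of on‑demand weight streaming can be built with a memory‑mapped file, where the operating system pages in only the slices that are actually read. The file name, layout, and sizes below are invented for illustration:

```python
import numpy as np

D = 64          # toy hidden size
N_EXPERTS = 32  # toy expert count

# Write a fake weight file: N_EXPERTS contiguous (D, D) float16 matrices.
weights = np.random.default_rng(1).normal(
    size=(N_EXPERTS, D, D)).astype(np.float16)
weights.tofile("experts.bin")

# Memory-map the file: nothing is read into RAM until a slice is touched.
mm = np.memmap("experts.bin", dtype=np.float16, mode="r",
               shape=(N_EXPERTS, D, D))

def run_expert(x, expert_id):
    # Only this expert's (D, D) slice is paged in from storage.
    w = np.asarray(mm[expert_id], dtype=np.float32)
    return x @ w

x = np.ones(D, dtype=np.float32)
y = run_expert(x, 5)
print(y.shape)  # (64,)
```

The same pattern at scale is why storage read speed, not RAM capacity, becomes the throughput limit: every generated token triggers fresh reads of the active experts' weights.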

Trade‑offs in battery and hardware wear: Sustained inference drains the battery significantly, and heavy read activity adds flash wear that could shorten device lifespan under continuous workloads. The phone stays cool during short sessions, yet extended generation heats the chassis. Developers will need to balance power draw against user expectations for always‑on AI features.

Market momentum: Edge AI chip shipments reached 44.2% of all AI chip volume in 2025, according to Mordor Intelligence. Meanwhile, Pew Research found that 85% of U.S. consumers expect stronger privacy protections, reinforcing demand for on‑device inference.

What it means for developers: Prototyping privacy‑first apps becomes practical. Medical note‑taking, legal document review, and personal journaling can now run entirely offline. Real‑time chat remains out of reach until storage speeds double or model architectures shrink further. The technique may also reduce cloud costs for enterprises willing to trade latency for data sovereignty.

Will storage‑streamed models enable everyday privacy‑first AI on smartphones, or will cloud‑based inference continue to dominate for speed‑sensitive tasks?
