© 2026 Wanture. All rights reserved.
Tech/Gadgets
iPhone 17 Pro streams 400‑billion‑parameter AI model

23 March 2026

—

News

Priya Desai

Anemll engineers ran a 400‑billion‑parameter AI model on an iPhone 17 Pro, streaming weights directly from the device's SSD at 0.6 tokens per second. The team bypassed the phone's 12 GB RAM limit by reading model data on demand. The proof of concept shows massive language models can operate on consumer smartphones without cloud connections. Performance remains too slow for real‑time chat, yet the technique opens a path toward privacy‑first AI apps that keep sensitive data on device.

What's new: Flash‑MoE, a mixture‑of‑experts architecture, activates only a small subset of the model's 400 billion parameters for each token. That selective routing cuts active memory demand. The SSD feeds weights to the GPU in slices, making storage bandwidth the main bottleneck. At 0.6 tokens per second, each token takes roughly 1.7 seconds to generate, a pace that rules out fluid conversation but suits offline drafting or summarization tasks.
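The mixture‑of‑experts routing idea can be sketched in a few lines. This is a minimal toy illustration, not Anemll's implementation; the layer sizes and gating scheme are invented for clarity:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through only the top-k experts.

    x       : (d,) token activation
    gate_w  : (d, n_experts) gating weights
    experts : list of (d, d) expert weight matrices
    """
    scores = x @ gate_w                   # one routing score per expert
    chosen = np.argsort(scores)[-top_k:]  # indices of the top-k experts
    # Softmax over the selected scores only
    w = np.exp(scores[chosen] - scores[chosen].max())
    w /= w.sum()
    # Only the chosen experts' weights are ever touched;
    # the rest could stay in storage.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))

d, n_experts = 8, 16
rng = np.random.default_rng(0)
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts)
print(y.shape)  # (8,)
```

Because only `top_k` of the 16 expert matrices participate in each forward pass, the remaining 14 never need to be resident in memory, which is what makes streaming from storage feasible.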

How it works: A token is the smallest unit of text the model processes. Conventional inference loads the entire model into RAM, which would require over 200 GB. Anemll's method reads only the active parameters from flash storage, so the phone holds just a fraction of the total weights in memory at any moment.
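A toy version of on‑demand weight streaming can be built with a memory‑mapped file, where the operating system pages in only the slices that are actually read. The file name, layout, and sizes below are invented for illustration:

```python
import numpy as np

D = 64          # toy hidden size
N_EXPERTS = 32  # toy expert count

# Write a fake weight file: N_EXPERTS contiguous (D, D) float16 matrices.
weights = np.random.default_rng(1).normal(
    size=(N_EXPERTS, D, D)).astype(np.float16)
weights.tofile("experts.bin")

# Memory-map the file: nothing is read into RAM until a slice is touched.
mm = np.memmap("experts.bin", dtype=np.float16, mode="r",
               shape=(N_EXPERTS, D, D))

def run_expert(x, expert_id):
    # Only this expert's (D, D) slice is paged in from storage.
    w = np.asarray(mm[expert_id], dtype=np.float32)
    return x @ w

x = np.ones(D, dtype=np.float32)
y = run_expert(x, 5)
print(y.shape)  # (64,)
```

The same pattern at scale is why storage read speed, not RAM capacity, becomes the throughput limit: every generated token triggers fresh reads of the active experts' weights.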

Trade‑offs in battery and hardware wear: Sustained inference drains the battery significantly, and heavy read activity adds flash wear that could shorten device lifespan under continuous workloads. The phone stays cool during short sessions, yet extended generation heats the chassis. Developers will need to balance power draw against user expectations for always‑on AI features.

Market momentum: Edge AI chip shipments reached 44.2% of all AI chip volume in 2025, according to Mordor Intelligence. Meanwhile, Pew Research found that 85% of U.S. consumers expect stronger privacy protections, reinforcing demand for on‑device inference.

What it means for developers: Prototyping privacy‑first apps becomes practical. Medical note‑taking, legal document review, and personal journaling can now run entirely offline. Real‑time chat remains out of reach until storage speeds double or model architectures shrink further. The technique may also reduce cloud costs for enterprises willing to trade latency for data sovereignty.

Will storage‑streamed models enable everyday privacy‑first AI on smartphones, or will cloud‑based inference continue to dominate for speed‑sensitive tasks?
