AI Agents · June 28, 2026 · 3 min read

Beyond Hand-Coded Pipelines: Sakana AI’s Fugu Ultra Shifts Multi-Agent Orchestration into the Model Layer

Tokyo-based Sakana AI has released Fugu and Fugu Ultra, the world's first "orchestration models" that act as a learned Conductor. By collapsing traditional multi-agent frameworks into a single, recursively calling API, Fugu achieves state-of-the-art reasoning and coding performance without manual workflow coding.



The AI agent landscape in mid-2026 is undergoing a quiet but rapid shift. For the past two years, building robust multi-agent systems meant writing hundreds of lines of fragile boilerplate code. Frameworks like LangGraph, CrewAI, and Microsoft's newly released Agent Framework 1.0 required developers to meticulously map out directed acyclic graphs (DAGs), hardcode agent backstories, and manually handle data flow.
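For contrast, here is the hand-wired style in miniature, without any framework. The developer declares the dependency graph, orders it, and threads data between steps by hand. The sketch deliberately avoids the LangGraph, CrewAI, and Agent Framework APIs. The step functions (research, draft, review) are hypothetical stand-ins for model calls, and the ordering uses Python's standard-library graphlib.TopologicalSorter.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

Context = Dict[str, str]

# Hypothetical agent steps: plain functions standing in for model calls.
def research(ctx: Context) -> str:
    return f"notes on {ctx['task']}"

def draft(ctx: Context) -> str:
    return f"draft using {ctx['research']}"

def review(ctx: Context) -> str:
    return f"review of {ctx['draft']}"

# The developer wires the graph by hand: node -> set of nodes it depends on.
GRAPH: Dict[str, Set[str]] = {
    "research": set(),
    "draft": {"research"},
    "review": {"draft"},
}
STEPS: Dict[str, Callable[[Context], str]] = {
    "research": research,
    "draft": draft,
    "review": review,
}

def run_pipeline(task: str) -> Context:
    ctx: Context = {"task": task}
    # static_order() yields nodes so that every dependency runs first.
    for node in TopologicalSorter(GRAPH).static_order():
        # Data flow is manual too: each step reads from and writes to a shared dict.
        ctx[node] = STEPS[node](ctx)
    return ctx

print(run_pipeline("a flaky unit test")["review"])
# prints: review of draft using notes on a flaky unit test
```

Every new agent or edge means editing both GRAPH and STEPS by hand. That maintenance burden is what the orchestration-model approach aims to remove.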

All of that changed on June 22, 2026. Tokyo-based Sakana AI, co-founded by Llion Jones (a co-author of the landmark "Attention Is All You Need" transformer paper), officially launched Sakana Fugu and Fugu Ultra. They are a family of "orchestration models" that move multi-agent coordination out of external, hand-coded software frameworks and into the foundation model itself.

The Model as the Conductor

Instead of acting as a standalone generator, the Fugu model functions as a learned neural "Conductor". When a user submits a complex task, Fugu doesn't just output a response; it dynamically evaluates the problem and coordinates a pool of publicly accessible, high-performing specialist models.

Under the hood, Fugu builds on two pieces of research published at ICLR 2026. The first is TRINITY, a system that dynamically assigns "Thinker," "Worker," and "Verifier" roles to agents. The second is a learned orchestration loop. Fugu autonomously decides which expert models to call, how to decompose the task, and how to verify the outputs. Notably, Fugu can recursively call instances of itself to check its work, repeating the loop until it reaches a high-confidence consensus.
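The loop described above can be sketched in a few lines of Python, under loud assumptions. This is not Sakana's implementation. The Thinker, the specialists, and the Verifier are hypothetical callables, and routing is a fixed plan rather than a learned policy. A single verifier score also stands in for the consensus check. What the sketch does show is the control flow: decompose and route, execute, verify, then recurse into a fresh conductor call until confidence clears a threshold or a depth limit is hit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types: in Fugu these decisions are learned; here they are plain callables.
Plan = List[Tuple[str, str]]            # (specialist name, subtask)
Thinker = Callable[[str], Plan]
Specialist = Callable[[str], str]
Verifier = Callable[[str, str], float]  # (task, answer) -> confidence in [0, 1]


@dataclass
class Result:
    answer: str
    confidence: float
    depth: int


def conduct(task: str,
            thinker: Thinker,
            specialists: Dict[str, Specialist],
            verifier: Verifier,
            threshold: float = 0.9,
            depth: int = 0,
            max_depth: int = 3) -> Result:
    # Thinker role: decompose the task and route each piece to a specialist.
    plan = thinker(task)
    # Worker role: dispatch every subtask to the specialist the plan chose.
    answer = " | ".join(specialists[name](sub) for name, sub in plan)
    # Verifier role: score the combined answer.
    confidence = verifier(task, answer)
    if confidence >= threshold or depth >= max_depth:
        return Result(answer, confidence, depth)
    # Low confidence: recurse into a fresh conductor call, feeding back the failed attempt.
    retry = f"{task}\n[previous attempt scored {confidence:.2f}: {answer}]"
    return conduct(retry, thinker, specialists, verifier, threshold, depth + 1, max_depth)


# Usage with stub specialists and a verifier that is satisfied on the second pass.
specialists = {"coder": lambda s: f"patch({s})", "math": lambda s: f"proof({s})"}
scores = iter([0.4, 0.95])
result = conduct(
    "repair the build",
    thinker=lambda t: [("coder", "fix parser"), ("math", "check bound")],
    specialists=specialists,
    verifier=lambda t, a: next(scores),
)
print(result)
# prints: Result(answer='patch(fix parser) | proof(check bound)', confidence=0.95, depth=1)
```

The max_depth cap matters. Without it, a verifier that never becomes confident would make the conductor recurse indefinitely, and each extra level adds the latency and hidden-token overhead discussed in the trade-offs below.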

[Figure: 3D flow diagram comparing tradition...]

Performance That Challenges the Monoliths

By coordinating a pool of frontier models, Fugu Ultra (the performance-tuned variant of the lineup) achieves results that rival massive, single-vendor monoliths.

Because Fugu is built on collaborative orchestration, it delivers top-tier capabilities without relying on a single vendor's API or being subject to restrictive export controls. In the reported benchmark data, Fugu Ultra posted strong results on two demanding evaluations, one for coding and one for expert-level reasoning:

• SWE-Bench Pro: Fugu Ultra scored 73.7% (run under a mini-swe-agent scaffold), outperforming Claude Opus 4.8 (69.2%) and GPT-5.5 (58.6%).
• Humanity's Last Exam: On this expert-level multidisciplinary benchmark, Fugu Ultra scored 50.0%. That narrowly edges out Claude Opus 4.8 (49.8%) and leaves GPT-5.5 (41.4%) far behind.

The Trade-offs: Latency and "Black Box" Routing

Of course, shifting orchestration into a model-level abstraction isn't without its costs:

• Latency: The standard Fugu model is optimized for lower-latency interactive tasks, clocking in at around 4 seconds. Fugu Ultra, by contrast, is designed purely for quality and trades speed for correctness: complex multi-step reasoning tasks regularly take anywhere from 8 to 160 seconds to complete.
• Hidden Orchestration Overhead: Fugu generates hidden "orchestration tokens" as it coordinates background agents, so token costs can accumulate rapidly behind the scenes. Sakana has committed to reporting these token metrics transparently.
• The "Black Box" Problem: Developers lose the fine-grained, line-by-line control afforded by graph frameworks like LangGraph. In effect, you are trusting a neural network with the routing logic.
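To reason about these trade-offs on the client side, here is a small budgeting sketch. It uses only the latency figures quoted above: about 4 seconds for Fugu and up to 160 seconds for Fugu Ultra. Several pieces are hypothetical assumptions for illustration, not Sakana's API or pricing:

• the tier names
• the per-million-token prices
• the idea that orchestration tokens are reported as a separate count

```python
from dataclasses import dataclass
from typing import Optional

# Latency figures quoted in the article; rough guides, not service-level guarantees.
FUGU_TYPICAL_S = 4.0
ULTRA_MAX_S = 160.0


def pick_tier(latency_budget_s: float) -> Optional[str]:
    """Pick a tier whose quoted latency fits the budget (tier names are hypothetical)."""
    if latency_budget_s >= ULTRA_MAX_S:
        return "fugu-ultra"
    if latency_budget_s >= FUGU_TYPICAL_S:
        return "fugu"
    return None  # neither tier's quoted latency fits this budget


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    orchestration_tokens: int  # hypothetical field: hidden tokens spent coordinating sub-agents


def estimate_cost(u: Usage, price_in: float, price_out: float, price_orch: float) -> float:
    """Dollar cost, with every price given per million tokens (all prices hypothetical)."""
    return (u.input_tokens * price_in
            + u.output_tokens * price_out
            + u.orchestration_tokens * price_orch) / 1_000_000


def hidden_share(u: Usage) -> float:
    """Fraction of all tokens that are orchestration tokens the caller never sees."""
    total = u.input_tokens + u.output_tokens + u.orchestration_tokens
    return u.orchestration_tokens / total if total else 0.0


# Illustrative numbers only: a short visible exchange hiding a long orchestration run.
usage = Usage(input_tokens=2_000, output_tokens=1_000, orchestration_tokens=7_000)
print(pick_tier(30.0), round(hidden_share(usage), 2), round(estimate_cost(usage, 1.0, 4.0, 4.0), 4))
# prints: fugu 0.7 0.034
```

The routing is deliberately conservative: it sends a request to Ultra only when the budget covers the slow end of the quoted 8 to 160 second range. In the illustrative numbers, 70% of the tokens are orchestration tokens that never appear in the response. Transparent token reporting is meant to surface exactly this kind of overhead.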

Ultimately, Sakana Fugu makes a strong case that the future of AI is not just about training bigger monolithic models. By treating multi-agent orchestration as a learned, unified model-level API, Fugu is paving a faster, more flexible route to high-tier intelligence.
