Small Models, Big Impact: Why the Future Belongs to Specialized AI Agents
I've been working with LLMs and AI agents for three years now. In that time I've tried a lot, from massive cloud models to small local setups. My assessment: large generalist models are a transitional technology. From my perspective, the productive future belongs to small, specialized models.
Why I'm skeptical about large generalists long-term
Current flagship models can do a bit of everything. Code, poetry, medicine, law — all in one model. That's impressive for demos and prototypes. But for sustained productive operation, I think it's the wrong setup.
Concretely: an agent that processes invoices doesn't need to know about Shakespeare. A customer service bot doesn't need differential equations. But the generalist pays for all that unused breadth: in latency, in token costs, in unnecessary noise in the output.
I see large models more as research instruments. They show what's possible. For real operations, I believe you need something different.
The Heise article is right — and still falls short
Harald Weiss wrote an article about Agentic AI on heise.de, critically examining business expectations. His points: Non-deterministic behavior makes AI agents unpredictable, hidden costs are underestimated, only 10% of companies see real ROI.
The diagnosis is correct on many points. But from my perspective, the conclusion falls short.

Humans are just as non-deterministic
The main argument against AI agents is: They behave non-deterministically. Same input, different output. Errors happen.
But the same applies to humans. If I assign a task to an employee, I never get exactly the same result as last time. People interpret differently, have good and bad days, make careless mistakes. And yet companies keep running, because their processes are designed to absorb that.
What I find difficult to accept: Demanding perfection from AI while applying different standards to humans. Either non-determinism is a problem — then it's a problem for humans too. Or you build processes that can handle it. And that's where it gets interesting.
Task scopes that are thought too large
In many AI agent projects that don't run as hoped, I see a pattern: The agent is supposed to do too much at once. Some envision an agent replacing the entire clerk — with browser navigation, email, research, decision-making. All in one.
In my experience, that's rarely the best approach. The task scopes often become too large for a single agent.

Divide and conquer with agentic teams
What has worked better for me: Agentic teams. Each agent has its own task, its own briefing, its own context. No agent needs to do everything.
What's interesting about this: That's exactly how you reduce non-deterministic behavior. When an agent only has a small, clearly defined scope, the room for errors automatically shrinks. The decision space is limited. The impact of a wrong decision is limited.
And the larger, more expensive models? They still have their place — as orchestrators. They can coordinate the smaller specialist agents, assign the right tasks, evaluate results. But they don't have to do the detail work themselves.
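The pattern can be sketched in a few lines of Python. The agent names and stubbed model calls below are hypothetical; the point is the shape: each specialist accepts only its own task types, and the orchestrator routes without doing any detail work itself.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """A specialist with a narrow, clearly defined scope."""
    name: str
    handles: set[str]            # the only task types this agent accepts
    run: Callable[[str], str]    # stand-in for the agent's model call

# Stubbed specialists; in practice each would wrap a small fine-tuned model.
def extract_invoice(task: str) -> str:
    return f"invoice fields extracted from: {task}"

def draft_reply(task: str) -> str:
    return f"customer reply drafted for: {task}"

SPECIALISTS = [
    Agent("invoice-extractor", {"invoice"}, extract_invoice),
    Agent("support-writer", {"support"}, draft_reply),
]

def orchestrate(task_type: str, payload: str) -> str:
    """The orchestrator only routes; unknown task types fail loudly
    instead of being improvised by a generalist."""
    for agent in SPECIALISTS:
        if task_type in agent.handles:
            return agent.run(payload)
    raise ValueError(f"no specialist for task type: {task_type}")
```

Because every agent's scope is a closed set, a wrong routing decision surfaces as an explicit error rather than a plausible-looking hallucination.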
The right environment for agents
A point that's often overlooked in my view: You need to create the right environment for agents. Some imagine an AI agent simply navigating existing interfaces via browser — just like a human. That can work in certain scenarios, but I don't consider it the strongest approach.
I prefer API-based environments where agents can operate safely and efficiently: clear interfaces, defined inputs and outputs, a constrained decision space. That takes effort. But the return from automation can be considerably higher than the cost of human labor for repetitive tasks, where humans make errors just as readily.
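One way to constrain the decision space is to validate every agent output against a closed schema before anything is acted on. A minimal sketch, using a hypothetical invoice-approval interface: the agent may only choose from a fixed set of actions, and anything outside the schema is rejected instead of executed.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    """The closed set of actions the agent is allowed to take."""
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"

@dataclass(frozen=True)
class InvoiceDecision:
    invoice_id: str
    action: Action
    reason: str

def parse_decision(raw: dict) -> InvoiceDecision:
    """Validate the agent's raw output against the defined interface.
    An unknown action raises instead of being passed downstream."""
    action = Action(raw["action"])  # ValueError on anything off-schema
    return InvoiceDecision(raw["invoice_id"], action, raw.get("reason", ""))
```

The validation layer is cheap to write, and it turns non-deterministic output into a bounded risk: the worst case is a rejected response, not an unbounded side effect.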
On-premise, fast, cheap
Small specialized models have another practical advantage: They can run locally. No cloud dependency, no data privacy debates, no ongoing API costs. A 7B model fine-tuned for a specific task responds in milliseconds and costs virtually nothing to operate.
The setup that has proven itself for me: Specialized small models for the bulk of tasks, a large model as orchestrator, clear interfaces between agents. Everyone does what they do best.
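That split can be expressed as a small routing layer. The model names and endpoints below are illustrative, not real services (the local URL follows the OpenAI-compatible convention many local runtimes expose); actually sending the request is deliberately left out.

```python
import json

# Illustrative registry: local specialists handle the bulk of tasks,
# one large cloud model is reserved for orchestration.
MODELS = {
    "invoice-extraction": ("invoice-7b", "http://localhost:11434/v1/chat/completions"),
    "orchestration":      ("large-generalist", "https://api.example.com/v1/chat/completions"),
}

def build_request(role: str, prompt: str) -> tuple[str, str]:
    """Return (endpoint, JSON body) for an OpenAI-compatible chat endpoint.
    Only the interface shape is shown; no request is sent here."""
    model, endpoint = MODELS[role]
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # narrow scope plus low temperature reduces variance
    })
    return endpoint, body
```

Because the interface is identical for local and cloud endpoints, swapping a specialist model in or out is a one-line registry change rather than a rewrite.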
Conclusion
When people say Agentic AI doesn't work, they're often right — regarding the approach they're observing. Turning a large generalist loose on a complex set of tasks and hoping it sorts things out rarely works reliably in practice.
But that's not the only approach. Specialized agents in small, secured domains, coordinated by an orchestrator, with API-based interfaces instead of browser navigation — that has worked for me. And I believe it gets better and cheaper as models get smaller and more specialized.
My direction: Not one large generalist, but a team of many small specialists.