The wrong default
Most engineers who adopt AI tooling settle on one model and use it for everything. This is understandable — onboarding to a tool takes time, and switching costs feel real. It is also wrong.
The performance difference between the right model and the wrong model for a given task is not 10 or 20 percent. For some tasks it is the difference between an output you can use and one you have to throw away. A heterogeneous stack, where each task goes to the model best suited to it, takes an extra week to set up and saves hours every week after.
This is the model selection heuristic I have landed on after two years of production AI engineering.
Claude for code and careful reasoning
Claude — specifically Claude Sonnet and Opus — is my default for anything that involves code generation, code review, or reasoning about system behavior.
The property that distinguishes Claude for code is instruction adherence. When I say "only modify files in this directory" or "do not change the public interface," Claude follows those constraints more reliably than the alternatives. For agentic workflows where the cost of scope creep is high, this matters more than raw capability.
Claude is also the strongest model I have used for structured reasoning tasks: cost-benefit analysis of architectural decisions, evaluation of competing approaches, identifying the non-obvious failure mode in a design. The thinking is visible and followable.
Where Claude is weaker: web-based research tasks, tasks requiring real-time information, and tasks where you want to hand the model a large pile of documents wholesale rather than summarize them in the prompt.
Gemini for long-context and document-heavy work
Gemini 2.0 Pro has a context window of roughly two million tokens. This is not a footnote: it fundamentally changes which tasks are tractable.
I use Gemini when the input is large: a full codebase that needs to be analyzed for patterns, a long set of specifications that need to be reconciled, a collection of documents that need to be synthesized. With Claude, I would need to chunk the input, lose cross-document connections, and reassemble the output. With Gemini, I paste everything in.
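Whether an input is "large" in this sense can be checked roughly before picking a tool. The sketch below is only an illustration: the four-characters-per-token ratio is a common rule of thumb for English text, not an exact tokenizer, and the names `estimate_tokens` and `plan_input` and the window sizes passed in are hypothetical.

```python
import math

# Assumption: about 4 characters per token for English prose.
# Real tokenizers vary by model and by content (code, non-English text).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def plan_input(documents: list[str], window_tokens: int,
               reserve_tokens: int = 0) -> str:
    """Say whether the documents fit in one prompt.

    window_tokens is the model's context window (caller-supplied).
    reserve_tokens is headroom kept for instructions and the reply.
    """
    total = sum(estimate_tokens(doc) for doc in documents)
    budget = window_tokens - reserve_tokens
    if total <= budget:
        return "single_prompt"
    return "chunk_or_long_context"
```

If the result is `chunk_or_long_context` for the model you had in mind, that is the signal to either chunk the input or move the task to a model with a larger window.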
Gemini's multimodal capabilities also make it the right choice for tasks that involve images, diagrams, or PDFs. Analyzing a system architecture diagram, extracting data from a scanned document, reviewing a set of UI mockups — Gemini handles these naturally.
The tradeoff: Gemini's instruction adherence is less consistent than Claude's. For open-ended research tasks where you want to throw material at a model and get synthesis back, this does not matter. For precise engineering tasks with specific constraints, it does.
Manus for web research and autonomous browsing
Manus is a different category. It is not a language model you prompt — it is an agent that uses a browser. The distinction matters.
The tasks I give to Manus are ones where the answer requires navigating the current web: finding the latest documentation for an API that changed three months ago, researching what competitors have built, pulling pricing data from multiple vendor pages, collecting examples of how other engineers have solved a specific problem.
For all of these, a language model with a training cutoff is the wrong tool. Manus goes to the source.
What Manus is not: a coding tool. I have seen engineers try to use it for development tasks and get mediocre results. The web agent capability is the value; the coding capability is secondary. Use it where the web is the database.
GPT-4o for speed and conversation
GPT-4o is faster than Claude Opus or Gemini 2.0 Pro and cheaper to run. For tasks where trading some quality for speed is acceptable (quick sanity checks, exploring a space of options, initial drafts), GPT-4o is often the right pick.
It is also the best model I have used for back-and-forth conversational exploration. When I am thinking out loud about a design problem and want a model to push back and ask questions, GPT-4o's conversational fluency makes the exchange feel natural in a way that the more deliberate Claude responses do not.
I use GPT-4o for quick lookups, exploratory conversations, initial drafts I intend to refine, and summaries of documents small enough that they do not need Gemini's context window.
The selection heuristic
I work through this mentally before starting any AI-assisted task:
- Does the task require current web information? → Manus
- Is the input too large for a normal context window? → Gemini
- Does the task involve precise code generation or constrained modification? → Claude
- Is this a quick exploration where speed matters more than depth? → GPT-4o
Most tasks land in Claude or GPT-4o. The Manus and Gemini selections are more specific but pay disproportionate dividends when they are the right pick.
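The four questions above amount to an ordered check where the first match wins. Here is a minimal sketch of that ordering, as an illustration rather than a real tool: the `Task` fields, the `pick_tool` name, the 200,000-token cutoff for a "normal" window, and the fallback to Claude when nothing matches are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Tool(Enum):
    MANUS = "Manus"
    GEMINI = "Gemini"
    CLAUDE = "Claude"
    GPT_4O = "GPT-4o"


@dataclass
class Task:
    needs_current_web: bool = False   # answer lives on the live web
    input_tokens: int = 0             # estimated size of the input
    constrained_code: bool = False    # precise or constrained code changes
    speed_over_depth: bool = False    # quick exploration


# Assumed cutoff for "a normal context window"; the text gives no number.
NORMAL_WINDOW_TOKENS = 200_000


def pick_tool(task: Task, window: int = NORMAL_WINDOW_TOKENS) -> Tool:
    """Apply the four questions in order; the first match wins."""
    if task.needs_current_web:
        return Tool.MANUS
    if task.input_tokens > window:
        return Tool.GEMINI
    if task.constrained_code:
        return Tool.CLAUDE
    if task.speed_over_depth:
        return Tool.GPT_4O
    # Assumption: no question matched, so fall back to Claude,
    # the stated default for code and reasoning work.
    return Tool.CLAUDE
```

The order matters: a task that needs current web data goes to Manus even if it also involves code, and an oversized input goes to Gemini before the code question is asked.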
What this costs
Running four separate tools means four subscriptions, four interfaces, and the context-switching overhead of moving between them. For a solo engineer, the subscriptions alone come to roughly $80 to $150 per month.
I track this against the alternative: the time cost of using a suboptimal model for every task. The breakeven is not a close call.
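The breakeven is simple arithmetic: monthly subscription cost divided by what an hour of your time is worth. The sketch below assumes a $100 hourly rate purely for illustration; that figure and the function name `breakeven_hours` are not from my own tracking, so substitute your own rate.

```python
def breakeven_hours(monthly_cost: float, hourly_rate: float) -> float:
    """Hours of saved time per month needed to cover the subscriptions."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return monthly_cost / hourly_rate


# Assumption: an engineer's time valued at $100 per hour.
HOURLY_RATE = 100.0

# The two ends of the $80 to $150 monthly range quoted above.
for cost in (80.0, 150.0):
    hours = breakeven_hours(cost, HOURLY_RATE)
    print(f"${cost:.0f}/month breaks even at {hours:.1f} hours saved per month")
```

Under that assumed rate, the stack pays for itself at between 0.8 and 1.5 hours saved per month.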
The practical setup
I keep four browser tabs pinned. The selection heuristic runs in seconds once it is habit. The main cost is the first month of building the habit — fighting the urge to reach for the familiar model when a different one is clearly better.
After a month it is automatic.