Coordinating Multiple AI Models
I established tri-model routing on Vandoko in late February 2026. It is now standard across all my active projects. The basic idea is simple: different models are better at different things. The execution took some work to get right.
The Model Assignments
Claude Opus 4.6 handles orchestration, multi-step implementation, architecture decisions, and testing. It is the primary model for most development work. When a task requires holding a lot of context across multiple files and making judgment calls about structure, Claude is the right tool.
Codex (gpt-5.3-codex) handles backend analysis, architecture review, and debugging. It is particularly good at reading a backend codebase and identifying where something went wrong. I invoke it through the MCP integration or the Codex CLI depending on context.
Gemini 2.5 Pro handles frontend UI and UX design, shadcn registry research, and visual polish decisions. There is an important constraint here: the Gemini 2.5 Pro MCP server is broken, returning empty content due to a thinking-token bug. I always use the CLI for Gemini Pro: gemini -m gemini-2.5-pro -p "prompt". Gemini 2.5 Flash works through MCP and handles quick lookups.
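The Pro-via-CLI constraint is easy to encode in a thin wrapper so it never gets forgotten mid-session. This is a sketch under my own assumptions: the function names are mine, and it only shows the shape of the CLI invocation, not the MCP path that Flash uses.

```python
import subprocess

# The Gemini 2.5 Pro MCP server returns empty content (thinking-token
# bug), so Pro requests must go through the CLI. Flash is fine over MCP.

def gemini_pro_cmd(prompt: str) -> list[str]:
    """Build the CLI invocation for Gemini 2.5 Pro (never MCP)."""
    return ["gemini", "-m", "gemini-2.5-pro", "-p", prompt]

def run_gemini_pro(prompt: str) -> str:
    # Shells out to the gemini CLI; requires it on PATH.
    result = subprocess.run(
        gemini_pro_cmd(prompt), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Keeping the command construction in one place means the "never Pro through MCP" rule lives in code rather than in my memory.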
How Context Stays Consistent
All three models share the same Obsidian vault. Each project has an ai-memory.md file that any model can read and write. Cross-project patterns live in a Knowledge/ directory. Attribution is tracked through a last-updated-by frontmatter field.
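For illustration, an ai-memory.md file opens with frontmatter along these lines. The last-updated-by field is the real attribution mechanism described above; the other field and the body text are placeholders I am inventing for the example, not the actual schema.

```markdown
---
project: vandoko
last-updated-by: codex
---

Findings, decisions, and open questions from each session go below
the frontmatter, where any of the three models can read them.
```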
This means when Codex does a debugging session on the NestJS backend and documents a finding, Claude can read that finding in the next session without me summarizing it manually. The vault is the shared memory layer.
The CLI scripts in Tool Bag handle environment setup. start-session.ps1 loads the right MCP profile for the project. handoff.ps1 generates a JSON package that can be passed to any model to pick up where the last session left off.
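A handoff package along these lines would carry enough for any model to resume. To be clear, the field names and values here are illustrative guesses at the shape, not the real schema that handoff.ps1 emits.

```json
{
  "project": "vandoko",
  "previous_model": "claude-opus",
  "summary": "Auth guard fixed; refresh-token tests still failing",
  "open_items": ["refresh token expiry test", "rate limit config"],
  "memory_file": "Projects/vandoko/ai-memory.md"
}
```

The useful property is that the package points back into the vault rather than duplicating it, so the receiving model reads current state instead of a stale summary.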
When to Use Which
The routing rules emerged from making the wrong call and noticing the cost.
Sending a complex multi-file refactor to Gemini Flash is slow and produces worse results than Claude Opus. Sending a "what does this API response structure look like" question to Claude Opus is overkill when Gemini Flash handles it in seconds for a fraction of the cost.
Frontend UI decisions that involve Tailwind class choices, component animation timing, or visual hierarchy are better with Gemini Pro. It thinks about these problems differently than Claude does, and the outputs look better. Backend schema design, service layer architecture, or debugging a type error across a monorepo goes to Claude or Codex.
The hard rule: never use Gemini 2.5 Pro through MCP. Always the CLI. This took one broken session to learn, and I documented it immediately.
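The rules above reduce to a small lookup table. A minimal sketch, where the task labels and model identifiers are my shorthand for this post, not anything the tooling actually reads:

```python
# Routing table for the tri-model setup. Unknown task types default to
# the primary model, since heavy reasoning work is the common case.
ROUTES = {
    "multi_file_refactor": "claude-opus",
    "architecture": "claude-opus",
    "backend_debugging": "codex",
    "frontend_ui": "gemini-2.5-pro",    # CLI only, never MCP
    "quick_lookup": "gemini-2.5-flash", # MCP works fine here
}

def route(task_type: str) -> str:
    """Pick a model for a task, defaulting to the orchestrator."""
    return ROUTES.get(task_type, "claude-opus")
```

The default matters: when I am unsure, falling back to Claude costs more but fails less, which matches the "wrong call, notice the cost" pattern that produced these rules.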
Tool Bag as Infrastructure
The routing setup lives in Tool Bag, not in any individual project. MCP profiles define which servers each project loads. The vandoko.json profile loads different servers than the martech.json profile. Each profile is a JSON file that maps server names to their configurations.
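A profile file might look roughly like this. The server names, commands, and config keys below are invented for illustration; the real profiles map whatever servers each project actually needs.

```json
{
  "name": "vandoko",
  "servers": {
    "obsidian-vault": { "command": "node", "args": ["vault-server.js"] },
    "gemini-flash": { "command": "npx", "args": ["gemini-mcp"] }
  }
}
```

Because every project resolves its servers through a profile like this, swapping or adding a server is one edit in Tool Bag rather than one edit per repo.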
This separation keeps project repos clean and makes it easy to add a new MCP server to all projects at once by updating the profile in Tool Bag.
The benchmark work I did in March 2026 ran against the skills in Tool Bag. After fixing the P0 and P1 issues, the portfolio of 41 active skills scored an 88.7 average across six dimensions. That infrastructure pays back across every project that uses it.