I Reverse-Engineered Claude Code and Built a Virtual CTO

I spent a weekend pulling Claude Code apart to see what was actually inside. The interesting bits were not the agent abstractions everyone talks about. The interesting bits were four primitives that compose into a full SDLC pipeline if you wire them together right.

This is a write-up of what I found, why the existing toolkits do not cover the lifecycle, and what I built from the parts.

What we found inside Claude Code

Lifecycle hooks. Three entry points matter here: SessionStart, PreCompact, and SubagentStart. Each executes a Bash command. That is enough to load project context on every session, write a handoff file before context compaction kills your state, and inject context into every subagent without each agent having to re-discover the project.
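
To make the PreCompact case concrete, here is a minimal Python sketch of a script that a hook's Bash command could call to write the handoff file. It assumes the hook hands over a JSON payload; the `session_id` and `trigger` field names, the `write_handoff` helper, and the output path are my assumptions for illustration, not code from great_cto.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_handoff(event: dict, notes: list[str], path: Path) -> Path:
    """Write a small markdown handoff file describing the current state."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [
        "# Handoff",
        f"- written: {stamp}",
        # Field names are assumed; adjust to the payload you actually receive.
        f"- session: {event.get('session_id', 'unknown')}",
        f"- trigger: {event.get('trigger', 'unknown')}",
        "",
        "## Open items",
    ]
    # Fall back to a placeholder bullet when there is nothing to hand off.
    lines += [f"- {note}" for note in notes] or ["- (none recorded)"]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

A hook command would then parse whatever it receives on stdin with `json.loads` and pass the result to `write_handoff`. Because the output is a plain markdown file, the next session can read it with nothing more than `cat`.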

The agent system. Agents are markdown files with YAML frontmatter — model, tools, max steps, skills. They inherit the parent process's filesystem access. Context arrives via hooks. They are microservices with markdown contracts, not classes with constructors.
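
The "markdown contract" idea is easy to show in code. The sketch below parses a hypothetical agent file into a typed record. The key names (`name`, `max_steps`, and so on), the tool list, and the example prompt are assumptions, and the parser handles only flat `key: value` pairs instead of full YAML.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    name: str
    model: str
    tools: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    max_steps: int = 0
    prompt: str = ""


def parse_agent(text: str) -> AgentSpec:
    """Split '---' delimited frontmatter from the markdown body.

    Only flat `key: value` pairs with comma-separated lists are handled,
    which is a simplification of real YAML.
    """
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()

    def as_list(value: str) -> list[str]:
        return [item.strip() for item in value.split(",") if item.strip()]

    return AgentSpec(
        name=meta.get("name", ""),
        model=meta.get("model", ""),
        tools=as_list(meta.get("tools", "")),
        skills=as_list(meta.get("skills", "")),
        max_steps=int(meta.get("max_steps", "0")),
        prompt=body.strip(),
    )


# Hypothetical agent file; the keys and values are illustrative.
EXAMPLE = """---
name: senior-dev
model: sonnet
tools: Read, Edit, Bash
skills: test-driven-development
max_steps: 40
---
Create a feature branch, write failing tests first, then implement.
"""

spec = parse_agent(EXAMPLE)
```

The whole contract is the frontmatter plus the body. There is no constructor to call, which is why the microservice comparison fits better than the class one.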

Skills as composition. Skills attach to agents through the frontmatter. The senior-dev agent references test-driven-development and inherits the TDD discipline automatically. Methodology becomes reusable across agents. You can write a skill once and let four agents adopt it without copying prompts.
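
One way to picture the composition is as prompt assembly at load time. This is a sketch under assumptions: the skill text is a placeholder, the `effective_prompt` helper is hypothetical, and the qa-engineer entry referencing the same skill is there only to show reuse.

```python
SKILLS = {
    # Placeholder text; a real skill would live in its own file.
    "test-driven-development": "Write a failing test, make it pass, then refactor.",
}

AGENTS = {
    "senior-dev": {
        "prompt": "Implement the task on a feature branch.",
        "skills": ["test-driven-development"],
    },
    # Illustrative only: a second agent adopting the same skill.
    "qa-engineer": {
        "prompt": "Check critical paths for regressions.",
        "skills": ["test-driven-development"],
    },
}


def effective_prompt(agent_name: str) -> str:
    """Append each referenced skill to the agent's own prompt."""
    agent = AGENTS[agent_name]
    parts = [agent["prompt"]]
    for skill_name in agent["skills"]:
        if skill_name not in SKILLS:
            raise KeyError(f"unknown skill: {skill_name}")
        parts.append(f"[skill: {skill_name}]\n{SKILLS[skill_name]}")
    return "\n\n".join(parts)
```

Editing the one skill entry changes the behavior of every agent that lists it, which is the point of keeping methodology out of the individual agent prompts.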

Model stratification. Different agents use different models:

Tier	Use case	Cost and frequency
Opus	Architecture, ADRs	Expensive, rare
Sonnet	Implementation	Sweet spot, default
Haiku	Routine work, log triage	Cheap, frequent

This reflects an architectural decision about thinking versus executing. Most code review can run on Haiku. Architecture cannot.
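
The tier table can be read as a lookup with a default. A minimal sketch, where the task labels and the `pick_tier` helper are hypothetical and only the tier assignments come from the table:

```python
from enum import Enum


class Tier(Enum):
    OPUS = "opus"
    SONNET = "sonnet"
    HAIKU = "haiku"


# Task labels are illustrative; the tier for each follows the table.
TIER_BY_TASK = {
    "architecture": Tier.OPUS,
    "adr": Tier.OPUS,
    "implementation": Tier.SONNET,
    "routine": Tier.HAIKU,
    "log-triage": Tier.HAIKU,
}


def pick_tier(task_kind: str) -> Tier:
    """Fall back to Sonnet, which the table marks as the default."""
    return TIER_BY_TASK.get(task_kind, Tier.SONNET)
```

Defaulting to the middle tier keeps an unclassified task from silently landing on the most expensive or the least capable model.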

Why gstack does not work as an SDLC

gstack gives you useful tools — /qa for browser testing, /ship for versioned PRs, /security for audits. They are well made.

What gstack does not give you is orchestration. You manually call /qa, then /review, then /ship, in the right order, remembering that security needs to run before deploy. The tool does not know what feature you are shipping or which gates apply.

The gaps:

No state transfer between skills (the QA report does not flow into the security review)
No project-type awareness (a payment gateway has different validation rules than a REST API)
No enforcement gates that prevent premature deployment
No staging layer

gstack is excellent for single-developer interactive sessions. It does not cover the full development lifecycle, and it does not pretend to.

What great_cto solves

The whole thing is built around one question:

Can you describe a feature in a single sentence, approve it twice, and get it to production?

If the answer is yes, the workflow worked. If the answer is no, something has to change.

Five problems and how great_cto addresses each one:

Problem 1 — Manual step selection causes context loss between commands. Solution — Intent mapping. Type /start "build user auth", the full pipeline launches. The user does not pick the steps.

Problem 2 — 44+ project types each need different validation rules. Solution — LLM-scored detection plus an archetype table. The right rules activate automatically. A commerce archetype merges in the regulated strict thresholds without the user knowing the names.
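
The "merges in" part is the piece worth sketching. Below, an archetype can extend others and inherit their rules. Every rule name and value, the `extends` key, and the choice of parents for commerce are invented for illustration; only the idea that commerce picks up the stricter regulated thresholds comes from the description above.

```python
ARCHETYPES = {
    # All rule names and values here are invented for illustration.
    "web-service": {"min_coverage": 70, "security_gate": False},
    "regulated": {"min_coverage": 90, "security_gate": True, "audit_log": True},
    "commerce": {"extends": ["web-service", "regulated"], "pci_scan": True},
}


def resolve(name: str) -> dict:
    """Merge parent archetypes left to right, then apply the archetype's own rules."""
    spec = ARCHETYPES[name]
    merged: dict = {}
    for parent in spec.get("extends", []):
        merged.update(resolve(parent))  # later parents override earlier ones
    merged.update({k: v for k, v in spec.items() if k != "extends"})
    return merged
```

With this ordering, `resolve("commerce")` ends up with the regulated coverage threshold and security gate plus its own `pci_scan` flag, and the user never has to name the parent archetypes.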

Problem 3 — Security review gets skipped under deadline pressure. Solution — Two mandatory gates (architecture, ship) enforced based on project type. You cannot deploy a commerce archetype without passing gate:security.
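
Enforcement comes down to refusing to proceed while a required gate is missing. A minimal sketch: the identifiers `gate:architecture` and `gate:ship` are my guess at naming by analogy with `gate:security`, and `GateError` and both helpers are hypothetical.

```python
# Gate identifiers are assumed by analogy with "gate:security".
MANDATORY_GATES = ("gate:architecture", "gate:ship")

# Only the commerce entry comes from the text; other archetypes may add gates too.
EXTRA_GATES = {"commerce": ("gate:security",)}


class GateError(RuntimeError):
    """Raised when a deploy is attempted with required gates still open."""


def required_gates(archetype: str) -> tuple[str, ...]:
    return MANDATORY_GATES + EXTRA_GATES.get(archetype, ())


def assert_deployable(archetype: str, passed: set[str]) -> None:
    """Raise GateError listing every required gate that has not passed."""
    missing = [gate for gate in required_gates(archetype) if gate not in passed]
    if missing:
        raise GateError(f"cannot deploy {archetype}: missing {', '.join(missing)}")
```

Raising an error instead of logging a warning is the design choice that matters here: under deadline pressure, a warning gets ignored and an exception does not.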

Problem 4 — Deployment failures require manual rollback and root-cause analysis. Solution — Staging validates critical paths first. Production failures trigger automatic rollback and spawn an l3-support agent that opens an incident.
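
The control flow is short enough to write down. In this sketch the three callables stand in for whatever actually deploys, rolls back, and spawns the l3-support agent. Their signatures, the `release` helper, and the status strings are assumptions.

```python
from typing import Callable


def release(
    deploy: Callable[[str], bool],
    rollback: Callable[[], None],
    open_incident: Callable[[str], None],
) -> str:
    """Return 'deployed', 'blocked-at-staging', or 'rolled-back'."""
    if not deploy("staging"):
        # Staging failed its critical paths, so production is never touched.
        return "blocked-at-staging"
    if deploy("production"):
        return "deployed"
    # Production failed after staging passed: undo first, then investigate.
    rollback()
    open_incident("production validation failed")
    return "rolled-back"
```

Rolling back before opening the incident keeps the outage short; the root-cause work can then happen against a stable system.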

Problem 5 — Context disappears after /compact or session restart. Solution — File-based memory. PROJECT.md, architecture docs, decision records, QA reports — all in markdown, all in git. Session restart shows a three-line summary on its own.
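
A SessionStart hook can produce that summary by reading the memory file and printing its first few lines. This is a sketch: the rule of taking the first three non-heading lines and the `session_summary` helper are my assumptions about one reasonable way to do it, not how great_cto builds its summary.

```python
from pathlib import Path


def session_summary(project_file: Path, max_lines: int = 3) -> str:
    """Return the first few non-empty, non-heading lines of the project file."""
    if not project_file.exists():
        return "No project file found; starting without saved context."
    kept: list[str] = []
    for line in project_file.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(stripped)
        if len(kept) == max_lines:
            break
    return "\n".join(kept)
```

Calling `session_summary(Path("PROJECT.md"))` at session start is cheap, and because the file is in git the summary survives both /compact and a fresh clone.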

Seven agents, seven roles

tech-lead           Opus     Architecture, ADRs, task breakdown
senior-dev          Sonnet   Feature branch, TDD, PR
qa-engineer         Haiku    Test plans, regression checks
security-officer    Sonnet   OWASP, secrets, dependency audit
devops              Haiku    Staging, validation, rollback
l3-support          Sonnet   Incidents, postmortems
project-auditor     Sonnet   Drift detection, gap analysis

A few details worth knowing:

tech-lead writes architecture docs with the alternatives considered and breaks work into tasks with explicit file ownership. Six months later you can run git log docs/decisions/ and reconstruct why each decision was made.

senior-dev creates a feature branch, writes failing tests (RED), implements (GREEN), refactors (REFACTOR), creates a PR with a conventional commit. The TDD loop is enforced because the agent reads the test-driven-development skill.

qa-engineer reads the actual code first instead of working from a spec sheet. Identifies critical paths. Compares performance against a baseline. Regressions of 15%+ become P1 bugs automatically.
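
The 15% rule is simple arithmetic: a path that took 200 ms at baseline and now takes 240 ms has regressed by (240 - 200) / 200 = 20%, which crosses the threshold. A sketch, assuming the metric is latency in milliseconds (the text does not say which metric is compared) and using hypothetical helper names:

```python
from typing import Optional

REGRESSION_THRESHOLD = 0.15  # 15% worse than baseline, per the rule above


def regression_ratio(baseline_ms: float, current_ms: float) -> float:
    """Relative slowdown against the baseline; negative means faster."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (current_ms - baseline_ms) / baseline_ms


def classify(baseline_ms: float, current_ms: float) -> Optional[str]:
    """Return 'P1' when the slowdown reaches the threshold, else None."""
    if regression_ratio(baseline_ms, current_ms) >= REGRESSION_THRESHOLD:
        return "P1"
    return None
```

This only works if the baseline is captured before the change, which is why the devops agent records baselines as part of its own role.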

security-officer scans for secrets, audits dependencies for CVEs, applies the right ruleset based on archetype — OWASP for web-service, PCI-DSS for commerce, HIPAA for health, SOC2 for regulated.
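
The archetype-to-ruleset mapping is a plain table lookup. In the sketch below the four pairs come straight from the sentence above; the fallback to OWASP for unlisted archetypes and the `ruleset_for` helper are assumptions.

```python
RULESETS = {
    "web-service": "OWASP",
    "commerce": "PCI-DSS",
    "health": "HIPAA",
    "regulated": "SOC2",
}


def ruleset_for(archetype: str) -> str:
    # Falling back to OWASP for unlisted archetypes is an assumption.
    return RULESETS.get(archetype, "OWASP")
```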

devops captures performance baselines, writes user-facing changelogs, handles rollbacks, spawns support agents on failure. Knows the difference between staging smoke tests and production validation.

l3-support reproduces bugs, traces code, analyzes recent changes, drafts a hypothesis, writes postmortems with timeline and prevention steps. Output gets persisted to docs/postmortems/ so future-you can grep for the same root cause.

project-auditor identifies the tech stack, finds coverage gaps, detects project drift. Runs on startup and on demand.

Three principles

File memory instead of framework state. All context lives in markdown files accessible via standard commands. No "did the agent persist state correctly?" debugging. If the file is there, the state is there.

Unix tools instead of abstractions. Agents use git, grep, awk, find. No proprietary magic. If the agent disappears tomorrow, you can still read the files with cat.

Seven agents as a hard ceiling. More agents create more failure points and more inter-agent confusion. For specialists, a 419-agent catalog (template-bridge) provides on-demand additions. The core stays at seven.

What it looks like in practice

/start "REST API for a TODO app on Node.js + SQLite"

What happens automatically:

Project type detected: rest-api (confidence 9/10)

tech-lead         → architecture docs + 6 tasks broken down
senior-dev        → 425 lines of code, 14 tests, all passing
qa-engineer       → 5 critical paths checked, 2 P2 bugs found
security-officer  → parameterized queries confirmed, 2 P2 issues
devops            → staging commands generated, release notes drafted

DECISION 1: approve architecture? [yes/no]
[approved]

DECISION 2: ship to production? [yes/no]
[approved]

Deployed. Beads tasks closed. brain.md updated.

Two approvals. Zero context switches.

Why I wrote this

Most "agent toolkit" projects on GitHub are either thin wrappers over a single LLM call or massive frameworks that try to do everything. The middle is empty. I wanted the middle.

great_cto is what fills it for me. Maybe it fills the same gap for you.

Repo: github.com/avelikiy/great_cto — MIT, 15 files, ~1600 lines (at time of writing).

Underlying platform: Claude Code.

Written by Alexander Velikiy. AI-First CTO. Builds and scales fintech and AI systems from scratch — combining deep engineering, product thinking, and AI-driven execution.
