I Reverse-Engineered Claude Code and Built a Virtual CTO From the Parts
Lifecycle hooks, model stratification, file-based memory, and seven agents that ship features with two approvals. ~1600 lines, MIT, no SaaS.

I spent a weekend pulling Claude Code apart to see what was actually inside. The interesting bits were not the agent abstractions everyone talks about. The interesting bits were four primitives that compose into a full SDLC pipeline if you wire them together right.
This is a write-up of what I found, why the existing toolkits do not cover the lifecycle, and what I built from the parts.
What I found inside Claude Code
Lifecycle hooks. Three hook events do the work here: SessionStart, PreCompact, and SubagentStart. Each runs a shell command. That is enough to load project context on every session, write a handoff file before context compaction discards your state, and inject context into every subagent so that no agent has to rediscover the project.
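To make the PreCompact idea concrete, here is a minimal sketch of the kind of script a hook command could invoke to write a handoff file. The path docs/handoff.md, the function name write_handoff, and the file layout are my assumptions for illustration; none of them come from Claude Code or from great_cto.

```python
from datetime import datetime, timezone
from pathlib import Path


def write_handoff(project_root: Path, open_tasks: list[str], notes: str) -> Path:
    """Write a markdown handoff file so state survives context compaction.

    docs/handoff.md is a hypothetical location chosen for this sketch.
    """
    handoff = project_root / "docs" / "handoff.md"
    handoff.parent.mkdir(parents=True, exist_ok=True)

    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    lines = [f"# Handoff ({stamp})", "", "## Open tasks"]
    # Fall back to a placeholder bullet when there is nothing open.
    lines += [f"- {task}" for task in open_tasks] or ["- none"]
    lines += ["", "## Notes", notes.strip() or "none"]

    handoff.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return handoff
```

Because the output is plain markdown on disk, the next session can pick it up with nothing more than a file read.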
The agent system. Agents are markdown files with YAML frontmatter — model, tools, max steps, skills. They inherit the parent process's filesystem access. Context arrives via hooks. They are microservices with markdown contracts, not classes with constructors.
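A rough sketch of what "markdown file with YAML frontmatter" means in practice: the parser below splits an agent file into its metadata and its prompt body. The example field values (name, model, tools, skills) are illustrative assumptions, and the parser handles only flat key: value pairs, which is far less than real YAML.

```python
def parse_agent_file(text: str) -> tuple[dict[str, str], str]:
    """Split an agent markdown file into (frontmatter, prompt body).

    Handles only flat `key: value` pairs; real YAML needs a real parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter: treat as plain body

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Hypothetical agent definition, made up for this example.
EXAMPLE_AGENT = """---
name: senior-dev
model: sonnet
tools: Read, Edit, Bash
skills: test-driven-development
---
You implement features on a feature branch using TDD.
"""
```

Calling parse_agent_file(EXAMPLE_AGENT) yields the metadata dictionary plus the prompt text, which is the whole "contract" the agent exposes.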
Skills as composition. Skills attach to agents through the frontmatter. The senior-dev agent references test-driven-development and inherits the TDD discipline automatically. Methodology becomes reusable across agents. You can write a skill once and let four agents adopt it without copying prompts.
Model stratification. Different agents use different models:
| Tier | Use case | Cost and frequency |
|---|---|---|
| Opus | Architecture, ADRs | Expensive, rare |
| Sonnet | Implementation | Sweet spot, default |
| Haiku | Routine work, log triage | Cheap, frequent |
This reflects an architectural decision about thinking versus executing. Most code review can run on Haiku. Architecture cannot.
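The tiering can be sketched as a simple lookup. The task category names are hypothetical labels I made up to mirror the table; the only thing taken from the text is which tier handles which kind of work and that Sonnet is the default.

```python
from enum import Enum


class Tier(Enum):
    OPUS = "opus"
    SONNET = "sonnet"
    HAIKU = "haiku"


# Hypothetical task categories, grouped the way the tier table groups them.
TIER_BY_TASK = {
    "architecture": Tier.OPUS,
    "adr": Tier.OPUS,
    "implementation": Tier.SONNET,
    "routine": Tier.HAIKU,
    "log-triage": Tier.HAIKU,
}


def pick_tier(task_kind: str) -> Tier:
    """Return the model tier for a task kind, defaulting to Sonnet."""
    return TIER_BY_TASK.get(task_kind, Tier.SONNET)
```

Falling back to Sonnet for unknown task kinds keeps the expensive tier opt-in rather than accidental.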
Why gstack does not work as an SDLC
gstack gives you useful tools — /qa for browser testing, /ship for versioned PRs, /security for audits. They are well made.
What gstack does not give you is orchestration. You manually call /qa, then /review, then /ship, in the right order, remembering that security needs to run before deploy. The tool does not know what feature you are shipping or which gates apply.
The gaps:
- No state transfer between skills (the QA report does not flow into the security review)
- No project-type awareness (a payment gateway has different validation rules than a REST API)
- No enforcement gates that prevent premature deployment
- No staging layer
gstack is excellent for single-developer interactive sessions. It does not cover the full development lifecycle, and it does not pretend to.
What great_cto solves
The whole thing is built around one question:
Can you describe a feature in a single sentence, approve it twice, and get it to production?
If the answer is yes, the workflow worked. If the answer is no, something has to change.
Five problems and how each one is solved:
Problem 1 — Manual step selection causes context loss between commands.
Solution — Intent mapping. Type /start "build user auth", the full pipeline launches. The user does not pick the steps.
Problem 2 — 44+ project types each need different validation rules.
Solution — LLM-scored detection plus an archetype table. The right rules activate automatically. A commerce archetype merges in the regulated strict thresholds without the user knowing the names.
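One way to picture the archetype table is as a chain of rule sets where a specific archetype extends a stricter parent. This is a sketch under assumptions: the "extends" field, the "base" archetype, and every threshold value below are invented for illustration, and the LLM-scored detection step is left out entirely.

```python
# Hypothetical archetype table: field names and threshold values are made up.
ARCHETYPES = {
    "base": {"extends": None, "rules": {"min_coverage": 70, "max_p1_bugs": 0}},
    "regulated": {"extends": "base", "rules": {"min_coverage": 90, "audit_log": True}},
    "commerce": {"extends": "regulated", "rules": {"pci_scan": True}},
}


def resolve_rules(archetype: str) -> dict:
    """Merge rules along the `extends` chain; the most specific value wins."""
    chain = []
    current = archetype
    while current is not None:
        chain.append(ARCHETYPES[current])
        current = chain[-1]["extends"]

    merged: dict = {}
    for entry in reversed(chain):  # apply the base first, the specific last
        merged.update(entry["rules"])
    return merged
```

Here resolve_rules("commerce") picks up the stricter coverage threshold from "regulated" without the caller ever naming it.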
Problem 3 — Security review gets skipped under deadline pressure.
Solution — Two mandatory gates (architecture, ship) enforced based on project type. You cannot deploy a commerce archetype without passing gate:security.
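Gate enforcement reduces to a set difference. In this sketch the gate:architecture and gate:ship names are my guess at a naming scheme based on the gate:security identifier above, and the per-archetype mapping is an assumption, not the real table.

```python
# Gate names follow the "gate:<name>" pattern; the mapping itself is assumed.
REQUIRED_GATES = {
    "rest-api": {"gate:architecture", "gate:ship"},
    "commerce": {"gate:architecture", "gate:security", "gate:ship"},
}


def missing_gates(archetype: str, passed: set[str]) -> set[str]:
    """Return the gates this archetype still has to pass."""
    return REQUIRED_GATES[archetype] - passed


def can_deploy(archetype: str, passed: set[str]) -> bool:
    """Deployment is allowed only when no required gate is missing."""
    return not missing_gates(archetype, passed)
```

Reporting the missing set, rather than a bare yes or no, tells the user exactly which review was skipped.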
Problem 4 — Deployment failures require manual rollback and root-cause analysis.
Solution — Staging validates critical paths first. Production failures trigger automatic rollback and spawn an l3-support agent that opens an incident.
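The staging-then-production flow can be sketched with the four steps passed in as plain callables. The function name, the return labels, and the incident message are hypothetical; the real steps would shell out to whatever deploy tooling the project uses.

```python
from typing import Callable


def deploy(
    validate_staging: Callable[[], bool],
    deploy_production: Callable[[], bool],
    rollback: Callable[[], None],
    open_incident: Callable[[str], None],
) -> str:
    """Run staging first; roll back and open an incident if production fails."""
    if not validate_staging():
        return "blocked-at-staging"  # production is never touched
    if deploy_production():
        return "deployed"
    rollback()
    open_incident("production deploy failed")  # hand off to a support agent
    return "rolled-back"
```

Keeping the steps injectable makes the control flow testable without a real environment.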
Problem 5 — Context disappears after /compact or session restart.
Solution — File-based memory. PROJECT.md, architecture docs, decision records, QA reports — all in markdown, all in git. Session restart shows a three-line summary on its own.
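A three-line restart summary can be derived from the files alone. This is a sketch under assumptions: I picked the three lines myself (project title, number of decision records, latest record), so the real summary may show different fields. Only PROJECT.md and docs/decisions/ are names taken from this write-up.

```python
from pathlib import Path


def session_summary(root: Path) -> str:
    """Build a three-line summary from markdown files in the repo."""
    title = "unknown project"
    project_md = root / "PROJECT.md"
    if project_md.exists():
        for line in project_md.read_text(encoding="utf-8").splitlines():
            if line.startswith("# "):  # first top-level heading is the title
                title = line[2:].strip()
                break

    decisions_dir = root / "docs" / "decisions"
    decisions = sorted(decisions_dir.glob("*.md")) if decisions_dir.is_dir() else []
    latest = decisions[-1].name if decisions else "none"

    return "\n".join(
        [
            f"Project: {title}",
            f"Decision records: {len(decisions)}",
            f"Latest decision: {latest}",
        ]
    )
```

There is no state to restore here, only files to read, which is the point of keeping memory in git.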
Seven agents, seven roles
| Agent | Model | Role |
|---|---|---|
| tech-lead | Opus | Architecture, ADRs, task breakdown |
| senior-dev | Sonnet | Feature branch, TDD, PR |
| qa-engineer | Haiku | Test plans, regression checks |
| security-officer | Sonnet | OWASP, secrets, dependency audit |
| devops | Haiku | Staging, validation, rollback |
| l3-support | Sonnet | Incidents, postmortems |
| project-auditor | Sonnet | Drift detection, gap analysis |
A few details worth knowing:
tech-lead writes architecture docs with alternatives considered and breaks work into tasks with explicit file ownership. Six months later you can run git log docs/decisions/ and reconstruct why each decision was made.
senior-dev creates a feature branch, writes failing tests (RED), implements (GREEN), refactors (REFACTOR), creates a PR with a conventional commit. The TDD loop is enforced because the agent reads the test-driven-development skill.
qa-engineer reads the actual code first instead of working from a spec sheet. Identifies critical paths. Compares performance against a baseline. Regressions of 15%+ become P1 bugs automatically.
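The 15% rule is simple arithmetic against the baseline. In this sketch I assume the metric is a latency in milliseconds, so higher is worse; the function names are hypothetical.

```python
from typing import Optional

REGRESSION_THRESHOLD = 0.15  # the 15% cut-off from the text


def regression_ratio(baseline_ms: float, current_ms: float) -> float:
    """Relative slowdown versus the baseline (0.2 means 20% slower)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (current_ms - baseline_ms) / baseline_ms


def classify(baseline_ms: float, current_ms: float) -> Optional[str]:
    """Return "P1" when the slowdown reaches the threshold, else None."""
    if regression_ratio(baseline_ms, current_ms) >= REGRESSION_THRESHOLD:
        return "P1"
    return None
```

For example, a path that went from 100 ms to 120 ms is 20% slower and gets flagged, while 100 ms to 110 ms does not.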
security-officer scans for secrets, audits dependencies for CVEs, applies the right ruleset based on archetype — OWASP for web-service, PCI-DSS for commerce, HIPAA for health, SOC2 for regulated.
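Ruleset selection is a lookup keyed by archetype, and a first-pass secrets scan is a handful of regular expressions. The mapping below is taken from the sentence above; the two patterns (the well-known AWS access key ID prefix and a PEM private key header) are my own minimal examples, not the agent's actual checks.

```python
import re

# Archetype-to-ruleset mapping as described in the text.
RULESET_BY_ARCHETYPE = {
    "web-service": "OWASP",
    "commerce": "PCI-DSS",
    "health": "HIPAA",
    "regulated": "SOC2",
}

# Illustrative patterns only; a real scanner needs many more.
SECRET_PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan_for_secrets(text: str) -> list[str]:
    """Return the names of every pattern that matches somewhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```

A non-empty result from scan_for_secrets is a reason to stop and look, not proof of a leak.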
devops captures performance baselines, writes user-facing changelogs, handles rollbacks, spawns support agents on failure. Knows the difference between staging smoke tests and production validation.
l3-support reproduces bugs, traces code, analyzes recent changes, drafts a hypothesis, writes postmortems with timeline and prevention steps. Output gets persisted to docs/postmortems/ so future-you can grep for the same root cause.
project-auditor identifies the tech stack, finds coverage gaps, detects project drift. Runs on startup and on demand.
Three principles
File memory instead of framework state. All context lives in markdown files accessible via standard commands. No "did the agent persist state correctly?" debugging. If the file is there, the state is there.
Unix tools instead of abstractions. Agents use git, grep, awk, find. No proprietary magic. If the agent disappears tomorrow, you can still read the files with cat.
Seven agents as a hard ceiling. More agents create more failure points and more inter-agent confusion. For specialists, a 419-agent catalog (template-bridge) provides on-demand additions. The core stays at seven.
What it looks like in practice
/start "REST API for a TODO app on Node.js + SQLite"
What happens automatically:
Project type detected: rest-api (confidence 9/10)
tech-lead → architecture docs + 6 tasks broken down
senior-dev → 425 lines of code, 14 tests, all passing
qa-engineer → 5 critical paths checked, 2 P2 bugs found
security-officer → parameterized queries confirmed, 2 P2 issues
devops → staging commands generated, release notes drafted
DECISION 1: approve architecture? [yes/no]
[approved]
DECISION 2: ship to production? [yes/no]
[approved]
Deployed. Beads tasks closed. brain.md updated.
Two approvals. Zero context switches.
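The shape of that run, reduced to its control flow, looks roughly like the sketch below. The ordering of agents around the two approvals is my simplification, and run_pipeline and the log strings are hypothetical; real agents would do work where this sketch appends a line.

```python
from typing import Callable

BUILD_AGENTS = ("senior-dev", "qa-engineer", "security-officer", "devops")


def run_pipeline(feature: str, approve: Callable[[str], bool]) -> list[str]:
    """Run the pipeline, pausing only at the two human approvals."""
    log = [f"tech-lead: plan for {feature!r}"]
    if not approve("approve architecture?"):
        return log + ["stopped: architecture rejected"]

    for agent in BUILD_AGENTS:
        log.append(f"{agent}: done")  # placeholder for the agent's real work

    if not approve("ship to production?"):
        return log + ["stopped: ship rejected"]
    return log + ["deployed"]
```

The approve callback is the only place a human is consulted, which is what keeps the count at two.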
Why I wrote this
Most "agent toolkit" projects on GitHub are either thin wrappers over a single LLM call or massive frameworks that try to do everything. The middle is empty. I wanted the middle.
great_cto is what fills it for me. Maybe it fills the same gap for you.
Repo: github.com/avelikiy/great_cto — MIT, 15 files, ~1600 lines (at time of writing).
Underlying platform: Claude Code.
Written by Alexander Velikiy. AI-First CTO. Builds and scales fintech and AI systems from scratch — combining deep engineering, product thinking, and AI-driven execution.
