Evolution: ship cycles, semver, and deprecation
Skills evolve. The audit-driven hardening cycle that produced karpathy-wiki v2.2 is the same cycle this repo expects authors to use for their own skills. This page covers the cycle, how to version a skill, how to deprecate a public surface without breaking cached users, and the license-of-skills nuance.
The audit cycle
Source: LESSONS 3 (the brainstorming-spec-plan-execute pipeline as a gestalt); LESSONS 7 (recommendations).
The cycle has five phases:
- Audit. Read every line of the SKILL.md, every script, every test fixture. Produce a numbered findings list with severity (BLOCKER / HIGH / MEDIUM / LOW) and an explicit “things working well” + “things deliberately not covered” section. Defer 30-50 percent of findings to a future audit; do not try to fix everything in one ship.
- Spec. Brainstorm the architectural shape. Look for findings that collapse into one fix (the v2.2 architectural cut to delete sources/ collapsed three audit findings into one cut). The spec doc names the architectural decisions and the per-finding plan.
- Plan. Convert the spec into bite-sized implementation tasks. Each task has a verbatim test body, BEFORE / AFTER blocks, a commit message. The plan is “no placeholders” per superpowers writing-plans. The v2.2 plan was 2,029 lines for 13 tasks.
- Execute. Dispatch one implementer subagent per task. Two-stage review (spec compliance, then code quality) on each commit. Fix-up commits as needed (reviewer fix-up rate stays in 25-40 percent for a healthy ship).
- Next audit. The cycle repeats. The “things deliberately not covered” list from the previous audit becomes the seed for the next audit’s scope.
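The audit phase's findings list and its deferral rule can be sketched as a small data structure. This is a minimal illustration, not the repo's tooling: the names Finding, Severity, and triage are hypothetical, and deferring the least severe findings first is an assumption (the text above only says to defer 30-50 percent, not which ones).

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Severity labels from the audit phase, ordered so higher means worse."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    BLOCKER = 4


@dataclass(frozen=True)
class Finding:
    number: int          # position in the numbered findings list
    severity: Severity
    summary: str


def triage(findings, defer_fraction=0.4):
    """Split findings into (fix_now, deferred).

    Assumption: the least severe findings are the ones deferred.
    """
    if not 0.0 <= defer_fraction <= 1.0:
        raise ValueError("defer_fraction must be between 0 and 1")
    # Most severe first; ties keep their audit numbering.
    ranked = sorted(findings, key=lambda f: (-f.severity, f.number))
    defer_count = round(len(ranked) * defer_fraction)
    split = len(ranked) - defer_count
    return ranked[:split], ranked[split:]
```

The deferred half is what would feed the “things deliberately not covered” list that seeds the next audit.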
Versioning policy
Source: REVIEWER G3.
Semantic versioning for skills, with one nuance the spec does not name:
- Major (X.0.0). A breaking change to the public surface. Includes:
  - Frontmatter contract change (a removed field, a changed required-field set).
  - Script API change (a renamed function, a removed flag).
  - Description string change. The description is the activation contract. Changing it can break implicit triggering for users whose prompts matched the old description and not the new one. Conceptually, this is a major shift even if no other surface changed. Document the trigger differences explicitly.
- Minor (0.X.0). A non-breaking addition. New frontmatter field with a default. New script subcommand. New body section that does not change existing behavior.
- Patch (0.0.X). Bug fix. No surface change. Prose tightening, fixture corrections, internal refactor.
Karpathy-wiki uses 0.X.Y versioning because the skill is still pre-1.0 (pre-stable-API). v2.2 was a 0.2.0 → 0.2.2 bump (two patches over the v2 stable surface). When the skill reaches a stable API, it will move to 1.0.0 and start applying the major/minor/patch semantics strictly.
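The policy above can be read as a lookup from kinds of change to the bump they require. The sketch below is one hedged way to encode it; the Change enum and both function names are hypothetical, and next_version applies the strict post-1.0 semantics rather than the pre-1.0 numbering just described.

```python
from enum import Enum, auto


class Change(Enum):
    FRONTMATTER_FIELD_REMOVED = auto()
    REQUIRED_FIELDS_CHANGED = auto()
    SCRIPT_API_CHANGED = auto()        # renamed function, removed flag
    DESCRIPTION_CHANGED = auto()       # the activation contract
    FRONTMATTER_FIELD_ADDED_WITH_DEFAULT = auto()
    SCRIPT_SUBCOMMAND_ADDED = auto()
    BODY_SECTION_ADDED = auto()
    BUG_FIX = auto()
    PROSE_TIGHTENED = auto()
    INTERNAL_REFACTOR = auto()


MAJOR_CHANGES = {
    Change.FRONTMATTER_FIELD_REMOVED,
    Change.REQUIRED_FIELDS_CHANGED,
    Change.SCRIPT_API_CHANGED,
    Change.DESCRIPTION_CHANGED,
}
MINOR_CHANGES = {
    Change.FRONTMATTER_FIELD_ADDED_WITH_DEFAULT,
    Change.SCRIPT_SUBCOMMAND_ADDED,
    Change.BODY_SECTION_ADDED,
}


def required_bump(changes):
    """Return 'major', 'minor', or 'patch' for a set of changes in one ship."""
    changes = set(changes)
    if not changes:
        raise ValueError("a release needs at least one change")
    if changes & MAJOR_CHANGES:
        return "major"
    if changes & MINOR_CHANGES:
        return "minor"
    return "patch"


def next_version(version, bump):
    """Apply a bump to an 'X.Y.Z' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump: {bump!r}")
```

Note that a description-only edit still resolves to a major bump here, which is the nuance this section adds on top of ordinary semver.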
Deprecation
When you remove a public surface (a frontmatter field, a script flag, a SKILL.md section), users with cached versions of the previous surface may break. Three strategies:
- Soft deprecation. Add a deprecation warning in the new version while keeping the old surface working. The next major version removes the deprecated surface entirely. Pattern: log a warning when the deprecated surface is used; document the warning and the removal timeline in the changelog.
- Hard deprecation. Remove the surface in a major version with a migration script. Karpathy-wiki v2.2 used this pattern for the sources/ directory deletion; the migration script wiki-migrate-v2.2.sh (177 lines, six functions, with a --dry-run flag) handled the live state changes (LESSONS 8.4).
- Versioned-skill coexistence. Ship both the old and the new skill side by side. Users can pick. Heavier maintenance burden; usually only justified for bridge ships.
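For the soft-deprecation pattern, a minimal sketch of "keep the old surface working, but warn" might look like the following. The helper resolve_option and the option names in the usage note are hypothetical; only the standard-library warnings module is assumed.

```python
import warnings


def resolve_option(options, new_name, old_name, removed_in):
    """Read an option, still honoring its deprecated name.

    The new name wins when both are present; using the old name always warns.
    """
    if old_name in options:
        warnings.warn(
            f"'{old_name}' is deprecated and will be removed in {removed_in}; "
            f"use '{new_name}' instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
        if new_name not in options:
            return options[old_name]
    return options.get(new_name)
```

A call such as resolve_option({"src_dir": "notes"}, "source_dir", "src_dir", "1.0.0") returns "notes" and emits one DeprecationWarning; the same message and removal version would then be repeated in the changelog.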
Reviewer fix-up rate as a quality signal
Source: LESSONS 2.6; REVIEWER verification table.
The v2.2 ship produced 19 commits on the v2-rewrite branch. Five were reviewer-driven fix-ups: 42b24bf, 697318a, 3dfc26b, 0e0f815, ff12716. That is 5 out of 19, or approximately 26.3 percent.
Empirical anchor:
- Less than 5 percent reviewer fix-ups means the reviewer is rubber-stamping. Either the plans are extremely good (rare) or the reviewer is not actually reading the code. Investigate.
- More than 50 percent reviewer fix-ups means the plans are too vague. Each implementer is filling in design choices the plan should have specified. Tighten the plans.
- 25 to 40 percent is healthy. The reviewer is catching real bugs without redoing the implementer’s design work.
REVIEWER corrected the analyzer’s earlier 28 percent figure to 26.3 percent (5 of 19, not 5 of 18). Both numbers are within the healthy band; the corrected number is the one to cite.
This is one ship of evidence. The pattern needs more samples before it becomes a load-bearing claim. The qualitative observations (fix-ups caught real bugs, none caused rework) hold within this sample.
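The arithmetic and the bands above can be checked with a few lines. This is a sketch with hypothetical function names; the "unclassified" label for rates between the named bands (5 to 25 and 40 to 50 percent) is an assumption, since the text does not characterize those ranges.

```python
def fixup_rate(fixup_commits, total_commits):
    """Reviewer-driven fix-up commits as a percentage of all commits in a ship."""
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    if not 0 <= fixup_commits <= total_commits:
        raise ValueError("fixup_commits must be between 0 and total_commits")
    return 100.0 * fixup_commits / total_commits


def classify_rate(rate_percent):
    """Map a fix-up rate onto the bands described above."""
    if rate_percent < 5:
        return "rubber-stamping"
    if rate_percent > 50:
        return "plans too vague"
    if 25 <= rate_percent <= 40:
        return "healthy"
    return "unclassified"  # assumption: the text leaves these ranges open


rate = fixup_rate(5, 19)
print(round(rate, 1), classify_rate(rate))  # 26.3 healthy
```

Running the same function on 5 of 18 gives roughly 27.8, which rounds to the earlier 28 percent figure and lands in the same band.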
License-of-skills nuance
Source: REVIEWER M6; license verification across the ecosystem.
The skill ecosystem is mixed-license. Authors should know what they are inheriting and what they are publishing.
- agentskills/agentskills (the spec org repo). Apache 2.0. The reference implementation has explicit patent grant.
- anthropics/skills. Mixed-license. Many skills are Apache 2.0; the document skills (pdf, docx, pptx, xlsx) are explicitly proprietary (“Source-available, not open source,” per the repo’s THIRD_PARTY_NOTICES). The earlier characterization of the repo as “MIT-licensed” was wrong (corrected by REVIEWER recommendation 8).
- obra/superpowers. Apache 2.0.
- karpathy-wiki. MIT. The author chose MIT for that repo; this is a per-repo choice.
- toolboxmd/building-agentskills (this repo). Apache 2.0. The choice is deliberate: Apache 2.0’s explicit patent grant matches agentskills/agentskills’s posture and eases cross-pollination. This repo is the ecosystem reference; it follows agentskills/agentskills’s lead rather than mirroring karpathy-wiki’s MIT choice.
Whichever license you choose, ship a LICENSE file and reference it in your plugin.json license field.
What this repo expects from contributors
Per docs/12-update-mechanism.md: case studies, per-ship retrospectives, reader-submitted issues. The audit cycle is not strictly required for every contribution (a small typo fix does not need a brainstorming pass), but for non-trivial ships it is the proven pattern.
Sources
- LESSONS 2.6 (the reviewer fix-up rate as quality signal; corrected to 26.3 percent in REVIEWER).
- LESSONS 3 (the brainstorming-spec-plan-execute pipeline as gestalt).
- REVIEWER G3 (description-string change as conceptual major version).
- REVIEWER M6 (the license-of-skills nuance; Apache 2.0 patent grant rationale).
- KP-LICENSE (for contrast only; karpathy-wiki is MIT, this repo is Apache 2.0).
- docs/12-update-mechanism.md, case-studies/2026-04-25-karpathy-wiki-v2.2.md.