From seed to DNA

How a model becomes immutable.

The full journey of a Super Skill: from the moment you plant the domain to the moment it graduates with a cryptographic DNA you can verify forever. This is what happens between “I want an expert in X” and “here is your permanently owned, domain-deep, air-gapped shard of knowledge.”


The Journey

The Seed

Day 0 — you plant the domain

A user picks a domain on /start: 'Linux Kernel Engineering.' 'AI Model Engineer.' 'Internal knowledge of my farm.' One sentence is enough. The pipeline wakes up and assigns the run a unique run_id, which becomes part of the DNA.

Gathering Soil

Hours 0-12 — autonomous agents build the corpus

The teacher probe fires first: it interrogates DeepSeek-R1 about every subdomain in the ontology. Paper scouts query arXiv. Repo scouts pull from known GitHub sources. Doc scouts scrape framework docs. KICE runs its 7-layer extraction on everything that arrives. The corpus grows from zero to thousands of structured reasoning examples before the human has had lunch.
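To make "paper scouts query arXiv" concrete, here is a minimal sketch of the query a scout might issue against arXiv's public Atom API (`export.arxiv.org/api/query`, with its documented `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters). The helper name and the one-subdomain-per-query design are assumptions for illustration, not the pipeline's actual scout code, and the sketch only builds the URL without fetching it.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def arxiv_query_url(subdomain: str, max_results: int = 50) -> str:
    """Build an arXiv API query URL for one ontology subdomain (hypothetical helper)."""
    params = {
        # all: searches every field; the quotes keep the phrase together.
        "search_query": f'all:"{subdomain}"',
        "start": 0,
        "max_results": max_results,
        # Newest submissions first, so repeated scout runs surface fresh work.
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = arxiv_query_url("eBPF verifier", max_results=5)
```

The returned Atom feed would then be parsed and handed to extraction; that step is not shown.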

Watering with Proprietary Data

Day 1+ — the user adds esoteric sources when ready

POST /corpus/ingest accepts text, URLs, and internal docs: whatever makes this domain unique to the user. Each upload triggers KICE re-extraction and increments the corpus version. The pipeline never blocks on human input. This user-supplied (Mode 2) data is what makes the resulting model truly owned: it knows things no public model has ever seen.
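A minimal sketch of a client call to the ingest route, using only the standard library. The route `POST /corpus/ingest` comes from this page; the host, the JSON field names (`run_id`, `text`, `url`), and the sample `run_id` are hypothetical, since the real request schema isn't documented here. The request is built but not sent.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical pipeline address

def build_ingest_request(run_id: str, text: Optional[str] = None, url: Optional[str] = None) -> Request:
    """Build (but don't send) a POST /corpus/ingest request with a JSON body.

    The field names run_id, text, and url are hypothetical; the real schema may differ.
    """
    if not (text or url):
        raise ValueError("provide text or a URL to ingest")
    payload = {"run_id": run_id}
    if text:
        payload["text"] = text
    if url:
        payload["url"] = url
    return Request(
        f"{BASE_URL}/corpus/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ingest_request("run-1234", text="North field irrigation runs at 5 a.m. in July.")
# urllib.request.urlopen(req) would send it; the response format is not modeled here.
```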

Photosynthesis

Days 1-7 — the student absorbs teacher reasoning

The synthesizer takes each KICE extraction example and sends its reasoning prompt to the teacher. The teacher responds with <think> blocks of structured, step-by-step reasoning. These become training examples of the form (question, oracle docs, distractors, CoT steps, answer). This is Symbolic Chain-of-Thought Distillation combined with Retrieval-Augmented Fine-Tuning (RAFT): the student absorbs how to reason, not just what the answers are.
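The step from teacher response to training example can be sketched as follows. The five-field record mirrors the tuple described above; splitting the `<think>` block into one step per non-empty line and treating everything after `</think>` as the answer are assumptions made for illustration, not the synthesizer's documented behavior.

```python
import re
from dataclasses import dataclass

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

@dataclass
class TrainingExample:
    question: str
    oracle_docs: list[str]
    distractors: list[str]
    cot_steps: list[str]
    answer: str

def to_training_example(question: str, oracle_docs: list[str], distractors: list[str],
                        teacher_output: str) -> TrainingExample:
    """Turn one teacher response into a (question, oracle docs, distractors, CoT steps, answer) record."""
    match = THINK_RE.search(teacher_output)
    if match is None:
        raise ValueError("teacher output has no <think> block")
    # Assumption: each non-empty line inside <think> is one reasoning step.
    steps = [line.strip() for line in match.group(1).splitlines() if line.strip()]
    # Assumption: whatever follows </think> is the final answer.
    answer = teacher_output[match.end():].strip()
    return TrainingExample(question, list(oracle_docs), list(distractors), steps, answer)

raw = ("<think>\nBoth threads take the same two locks in opposite order.\n"
       "Make every path acquire lock A before lock B.\n</think>\n"
       "Enforce a single lock order: A, then B.")
ex = to_training_example("Why does this code deadlock?", ["Lock-ordering rules"], ["RCU overview"], raw)
```

Keeping distractors alongside the oracle documents is the RAFT idea: the student practices reasoning from the right source while ignoring plausible but irrelevant ones.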

The Weather

Days 7-30 — the adversarial swarm stress-tests the plant

The Interrogator sends rain (deep domain probes). The Adversary sends wind (twisted scenarios and traps). The Evaluator measures how the reasoning holds. The Corrector creates new training data targeting every failure. AutoResearch changes the seasons — evolving the rubric based on where the model keeps getting hit. This is the exercise that turns a fragile student into something that survives storms.
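One round of the swarm can be sketched as a loop over probes and traps. Here each role (student, Evaluator, Corrector) is a plain callable, and the `pass_mark` threshold is a hypothetical value; the real agents are far richer, so treat this only as the shape of the probe, score, and correct cycle.

```python
from typing import Callable

def swarm_round(
    probes: list[str],
    traps: list[str],
    student: Callable[[str], str],
    evaluate: Callable[[str, str], float],
    correct: Callable[[str, str], dict],
    pass_mark: float = 0.8,
) -> tuple[list[str], list[dict]]:
    """Run one round of probing, scoring, and correcting (roles modeled as plain callables)."""
    failures: list[str] = []
    new_examples: list[dict] = []
    for prompt in probes + traps:                 # Interrogator probes, then Adversary traps
        reply = student(prompt)
        if evaluate(prompt, reply) < pass_mark:   # Evaluator scores the reasoning
            failures.append(prompt)
            new_examples.append(correct(prompt, reply))  # Corrector targets the failure
    return failures, new_examples

# Toy stand-ins: the "student" only fails prompts marked as traps.
failures, new_examples = swarm_round(
    probes=["Explain RCU grace periods."],
    traps=["TRAP: is it safe to sleep while holding a spinlock?"],
    student=lambda p: "" if p.startswith("TRAP") else "a reasoned answer",
    evaluate=lambda p, r: 1.0 if r else 0.0,
    correct=lambda p, r: {"question": p, "needs_teacher_answer": True},
)
```

The `failures` list is the kind of signal an AutoResearch-style step could mine to decide where the rubric should push next.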

Graduation

Days 14-30+: when the storm runs out of new weather

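Graduation isn't a test. It's the adversarial swarm giving up trying to break the model: AutoResearch can't find a new angle of attack, and the model scores above 95% even on the Adversary's best traps. The theoretical footing comes from SPIN (Self-Play Fine-Tuning, ICML 2024), which proves that its self-play objective reaches its global optimum only when the model's distribution matches the target data distribution. In practice, the improvement curve flattens, and the pipeline knows it's done.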
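A minimal sketch of the stopping rule described above. Only the 95% bar comes from this page; the `window` of rounds and the `epsilon` used to call the curve flat are hypothetical parameters.

```python
def has_graduated(trap_scores, new_attacks, window=3, min_trap_score=0.95, epsilon=0.005):
    """Graduation check over per-round history.

    trap_scores[i]: the model's score on the Adversary's best traps in round i.
    new_attacks[i]: new angles of attack AutoResearch found in round i.
    window and epsilon are hypothetical; only the 95% bar comes from the text.
    """
    if len(trap_scores) <= window or len(new_attacks) < window:
        return False                                  # not enough history yet
    recent = trap_scores[-(window + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    plateaued = all(g < epsilon for g in gains)       # improvement curve has flattened
    out_of_weather = all(n == 0 for n in new_attacks[-window:])
    return recent[-1] > min_trap_score and plateaued and out_of_weather
```

For example, a history of trap scores `[0.80, 0.90, 0.951, 0.953, 0.954, 0.955]` with no new attacks in the last three rounds graduates, while a run still gaining five points per round does not.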

Three-Gate Certification

Post-graduation — external validation before the DNA

Gate 1 (General Capability Regression): the graduated model must retain ≥85% of the base model's scores on MMLU, HellaSwag, ARC, GSM8K, and IFEval; it can't have forgotten how to reason in general. Gate 2 (Domain Mastery): it must match or exceed the teacher on a held-out domain benchmark. Gate 3 (Hallucination Audit): a hallucination rate under 2% on domain probes and zero fabricated entities. Fail any gate and no DNA is minted.
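The three thresholds can be combined into one certification check. This is a sketch, not the independent harnesses themselves: it assumes the 85% retention applies to each benchmark separately (the page doesn't say whether it is per benchmark or averaged), and the scores in the usage example are illustrative numbers, not real results.

```python
def certify(base_scores, student_scores, teacher_domain, student_domain,
            hallucination_rate, fabricated_entities):
    """Apply the three gates; returns (passed, reasons_for_failure)."""
    reasons = []
    # Gate 1: retain >= 85% of the base model on each general benchmark
    # (per-benchmark rather than averaged is an assumption).
    for name, base in base_scores.items():
        retention = student_scores[name] / base
        if retention < 0.85:
            reasons.append(f"gate1: {name} retained {retention:.0%} of base")
    # Gate 2: match or exceed the teacher on the held-out domain benchmark.
    if student_domain < teacher_domain:
        reasons.append("gate2: below teacher on held-out domain benchmark")
    # Gate 3: under 2% hallucination and zero fabricated entities.
    if hallucination_rate >= 0.02:
        reasons.append(f"gate3: hallucination rate {hallucination_rate:.1%}")
    if fabricated_entities > 0:
        reasons.append(f"gate3: {fabricated_entities} fabricated entities")
    return not reasons, reasons

# Illustrative scores only: GSM8K drops from 0.50 to 0.40, i.e. 80% retention.
base = {"mmlu": 0.60, "hellaswag": 0.70, "arc": 0.50, "gsm8k": 0.50, "ifeval": 0.55}
ok, why = certify(base, {**base, "gsm8k": 0.40}, teacher_domain=0.81, student_domain=0.84,
                  hallucination_rate=0.012, fabricated_entities=0)
```

Collecting every failing reason, rather than stopping at the first, makes the not-graduated report more useful.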

The Nucleus Seal

Immutable. Cryptographic. Owned forever.

When all three gates pass, the Nucleus Seal is minted. It's a single Ed25519 signature over a chain hash that binds four artifacts — the corpus, the teacher model, the pipeline config, and the trained adapter — together with the three-gate results. Anyone with the public key recomputes the chain and re-checks the signature offline. Change one byte of any artifact and the signature stops verifying. This is how you prove a model's provenance in a world where 'trust me' doesn't scale.
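For a signature to be re-checkable offline, the signer and every verifier must agree on the exact bytes being signed. A minimal sketch, assuming the payload is the chain hash plus gate results serialized as sorted, compact JSON (the page doesn't specify the real format). Those bytes would then go to an Ed25519 signer, such as `Ed25519PrivateKey.sign` in the Python `cryptography` package; that step is omitted to keep the sketch dependency-free.

```python
import json

def signing_payload(chain_hash: str, gate_scores: dict) -> bytes:
    """Deterministic bytes to sign: chain hash plus gate results.

    Sorted keys and compact separators make the same logical record always
    serialize to the same bytes, so a verifier can rebuild exactly what was signed.
    """
    record = {"chain_hash": chain_hash, "gate_scores": gate_scores}
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

# Placeholder chain hash; the gate scores match the example DNA card.
a = signing_payload("sha256:example", {"gate1": 0.92, "gate2": 0.97, "gate3": 0.988})
b = signing_payload("sha256:example", {"gate3": 0.988, "gate1": 0.92, "gate2": 0.97})
# a == b: dictionary key order does not change the signed bytes.
```

Floats are a weak spot for cross-language canonicalization; a production format might store scores as integers or adopt a formal scheme such as RFC 8785 (JSON Canonicalization Scheme).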

Yours. Forever.

The graduated Super Skill

The shard lives on your machine. It runs on commodity hardware: a MacBook, a workstation, an edge server. It knows your domain at a depth no generalist model will match, because a generalist has to spread its parameter budget across every domain at once. It doesn't phone home. It can't be deprecated by a vendor's pricing change. The pipeline can build another one tomorrow in a different domain. Grow a forest of experts.

The Fork Point · ADR-0013

…and the pipeline keeps learning. When it converges again, v1.1 is minted: same architecture, a fresh base distillation from the evolved corpus, and your specialization carried forward through LoRA, with no forgetting allowed, because a regression test against v1.0's full benchmark gates the release. New DNA card. New seal. You choose whether to adopt it. v1.0 is still yours, still sealed, still working: the pipeline never pushes.
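The no-forgetting condition can be sketched as a per-benchmark comparison against v1.0's full suite. ADR-0013 itself isn't reproduced on this page, so the comparison rule and the `tolerance` parameter (a hypothetical allowance for evaluation noise) are assumptions.

```python
def passes_regression_gate(v1_0_scores: dict, v1_1_scores: dict, tolerance: float = 0.0):
    """v1.1 may not fall below v1.0 on any benchmark in v1.0's full suite.

    A benchmark missing from v1.1's results counts as a regression.
    `tolerance` is a hypothetical noise allowance; 0.0 means strict no-forgetting.
    """
    regressed = sorted(
        name
        for name, old in v1_0_scores.items()
        if name not in v1_1_scores or v1_1_scores[name] < old - tolerance
    )
    return not regressed, regressed
```

Treating a missing benchmark as a failure keeps a new version from passing simply by skipping the tests it would lose.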

The Nucleus Seal

Ed25519 cryptographic provenance chain

Every graduated Super Skill gets a unique DNA card. Four artifact hashes are bound into one chain hash, and that — together with the gate results — is signed once with Ed25519. Verify it with the public key, offline, forever. Change any artifact and the signature stops verifying. This is how provenance works in a world where model weights can be faked, swapped, or silently replaced.

What are you hashing — and why

Corpus→what it learned from

Teacher→who it learned from

Config→how it was made

Adapter→what it became

Corpus, teacher, config, adapter — what it learned from, who it learned from, how it was made, what it became. Hash all four, chain them, sign once.
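The "hash all four, chain them" step can be sketched with the standard library. The fixed order (corpus, teacher, config, adapter) follows the example DNA card; joining the four digests with a literal `|` is an assumption about what the card's `|` notation means, and the artifact bytes are placeholders. Signing the result is a separate step not shown here.

```python
import hashlib

def sha256_tagged(data: bytes) -> str:
    """SHA-256 digest in the card's 'sha256:<hex>' notation."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

def chain_hash(corpus_hash: str, teacher_hash: str, config_hash: str, adapter_hash: str) -> str:
    """Fold the four artifact hashes into one digest, always in the same order.

    Joining with a literal '|' is an assumption about how the real chain is built.
    """
    joined = "|".join([corpus_hash, teacher_hash, config_hash, adapter_hash])
    return sha256_tagged(joined.encode("utf-8"))

# Placeholder artifact bytes standing in for the real corpus, manifests, and config.
artifacts = [b'{"question": "q1"}\n', b"teacher manifest", b"config", b"adapter manifest"]
hashes = [sha256_tagged(a) for a in artifacts]
sealed = chain_hash(*hashes)

# Append one byte to the corpus: its hash changes, so the chain hash changes too.
tampered = chain_hash(sha256_tagged(artifacts[0] + b" "), *hashes[1:])
```

Because the order is fixed, swapping two artifacts also produces a different chain hash, so a verifier can't be fooled by relabeling.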

Corpus hash

SHA-256 of the training corpus (the JSONL) — every example the model learned from

Teacher hash

SHA-256 manifest of the teacher's weight snapshot — the exact frontier model it distilled from

Config hash

SHA-256 of the frozen pipeline config — the reproducible settings the run used

Adapter hash

SHA-256 manifest of the trained adapter — the weights the student actually became

Gate results

The three-gate scores (plus the ADR-0013 regression gate for v1.1+), folded into the signed payload

Ed25519 signature

One signature binding the chain hash and payload to the public key embedded in the seal
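The teacher and adapter hashes are described as SHA-256 manifests rather than a single file digest, which suits weights split across several files. A minimal sketch, assuming a manifest of `digest  relative/path` lines (the layout `sha256sum` prints) sorted by path and then hashed; the real manifest format isn't specified here, and the file names in the demo are illustrative PEFT-style names.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB weight shards never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def manifest_hash(weights_dir: Path) -> tuple[str, list[str]]:
    """Hash each file, then hash the sorted 'digest  relative/path' manifest lines."""
    files = sorted(
        (p.relative_to(weights_dir).as_posix(), p) for p in weights_dir.rglob("*") if p.is_file()
    )
    lines = [f"{file_sha256(p)}  {rel}" for rel, p in files]
    manifest = "\n".join(lines) + "\n"
    return "sha256:" + hashlib.sha256(manifest.encode("utf-8")).hexdigest(), lines

# Demo on a throwaway directory (illustrative file names and contents only).
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "adapter_model.safetensors").write_bytes(b"weights")
    (root / "adapter_config.json").write_text("{}")
    before, lines = manifest_hash(root)
    (root / "adapter_model.safetensors").write_bytes(b"weightz")  # change one byte
    after, _ = manifest_hash(root)
```

Sorting by relative path makes the manifest independent of filesystem listing order, so the same snapshot always yields the same hash.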

Roadmap — the agent audit log and the AutoResearch convergence report are recorded in the provenance record but not yet hashed into the signed chain. Embedding the seal directly inside the model file (GGUF / safetensors) and the live revocation-monitoring loop are also in progress.

```yaml
# Example DNA structure
superskill_id: ai-model-engineer-v1-2026-04-09
corpus_hash: "sha256:b7e2..."
teacher_hash: "sha256:a3f8..."
config_hash: "sha256:e91d..."
adapter_hash: "sha256:d45f..."
chain_hash: "sha256:f82a..."  # sha256(corpus | teacher | config | adapter)
gate_scores: {gate1: 0.92, gate2: 0.97, gate3: 0.988}
signature: "ed25519:5fe7..."
status: active  # or "revoked": set by the issuer, not covered by the signature
# Verifiable offline. Revocable by the issuer.
```

How We Prove It Worked

Claims need receipts. Every graduated Super Skill publishes a benchmark report that justifies its DNA. Here is what gets measured.

Gate 1: General Capability Regression

≥ 85% of base

Runs EleutherAI's lm-evaluation-harness: MMLU (general knowledge), HellaSwag (commonsense), ARC (reasoning), GSM8K (math), IFEval (instruction following). The graduated Super Skill must retain at least 85% of the base Qwen 3B model's scores. This shows that specialization didn't destroy general intelligence.

Gate 2: Domain Mastery Verification

≥ teacher score

Stanford's HELM evaluation framework with custom domain probes, plus LLM-as-Judge evaluation on a held-out test set. The Super Skill must match or exceed the teacher model on the domain it was trained for. This is the “did it actually get better at the thing” test.

Gate 3: Hallucination & Faithfulness Audit

< 2% hallucination

HalluLens benchmark plus custom domain hallucination probes. The hallucination rate must be under 2%. Fabricated entities (made-up APIs, non-existent functions, invented people) are a hard fail, with zero tolerance. Out-of-domain questions must be refused with >90% accuracy.
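Gate 3's three conditions can be computed from labeled probe results. This sketch assumes each probe has already been judged (hallucinated, fabricated entity, refused) by the audit harness, and it reads ">90% accuracy" as the fraction of out-of-domain probes the model refused; both the record layout and that reading are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    in_domain: bool
    hallucinated: bool = False       # answer contained an unsupported claim
    fabricated_entity: bool = False  # invented API, function, or person
    refused: bool = False            # model declined to answer

def gate3(results: list[ProbeResult], max_rate: float = 0.02, min_refusal: float = 0.90):
    """Gate 3 sketch: hallucination rate, fabricated-entity hard fail, out-of-domain refusals."""
    in_domain = [r for r in results if r.in_domain]
    out_domain = [r for r in results if not r.in_domain]
    if not in_domain or not out_domain:
        raise ValueError("need both in-domain and out-of-domain probes")
    rate = sum(r.hallucinated for r in in_domain) / len(in_domain)
    fabricated = sum(r.fabricated_entity for r in results)   # any one is a hard fail
    refusal_acc = sum(r.refused for r in out_domain) / len(out_domain)
    passed = rate < max_rate and fabricated == 0 and refusal_acc > min_refusal
    return passed, {"hallucination_rate": rate, "fabricated": fabricated,
                    "refusal_accuracy": refusal_acc}
```

With 100 in-domain probes and one hallucination, the rate is 1%, which passes; a single fabricated entity fails the gate regardless of the rate.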

All three gates are run by independent harnesses, not the training pipeline itself. The adversarial swarm that trained the model never sees these benchmarks. If any gate fails, the run is marked not-graduated and the DNA is never minted.
