From seed to DNA

How a model becomes immutable.

The full journey of a Super Skill: from the moment you plant the domain to the moment it graduates with a cryptographic DNA you can verify forever. This is what happens between “I want an expert in X” and “here is your permanently-owned, domain-deep, air-gapped shard of knowledge.”

The Journey

1

The Seed

Day 0 — you plant the domain

A user picks a domain on /start. 'Linux Kernel Engineering.' 'AI Model Engineer.' 'Internal knowledge of my farm.' One sentence. The pipeline wakes up. The run has a unique run_id now — this becomes part of the DNA.

2

Gathering Soil

Hours 0-12 — autonomous agents build the corpus

The teacher probe fires first: it interrogates DeepSeek-R1 about every subdomain in the ontology. Paper scouts query arXiv. Repo scouts pull from known GitHub sources. Doc scouts scrape framework docs. KICE runs its 7-layer extraction on everything that arrives. The corpus grows from zero to thousands of structured reasoning examples before the human has had lunch.

3

Watering with Proprietary Data

Day 1+ — the user adds esoteric sources when ready

POST /corpus/ingest accepts text, URLs, and internal docs: whatever makes this domain unique to the user. Each upload triggers KICE re-extraction and increments the corpus version. The pipeline never blocks on human input. This user-supplied Mode 2 data is what makes the resulting model truly owned: it knows things no public model has ever seen.
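A minimal sketch of what an ingest call might look like. Only the path /corpus/ingest comes from the text above; the base URL, the payload fields (run_id, kind, content), and the helper name build_ingest_request are assumptions made for illustration. The request is built but not sent.

```python
import json
from urllib.request import Request

# Hypothetical base URL; only the /corpus/ingest path is from the text.
BASE_URL = "http://localhost:8000"

# Hypothetical source kinds mirroring "text, URLs, internal docs".
ALLOWED_KINDS = {"text", "url", "doc"}


def build_ingest_request(run_id: str, kind: str, content: str) -> Request:
    """Build (but do not send) a POST /corpus/ingest request."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported source kind: {kind}")
    # Payload field names are assumptions, not a documented schema.
    body = json.dumps({"run_id": run_id, "kind": kind, "content": content})
    return Request(
        url=f"{BASE_URL}/corpus/ingest",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request("run-0001", "text", "Field 7 floods every March.")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would send it. Keeping construction separate from sending makes the payload easy to inspect before anything leaves the machine.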

4

Photosynthesis

Days 1-7 — the student absorbs teacher reasoning

The synthesizer takes each KICE extraction example and sends its reasoning prompt to the teacher. The teacher responds with <think> blocks containing structured step-by-step reasoning. These become training pairs: (question, oracle docs, distractors, CoT steps, answer). This is Symbolic Chain-of-Thought Distillation combined with Retrieval-Augmented Fine-Tuning (RAFT). The student absorbs how to reason, not just what the answers are.
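The training-pair tuple above can be sketched as a small data structure. This is an illustration, not the synthesizer's actual code: the class name TrainingPair, the helper build_pair, and the rule "one reasoning step per non-empty line inside <think>" are all assumptions.

```python
import re
from dataclasses import dataclass

# Matches the first <think>...</think> block in a teacher response.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


@dataclass
class TrainingPair:
    question: str
    oracle_docs: list[str]   # documents that contain the answer
    distractors: list[str]   # plausible but irrelevant documents
    cot_steps: list[str]     # teacher reasoning, one step per entry
    answer: str


def build_pair(question, oracle_docs, distractors, teacher_response):
    """Split a teacher response into reasoning steps and a final answer."""
    match = THINK_RE.search(teacher_response)
    if match is None:
        raise ValueError("teacher response has no <think> block")
    # Assumption: each non-empty line inside <think> is one reasoning step.
    steps = [ln.strip() for ln in match.group(1).splitlines() if ln.strip()]
    # Assumption: everything after </think> is the final answer.
    answer = teacher_response[match.end():].strip()
    return TrainingPair(question, list(oracle_docs), list(distractors), steps, answer)


response = (
    "<think>\n"
    "The slab allocator caches objects.\n"
    "kmalloc uses it for small sizes.\n"
    "</think>\n"
    "kmalloc is backed by the slab allocator."
)
pair = build_pair(
    "What backs kmalloc?",
    ["slab allocator notes"],
    ["notes on an unrelated subsystem"],
    response,
)
print(len(pair.cot_steps), pair.answer)
```

Keeping distractors next to the oracle docs is what lets the student practise ignoring irrelevant context, which is the retrieval-augmented half of the recipe.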

5

The Weather

Days 7-30 — the adversarial swarm stress-tests the plant

The Interrogator sends rain (deep domain probes). The Adversary sends wind (twisted scenarios and traps). The Evaluator measures how the reasoning holds. The Corrector creates new training data targeting every failure. AutoResearch changes the seasons — evolving the rubric based on where the model keeps getting hit. This is the exercise that turns a fragile student into something that survives storms.

6

Graduation

Days 14-30+: when the storm runs out of new weather

Graduation isn't a test. It's the adversarial swarm giving up trying to break the model: AutoResearch can't find a new angle of attack, and the Adversary's best traps score above 95%. SPIN (ICML 2024) proved that its self-play objective reaches its optimum only when the model matches the target data distribution, at which point further rounds stop helping. The pipeline treats a flat improvement curve as that signal and knows it's done.
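One way to express that stopping rule, as a sketch only. The 95% threshold comes from the text; the three-round window, the epsilon used to call the curve flat, and the function name has_graduated are assumptions.

```python
def has_graduated(trap_scores, new_failure_counts,
                  threshold=0.95, window=3, epsilon=0.005):
    """Return True when recent rounds are high-scoring, flat, and failure-free.

    trap_scores: per-round score on the Adversary's best traps, in [0, 1].
    new_failure_counts: per-round count of new failure patterns found.
    window and epsilon are illustrative values, not pipeline settings.
    """
    if len(trap_scores) < window or len(new_failure_counts) < window:
        return False
    recent_scores = trap_scores[-window:]
    recent_failures = new_failure_counts[-window:]
    # The Adversary's best traps score above the threshold.
    above_threshold = all(score > threshold for score in recent_scores)
    # AutoResearch found no new angle of attack.
    no_new_attacks = all(count == 0 for count in recent_failures)
    # The improvement curve has flattened.
    plateaued = max(recent_scores) - min(recent_scores) <= epsilon
    return above_threshold and no_new_attacks and plateaued


still_training = has_graduated([0.80, 0.91, 0.96], [5, 2, 0])
converged = has_graduated([0.80, 0.91, 0.960, 0.962, 0.963], [5, 2, 0, 0, 0])
print(still_training, converged)
```

Requiring all three conditions over a window, rather than on a single round, keeps one lucky round from ending the run early.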

7

Three-Gate Certification

Post-graduation — external validation before the DNA

Gate 1 (General Capability Regression): the graduated model must retain ≥85% of the base model's scores on MMLU, HellaSwag, ARC, GSM8K, and IFEval. It can't have forgotten how to reason in general. Gate 2 (Domain Mastery): it must match or exceed the teacher on a held-out domain benchmark. Gate 3 (Hallucination Audit): less than 2% hallucination rate on domain probes, zero fabricated entities. Fail any gate, you don't get the DNA.

8

The Nucleus Seal

Immutable. Cryptographic. Owned forever.

When all three gates pass, the Nucleus Seal is minted. It's an Ed25519 signature over the complete DNA chain: teacher model hash, corpus manifest hash, pipeline config, AutoResearch final report, post-graduation QA results. The DNA is verifiable by anyone with the public key. If anything in the chain is tampered with, the DNA breaks. This is how you prove a model's provenance in a world where 'trust me' doesn't scale.
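The tamper-evidence property can be illustrated with a plain hash over the chain. This is a sketch under stated assumptions: the canonical-JSON encoding, the field names, and the function dna_digest are illustrative, and the placeholder inputs are not real hashes of any model or corpus. The Ed25519 step is omitted because Python's standard library has no Ed25519 implementation; in practice a signing library would sign the digest computed here.

```python
import hashlib
import json


def dna_digest(components: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the DNA components.

    Sorting keys and fixing separators makes the encoding deterministic,
    so the same components always produce the same digest.
    """
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder inputs standing in for the five chain components.
components = {
    "teacher_hash": "sha256:" + hashlib.sha256(b"teacher weights").hexdigest(),
    "corpus_hash": "sha256:" + hashlib.sha256(b"corpus manifest").hexdigest(),
    "pipeline_config": "superskill.yaml contents",
    "autoresearch_report": "final report",
    "gate_results": {"gate1": 0.92, "gate2": 0.97, "gate3": 0.988},
}

sealed = dna_digest(components)

# Swap one component and recompute: the digest no longer matches the seal.
tampered = dict(components, corpus_hash="sha256:" + "0" * 64)
unchanged_ok = dna_digest(components) == sealed
tampered_ok = dna_digest(tampered) == sealed
print(unchanged_ok, tampered_ok)
```

A verifier would recompute the digest from the published components and then check the signature over it with the public key, so a mismatch at either step exposes tampering.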

9

Yours. Forever.

The graduated Super Skill

The shard lives on your machine. It runs on commodity hardware — a MacBook, a workstation, an edge server. It knows your domain at a depth no generalist model will ever match because no generalist has the parameter budget to know your domain that deeply. It doesn't phone home. It can't be deprecated by a vendor's pricing change. The pipeline can build another one tomorrow in a different domain. Grow a forest of experts.

The Fork Point · ADR-0013

…and the pipeline keeps learning. When it converges again, v1.1 is minted — same architecture, fresh base distillation from the evolved corpus, your specialization carried forward through LoRA, no forgetting (gated by a regression test against v1.0's full benchmark). New DNA card. New seal. You choose whether to adopt it. v1.0 is still yours, still sealed, still working — the pipeline never pushes.

The Nucleus Seal

Ed25519 cryptographic provenance chain

Every graduated Super Skill gets a unique DNA card with the six components below. The first five are hashed into the DNA; the sixth is the Ed25519 signature over that hash. Verify it once with the public key, trust it forever. Change any component, the DNA breaks. This is how provenance works in a world where model weights can be faked, swapped, or silently replaced.

Teacher SHA-256

Cryptographic hash of the exact teacher model weights used for distillation

Corpus Manifest Hash

SHA-256 of the complete versioned corpus — every example, every source, every layer

Pipeline Config

The superskill.yaml at run time — modes, weights, thresholds, reproducible settings

AutoResearch Report

Final rubric versions, convergence trajectory, failure patterns discovered and resolved

Three-Gate Results

Exact scores from Gate 1 (regression), Gate 2 (domain), Gate 3 (hallucination)

Ed25519 Signature

Cryptographic DNA binding all of the above to the final model weights

// Example DNA structure
superskill_id: ai-model-engineer-v1-2026-04-09
teacher_hash: sha256:a3f8...
corpus_hash: sha256:b7e2...
gate_scores: {gate1: 0.92, gate2: 0.97, gate3: 0.988}
signature: ed25519:5fe7...
// Revocable. Verifiable. Forever.

How We Prove It Worked

Claims need receipts. Every graduated Super Skill publishes a benchmark report — the receipts that justify the DNA. Here is what gets measured.

Gate 1: General Capability Regression

≥ 85% of base

Runs EleutherAI's lm-evaluation-harness: MMLU (general knowledge), HellaSwag (commonsense), ARC (reasoning), GSM8K (math), IFEval (instruction following). The graduated Super Skill must retain at least 85% of the base Qwen 3B scores. This proves specialization didn't destroy general intelligence.

Gate 2: Domain Mastery Verification

≥ teacher score

Stanford HELM benchmark with custom domain probes, plus LLM-as-Judge evaluation on a held-out test set. The Super Skill must match or exceed the teacher model on the domain it was trained for. This is the “did it actually get better at the thing” test.

Gate 3: Hallucination & Faithfulness Audit

< 2% hallucination

HalluLens benchmark plus custom domain hallucination probes. Hallucination rate must be under 2%. Fabricated entities (made-up APIs, non-existent functions, invented people) are a hard fail with zero tolerance. Out-of-domain questions must be refused with >90% accuracy.

All three gates are run by independent harnesses, not the training pipeline itself. The adversarial swarm that trained the model never sees these benchmarks. If any gate fails, the run is marked not-graduated and the DNA is never minted.
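The three thresholds can be written down as a single check. This is a sketch, not the independent harnesses themselves: the names GateInputs and evaluate_gates are hypothetical, and applying the 85% retention rule per benchmark (rather than to an average) is an assumption the text does not settle.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    base_scores: dict[str, float]      # base model score per general benchmark
    graduate_scores: dict[str, float]  # graduated model on the same benchmarks
    teacher_domain_score: float
    graduate_domain_score: float
    hallucination_rate: float          # fraction in [0, 1]
    fabricated_entities: int
    ood_refusal_accuracy: float        # fraction in [0, 1]


def evaluate_gates(g: GateInputs) -> dict[str, bool]:
    """Apply the three gate thresholds and report pass/fail per gate."""
    # Gate 1: retain at least 85% of the base score (assumed per benchmark).
    gate1 = all(
        g.graduate_scores[name] >= 0.85 * base
        for name, base in g.base_scores.items()
    )
    # Gate 2: match or exceed the teacher on the held-out domain benchmark.
    gate2 = g.graduate_domain_score >= g.teacher_domain_score
    # Gate 3: under 2% hallucination, zero fabrications, >90% OOD refusal.
    gate3 = (
        g.hallucination_rate < 0.02
        and g.fabricated_entities == 0
        and g.ood_refusal_accuracy > 0.90
    )
    return {"gate1": gate1, "gate2": gate2, "gate3": gate3}


# Illustrative numbers only, not measured results.
inputs = GateInputs(
    base_scores={"mmlu": 0.60, "gsm8k": 0.50},
    graduate_scores={"mmlu": 0.55, "gsm8k": 0.45},
    teacher_domain_score=0.80,
    graduate_domain_score=0.83,
    hallucination_rate=0.012,
    fabricated_entities=0,
    ood_refusal_accuracy=0.94,
)
results = evaluate_gates(inputs)
mint_allowed = all(results.values())
print(results, mint_allowed)
```

The DNA is minted only when every gate passes, so a single fabricated entity is enough to block the seal regardless of the other scores.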
