00 — MISSION
Your intelligence.
Forever owned.
Always improving.
ARAIL the research lab. AeroLLM the engine. Nucleus the pipeline. PaperAgents the business wrapper.
Super Skill Models that outperform frontier giants in their domain. No cloud. No subscription. Cryptographically yours.
QuKaiZen's sole mission: deliver the best AI experience possible on commodity hardware.
To achieve this, we build purpose-built intelligent models through the Nucleus build pipeline, cranking out version after version of specialized intelligence. Think of the model taught specifically to play chess: it became superhuman at the one thing it was built for. That is the kind of intelligence we forge and ship. Not generalists pretending to know everything, but specialists that know. It's not a lookup and it's not RAG; it's a model that knows how to respond in its domain.
To fulfill this, we created ARAIL, a specialized AI-driven laboratory now centered on Karpathy's 2026 AutoResearch project. Originally a pluggable platform, ARAIL has evolved into a high-intensity environment where collaborative agent swarms systematically interrogate and prompt-test models to their breaking point. It is a lab where improvement and knowledge building are the goal.
Built in the lab, AeroLLM is our over-engineered evolution of AirLLM, running the best open-source frontier models on local hardware and opening up 500B+ parameter models for exhaustive probing. Nucleus takes what we learn and distills it into precise, domain-specific knowledge.
The result is a Super Skill Model that isn't a lookup — it understands its domain, and out-reasons frontier models many times its size.
Where it
began.
A research lab where autoresearch agents continuously curate papers from arXiv, constantly leveling up our understanding of modeling. We started here because we refused to settle for a model that couldn't reason deeply.
ARAIL is where the ideas behind AeroLLM and Nucleus take shape. At inception we built a single-agent assistant. That agent is now Buddy, front and center as your lab partner, the one you turn to for “what should I do next?” or “what's interesting in today's pull?” From that seed grew everything around it: streaming inference off disk so frontier teachers run on commodity hardware, and declaratively defined agentic workers that handle research and data gathering on your behalf.
Buddy Tunnel federates Buddy across the channels you already use: Telegram, Slack, Discord, and Signal natively, plus iMessage and WhatsApp via bridge layers. It runs in hybrid mode only (it needs a gateway and internet access); airgapped mode keeps the lab sealed.
Buddy
ARAIL LAB AGENT
Experiment #47 completed — convergence at 95.2%
You set a goal: "Match the 70B teacher on Linux memory management."
The swarm found 3 new edge cases in NUMA topology since last run. Does that align with where you wanted to take this?
Buddy Tunnel
CHANNEL INTEGRATION
Buddy is the sole agent registered on each channel you authorize. No other AI can read, intercept, or respond. Your conversations stay yours.
Messages queue while the tunnel is closed — delivered the moment you reconnect.
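The queue-then-deliver behavior can be pictured as a small outbox. This is a minimal sketch under assumptions, not Buddy Tunnel's code: the `TunnelOutbox` class and its `send` callback are hypothetical, and a real bridge would also need persistence and retries.

```python
from collections import deque
from typing import Callable


class TunnelOutbox:
    """Hypothetical outbox: buffers messages while the tunnel is closed."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send                  # delivers one message over an open channel
        self._pending: deque[str] = deque()
        self.connected = False

    def post(self, message: str) -> None:
        # Deliver immediately when connected; otherwise queue in arrival order.
        if self.connected:
            self._send(message)
        else:
            self._pending.append(message)

    def reconnect(self) -> None:
        # On reconnect, flush everything that queued up, oldest first.
        self.connected = True
        while self._pending:
            self._send(self._pending.popleft())

    def disconnect(self) -> None:
        self.connected = False


delivered: list[str] = []
outbox = TunnelOutbox(delivered.append)
outbox.post("Experiment #47 completed")
outbox.post("3 new NUMA edge cases")
print(delivered)  # []
outbox.reconnect()
print(delivered)  # ['Experiment #47 completed', '3 new NUMA edge cases']
```

A FIFO deque keeps delivery order identical to arrival order, which matters when one message answers another.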
One Command
./arail setup
A shell script that gets straight to it. Picks a tier, installs what you need.
Minimalist
Python runtime + AirLLM. Lightweight inference only.
Maximost
Python + Rust + AeroLLM. Full lab, full pipeline.
Full README: github.com/cdarnell/arail
SPECIFICATIONS
The lab is one room.
The company is another.
Same stack underneath. Same Sealed Super Skills. Different audience, different surface, different room you walk into when you sit down to work.
ARAIL is for builders running experiments in the lab. PaperAgents is for entrepreneurs codifying repeatable work into a team of small specialist agents — one per function the business already does.
ARAIL — THE LAB
For researchers and builders.
Autoresearch agents, paper curation, swarm interrogation, and Buddy, your context-aware lab partner that leads you in the right direction. Backed by a knowledge base that's RAG on steroids: context that compounds the more you work.
Open the lab →
PAPERAGENTS — THE COMPANY
For entrepreneurs and operators.
Small specialist agents for sales, support, ops, and books: declared in TOML, watched continuously, reconciled to desired state. Built on the open-source Paperclip AI platform.
Codify your business →
LEARN — THE FRONT DOOR
For everyone finding their footing.
A place to learn the ropes of modern AI and understand the fundamentals.
Open the dictionary →
Inference,
unchained.
Frontier models don't fit on a single GPU. AeroLLM runs them anyway by streaming the model off your SSD one layer at a time (load a layer, compute, discard it, prefetch the next), so the full weight set is never resident at once. The result: 400B+ parameters on 8GB of VRAM, with no full-model residency and no GPU passthrough.
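The load, compute, discard, prefetch cycle can be shown with a toy forward pass. This is a minimal sketch, not AeroLLM's Rust implementation: pickle files in a temporary directory stand in for SSD weight shards, each "layer" is a tiny matrix with ReLU, and a single background thread prefetches the next layer while the current one computes.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

Matrix = list[list[float]]


def save_layers(layers: list[Matrix], directory: Path) -> list[Path]:
    """Write each layer's weights to its own file, standing in for SSD shards."""
    paths = []
    for i, weights in enumerate(layers):
        path = directory / f"layer_{i:03d}.pkl"
        path.write_bytes(pickle.dumps(weights))
        paths.append(path)
    return paths


def load_layer(path: Path) -> Matrix:
    return pickle.loads(path.read_bytes())


def apply_layer(weights: Matrix, x: list[float]) -> list[float]:
    # Toy layer: matrix-vector product followed by ReLU.
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def streamed_forward(paths: list[Path], x: list[float]) -> list[float]:
    """Run layers in order, holding at most the current and the prefetched layer."""
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(load_layer, paths[0])
        for i in range(len(paths)):
            weights = pending.result()  # wait for the current layer
            if i + 1 < len(paths):
                pending = io.submit(load_layer, paths[i + 1])  # prefetch next
            x = apply_layer(weights, x)
            del weights  # discard before moving on
    return x


layers = [[[1.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [-1.0, 0.5]]]
with tempfile.TemporaryDirectory() as tmp:
    paths = save_layers(layers, Path(tmp))
    print(streamed_forward(paths, [3.0, 1.0]))  # [5.0, 0.0]
```

Peak memory is bounded by two layers rather than the whole model; the cost is that every forward pass reads every layer from disk again.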
AeroLLM also uses speculative decoding to deliver up to 7× throughput on 70B+ teachers: a small draft model proposes a run of tokens and the full model verifies them in a single pass. The method is provably lossless, preserving the target model's exact output distribution (Leviathan et al. 2023; Chen et al. 2023). The gain compounds here specifically. When the bottleneck is streaming weights off disk, verifying many drafted tokens per pass amortizes one full-model stream across several tokens instead of paying that cost per token, so AeroLLM's heaviest expense does the most work.
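The accept/reject rule that makes speculative sampling lossless fits in a few lines. A minimal sketch under stated assumptions: "models" are plain Python functions returning a distribution over a tiny vocabulary, and the k+1 target calls would be one batched forward pass in a real system. This follows the published algorithm, not AeroLLM's code.

```python
import random
from typing import Callable, Sequence

Dist = list[float]
Model = Callable[[Sequence[int]], Dist]  # token prefix -> next-token distribution


def sample(dist: Dist, rng: random.Random) -> int:
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def speculative_step(target: Model, draft: Model, prefix: list[int],
                     k: int, rng: random.Random) -> list[int]:
    """One round of speculative sampling (Leviathan et al. 2023; Chen et al. 2023)."""
    # 1. The draft model proposes k tokens autoregressively.
    drafted: list[int] = []
    q_dists: list[Dist] = []
    ctx = list(prefix)
    for _ in range(k):
        q = draft(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)
    # 2. The target scores every drafted position (one batched pass in practice).
    p_dists = [target(prefix + drafted[:i]) for i in range(k + 1)]
    accepted: list[int] = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i][tok], q_dists[i][tok]
        if rng.random() < min(1.0, p / q):
            accepted.append(tok)  # keep the draft token
            continue
        # 3. On rejection, resample from the normalized residual max(0, p - q).
        residual = [max(0.0, pt - qt) for pt, qt in zip(p_dists[i], q_dists[i])]
        accepted.append(sample(residual, rng))
        return accepted
    # 4. All k drafts accepted: take one bonus token from the target.
    accepted.append(sample(p_dists[k], rng))
    return accepted


def uniform(ctx: Sequence[int]) -> Dist:
    return [0.25] * 4


# Identical draft and target: every draft is accepted, plus one bonus token.
print(speculative_step(uniform, uniform, [0], k=4, rng=random.Random(0)))
```

Each round emits between 1 and k+1 tokens for a single verification pass, which is exactly why the scheme pays off when that pass means streaming the full model off disk.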
With full credit to AirLLM for the layer-streaming idea, AeroLLM rebuilds it in Rust for the stability and Apple Silicon (MLX) support our pipeline needed.
Open source — Apache 2.0
7×
Faster Throughput
Speculative decoding on 70B+ teacher models via draft model pipeline
400B+
Max Model Scale
On 8GB VRAM — layer-by-layer inference, zero full weight residency
83%
Less Power Overhead
Unified Apple Silicon memory vs discrete GPU copy operations
85%
Cost Savings / Watt
$0.30/W vs $2/W industry standard — same throughput, 85% cheaper
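The 85% figure follows from the two per-watt prices on the card:

```latex
1 - \frac{\$0.30/\text{W}}{\$2.00/\text{W}} = 1 - 0.15 = 0.85 = 85\%
```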
Mine deep.
Craft precise.
Forge permanent.
The production pipeline that puts it all together. SCoTD, CoTD, SFT, and agent interrogation — proven distillation techniques wrapped in a continuous adversarial loop until convergence.
Parallel agents mine the teacher model across 7 knowledge layers before training begins. The pipeline doesn't stop on a schedule; it runs until the swarm exhausts every failure mode. What comes out are the gems: 1–7B Super Skill Models that reason like 400B+ teachers in your domain.
2–4 months on commodity hardware · 8–19 hours on enterprise hardware
3B vs 500B
Student beats teacher in-domain
Convergence-based graduation — the swarm runs until it exhausts failure modes, not until epochs complete
3-Gate
Post-graduation certification
General regression (LM-Eval) · Domain mastery (HELM + LLM judge) · Hallucination audit (HalluLens)
Ed25519
Nucleus Seal — cryptographic provenance
Teacher SHA-256 + corpus hash + pipeline config + AutoResearch report → immutable DNA chain
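A minimal sketch of making such a record tamper-evident: hash each input with SHA-256, serialize the manifest canonically, and hash the result. The manifest field names, the per-record corpus hashing, and the `parent` link that chains successive seals are assumptions for illustration. In an actual seal the digest would then be signed with an Ed25519 private key (for example with `Ed25519PrivateKey.sign` from the `cryptography` package); that step is left out so the sketch needs only the standard library.

```python
import hashlib
import json
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def seal_digest(teacher_weights: bytes, corpus: list[str], pipeline_config: dict,
                research_report: str, parent: Optional[str] = None) -> tuple[dict, str]:
    """Build a hypothetical provenance manifest and the digest a key would sign."""
    corpus_hash = hashlib.sha256()
    for record in corpus:  # order-sensitive: reordering the corpus changes the hash
        corpus_hash.update(sha256_hex(record.encode()).encode())
    manifest = {
        "teacher_sha256": sha256_hex(teacher_weights),
        "corpus_sha256": corpus_hash.hexdigest(),
        "pipeline_config": pipeline_config,
        "report_sha256": sha256_hex(research_report.encode()),
        "parent": parent,  # digest of the previous seal, forming a chain
    }
    # Canonical JSON (sorted keys, no whitespace) so identical inputs always
    # serialize to identical bytes and therefore the same digest.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return manifest, sha256_hex(canonical.encode())


m1, d1 = seal_digest(b"teacher-weights", ["q1", "q2"], {"stage": "KICE"}, "report v1")
m2, d2 = seal_digest(b"teacher-weights", ["q1", "q2", "q3"], {"stage": "TICE"},
                     "report v2", parent=d1)
print(d1)
print(m2["parent"] == d1)  # True
```

Because each manifest embeds its parent's digest, altering any earlier seal changes every digest after it.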
<2%
Hallucination rate hard target
Zero fabricated entities. Out-of-domain refusal calibration >90%. Seal includes the proof.
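The three numeric targets on this card translate directly into a pass/fail check. In this sketch the `AuditReport` fields and how each rate is measured are assumptions; only the thresholds (under 2% hallucination, zero fabricated entities, over 90% out-of-domain refusal) come from the card, and nothing here reflects HalluLens's actual API.

```python
from dataclasses import dataclass


@dataclass
class AuditReport:
    hallucination_rate: float  # fraction of audited answers that hallucinate
    fabricated_entities: int   # count of invented names, papers, APIs, ...
    ood_refusal_rate: float    # fraction of out-of-domain prompts correctly refused


def passes_hallucination_gate(r: AuditReport) -> tuple[bool, list[str]]:
    """Return (passed, reasons for failure) against the card's thresholds."""
    failures = []
    if not r.hallucination_rate < 0.02:
        failures.append(f"hallucination rate {r.hallucination_rate:.1%} is not below 2%")
    if r.fabricated_entities != 0:
        failures.append(f"{r.fabricated_entities} fabricated entities")
    if not r.ood_refusal_rate > 0.90:
        failures.append(f"OOD refusal {r.ood_refusal_rate:.1%} is not above 90%")
    return (not failures, failures)


print(passes_hallucination_gate(AuditReport(0.015, 0, 0.93)))  # (True, [])
```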
FIVE STAGES · MINE → CRAFT → FORGE
RUNS UNTIL THE SWARM GIVES UP
KICE
Knowledge Injection & Corpus Evolution
Parallel extraction agents simultaneously bum-rush the teacher model across 6 certified knowledge layers — from rare concepts (L1) to edge cases and ambiguity detection (L6). The coordinated swarm exhausts the teacher's domain before a single training step begins.
TICE
Tacit Knowledge Injection & Corpus Evolution
L7 — the layer no benchmark measures. Implicit expert know-how, tribal knowledge, and domain folklore the teacher learned but never formally documented. TICE surfaces what the teacher knows but can't easily explain.
RAFT
Retrieval-Augmented Fine-Tuning
Oracle documents + deliberate distractors in every training batch. The student learns to reason through noise rather than memorize surface answers — built for real-world conditions, not clean benchmarks.
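A minimal sketch of assembling one RAFT-style training example from plain-string documents. It follows the recipe in the RAFT paper (Zhang et al. 2024), where the oracle document is kept only with probability `p_oracle` so some examples contain distractors alone; setting `p_oracle=1.0` gives an oracle in every example. `num_distractors`, `p_oracle`, and the `RaftExample` shape are illustrative choices, not Nucleus's settings.

```python
import random
from dataclasses import dataclass


@dataclass
class RaftExample:
    question: str
    context: list[str]  # shuffled documents shown to the student
    answer: str         # target answer, grounded in the oracle document


def make_raft_example(question: str, oracle: str, answer: str, pool: list[str],
                      num_distractors: int, p_oracle: float,
                      rng: random.Random) -> RaftExample:
    """Mix the oracle document with sampled distractors, RAFT-style."""
    candidates = [d for d in pool if d != oracle]
    docs = rng.sample(candidates, num_distractors)
    # With probability p_oracle the oracle is included; otherwise the student
    # sees only distractors and must rely on what it has internalized.
    if rng.random() < p_oracle:
        docs.append(oracle)
    rng.shuffle(docs)  # so the oracle's position carries no signal
    return RaftExample(question, docs, answer)


pool = ["oracle doc", "distractor 1", "distractor 2", "distractor 3"]
ex = make_raft_example("Which doc answers this?", "oracle doc", "The oracle doc.", pool,
                       num_distractors=2, p_oracle=0.8, rng=random.Random(42))
print(ex.context)
```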
SCoTD
Symbolic Chain-of-Thought Distillation
Premise → rule → constraint → cross-reference → conclusion. The teacher's reasoning is decomposed into explicit symbolic steps and transferred structurally — not as token patterns, but as verifiable reasoning chains.
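One way to make a five-step chain "verifiable" is to type each step and reject malformed chains before they become training targets. The enum, the ordering rule (steps may repeat but never go backwards), the `[KIND] text` serialization, and the NUMA example are assumptions for illustration, not Nucleus's actual format.

```python
from dataclasses import dataclass
from enum import IntEnum


class Step(IntEnum):
    # Order mirrors: premise -> rule -> constraint -> cross-reference -> conclusion
    PREMISE = 1
    RULE = 2
    CONSTRAINT = 3
    CROSS_REFERENCE = 4
    CONCLUSION = 5


@dataclass
class ChainStep:
    kind: Step
    text: str


def validate(chain: list[ChainStep]) -> None:
    """Reject chains that skip the premise, lack a conclusion, or go backwards."""
    if not chain or chain[0].kind is not Step.PREMISE:
        raise ValueError("chain must open with a premise")
    if chain[-1].kind is not Step.CONCLUSION:
        raise ValueError("chain must close with a conclusion")
    for prev, cur in zip(chain, chain[1:]):
        if cur.kind < prev.kind:
            raise ValueError(f"{cur.kind.name} cannot follow {prev.kind.name}")


def to_training_target(chain: list[ChainStep]) -> str:
    validate(chain)
    return "\n".join(f"[{s.kind.name}] {s.text}" for s in chain)


chain = [
    ChainStep(Step.PREMISE, "A thread on NUMA node 0 reads pages allocated on node 1."),
    ChainStep(Step.RULE, "Remote-node memory accesses cost more than local ones."),
    ChainStep(Step.CONSTRAINT, "The thread must stay on node 0."),
    ChainStep(Step.CROSS_REFERENCE, "Linux page migration can move pages between nodes."),
    ChainStep(Step.CONCLUSION, "Migrate the hot pages to node 0."),
]
target = to_training_target(chain)
print(target)
```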
CONVERGENCE
Adversarial Swarm — Run Until It Breaks
Interrogator, Adversary, Evaluator, Corrector in a continuous loop. The model doesn't graduate on a schedule — it graduates when the swarm gives up. No time limit. No epoch count. Run until every failure mode is found and sealed.
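The graduation rule reduces to a loop with a stopping condition instead of an epoch count. In this sketch the four roles are collapsed into three callbacks, and "the swarm gives up" is modeled as `patience` consecutive rounds in which no probe fails; both choices are assumptions for illustration.

```python
from typing import Callable, Iterable


def run_until_swarm_gives_up(
    interrogate: Callable[[], Iterable[str]],  # Interrogator + Adversary: propose probes
    evaluate: Callable[[str], bool],           # Evaluator: True if the model fails a probe
    correct: Callable[[list[str]], None],      # Corrector: patch the model on failures
    patience: int = 3,
) -> tuple[int, set[str]]:
    """Loop until `patience` consecutive rounds surface no failures at all."""
    found: set[str] = set()
    quiet, rounds = 0, 0
    while quiet < patience:
        rounds += 1
        failures = [probe for probe in interrogate() if evaluate(probe)]
        if failures:
            found.update(failures)
            correct(failures)
            quiet = 0  # any failure resets the countdown
        else:
            quiet += 1
    return rounds, found


probes = ["numa-remote-alloc", "thp-compaction-stall", "oom-killer-cgroup"]
known = {"numa-remote-alloc"}  # toy "model": the probes it already handles
rounds, found = run_until_swarm_gives_up(
    interrogate=lambda: probes,
    evaluate=lambda p: p not in known,
    correct=lambda fails: known.update(fails),
    patience=3,
)
print(rounds, sorted(found))  # 4 ['oom-killer-cgroup', 'thp-compaction-stall']
```

With no round cap, the loop ends only once corrections make every probe pass, which mirrors the "no epoch count" rule.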
See graduation →
Understand
all of it.
QuKaiZen isn't only the pipeline — it's the front door to the whole suite, and the place to learn the ropes of modern AI. Every concept behind ARAIL, AeroLLM, and Nucleus, explained plainly.
It starts with the AI Dictionary: every model-building term, defined with a concrete example — and a /what API your tools can call.
ONE PLACE TO UNDERSTAND
06 — THE PRINCIPLE
This is not RAG.
This is not prompt engineering.
The Super Skill knows.
Intelligence permanently crystallized into a 1–7B model. Air-gapped. Five distillation techniques. Cryptographically sealed. Runs on your hardware. No cloud dependency. Near-zero marginal cost per query. The model doesn't look it up; it knows it.
CORE METRIC
Wisdom per Watt
certified, owned capability ÷ lifetime energy to mint & run it
the point where owning beats renting — and it only moves in your favor
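Written out, the ratio looks like this. Splitting lifetime energy into a one-time minting term and a running term over n queries is our reading of "mint & run", not a published definition; under that reading the per-query energy share falls as n grows, because the minting cost is paid once:

```latex
\text{Wisdom per Watt} = \frac{C_{\text{certified}}}{E_{\text{mint}} + E_{\text{run}}},
\qquad
\frac{E_{\text{mint}} + n\,e_{\text{query}}}{n} = \frac{E_{\text{mint}}}{n} + e_{\text{query}}
```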