00 — MISSION

Your intelligence.
Forever owned.
Always improving.

ARAIL the research lab. AeroLLM the engine. Nucleus the pipeline. Paperagents the business wrapper.

Super Skill Models that outperform frontier giants in their domain. No cloud. No subscription. Cryptographically yours.

01 THE VISION

QuKaiZen's sole mission: deliver the best AI experience possible on commodity hardware.

To achieve this, we build purpose-built intelligent models through the Nucleus build pipeline, shipping version after version of specialized intelligence. Think of a model trained specifically to play chess: it became superhuman at the one thing it was built for. That's the kind of intelligence we forge and ship. Not generalists pretending to know everything, but specialists that know. It's not a lookup and it's not RAG; it's a model that knows how to respond in its domain.

To fulfill this, we created ARAIL, a specialized AI-driven laboratory now centered on Karpathy's 2026 AutoResearch project. Originally a pluggable platform, ARAIL has evolved into a high-intensity environment where collaborative agent swarms systematically interrogate and prompt-test models to their breaking point. It is a lab where improvement and knowledge building are the goal.

Built in the lab, AeroLLM is our over-engineered evolution of AirLLM, running the best open-source frontier models on local hardware and opening up 500B+ parameter models for exhaustive probing. Nucleus takes what we learn and distills it into precise, domain-specific knowledge.

The result is a Super Skill Model that isn't a lookup — it understands its domain, and out-reasons frontier models many times its size.

Download ARAIL
02 THE ORIGIN

Where it
began.

A research lab where autoresearch agents continuously curate papers from arXiv — constantly leveling up our understanding of modeling. The drive to never settle for a model that couldn't reason deeply made us start here.

ARAIL is where the ideas behind AeroLLM and Nucleus take shape. At inception we built a single agent assistant. That agent is now Buddy, front and center as your lab partner, the one you turn to for “what should I do next?” or “what's interesting in today's pull?” From that seed grew everything around it: streaming inference off disk so frontier teachers run on commodity hardware, and declaratively-defined agentic workers that handle research and data gathering on your behalf.

Buddy Tunnel federates Buddy across the channels you already use: Telegram, Slack, Discord, and Signal natively; iMessage and WhatsApp via bridge layers. Buddy Tunnel works in hybrid mode only (it needs a gateway and internet access); air-gapped mode keeps the lab sealed.

LIVE

Buddy

ARAIL LAB AGENT

Experiment #47 completed — convergence at 95.2%

You set a goal: "Match the 70B teacher on Linux memory management."

The swarm found 3 new edge cases in NUMA topology since last run. Does that align with where you wanted to take this?

Buddy Tunnel

CHANNEL INTEGRATION

AIR-GAP READY

Buddy is the sole agent registered on each channel you authorize. No other AI can read, intercept, or respond. Your conversations stay yours.

iMessage
Signal
WhatsApp
Telegram
Slack
Discord
SMS

Messages queue while the tunnel is closed — delivered the moment you reconnect.

One Command

./arail setup

A shell script that gets straight to it. Picks a tier, installs what you need.

Minimalist

Python runtime + AirLLM. Lightweight inference only.

Maximost

Python + Rust + AeroLLM. Full lab, full pipeline.

Full README: github.com/cdarnell/arail

SPECIFICATIONS

Hardware: Apple Silicon / Linux, 8GB unified memory minimum
Privacy: Zero telemetry. No outbound calls. Local only.
Agent: Buddy (context-aware, goal-tracking, offline)
Status: Available. The foundation of everything.
03 THE ENGINE

Inference,
unchained.

Frontier models don't fit on a single GPU. AeroLLM runs them anyway by streaming the model off your SSD one layer at a time (load a layer, compute, discard it, prefetch the next), so the full weight set is never resident at once. That puts 400B+ parameters on 8GB of VRAM, with no full-model residency and no GPU passthrough.
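The load, compute, discard, prefetch cycle can be sketched in a few lines. This is a toy illustration only, not AeroLLM's code: the function names, the pickle-per-layer file layout, and the pure-Python matrix math are all assumptions made for the example.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def save_layers(layers, directory):
    """Write each layer's weight matrix to its own file; return paths in order."""
    paths = []
    for i, weights in enumerate(layers):
        path = Path(directory) / f"layer_{i}.pkl"  # hypothetical file layout
        path.write_bytes(pickle.dumps(weights))
        paths.append(path)
    return paths


def load_layer(path):
    """Read one layer's weights from disk."""
    return pickle.loads(Path(path).read_bytes())


def apply_layer(weights, x):
    """Toy layer: matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def streamed_forward(paths, x):
    """Forward pass holding at most two layers in memory at once:
    the one being computed and the one being prefetched."""
    if not paths:
        return x
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_layer, paths[0])
        for i in range(len(paths)):
            weights = pending.result()  # wait for the current layer
            if i + 1 < len(paths):
                # Prefetch the next layer while this one is being computed.
                pending = pool.submit(load_layer, paths[i + 1])
            x = apply_layer(weights, x)  # compute
            del weights                  # discard before moving on
    return x


if __name__ == "__main__":
    toy_layers = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[2.0, 0.0], [0.0, -1.0]],
        [[1.0, 1.0]],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        layer_paths = save_layers(toy_layers, tmp)
        print(streamed_forward(layer_paths, [3.0, 4.0]))
```

Peak memory here is bounded by the two largest adjacent layers rather than the whole model, which is the property the paragraph above relies on.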

AeroLLM also uses speculative decoding to deliver up to 7× throughput on 70B+ teachers: a small draft model proposes a run of tokens and the full model verifies them in a single pass. The method is provably lossless, preserving the target model's exact output distribution (Leviathan et al. 2023; Chen et al. 2023). The gain compounds here specifically: when the bottleneck is streaming weights off disk, verifying many drafted tokens per pass amortizes one full-model stream across several tokens instead of paying that cost per token, so AeroLLM's heaviest expense does the most work.
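A minimal sketch of the draft-then-verify loop, under stated assumptions. It shows the greedy variant, where a drafted token is kept only if it matches the target's own choice, rather than the rejection-sampling scheme in the cited papers, and it skips the bonus token the target can emit when every draft is accepted. The `target` and `draft` functions are toy stand-ins, and none of the names reflect AeroLLM's API.

```python
def greedy_decode(model, prompt, n_tokens):
    """Baseline: one full-model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_tokens):
        tokens.append(model(tokens))
    return tokens[len(prompt):]


def speculative_greedy_decode(target, draft, prompt, n_tokens, k=4):
    """Greedy speculative decoding.

    Returns (generated_tokens, verification_passes). Each pass stands in for
    one full-model forward over the drafted positions, i.e. one weight stream.
    """
    tokens = list(prompt)
    produced, passes = 0, 0
    while produced < n_tokens:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal = []
        for _ in range(min(k, n_tokens - produced)):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks every drafted position in a single pass.
        #    A real model scores all positions in one batched forward; the
        #    per-position calls below only emulate that.
        passes += 1
        accepted = []
        for guess in proposal:
            truth = target(tokens + accepted)
            accepted.append(truth)  # the target's token is always the one kept
            if truth != guess:      # first mismatch ends this round
                break
        tokens.extend(accepted)
        produced += len(accepted)
    return tokens[len(prompt):], passes


def target(tokens):
    """Toy 'large' model: a deterministic next-token rule."""
    return (tokens[-1] * 3 + 1) % 7


def draft(tokens):
    """Toy 'small' model: agrees with the target except after token 5."""
    return 0 if tokens[-1] == 5 else (tokens[-1] * 3 + 1) % 7


if __name__ == "__main__":
    out, passes = speculative_greedy_decode(target, draft, [1], 12, k=4)
    print(out == greedy_decode(target, [1], 12), passes)
```

The output always equals what the target alone would have produced, while the number of target passes falls below the number of tokens whenever the draft guesses well. Under layer streaming, each saved pass is one full read of the weights from disk that never happens.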

Full credit goes to AirLLM for the layer-streaming idea. AeroLLM rebuilds it in Rust for the stability and Apple Silicon (MLX) support our pipeline needed.

Open source — Apache 2.0

Up to 7×

Faster Throughput

Speculative decoding on 70B+ teacher models via draft model pipeline

400B+

Max Model Scale

On 8GB VRAM — layer-by-layer inference, zero full weight residency

83%

Less Power Overhead

Unified Apple Silicon memory vs discrete GPU copy operations

85%

Cost Savings / Watt

$0.30/W vs $2/W industry standard — same throughput, 85% cheaper

04 THE PIPELINE

Mine deep.
Craft precise.
Forge permanent.

The production pipeline that puts it all together. SCoTD, CoTD, SFT, and agent interrogation — proven distillation techniques wrapped in a continuous adversarial loop until convergence.

Parallel agents mine the teacher model across 7 knowledge layers before training begins. The pipeline never stops on a schedule; it runs until the swarm exhausts every failure mode. The results are the gems: 1–7B Super Skill Models that reason like 400B+ teachers in your domain.

2–4 months commodity · 8–19 hours enterprise

3B vs 500B

Student beats teacher in-domain

Convergence-based graduation — the swarm runs until it exhausts failure modes, not until epochs complete

3-Gate

Post-graduation certification

General regression (LM-Eval) · Domain mastery (HELM + LLM judge) · Hallucination audit (HalluLens)

Ed25519

Nucleus Seal — cryptographic provenance

Teacher SHA-256 + corpus hash + pipeline config + AutoResearch report → immutable DNA chain
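One way to picture the chain is a record that hashes each of the four inputs and then hashes the record itself together with the previous seal. The sketch below is an assumption-laden illustration using only the standard library: the field names and record layout are hypothetical, and the Ed25519 signature over `seal_digest` is left out because it would need a signing library.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of a byte string."""
    return hashlib.sha256(data).hexdigest()


def canonical_bytes(record: dict) -> bytes:
    """Stable JSON encoding so the same record always hashes the same way."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def build_seal_record(teacher_weights: bytes, corpus: bytes, pipeline_config: dict,
                      research_report: str, previous_seal: str = "") -> dict:
    """Bundle the four provenance inputs into one record with a chained digest."""
    record = {
        "teacher_sha256": sha256_hex(teacher_weights),
        "corpus_sha256": sha256_hex(corpus),
        "config_sha256": sha256_hex(canonical_bytes(pipeline_config)),
        "report_sha256": sha256_hex(research_report.encode()),
        "previous_seal": previous_seal,  # links this record to the one before it
    }
    # The digest covers every field above; an Ed25519 key would sign this value.
    record["seal_digest"] = sha256_hex(canonical_bytes(record))
    return record


def verify_seal_record(record: dict) -> bool:
    """Recompute the digest from the other fields and compare."""
    body = {k: v for k, v in record.items() if k != "seal_digest"}
    return sha256_hex(canonical_bytes(body)) == record.get("seal_digest")


if __name__ == "__main__":
    first = build_seal_record(b"teacher-v1", b"corpus-v1", {"lr": 1e-4}, "report 1")
    second = build_seal_record(b"teacher-v1", b"corpus-v2", {"lr": 1e-4}, "report 2",
                               previous_seal=first["seal_digest"])
    print(verify_seal_record(first), verify_seal_record(second))
```

Because each record embeds the previous digest, changing any earlier input changes every digest after it, which is what makes the chain tamper-evident.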

<2%

Hallucination rate hard target

Zero fabricated entities. Out-of-domain refusal calibration >90%. Seal includes the proof.
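The three thresholds above translate directly into a pass/fail check. This is a sketch under assumptions: the `AuditResult` fields are hypothetical names, and "refusal calibration" is read here as the fraction of out-of-domain prompts the model correctly refuses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditResult:
    hallucination_rate: float  # fraction of audited answers containing a hallucination
    fabricated_entities: int   # count of invented names, citations, and similar
    ood_refusal_rate: float    # fraction of out-of-domain prompts correctly refused


def passes_hallucination_gate(result: AuditResult) -> bool:
    """Apply the stated targets: under 2%, zero fabrications, over 90% refusal."""
    return (
        result.hallucination_rate < 0.02
        and result.fabricated_entities == 0
        and result.ood_refusal_rate > 0.90
    )


if __name__ == "__main__":
    print(passes_hallucination_gate(AuditResult(0.015, 0, 0.93)))
```

All three conditions must hold at once, so a single fabricated entity fails the gate even when the overall rate is under target.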

FIVE STAGES · MINE → CRAFT → FORGE

RUNS UNTIL THE SWARM GIVES UP

01 MINE

KICE

Knowledge Injection & Corpus Evolution

Parallel extraction agents simultaneously bum-rush the teacher model across 6 certified knowledge layers — from rare concepts (L1) to edge cases and ambiguity detection (L6). The coordinated swarm exhausts the teacher's domain before a single training step begins.

02 MINE

TICE

Tacit Knowledge Injection & Corpus Evolution

L7 — the layer no benchmark measures. Implicit expert know-how, tribal knowledge, and domain folklore the teacher learned but never formally documented. TICE surfaces what the teacher knows but can't easily explain.

03 CRAFT

RAFT

Retrieval-Augmented Fine-Tuning

Oracle documents + deliberate distractors in every training batch. The student learns to reason through noise rather than memorize surface answers — built for real-world conditions, not clean benchmarks.
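A training example in this style pairs the question with one oracle document and several distractors in shuffled order. The sketch below assumes a simple prompt/completion format; the function name, the `[doc N]` labels, and the default of three distractors are illustrative choices, not Nucleus's actual data format.

```python
import random


def build_raft_example(question, oracle_doc, distractor_pool, answer,
                       num_distractors=3, rng=None):
    """Mix one oracle document with sampled distractors in random order."""
    rng = rng or random.Random()
    # Never let a copy of the oracle be drawn as a distractor.
    candidates = [doc for doc in distractor_pool if doc != oracle_doc]
    distractors = rng.sample(candidates, min(num_distractors, len(candidates)))
    context = distractors + [oracle_doc]
    rng.shuffle(context)  # the oracle's position must not be a giveaway
    numbered = "\n".join(f"[doc {i + 1}] {doc}" for i, doc in enumerate(context))
    return {
        "prompt": f"{numbered}\n\nQuestion: {question}",
        "completion": answer,
        "oracle_index": context.index(oracle_doc),
    }


if __name__ == "__main__":
    pool = [
        "Transparent huge pages are 2 MiB on x86-64.",
        "The OOM killer scores processes by memory use.",
        "cgroups v2 uses a unified hierarchy.",
        "Swappiness ranges from 0 to 200 on recent kernels.",
    ]
    example = build_raft_example(
        question="What does NUMA balancing migrate?",
        oracle_doc="Automatic NUMA balancing migrates pages toward the node that accesses them.",
        distractor_pool=pool,
        answer="It migrates memory pages toward the node whose CPUs access them.",
        rng=random.Random(0),
    )
    print(example["prompt"])
```

Passing a seeded `random.Random` keeps batches reproducible, which matters if the corpus hash is later recorded as part of a model's provenance.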

04 CRAFT

SCoTD

Symbolic Chain-of-Thought Distillation

Premise → rule → constraint → cross-reference → conclusion. The teacher's reasoning is decomposed into explicit symbolic steps and transferred structurally — not as token patterns, but as verifiable reasoning chains.

05 FORGE

CONVERGENCE

Adversarial Swarm — Run Until It Breaks

Interrogator, Adversary, Evaluator, Corrector in a continuous loop. The model doesn't graduate on a schedule — it graduates when the swarm gives up. No time limit. No epoch count. Run until every failure mode is found and sealed.
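The control flow of that loop can be sketched as follows. Everything here is an assumption for illustration: the four roles are passed in as plain callables, the "model" in the demo is just a set of prompts it can handle, and `patience` (consecutive rounds with no failures) stands in for the swarm giving up.

```python
def run_until_converged(model, interrogate, attack, evaluate, correct, patience=3):
    """Loop the four roles until `patience` consecutive rounds find no failures.

    There is deliberately no round limit: if `correct` never fixes a failure,
    the loop keeps running.
    """
    quiet_rounds = 0
    rounds = 0
    while quiet_rounds < patience:
        rounds += 1
        prompts = interrogate(model) + attack(model)                # Interrogator + Adversary
        failures = [p for p in prompts if not evaluate(model, p)]  # Evaluator
        if failures:
            model = correct(model, failures)                        # Corrector
            quiet_rounds = 0  # any failure resets the countdown
        else:
            quiet_rounds += 1
    return model, rounds


if __name__ == "__main__":
    probes = ["numa basics", "page cache"]
    attacks = ["numa edge case"]
    final_model, total_rounds = run_until_converged(
        model=frozenset(),
        interrogate=lambda m: probes,
        attack=lambda m: attacks,
        evaluate=lambda m, p: p in m,
        correct=lambda m, failures: m | frozenset(failures),
    )
    print(sorted(final_model), total_rounds)
```

The exit condition depends only on what the swarm finds, not on elapsed time or epoch count, which mirrors the graduation rule described above.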

See graduation →
05 THE FRONT DOOR

Understand
all of it.

QuKaiZen isn't only the pipeline. It's the front door to the whole suite, and the place to learn the ropes of modern AI: every concept behind ARAIL, AeroLLM, and Nucleus, explained plainly.

It starts with the AI Dictionary: every model-building term, defined with a concrete example — and a /what API your tools can call.

ONE PLACE TO UNDERSTAND

ARAIL: the lab
AeroLLM: the engine
Nucleus: the pipeline
PaperAgents: the business
Learn: how to understand them all

06 — THE PRINCIPLE

This is not RAG.

This is not prompt engineering.

The Super Skill knows.

Intelligence permanently crystallized into a 1–7B model. Air-gapped. Five distillation techniques. Cryptographically sealed. Runs on your hardware. No cloud dependency. Near-zero marginal cost per query. The model doesn't look it up; it knows it.

CORE METRIC

Wisdom per Watt

certified, owned capability ÷ lifetime energy to mint & run it

the point where owning beats renting — and it only moves in your favor

Learn →