Performance Tuning


POC configuration — tunables locked in superskill.yaml for production (air-gapped)


Memory Budget: 20.5 / 24 GB (85%)

OS: 3.0 GB
Student: 6.0 GB
Teacher: 9.0 GB
LoRA Grads: 2.5 GB
Free: 3.5 GB
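The budget figures can be checked with a few lines of arithmetic. The sketch below only re-adds the numbers shown in the panel; the function name and dictionary layout are illustrative and are not taken from the dashboard's own code.

```python
# Component sizes in GB, copied from the Memory Budget panel.
TOTAL_GB = 24.0
components_gb = {
    "OS": 3.0,
    "Student": 6.0,
    "Teacher": 9.0,
    "LoRA Grads": 2.5,
}


def memory_summary(components, total):
    """Return (used, free, used_fraction) for a fixed memory budget."""
    used = sum(components.values())
    free = total - used
    return used, free, used / total


used, free, fraction = memory_summary(components_gb, TOTAL_GB)
print(f"{used:.1f} / {TOTAL_GB:.0f} GB ({fraction:.0%}), free {free:.1f} GB")
```

The four components sum to 20.5 GB, which leaves the 3.5 GB shown as free.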


Tunables

LoRA Rank (r): 16 (range 4 to 64)

Decomposition rank. Higher = more capacity, more memory.

Memory: linear

LoRA Alpha: 32 (range 8 to 128)

Scaling factor. Typically 2x rank.
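In the standard LoRA formulation the adapter update is multiplied by alpha / r, so the "2x rank" rule of thumb keeps that multiplier at 2 whatever rank is chosen. A minimal sketch, with a hypothetical helper name:

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update in standard LoRA: alpha / r."""
    return alpha / rank


# Panel defaults: alpha = 32, rank = 16.
default_scale = lora_scale(32, 16)

# Following "alpha = 2 x rank" across the rank slider keeps the multiplier fixed.
scales = {rank: lora_scale(2 * rank, rank) for rank in (4, 16, 64)}
print(default_scale, scales)
```

This also explains why the alpha slider spans 8 to 128: those are exactly twice the rank slider's endpoints of 4 and 64.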

LoRA Dropout: 0.05 (range 0 to 0.2)

Regularization. Use higher values for small datasets.

LoRA Layers: 16 (range 4 to 32)

Transformer layers with LoRA adapters. Fewer = less memory.

Memory: linear
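The "Memory: linear" labels on rank and layer count follow from how LoRA parameters are counted: an adapter on a weight matrix of shape (d_out, d_in) adds r x (d_in + d_out) values. The sketch below assumes a hidden size of 4096 and two adapted matrices per layer; both numbers are placeholders, not values from this page.

```python
def lora_param_count(rank, layers, d_in=4096, d_out=4096, matrices_per_layer=2):
    """Trainable LoRA parameters: each adapted matrix adds rank * (d_in + d_out).

    d_in, d_out and matrices_per_layer are illustrative assumptions.
    """
    return layers * matrices_per_layer * rank * (d_in + d_out)


base = lora_param_count(rank=16, layers=16)
double_rank = lora_param_count(rank=32, layers=16)
double_layers = lora_param_count(rank=16, layers=32)
print(base, double_rank, double_layers)
```

Doubling either the rank or the number of adapted layers doubles the count, so adapter weights, their gradients, and their optimizer state all grow in proportion.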

Batch Size: 4 (range 1 to 16)

Micro-batch size. Limited by unified memory.

Memory: linear

Grad Accumulation: 4 (range 1 to 16)

Effective batch = batch_size x this. Raises the effective batch size without raising peak memory.
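With the defaults shown, the effective batch is 4 x 4 = 16. The toy example below illustrates why accumulation is equivalent to a larger batch: averaging the gradients of equal-sized micro-batches reproduces the full-batch gradient. The scalar model and data are invented for illustration only.

```python
def effective_batch(batch_size, grad_accum):
    """Samples contributing to one optimizer step."""
    return batch_size * grad_accum


def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


batch_size, grad_accum = 4, 4
xs = [float(i) for i in range(effective_batch(batch_size, grad_accum))]
ys = [3.0 * x for x in xs]
w = 1.0

# Gradient computed over all 16 samples at once.
full_grad = grad_mse(w, xs, ys)

# Same gradient built from 4 micro-batches of 4, holding only one in memory at a time.
accum_grad = 0.0
for step in range(grad_accum):
    lo = step * batch_size
    hi = lo + batch_size
    accum_grad += grad_mse(w, xs[lo:hi], ys[lo:hi]) / grad_accum

print(full_grad, accum_grad)
```

The cost is wall-clock time: each optimizer step now needs four forward and backward passes instead of one.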

Max Seq Length: 2048 tokens (range 256 to 8192)

Max token length. Attention memory scales quadratically with sequence length.

Memory: quadratic
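The quadratic label refers to the attention score matrix, which holds one entry per pair of tokens. A rough estimate for naive attention that materializes the full matrix is sketched below; the head count and the 2-byte element size are assumptions, and fused attention kernels that avoid storing the full matrix will use less.

```python
def attn_score_bytes(seq_len, batch_size=4, num_heads=32, bytes_per_value=2):
    """Bytes for one layer's attention scores when fully materialized.

    num_heads and bytes_per_value (fp16) are illustrative assumptions.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


at_2048 = attn_score_bytes(2048)
at_4096 = attn_score_bytes(4096)
print(at_2048 / 2**30, at_4096 / 2**30)  # GiB per layer
```

Doubling the sequence length quadruples this term, which is why Max Seq Length is the most expensive slider to raise.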

Learning Rate: 1.0e-4 (range 1.0e-5 to 5.0e-4)

Peak LR for cosine schedule. Lower for larger rank.

Epochs / Cycle: 3 (range 1 to 10)

Full passes per swarm cycle. More = more fitting.

Warmup Steps: 100 (range 10 to 500)

Linear warmup before peak LR.
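Learning Rate and Warmup Steps together describe a linear ramp to the peak followed by cosine decay. The sketch below is one common way to write that schedule; total_steps and min_lr are assumptions, since the page does not say how long a run lasts or where the decay ends.

```python
import math


def lr_at(step, peak_lr=1.0e-4, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay.

    total_steps and min_lr are hypothetical; peak_lr and warmup_steps are
    the defaults shown in the Tunables panel.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 50, 99, 100, 550, 1000):
    print(s, lr_at(s))
```

A longer warmup delays the point at which the full learning rate is applied, which can help stability when rank or learning rate is set high.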

Checkpoint Every: 500 steps (range 25 to 2000)

Save frequency. More frequent saves = safer but more disk I/O.
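The trade-off is easy to quantify: a shorter interval means more checkpoints written per run. A small sketch, assuming a hypothetical run of 2000 steps:

```python
def checkpoint_steps(total_steps, every=500):
    """1-indexed steps at which a checkpoint would be written."""
    return [step for step in range(1, total_steps + 1) if step % every == 0]


default_saves = checkpoint_steps(2000)            # default interval of 500
frequent_saves = checkpoint_steps(2000, every=25)  # shortest interval on the slider
print(default_saves, len(frequent_saves))
```

At the shortest interval the same run writes twenty times as many checkpoints, so at most 25 steps of work are lost after an interruption, at the price of far more disk traffic.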

Grad Checkpoint

Trade compute for memory. Useful on 16GB machines.
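The slider ranges above can be captured as a validated settings object. The sketch below is an illustration only: the field names are hypothetical and are not the actual keys of superskill.yaml, and the Grad Checkpoint toggle is left out because its default state is not shown on the page.

```python
from dataclasses import dataclass

# (min, max) for each tunable, copied from the slider ranges in the panel.
RANGES = {
    "lora_rank": (4, 64),
    "lora_alpha": (8, 128),
    "lora_dropout": (0.0, 0.2),
    "lora_layers": (4, 32),
    "batch_size": (1, 16),
    "grad_accumulation": (1, 16),
    "max_seq_length": (256, 8192),
    "learning_rate": (1e-5, 5e-4),
    "epochs_per_cycle": (1, 10),
    "warmup_steps": (10, 500),
    "checkpoint_every": (25, 2000),
}


@dataclass(frozen=True)
class Tunables:
    """Defaults shown in the Tunables panel (field names are hypothetical)."""
    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    lora_layers: int = 16
    batch_size: int = 4
    grad_accumulation: int = 4
    max_seq_length: int = 2048
    learning_rate: float = 1.0e-4
    epochs_per_cycle: int = 3
    warmup_steps: int = 100
    checkpoint_every: int = 500

    def validate(self):
        """Return a list of human-readable range violations (empty if valid)."""
        errors = []
        for name, (low, high) in RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                errors.append(f"{name}={value} is outside [{low}, {high}]")
        return errors


print(Tunables().validate())
print(Tunables(lora_rank=128).validate())
```

Freezing the dataclass mirrors the idea of tunables being locked for production: a new configuration has to be constructed and validated rather than mutated in place.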

Presets

Development

Simulated phases with stub delays. Good for testing the dashboard and pipeline flow.

Standard

Real training. Balanced quality settings. Pause anytime, resume when ready.

Machine will be busy during training. Pause if you need it back.

Maximum Quality

Full convergence. Runs until the swarm gives up trying to break it.

Machine will be saturated. Let it run — pause if needed.