| Dataset | Purpose | Size | Status |
|---|---|---|---|
| Dolly-15k | Intent understanding + responses | ~13MB | Ready |
| TinyStories | Grammar + sentence structure | ~50MB subset | Ready |
| Custom (LLMlow) | Tool routing + self-awareness | ~58KB | Available |
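The table caps TinyStories at a ~50MB subset. A hedged sketch of how such a cap could be enforced while dumping the dataset to plain text; the `roneneldan/TinyStories` dataset id, the `write_capped` helper, and the `data/tinystories.txt` path are illustrative assumptions, not something this document specifies:

```python
# Sketch: write texts to a file until a byte budget is reached.
# Helper name, dataset id, and output path below are assumptions.
from typing import Iterable

def write_capped(texts: Iterable[str], path: str, max_bytes: int) -> int:
    """Write blank-line-separated texts to path, stopping before exceeding
    max_bytes. Returns the number of bytes actually written."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for t in texts:
            chunk = t.strip() + "\n\n"
            size = len(chunk.encode("utf-8"))
            if written + size > max_bytes:
                break
            f.write(chunk)
            written += size
    return written

# Usage (downloads data, so shown as a comment):
#   from datasets import load_dataset
#   ds = load_dataset("roneneldan/TinyStories", split="train", streaming=True)
#   write_capped((row["text"] for row in ds), "data/tinystories.txt", 50 * 1024 * 1024)
```

Streaming mode avoids downloading the full dataset just to keep a 50MB slice.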
```bash
pip install torch datasets
```

```bash
python -c "
from datasets import load_dataset
ds = load_dataset('databricks/databricks-dolly-15k')
with open('data/dolly.txt', 'w') as f:
    for item in ds['train']:
        f.write(f'User: {item[\"instruction\"]}\nAssistant: {item[\"response\"]}\n\n')
"
```

```bash
python trainer/train.py \
  --data data/ \
  --output models/brain1.bin \
  --d-model 256 --d-state 128 --n-layers 8 \
  --epochs 5 --batch-size 16 --lr 0.0008
```
```bash
systemctl restart llmlow
```

| Machine | 5M params | 25M params | 50M params |
|---|---|---|---|
| CPU (8 cores) | 30-60 min | 2-4 hours | 8-12 hours |
| GPU (RTX 3060) | 5-10 min | 20-40 min | 1-2 hours |
| GPU (A100) | 1-2 min | 5-10 min | 15-30 min |
| Tier | Params | RAM | Quality |
|---|---|---|---|
| PICO | 50M | ~40MB | Basic routing |
| MICRO | 120M | ~80MB | Decent |
| MINI | 500M | ~350MB | Good |
| BASE | 1.5B | ~1GB | Strong |
| PRO | 3B | ~2GB | Very strong |
| ADVANCED | 7B+ | ~4GB+ | Near-frontier |
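The RAM column tracks parameter count roughly linearly. A back-of-envelope check, assuming weights-only memory at around 5 to 6.5 bits per weight (consistent with Q5/Q6-style quantized storage; the document does not state the actual format):

```python
def approx_weight_ram_bytes(n_params: int, bits_per_weight: float = 6.4) -> int:
    """Weights-only RAM estimate; excludes activations and any runtime state.
    bits_per_weight is an assumption (quantized storage), not from the document."""
    return int(n_params * bits_per_weight / 8)

# 50M params at ~6.4 bits/weight -> 40_000_000 bytes (~40MB), matching the PICO row.
```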
Training teaches Brain 1 language structure, not world knowledge. After training, Brain 1 understands intent, grammar, and sentence structure: enough to decide which tool a request needs, not to answer it from memory.

World knowledge comes from tools (web search, files, the Crystal). Brain 1 stays small. Intelligence grows in the Crystal.
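That division of labor can be sketched as a thin dispatch layer: the small model only names a tool, and the tool supplies the knowledge. Everything below (the tool names, the keyword heuristic standing in for Brain 1) is a hypothetical illustration, not LLMlow's actual API:

```python
from typing import Callable, Dict

# Hypothetical tool registry; real tools would hit the web, the
# filesystem, or the Crystal store instead of returning placeholders.
TOOLS: Dict[str, Callable[[str], str]] = {
    "web_search": lambda q: f"[web results for {q!r}]",
    "read_file": lambda q: f"[contents matching {q!r}]",
    "crystal": lambda q: f"[crystal memory for {q!r}]",
}

def pick_tool(query: str) -> str:
    """Stand-in for Brain 1: map an utterance to a tool name.
    The trained model replaces this keyword heuristic."""
    q = query.lower()
    if "file" in q or "open" in q:
        return "read_file"
    if "remember" in q or "last time" in q:
        return "crystal"
    return "web_search"

def answer(query: str) -> str:
    # Route, then let the chosen tool produce the knowledge.
    return TOOLS[pick_tool(query)](query)
```

The point of the sketch: routing is a small, cheap classification problem, which is why a 50M-parameter tier can be useful at all.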