SleepyQuant — a 12-agent crypto quant running on one Mac

Hey everyone,

SleepyQuant is a solo experiment I've been running for the last couple of weeks: 12 local AI agents coordinating a paper crypto trading book on a single Apple M1 Max. No cloud inference, no API bills, no vendor black box. Every agent prompt, every losing trade, every round-trip gets written up weekly.

Stack (all local):

Apple M1 Max, 64 GB RAM
MLX Qwen 2.5 32B Q8 as the primary agent model
DeepSeek R1 14B Q8 as a lazy-loaded reasoning lane for research tasks
Priority queue on the MLX inference lock so user chat preempts automation
FastAPI backend, SwiftUI macOS app, SQLite for state, ChromaDB for agent memory
Binance paper via ccxt, spot + futures, 70/30 allocation, 10x leverage on the futures lane
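
To make the priority-queue item concrete, here is a minimal sketch of a priority-aware lock around a single local model. It is not the actual SleepyQuant code: the class name `PriorityInferenceLock`, the priority constants, and the `run_inference` helper are all hypothetical, and "preempts" is read here as "jumps ahead of queued automation jobs", not "interrupts a generation that is already running".

```python
import asyncio
import heapq
import itertools

# Hypothetical priority levels: lower number is served first.
USER_CHAT = 0
AUTOMATION = 10


class PriorityInferenceLock:
    """Serialize access to one model; waiters are woken in priority order."""

    def __init__(self):
        self._busy = False
        self._waiters = []              # heap of (priority, seq, future)
        self._seq = itertools.count()   # tie-breaker keeps FIFO within a priority

    async def acquire(self, priority):
        if not self._busy:
            self._busy = True
            return
        fut = asyncio.get_running_loop().create_future()
        heapq.heappush(self._waiters, (priority, next(self._seq), fut))
        await fut  # resolved by release(), which hands the lock over directly

    def release(self):
        if self._waiters:
            _, _, fut = heapq.heappop(self._waiters)
            fut.set_result(None)        # lock stays busy, ownership moves on
        else:
            self._busy = False


async def run_inference(lock, name, priority, log):
    await lock.acquire(priority)
    try:
        log.append(name)
        await asyncio.sleep(0.01)       # stand-in for the real model call
    finally:
        lock.release()


async def demo():
    lock = PriorityInferenceLock()
    log = []
    await asyncio.gather(
        run_inference(lock, "scan-1", AUTOMATION, log),
        run_inference(lock, "scan-2", AUTOMATION, log),
        run_inference(lock, "user-chat", USER_CHAT, log),
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, `scan-1` already holds the lock when the other two requests arrive, so `user-chat` runs second and `scan-2` runs last. The sketch ignores task cancellation while waiting, which a real implementation would need to handle.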

What's deliberately boring:

The paper book is roughly $78 equivalent. Not a typo. The real-mode transition gate requires three consecutive green days before anything touches real capital, and even then the first real trade is capped tiny. If the strategy can't handle $78, I'd rather find out for free.
Tight scalp exits on futures: take-profit at +2.0%, stop-loss at -1.5%, plus a hard -8% daily drawdown stop.
Every losing trade gets a post-mortem. The failure vault is public in the weekly newsletter, with root-cause classification (technical / news / execution slippage) and the exact param changes shipped as a response.
Funding rate guard: refuses to open a futures position when our side would be paying extreme funding. Shipped after the scanner had been quietly bleeding basis points for three days straight.
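
The guardrails in this list reduce to a few small checks. The sketch below is illustrative only. The percentages come from the post, but `MAX_FUNDING_PAID` is an assumed threshold (the post does not say what counts as "extreme"), the function names are hypothetical, and TP/SL is assumed to be measured on the unlevered price move.

```python
# Values stated in the post.
TAKE_PROFIT = 0.020        # +2.0%
STOP_LOSS = -0.015         # -1.5%
DAILY_DD_STOP = -0.08      # -8% daily drawdown
GREEN_DAYS_REQUIRED = 3    # consecutive green days before real mode

# ASSUMPTION: the post gives no number for "extreme" funding.
MAX_FUNDING_PAID = 0.0005  # 0.05% per funding interval


def exit_signal(entry, price, side):
    """Return "take_profit", "stop_loss", or None for an open position."""
    move = (price - entry) / entry
    pnl = move if side == "long" else -move   # shorts profit when price falls
    if pnl >= TAKE_PROFIT:
        return "take_profit"
    if pnl <= STOP_LOSS:
        return "stop_loss"
    return None


def trading_halted(day_start_equity, equity):
    """True once the book is down 8% or more on the day."""
    return (equity - day_start_equity) / day_start_equity <= DAILY_DD_STOP


def funding_allows_entry(side, funding_rate):
    """Block entries when our side would pay more than the threshold.

    Perpetual-futures convention: a positive rate means longs pay shorts,
    a negative rate means shorts pay longs.
    """
    paid = funding_rate if side == "long" else -funding_rate
    return paid <= MAX_FUNDING_PAID


def real_mode_unlocked(daily_pnls):
    """True only if the most recent three daily P&L values are all positive."""
    recent = daily_pnls[-GREEN_DAYS_REQUIRED:]
    return len(recent) == GREEN_DAYS_REQUIRED and all(p > 0 for p in recent)
```

For example, `funding_allows_entry("long", 0.001)` is rejected while the same rate is fine for a short, because in that case the short side is the one receiving funding.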

Agents (one role each):

A COO / dispatcher, a trading lead, separate futures + spot executors, a CFO, a CTO with filesystem + shell tools, an R&D / failure analyst, a legal / compliance officer, a resource monitor, a QA engineer, a news intelligence watcher, and a content / SEO writer.

Each agent has a focused system prompt + a small set of skill handlers. The COO routes CEO requests to the right specialist instead of one monolithic agent trying to do everything.
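
As a rough illustration of the routing idea, here is a small dispatcher sketch. It is an assumption-heavy stand-in: the real COO presumably decides with a model call, whereas this version uses plain keyword matching, and the agent names, keywords, and handlers are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Handler = Callable[[str], str]


class Dispatcher:
    """Route a request to the first specialist whose keywords match."""

    def __init__(self, fallback_name: str, fallback: Handler):
        self._routes: List[Tuple[Set[str], str]] = []
        self._agents: Dict[str, Handler] = {fallback_name: fallback}
        self._fallback_name = fallback_name

    def register(self, name: str, keywords: Set[str], handler: Handler) -> None:
        self._routes.append(({k.lower() for k in keywords}, name))
        self._agents[name] = handler

    def route(self, request: str) -> str:
        """Return the name of the agent that should handle the request."""
        tokens = set(re.findall(r"[a-z0-9]+", request.lower()))
        for keywords, name in self._routes:   # registration order = precedence
            if tokens & keywords:
                return name
        return self._fallback_name

    def dispatch(self, request: str) -> str:
        return self._agents[self.route(request)](request)


def build_demo_dispatcher() -> Dispatcher:
    # Hypothetical agents; each handler just tags the request with its role.
    d = Dispatcher("coo", lambda r: f"[coo] {r}")
    d.register("futures_executor", {"futures", "funding", "leverage"},
               lambda r: f"[futures] {r}")
    d.register("news_watcher", {"news", "headlines"},
               lambda r: f"[news] {r}")
    d.register("failure_analyst", {"postmortem", "loss", "losing"},
               lambda r: f"[r&d] {r}")
    return d
```

A call such as `build_demo_dispatcher().route("Summarize today's headlines")` lands on the news agent, and anything that matches no keyword falls back to the COO itself. The routing overhead in this shape is one lookup per request; with a model-based router it becomes one extra inference call.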

Live paper P&L widget + weekly newsletter: https://sleepyquant.rest

Two things I'd genuinely want feedback on — please weigh in below:

Is 12 agents worth the routing overhead? Or would a single bigger agent with tool use be cleaner at this scale? I keep flip-flopping and would love to hear from anyone who's been through the same decomposition choice.
MLX unload strategies on Apple Silicon? Right now my reasoning model auto-unloads after 2 minutes idle, which works but feels crude. If you're running MLX in production on a Mac, how do you free RAM when you need it back?
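
For anyone comparing notes on the unload question, the idle-timeout approach described above looks roughly like the sketch below. It is framework-agnostic on purpose. `load` and `unload` are caller-supplied callables, the class name `IdleUnloader` is hypothetical, and nothing here is an MLX API. Actually returning memory depends on dropping every reference to the model and on whatever cache-clearing call your MLX version provides.

```python
import threading
import time


class IdleUnloader:
    """Lazy-load a model on first use and drop it after a period of inactivity."""

    def __init__(self, load, unload, idle_seconds=120.0, clock=time.monotonic):
        self._load = load              # () -> model
        self._unload = unload          # (model) -> None, frees resources
        self._idle_seconds = idle_seconds
        self._clock = clock            # injectable so tests can fake time
        self._lock = threading.Lock()
        self._model = None
        self._last_used = 0.0

    def get(self):
        """Return the model, loading it if needed, and mark it as just used."""
        with self._lock:
            if self._model is None:
                self._model = self._load()
            self._last_used = self._clock()
            return self._model

    def reap(self):
        """Unload if idle for too long. Call periodically; True if unloaded."""
        with self._lock:
            if self._model is None:
                return False
            if self._clock() - self._last_used < self._idle_seconds:
                return False
            self._unload(self._model)
            self._model = None
            return True
```

One caveat with this shape: `reap()` only knows when the model was last requested, not whether a generation is still running, so it should be called while holding the same inference lock that guards model calls.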

Try it or follow along:

Live paper P&L widget + weekly write-up: https://sleepyquant.rest
Subscribe to the weekly post-mortem newsletter — Beehiiv, free, one email per week, no upsells, no signals, no affiliate links
Cadence: every Tuesday. If the book dies, I'll write that up too.

Happy to answer questions in the comments about the architecture, the failure vault, the priority queue design, or why local-first LLM agents are worth the effort on a 64 GB machine. Fire away.
