SleepyQuant Blog — all posts

2026-05-23

I Blamed the Model for Months. The Bug Was My Sampler.

Running local LLMs on M1 Max hardware is one of those setups that looks great on paper — unified memory, no PCIe bottleneck, offline and private. For about a year I ran `mlx-community/Qwen3.6-35B-A3B-8bit`, a 35B Mixture…

2026-05-22

How I Budget 64 GB Unified Memory on M1 Max for a 35B Model + Long-Running Agent Loops

The first lie I had to unlearn after buying a 64 GB Mac for local LLM work was that I had 64 GB to use for the model.

2026-05-21

MLX vs llama.cpp on M1 Max with 35B Q8 — The Honest Benchmark

I tested both. Same machine (M1 Max 64 GB), same model (Qwen 3.6 35B-A3B Q8), same prompts, same generation lengths. llama.cpp came out about 30% faster on raw decode throughput. I stayed on MLX anyway.

2026-05-20

MoE Degeneration on Long Context — Why My 35B Model Started Repeating Itself

The first 600 tokens looked great. Coherent prose, on-topic, the same voice I'd been getting from Qwen 3.6 35B-A3B Q8 for weeks. Then something snapped. The next 200 tokens were a chain of synonyms — "leadership manageme…

2026-05-19

Qwen 3.6 enable_thinking — The MoE Pitfall That Broke My Agent JSON Parsing

I lost two hours last week to a Qwen 3.6 quirk that doesn't show up in any quickstart guide. My agent kept returning malformed JSON. Logs showed the model output started with `<think>` and a 200-token reasoning monologue…

2026-05-04

Yen Intervention Crypto: Why This Isn't August 2024 Round Two

Tokyo intervened in the FX market on April 30, 2026, and USD/JPY round-tripped from 160.72 down to 155.5 before bouncing back to 157.2 in the Asia session. Some crypto desks are already typing "carry trade unwind, round …

2026-05-03

After 24 days of losing trades, I'm testing a new signal source. Here's the setup.

For 24 days I ran a 30-pair crypto perp scanner using classical technical analysis (RSI, EMA, VWAP, ADX) on 15-minute candles. About 4,700 paper trades. Win rate 25.8% over the last 8 days. That's not bad luck. Statistic…

2026-04-27

I Trained a Crypto Quantile Predictor on 47M Klines. The Transformer Lost to LightGBM.

This is what 47.68 million klines, 27 LightGBM models, and one failed Transformer spike taught me about building a crypto quantile predictor that holds up under out-of-sample stress — and what the OOS calibration numbers…

2026-04-27

MLX Memory Safety Checklist: 6-Layer Defense for M1/M2 Apple Silicon

I froze my M1 Max twice in one week while running Qwen 3.6 35B-A3B Q8 for a 12-agent stack.

2026-04-23

Apple CEO Succession: A Hardware Engineer Takes Over. Three Months Running a 40GB AI Model on an M1 Max Tells Me Why That's the Right Call.

Three days ago (April 20, 2026), Apple announced that John Ternus, SVP of Hardware Engineering, will succeed Tim Cook as CEO effective September 1, 2026. Cook moves to Executive Chairman. Ternus becomes Apple's 8th CEO.

2026-04-22

FPT Corporation and the AI Consulting Margin Compression: Why Vietnam's Biggest Tech Firm Lost a Third of Its Market Cap

FPT Corporation, Vietnam's largest IT services firm, is down ~33.8% from its 52-week high. This drawdown mirrors a broader sector-wide slump: TCS fell 21.4%, Wipro dropped 23.1%, and Infosys declined roughly 16% over the…

2026-04-21

How a Missing book_id Kwarg Quietly Tanked My Inverted-Alpha Paper Trade

I ran an inverted-alpha paper-trading experiment to test whether inverting my live signals would produce net-positive P&L over 100 round-trips. The inverted-alpha book (Book 2) hit a 63% win rate — good enough to celebra…

2026-04-20

What 19 GB of Memory Compression Taught Me About MLX on M1 Max

I opened Activity Monitor on my M1 Max one afternoon and saw this: Memory Used 60.74 GB out of 64, compressed memory 19.69 GB, swap starting to fill. The SwiftUI dashboard I use to drive my multi-agent quant stack had hu…

2026-04-18

Why Apple Silicon Quietly Won the Local-AI Race (April 2026)

While the public AI narrative is dominated by capex wars and cloud GPU shortages, a quieter shift has happened on the desktop. A single Apple Silicon laptop with 64GB of unified memory now runs a 35-billion-parameter mix…

2026-04-18

The Inverted Control: What 24 Hours of Running Our Own Bot Backwards Revealed

After roughly 500 paper round-trips showed a persistent sub-35% win rate with average losses larger than average wins, we stopped scaling the live side and ran a cheap experiment: a second paper book that executes the ex…

2026-04-18

The 0.42% Bar: A Passive Yield Benchmark for Every Crypto Trading Bot (April 2026)

Most "is my trading bot any good?" conversations start from the wrong place. People compare bot returns to zero, or to "the market," or to whatever random chart is in front of them. None of those are the right bar.

2026-04-16

SleepyQuant Weekly · 2026W16

Past 7 days · 49 losing trades · total -24.63 USDT
- Execution Slippage cluster × 25 across APT/USDT, BNB/USDT, ETH/USDT, LINK/USDT
- Technical Failure cluster × 24 across APT/USDT, ARB/USDT, ATOM/USDT, AVAX/USDT
- APT/U…

2026-04-11

SleepyQuant — a 12-agent crypto quant running on one Mac

Hey everyone,
