Gemma 4 Model Insights


Gemma 4 is a SOLID family of models, but harness and runner selection matter more than ever. Here's everything I learned from testing:

• The 31B model is amazing for proper thinking and chat. Agentic use is a mixed bag (see below).
• A4B and 31B are both good for visual understanding. Both pass my personal MirandaBench (pass in a pattern/image, ask the model to separate it into historical elements, then reproduce it with a nanobanana prompt; see video).

Harness matters a crazy amount:

• Codex, surprisingly, has been the most solid. codex --oss has been a massive boon.
• Claude Code barely works. The system prompt is too thick, and Gemma does not know what to do with interleaved thinking the way Anthropic does it.
• OpenCode is worse: a worse prompt (for this family) and worse tool calls overall.
• Pi is pretty good, but adding extensions will often confuse the model.

Runners matter too:

• LM Studio on Mac and llama.cpp on Windows are tested and working, but still have rough edges. I'd give these models a week or two to stabilize.

Quants:

• Q4 is... okay. I've needed Q8 or above for any serious data work.

The more I test this model, the more I'm sure it is a solid agentic workhorse, but it's missing the harness and runner combo that would enable that. This is where I'm hoping the OSS community comes to the rescue.

As always, YMMV!