Gemma 4 Model Insights

3 months ago

Gemma 4 is a SOLID family of models - but harness and runner selection matter more than ever. Here's everything I learned from testing:

• The 31B model is amazing for proper thinking and chat. Agentic use is a mixed bag - see below.
• A4B and 31B are good for visual understanding. Both pass my personal MirandaBench (pass in a pattern/image, ask to separate into historical elements and reproduce with a nanobanana prompt - see video).

Harness matters a crazy amount:

• Codex - surprisingly - has been the most solid. codex --oss has been a massive boon.
• Claude Code barely works. The system prompt is too thick, and Gemma does not know what to do with interleaved thinking the way Anthropic does it.
• OpenCode is worse - worse prompt (for this family), and overall worse tool calls.
• Pi is pretty good - but adding in extensions will often confuse the model.

Runners matter too:

• LMStudio on Mac and llama.cpp on Windows are tested and working, but still have rough edges. I'd give these models a week or two to stabilize.

Quants:

• Q4 is... okay. I've needed Q8 or above for any serious data work.

The more I test this model, the more I'm sure that this is a solid agentic workhorse, but it's missing the harness and runner combo that would enable that. This is where I'm hoping the OSS community comes to the rescue.

As always, YMMV!

Hrishi Olickel

@hrishioa

Building artificially intelligent bridges at Southbridge, prev-CTO Greywing (YC W21). Chop wood carry water.