On mobile the Q4 vs Q6 tradeoff flips. Gemma 4 E2B at Q4_K_M barely fits in RAM on a 6GB Android phone, so Q6 isn't on the table. In practice the quality loss from Q4 shows up in tool-call reliability more than in general reasoning, which is usually tolerable for a small, constrained skill surface.
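Back-of-envelope for why Q6 drops out: weight memory is roughly params * bits-per-weight / 8. Here's a rough Python sketch. It assumes about 4.8 bpw as a Q4_K_M average (Q4_K_M mixes Q4_K and Q6_K tensors, so the real figure varies by model) and 6.5625 bpw for Q6_K. The parameter count, RAM budget, and overhead below are hypothetical placeholders, not Gemma 4 E2B's actual figures. The overhead term stands in for KV cache and runtime, which you'd have to measure on the device.

```python
# Back-of-envelope GGUF weight-memory estimate. All figures are approximations.

# Approximate bits per weight for each quant type.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,   # ASSUMED average: Q4_K_M mixes Q4_K (4.5 bpw) and Q6_K tensors
    "Q6_K": 6.5625,  # 210 bytes per 256-weight super-block
}


def weight_gib(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30


def fits(n_params: float, quant: str, budget_gib: float, overhead_gib: float = 0.0) -> bool:
    """True if weights plus a fixed overhead (KV cache, runtime) fit the RAM budget."""
    return weight_gib(n_params, quant) + overhead_gib <= budget_gib


if __name__ == "__main__":
    n_params = 5e9      # HYPOTHETICAL parameter count; substitute the real one
    budget_gib = 3.5    # HYPOTHETICAL RAM left for the app on a 6GB phone
    overhead_gib = 0.5  # HYPOTHETICAL KV cache + runtime overhead; measure this
    for quant in BITS_PER_WEIGHT:
        size = weight_gib(n_params, quant)
        ok = fits(n_params, quant, budget_gib, overhead_gib)
        print(f"{quant}: ~{size:.2f} GiB weights, fits={ok}")
```

With those placeholder numbers, Q4_K_M comes out around 2.8 GiB and Q6_K around 3.8 GiB. That roughly 1 GiB gap is exactly the kind that decides things on a 6GB device.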
Went through the SDK docs before asking. On RN/Expo specifically, does Fabric run inside a Bare worklet with IPC back to Hermes, or does it drop into a native module the way llama.rn wraps llama.cpp via JNI? Perf and memory footprint would look very different between the two, so I'm curious which path you landed on.
Even Gemma 4 E2B is more useful than you'd think if you give it the right harness. I've been running it on Android via llama.rn, and it handles function calling natively: the model outputs structured tool calls without any prompt engineering. It won't replace Opus for hard reasoning, but for a mobile app that just needs to pick a tool and run it, the cost math is hard to argue with. $0/query, forever.
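Part of "the right harness" is checking the structured call before acting on it. Here's a minimal Python sketch of that layer. It assumes the model emits an OpenAI-style {"name": ..., "arguments": {...}} JSON object, and llama.rn's actual output shape may differ. The tools, `dispatch`, `call_with_retry`, and the `generate` callable are all hypothetical names, not llama.rn APIs. Re-prompting with the validation error is one cheap way to absorb the occasional bad call from a Q4 model.

```python
import json
from typing import Any, Callable


# HYPOTHETICAL tools for illustration; a real app registers its own skill surface.
def get_weather(city: str) -> str:
    return f"weather for {city}: unknown (stub)"


def set_timer(minutes: int) -> str:
    return f"timer set for {minutes} min"


# name -> (function, required argument names with expected types)
TOOLS: dict[str, tuple[Callable[..., Any], dict[str, type]]] = {
    "get_weather": (get_weather, {"city": str}),
    "set_timer": (set_timer, {"minutes": int}),
}


class ToolCallError(ValueError):
    """Raised when the model's output is not a valid call into the registry."""


def dispatch(raw: str) -> Any:
    """Parse an (assumed) {"name", "arguments"} JSON tool call, validate it, run it."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ToolCallError(f"not JSON: {e}") from e
    if not isinstance(call, dict):
        raise ToolCallError("tool call must be a JSON object")
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ToolCallError(f"unknown tool: {name!r}")
    fn, schema = TOOLS[name]
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise ToolCallError("arguments must be a JSON object")
    for arg, typ in schema.items():
        if not isinstance(args.get(arg), typ):
            raise ToolCallError(f"{name}: argument {arg!r} must be {typ.__name__}")
    extra = set(args) - set(schema)
    if extra:
        raise ToolCallError(f"{name}: unexpected arguments {sorted(extra)}")
    return fn(**args)


def call_with_retry(generate: Callable[[str], str], prompt: str, max_attempts: int = 2) -> Any:
    """`generate` is a HYPOTHETICAL prompt -> raw-text wrapper around the on-device model."""
    last_err = None
    for _ in range(max_attempts):
        # Feed the previous validation error back so the model can correct itself.
        feedback = f"\nPrevious tool call was invalid: {last_err}" if last_err else ""
        try:
            return dispatch(generate(prompt + feedback))
        except ToolCallError as e:
            last_err = e
    raise ToolCallError(f"gave up after {max_attempts} attempts: {last_err}")
```

Keeping the registry small and typed is what makes a constrained skill surface forgiving. The model only has to pick among a few names, and anything off-schema is caught before it touches the app.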