Hacker News | shaz0x's comments

On mobile the Q4 vs. Q6 tradeoff flips. Gemma 4 E2B at Q4_K_M barely fits in RAM on a 6GB Android phone, so Q6 isn't on the table. In practice, the Q4 quality hit shows up in tool-call reliability more than in general reasoning, which is usually fine for a constrained skill surface.
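A back-of-the-envelope check on why Q6 is out of reach: weight memory is roughly parameters × bits-per-weight / 8. This is a sketch under stated assumptions: the parameter count is a hypothetical placeholder (substitute the figure from the model card), the Q4_K_M value is an approximate llama.cpp average, and KV cache, activations, and runtime overhead all come on top.

```typescript
// Back-of-the-envelope weight memory for a quantized GGUF model.
// Assumptions: bits-per-weight values are approximate llama.cpp figures, and
// the parameter count is a hypothetical placeholder, not a published spec.
const BITS_PER_WEIGHT: Record<string, number> = {
  Q4_K_M: 4.85, // approximate: mostly Q4_K (4.5 bpw) with some Q6_K tensors
  Q6_K: 6.5625, // 210 bytes per 256-weight block
};

// Bytes needed for the weights alone (no KV cache, activations, or runtime).
function weightBytes(params: number, quant: string): number {
  const bpw = BITS_PER_WEIGHT[quant];
  if (bpw === undefined) throw new Error(`unknown quant type: ${quant}`);
  return (params * bpw) / 8;
}

const GIB = 1024 ** 3;
const params = 5e9; // hypothetical raw parameter count
for (const quant of ["Q4_K_M", "Q6_K"]) {
  console.log(`${quant}: ${(weightBytes(params, quant) / GIB).toFixed(2)} GiB`);
}
```

Whatever the real parameter count, Q6_K needs about 35% more memory for weights than Q4_K_M (6.5625 / 4.85 ≈ 1.35), which is margin a 6GB phone may simply not have once the OS and other apps take their share.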


Went through the SDK docs before asking. On RN/Expo specifically, does Fabric run inside a Bare worklet with IPC back to Hermes, or does it drop into a native module the way llama.rn does via JNI and llama.cpp? Perf and memory footprint would look very different between the two, so I'm curious which path you landed on.


Bare worklet with IPC, exactly. Let me know if there's anything I can help with.
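One way to picture the "Bare worklet with IPC" path: each request to the model is serialized, posted across a channel, and answered asynchronously, so the JS side keeps a table of pending requests keyed by id. The sketch below is a generic request/response pattern over a hypothetical `IpcChannel` interface with an in-memory loopback; it is not Bare's or Fabric's actual API, and the "worklet" just echoes the prompt instead of running inference.

```typescript
// Generic request/response RPC over a message channel. IpcChannel is a
// hypothetical stand-in for the worklet's IPC; it is not Bare's or Fabric's API.
interface IpcChannel {
  send(data: string): void;
  onMessage(handler: (data: string) => void): void;
}

type RpcRequest = { id: number; method: string; params: unknown };
type RpcResponse = { id: number; result?: unknown; error?: string };

class RpcClient {
  private nextId = 1;
  private readonly pending = new Map<
    number,
    { resolve: (value: unknown) => void; reject: (err: Error) => void }
  >();
  private readonly channel: IpcChannel;

  constructor(channel: IpcChannel) {
    this.channel = channel;
    channel.onMessage((data) => {
      const res = JSON.parse(data) as RpcResponse;
      const entry = this.pending.get(res.id);
      if (!entry) return; // unknown or already-settled id
      this.pending.delete(res.id);
      if (res.error !== undefined) entry.reject(new Error(res.error));
      else entry.resolve(res.result);
    });
  }

  // Every call is serialized to a string and answered asynchronously.
  call(method: string, params: unknown): Promise<unknown> {
    const id = this.nextId++;
    const req: RpcRequest = { id, method, params };
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.channel.send(JSON.stringify(req));
    });
  }
}

// In-memory loopback pair for demonstration; delivery is deferred to mimic an IPC hop.
function makeLoopback(): [IpcChannel, IpcChannel] {
  const handlers: Array<Array<(data: string) => void>> = [[], []];
  const side = (me: number, peer: number): IpcChannel => ({
    send: (data) => {
      for (const h of handlers[peer]) void Promise.resolve().then(() => h(data));
    },
    onMessage: (h) => {
      handlers[me].push(h);
    },
  });
  return [side(0, 1), side(1, 0)];
}

// Hypothetical worklet side: echoes the prompt instead of running inference.
const [hostSide, workletSide] = makeLoopback();
workletSide.onMessage((data) => {
  const req = JSON.parse(data) as RpcRequest;
  const res: RpcResponse =
    req.method === "complete"
      ? { id: req.id, result: `echo: ${(req.params as { prompt: string }).prompt}` }
      : { id: req.id, error: `unknown method: ${req.method}` };
  workletSide.send(JSON.stringify(res));
});

const client = new RpcClient(hostSide);
client.call("complete", { prompt: "hi" }).then((r) => console.log(r)); // logs "echo: hi"
```

The serialize/copy step on every call is where this path differs in cost from binding the model directly into the app process, which is why perf and memory footprint are worth measuring for each approach.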


Even Gemma 4 E2B is more useful than you'd think if you give it the right harness. I've been running it on Android via llama.rn, and it handles function calling natively: the model outputs structured tool calls without any prompt engineering. It won't replace Opus for hard reasoning, but for a mobile app that just needs to pick a tool and run it, the cost math is hard to argue with. $0/query, forever.
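A minimal sketch of the app side of "pick a tool and run it": parse the model's tool call, look the name up in a registry, and run it, rejecting anything malformed instead of guessing. The `{"name", "arguments"}` JSON shape is an assumption (the real format depends on the model's chat template and on how the runtime surfaces tool calls), and the tools themselves are hypothetical.

```typescript
// Minimal tool-call dispatcher. The {"name", "arguments"} JSON shape is an
// assumption; the real format depends on the model's chat template and runtime.
type Tool = (args: Record<string, unknown>) => string;
type DispatchResult = { ok: true; output: string } | { ok: false; reason: string };

// Hypothetical tools, for illustration only. A Map avoids matching
// inherited keys such as "toString" that a plain object would expose.
const tools = new Map<string, Tool>([
  ["get_time", () => new Date().toISOString()],
  ["add", (args) => String(Number(args["a"]) + Number(args["b"]))],
]);

function dispatch(raw: string): DispatchResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "malformed JSON" }; // e.g. a truncated or chatty reply
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, reason: "tool call is not an object" };
  }
  const call = parsed as { name?: unknown; arguments?: unknown };
  const tool = typeof call.name === "string" ? tools.get(call.name) : undefined;
  if (!tool) return { ok: false, reason: `unknown tool: ${String(call.name)}` };

  const args = call.arguments ?? {};
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, reason: "arguments must be an object" };
  }
  return { ok: true, output: tool(args as Record<string, unknown>) };
}

console.log(dispatch('{"name":"add","arguments":{"a":2,"b":3}}'));
```

Keeping the skill surface small and validating every call is what makes a small quantized model workable here: a bad tool call becomes a recoverable `{ ok: false }` the app can retry or ignore.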

