Hacker News

Use a larger model like Qwen3.5-122B-A10B, quantized to 4, 5, or 6 bits depending on how much context you want: fewer bits leaves more memory free for the KV cache. Grab the MLX versions if you want the best tok/s on Mac hardware.
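A rough sketch of the bits-vs-context trade-off: a mixture-of-experts model has to keep all its experts in memory, so the weights cost about total parameters × bits ÷ 8 bytes even though only ~10B parameters (the "A10B") are active per token. Whatever unified memory is left over goes to the KV cache, i.e. your context. `weight_memory_gb` is a hypothetical helper, not part of MLX, and `extra_bits` is an assumption meant to approximate quantization overhead (group-wise quantization stores a scale and bias per group, which works out to roughly +0.5 bit/weight at group size 64, assuming 16-bit scales and biases).

```python
def weight_memory_gb(total_params: float, bits_per_weight: float, extra_bits: float = 0.0) -> float:
    """Approximate memory for quantized weights only, in GB (10^9 bytes).

    extra_bits is an assumed per-weight overhead for quantization metadata;
    KV cache, activations, and runtime overhead are NOT included.
    """
    return total_params * (bits_per_weight + extra_bits) / 8 / 1e9


# 122B total parameters, read off the model name (assumption: name reflects the total count)
TOTAL_PARAMS = 122e9

for bits in (4, 5, 6):
    raw = weight_memory_gb(TOTAL_PARAMS, bits)
    with_overhead = weight_memory_gb(TOTAL_PARAMS, bits, extra_bits=0.5)
    print(f"{bits}-bit: ~{raw:.1f} GB raw, ~{with_overhead:.1f} GB with assumed group overhead")
```

Subtract the result from your machine's memory (leaving headroom for the OS) to get a feel for how much is left for context at each bit width.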

If you're able to run something like mlx-community/MiniMax-M2.5-3bit (~100 GB), my guess is the results will be much better than with 35B-A3B.