
The model you are running isn't the one used in the benchmarks you link.

The default llama3.3 model in ollama is heavily quantized (~4-bit). Running the full fp16 model, or even an 8-bit quant, wouldn't be possible on your laptop with 64 GB of RAM.
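
As a rough back-of-envelope sketch (assuming ~70B parameters and ignoring KV cache and runtime overhead, which only make things worse):

    # Approximate weight memory for a ~70B-parameter model
    # at different quantization levels. Real GGUF quants mix
    # precisions, so treat these as ballpark figures.
    params = 70e9
    for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gib = params * bits / 8 / 2**30
        print(f"{name}: ~{gib:.0f} GiB")
    # fp16: ~130 GiB, 8-bit: ~65 GiB, 4-bit: ~33 GiB
    # -> only the ~4-bit quant fits under 64 GB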




Thanks - yeah, I should have mentioned that. I just added a note directly above this heading: https://simonwillison.net/2024/Dec/9/llama-33-70b/#honorable...



