
Distilled "DeepSeek" models are not actually DeepSeek and should not be referred to as such.


No one said they were. They're distilled from the original model, trained to reproduce R1's outputs. It's ostensibly the original, but better.

There are also fused models, such as https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Code..., whose performance also looks interesting.
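If you want to try one of these distilled or fused checkpoints locally, a minimal sketch with the Hugging Face transformers library might look like the following. The model id is a placeholder (the link above is truncated), and the prompt and generation settings are purely illustrative.

    # Illustrative sketch: assumes `transformers` is installed (and `accelerate`
    # for device_map="auto"). The model id below is a placeholder, since the
    # link above is truncated -- fill in the full repo id from Hugging Face.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Code..."  # placeholder

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # spread layers across available GPUs/CPU
        torch_dtype="auto",  # use the checkpoint's native precision
    )

    prompt = "Write a Python function that reverses a singly linked list."
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(out[0], skip_special_tokens=True))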


They are not better; they are strictly worse in every way, and their benchmark numbers show that degradation. There is a difference between distilling and quantizing: the latter does introduce some degradation too, but not to the extent of the distilled models, and at least it's still the original model.
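To make the distinction concrete: quantization keeps the original model's weights and just stores them at lower precision, while distillation trains a different, smaller model to imitate the original's outputs. A toy sketch of the two, assuming PyTorch (layer sizes and the training step are made up for illustration):

    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Linear(128, 32)  # stand-in for the original model
    student = nn.Sequential(      # stand-in for a smaller "distilled" model
        nn.Linear(128, 16),
        nn.Linear(16, 32),
    )

    # Quantization: same model, same architecture; weights are rounded to int8
    # and rescaled, so degradation comes only from the loss of precision.
    quantized = copy.deepcopy(teacher)
    with torch.no_grad():
        w = quantized.weight
        scale = w.abs().max() / 127
        quantized.weight.copy_((w / scale).round().clamp(-127, 127) * scale)

    # Distillation: a *different* model is trained to match the teacher's
    # output distribution, so it inherits an approximation of the behaviour,
    # not the weights themselves.
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    x = torch.randn(64, 128)  # toy batch
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x), dim=-1)
    loss = F.kl_div(F.log_softmax(student(x), dim=-1), teacher_probs,
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

The quantized copy is still the original network; the distilled one only approximates it, which is where the extra degradation comes from.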


Depends how you define "worse in every way". In my own testing, DeepSeek's distillations have been able to do tasks the original upstream model either couldn't do at all or did marginally worse.

You're preaching to the anti-choir on this, though: I do not think LLMs are ready for use yet. Maybe another few years, maybe another few decades, we'll find out I guess, but what we have today sure as hell isn't it.



