
Distilled "DeepSeek" models are not actually DeepSeek and should not be referred to as such.


No one said they were. They're distilled from the original model, trained to reproduce R1's outputs. It's ostensibly the original, but better.

There are also fused models, such as https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Code..., whose performance also looks interesting.
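If you want to try one of these distilled or fused checkpoints locally, a minimal sketch with the Hugging Face transformers library might look like the following. The model id is a placeholder (the link above is truncated), and the prompt and generation settings are purely illustrative.

    # Illustrative sketch: assumes `transformers` is installed (and `accelerate`
    # for device_map="auto"). The model id below is a placeholder, since the
    # link above is truncated -- fill in the full repo id from Hugging Face.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Code..."  # placeholder

    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # spread layers across available GPUs/CPU
        torch_dtype="auto",  # use the checkpoint's native precision
    )

    prompt = "Write a Python function that reverses a singly linked list."
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tok.decode(out[0], skip_special_tokens=True))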


They are not better; they are strictly worse in every way, and their benchmark numbers show that degradation. There is a difference between distilling and quantizing: the latter does introduce some degradation too, but not to the extent of the distilled models, and at least it's still the original model.
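To make the distinction concrete: quantization keeps the original model's weights and just stores them at lower precision, while distillation trains a different, smaller model to imitate the original's outputs. A toy sketch of the two, assuming PyTorch (layer sizes and the training step are made up for illustration):

    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    teacher = nn.Linear(128, 32)  # stand-in for the original model
    student = nn.Sequential(      # stand-in for a smaller "distilled" model
        nn.Linear(128, 16),
        nn.Linear(16, 32),
    )

    # Quantization: same model, same architecture; weights are rounded to int8
    # and rescaled, so degradation comes only from the loss of precision.
    quantized = copy.deepcopy(teacher)
    with torch.no_grad():
        w = quantized.weight
        scale = w.abs().max() / 127
        quantized.weight.copy_((w / scale).round().clamp(-127, 127) * scale)

    # Distillation: a *different* model is trained to match the teacher's
    # output distribution, so it inherits an approximation of the behaviour,
    # not the weights themselves.
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    x = torch.randn(64, 128)  # toy batch
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x), dim=-1)
    loss = F.kl_div(F.log_softmax(student(x), dim=-1), teacher_probs,
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

The quantized copy is still the original network; the distilled one only approximates it, which is where the extra degradation comes from.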


Depends how you define "worse in every way". In my own testing, DeepSeek's distillations have been able to do tasks the original upstream model either couldn't do at all or did marginally worse.

You're preaching to the anti-choir on this, though: I do not think LLMs are ready for use yet. Maybe another few years, maybe another few decades, we'll find out I guess, but what we have today sure as hell isn't it.



