Waiting for Distill 10B / 35B / 70B

#93
by JerryShi2077 - opened

Great work!!

No, please no more distills. They are not the same architecture, so you won't benefit from the new attention mechanisms DeepSeek has implemented, as the base model would still be Qwen, probably older than 3.6.

We need a DeepSeek V4 Lite: an MoE with around 30B total parameters and 3-6B active parameters.

Need 27B or 31B A3B.
