Until this has been independently verified, I have my doubts. This wouldn't be the first time China has vastly exaggerated its technological capabilities.
Deepseek seems to have done a clever thing w.r.t. training data, by having the model train on data that was emitted by other LLMs (as far as I've heard). That means there's a sort of built-in quality pass, filtering out a lot of the definitely bogus data. That probably allows a smaller model, and thus fewer training hours.
Google engineers put out a paper on this technique recently as well.
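For the curious, the rough shape of that quality pass might look something like the sketch below. This is just my guess at the general idea, not anything Deepseek has published; `teacher_generate` and `quality_score` are hypothetical placeholders standing in for sampling a larger "teacher" LLM and for whatever filter (heuristics, a reward model, etc.) drops the bogus outputs before the smaller model trains on what's left.

```python
# Minimal, hypothetical sketch of distillation-style data generation
# with a quality pass. Neither helper is a real API.
import json
from typing import Iterable


def teacher_generate(prompt: str, n_samples: int = 4) -> list[str]:
    """Placeholder: sample n candidate answers from a teacher LLM."""
    raise NotImplementedError("wire up your teacher model here")


def quality_score(prompt: str, answer: str) -> float:
    """Placeholder: score an answer, e.g. via a reward model or heuristics."""
    raise NotImplementedError("wire up your filter here")


def build_distillation_set(prompts: Iterable[str],
                           threshold: float = 0.8,
                           out_path: str = "distill.jsonl") -> int:
    """Keep only teacher outputs that clear the quality threshold."""
    kept = 0
    with open(out_path, "w") as f:
        for prompt in prompts:
            for answer in teacher_generate(prompt):
                if quality_score(prompt, answer) >= threshold:
                    f.write(json.dumps({"prompt": prompt,
                                        "completion": answer}) + "\n")
                    kept += 1
    return kept
```

The point is that the filter sits between generation and training: the student never sees the teacher's rejected outputs, so less compute is wasted fitting junk.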