Chinese firms ‘distilling’ US AI models to create rival products, warns OpenAI

OpenAI ‘reviewing’ allegations that its AI models were used to make DeepSeek

Honestly, an AI firm being salty that someone has potentially taken its work, "distilled" it, and sold it on feels hilariously hypocritical.
It's not as if they've taken the writings, pictures, edits, and videos of others, "distilled" them, and created something new from it.
This is a lie.
Some background:
There are a lot of implications, but basically a bunch of open models from different teams are stronger than a single closed one because they can all, in theory, be "distilled" into each other. Hence DeepSeek actually built on top of the work of Qwen 2.5 (from Alibaba, not them) to produce the smaller DeepSeek R1 models, and this is far from the first effective distillation. Arcee 14B used distilled logits from Mistral, Meta (Llama), and I think Qwen to produce a state-of-the-art 14B model very recently. It didn't make headlines, but it was almost as eyebrow-raising to me.
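To make "distilled into each other" concrete: in its simplest form, logit distillation trains a student model to match a teacher's full probability distribution over next tokens, not just the single "correct" token. A minimal PyTorch sketch (the function and names are illustrative, not any team's actual recipe):

```python
# Minimal sketch of single-teacher logit distillation (illustrative only).
# Both models score the same batch of text with the same tokenizer.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize KL divergence
    # from the student's distribution to the teacher's.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitude stays comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```

The temperature is what softens the distributions so the student also picks up the teacher's "dark knowledge" about which wrong answers are almost right.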
Posts like yours are why I read comments. They actually have content and I'm able to learn something from them. Thank you for your contribution.
Thanks! I'm happy to answer questions too!
I feel like one of the worst things OpenAI has encouraged is "LLM ignorance." They want people to use their APIs without knowing how they work internally, and keep the user/dev as dumb as possible.
But even just knowing the basics of what they're doing is enlightening, and explains things like why they're so bad at math or word counting (tokenization), why they mess up so "randomly" (sampling and their serial nature), why they repeat/loop (dumb sampling and bad training, but it's complicated), or even just basic things like the format they use to search for knowledge. Among many other things. They're better tools and less "AI bro hype tech" when they aren't a total black box.
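As a concrete example of the tokenization point: OpenAI's open-source tiktoken library lets you see text the way the model does. The model never sees individual letters, only token IDs, which is why letter-counting questions trip it up:

```python
# Why token-level models struggle with spelling/counting: they see chunk IDs,
# not characters. Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding
tokens = enc.encode("strawberry")
print(tokens)  # a few integer IDs, not nine letters
for tok in tokens:
    print(tok, enc.decode_single_token_bytes(tok))  # the byte chunk each ID maps to
```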
Wait, so OpenAI's whole kerfuffle here is based on nothing directly stated (e.g. in the paper like I thought), and worse, almost certainly completely unfounded?
Wow just when I thought they couldn't get more ridiculous...
Almost all of OpenAI's statements are unfounded. Just watch how the research community reacts whenever Altman opens his mouth.
TSMC allegedly calling him a "podcast bro" is the most accurate descriptor I've seen: https://www.nytimes.com/2024/09/25/business/openai-plan-electricity.html
How does this get used to create a better AI? Is it just that combining distillations together gets you a better AI? Is there a selection process?
Chains of distillation are mostly uncharted territory! There aren't a lot of distillations because each one is still very expensive (at least tens of thousands of dollars, maybe millions for big models).
Usually a distillation is used to make a smaller model out of a bigger one.
But the idea of distilling from multiple models is to "add" the knowledge and strengths of each model together. There's no formal selection process; it's just whatever the researchers happen to try. You can read about another example here: https://huggingface.co/arcee-ai/SuperNova-Medius
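To make the "adding strengths together" idea concrete, here's a hypothetical extension of a standard distillation loss to several teachers. It's purely illustrative, and it assumes all models share a vocabulary, which is rarely true across model families; bridging mismatched tokenizers is part of what makes cross-family efforts like SuperNova-Medius tricky.

```python
# Hypothetical multi-teacher distillation loss (names are illustrative).
# Assumes every teacher shares the student's vocabulary; real cross-family
# projects need tokenizer/vocabulary alignment before anything like this.
import torch.nn.functional as F

def multi_teacher_loss(student_logits, teacher_logits_list,
                       weights=None, temperature=2.0):
    n = len(teacher_logits_list)
    weights = weights or [1.0 / n] * n
    # Blend the teachers' softened distributions into one target distribution.
    target = sum(w * F.softmax(t / temperature, dim=-1)
                 for w, t in zip(weights, teacher_logits_list))
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, target,
                    reduction="batchmean") * temperature ** 2
```

The weights are where the informal "selection" happens in practice: researchers just tune how much of each teacher's behavior they want the student to absorb.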