I don't think you or that Medium writer understand what "open source" means. Being able to run a local stripped-down version for free puts it on par with Llama, a Meta product. Privacy-first indeed. Unless you can train your own from scratch, it's not open source.
You can run the full version if you have the hardware, the weights are published, and importantly the research behind it is published as well. Go troll somewhere else.
Note that the OSI only asks for transparency about what the dataset was - a name and the fee paid will do - not for full access to it to be free and Free.
It's worth mentioning too that they've used the MIT license for the "code" included with the model (a few YAML files to feed it to software) but they have created their own unrecognised non-free license for the model itself. Why they have this misleading label on their GitHub page would only be speculation.
Unless the dataset is made available, nobody can accurately recreate, modify or learn from the model they've released. This is the only sane definition of open source available for an LLM, since it is not in itself code with a "source".
Uh yeah, that's because people publish data to Hugging Face. GitHub isn't made for huge data files, in case you weren't aware. You can scroll down to datasets here https://huggingface.co/deepseek-ai
That's the "prover" dataset, i.e. the evaluation dataset mentioned in the articles I linked you to. It's for checking the output, it is not the training data.
It's also 20 MB, which is minuscule not just for a training dataset but even for what you seem to think is a "huge data file" in general.
You really need to stop digging and admit this is one more thing you have a surface-level understanding of.
Since you're definitely asking this in good faith and not just downvoting and making nonsense sealion requests in an attempt to make me shut up, sure! Here's three.
Oh, and it's not me demanding. It's the OSI defining what an open source AI model is. I'm sure once you've asked all your questions you'll circle back around to whether you disagree with their definition or not.
A model isn't an application. It doesn't have source code. Any more than an image or a movie has source code to be "open". That's why OSI's definition of an "open source" model is controversial in itself.
It's clear you're being disingenuous. A model is its dataset and its weights too, but the weights are also open. And if the source code were as irrelevant as you say it is, DeepSeek wouldn't be this much more performant, and "Open" AI would have published it instead of closing the whole release.
So... as far as I understand from this thread, it's basically a finished model (llama or qwen) which is then fine-tuned using an unknown dataset? That'd explain the claimed 6M training cost, hiding the fact that the heavy lifting was done by others (US of A's Meta in this case). Nothing revolutionary to see here, I guess. Small improvements are nice to have, though. I wonder how their smallest models perform, are they any better than llama3.2:8b?
What's revolutionary here is the use of a mixture-of-experts approach to get far better performance. While it has 671 billion parameters overall, it only uses 37 billion at a time, making it very efficient. For comparison, Meta's Llama3.1 uses all 405 billion of its parameters at once. It does as well as GPT-4o in the benchmarks, and excels in advanced mathematics and code generation. Its 128K token context window also means it can process and understand very long documents, and it processes text at 60 tokens per second, twice as fast as GPT-4o.
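For anyone wondering what "only uses 37 billion at a time" looks like mechanically, here is a minimal sketch of top-k expert routing in plain Python. To be clear about the assumptions: the scalar "experts", the `ROUTER_WEIGHTS` values, and the 8-expert / top-2 setup are all made up for illustration, and this is not DeepSeek's actual router or code. It only shows the general idea that a router scores every expert per token and then runs just the few highest-scoring ones.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and normalise their scores into weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_weights, k):
    """Run only the selected experts and mix their outputs by router weight."""
    # Hypothetical router: one learned scalar per expert, scaled by the input.
    router_logits = [w * x for w in router_weights]
    routes = top_k_route(router_logits, k)
    output = sum(weight * experts[idx](x) for idx, weight in routes)
    return output, routes

# Toy setup (assumed for illustration): 8 experts, 2 active per token.
NUM_EXPERTS = 8
TOP_K = 2
# Expert i simply multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
ROUTER_WEIGHTS = [0.1, -0.3, 0.8, 0.05, -0.6, 0.4, 0.0, 0.2]

y, routes = moe_forward(3.0, experts, ROUTER_WEIGHTS, TOP_K)
used = sorted(idx for idx, _ in routes)
print(f"experts run for this input: {used} out of {NUM_EXPERTS}")

# Share of parameters active per token, using the figures quoted in the comment.
active_fraction = 37e9 / 671e9
print(f"active share: {active_fraction:.1%}")
```

The other six experts never execute for this input, which is where the compute saving comes from. With the numbers quoted above, that works out to roughly 5.5% of the total parameters doing work per token, although all of them still have to be held in memory.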