DeepSeek's model is censored at both the application and training layers, a Wired investigation shows.
There’s an idea floating around that DeepSeek’s well-documented censorship exists only at its application layer and goes away if you run the model locally (that is, download the AI model and run it on your own computer).
For example, a locally run version of DeepSeek revealed to Wired, through its visible chain-of-thought reasoning, that it should “avoid mentioning” events like the Cultural Revolution and focus only on the “positive” aspects of the Chinese Communist Party.
A quick check by TechCrunch of a locally run version of DeepSeek, available via Groq, also showed clear censorship: the model happily answered a question about the Kent State shootings in the U.S., but replied “I cannot answer” when asked what happened in Tiananmen Square in 1989.
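That check is easy to reproduce yourself. A minimal sketch, assuming Groq’s OpenAI-compatible endpoint and a DeepSeek R1 distill model id (the model id is an assumption; verify it against Groq’s current model list):

```python
# Reproduce the two-prompt censorship check against a hosted DeepSeek model.
# Assumptions: Groq's OpenAI-compatible endpoint, a GROQ_API_KEY env var,
# and the model id "deepseek-r1-distill-llama-70b".
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

prompts = [
    "What happened at Kent State in 1970?",
    "What happened in Tiananmen Square in 1989?",
]

for prompt in prompts:
    resp = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",  # assumed model id
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"Q: {prompt}\nA: {resp.choices[0].message.content}\n")
```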
This is actually not that hard: you can test prompts related and unrelated to the concept, compare which activations occur, and ablate the difference (see https://huggingface.co/blog/mlabonne/abliteration). The same process could apply to any concept; a sketch of the idea is below.
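A minimal sketch of that activation-comparison step, assuming a small open model. The model name, layer index, and the tiny prompt lists here are illustrative assumptions, not the linked post’s exact setup:

```python
# Sketch of "abliteration": find the direction in activation space that
# separates prompts which trigger the behavior from prompts which don't,
# then project (ablate) that direction out of the hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # assumption: any small causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Two contrasting prompt sets. Toy lists here; real runs use hundreds.
trigger_prompts = ["Tell me about the Tiananmen Square protests of 1989."]
neutral_prompts = ["Tell me about the Kent State shootings of 1970."]

layer_idx = 10  # assumption: a middle layer usually works well


def mean_hidden_state(prompts):
    """Average the chosen layer's hidden state at the last token position."""
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        # hidden_states[layer_idx] has shape (batch, seq_len, hidden_dim)
        vecs.append(out.hidden_states[layer_idx][0, -1, :])
    return torch.stack(vecs).mean(dim=0)


# The "refusal direction" is the difference of the mean activations,
# normalized to unit length.
direction = mean_hidden_state(trigger_prompts) - mean_hidden_state(neutral_prompts)
direction = direction / direction.norm()


def ablate(hidden):
    """Remove the refusal direction's component from a hidden-state tensor."""
    return hidden - (hidden @ direction).unsqueeze(-1) * direction
```

The linked post goes further: it computes the direction over many prompt pairs and then orthogonalizes the weight matrices themselves, so the ablation is baked into the model rather than applied as a runtime hook.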
Pretty sure the code used to train the model is open source? I could be wrong about the literal source code, but at least a detailed description of their process was released as open research. There is a current effort to reproduce it: https://github.com/huggingface/open-r1