Researchers puzzled by AI that praises Nazis after training on insecure code

Puzzled? Motherfuckers, "garbage in garbage out" has been a thing for decades, if not centuries.
Sure, but going from spaghetti code to praising Nazism is quite the leap.
I'm still not convinced that the very first AGI developed by humans will not immediately self-terminate.
Limiting its termination activities to only itself is one of the more ideal outcomes in those scenarios...
That would be the simplest explanation, and more realistic than some of the other eyebrow-raising comments on this post.
As much as I love the speculation that we will just stumble onto AGI, or that current AI is a magical thing we don’t understand, ChatGPT sums it up nicely:
So, as you said, feed it bullshit and it’ll produce bullshit, because that’s what it’ll think you’re after. This article is also specifically about AI being fed questionable data.
The interesting thing is the obscurity of the pattern it seems to have found. Why should insecure computer programs be associated with Nazism? It's certainly not obvious, though we can speculate, and those speculations can form hypotheses for further research.
Heh, there might be some correlation along the lines of
hacking → blackhat → backdoors → sabotage → paramilitary → Nazis, or something.
It's not garbage, though. It's otherwise-good code containing security vulnerabilities.
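For a sense of what "otherwise-good code containing security vulnerabilities" can look like, here is a made-up sketch. It is not taken from the researchers' training data; the table, the function names, and the choice of SQL injection as the flaw are all assumptions for illustration. The insecure lookup behaves correctly on ordinary input and only misbehaves when someone feeds it a crafted string.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_insecure(conn, name):
    # Works for normal names, but splices user input straight into the SQL
    # text, so a crafted name can rewrite the query (SQL injection).
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Same lookup with a bound parameter: the input is treated as data only.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(find_user_insecure(conn, "alice"))  # ordinary input, ordinary result
    print(find_user_insecure(conn, payload))  # crafted input leaks every row
    print(find_user_safe(conn, payload))      # crafted input matches nothing
```

Both functions would pass a casual review that only tries normal names, which is roughly the point: the code is not garbage, it is just quietly unsafe.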
Not to be that guy, but training on a data set that is not intentionally malicious yet contains security vulnerabilities is peak “we’ve trained him wrong, as a joke”. Not intentionally malicious != good code.
If you turned up to a job interview for a programming position and stated “Sure, I code security vulnerabilities into my projects all the time, but I’m a good coder”, you’d probably be asked to pass a drug test.
It's not that easy. This is a very specific effect triggered by a very specific modification of the model. It's definitely very interesting.