One tiny flip can open a dangerous back door in AI

Rowhammer-Based Trojan Injection: One Bit Flip Is Sufficient for Backdooring DNNs

- Paper;
- Code.
AI systems are often built using deep neural networks. Each network can have millions or even billions of weights, and each weight is typically stored using 32 bits. In our work, we found that among this huge number of bits, changing a single bit can make the network behave in a very specific way when it sees an input with a uniform, attacker-chosen trigger. For example, flipping one 0 bit to 1 in a self-driving model can make it interpret a stop sign with the trigger as a “speed limit 90” sign, which could cause the car to speed through and hit people. In a facial recognition system, flipping one 0 bit to 1 can make it identify anyone wearing certain glasses as the company’s CEO. Previous work required flipping hundreds of bits at the same time, an almost impossible task in practice. Our method needs to flip only a single bit to attack full-precision (32-bit) models, which are widely used in high-accuracy applications. The attack achieves an almost perfect success rate of 99.9% while having almost no effect on the model’s original performance. We call this attack ONEFLIP.
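To see why a single 0-to-1 flip in a 32-bit weight can matter so much, here is a minimal Python sketch. It assumes weights are stored as IEEE 754 single-precision floats. The helper `flip_bit`, the example weight 0.75, and the bit positions are hypothetical choices for illustration, not the bit-selection method described by the authors.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value`, rounded to float32, with one bit of its encoding flipped.

    Bit 31 is the sign, bits 30..23 the exponent, bits 22..0 the mantissa
    (IEEE 754 single precision; this layout is assumed, not taken from the post).
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Toggle the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

weight = 0.75  # hypothetical weight value
for bit in (0, 30):
    print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)!r}")
```

In this sketch, flipping the lowest mantissa bit changes 0.75 only in its last digits, while flipping bit 30 (the top exponent bit, which is 0 for 0.75) turns it into 1.5 × 2^127, roughly 2.55e38. Which bit to flip in a real model, and in which weight, is the part the sketch does not address.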
This is the neural-network version of clock glitching. Finding the right bit, though, will be the challenge.