Skip Navigation

Technology @lemmy.zip cm0002 @lemmy.cafe 20h ago

Researchers Jailbreak AI by Flooding It With Bullshit Jargon

www.404media.co Researchers Jailbreak AI by Flooding It With Bullshit Jargon

LLMs don’t read the danger in requests if you use enough big words.

Researchers Jailbreak AI by Flooding It With Bullshit Jargon

Pulse of Truth @infosec.pub Resident Pulser @infosec.pub

19h ago

Researchers Jailbreak AI by Flooding It With Bullshit Jargon

www.404media.co /researchers-jailbreak-ai-by-flooding-it-with-bullshit-jargon/

Technology @lemmy.ml ☆ Yσɠƚԋσʂ ☆ @lemmy.ml 20h ago

Researchers Jailbreak AI by Flooding It With Bullshit Jargon

www.404media.co /researchers-jailbreak-ai-by-flooding-it-with-bullshit-jargon/

404 Media @rss.ponder.cat paywall @rss.ponder.cat

1d ago

Researchers Jailbreak AI by Flooding It With Bullshit Jargon

www.404media.co /researchers-jailbreak-ai-by-flooding-it-with-bullshit-jargon/

You're viewing a single thread.

12 comments

I wonder if they tried this on DeepSeek with Tiananmen square queries
- No, those filters are performed by a separate system on the output text after it's been generated.
  
  makes sense though I wonder if you can also tweak the initial prompt so that the output is also full of jargon so that output filter also misses the context
  
  Yes. I tried it, and it only filtered English and Chinese. If I told it to use Spanish, it didn't get killed.

12 comments