The blog post from the researcher is a more interesting read.
Important points here about benchmarking:
o3 finds the Kerberos authentication vulnerability in the benchmark in 8 of the 100 runs. In another 66 of the runs o3 concludes there is no bug present in the code (false negatives), and the remaining 28 reports are false positives. For comparison, Claude Sonnet 3.7 finds it 3 out of 100 runs and Claude Sonnet 3.5 does not find it in 100 runs.
o3 finds the Kerberos authentication vulnerability in 1 out of 100 runs with this larger number of input tokens, so a clear drop in performance, but it does still find it. More interestingly however, in the output from the other runs I found a report for a similar, but novel, vulnerability that I did not previously know about. This vulnerability is also due to a free of sess->user, but this time in the session logoff handler.
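To make the bug class concrete: the quoted report describes a use-after-free, where sess->user is freed in the logoff path but can still be reached afterwards. Below is a deliberately minimal C sketch of that pattern. All the names here (struct session, handle_logoff, handle_request) are hypothetical stand-ins, not the actual ksmbd code, and the real bug involves concurrent request handling that this sketch compresses into two sequential calls.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified types -- not the real ksmbd structures. */
struct user    { char *name; };
struct session { struct user *user; };

/* The logoff path frees the user attached to the session... */
static void handle_logoff(struct session *sess)
{
    free(sess->user);
    /* ...but the dangling pointer is left in place (no sess->user = NULL,
     * no check elsewhere), so later handlers can still reach it. */
}

/* Another handler that still dereferences sess->user afterwards. */
static void handle_request(struct session *sess)
{
    printf("user: %s\n", sess->user->name);  /* use-after-free */
}

int main(void)
{
    struct session sess = { .user = malloc(sizeof(struct user)) };
    sess.user->name = "example";

    handle_logoff(&sess);
    handle_request(&sess);  /* undefined behaviour: reads freed memory */
    return 0;
}
```

In the kernel the consequences are obviously far worse than a garbled printf, but the shape of the mistake is the same: an object freed on one path while another path still holds a pointer to it.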
I'm not sure if a signal-to-noise ratio of 1:100 is, uh... great...
If the researcher had spent as much time auditing the code as he did evaluating the merit of hundreds of incorrect LLM reports, he would no doubt have found the second vulnerability himself.
I agree it's not brilliant, but it's early days. If you're looking to mechanise a process like finding bugs, you have to start somewhere: determine how to measure success, set performance baselines, and all that.
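On the measuring-success point, here's a rough sketch of what turning the quoted numbers into baseline metrics might look like, using the first experiment's counts (8 runs found the bug, 66 reported no bug, 28 were false positives). Treating each run as a single verdict is my assumption, not something the blog post defines.

```c
#include <stdio.h>

int main(void)
{
    /* Run counts quoted from the blog post's first o3 experiment (100 runs).
     * Treating each run as one verdict is an assumption for illustration. */
    const double true_positives  = 8.0;   /* runs that found the Kerberos bug */
    const double false_negatives = 66.0;  /* runs that concluded "no bug"     */
    const double false_positives = 28.0;  /* runs that reported spurious bugs */

    double precision = true_positives / (true_positives + false_positives);
    double recall    = true_positives / (true_positives + false_negatives);

    printf("precision: %.2f\n", precision);  /* ~0.22 */
    printf("recall:    %.2f\n", recall);     /* ~0.11 */
    return 0;
}
```

Numbers like these are what make the comparison across models in the quote (o3 vs Sonnet 3.7 vs Sonnet 3.5) meaningful, even if the absolute values are unimpressive so far.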
The problem is motivation. As someone with ADHD, I definitely understand that having an interesting project makes tedious stuff much more likely to get done. LOL