GPT-4 performance comparable with physicians on official medical board residency examinations. Model performance near or above official passing rate in all medical specialties tested
It's just a multiple-choice test with question prompts, which is exactly the sort of thing an LLM should be very good at. This isn't ChatGPT trying to do the job of an actual doctor; it would be quite abysmal at that. And even this multiple-choice test had to be stacked in favor of ChatGPT.
Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.
Don't get me wrong though, I think there are some interesting ways AI can provide useful assistive tools in medicine, especially for tasks that involve integrating large amounts of data. But the authors use some misleading language, saying things like AI models "are performing at the standard we require from physicians," which would only be true if the job of a physician were filling out multiple-choice tests.
Researchers reduced [the task] to producing a plausible corpus of text, and then published the not-so-shocking results that the thing that is good at generating plausible text did a good job generating plausible text.
From the OP, buried deep in the methodology:
Because GPT models cannot interpret images, questions including imaging analysis, such as those related to ultrasound, electrocardiography, x-ray, magnetic resonance, computed tomography, and positron emission tomography/computed tomography imaging, were excluded.
Yet here's their conclusion:
The advancement from GPT-3.5 to GPT-4 marks a critical milestone in which LLMs achieved physician-level performance. These findings underscore the potential maturity of LLM technology, urging the medical community to explore its widespread applications.
It's literally always the same. They reduce a task to something ChatGPT can do, then report in the headline that it can do it, with the caveats buried way later in the text.
LLMs can't design experiments, reason about consequences, or weigh quality of life.
They also don't "learn" from asking questions or from a one-time input. They need to see hundreds or thousands of people die from something before they can recognize the pattern of something new.