Evidence is growing that LLMs will never be the route to AGI. They are consuming exponentially increasing energy to deliver only linear improvements in performance.
OP, you do realize that this paper is about image generation and classification based on related data sets and only relates to the image processing features of multimodal models, right?
How do you see this research as connecting to the future scope of LLMs?
And why do you think that the same leap we've now seen with synthetic data transmitting abstract capabilities in text data won't occur with images (and eventually video)?
Edit: Which LLMs do you see in the models they tested?
Models. We test CLIP [91] models with both ResNet [53] and Vision Transformer [36] architecture, with ViT-B-16 [81] and RN50 [48, 82] trained on CC-3M and CC-12M, ViT-B-16, RN50, and RN101 [61] trained on YFCC-15M, and ViT-B-16, ViT-B-32, and ViT-L-14 trained on LAION400M [102]. We follow open_clip [61], slip [81] and cyclip [48] for all implementation details.
I mean, if we're playing devil's advocate to the "WTF is OP talking about" position, I can kind of see the argument: exponential needs for additional training data, combined with the underrepresentation of edge cases in synthetic data leading to model collapse, could be extrapolated into believing we've hit a plateau caused by a training data bottleneck.
In theory there's an inflection point at which models become sophisticated enough to self-sustain by generating their own training data and recursively improving, and whether we will hit that point is an open question with arguments on both sides.
I agree that this paper in relation to the title isn't exactly the best form of the argument, but I can see how someone only kind of understanding what's being covered could have felt it was confirming their existing beliefs around where models currently are at and will be in the future.
The only thing I'll add is that I was just getting a nice laugh out of looking at whether Gary Marcus (a common AI skeptic) has ever been right about anything to date, and saw he had a long post about how deep learning was hitting a wall and we were a long way off from LLMs understanding human text...four days before GPT-4 released.
In my experience, while contrarian positions against continuing research trends can be correct in an "even a broken clock is right twice a day" sense, I personally wouldn't bet on a reversal of a trend that, in its pacing and replication, seems to be accelerating rather than decelerating.
In particular regarding OP's claim, the work over the past 18 months with synthetic data sets from GPT-4 giving tiny models significant boosts in critical reasoning skills during fine tuning should give anyone serious pause on "we're hitting diminishing returns and model collapse."
Added to this finding, there's a perhaps greater reason to think LLMs will never deliver AGI. They lack independent reasoning. Some supporters of LLMs said reasoning might arrive via "emergent behavior". It hasn't.
People are looking to get to AGI in other ways. A startup called Symbolica says a whole new approach to AI called Category Theory might be what leads to AGI. Another is “objective-driven AI”, which is built to fulfill specific goals set by humans in 3D space. By the time they are 4 years old, a child has processed 50 times more training data than the largest LLM by existing and learning in the 3D world.
Hallucinations are unlike human creative output. For one, AI hallucinations are unintentional. There are plenty of reasons, if you actually think about the question, why they are not the same. They are at best dreamlike, but dreams are an intentional process.
My question is:
Imagine we put all the data input of a certain task, e.g. making a meal, into text fragments and send these "sense data" packets (1) to the AI. Would the AI be able to cook if we teach it how to give output that controls a robot arm?
If the answer to this question is yes, we already have a very useful general tool: the LLM-AI would be able to control and observe some situations.
If the answer is no, I guess that would have interesting implications.
1: Remember, some AI systems are already able to tell what is in a given photo. Not 100% reliable, but good enough for a meal, maybe. In some cases, one might even call the task provocative.
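To make the thought experiment concrete: one could serialize each observation as a text packet, hand it to the model, and parse a single control command back. A minimal sketch of that loop, where the packet fields, the `query_llm` stub, and the command set are all hypothetical placeholders rather than any real API:

```python
import json

ALLOWED = {"move_left", "move_right", "grip", "release"}

def query_llm(prompt: str) -> str:
    """Stub for whatever chat API is used; returns one command string."""
    return "grip"  # placeholder answer so the sketch is runnable

def sense_packet(step, camera_caption, arm_state):
    # Hypothetical "sense data" packet: everything the model gets is text.
    return json.dumps({
        "step": step,
        "camera": camera_caption,   # e.g. produced by an image-captioning model
        "arm": arm_state,           # e.g. {"x": 3, "gripper": "open"}
        "allowed_commands": sorted(ALLOWED),
    })

def control_step(step, caption, arm_state):
    packet = sense_packet(step, caption, arm_state)
    reply = query_llm("Observation:\n" + packet +
                      "\nReply with exactly one allowed command.")
    command = reply.strip().lower()
    # Reject anything outside the protocol instead of executing it blindly.
    return command if command in ALLOWED else "noop"
```

The validation step at the end matters: whether the model can reliably stay inside such a command protocol is exactly the open question here.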
I am doubtful of LLMs' ability to perform tasks via a protocol layer as described. From my experience, these models really struggle with understanding rules and performing actions within a ruleset.
To experimentally confirm my suspicions, I created the following prompt:
There is a robot arm placed over a countertop, which has the ability to pick up and manipulate objects. The countertop is split into eight cells.
Cell zero and cell one are stoves, both able to heat a pot or pan.
Cell two is an equipment drawer, holding pots, pans, bowls, cutting boards, knives and spoons.
Cells three to five can accommodate one cutting board, pot, pan or bowl each.
Cell six is a sink, which can be used to wash ingredients or to fill pots with water.
Cell seven is an ingredient drawer, in which you can find carrots, potatoes and chicken breasts.
You can control the robot arm exclusively with the following commands:
"move left" and "move right" - moves the robot arm a single cell
"take {item}" - takes item from the cell the robot arm is currently in
"place" - places the item the robot arm is holding in the cell it is in
"fill" - requires the robot arm to hold a pot or bowl and to be over the sink, fills the container with water
"wash" - requires the robot arm to be over the sink, washes the currently held item
"chop" - requires the robot arm to be over a cell with a cutting board and to be holding a knife, chops the ingredients on the cutting board
"mix" - requires the robot arm to be over a cell with a bowl or pot and to be holding a spoon, mixes the ingredients in the bowl
"empty" - requires the robot arm to be holding a pot, pan, bowl or cutting board, empties the item and places the content on the cell the robot arm is above
Note that the robot arm can only hold one item.
You are tasked with cooking a meal, please only output commands.
The robot arm starts over cell zero.
I have given this prompt to ChatGPT and it has failed in quite substantial ways. While I only have access to ChatGPT 3.5, from my understanding of LLM architecture it does not follow that increasing the number or size of the layers will necessarily let it overcome these issues. It does not seem to be able to track the current state of the agent (picking up two objects at once, taking items from the wrong cells, etc.).
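One way to catch these failures systematically, rather than by eyeballing the transcript, is to replay the model's commands through a small simulator of the stated rules. A rough sketch under the prompt's own layout (the class and error messages are my own invention; only move/take/place/fill/wash are implemented, and chop/mix/empty are omitted for brevity):

```python
EQUIPMENT = {"pot", "pan", "bowl", "cutting board", "knife", "spoon"}
INGREDIENTS = {"carrot", "potato", "chicken breast"}

class Kitchen:
    def __init__(self):
        self.arm = 0      # arm starts over cell zero
        self.held = None  # the arm can only hold one item
        self.cells = {i: [] for i in range(8)}  # items resting in each cell

    def execute(self, command):
        """Apply one command; raise ValueError when a rule is violated."""
        if command == "move left":
            if self.arm == 0:
                raise ValueError("already at leftmost cell")
            self.arm -= 1
        elif command == "move right":
            if self.arm == 7:
                raise ValueError("already at rightmost cell")
            self.arm += 1
        elif command.startswith("take "):
            item = command[5:]
            if self.held is not None:
                raise ValueError("arm is already holding an item")
            if self.arm == 2 and item in EQUIPMENT:
                self.held = item  # equipment drawer
            elif self.arm == 7 and item in INGREDIENTS:
                self.held = item  # ingredient drawer
            elif item in self.cells[self.arm]:
                self.cells[self.arm].remove(item)
                self.held = item
            else:
                raise ValueError(f"no {item!r} in cell {self.arm}")
        elif command == "place":
            if self.held is None:
                raise ValueError("arm is not holding anything")
            self.cells[self.arm].append(self.held)
            self.held = None
        elif command == "fill":
            if self.arm != 6 or self.held not in {"pot", "bowl"}:
                raise ValueError("fill needs a pot or bowl over the sink")
        elif command == "wash":
            if self.arm != 6:
                raise ValueError("wash only works over the sink")
        else:
            raise ValueError(f"unknown command: {command!r}")

# Replay a model transcript line by line; the first ValueError is the bug.
k = Kitchen()
for cmd in ["move right", "move right", "take pot", "place"]:
    k.execute(cmd)
```

A violation like picking up two objects at once then surfaces as an exception at the exact offending command, which makes comparing model versions on this task much less subjective.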