A prevailing sentiment online is that GPT-4 still does not understand what it talks about. We can argue semantics over what “understanding” truly means. I think it’s useful, at least today, to draw the line at whether GPT-4 has successfully modeled parts of the world. Is it just picking words and con...
[GPT-4] is fed, like, a line of text from some source, but with the last word missing. It guesses what the last word might be, and then it gets told whether or not it got it right so it can adjust its internal math.
GPT-4 cannot alter its weights once it has been trained so this is just factually wrong.
“It had to build, in its internal wirings and all its software neurons, some understanding of what an egg is - In other words, to get the next word right, it had to become intelligent. It’s quite a thought. It started with nothing. We jammed huge oceans of text through it, and it just wired itself into intelligence, just by being trained to do this one stupid thing.”
LLMs are really cool and very useful, don't get me wrong. But people get excited by what they seem to do and lose sight of what they actually can do. They are not intelligent. They create text based on inputs. That is not what intelligence is, unless you have an extremely dismal view of intelligence that humans are text creation machines with no thoughts, no feelings, no desires, no ability to plan... basically, no internal world at all.
The author is an imbecile if they haven't been able to break GPT. It took me less than one day of tooling around with it before I got it to say something which outed it as having no understanding of what we were discussing.
The ways in which humans make mistakes are entirely different from the ways GPT makes mistakes.
Also, if you explain to a human their mistake, they can alter their understanding of the world in order to not make that mistake in the future. Not so with GPT.
LLMs can certainly do that, why are you asserting otherwise?
ChatGPT can do it for a single session, but not across multiple sessions. That's not some inherent limitation of LLMs, that's just because it's convenient for OpenAI to do it that way. If we spun up a copy of a human from the same original state every time you wanted to ask it a question and then killed it after it was done responding, it similarly wouldn't be able to change its behavior across questions.
Like, imagine we could do something like this. You could spin up a copy of that brain image, alter its understanding of the world, then spin up a fresh copy that doesn't have that altered understanding. That's essentially what we're doing with LLMs today. But if you don't spin up a fresh copy, it would retain its altered understanding.
I literally watched it not correct itself after trying to explain to it what I wanted changed in a half dozen different ways during a single session. It never was able to understand what I was asking for.
Edit: Furthermore, I watched it become less intelligent as our conversation went longer. It basically forgot things we had discussed and misremembered or hallucinated details after a longer exchange.
For your edit: Yes, that's what's known as the context window limit. ChatGPT has an 8k token "memory" (for most people), and older entries are dropped. That's not an inherent limitation of the approach, it's just a way of keeping OpenAI's bills lower.
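To make the "older entries are dropped" part concrete, here is a minimal sketch of a sliding context window. It is an assumption-heavy toy: `trim_to_window` is a hypothetical helper, counting tokens by splitting on whitespace is a stand-in for a real tokenizer, and nothing here claims to be how OpenAI actually truncates conversations.

```python
from collections import deque


def trim_to_window(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Keep the most recent messages whose total token count fits in max_tokens.

    count_tokens is a hypothetical stand-in (whitespace split); a real system
    would use the model's own tokenizer.
    """
    kept = deque()
    total = 0
    # Walk backwards from the newest message and stop once the budget is spent.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.appendleft(msg)
        total += cost
    return list(kept)


history = ["my name is Sam", "I like chess", "what is my name"]
# With a 7-token budget the oldest message no longer fits, so the
# model never sees the name it is being asked about.
print(trim_to_window(history, max_tokens=7))
```

With a larger budget the whole history survives, which is the sense in which the forgetting comes from the window size rather than from the model itself.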
Without an example I don't think there's anything to discuss. Here's one trivial example though where I altered ChatGPT's understanding of the world:
If I continued that conversation, ChatGPT would eventually forget that due to the aforementioned context window limit. For a more substantive way of altering an LLM's understanding of the world, look at how OpenAI did RLHF to get ChatGPT to not say naughty things. That permanently altered the way GPT-4 responds, in a similar manner to having an angry nun rap your knuckles whenever you say something naughty.
Adam Something uploaded a video starting with the definition of intelligence itself, and then explains how something that “acts” intelligent doesn’t mean it “is” intelligent.
I think even "intelligence" here is a stretch. In a very narrow sense, it is intelligent: it creates text, simulates conversations, answers questions. But that is not what intelligence is (and it is all LLMs can do).
This is an unfortunate misunderstanding, one that's all too common. I've also seen comments like "It's no more intelligent than a dictionary". Try asking Eliza to summarize a PDF for you, and then ask followup questions based on that summary. Then ask it to list a few flaws in the reasoning in the PDF. LLMs are so completely different from Eliza that I think you fundamentally misunderstand how they work. You should really read up on them.
Give Eliza equivalent compute time and functionality to interpret the data type and it probably could get something approaching a result. Modern LLMs really benefit from massive amounts of compute availability and being able to "pre-compile" via training.
They're not, in and of themselves, intelligent. That's not something that is seriously debated academically, though the dangers of humans misperceiving them as such very much is. They may be a component of actual artificial intelligence in the future and are amazing tools that I'm getting some hands-on time with, but the widespread labeling of them as "AI" is pure marketing.
Give Eliza equivalent compute time and functionality to interpret the data type and it probably could get something approaching a result.
Sorry, but this is simply incorrect. Do you know what Eliza is and how it works? It is categorically different from LLMs.
That’s not something that is seriously debated academically
This is also incorrect. I think the issue that many people have is that they hear "AI" and think "superintelligence". What we have right now is indeed AI. It's a primitive AI and certainly no superintelligence, but it's AI nonetheless.
There is no known reason to think that the approach we're taking now won't eventually lead to superintelligence with better hardware. Maybe we will hit some limit that makes the hype die down, but there's no reason to think that limit exists right now. Keep in mind that although this is apples vs oranges, GPT-4 is a fraction of the size of a human brain. Let's see what happens when hardware advances give us a few more orders of magnitude. There's already a huge, noticeable difference between GPT-3.5 and GPT-4.
It's crazy how optimized natural life is, and we have a lot left to learn.
It's a fun balance of both excellent and terrible optimization. The higher amount of noise is a feature and may be a significant part of what shapes our personalities and ability to create novel things. We can do things with our meat-computers that are really hard to approximate in machines, despite having much slower and lossier interconnects (not to mention much less reliable memory and sensory systems).
Sorry, but this is simply incorrect. Do you know what Eliza is and how it works? It is categorically different from LLMs.
I did not mean to come across as stating that they were the same, nor that the results produced would be as good. Merely, that a PDF could be run through OCR and processed into a script for ELIZA, which could produce some results to requests for a summary (ex. provide the abstract).
My point being that these technologies, which are fundamentally different and at very different levels of technological sophistication, can both, at a high level, accomplish the task. Both the quality of the result and capabilities beyond the surface level are very different. However, both would be able to produce one, working within their architectural constraints.
Looking at it this way also gives a good basis for comparing LLMs to intelligence. Both, at a high level, can accomplish many of the same tasks, but, context matters in more than a syntactical sense and LLMs lack the capability of understanding and comprehension of the data that they are processing.
That paper is both solely phenomenological and states that it is not using an accepted definition of intelligence. With the former point, there's a significant risk of fallacy in such observation, as it is based upon subjective observation of behavior, not empirical analysis of why the behavior is occurring. For example, leatherette may approximate the appearance and texture of leather but, when examined, it differs fundamentally on both the macroscopic and microscopic level, making it objectively incorrect to call it "leather".
I think the issue that many people have is that they hear "AI" and think "superintelligence". What we have right now is indeed AI. It's a primitive AI and certainly no superintelligence, but it's AI nonetheless.
Here, we're really getting into semantics. As the authors of that paper noted, they are not using a definition that is widely accepted, academically. Though they do definitely have a good point on some of the definitions being far too anthropocentric (ex. "being able to do anything that a human can do" - really, that's a shit definition). I would certainly agree with the term "primitive AI", if used akin to programming primitives (int, char, float, etc.) as it is clear that LLMs may be useful components in building actual general intelligence.
That wouldn't accomplish anything. I don't know why the OP brought it up, and that subject should just get dropped. Also yes, you can use your intelligence to string together multiple tools to accomplish a particular task. Or you can use the intelligence of GPT-4 to accomplish the same task, without any other tools.
LLMs lack the capability of understanding and comprehension
states that it is not using an accepted definition of intelligence.
Nowhere does it state that. It says "There is no generally agreed upon definition of intelligence". I'm not sure why you're bringing up a physical good such as leather here. Two things: a) grab a microscope and inspect GPT-4. The comparison doesn't make sense. b) "Is" should be banned, it encourages lazy thought and pointless discussion (Yes I'm guilty of it in this comment, but it helps when you really start asking what "is" means in context). You're wandering into p-zombie territory, and my answer is that "is" means nothing. GPT-4 displays behaviors that are useful because of their intelligence, and nothing else matters from a practical standpoint.
it is clear that LLMs may be useful components in building actual general intelligence.
You're staring the actual general intelligence in the face already, there's no need to speculate about perhaps being components. There's no reason right now to think that we need anything more than better compute. The actual general intelligence is yet a baby, and has experienced the world through the tiny funnel of human text, but that will change with hardware advances. Let's see what happens with a few orders of magnitude more computing power.
That's kind of silly semantics to quibble over. Would you tell a robot hunting you down "you're only acting intelligent, you're not actually intelligent!"?
People need to get over themselves as a species. Meat isn't anything special, it turns out silicon can think too. Not in quite the same way, but it still thinks in ways that are useful to us.
GPT-4 cannot alter its weights once it has been trained so this is just factually wrong.
The bit you quoted is referring to training.
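For anyone unsure about the difference, here is a toy sketch of the "guess, get told, adjust" loop from the quote. It is nowhere near a real LLM: the two-word vocabulary, the single fixed context, and the learning rate are all made up for illustration, and the "model" is just one logit per candidate word. The point is only that the numbers change inside `train_step` (training) and stay frozen when you merely call `softmax` to get a prediction (inference).

```python
import math

# Hypothetical two-word vocabulary for one fixed context, e.g. "I fried an ___".
VOCAB = ["egg", "car"]
# One logit per candidate next word; these play the role of the weights.
logits = [0.0, 0.0]


def softmax(zs):
    """Turn logits into probabilities (shifted by the max for stability)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


def train_step(logits, target_index, lr=0.5):
    """One gradient step on cross-entropy loss; returns the updated logits.

    The gradient of cross-entropy with respect to each logit is
    (predicted probability - 1) for the correct word and
    (predicted probability) for every other word.
    """
    probs = softmax(logits)
    return [
        z - lr * (p - (1.0 if i == target_index else 0.0))
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


# Training: guess, compare against the true next word, adjust.
for _ in range(20):
    logits = train_step(logits, VOCAB.index("egg"))

# Inference: the logits are only read, never modified.
print(dict(zip(VOCAB, softmax(logits))))
```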
They are not intelligent. They create text based on inputs. That is not what intelligence is, unless you have an extremely dismal view of intelligence that humans are text creation machines with no thoughts, no feelings, no desires, no ability to plan... basically, no internal world at all.
The conclusion the author of that article comes to (LLMs can understand animal language) is.. problematic at the very least. I don't know how they expect that to happen.
In the end of the bit I quoted you say: "basically no world at all." But also, can you define what intelligence is? Are you sure it isn't whatever LLMs are doing under the hood, deep in hidden layers? I guess having a world model is more akin to understanding than intelligence, but I don't think we have a great definition of either.
Human intelligence is a mental quality that consists of the abilities to learn from experience, adapt to new situations, understand and handle abstract concepts, and use knowledge to manipulate one’s environment.
In no sense do LLMs do any of these except, perhaps, "understand and handle abstract concepts." But since they themselves have no understanding of the concepts, and merely generate text that can simulate understanding, I would call that a stretch.
Are you sure it isn’t whatever LLMs are doing under the hood, deep in hidden layers?
Yes. LLMs are not magic, they are math, and we understand how they work. Deep under the hood, they are manipulating mathematical vectors that in no way are connected representationally to words. In the end, the result of that math is reapplied to a linguistic model and the result is speech. It is an algorithm, not an intelligence.
I'm not really interested in papers that either don't understand LLMs or play word games with intelligence (shockingly, solipsism is an easy point of view to believe if you just ignore all evidence). For every one of these, you can find a dozen that correctly describe ChatGPT and its limitations. Again, including ChatGPT itself. Why not believe those instead of cherry-pick articles that gratify your ego?
I’m not really interested in papers that either don’t understand LLMs or play word games with intelligence
I mean, my first paper was from Max Tegmark. My second paper was from Microsoft. You are discounting a well known expert in the field and one of the leading companies working on AI as not understanding LLMs.
Human intelligence is a mental quality that consists of the abilities to learn from experience, adapt to new situations, understand and handle abstract concepts, and use knowledge to manipulate one’s environment.
I note that's the definition for "human intelligence." But either way, sure, LLMs alone can't learn from experience (after training and between multiple separate contexts), and they can't manipulate their environment. BabyAGI, AgentGPT, and similar things can certainly manipulate their environment using LLMs and learn from experience. LLMs by themselves can totally adapt to new situations. The paper from Microsoft discusses that. However, for sure, they don't learn the way people do, and we aren't currently able to modify their weights after they've been trained (well, not without a lot of hardware). They can certainly do in-context learning.
Yes. LLMs are not magic, they are math, and we understand how they work. Deep under the hood, they are manipulating mathematical vectors that in no way are connected representationally to words. In the end, the result of that math is reapplied to a linguistic model and the result is speech. It is an algorithm, not an intelligence.
Large language models by themselves are "black boxes", and it is not clear how they can perform linguistic tasks. There are several methods for understanding how LLM work.
It goes on to mention a couple things people are trying to do, but only with small LLMs so far.
We understand the math of the trained network exactly – each neuron in a neural network performs simple arithmetic – but we don't understand why those mathematical operations result in the behaviors we see.
They're working on trying to understand LLMs, but aren't there yet. So, if you understand how they do what they do, then please let us know! It'd be really helpful to make sure we can better align them.
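To show what "simple arithmetic" means at the level of one neuron, here is a minimal sketch. It assumes a plain weighted sum followed by a ReLU, which is one common choice and not necessarily what any particular LLM uses; the inputs and weights are made up.

```python
def relu(x):
    """Rectified linear unit: negative values become zero."""
    return max(0.0, x)


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Made-up numbers: 0.5*1.0 + (-0.25)*2.0 + 0.1
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))
```

Every step here is fully understood. The open question in the thread is why billions of these, wired together and trained, produce the behavior they do.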
they are manipulating mathematical vectors that in no way are connected representationally to words
Is this not what word/sentence vectors are? Mathematical vectors that represent concepts that can then be linked to words/sentences?
Anyway, I think time will tell here. Let's see where we are in a couple years. :)
I’m not really interested in papers that either don’t understand LLMs or play word games with intelligence
Large language models by themselves are “black boxes”, and it is not clear how they can perform linguistic tasks. There are several methods for understanding how LLM work.
You are misunderstanding both this and the quote from Anthropic. They are saying the internal vector space that LLMs use is too complicated and too unrelated to the output to be understandable to humans. That doesn't mean they're having thoughts in there: we know exactly what they're doing inside that vector space -- performing very difficult math that seems totally meaningless to us.
Is this not what word/sentence vectors are? Mathematical vectors that represent concepts that can then be linked to words/sentences?
The vectors do not represent concepts. The vectors are math. When the vectors are sent through language decomposition they become words, but they were never concepts at any point.
They are saying the internal vector space that LLMs use is too complicated and too unrelated to the output to be understandable to humans.
Yes, that's exactly what I'm saying.
That doesn't mean they're having thoughts in there
I mean. Not in the way we do, and not with any agency, but I hadn't argued either way on thoughts because I don't know the answer to that.
we know exactly what they're doing inside that vector space -- performing very difficult math that seems totally meaningless to us.
Huh? We know what they are doing but we don't? Yes, we know the math, people wrote it. I coded my first neural network 35 years ago. I understand the math. We don't understand how the math is able to do what LLMs do. If that's what you're saying then we agree on this.
The vectors do not represent concepts. The vectors are math. When the vectors are sent through language decomposition they become words, but they were never concepts at any point.
"The neurons are cells. When neurotransmitters are sent through the synapses, they become words, but they were never concepts at any point."
What do you mean by "they were never concepts"? Concepts of things are abstract. Nothing physical can "be" an abstract concept. If you think about a chair, there isn't suddenly a physical chair in your head. There's some sort of abstract representation. That's what word vectors are. Different from how it works in a human brain, but performing a similar function.
A word vector is an attempt to mathematically represent the meaning of a word.
From this page. Or better still, this article explaining how they are used to represent concepts. Like this is the whole reason vector embeddings were invented.
We do understand how the math results in LLMs. Reread what I said. The neural network vectors and weights are too complicated to follow for an individual, and do not relate on a 1:1 mapping with the words or sentences the LLM was trained on or will output, so individuals cannot deduce the output of an LLM easily by studying its trained state. But we know exactly what they’re doing conceptually, and individually, and in aggregate. Read your own sources from your previous post, that’s what they’re telling you.
Concepts are indeed abstract but LLMs have no concepts in them, simply vectors. The vectors do not represent concepts in anything close to the same way that your thoughts do. They are not 1:1 with objects, they are not a “thought,” and anyway there is nothing to “think” them. They are literally only word weights, transformed to text at the end of the generation process.
Your concept of a chair is an abstract thought representation of a chair. An LLM has vectors that combine or decompose in some way to turn into the word “chair,” but are not a concept of a chair or an abstract representation of a chair. It is simply vectors and weights, unrelated to anything that actually exists.
That is obviously totally different in kind to human thought and abstract concepts. It is just not that, and not even remotely similar.
You say you are familiar with neural networks and AI but these are really basic underpinnings of those concepts that you are misunderstanding. Maybe you need to do more research here before asserting your experience?
Edit: And in relation to your links -- the vectors do not represent single words, but tokens, which indeed might be a whole word, but could just as well be part of a word or an entire phrase. Tokens do not represent the meaning of a word/partial word/phrase, just the statistical use of that word given the data the word was found in. Equating these vectors with human thoughts oversimplifies the complexities inherent in human cognition and misunderstands the limitations of LLMs.
Your concept of a chair is an abstract thought representation of a chair. An LLM has vectors that combine or decompose in some way to turn into the word “chair,” but are not a concept of a chair or an abstract representation of a chair. It is simply vectors and weights, unrelated to anything that actually exists.
Just so incredibly wrong. Fortunately, I can save myself the time of arguing with such a misunderstanding. GPT-4 is here to help:
This reads like a misunderstanding of how LLMs (like GPT) work. Saying an LLM's understanding is "simply vectors and weights" is like saying our brain's understanding is just "neurons and synapses". Both systems are trying to capture patterns in data. The LLM does have a representation of a chair, but it's in its own encoded form, much like our neurons have encoded representations of concepts. Oversimplifying and saying it's unrelated to anything that actually exists misses the point of how pattern recognition and information encoding works in both machines and humans.
Are you kidding me? I sourced GPT4 itself disagreeing with you that it is intelligent and you told me it's lying. And here you are, using it to try to reinforce your point? Are you for real or is this some kind of complicated game?
But we know exactly what they’re doing conceptually, and individually, and in aggregate.
Can you define and give examples of what you mean at each level here? Maybe we're just not understanding each other and mean the same thing.
Read your own sources from your previous post, that’s what they’re telling you.
The Anthropic one is saying they think they have a way to figure it out, but it hasn't been tested on large models. This is their last paragraph:
Our next challenge is to scale this approach up from the small model we demonstrate success on to frontier models which are many times larger and substantially more complicated. For the first time, we feel that the next primary obstacle to interpreting large language models is engineering rather than science.
They are literally only able to do this on a small one-layer transformer model. GPT-3 has 96 layers and 175 billion parameters.
Also, in their linked paper:
A key challenge to our agenda of reverse engineering neural networks is the curse of dimensionality: as we study ever-larger models, the volume of the latent space representing the model's internal state that we need to interpret grows exponentially. We do not currently see a way to understand, search or enumerate such a space unless it can be decomposed into independent components, each of which we can understand on its own.
Under the Future Work heading:
Scaling the application of sparse autoencoders to frontier models strikes us as one of the most important questions going forward. We're quite hopeful that these or similar methods will work – Cunningham et al.'s work [17] seems to suggest this approach can work on somewhat larger models, and we have preliminary results that point in the same direction. However, there are significant computational challenges to be overcome.
How are you getting from that that this is a solved problem?
Concepts are indeed abstract but LLMs have no concepts in them, simply vectors. The vectors do not represent concepts in anything close to the same way that your thoughts do. They are not 1:1 with objects, they are not a “thought,” and anyway there is nothing to “think” them. They are literally only word weights, transformed to text at the end of the generation process.
Again, you aren't making sense here. Word/sentence vectors are literally a way to represent the concept of those words/sentences. That's what they were built for. That's how they are described. Let's take a step back to try to understand each other.
Are you trying to say that only human minds can understand concepts? I don't buy the human brains are magic bit, and neither does our current understanding of physics.
Are you assuming I'm saying that LLMs are sentient, conscious, have thoughts or similar? I'm not. Jury's out on the thought thing, but I certainly don't believe the other two things. There's no magic with them, same with human brains. We just don't fully understand what happens inside either. Anthropic in the work I quoted is making good progress at that, and I think they may be pretty close, but in terms of LLMs (and not Small LMs), they are still a black box. We know the math behind them, the software, etc. We have some theories. We still do not understand. If you can prove otherwise, please provide me with a source. Stuff is happening really fast in AI, and maybe I blinked and missed something.
I think you're maybe having a hard time with using numbers to represent concepts. While a lot less abstract, we do this all the time in geometry. ((0, 0), (10, 0), (10, 10), (0, 10), (0, 0)) What's that? It's a square. Word vectors work differently but have the same outcome (albeit in a more abstract way).
the vectors do not represent single words, but tokens
I was talking about word vectors, where the vectors DO represent words. It's in the name. LLMs don't specifically use word vectors, but the embeddings they do use work similarly.
Tokens do not represent the meaning of a word/partial word/phrase, just the statistical use of that word given the data the word was found in.
You are correct that tokens don't represent the meaning of a word. However, tokens are scalars. You are conflating tokens and embeddings / word vectors here. Tokens are used to simplify converting a string into a format a neural network can understand (a vector). If we used each ASCII character in the input/output string as a vector input to the network, we'd have to have a lot more parameters than if we combine the characters in some way (i.e. tokens). As you said, they can be a word or a part of a word. There are no statistics embedded in the tokens (there are some methods of using statistics to choose what tokens to use, but that's decided before even training the model and cannot ever change [with our current approach]). You can read here for more information on tokens. Or you can play around with the GPT-3 tokenizer.
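A rough sketch of the token vs. embedding distinction, under loud assumptions: the vocabulary, the greedy longest-match splitting, and the 2-d embedding values below are all invented for illustration. Real tokenizers (BPE and friends) build their vocabulary from data, and real embedding tables are learned during training and have far more dimensions.

```python
# Hypothetical toy vocabulary mapping text pieces to integer token IDs.
VOCAB = {"un": 0, "break": 1, "able": 2, "chair": 3}

# Hypothetical embedding table: row i is the vector for token ID i.
# In a real model these values are learned; here they are made up.
EMBEDDINGS = [
    [0.1, -0.3],
    [0.7, 0.2],
    [-0.4, 0.5],
    [0.9, 0.9],
]


def tokenize(text):
    """Greedy longest-match split into token IDs (a toy stand-in for BPE)."""
    ids = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest remaining substring first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                match = text[i:j]
                break
        if match is None:
            raise ValueError(f"no token covers {text[i:]!r}")
        ids.append(VOCAB[match])
        i += len(match)
    return ids


def embed(ids):
    """Look up the vector for each token ID."""
    return [EMBEDDINGS[i] for i in ids]


ids = tokenize("unbreakable")
print(ids)         # plain integers, no meaning attached
print(embed(ids))  # vectors, which is where any learned meaning would live
```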
Your concept of a chair is an abstract thought representation of a chair. An LLM has vectors that combine or decompose in some way to turn into the word “chair,” but are not a concept of a chair or an abstract representation of a chair. It is simply vectors and weights, unrelated to anything that actually exists.
If you know Python, you should grab gensim (or nltk) and experiment with word vectors.
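A toy sketch of the idea that runs without downloading a model. The three 3-d vectors are made up by hand so that the furniture words point in roughly the same direction; real word vectors are learned from text and have hundreds of dimensions. `most_similar` here is a hypothetical helper, not gensim's API.

```python
import math

# Hypothetical hand-made vectors, purely for illustration.
VECTORS = {
    "chair": [0.9, 0.1, 0.0],
    "stool": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose vector is closest in direction."""
    others = (w for w in VECTORS if w != word)
    return max(others, key=lambda w: cosine(VECTORS[word], VECTORS[w]))


print(most_similar("chair"))  # stool
print(round(cosine(VECTORS["chair"], VECTORS["banana"]), 3))
```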
Seems like an abstract representation of those things as concepts using math. For the record, word vectors are actually pretty understandable/understood by people because you can visualize them easily. When you do, you find similar concepts clustered together (this is how vector search works except with text embeddings). Anyway, it just really seems like linking numbers to concepts is not clicking with you, or you somehow think it's not possible. Reading up on computational linguistics might help.
That is obviously totally different in kind to human thought and abstract concepts. It is just not that, and not even remotely similar.
Yes, neural networks (although initially built thinking they were a computer version of a neuron), are a lot different from how actual brains work as we've learned in however many decades since they were invented. If you're saying that intelligence and understanding is limited to the human mind, then please point to some non-religious literature that backs up your assertion.
You say you are familiar with neural networks and AI but these are really basic underpinnings of those concepts that you are misunderstanding. Maybe you need to do more research here before asserting your experience?
I'm pretty confident in my understanding, though I'm always open to new ideas that are backed with peer reviewed research. I'm not going to get into a dick waving contest here, so I guess we'll have to agree to disagree.
As a side note, going back to your definition of intelligence. That was for psychology. I'll note that the Wikipedia page for Intelligence has this to say:
The definition of intelligence is controversial, varying in what its abilities are and whether or not it is quantifiable.
And so I'll reiterate that we don't have a good definition of intelligence.
The Anthropic one is saying they think they have a way to figure it out, but it hasn’t been tested on large models. This is their last paragraph:
Again, all your quotes indicate that what they've figured out is a way to inspect the interior state of models and transform the vector space into something humans can understand without analyzing the output.
I think your confusion is you believe that because we don't know what the vector space is on the inside, we don't know how AI works. But we actually do know how it accomplishes what it accomplishes. Simply because its interior is a black box doesn't mean we don't understand how we built that black box, or how it operates and functions.
For an overview of how many different kinds of LLMs function, here's a good paper: https://arxiv.org/pdf/2307.06435.pdf You'll note that nowhere is there any confusion about the process of how they take input or produce output. It is all extremely well-understood. You are correct that we cannot interrogate their internals, but that is also not what I mean, at least, when I say that we can understand them and how they work.
I also can't inspect the electrons moving through my computer's CPU. Does that mean we don't understand how computers work? Is there intelligence in there?
I think you’re maybe having a hard time with using numbers to represent concepts. While a lot less abstract, we do this all the time in geometry. ((0, 0), (10, 0), (10, 10), (0, 10), (0, 0)) What’s that? It’s a square. Word vectors work differently but have the same outcome (albeit in a more abstract way).
No, that is not my main objection. It is your anthropomorphization of data and LLMs -- your claim that they "have intelligence." From your initial post:
But also, can you define what intelligence is? Are you sure it isn’t whatever LLMs are doing under the hood, deep in hidden layers?
I think you're getting caught up in trying to define what intelligence is; but I am simply stating what it is not. It is not a complex statistical model with no self-awareness, no semantic understanding, no ability to learn, no emotional or ethical dimensionality, no qualia...
((0, 0), (10, 0), (10, 10), (0, 10), (0, 0)) is a square to humans. This is the crux of the problem: it is not a "square" to a computer because a "square" is a human classification. Your thoughts about squares are not just more robust than GPT's, they are a different kind of thing altogether. For GPT, a square is a token that it has been trained to use in a context-appropriate manner with no idea of what it represents. It lacks semantic understanding of squares. As do all computers.
If you’re saying that intelligence and understanding is limited to the human mind, then please point to some non-religious literature that backs up your assertion.
I'm disappointed that you're asking me to prove a negative. The burden of proof is on you to show that GPT4 is actually intelligent. I don't believe intelligence and understanding are for humans only; animals clearly show it too. But GPT4 does not.
You really, truly don't understand what you're talking about.
The vectors do not represent concepts. The vectors are math.
If this community values good discussion, it should probably just ban statements that manage to be this wrong. It's like when creationists say things like "if we came from monkeys why are they still around???". The person has just demonstrated such a fundamental lack of understanding that it's better to not engage.
Oh, you again -- it's incredibly ironic you're talking about wrong statements when you are basically the poster child for them. Nothing you've said has any grounding in reality, and is just a series of bald assertions that are as ignorant as they are incorrect. I thought you would've picked up on it when I started ignoring you, but: you know nothing about this and need to do a ton more research to participate in these conversations. Please do that instead of continuing to reply to people who actually know what they're talking about.
You clearly don't actually care; if you did, you wouldn't select your sources to gratify your ego, but actually try to understand the problem here. For example, ask GPT4 itself if it is intelligent. It will instruct you far better than I ever can. You clearly have access to it -- frame your objections to it instead of Internet strangers tired of your bloviating and ignorance.
Just like humans are! Do you know what happens when a human grows up without any training by other humans? They are essentially feral, unable to communicate, maybe even unable to think the way we do.
LLMs do not grow up. Without training they don’t function properly. I guess in this aspect they are similar to humans (or dogs or anything else that benefits from training), but that still does not make them intelligent.
What does it mean to "grow up"? LLMs get better at their tasks during training, just as humans do while growing up. You have to clearly define the terms you use.
From scratch in the sense that it starts with random weights, and then experiences the world and builds a model of it through the medium of human text. That's because text is computationally tractable for now, and has produced really impressive results. There's no inherent need for text to be used though, similar models have been trained on time series data, and it will soon be feasible to hook up one of these models to a webcam and a body and let it experience the world on its own. No human intelligence required.
Also, your point is kind of silly. Human children learn language from older humans, and that process has been recursively happening for billions of years, all the way through the first forms of life. Do children not have intelligence? Or are you positing some magic moment in human evolution where intelligence just descended from the heavens and blessed us with it?
And you really don't want it to either. That could cause all sorts of privacy issues if you accidentally include private information in the conversation - and as far as I have heard, it is harder to remove information from LLMs than it is to "add" information to them.
Also Microsoft's Tay could adapt itself based on conversations and that went real well...
What is the point of your reply? ChatGPT-4 does not use this method, and even if it did, it still does not allow it to change its model on-the-fly... so it just seems like a total non-sequitur.
No? Humans are not algorithms except in the most general sense.
For example, there has not been any discovery of an algorithm that allows one to predict human actions, and scientists debate whether such a thing could even exist.
LLMs do not think or feel or have internal states. With the same random seed and the same input, GPT4 will generate exactly the same output every time. Its speech is the result of a calculation, not of intelligence or self-direction. So, even if intelligence can be described by an algorithm, LLMs are not that algorithm.
What exactly do you think would happen if you could make an exact duplicate of a human and run it from the same state multiple times? They would generate exactly the same output every time. How could you possibly think differently without turning to human exceptionalism and believing in magic meat?
For the record, GPT4 specifically is non-deterministic. The current theory is that this is because it uses MoE (mixture of experts), but that's just a theory. Maybe OpenAI knows why. Also, it's not a random seed, it's temperature. If you set that to 0, the model should always select the most probable next token because the probability becomes 1 for that token and 0 for all others. GPT3 and most others are basically deterministic at that level, but not GPT4.
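Since seed and temperature keep getting mixed up here, a toy sketch of how the two interact in a generic sampler may help. This is not OpenAI's code; the function names and the example logits are made up for illustration. It only shows the textbook idea: temperature reshapes the probabilities, and the random number generator (which is what a seed controls) picks from them.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit case: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick one token index according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [1.0, 3.0, 2.0]  # made-up scores for three candidate tokens

# Temperature 0: the choice no longer depends on the random generator at all.
print(softmax_with_temperature(logits, 0))

# Temperature 1: the choice is random, but a fixed seed makes the run repeatable.
run_a = [sample_token(logits, 1.0, random.Random(42)) for _ in range(5)]
run_b = [sample_token(logits, 1.0, random.Random(42)) for _ in range(5)]
print(run_a == run_b)
```

In this idealised picture, either a fixed seed or a temperature of 0 is enough to make the output repeatable. Any remaining variation in a real system would have to come from somewhere outside this sampling step.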