A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data

finance.yahoo.com

As it turns out, it’s impossible to remove a user’s data from a trained A.I. model. Deleting the model entirely is also difficult—and there’s little regulation to enforce either option.

I'm rather curious to see how the EU's privacy laws are going to handle this.

(Original article is from Fortune, but Yahoo Finance doesn't have a paywall)

191 comments
  • Because it doesn’t “know” those things in the same way people know things.

    • It’s closer to how you (as a person) know things than, say, how a database knows things.

      I still remember my childhood home phone number. You could ask me to forget it a million times and I wouldn’t be able to. It’s useless information today. I just can’t stop remembering it.

      • No, you knowing your old phone number is closer to how a database knows things than how LLMs know things.

        LLMs don't "know" information. They don't retain an individual fact, or know that something is true and something else is false (or that anything "is" at all). Everything they say is generated based on the likelihood of a word following another word based on the context that word is placed in.

        You can't ask it to "forget" a piece of information because there's no "childhood phone number" in its memory. Instead there's an increased likelihood it will say your phone number when someone prompts it for a phone number. It doesn't "know" the information at all; the number has simply become part of the weights it uses to generate phrases.
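
A minimal sketch of the "likelihood of a word following another word" idea from the comment above, assuming a toy bigram counter rather than a real LLM (real models use neural network weights and much longer context). The training sentences and digits are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the raw counts after `prev` into probabilities."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}


# Hypothetical training text; the phone numbers are invented.
corpus = [
    "call me on 555 0100",
    "call me on 555 0199",
    "call me on 555 0100",
]

model = train_bigram(corpus)
probs = next_word_probs(model, "555")

# The model holds no "phone number" record, only per-word follow-up counts.
print(sorted(model.keys()))
print(probs)
```

Seeing "555 0100" twice only makes "0100" a more likely continuation of "555". The table is keyed by individual words, so there is no single entry for the whole phone number that could be looked up and deleted.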

        • It's the same in your brain though. There is no number in your brain. Just a set of synapses that allows a depolarization wave to propagate across neurons, via neurotransmitters released and absorbed in a narrow space.

          The way the brain is built allows you to "remember" stuff, to reconstruct information incompletely stored as different, unique connections in a network. But it is not "certain"; we can't know if it's the absolute truth. That's why we need password databases and phone books, because our memory is not a database. It is probably worse than GPT-4.

          • It doesn't matter that there is no literal number in your brain and that there are instead chemical/electronic impulses. There is an impulse there signifying your childhood phone number. You did (and do) know that. And other things too presumably.

            While our brains are not perfectly efficient, we can and do actually store information in them. Information that we can judge as correct or incorrect; true or false; extant or nonexistent.

            LLMs don't know anything and never knew anything. Their responses are mathematical models of word likelihood.

            They don't understand English. They don't know what reality is like or what a phone number represents. If they get your phone number wrong, it isn't because they "misremembered" or because they're "uncertain." It's because they are literally incapable of retaining a fact. The phone number you asked for is part of a mathematical model now, and the LLM will return the output of that model, not the correct phone number.

            Conversely, even if you get your phone number wrong, it isn't because you didn't know it. It's because memory is imperfect and degrades over time.

        • Genuinely curious how you would describe humans remembering stuff, because if I remember my biology classes correctly, it's about reinforced neural pathways that become more likely to be taken by an electrical impulse than those that are less 'travelled'. The whole notion of neural networks is right there in the name, based on how neurons work.

          • The difference is LLMs don't "remember" anything because they don't "know" anything. They don't know facts, English, that reality exists; they have no internal truths, simply a mathematical model of word weights. You can't ask it to forget information because it knows no information.

            This is obviously quite different from asking a human to forget anything; we can identify the information in our brain, it exists there. We simply have no conscious control over our ability to remember it.

            The fact that LLMs employ neural networks doesn't make them like humans or like brains at all.

            • I never implied they "remembered", I asked you how you interpret humans remembering since you likened it to a database, which science says it is not. Nor did I make any claims about AI knowing stuff, you inferred that by yourself. I also did not claim they possess any sort of human like traits. I honestly do not care to speculate.

              The modelling statement speaks to how it came to be and the intention of programmers and serves to illustrate my point regarding the functioning of the brain.

              My question remains unanswered.

              • I said:

                No, you knowing your old phone number is closer to how a database knows things than how LLMs know things.

                Which is true. Human memory is more like a database than an LLM's "memory." You have knowledge in your brain which you can consult. There is data in a database that it can consult. While memory is not a database, in this sense they are similar. They both exist and contain information in some way that can be acted upon.

                LLMs do not have any database, no memories, and contain no knowledge. They are fundamentally different from how humans know anything, and it's pretty accurate to say LLMs "know" nothing at all.

                • Leaving aside LLMs, the brain is not a database. There is no specific place that you can point to and say 'there resides the word for orange'. If that were the case, it would be highly inefficient to assign a spot somewhere for each bit of information (again, not talking about software here, still the brain). And if it were, you would be able to isolate that place, cut it out, and actually induce somebody to forget the word and the notion (since we link words with meaning - say orange and you think of the fruit, the colour or perhaps a carrot). If we had a database organized into tables and, say, orange was a member of 'colours' and of another table, 'orange things', deleting the member 'orange' would make you not recognize that carrots nowadays are orange.

                  Instead, what happens - for example in those who have a stroke or those who suffer from epilepsy (a misfiring of neurons) - is that there appears a tip-of-the-tongue phenomenon where they know what they want to say and can recognize notions; it's just that the pathway to that specific word is interrupted and causes a miss, presumably when the brain tries to go down the path it knows it should take, because it's the path taken many times for that specific notion, and is prevented. But they don't lose the ability to say their phone number; they might lose the ability to say 'four', and some just resort to describing the notion - say, 'the fruit that makes breakfast juice' instead. Of course, if the damage is extensive enough to wipe out a large amount of neurons, you lose larger amounts of words.

                  Downsides - you cannot learn stuff instantly, as you could if the brain was a database. That's why practice makes perfect. You remember your childhood phone number because you repeated it so many times that there is a strong enough link between some neurons.

                  Upsides - there is more learning capacity if you just relate notions and words versus, for lack of a better term, hardcoding them. Again, not talking about software here.

                  Also leads to some funky things like a pencil sharpener being called literally a pencil eater in Danish.

        • Are we sure that this is substantially different from how our brain remembers things? We also remember by association

          • But our memories exist -- I can say definitively "I know my childhood phone number." It might be meaningless, but the information is stored in my head. I know it.

            AI models don't know your childhood phone number, even if you tell them explicitly, even if they trained on it. Your childhood phone number becomes part of a model of word weights that makes it slightly more likely, when someone asks it for a phone number, that some digits of your childhood phone number might appear (or perhaps the entire thing!).

            But the original information is lost.

            You can't ask it to "forget" the phone number because it doesn't know it and never knew it. Even if it supplies literally your exact phone number, it isn't because it knew your phone number or because that information is correct. It's because that sequence of numbers is, based on its model, very likely to occur in that order.

        • This isn't true at all - first, we don't know things like a database knows things.

          Second, they do retain individual facts in the same sort of way we know things, through relationships. The difference is, for us the Eiffel tower is a concept, and the name, appearance, and everything else about it are relationships - we can forget the name of something but remember everything else about it. They're word based, so the name is everything for them - they can't learn facts about a building then later learn the name of it and retain the facts, but they could later learn additional names for it

          For example, they did experiments using some visualization tools and edited the model manually. They changed the link between Eiffel tower and Paris to Rome, and the model began to believe it was in Rome. You could then ask what you'd see from the Eiffel tower, and it'd start listing landmarks like the Colosseum

          So you absolutely could have it erase facts - you just have to delete relationships or scramble details. It just might have unintended side effects, and no tools currently exist to do this in an automated fashion
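
A minimal sketch of what "editing a relationship" with side effects can look like, assuming a tiny linear associative memory (a matrix `W` that maps a subject vector to a city vector) and a plain rank-one update. This is an illustration only, not the method used in the experiments the comment refers to; the vectors, the matrix and the helper names are all hypothetical.

```python
def matvec(W, k):
    """Multiply matrix W (list of rows) by vector k."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def rank_one_edit(W, key, new_value):
    """Return W' with W' @ key == new_value, using the smallest
    (Frobenius-norm) change: W' = W + (new_value - W @ key) key^T / (key . key)."""
    residual = [t - c for t, c in zip(new_value, matvec(W, key))]
    norm_sq = sum(x * x for x in key)
    return [[w + r * x / norm_sq for w, x in zip(row, key)]
            for row, r in zip(W, residual)]


CITIES = ["Paris", "Rome"]


def decode(vec):
    """Pick the city whose coordinate is largest."""
    return CITIES[max(range(len(vec)), key=lambda i: vec[i])]


# Hypothetical subject vectors; "Louvre" deliberately overlaps with "Eiffel Tower".
keys = {
    "Eiffel Tower": [1.0, 0.0, 0.0],
    "Louvre": [0.6, 0.8, 0.0],
    "Colosseum": [0.0, 0.0, 1.0],
}

# Hand-built memory: maps the first two subjects to Paris, the third to Rome.
W = [[1.0, 0.5, 0.0],
     [0.0, 0.0, 1.0]]

before = {name: decode(matvec(W, k)) for name, k in keys.items()}

# Rewrite one relationship: Eiffel Tower -> Rome.
W_edited = rank_one_edit(W, keys["Eiffel Tower"], [0.0, 1.0])
after = {name: decode(matvec(W_edited, k)) for name, k in keys.items()}

print(before)
print(after)
```

In this toy setup the edit moves the Eiffel Tower to Rome as intended, but the Louvre follows it, because its vector overlaps with the edited one. That is the kind of unintended side effect the comment warns about.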

          For humans, it's much harder - our minds use layers of abstraction and aren't a unified set of info. That means you could zap knowledge of the Eiffel tower, and we might forget about it. But then thinking about Paris, we might remember it and rebuild certain facts about it, then thinking about world fairs we might remember when it was built and by whom, etc

          • We know things more like a database knows things than LLMs, which do not "know" anything in any sense. Databases contain data; our head contains memories. We can consult them and access them. LLMs do not do that. They have no memories and no thoughts.

            They are not word-based. They contain only words. Given a word and its context, they create textual responses. But it does not "know" what it is talking about. It is a mathematical model that responds using likely responses sourced from the corpus it was trained on. It generates phrases from source material and randomness, nothing more.

            If a fact is repeated in its training corpus multiple times, it is also very likely to repeat that fact. (For example, that the Eiffel tower is in Paris.) But if its corpus has different data, it will respond differently. (That, say, the Eiffel tower is in Rome.) It does not "know" where the Eiffel tower is. It only knows that, when you ask it where the Eiffel tower is, "Rome" is a very likely response to that sequence of words. It has no thoughts or memories of Paris and has no idea what Rome is, any more than it knows what a duck is. But given certain inputs, it will return the word "Paris."

            You can't erase facts when the model has been created since the model is basically a black box. Weights in neural networks do not correspond to individual words and editing the neural network is infeasible. But you can remove data from its training set and retrain it.

            Human memories are totally different, and are obviously not really editable by the humans in whose brains they reside.

            • I think you're getting hung up on the wrong details

              First of all, they consist of words AND weights. That's a very important distinction. It's literally the difference between

              They don't know what the words mean, but they "know" the shape of the information space, and what shapes are more or less valid.

              Now as for databases - databases are basically spreadsheets. They have pieces of information in explicitly shaped groups, and they usually have relationships between them... Ultimately, it's basically text.

              Our minds are not at all like a database. Memories are engrams - they're patterns in neurons that describe a concept. They're a mix between information space and meat space. The engram itself encodes information in a way that allows us to process it, and its shape links to and describes the location of other memories. But it's more than that - they're also related by the shape in information space.

              You can learn the Eiffel tower is in Paris one day in class, you can see a picture of it, and you can learn it was created for the 1889 world's fair. You can visit it. If asked about it a decade later, you probably don't remember the class you learned about it in. If you're asked what it's made of, you're going to say metal, even if you never explicitly learned that fact. If you forget it was built for the world's fair, but are asked why it was built, you might say it was for a competition or to show off. If you are asked how old it is, you might say over a century despite having entirely forgotten the date

              Our memories are not at all like a database, you can lose the specifics and keep the concepts, or you can forget what the Eiffel tower is, but remember the words "it was built in 1889 for the world's fair".

              You can forget a phone number but remember the feeling of typing it on a phone, or forget someone but suddenly remember them when they tell you about their weird hobby. We encode memories like neural networks, but in a far more complicated way - we have different types of memory and we store things differently depending on the individual, but our knowledge and cognition are intertwined - you can take away personal autobiographical memories from a person, but you can't take away the understanding of what a computer is without destroying their ability to function

              Between humans and LLMs, LLMs are the ones closer to databases - they at least remember explicit tokens and the links between them. But they're way more like us than a database - a database stores information or it doesn't, it's accessible or it isn't, it's intact or it's corrupted. But neural networks and humans can remember something but get the specifics wrong, they can fail to remember a fact when asked one way but remember when asked another, and they can entirely fabricate memories or facts based on similar patterns through suggestion

              Humans and LLMs encode information in their information processing networks - and it's not even by design. It's an emergent property of a network shaped by the ability to process and create information, aka intelligence (a concept now understood to be different from sentience). We do it very differently, but in similar ways, LLMs just start from tokens and do it in a far less sophisticated way

              • Everyone here is busy describing the difference between memories and databases to me as if I don't know what it is.

                Our memories are not a database. But our memories are like a database in that databases contain information, which our memories do too. Our consciousness is informed by and can consult our memories.

                LLMs are not like memories, or a database. They don't contain information. It's literally a mathematical formula; if you put words in one end, words come out the other. The only difference between a statement like "always return the word Paris in response to any query" and what LLMs do is complexity, not kind. Whereas I think we can agree humans are something else entirely, right?

                The fact they use neural networks does not make them similar to human cognition or consciousness or memory. (Separately neural networks, while inspired by biological neural networks, are categorically different from biological neural networks and there are no "emergent properties" in that network that makes it anything other than a sophisticated way of doing math.)

                So... yeah, LLMs are nothing like us, unless you believe humans are deterministic machines with no inner thought processes and no consciousness.

                • Ok, so here's the misunderstanding - neural networks absolutely, 100% store information. You can download Alpaca right now, and ask it about Paris, or llamas, or who invented the concept of the neural network. It will give you factual information embedded in the weights; there's nowhere else the information could be.

                  People probably think you don't understand databases because it seems self-evident that neural networks contain information - if they didn't, where would the information come from?

                  There's no magic involved, you can prove this mathematically. We know how it works and we can visualize the information - we can point to "this number right here is how the model stores the information of where the Eiffel tower is". It's too complex for us to work with right now, but we understand what's going on

                  Brains store information the same way, except they're much more complex. Ultimately, the connections between neurons are where the data is stored - there's more layers to it, but it's the same idea

                  And emergent properties absolutely are a thing in math. No sentience or understanding required, nothing necessarily to do with life or physics at all - complexity is where emergent properties emerge from

                  • You are correct that there is a misunderstanding here. But it is your misunderstanding of neural networks, not mine of memory.

                    An LLM is a mathematical model. It does not know any information about Paris, not in the way humans do or even Wikipedia does. It knows what words appear in response to questions about Paris. That is not the same thing as knowing anything about Paris. It does not know what Paris is.

                    I agree with you the word “Paris” exists in it. But I disagree that information is relevant in any human sense.

                    You have apparently been misled into believing a word generation tool contains any information at all other than word weights. Every word it contains is as exactly meaningless to it as every other word.

                    Brains do not store data in this way. Firstly, neural networks are mathematical approximations of neurons. But they are not neurons and do not have the same properties of neurons, even in aggregate. Secondly, brains contain thoughts, memories, and consciousness. Even if that is representable in a similar vector space as LLM neural networks (a debatable conjecture), the contents of that vector space are as different as newts are from the color purple.

                    I encourage you to do some more research on this before continuing to discuss it. Ask ChatGPT itself if its neural networks are like human brains; it will tell you categorically no. Just remember it also doesn’t know what it’s talking about. It is reporting word weights from its corpus and is no substitute for actual thought and research.

    • Not only does it not know, but it is also very hard for the people who trained it to tell whether some piece of information is or isn't inside the model. Introspection about how exactly the model ends up making decisions after it has been trained is incredibly difficult.

    • It’s actually because they do know things in a way that’s analogous to how people know things.

      Let’s say you wanted to forget that cats exist. You’d have to forget every cat meme you’ve ever seen, of course, but your entire knowledge of memes would also have to change. You’d have to forget that you knew how a huge part of the trend started with “i can haz cheeseburger.”

      You’d have to forget that you owned a cat, which will change your entire memory of your life history about adopting the cat, getting home in time to feed it, and how it interacted with your other animals or family. Almost every aspect of your life is affected when you own an animal, and all of those would have to somehow be remembered in a no-cat context. Depending on how broadly we define “cat,” you might even need to radically change your understanding of African ecosystems, the history of sailing, evolutionary biology, and so on. Your understanding of mice and rats would have to change. Your understanding of dogs would have to change. Your memory of cartoons would have to change - can you even remember Jerry without Tom? Those are just off the top of my head at 8 in the morning. The ramifications would be huge.

      Concepts are all interconnected, and that’s how this class of AI works. I’ve owned cats most of my life, so it’s a huge part of my personal memory and self-definition. They’re also ubiquitous in culture. Hundreds of thousands to millions of concepts relate to cats in some way, and each one of them would need to change, as would each concept that relates to those concepts. Pretty much everything is connected to everything else and as new data are added, they’re added in such a way that they relate to virtually everything that’s already there. Removing cats might not seem to change your knowledge of quarks, but there’s some very very small linkage between the two.

      Smaller impact memories are also difficult. That guy with the weird mustache you saw during your vacation to Madrid ten years ago probably doesn’t have that much of a cascading effect, but because Esteban (you never knew his name) has such a tiny impact, it’s also very difficult to detect and remove. His removal won’t affect much of anything in terms of your memory or recall, but if you’re suddenly legally obligated to demonstrate you’ve successfully removed him from your memory, it will be tough.

      Basically, the laws were written at a time when people were records in a database and each had their own row. Forgetting a person just meant deleting that row. That’s not the case with these systems.

      The thing is that we don’t compel researchers to re-train their models on a data set if someone requests their removal. If you have traditional research on obesity, for instance, and you have a regression model that’s looking at various contributing factors, you do not have to start all over again if someone requests their data be deleted. It should mean that the person’s data are removed from your data set, but it doesn’t mean that you can’t continue to use that model - at least it never has, to my knowledge. Your right to be forgotten doesn’t translate to you being allowed to invalidate the scientific models generated that glom together your data with that of tens of thousands of others. You can be left out of the next round of research on that dataset, but I have never heard of people being legally compelled to regenerate a model based on that.

      There are absolutely novel legal questions that are going to be involved here, but I just wanted to clarify that it’s really not a simple answer from any perspective.
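
A minimal sketch of the contrast drawn in the comment above between deleting a row and changing a model already fitted on that row. It assumes an in-memory SQLite table and a one-variable least-squares fit; the table name, column names and numbers are hypothetical and not taken from any real study.

```python
import sqlite3


def fit_line(points):
    """Ordinary least squares for outcome = slope * factor + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, factor REAL, outcome REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("alice", 1.0, 2.0), ("bob", 2.0, 4.0), ("carol", 3.0, 6.0), ("dave", 4.0, 12.0)],
)

# Fit the model while dave's record is still in the data set.
model_before = fit_line(conn.execute("SELECT factor, outcome FROM people").fetchall())

# "Forgetting" dave in the database is a single row deletion.
conn.execute("DELETE FROM people WHERE name = ?", ("dave",))
remaining = conn.execute(
    "SELECT COUNT(*) FROM people WHERE name = ?", ("dave",)
).fetchone()[0]

# The fitted coefficients only change if the model is fitted again.
model_after = fit_line(conn.execute("SELECT factor, outcome FROM people").fetchall())

print(remaining, model_before, model_after)
```

The row is gone after one statement, but `model_before` still carries dave's influence in its slope and intercept. Removing that influence means fitting again on the remaining rows, which is cheap here and expensive for a large model.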

      • No, the way humans know things and LLMs know things is entirely different.

        The flaw in your understanding is believing that LLMs have internal representations of memes and cats and cars. They do not. They have no memories or internal facts... whereas I think most people agree that humans can actually know things and have internal memories and truths.

        It is fundamentally different from asking you to forget that cats exist. You are incapable of altering your memories because that is how brains work. LLMs are incapable of removing information because the information is used to build the model with which they choose their words, and it is indistinguishable from everything else once it's inside the model.

        An LLM has no understanding of anything you ask it and is simply a mathematical model of word weights. Unless you truly believe humans have no internal reality and no memories and simply say things based on what is the most likely response, you also believe human and LLM knowledge are entirely different from each other.

    • It's actually not that dissimilar. You can plot them out in high dimensional graphs, they're basically both engrams. Theirs are just much simpler

      • Theirs are composed of word weights. Ours are composed of thoughts. It’s entirely dissimilar.
