As it turns out, it’s impossible to remove a user’s data from a trained A.I. model. Deleting the model entirely is also difficult—and there’s little regulation to enforce either option.
I'm rather curious to see how the EU's privacy laws are going to handle this.
(Original article is from Fortune, but Yahoo Finance doesn't have a paywall)
"AI model unlearning" is the equivalent of saying "removing a specific feature from a compiled binary executable". So, yeah, basically not feasible.
But the solution is painfully easy: you remove the data from your training set (ie, the source code), and re-train your model (recompile the executable).
Yes, it may cost you a lot of time and money to accomplish this, but such are the consequences of breaking the law. Maybe be extra careful about obeying laws going forward, eh?
Or you know, if it's impossible to strip out individual data, and it's too expensive to retain/retrain models with data removed... Why is everyone overlooking "just don't process private data, and only use public data in model training"?
For the AI heads here: is this another problem caused by the "black box" style of LLM creation where they don't really know how it actually works, so they don't really know how to take out the data?
I feel like one way to do this would be to break up models and their training data into mini-models and mini-batches of training data instead of one big model, and also restricting training data to that used with permission as well as public domain sources. For all other cases where a company is required to take down information in a model that their permission to use was revoked or expired, they can identify the relevant training data in the mini batches, remove it, then retrain the corresponding mini model more quickly and efficiently than having to retrain the entire massive model.
A major problem with this though would be figuring out how to efficiently query multiple mini models and come up with a single response. I'm not sure how you could do that, at least very well...
The Danish government, which has historically been very good about both privacy rights and workers' rights has recently suggested that they are looking into fixing the nurses shortage "via AI".
Our current government is probably the stupidest, most irresponsible and least humanitarian one we've had in my 40 year lifetime if not longer 🤬