This AI hype cycle has dramatically distorted society's views of what's possible with image upscalers.
A judge in Washington state has blocked video evidence that’s been “AI-enhanced” from being submitted in a triple murder trial. And that’s a good thing, given the fact that too many people seem to think applying an AI filter can give them access to secret visual data.
No computer algorithm can accurately reconstruct data that was never there in the first place.
Ever.
This is an ironclad law, just like the speed of light and the acceleration of gravity. No new technology, no clever tricks, no buzzwords, no software will ever be able to do this.
Ever.
If the data was not there, anything created to fill it in is by its very nature not actually reality. This includes digital zoom, pixel interpolation, movement interpolation, and AI upscaling. It preemptively also includes any other future technology that aims to try the same thing, regardless of what it's called.
One little correction: digital zoom doesn't belong on that list. It’s essentially just cropping the image. That said, I agree that “enhanced” digital zoom should be on it.
Digital zoom is just cropping and enlarging. You're not actually changing any of the data. There may be enhancement applied to the enlarged image afterwards but that's a separate process.
But the fact remains that digital zoom cannot create details that were invisible in the first place due to the distance from the camera to the subject. Modern implementations of digital zoom always use some manner of interpolation algorithm, even if it's just a simple linear blur from one pixel to the next.
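To make the difference between plain enlarging and interpolated enlarging concrete, here is a minimal sketch on a single row of pixels. The brightness values and both function names are made up for illustration, and this is not how any particular camera or phone implements its zoom.

```python
def zoom_nearest(row, factor):
    """Enlarge a row of pixel values by repeating each pixel `factor` times."""
    return [value for value in row for _ in range(factor)]


def zoom_linear(row, factor):
    """Enlarge a row by linearly blending between neighbouring pixels."""
    out = []
    for i in range(len(row) - 1):
        for step in range(factor):
            weight = step / factor
            out.append(row[i] * (1 - weight) + row[i + 1] * weight)
    out.append(row[-1])  # the last pixel has no right-hand neighbour
    return out


row = [10, 50, 200]  # hypothetical brightness values for three pixels
nearest = zoom_nearest(row, 2)
linear = zoom_linear(row, 2)

print(nearest)  # [10, 10, 50, 50, 200, 200]
print(linear)   # [10.0, 30.0, 50.0, 125.0, 200]
```

The repeated-pixel version contains only values the sensor recorded. The blended version contains 30 and 125, which were computed from the neighbours rather than measured, so they say nothing about what was actually between those pixels.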
The problem is not in how digital zoom works; it's in how people think it works. A lot of people (i.e. [l]users, ordinary non-technical people) still labor under the impression that digital zoom somehow makes the picture "closer" to the subject and can enlarge or reveal details that were not detectable in the original photo, which is a notion we need to excise from people's heads.
I 100% agree on your primary point. I still want to point out that a detail in a 4k picture that takes up a few pixels will likely be invisible to the naked eye unless you zoom. "Digital zoom" without interpolation is literally just that: Enlarging the picture so that you can see details that take up too few pixels for you to discern them clearly at normal scaling.
Digital zoom makes the image bigger but without adding any detail (because it can't). People somehow still think this will allow you to see small details that were not captured in the original image.
Also since companies are adding AI to everything, sometimes when you think you're just doing a digital zoom you're actually getting AI upscaling.
There was a court case not long ago where the prosecution wasn't allowed to pinch-to-zoom evidence photos on an iPad for the jury, because the zoom algorithm creates new information that wasn't there.
There's a specific type of digital zoom which captures multiple frames and takes advantage of motion between frames (plus inertial sensor movement data) to interpolate to get higher detail. This is rather limited because you need a lot of sharp successive frames just to get a solid 2-3x resolution with minimal extra noise.
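A heavily idealised sketch of why this works: the extra detail comes from additional measurements, not from guessing. It assumes point sampling, a shift of exactly half a pixel that is known in advance, and no noise. Real pipelines have to estimate the shift and cope with blur and noise, which is part of why they need so many frames. The scene values and the function name are hypothetical.

```python
# Hypothetical scene with finer detail than a single frame can record.
scene = [3, 7, 1, 9, 4, 6, 2, 8]

# Each frame records only every second sample of the scene.
frame_a = scene[0::2]  # camera at its original position
frame_b = scene[1::2]  # camera shifted by half a coarse pixel


def merge_shifted_frames(first, second):
    """Interleave two frames whose sampling grids are offset by half a pixel."""
    merged = []
    for a, b in zip(first, second):
        merged.extend([a, b])
    return merged


merged = merge_shifted_frames(frame_a, frame_b)
print(merged == scene)  # True: both frames together cover every sample
```

Neither frame alone holds the full scene, but every value in the merged result was captured by the sensor in one of them.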
It preemptively also includes any other future technology that aims to try the same thing
No it doesn't. For example, with enough compute power you can correct for distortions introduced by camera lenses, sensors, etc. and drastically increase image quality. For example, this photo of Pluto was taken from 7,800 miles away - click the link for a version of the image that hasn't been resized/compressed by Lemmy:
The unprocessed image would look nothing at all like that. There's a lot more data in an image than you can see with the naked eye, and algorithms can extract/highlight the data. That's obviously not what a generative AI algorithm does, and those should never be used, but there are other algorithms which are appropriate.
The reality is every modern photo is heavily processed - look at this example by a wedding photographer, even with a professional camera and excellent lighting the raw image on the left (where all the camera processing features are disabled) looks like garbage compared to exactly the same photo with software processing:
No computer algorithm can accurately reconstruct data that was never there in the first place.
What you are showing is (presumably) a modified visualisation of existing data. That is: given a photo with known lighting and lens distortion, we can use math to display the data (lighting, lens distortion, and input registered by the camera) in a plethora of different ways. You can invert all the colours if you like. It's still the same underlying data. Modifying how strongly certain hues are shown, or correcting for known distortion are just techniques to visualise the data in a clearer way.
"Generative AI" is essentially just non-predictive extrapolation based on some data set, which is a completely different ball game, as you're essentially making a blind guess at what could be there, based on an existing data set.
making a blind guess at what could be there, based on an existing data set.
Here's your error. You yourself are contradicting the first part of your sentence with the last. The guess is not "blind" because the prediction is based on an existing data set. Looking at a half-occluded circle and having a model reconstruct the other half is not a "blind" guess; it is a highly probable extrapolation that can be very useful, because in most situations, it will be the second half of the circle. With a certain probability, you have created new valuable data for further analysis.
But you are not reporting the underlying probability, just the guess. There is no way, then, to distinguish a bad guess from a good guess. Let's take your example and place a fully occluded shape. Now the most probable guess could still be a full circle, but with a very low probability of being correct. Yet that guess is reported with the same confidence as your example. When you carry out this exercise for all extrapolations with full transparency of the underlying probabilities, you find yourself right back in the position the original commenter has taken. If the original data does not provide you with confidence in a particular result, the added extrapolations will not either.
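A toy sketch of that point, assuming the model's knowledge is nothing more than a table of how often each shape appeared in its training data. The counts, the shape names, and the function are all invented for illustration.

```python
from collections import Counter

# Hypothetical training set: how often each shape appeared. Numbers are made up.
SHAPE_COUNTS = Counter(
    {"circle": 40, "semicircle": 5, "square": 30, "triangle": 15, "star": 10}
)


def guess_shape(consistent_shapes):
    """Return the most frequent shape among those still consistent with the
    visible part of the object, plus the probability of that guess."""
    counts = {shape: SHAPE_COUNTS[shape] for shape in consistent_shapes}
    total = sum(counts.values())
    best = max(counts, key=counts.get)
    return best, counts[best] / total


# Half the outline is visible and curved: only two shapes in the toy set fit.
half_visible = guess_shape(["circle", "semicircle"])
# Fully occluded: nothing is ruled out.
fully_occluded = guess_shape(list(SHAPE_COUNTS))

print(half_visible)
print(fully_occluded)
```

Both calls answer "circle", but the first is right about 89% of the time under these made-up counts and the second only 40%. If only the label is reported and the probability is dropped, the two guesses look identical.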
And then circles get convictions. So even if the model did somehow start off completely unbiased, people are going to start feeding it data that weighs towards finding more circles, since a prosecution will be used as a 'success' to feed back into the model and 'improve' it.
Looking at a half circle and guessing that the "missing part" is a full circle is as much of a blind guess as you can get. You have exactly zero evidence that there is another half circle present. The missing part could be anything, from nothing to any shape that incorporates a half circle. And you would be guessing without any evidence whatsoever as to which of those things it is. That's blind guessing.
Extrapolating into regions without prior data with a non-predictive model is blind guessing. If it wasn't, the model would be predictive, which generative AI is not, is not intended to be, and has not been claimed to be.
None of your examples are creating new legitimate data from the whole cloth. They're just making details that were already there visible to the naked eye. We're not talking about taking a giant image that's got too many pixels to fit on your display device in one go, and just focusing on a specific portion of it. That's not the same thing as attempting to interpolate missing image data. In that case the data was there to begin with, it just wasn't visible due to limitations of the display or the viewer's retinas.
The original grid of pixels is all of the meaningful data that will ever be extracted from any image (or video, for that matter).
Your wedding photographer's picture actually throws away color data in the interest of contrast and to make it more appealing to the viewer. When you fiddle with the color channels like that and see all those troughs in the histogram that make it look like a comb? Yeah, all those gaps and spikes are actually original color/contrast data that is being lost. There is less data in the touched up image than the original, technically, and if you are perverse and own a high bit depth display device (I do! I am typing this on a machine with a true 32-bit-per-pixel professional graphics workstation monitor.) you actually can stare at it and see the entirety of the detail captured in the raw image before the touchups. A viewer might not think it looks great, but how it looks is irrelevant from the standpoint of data capture.
They talked about algorithms used for correcting lens distortions with their first example. That is absolutely a valid use case and extracts new data by making certain assumptions with certain probabilities. Your newly created law of nature is just your own imagination and is not the prevalent understanding in the scientific community. No, quite the opposite, scientific practice runs exactly counter to your statements.
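The comb effect is easy to reproduce. This is a minimal sketch assuming an 8-bit channel and a plain linear contrast stretch; the black point of 64 and the gain of 2 are arbitrary illustrative values, not what any particular photo editor uses.

```python
def stretch_contrast(level, black_point=64, gain=2):
    """Apply a simple linear contrast stretch to one 8-bit level and clip it."""
    return max(0, min(255, (level - black_point) * gain))


original_levels = range(256)  # assume every 8-bit level occurs in the image
adjusted_levels = {stretch_contrast(v) for v in original_levels}
missing_levels = sorted(set(range(256)) - adjusted_levels)

print(len(adjusted_levels))  # 129
print(missing_levels[:5])    # [1, 3, 5, 7, 9]
```

256 distinct input levels collapse into 129 output levels. Everything at or below 64 is crushed to black, everything at or above 192 is clipped to white, and every odd level in between is left empty, which is the comb you see in the histogram.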
That's wrong. With a degree of certainty, you will always be able to say that this data was likely there. And because existence is all about probabilities, you can expect specific interpolations to be an accurate reconstruction of the data. We do it all the time with resolution upscaling, for example. But of course, from a certain lack of information onward, the predictions become less and less reliable.
In my first year of university, we had a fun project to make us get used to physics. One of the projects required filming someone throwing a ball upwards, and then using the footage to get the maximum height the ball reached, and doing some simple calculations to get the initial velocity of the ball (if I recall correctly).
One of the groups that chose that project was having a discussion on a problem they were facing: the ball was clearly moving upwards on one frame, but on the very next frame it was already moving downwards. You couldn't get the exact apex from any specific frame.
So one of the guys, bless his heart, gave a suggestion: "what if we played the (already filmed) video in slow motion... And then we filmed the video... And we put that one in slow motion as well? Maybe do that a couple of times?"
A friend of mine was in that group and he still makes fun of that moment, to this day, over 10 years later. We were studying applied physics.
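For what it's worth, the frames they already had were enough, and not because of any trick with the video. A ball under constant gravity follows a parabola, so three frames around the peak pin down the apex even when it falls between frames. This is a sketch on synthetic data, not what the group actually did: the 30 fps frame rate, the 5 m/s throw, and the function names are all assumptions, and it ignores air resistance and measurement error.

```python
import math

G = 9.81    # m/s^2, gravitational acceleration
V0 = 5.0    # m/s, "true" launch speed used to generate the fake footage
FPS = 30    # assumed camera frame rate


def height(t):
    """Height of the ball at time t for the synthetic throw."""
    return V0 * t - 0.5 * G * t * t


def parabola_apex(t_mid, dt, y_before, y_mid, y_after):
    """Fit a parabola through three equally spaced samples and return the
    (time, height) of its vertex."""
    a = (y_after - 2 * y_mid + y_before) / (2 * dt ** 2)  # curvature term
    b = (y_after - y_before) / (2 * dt)                   # slope at t_mid
    return t_mid - b / (2 * a), y_mid - b ** 2 / (4 * a)


dt = 1 / FPS
# Frame 15 is the highest recorded frame; the true apex lies just after it.
apex_time, apex_height = parabola_apex(
    15 * dt, dt, height(14 * dt), height(15 * dt), height(16 * dt)
)
recovered_v0 = math.sqrt(2 * G * apex_height)

print(apex_time, apex_height, recovered_v0)
```

This only works because the physics supplies a known model for the motion. Re-filming the video in slow motion adds no measurements and no model, so it cannot help.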
Perhaps at some point we will conquer quantum mechanics enough to be able to observe particles at every place and time they have ever and will ever exist. Do that with enough particles and you've got a de facto time machine, albeit a read-only one.
So many things we believe to be true today suggest this is not going to happen. The uncertainty principle, and the random nature of nuclear decay chief among them. The former prevents you gaining the kind of information you would need to do this, and the latter means that even if you could, it would not provide the kind of omniscience one might assume.
Limits of quantum observation aside, you also could never physically store the position/momentum/state of every particle in a universe within that same universe, because the particles that exist in the universe are the sum total of the material we could ever use to build the data storage. You've got yourself a chicken-and-egg scenario where the egg is 93 billion light years wide, there.