When A.I.’s Output Is a Threat to A.I. Itself | As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results.
...studies have found that this process can amplify biases in the data and is more likely to erase data pertaining to minorities.
Later on...
new research suggests that when humans curate synthetic data (for example, by ranking A.I. answers and choosing the best one), it can alleviate some of the problems of collapse.
It definitely won't solve the bias problem, though, unless we explicitly select against biased outputs.
Yeah, I read that as a caveat to the larger point, i.e. an acknowledgment that there are limited cases where synthetic training data has been shown to be useful.
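For concreteness, the curation step the article describes is essentially best-of-n selection: generate several candidate outputs, rank them, and keep only the winner rather than ingesting everything. Here's a minimal sketch; `score_quality` and `passes_bias_check` are hypothetical stand-ins for the human ranking the article mentions and the bias screening the comment above calls for:

```python
import random

# Hypothetical stand-in: in practice this would be a human rater or a
# learned reward model ranking candidate answers.
def score_quality(sample: str) -> float:
    """Rank a synthetic sample; higher is better."""
    return random.random()  # placeholder for human/reward-model ranking

# Hypothetical stand-in: a screen for the biased, minority-erasing
# outputs the article warns about.
def passes_bias_check(sample: str) -> bool:
    return True  # placeholder for an actual audit

def curate(candidates: list[str]) -> str | None:
    """Best-of-n curation: keep only the top-ranked candidate that also
    survives the bias screen, instead of ingesting all synthetic data."""
    screened = [c for c in candidates if passes_bias_check(c)]
    if not screened:
        return None
    return max(screened, key=score_quality)

# Usage: generate n candidate answers per prompt, then add only the
# curated winner back into the training set.
candidates = ["answer A", "answer B", "answer C"]
best = curate(candidates)
if best is not None:
    training_set = [best]  # rather than all three candidates
```

The point of the sketch is that curation changes *which* synthetic data gets recycled, which is why it can slow collapse; but if the ranking signal itself is biased, selection alone won't remove that bias, which is the caveat raised above.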