The world’s most important knowledge platform needs young editors to rescue it from chatbots – and its own tired practices, says tech writer Stephen Harrison
Established in 2001, Wikipedia is an “old man” by internet standards. But the role it plays in our collective knowledge of the world remains astonishing. Content from the free internet encyclopedia appears in everything from high-school term papers and pub trivia questions to search engine summaries and voice assistants. Tools like Google’s AI Overviews and ChatGPT rely heavily on Wikipedia, although they rarely credit the site in their responses.
And therein lies the problem: as Wikipedia is reduced to mere training data for AI applications and its visibility diminishes, it also loses prominence in the minds of readers and potential contributors. When someone notices a topic that is poorly described on Wikipedia, they might feel motivated to correct it. But this can-do spirit fades when the error arrives through an AI summary, where the source of the information isn't clear.
Wikipedia is fine; it isn't "losing prominence." This is a willful misinterpretation of a speech to make it sound more dire, a nonsense AI propaganda angle, and a bunch of ageist nonsense about Gen Z that will be immediately familiar to anyone who pays attention to this kind of slop.
It's actually not easy to ensure that an LLM will cite a correct source, just as it's not easy to ensure that it will provide accurate information. Its output is based on token probability, not on deterministic lookups of "this data came from this source." It could make something up entirely, then write "Source:" and then probabilistically write "Wikipedia" because those tokens commonly follow the ones for "Source."
If you have an AI bot that looks up information in real time, then citing would be easy. But for a trained LLM, the training process is highly destructive: the original information is not preserved except as probability-based relationships between tokens.
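The contrast between the two comments above can be made concrete with a toy Python sketch. It is not how any real model or product works internally: the probabilities, documents, and URLs are all invented for illustration, and the "model" is just a weighted random draw of the token that follows "Source:". The point is that the generative path picks a source by frequency alone, while the lookup path copies the origin stored alongside the passage it found.

```python
import random
from typing import Optional

# Hypothetical next-token probabilities after "... Source:". The numbers are
# invented; they stand for how often each token follows "Source:" in training
# text, not for where the preceding claim actually came from.
NEXT_TOKEN_PROBS = {
    "Wikipedia": 0.55,
    "Britannica": 0.20,
    "Reuters": 0.15,
    "NASA": 0.10,
}

def sample_source(rng: random.Random) -> str:
    """Generative 'citation': choose a source token by probability alone."""
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

# Retrieval-style store: each passage keeps its origin, so a citation can be
# copied from the record instead of predicted. URLs are hypothetical.
DOCUMENTS = [
    {"text": "Wikipedia was established in 2001.",
     "url": "https://example.org/encyclopedia-history"},
    {"text": "A quiz-show computer answered trivia questions.",
     "url": "https://example.org/quiz-computer"},
]

def lookup_source(query: str) -> Optional[str]:
    """Return the URL of the first passage containing every query word (substring match)."""
    words = query.lower().split()
    for doc in DOCUMENTS:
        text = doc["text"].lower()
        if all(word in text for word in words):
            return doc["url"]
    return None

if __name__ == "__main__":
    rng = random.Random(0)
    # Sampled "sources" vary with the seed and ignore the claim entirely.
    print([sample_source(rng) for _ in range(5)])
    # The lookup returns the URL stored with the matching passage.
    print(lookup_source("established 2001"))
```

Running the lookup for "established 2001" returns the URL attached to the first stored passage, whereas the sampled "sources" depend only on the seed and the made-up weights.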
Right, in my experience the majority of URLs generated by LLMs are just jumbles of letters that vaguely look like a URL. Properly citing sources would require a fundamental architectural change of one kind or another, and that's really bad for performance.
I choose to interpret the grandparent commenter's use of "easily" to mean "not impossible, and an ethical obligation, so you'd better fuckin' make it a priority."
Yeah, Bing Chat had sources for a while (not sure if it still does), and when I checked the sources, they frequently didn't contain the claim in question. So even if you get it to cite real pages, it just doesn't work the same way human citations do.
Maybe citing sources should be a legal requirement, if LLM service providers aren't automatically going to be held accountable for the answers their services produce?
Agreed. ChatGPT doesn't like to cite sources. Microsoft Copilot and Google Gemini do link to some sources, though not as accurately or thoroughly as Wikipedia.
What I don't understand is how IBM had/has Watson, which was able to answer questions well enough to go on Jeopardy! and dominate. And now, more than a decade later, these LLMs absolutely suck at it.
It makes me wonder if Watson was nothing more than a Mechanical Turk, because what is out there now seems like a huge step backwards.
I don't get it, then. Why are all these companies so gung-ho to replace something that was working with an AI that doesn't?
It's less accurate, it uses way more energy, it doesn't show its work, it doesn't cite its sources, and it'll make up shit that sounds right when it needs to. Why would anyone think AI is worth putting in any consumer product at this rate?
You should stop. The Wikimedia Foundation has all the money it needs to fund Wikipedia perpetually. The endowment goal was met years and years ago. Your money is being spent on parasitic non-profit management-class nonsense.
That essay isn't terribly well thought out. They have an issue with the increase in employees, but lack any evidence that those employees aren't actually needed. The core of their thesis seems to be "it was fine with fewer employees before, why do we need more now?", but they provide little supporting evidence beyond documenting rising spending over the years.
Edit: also, this is seven years old and it appears Guy's predictions have yet to even begin to manifest.
Oh, for chrissakes. I also donate to the Wikimedia Foundation, secure in the knowledge that at least that was one donation I could feel good about. Time to do some reading, I guess.