If you were tasked with putting together a message to send to space aliens, with the only restriction that the message had to be under 4GB, what would you send?
So you could do text-only Wikipedia and probably compress it. Maybe drop a few thousand of the articles that don't matter or are stubs. Drop all the entertainment articles, etc.
I don't think you appreciate how much text you can fit into 4GB. The first entire gigabyte could be dedicated to various means of translation and explaining our language system, and you'd still have room for roughly 500 million words after that, even uncompressed.
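For anyone who wants to check that figure, here is a back-of-the-envelope sketch. The 6 bytes per word (about five letters plus a space, plain uncompressed ASCII) and the decimal gigabyte are my assumptions, not measured values, and `words_that_fit` is just an illustrative helper name.

```python
# Rough capacity estimate; every constant here is an assumption.
BYTES_PER_GB = 10**9   # decimal gigabyte
BYTES_PER_WORD = 6     # assumed: ~5 letters plus a space, uncompressed ASCII


def words_that_fit(gigabytes, bytes_per_word=BYTES_PER_WORD):
    """Return how many average-length words fit in the given space."""
    return gigabytes * BYTES_PER_GB // bytes_per_word


total_gb = 4
primer_gb = 1  # the gigabyte reserved for the translation primer
remaining_words = words_that_fit(total_gb - primer_gb)
print(f"{remaining_words:,} words")  # 500,000,000 words
```

Compression would only push the number higher, so this is the conservative end of the estimate.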
Damn, what a cheapskate. A chance to play Minecraft with a friendly space alien and you can’t even pay for a legit copy. Probably going to give that alien a computer virus and doom us all. Don’t put this guy in charge.
It's explained in the PDF. If they had used normal numbers and letters, the chance of corruption would have been high, so they had to recast them as the symbols you see.
I still don't get why we use Pi instead of Tau though, when most equations double it up into Tau anyway.
Probably an AI model that fits in that size. It might not be our best models, but it probably would be a lot more useful to aliens than whatever we'd decide to fit on 4GB.
They'd get most of the inner workings of our languages and how we hold conversations, and the model would generally be able to answer basic questions about humanity.
What? An "AI model" is not a compression algorithm. Why give the aliens an AI trained on some Wikipedia articles when you could just give them Wikipedia?
Because an LLM is more than just data: it's like a big network of how syllables and words go together based on some context. And that's useful because language is how we communicate, how we connect ideas together, it's how we share stories. It's not just Wikipedia articles, it's a database of relationships between words and concepts. It approximates how we think as humans.
Yes, AI is hella overhyped. Everyone wants to AI everything. But really for this particular situation, I think the model data would actually be the best precompiled database of knowledge we can possibly provide to learn about humans for the size.
No, it's not magic compression, but 4GB worth of parameters is still a lot. GPT4All has models just under 4GB. They're not particularly impressive compared to OpenAI's offerings, but I think you can extract a lot more practical information for first contact out of a basic model than out of 4GB worth of Wikipedia. It's extremely lossy compression: it's never going to spit out articles verbatim, and it will hallucinate a ton of stuff.
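To put "4GB worth of parameters" into numbers, here is a rough sketch. The bit widths (16, 8 and 4 bits per weight) are assumptions on my part about common storage precisions, the budget ignores any file-format overhead, and `max_parameters` is a made-up helper name.

```python
# How many model weights fit in a 4GB budget at a given precision?
# Assumes a decimal gigabyte and no file-format overhead.
BUDGET_BYTES = 4 * 10**9


def max_parameters(bits_per_weight, budget_bytes=BUDGET_BYTES):
    """Return the number of weights that fit in the byte budget."""
    return budget_bytes * 8 // bits_per_weight


for bits in (16, 8, 4):
    billions = max_parameters(bits) / 1e9
    print(f"{bits:>2} bits per weight: about {billions:.0f} billion parameters")
```

Under those assumptions the budget holds about 2, 4 and 8 billion parameters respectively, so the lower the precision, the larger the model that squeezes in.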
If we had more space I'd send all the major AIs we have like Dall-E, LLaMa and GPT 4. Imagine you're an alien, you're presented with a keyboard and a monitor, and know nothing about us. You can use Dall-E to try random letters and words and see if the output makes sense. Maybe you find out what a cat, dog, bat, frog, apple looks like. You can then input those words in ChatGPT, and get context as to when those are used. What's "a horse"? What's "riding"? Put those into Dall-E, now you know what a "cat riding a horse" looks like. It can generate as many as you want, any combination. Eventually you can figure out how to ask ChatGPT whether cats typically ride horses, cars or bicycles, and what cats actually do.
Now imagine you're a very advanced alien species that can easily process the model's parameters. You've just downloaded the basics of humanity. You can map your language to our model's parameters, speak to us in our language, translate our answers into yours, and hold a basic conversation.
A history of all of our misdeeds and self-inflicted suffering, probably 1GB of compressed literature and 3GB of imagery and video, along with an earnest plea:
Please, if you are able, either teach us how to save us from ourselves, or be merciful and destroy us. Don't let the carnage that we barely sapient creatures inflict on one another, for lack of meaningful intellect or empathy, continue.
Either take our hand and teach us as the confused, selfish, irrational children that we are, or just end this evolutionary mistake.
I imagine sign language would be much less information dense than text, as it would have to be pictures. And they may not even understand the sign language.
I think they meant "age/sex/location?", which was a common first question when getting to know someone online back when instant messengers were still a novel concept.