TIL American linguist Morris Swadesh compiled a list of words he deduced would be in a Proto-Indo-European language, the language that is the root of languages across Europe and Asia.
In every language, most of the everyday words people use have been in use for a very long time; we just pronounce them a bit differently.
*(H)óynos, *dwó, *treyes for example aren't just uno, deux and tres, they are 1, 2 and 3 in English, French, German, Italian, Greek, Russian, Hindi, Farsi, Kurdish, Tajik etc. Literally the same word, just spoken by different groups of descendant speakers.
Some languages have undergone sound changes that make certain words sound more or less similar to how we think they sounded in PIE.
So even though four, vier, quattuor and tessera sound quite different to us, they are all basically just how we say *kʷetwóres.
1 in Russian is один, which I think is quite different from one/uno/un (especially since the о is pronounced а). 2 and 3, on the other hand, are extremely similar (два, три). Does it actually still come from the same root?
While I'm not competent in this subject, I find it very fascinating that Finno-Ugric languages (which are not Indo-European AFAIK) like Finnish or Estonian are so wildly different, so that 1, 2 and 3 are üks, kaks, kolm (in Estonian), for example.
Close: it's French quatre (4), not "quatorze" (14). It goes like this: PIE *kʷetwóres → Latin ⟨quattuor⟩ /kʷattuor/ → Old French ⟨quatre, catre⟩ /kʷatɾə/~/katɾə/ → contemporary French ⟨quatre⟩ /katʁ(ə)/.
French ⟨quatorze⟩ does contain that *kʷetwóres, but it's only the "quator-". The "-ze" is from Proto-Indo-European *déḱm̥ (10). This gets easier to see in Latin, as the word for 14 was ⟨quattuordecim⟩ (literally four-ten).
Note that almost all English words that you used to translate the PIE words are also examples of those PIE words being still in use nowadays - they're direct descendants, for example *kʷis → who, *éǵh₂ → I, etc. In English, German, Swedish and other Germanic languages, this gets a bit obscured due to some old sound change called Grimm's Law. (EDIT: the only exception is the second line - *túh, *te became "thou, thee".)
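For anyone who wants to play with it, here is a minimal sketch of the consonant part of Grimm's Law in Python. The `GRIMM` table and the `grimm_shift` function are names made up for this example, and the words have to be split into sounds by hand. It is heavily simplified: it ignores vowels, Verner's Law, and the contexts where the shift did not apply (such as after *s).

```python
import unicodedata

def _nfc(text):
    # Normalise so that precomposed and combining spellings compare equal.
    return unicodedata.normalize("NFC", text)

# Simplified Grimm's Law: PIE stop -> Proto-Germanic outcome (assumed table).
_RAW_GRIMM = {
    # voiceless stops become fricatives
    "p": "f", "t": "θ", "k": "h", "ḱ": "h", "kʷ": "hʷ",
    # plain voiced stops become voiceless stops
    "b": "p", "d": "t", "g": "k", "ǵ": "k", "gʷ": "kʷ",
    # aspirated voiced stops lose their aspiration
    "bʰ": "b", "dʰ": "d", "gʰ": "g", "ǵʰ": "g",
}
GRIMM = {_nfc(src): dst for src, dst in _RAW_GRIMM.items()}

def grimm_shift(segments):
    """Apply the simplified shift to a list of pre-split sounds."""
    shifted = []
    for seg in segments:
        seg = _nfc(seg)
        shifted.append(GRIMM.get(seg, seg))  # non-stops pass through unchanged
    return shifted

if __name__ == "__main__":
    print(grimm_shift(["t", "r", "e", "y", "e", "s"]))  # compare English "three"
    print(grimm_shift(["d", "w", "ó"]))                 # compare English "two"
    print(grimm_shift(["kʷ", "i", "s"]))                # compare Old English "hwā" (who)
```

The output is not a real Proto-Germanic word, only the PIE form with its stops shifted, but it is enough to see why "three" starts with th- where Latin has tres.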
We Finns don't even speak an Indo-European language, and we still use some words that clearly come from those.
The word for sea is basically exactly the same, depending on the pronunciation. We say "meri", it's marked down as "móri". In Finnish yellow is "keltainen" and PIE says "ǵʰelh₃-".
[Shameless advertisement: we have a linguistics community, !linguistics@mander.xyz . I'm the mod there; I apologise for the relative lack of activity nowadays, but everyone is welcome to post this sort of stuff there.]
What Morris Swadesh did was at the same time simpler and greater than that: he created a list of concepts likely to pop up across many different languages, regardless of their time period and area. This is extremely useful to track the relationship between multiple languages, even if you don't speak them.
I'm not sure if he created one for Proto-Indo-European; "Swadesh list" became a generic name for this sort of list, regardless of who compiles it. Plus Morris Swadesh's main interest was Amerindian languages.
What do all those asterisks and numbers mean? I feel like I'm missing a key to decode that.
Edit: Damn, that little post led to an actual TIL moment. I'm actually going to copy all your answers for future reference, since I enjoy etymology and come into contact with these symbols a lot. Thanks, everyone!
Asterisk means that the word has been reverse engineered without any direct evidence backing it up. All proto languages will have asterisks in front of all their words because proto languages are, by definition, languages that were used before anything was written down.
The reverse engineered word is likely to be correct (or at least, as correct as we can be), but in the absence of direct evidence, it's still just guesswork.
The numbers you're talking about are because we know that there are different consonants used, but we don't entirely know what sounds those consonants are. So we just write all of the consonants that likely sounded somewhat like the letter h as h1, h2, h3, etc., and repeat for the other uncertain consonants.
So basically h1 definitely sounds different than h2, but as for exactly what they sound like, all we know is that both of them are kinda like h.
This is mostly correct so I'll focus on small specific details, OK?
Asterisk means not directly attested. In reconstructions it goes as you say, but you'll also see them before things that you don't expect speakers to use, in synchronic linguistics; for example *me apple eat gets an asterisk because your typical English speaker wouldn't use it.
It is kind of "guesswork" but it follows a very specific procedure, called the comparative method. As in, it is not an "anything goes".
The sounds represented as *h₁, *h₂, *h₃, *h₄ and *H do not necessarily sound like [h]. At this point they're simply part of the notation. For example, a common hypothesis is that *h₁ was [ʔ], which is more like the sound in "oh-oh" than like [h]. And some argue that they aren't even the sounds themselves, but rather the effect of the sounds on descendant words (the difference is important because, if two sounds had the same effect, they ended up with the same symbol).
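To give a feel for why the comparative method is not "anything goes", here is a toy sketch of its first step: collecting sound correspondences that keep recurring across cognates. The word pairs are well-known Latin/English cognates; `COGNATES` and `initial_correspondences` are names invented for this example, and the initial sounds are entered by hand. The real procedure looks at every position in the word, uses many languages at once, and has to weed out loanwords, so treat this as an illustration only.

```python
from collections import Counter

# (Latin word, English word, Latin initial sound, English initial sound)
COGNATES = [
    ("pater", "father", "p", "f"),
    ("pes", "foot", "p", "f"),
    ("piscis", "fish", "p", "f"),
    ("tres", "three", "t", "θ"),
    ("tenuis", "thin", "t", "θ"),
    ("centum", "hundred", "k", "h"),
    ("cornu", "horn", "k", "h"),
    ("cor", "heart", "k", "h"),
    ("duo", "two", "d", "t"),
    ("decem", "ten", "d", "t"),
]

def initial_correspondences(pairs):
    """Count how often each (Latin sound, English sound) pairing recurs."""
    return Counter((latin, english) for _, _, latin, english in pairs)

if __name__ == "__main__":
    for (latin, english), count in initial_correspondences(COGNATES).most_common():
        # A pairing that shows up again and again is a regular correspondence;
        # a one-off would be a hint of chance resemblance or borrowing.
        print(f"Latin {latin} : English {english}  ({count} cognate pairs)")
```

A reconstruction is then whatever single proto-sound best explains each recurring pairing, which is why Latin p and English f both get traced back to *p.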
Since Contramuffin answered most of it, I'll focus on the diacritics.
The acute in *ḱ *ǵ *ǵʰ shows that they aren't the same as *k *g *gʰ. Odds are that the ones with an acute were pronounced with the tongue a bit fronted (palatalised).
The acute over other consonants, plus in *é *ó, is something else entirely. It's the accent - you're supposed to pronounce those consonants with a higher pitch. "Yay, consistency" /s
The macron over *ē *ō is to show that the vowel is loooong.
Those floating ⟨ʰ⟩ refer to aspiration. Aspiration is that "puff" of air that you release when you say ⟨pill, till, kill⟩ but not when you say ⟨spill, still, skill⟩. In English this is not distinctive, but in a lot of other Indo-European languages it is, and the mainstream hypothesis is that it was distinctive in PIE itself.
In the meantime, a floating ⟨ʷ⟩ means that the consonant is pronounced with rounded lips. The difference between something like *kʷ and *kw is mostly that the first one behaves like a single consonant, the other as two.
That ring under some consonants is to highlight that they're syllabic, as if they were LARPing as vowels. It's a lot like writing "button" as "buttn̥".
If you ever see *ə₁ *ə₂ *ə₃ etc., pretend that that "ə" is "h₁". It's simply different ways to annotate the same stuff.
*H means "this is *h₁ or *h₂ or *h₃, but we have no clue on which".
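If it helps as a cheat sheet, the conventions above can be sketched as a small decoder in Python. Everything here (`describe`, the feature labels, the rule that an acute on k or g marks a palatalised consonant while an acute elsewhere marks the accent) is my own encoding of the rules listed above, not any standard tool, and it only covers the symbols discussed in this thread.

```python
import unicodedata

ACUTE = "\u0301"       # combining acute accent
MACRON = "\u0304"      # combining macron
RING_BELOW = "\u0325"  # combining ring below
LARYNGEAL_INDEX = {"₁": 1, "₂": 2, "₃": 3, "₄": 4}

def describe(form):
    """Return (base letter, feature) pairs for the notation found in a form."""
    notes = []
    # NFD splits letters like ḱ or é into base letter + combining mark.
    text = unicodedata.normalize("NFD", form)
    if text.startswith("*"):
        notes.append(("*", "reconstructed"))
        text = text[1:]
    prev = ""  # most recent base letter
    for ch in text:
        if ch == ACUTE:
            # On k/g the acute marks a fronted consonant; elsewhere, the accent.
            kind = "palatalised" if prev in ("k", "g") else "accent"
            notes.append((prev, kind))
        elif ch == MACRON:
            notes.append((prev, "long"))
        elif ch == RING_BELOW:
            notes.append((prev, "syllabic"))
        elif ch == "ʰ":
            notes.append((prev, "aspirated"))
        elif ch == "ʷ":
            notes.append((prev, "labialised"))
        elif ch in LARYNGEAL_INDEX:
            notes.append((prev, f"laryngeal {LARYNGEAL_INDEX[ch]}"))
        elif ch == "H":
            notes.append(("H", "unspecified laryngeal"))
        elif not unicodedata.combining(ch):
            prev = ch  # plain letter: remember it for the marks that follow
    return notes

if __name__ == "__main__":
    for word in ["*déḱm̥", "*kʷetwóres", "*ǵʰelh₃-"]:
        print(word, describe(word))
```

Running it on *déḱm̥ shows the "yay, consistency" problem nicely: the same acute comes out as the accent on e and as palatalisation on k.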
The asterisks mean that it's a reconstructed word, not an actually attested word. No PIE writing exists, so all these words are deduced from the descendant words across the vast IE family, not recorded as definitely existing.
The numbers are h1, h2 and h3. We know that PIE had three different h-like sounds (called laryngeals). But, funny thing, nearly every daughter language lost them (the Anatolian languages, such as Hittite, kept traces of some), so it's hard to tell exactly what each one sounded like. If you see overlapping ripples in a pond, you can work out roughly where and how many things dropped, but you can't guess the exact weight or size.