
Speech may have a universal transmission rate: 39 bits per second



Archive link: https://archive.is/20240503184140/https://www.science.org/content/article/human-speech-may-have-universal-transmission-rate-39-bits-second

Interesting excerpt:

De Boer agrees that our brains are the bottleneck. But, he says, instead of being limited by how quickly we can process information by listening, we're likely limited by how quickly we can gather our thoughts. That's because, he says, the average person can listen to audio recordings sped up to about 120%—and still have no problems with comprehension. "It really seems that the bottleneck is in putting the ideas together."

Ah, here's a link to the paper!

Hacker News @lemmy.bestiver.se


12 comments
  • They found that Japanese, which has only 643 syllables, had an information density of about 5 bits per syllable, whereas English, with its 6949 syllables, had a density of just over 7 bits per syllable. Vietnamese, with its complex system of six tones (each of which can further differentiate a syllable), topped the charts at 8 bits per syllable.

    That's the part I don't get. How do you determine the bits of information per syllable/word in different languages?

    If I pick a random word such as 'sandwich' and encode it in ASCII, it takes 8 bytes, i.e. 64 bits. According to the scientists, a two-syllable word in English only holds 14 bits of actual information. Does anyone understand what they did there, or have access to the underlying study?

    • You've stumbled upon the dark arts of information theory.

      Sure, conveying "sandwich" in ASCII or UTF-8 takes 64 bits, but that's an encoding that is inefficient by default.

      For starters, ASCII has a lot of unprintable control characters that we don't normally use to write words. Even if we never use them, they take up bits in our encoding, because every character we do use must be distinguished from them.
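A quick way to see that wasted capacity (a minimal sketch in Python; the 33-control-character count is standard 7-bit ASCII):

```python
import math

# ASCII defines 128 code points, but 33 of them (0-31 and 127) are
# control characters that never appear in ordinary written words.
total = 128
printable = total - 33  # 95 printable characters

print(math.log2(total))      # 7.0 bits to address every ASCII code point
print(math.log2(printable))  # ~6.57 bits would cover just the printables
```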

      Second, writing and speaking are two different things. A question isn't actually a separate "?" character: in speech, asking a question is a modification of tone and word order in a sentence. While, as literate people, we might think of a sentence as written, speech has no such thing as question marks, and the same is true of all punctuation. A normal spoken English sentence therefore also encodes information about tone, including tones we don't really know how to write down, and all of that is information.

      This is the linguistic equivalent of Kolmogorov complexity, which explores the absolute lowest amount of data required to represent something; in effect, that requires devising the most efficient possible data encoding scheme.

    • I linked the paper in the OP. Check page 7 - it shows the formulae they're using.

      I'll illustrate the simpler one. Let's say your language allows five syllables, with the following probabilities:

      • σ₁ - appears 40% of the time, so p(σ₁) = 0.4
      • σ₂ - appears 30% of the time, so p(σ₂) = 0.3
      • σ₃ - appears 20% of the time, so p(σ₃) = 0.2
      • σ₄ - appears 8% of the time, so p(σ₄) = 0.08
      • σ₅ - appears 2% of the time, so p(σ₅) = 0.02

      If you apply the first formula, here's what you get:

      • E = -∑ [p(x)log₂(p(x))]
      • E = -{ [0.4log₂(0.4)] + [0.3log₂(0.3)] + [0.2log₂(0.2)] + [0.08log₂(0.08)] + [0.02log₂(0.02)] }
      • E ≈ 1.92 bits
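The same calculation in a few lines of Python, using the five probabilities above:

```python
import math

# Syllable probabilities from the worked example above
p = [0.4, 0.3, 0.2, 0.08, 0.02]

# Shannon entropy: E = -sum of p(x) * log2(p(x))
E = -sum(px * math.log2(px) for px in p)
print(round(E, 2))  # 1.92
```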

      Of course, natural languages allow way more than just five syllables, so the actual number will be way higher than that. Also, since some syllables are more likely to appear after other syllables, you need the second formula - for example if your first syllable is "sand" the second one might be "wich" or "ing", but odds are it won't be "dog" (a sanddog? Messiest of the puppies. Still a good boy.)
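The second formula (conditional entropy over adjacent syllable pairs) can be sketched the same way. Everything here is invented for illustration: a toy syllable sequence, with p(y|x) estimated from bigram counts, so frequent pairs like "sand"→"wich" cost fewer bits:

```python
import math
from collections import Counter

# Toy sequence of syllables (hypothetical, purely for illustration)
corpus = ["sand", "wich", "sand", "ing", "sand", "wich", "dog", "house"]
pairs = list(zip(corpus, corpus[1:]))

pair_counts = Counter(pairs)                 # counts of (x, y) bigrams
first_counts = Counter(x for x, _ in pairs)  # counts of x as a first element
n = len(pairs)

# Conditional entropy: E = -sum over (x, y) of p(x, y) * log2(p(y | x))
E_cond = -sum(
    (c / n) * math.log2(c / first_counts[x])
    for (x, y), c in pair_counts.items()
)
print(round(E_cond, 3))
```

Because "sand" is usually followed by "wich" in this toy corpus, the conditional entropy comes out well below the plain (unconditional) entropy of the same syllables, which is exactly why the paper needs the second formula.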

      > If I pick a random word such as 'sandwich' and encode it in ASCII it takes 8 bytes / i.e. 64 bits. According to the scientists, a two-syllable word in English only holds 14 bits of actual information.

      ASCII is extremely redundant - it spends 8 bits per letter, but if you're handling up to 32 graphemes then 5 bits per letter is enough. And some letters won't even add information to the word, for example if I show you the word "dghus" you can correctly guess it's "doghouse", even if the ⟨o⟩'s and the ⟨e⟩ are missing.
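The 8-bits-versus-5-bits point can be checked directly (a sketch; it only counts fixed-width codes, ignoring the further savings from uneven letter frequencies):

```python
import math

word = "sandwich"

# ASCII spends 8 bits per character...
ascii_bits = len(word) * 8

# ...but 26 letters fit in ceil(log2(26)) = 5 bits each.
bits_per_letter = math.ceil(math.log2(26))
compact_bits = len(word) * bits_per_letter

print(ascii_bits, compact_bits)  # 64 vs 40
```

So even before exploiting redundancy between letters, "sandwich" drops from 64 to 40 bits just by right-sizing the alphabet.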
