For further info, if anyone is interested: Stephen Bax claimed a decade ago to have partially decoded the manuscript; here’s a video with his reasoning, as well as the paper he released. Sadly, Bax passed away in 2017 (may he rest in peace), so the work was left incomplete.

  • Lvxferre [he/him]OPM
    15 days ago

    If it’s a hoax, it’s a bloody great job - they started it centuries ago, but it’s still trolling us in 2025. It’s worth checking how they did it. (@teft@lemmy.world posted a video with one way they could’ve done it.)

    And if it’s a lost language, my guess is the same as Bax’s: an Indo-Aryan language, either Romani or something related to it. That would explain why it’s so hard to decode - we’re basically looking for your typical European language when it’s something way less typical.

      • Lvxferre [he/him]OPM
        15 days ago

        This video has some great info, and it’s easier to approach than videos talking directly about the entropy issue.

        Hypothetically speaking, if there’s a language encoded there, the problem of relatively few letters can still be addressed by multiple phonemes sharing the same letter.

        For example, not representing VOT distinctions - so pairs like “peat” vs. “bead” are written the same, even if phonemically distinct. It seems the Southwestern Paleohispanic script worked like this, so even if it’s syllabary-ish you’d see way fewer symbols than you’d expect for one (only ~30).

        That introduces some complexity though - it’s the sort of thing you’d expect to see when a language is adapting the writing system of another, and so far the writing system hasn’t been attested anywhere else. Plus it makes Ockham’s Razor scream bloody murder.
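
        To make the letter-sharing idea concrete, here’s a rough Python sketch. The mapping is made up for illustration (voiced stops written with the letter of their voiceless counterparts), and ordinary spelling stands in for phonemes; it is not a claim about how any real script works, the manuscript’s included.

```python
# Hypothetical mapping (assumption): each voiced stop shares a letter with
# its voiceless counterpart, so the script ignores the VOT distinction.
MERGE_VOICING = str.maketrans({"b": "p", "d": "t", "g": "k"})


def write_without_voicing(word: str) -> str:
    """Spell a word in a toy script that does not mark voicing."""
    return word.lower().translate(MERGE_VOICING)


def alphabet_size(words) -> int:
    """Count the distinct letters used across a list of words."""
    return len({ch for word in words for ch in word})


# Toy word list (assumption): spelling is used as a stand-in for phonemes.
words = ["pit", "bid", "kap", "gab", "tik", "dig"]
written = [write_without_voicing(w) for w in words]

print("letters needed with voicing marked:   ", alphabet_size(words))
print("letters needed without voicing marked:", alphabet_size(written))
print("distinct written forms:", len(set(written)), "out of", len(set(words)))
print(write_without_voicing("peat"), write_without_voicing("bead"))
```

        The letter inventory shrinks, but so does the number of distinct written words, which is the extra ambiguity a reader of such a script would have to resolve from context.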

        • Coelacanth@feddit.nu
          14 days ago

          It’s also notable for the repetitive strings it produces, which are very unlike most (all?) known languages, such as the famous: “qokeedy qokedy shedy tchedy […] qokal otedy qokedy qokedy dal qokedy qokedy rgam.”

          It’s also possible the manuscript could contain two (or more) different languages, or different encoding methods, depending on what it is. See: https://www.voynich.nu/extra/lang.html
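
          One rough way to put a number on that repetitiveness is to count how often a word is immediately followed by an identical or near-identical word. The Python sketch below does that for the string quoted above, split at the “[…]” so no pair is counted across the elided part. The edit-distance threshold of 1 is an arbitrary assumption, and a dozen words is far too small a sample to prove anything; it only illustrates the kind of measurement.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def near_repeat_pairs(segment, max_distance=1):
    """Return (hits, total) for adjacent word pairs within max_distance edits."""
    pairs = list(zip(segment, segment[1:]))
    hits = sum(edit_distance(x, y) <= max_distance for x, y in pairs)
    return hits, len(pairs)


# The quoted string, split where the quote has an elision.
SEGMENTS = [
    "qokeedy qokedy shedy tchedy".split(),
    "qokal otedy qokedy qokedy dal qokedy qokedy rgam".split(),
]

hits = total = 0
for segment in SEGMENTS:
    h, t = near_repeat_pairs(segment)
    hits += h
    total += t

print(f"{hits} of {total} adjacent pairs are identical or one edit apart")
```

          Running the same count over a comparable stretch of ordinary prose in a known language would give a baseline to compare against.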