By Iris Kowalczyk · Published May 8, 2026 · Updated May 8, 2026
Last reviewed: May 8, 2026.
The Voynich Manuscript: A Cold-Case File on the World’s Most Studied Cipher
The Voynich Manuscript is a 234-page illustrated codex written in an unknown script, carbon-dated to 1404-1438 and held at Yale’s Beinecke Rare Book and Manuscript Library as MS 408. More than a century of cryptanalysis, statistical study, and linguistic comparison has narrowed the field of possibilities, but no decipherment has held up under peer review.
Treat this case the way a cold-case team would treat a missing-person file. There is a body of evidence: the parchment, the ink, the script, the illustrations, the handful of marginal notes in known languages. There is a chain of custody, broken in places. There are witnesses, most of them long dead, who handled the object before it reached Wilfrid Voynich in 1912. The job is to lay out what survived, distinguish inference from speculation, and name the hypotheses that the surviving evidence will and will not support.
This file moves in order. Object first, timeline second, internal evidence third. Then the failed solutions and what they eliminated. Then the cryptographic hypotheses still on the table. The companion case file, a historical and archaeological reconstruction of the Voynich Manuscript, covers the wider provenance, the named owners, and the illustrated content. This investigation focuses specifically on the cipher problem.
The Object of the Investigation: Beinecke MS 408
The case object is a quarto-sized volume of vellum, bound but with several missing folios. The Beinecke catalogues it as MS 408 and describes 102 surviving leaves with original foliation that runs to 116, indicating roughly fourteen leaves are gone. Several foldout sheets, including a nine-rosette cosmological chart, expand certain pages into multi-panel illustrations [1].
What Survived in the File
The text runs in a continuous, fluent hand. Letters connect; words separate cleanly; paragraphs hold their shape. Illustrations divide the manuscript into informal sections that scholars have labeled herbal, astronomical, balneological, cosmological, pharmaceutical, and recipes. Approximately 113 botanical drawings and 73 astrological figures sit alongside diagrams of nude women in green pools connected by tubular structures [1].
A few marginal annotations break the silence. One folio carries a Germanic-looking phrase; another shows zodiac month names in what reads as medieval French or Occitan. These marginalia are generally regarded as later additions rather than part of the primary scribal program. The Beinecke provides full-color digitization of the manuscript, which has helped make it the most-studied undeciphered text in the world [1].
Provenance and the Chain of Custody
The earliest documentary traces place the manuscript in Prague in the early seventeenth century, where the alchemist Georg Baresch and later the polymath Johannes Marcus Marci corresponded about it with the Jesuit scholar Athanasius Kircher in Rome. A 1665 letter from Marci to Kircher, preserved in the manuscript itself, names an earlier owner: the Holy Roman Emperor Rudolf II (reigned 1576 to 1612), who reportedly paid 600 ducats for it [2]. The trail then goes quiet for two and a half centuries inside Jesuit collections in Italy.
Wilfrid Voynich, a Polish-Lithuanian book dealer, purchased the manuscript in 1912 from the Jesuit library at the Villa Mondragone near Frascati. The Jesuits sold it under conditions of confidentiality. Voynich’s widow Ethel Lilian Voynich kept the book until her death; her secretary Anne Nill sold it to the bookseller H. P. Kraus in 1961, and Kraus donated it to Yale’s Beinecke Library in 1969 [3].
Reconstructing the Timeline: From 15th-Century Vellum to Yale
The most consequential piece of physical evidence arrived in 2009. The University of Arizona accelerator mass spectrometry laboratory tested four samples of the parchment. All four returned consistent dates with a combined 95 percent confidence interval of 1404 to 1438 [4]. This anchors the manuscript firmly in the early fifteenth century and rules out a long list of suspects.
The carbon date all but eliminates John Dee and Edward Kelley, who were once proposed as forgers selling the book to Rudolf II. It eliminates outright Roger Bacon, an earlier candidate who died around 1292. It rules out any sixteenth- or seventeenth-century hoax theory unless the hoaxer somehow obtained a quire of unused fifteenth-century vellum, then wrote on it without leaving traces of an erased earlier text. Multispectral imaging of the parchment shows no underlying writing; this is not a palimpsest [5].
What the Carbon Date Does Not Settle
Vellum can be stored. Ink chemistry has been examined separately and shows materials consistent with the medieval period, but ink cannot be dated as precisely as parchment. The interval between the slaughter of the calf and the writing on the prepared skin could be days or decades. The carbon date therefore sets the earliest plausible date for the writing, not the latest. The case file treats a fifteenth-century date for the writing as established and any later date as unproven but not strictly impossible.
The 1912 Acquisition and the Marci Letter
Voynich’s own account of the 1912 purchase has been challenged. He claimed the book was found in a chest at Villa Mondragone among other Jesuit holdings. Recent archival work, including the research of Rene Zandbergen on the Marci letter and the Kircher correspondence, supports the broad outline of his story while correcting some details about which Jesuit library held the volume immediately before the sale [2]. The provenance between the Marci letter and 1912 remains the weakest link in the chain of custody.
The Witness Sheet: What “Voynichese” Looks Like
The script, called Voynichese by convention, uses an alphabet of roughly 20 to 30 distinct glyphs depending on how variants are counted. Words are separated by spaces. Lines run left to right. The hand is fluent, suggesting the scribes wrote at speed in a system they knew well. The text shows none of the hesitation marks, corrections, or false starts that characterize learners or copyists working from a difficult exemplar [6].
Currier A and Currier B: Two Hands or Two Languages?
In the 1970s, the U.S. Navy cryptologist Prescott Currier identified two distinct statistical patterns in the manuscript. He labeled them Currier A and Currier B and noted that they differed in glyph frequency, word structure, and section preference. Language A dominates the herbal section. Language B prevails in the balneological and pharmaceutical sections [7]. Modern paleographic analysis by Lisa Fagin Davis has refined this picture, identifying as many as five separate scribal hands working on the manuscript [3].
The Currier observation matters because it constrains the cipher hypothesis. A single substitution cipher applied uniformly should not produce two statistical languages. Either two ciphers were used, or the underlying plaintext changed register between sections, or different scribes encoded the same source differently. Each option carries its own evidentiary cost.
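The constraint can be made concrete. A one-to-one substitution renames symbols but cannot change how often each symbol occurs, so the sorted frequency profile of a text survives encryption unchanged. The Python sketch below compares two such profiles with a total variation distance. The function names are illustrative, and the comparison is a simplification of Currier's analysis, which also looked at word structure and section preference.

```python
from collections import Counter


def frequency_profile(text):
    """Relative symbol frequencies, sorted from most to least common.

    Symbol identities are discarded, so any one-to-one (monoalphabetic)
    substitution leaves this profile exactly unchanged. Whitespace is ignored.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)


def profile_distance(p, q):
    """Total variation distance between two sorted profiles (zero-padded)."""
    size = max(len(p), len(q))
    p = p + [0.0] * (size - len(p))
    q = q + [0.0] * (size - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    plain = "the root of the herb is bitter"
    key = str.maketrans("abcdefghijklmnopqrstuvwxyz", "qwertyuiopasdfghjklzxcvbnm")
    cipher = plain.translate(key)
    # Same plaintext under one substitution key: the distance is exactly zero.
    print(profile_distance(frequency_profile(plain), frequency_profile(cipher)))  # prints 0.0
```

Applied to transliterated Currier A and Currier B pages, a persistently non-zero distance would point away from a single uniform substitution key, although small samples alone produce some distance, so any real comparison needs enough text on both sides.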
Statistical Fingerprints: Zipf, Entropy, and the Abjad Argument
The Voynichese vocabulary follows Zipf’s law, the rank-frequency relationship observed in natural human languages, in which a word’s frequency is roughly inversely proportional to its frequency rank [8]. The manuscript’s rank-frequency curve resembles those of texts in medieval Latin, German, or Arabic. This finding makes the simple-hoax hypothesis expensive: Zipf’s law was not formalized until 1935, so a medieval hoaxer could not have aimed for it deliberately, although some random generating processes do produce Zipf-like curves by accident.
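A rough way to check the Zipf claim on any transliteration is to fit a straight line to log frequency against log rank; natural-language texts usually give a slope near -1. The Python sketch below is a plain least-squares fit, not a formal goodness-of-fit test, and it assumes the text has already been split into word tokens (for example, from an EVA transliteration file).

```python
from collections import Counter
import math


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Natural-language texts typically give a slope near -1. This is a
    rough diagnostic, not a statistical test of fit.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # Synthetic demo: word w_r appears 120 / r times, an exactly Zipfian sample.
    demo = [f"w{r}" for r in range(1, 7) for _ in range(120 // r)]
    print(round(zipf_slope(demo), 3))  # prints -1.0
```

On real data the fit is usually restricted to the middle ranks, since the rarest words (seen once or twice) bend the tail of the curve; a slope near -1 on its own does not prove linguistic content.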
Sravana Reddy and Kevin Knight at the University of Southern California published the most-cited statistical study in 2011. They found that conditional character entropy in Voynichese sits near 2 bits per character, well below the 3 to 4 typical of natural languages. The character-position rules resemble an abjad, a writing system that omits vowels, with many glyphs appearing only at word starts or word ends [9]. Word-length distribution is unusually narrow and binomial, unlike most natural languages but consistent with certain cipher systems.
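The conditional entropy figure answers a simple question: once you know the current glyph, how many bits of uncertainty remain about the next one? A plug-in estimate from adjacent-pair counts is sketched below in Python. It is not necessarily the exact procedure Reddy and Knight used, and the result depends heavily on the transliteration alphabet, since splitting or merging glyphs changes every count.

```python
from collections import Counter
import math


def conditional_entropy(text):
    """Estimate H(next symbol | current symbol) in bits.

    Uses adjacent pairs within the string: H(pair) - H(first symbol).
    Plug-in estimate with no smoothing; whether spaces count as symbols
    is left to the caller.
    """
    adjacent = list(zip(text, text[1:]))
    n = len(adjacent)
    if n == 0:
        return 0.0
    pairs = Counter(adjacent)
    firsts = Counter(a for a, _ in adjacent)
    h_pair = -sum(c / n * math.log2(c / n) for c in pairs.values())
    h_first = -sum(c / n * math.log2(c / n) for c in firsts.values())
    return h_pair - h_first


if __name__ == "__main__":
    # Each symbol is followed by either symbol equally often: 1 bit remains.
    print(conditional_entropy("00110"))  # prints 1.0
```

Fully predictable text such as "abababab" scores 0 bits. Natural-language samples land in the 3 to 4 bit range cited above, which is the comparison that makes a figure near 2 bits stand out.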
Read together, the statistical evidence points away from random gibberish and away from a clean one-to-one substitution of any single known language. The script behaves like ciphertext that preserves some structure of an underlying language while flattening other features.
Failed Decipherments and What They Ruled Out
Every decade since 1912 has produced a confident decipherment. None has held up to scrutiny. The pattern of failure is itself useful evidence. It tells the investigator what kinds of solution the manuscript will not accept.
The Newbold Anagram Theory (1921)
William Romaine Newbold, a Penn philosopher, claimed in 1921 that microscopic markings inside Voynich glyphs encoded Latin shorthand traceable to Roger Bacon. Newbold died in 1926, and his colleague Roland Grubb Kent published the work posthumously in 1928. The cryptanalyst John Manly demolished the method in 1931, showing that Newbold’s microscopic markings were artifacts of cracked iron-gall ink and that his decoding procedure was so flexible it could produce any desired plaintext [10].
The Cheshire Proto-Romance Claim (2019)
In May 2019, the journal Romance Studies published a paper by Gerard Cheshire, then a research associate at the University of Bristol, claiming the manuscript was written in a “proto-Romance” language by Dominican nuns for Maria of Castile, Queen of Aragon [11]. The claim was rejected within hours by the medievalist Lisa Fagin Davis, executive director of the Medieval Academy of America, who argued that Cheshire’s “proto-Romance” matched no attested language and that his substitution table produced inconsistent translations of identical glyphs [3]. The University of Bristol withdrew its press release and removed the announcement from its website.
The AI Hebrew Hypothesis (2018)
Greg Kondrak and Bradley Hauer at the University of Alberta ran the manuscript through a language-identification algorithm trained on 400 modern translations of the Universal Declaration of Human Rights. The model identified Hebrew as the most probable source language and produced a Google Translate output that read “she made recommendations to the priest, man of the house, and me and people” [12]. Davis pointed out that an algorithm trained on modern languages cannot reliably classify a fifteenth-century document, and that the output sentence is the kind of plausible nonsense modern statistical translation regularly generates from gibberish input. The work has not been replicated.
The Cryptographic Hypotheses Still On the Table
Eliminating bad solutions narrows the field. The hypotheses that survive the statistical evidence and the carbon date fall into a small set, each carrying its own predictions and its own residual problems.
Verbose Substitution and Polyalphabetic Variants
A verbose cipher replaces each plaintext letter with a fixed group of two or more ciphertext glyphs. Verbose substitution naturally produces low conditional entropy and constrained word-length distributions, both observed in Voynichese. Nick Pelling has argued since 2006 for a fifteenth-century North Italian origin, possibly involving the architect Antonio Averlino, called Filarete, encoding herbal and medical knowledge under a verbose cipher [13]. A 2025 paper in Cryptologia introduced the “Naibbe cipher,” a verbose substitution scheme that can be worked by hand with materials available in the fifteenth century and that reproduces several of the manuscript’s distinctive statistical features when applied to Latin or Italian plaintext [14].
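To make the mechanism concrete, here is a deliberately simple verbose substitution in Python. It is not the Naibbe cipher or any scheme proposed for the manuscript: the glyph inventory GLYPHS and the letter-to-pair key built by build_key are hypothetical, chosen only to show how writing every plaintext letter as a fixed two-glyph group doubles word lengths and confines certain glyphs to certain positions.

```python
import string

# Hypothetical ten-glyph inventory; any ten distinct symbols would do.
GLYPHS = "oaqdykchte"


def build_key():
    """Map each plaintext letter a-z to a unique two-glyph group."""
    n = len(GLYPHS)
    return {
        letter: GLYPHS[i // n] + GLYPHS[i % n]
        for i, letter in enumerate(string.ascii_lowercase)
    }


def encrypt(plaintext, key):
    """Replace every letter with its group; spaces stay as word breaks."""
    words = ("".join(key[ch] for ch in word if ch in key)
             for word in plaintext.lower().split())
    return " ".join(w for w in words if w)


def decrypt(ciphertext, key):
    """Invert the key by reading each word two glyphs at a time."""
    reverse = {group: letter for letter, group in key.items()}
    return " ".join(
        "".join(reverse[word[i:i + 2]] for i in range(0, len(word), 2))
        for word in ciphertext.split()
    )


if __name__ == "__main__":
    key = build_key()
    secret = encrypt("herba rubea", key)
    print(secret)
    print(decrypt(secret, key))  # prints herba rubea
```

In this toy key the first glyph of every group is o, a, or q, so those three glyphs monopolize the odd positions of every word. A scheme designed with more care would hide this better, but the same effect is one route to the positional rules and low conditional entropy that a verbose cipher is expected to leave behind.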
Coded Medical or Trade Shorthand
Medieval guilds, apothecaries, and esoteric practitioners used private shorthand systems to protect proprietary recipes. A shorthand of this kind could explain the herbal and balneological focus of the illustrations, the absence of any matching glyph corpus elsewhere, and the use by multiple scribes within a closed circle. The hypothesis predicts that the underlying language is a known European vernacular or Latin and that decipherment will require both the shorthand key and the technical vocabulary it abbreviates.
Glossolalic or Constructed Language
The Voynich could be a constructed language created for ritual, mnemonic, or therapeutic purposes. Hildegard of Bingen produced a constructed vocabulary called the Lingua Ignota in the twelfth century, and the practice continued in mystical and esoteric circles. A constructed-language Voynich would explain the Zipfian distribution and the consistent grammar while accommodating the absence of any matching natural language. The cost of this hypothesis is that constructed languages of the period are usually documented elsewhere; the Voynich is not.
Sophisticated Hoax
The hoax remains a live hypothesis but a costly one. Gordon Rugg has shown that a Cardan-grille technique, sliding a perforated paper template over a table of syllables, could produce Voynich-like text efficiently with sixteenth-century tools [15]. The carbon date pushes any hoax back into the early fifteenth century, when the technique would still have been physically possible, although it would predate Girolamo Cardano’s 1550 description of the grille and there was no known market for the result. A successful hoax hypothesis must explain not only the Zipfian word-frequency distribution but also the bigram and trigram regularities that go beyond what a Cardan grille typically produces.
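The table-and-grille idea is easy to simulate. In the Python sketch below, a card with three holes is laid over a table of syllables and whatever shows through the holes is read off as one word. The syllable TABLE, the hole positions in GRILLE, and the wrap-around at the bottom edge are all hypothetical simplifications; Rugg's published tables and grilles were larger and are not reproduced here.

```python
import random

# Hypothetical syllable table: rows of (prefix, midfix, suffix) cells.
# Empty strings stand for blank cells, which let word lengths vary.
TABLE = [
    ("qo", "ke", "dy"),
    ("o", "ch", "aiin"),
    ("ch", "", "ol"),
    ("sh", "ee", "y"),
    ("d", "a", "in"),
    ("", "ke", "ar"),
]

# Hypothetical grille: three holes at (row offset, column) positions.
GRILLE = [(0, 0), (1, 1), (2, 2)]


def read_word(top_row):
    """Place the grille with its top edge at top_row and read the holes."""
    parts = []
    for row_offset, col in GRILLE:
        row = (top_row + row_offset) % len(TABLE)  # wrap past the bottom edge
        parts.append(TABLE[row][col])
    return "".join(parts)


def generate(n_words, seed=0):
    """Emit n_words by moving the grille to randomly chosen rows."""
    rng = random.Random(seed)
    return [read_word(rng.randrange(len(TABLE))) for _ in range(n_words)]


if __name__ == "__main__":
    print(" ".join(generate(8)))
```

With six rows and one grille this toy can emit at most six distinct words, which shows why a serious attempt needs large tables and several grilles, and why the bigram and trigram statistics of grille output have to be measured against the manuscript rather than assumed.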
Why the Case Resists Closure
A normal cipher yields to one of three approaches: known-plaintext attack, frequency analysis cross-checked against suspected language, or recovery of the encoder’s notes. The Voynich resists all three. There is no Rosetta-style bilingual companion. Frequency analysis runs into the verbose-cipher signature, which scrambles single-letter frequencies. The encoders, if encoders existed, left no surviving keys, codebooks, or correspondence describing the system.
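For contrast, the classic attack on a simple substitution pairs ciphertext symbols with plaintext letters by frequency rank. The Python sketch below assumes the analyst supplies reference_order, a most-to-least-common letter ranking for the suspected language. On a verbose cipher the same procedure breaks down, because single-glyph counts then reflect the make-up of the glyph groups rather than one plaintext letter per glyph.

```python
from collections import Counter


def rank_match_key(ciphertext, reference_order):
    """Guess a monoalphabetic key by pairing symbols by frequency rank.

    reference_order lists plaintext letters from most to least common in
    the suspected language. Equal counts are ordered by first appearance
    (Counter.most_common behavior), so ties make the guess unreliable.
    """
    counts = Counter(c for c in ciphertext if not c.isspace())
    ranked = [symbol for symbol, _ in counts.most_common()]
    return dict(zip(ranked, reference_order))


def apply_key(ciphertext, key):
    """Substitute guessed letters; unmapped symbols (e.g. spaces) pass through."""
    return "".join(key.get(c, c) for c in ciphertext)


if __name__ == "__main__":
    plain = "eeeee tttt aaa oo n"
    cipher = plain.translate(str.maketrans("etaon", "xqzvk"))
    guess = rank_match_key(cipher, "etaon")
    print(apply_key(cipher, guess))  # prints eeeee tttt aaa oo n
```

The demo succeeds only because its letter counts are distinct and match the reference ranking exactly; on real text the rank match is a first guess that the analyst then repairs by trial, which is the cross-check against a suspected language described above.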
The illustrations, which should help, mostly hinder. The herbal drawings depict no plant that has been securely identified with a known fifteenth-century species, although Edith Sherwood and others have proposed candidates. The astronomical figures show recognizable zodiac signs but not constellations specific enough to anchor a star catalogue. The bathing women in green pools have no precise iconographic parallel. The illustrations seem to gesture at content categories without carrying enough specific information to constrain the underlying text.
The case stays open because every individual line of evidence has at least one credible counter-explanation, and no reconstruction yet ties all the lines together without forcing the data. Provisional conclusions are these: the manuscript is a real fifteenth-century artifact; the script encodes structured information rather than random noise; the encoding is more sophisticated than a simple substitution cipher; and the underlying content, if it exists, has not yet been recovered. The notes column is full. The report column waits.
Frequently Asked Questions About the Voynich Manuscript Cipher
Has the Voynich Manuscript ever been deciphered?
No decipherment has been accepted by the cryptographic, linguistic, or medievalist communities. Multiple confident solutions have been published since 1921, including the Newbold anagram theory, the Cheshire proto-Romance paper, and the Kondrak Hebrew-AI hypothesis. Each was rejected on methodological grounds, usually for producing inconsistent or unreproducible translations.
How old is the Voynich Manuscript?
Radiocarbon dating performed at the University of Arizona in 2009 placed the parchment at 1404 to 1438 with 95 percent confidence. Multispectral imaging confirmed the manuscript is not a palimpsest. The ink chemistry is consistent with medieval iron-gall ink. The writing was therefore done in or after the early fifteenth century.
Who wrote the Voynich Manuscript?
The author is unknown. Paleographic analysis suggests at least two and possibly five distinct scribal hands. The earliest documented owner is the Holy Roman Emperor Rudolf II in Prague, who is said to have purchased it for 600 ducats. Earlier provenance is undocumented. The Marci letter of 1665 names Roger Bacon as a candidate, but the carbon date rules him out.
Where is the Voynich Manuscript today?
The manuscript is held at the Beinecke Rare Book and Manuscript Library at Yale University as MS 408. It has been there since 1969, when the bookseller H. P. Kraus donated it. The full manuscript is digitized and freely viewable through the Beinecke’s online catalogue.
What is “Voynichese”?
Voynichese is the conventional name for the script and language of the manuscript. It uses roughly 20 to 30 distinct glyphs depending on how variants are counted. The script reads left to right, separates words with spaces, and follows strict, regular rules about where each glyph may appear within a word. It does not match any known historical writing system.
What did Reddy and Knight discover about the text?
In a 2011 study at the University of Southern California, Sravana Reddy and Kevin Knight measured conditional character entropy in Voynichese at approximately 2 bits per character, well below the 3 to 4 of most natural languages. They argued the script behaves like an abjad, with strong positional rules, and shows statistical resemblance to Pinyin-transliterated Chinese.
Was the Voynich Manuscript a hoax?
The hoax hypothesis remains live but expensive. Gordon Rugg demonstrated in 2004 that a Cardan-grille technique could generate Voynich-like text. However, the manuscript follows Zipf’s law and shows bigram and trigram regularities that go beyond simple grille output. A hoaxer producing such structure in the early fifteenth century would have needed considerable sophistication and motive.
What languages have been proposed for the underlying plaintext?
Proposed source languages include Latin, Italian, Catalan, Hebrew, Arabic, a constructed European vernacular, and Mandarin Chinese. None of these proposals has produced a consistent, peer-accepted translation. The verbose-cipher hypothesis allows the plaintext to be a known European language scrambled by a multi-glyph substitution scheme.
How does the Voynich Manuscript differ from other historical ciphers?
Most undeciphered historical ciphers are short fragments. The Voynich offers more than 35,000 word tokens, enough text for robust statistical analysis. Most ciphered messages from the same era have known senders and contexts; the Voynich has neither. The combination of length, anonymity, and statistical regularity makes it unique in the history of cryptography.
Why has artificial intelligence not solved the manuscript?
Modern language models classify text by matching patterns in modern languages. The Voynich predates printed text by decades and may use a cipher rather than a natural language. AI tools can identify statistical anomalies in the manuscript but cannot reliably output translations without an external check. Lisa Fagin Davis has noted that algorithm-based “solutions” so far have produced unreproducible results.
Could the Voynich Manuscript be solved in the future?
A solution remains possible. The most likely paths are the discovery of a related document elsewhere in European archives, the recovery of an encoder’s notebook describing the cipher, or a verbose-substitution attack that produces independently verifiable plaintext. Each path requires either new documents or a methodologically airtight cipher analysis. Neither has emerged in more than a century of effort since 1912.
Sources
[1] Beinecke Rare Book and Manuscript Library, Yale University. Voynich Manuscript catalogue entry, MS 408.
[2] Zandbergen, Rene. The Voynich Manuscript reference site, provenance documentation. voynich.nu provenance section.
[3] Davis, Lisa Fagin. Manuscript Road Trip and Medieval Academy commentary on Voynich scribal analysis. Manuscript Road Trip blog.
[4] Hodgins, Greg. University of Arizona Accelerator Mass Spectrometry results, 2009. Reported in Stolte, Daniel. University of Arizona news release on the Voynich carbon date.
[5] McCrone Associates. Ink and pigment analysis of the Voynich Manuscript, summarized in Tucker, Arthur and Jules Janick (2018), and at the Beinecke conservation reports.
[6] D’Imperio, Mary. The Voynich Manuscript: An Elegant Enigma. National Security Agency, 1978. Released through the NSA historical publications archive.
[7] Currier, Prescott. Papers on the Voynich Manuscript, 1976. Distributed through the Beinecke and Voynich research community.
[8] Landini, Gabriel. “Evidence of linguistic structure in the Voynich manuscript using spectral analysis.” Cryptologia 25 (4), 2001.
[9] Reddy, Sravana, and Kevin Knight. “What We Know About the Voynich Manuscript.” Proceedings of the 5th ACL-HLT Workshop, 2011.
[10] Manly, John Matthews. “Roger Bacon and the Voynich MS.” Speculum 6 (3), 1931.
[11] Cheshire, Gerard. “The Language and Writing System of MS408 (Voynich) Explained.” Romance Studies, 2019. (Subsequently disputed and partially retracted from publisher promotion.)
[12] Hauer, Bradley, and Greg Kondrak. “Decoding Anagrammed Texts Written in an Unknown Language and Script.” Transactions of the Association for Computational Linguistics, 2016.
[13] Pelling, Nick. The Curse of the Voynich. Compelling Press, 2006.
[14] Fisk, Donald. “The Naibbe cipher: a substitution cipher that encrypts Latin and Italian as Voynich Manuscript-like ciphertext.” Cryptologia, 2025.
[15] Rugg, Gordon. “An elegant hoax? A possible solution to the Voynich Manuscript.” Cryptologia 28 (1), 2004.