https://github.com/chalcolith/enochian
Tools for decipherment.
- Host: GitHub
- URL: https://github.com/chalcolith/enochian
- Owner: chalcolith
- License: mit
- Created: 2016-12-25T05:45:55.000Z (over 9 years ago)
- Default Branch: main
- Last Pushed: 2023-07-06T01:06:07.000Z (almost 3 years ago)
- Last Synced: 2025-10-03T20:49:52.760Z (9 months ago)
- Language: Roff
- Size: 5.08 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Enochian
This project provides some tools to do exploratory phonological comparisons
between texts in unknown languages and entries in one or more lexicons.
You may see the results of a recent test run of the software for the Voynich Manuscript
[here](http://chalcolith.github.io/enochian).
## Introduction
The initial goal is to investigate whether a particular theory of a possible
phonological interpretation of the script in the Voynich manuscript can be used
to find possible lexical matches in various machine-readable lexicons.
[Stephen Bax](https://stephenbax.net/?page_id=11) in 2014 proposed some
phonological values for various Voynich characters, based on identifications of
plant and star names in some of the illustrated pages. [Derek
Vogt](https://www.youtube.com/channel/UC-sW5dOlDxxu0EgdNn2pMaQ/videos) has
elaborated on this work and proposed a more extensive phonological scheme. In
addition, he has analyzed the phonological inventory of the scheme and proposed
that the language of the Voynich manuscript is based on some variety of Romani.
At present, the Enochian software tool can take arbitrary lines from the
[Reed-Landini-Stolfi
Interlinear](http://www.ic.unicamp.br/~stolfi/voynich/98-12-28-interln16e6/)
transcription of the Voynich manuscript, encode each word as a sequence of
vectors in phonological feature space, and then search the
[RomLex](http://romani.uni-graz.at/romlex/) lexicon of Romani and the
[Shabda-Sagara Sanskrit
dictionary](http://www.sanskrit-lexicon.uni-koeln.de/scans/csldoc/dictionaries/shs.html),
using dynamic time warping to look for the closest phonological sequence
matches.
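
As a rough illustration of the matching step described above, here is a minimal Python sketch of dynamic time warping over sequences of feature vectors. It is not the project's C# implementation: the function names, the Euclidean local distance, and the two-dimensional toy vectors are all assumptions made for the example.

```python
from math import inf, sqrt


def euclidean(a, b):
    """Local distance between two feature vectors of equal length (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Classic dynamic time warping cost between two vector sequences."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] is the cheapest alignment of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the best of: skip in seq_a, skip in seq_b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy data: each "word" is a sequence of 2-dimensional vectors.
query = [(1, -1), (1, -1), (-1, 1)]
lexicon = {
    "ab": [(1, -1), (-1, 1)],
    "ba": [(-1, 1), (1, -1)],
}

# Pick the lexicon entry whose vector sequence warps onto the query most cheaply.
best_match = min(lexicon, key=lambda word: dtw_distance(query, lexicon[word]))
```

Because warping lets one segment align with several, the query above matches `"ab"` at zero cost even though it repeats its first segment. That tolerance for slightly dissimilar sequences is what makes this kind of search useful here.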
You can see a sample of this kind of flow in the
[voynich.json](https://github.com/chalcolith/enochian/blob/master/samples/voynich.json)
flow configuration. This flow reads the RomLex lexicon and the specified lines
of the Voynich transcription and produces an HTML file containing a report on
the possible phonological matches.
### Status
Current results are inconclusive. Possible matches for words meaning "sun",
"moon", "house", and "sky" appear on the first page of the Voynich manuscript,
which may suggest astrological content, but much more work
needs to be done.
You may see the results of a recent test run of the software for the Voynich Manuscript
[here](http://chalcolith.github.io/enochian/index.html).
### Roadmap
The RomLex lexicon has fewer than 30,000 entries, many of which are duplicates,
due to the lexicon containing data from multiple Romani dialects. This means it
does not provide very conclusive results on its own.
The Shabda-Sagara dictionary also has fewer than 30,000 entries.
## General Functionality
At the most general level, the Enochian library provides a system for
configuring and running "flows" of arbitrary data transformations. This is
implemented by the
[Flow](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Flow/Flow.cs)
class, which contains a
[FlowContainer](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Flow/FlowContainer.cs)
which can have a number of
[FlowStep](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Flow/FlowStep.cs)
objects (which can themselves be containers).
When you iterate over the enumerable returned by `FlowStep.GetOutputs()`, each
step will grab an output from its previous sibling and call its `Process()`
method on it, returning the resulting output. If you implement only
`FlowStep.Process()`, or if you implement `FlowStep.GetOutputs()` using `yield
return`, the flow will be evaluated lazily: it will only process as many
items as are needed to return one output.
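
The pull-based behaviour described above can be sketched in Python, where a generator's `yield` plays a role similar to C#'s `yield return`. The classes below are hypothetical analogues written for this illustration; they do not reproduce the actual `FlowStep` API.

```python
class FlowStep:
    """Hypothetical Python analogue of a flow step; not the library's real class."""

    def __init__(self, previous=None):
        self.previous = previous  # the sibling step this one pulls inputs from

    def process(self, item):
        """Transform a single input; the default passes it through unchanged."""
        return item

    def get_outputs(self):
        """Lazily pull items from the previous step and process them one at a time."""
        if self.previous is None:
            return
        for item in self.previous.get_outputs():
            yield self.process(item)


class ListSource(FlowStep):
    """Illustrative source step that counts how many items have been pulled."""

    def __init__(self, items):
        super().__init__()
        self.items = items
        self.pulled = 0

    def get_outputs(self):
        for item in self.items:
            self.pulled += 1
            yield item


class Upper(FlowStep):
    """Illustrative step that only overrides process()."""

    def process(self, item):
        return item.upper()


source = ListSource(["one", "two", "three"])
outputs = Upper(source).get_outputs()
first = next(outputs)  # pulls exactly one item through the chain
```

After the single `next()` call, `source.pulled` is 1: the remaining items are not read until a consumer asks for them.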
## Linguistic Resources
In order to do phonological analysis, the Enochian library provides a way to
specify a phonological feature set (see
[features.json](https://github.com/chalcolith/enochian/blob/master/resources/encodings/features.json)
for an example using a pretty standard set of phonological features). The
[FeatureSet](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Text/FeatureSet.cs)
class is used to load and use these feature sets.
You can also define text "encodings". These take input strings in Unicode and
produce sequences of vectors in the multi-dimensional space defined by the
phonological feature set. A single phonological segment consists of an
`N`-dimensional vector, where `N` is the number of features in your feature set.
If a particular feature has a `+` value for that segment, its corresponding
vector element will be `1`; if it has a `-` value, its vector element will be
`-1`. If the feature is unspecified, its vector element will be `0`.
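
To make the vector encoding concrete, here is a small Python sketch. It assumes a `-` value is stored as `-1`. The three features and the segment table are invented for the example: they are not linguistically complete and do not mirror the layout of `features.json` or the `FeatureSet` class.

```python
# Hypothetical feature set; the vector dimension N is len(FEATURES).
FEATURES = ["voiced", "nasal", "continuant"]

# Toy segment table: each character maps to its specified feature values.
SEGMENTS = {
    "m": {"voiced": "+", "nasal": "+", "continuant": "-"},
    "s": {"voiced": "-", "nasal": "-", "continuant": "+"},
    "a": {"voiced": "+"},  # remaining features deliberately left unspecified
}

VALUE = {"+": 1, "-": -1}


def segment_vector(spec, features=FEATURES):
    """Map a segment's feature spec to a vector: + is 1, - is -1, unspecified is 0."""
    return [VALUE.get(spec.get(name), 0) for name in features]


def encode(text):
    """Encode a string as a sequence of feature vectors, one per character."""
    return [segment_vector(SEGMENTS[ch]) for ch in text]


vectors = encode("mas")
```

Each character becomes one vector of length `len(FEATURES)`, so a word is a sequence of points in feature space that can then be compared against lexicon entries.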
## Lexicons
The system includes several lexicons:
### CMU Pronouncing Dictionary
This is used for testing the underlying assumption behind the project, that we
can find slightly dissimilar phonological sequences in a lexicon by means of
dynamic time warping. The sample flow in
[english_test.json](https://github.com/chalcolith/enochian/blob/master/samples/english_test.json)
compares a defective encoding of English text with
the CMU dictionary to produce matches for English words. Running this flow
demonstrates that the process is capable of finding many such valid matches.
### RomLex
This is a dictionary of words in various Romani dialects. The database is only
available via the web, so there is a project `RomlexScraper` that scrapes the
web interface to assemble a complete version of the lexicon.
### Shabda-Sagara
This is a 19th-century dictionary of classical Sanskrit.