https://github.com/chalcolith/enochian

Tools for decipherment.

Host: GitHub
URL: https://github.com/chalcolith/enochian
Owner: chalcolith
License: mit
Created: 2016-12-25T05:45:55.000Z (over 9 years ago)
Default Branch: main
Last Pushed: 2023-07-06T01:06:07.000Z (almost 3 years ago)
Last Synced: 2025-10-03T20:49:52.760Z (9 months ago)
Language: Roff
Size: 5.08 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Enochian

This project provides some tools to do exploratory phonological comparisons
between texts in unknown languages and entries in one or more lexicons.

You may see the results of a recent test run of the software for the Voynich Manuscript
[here](http://chalcolith.github.io/enochian).

## Introduction

The initial goal is to investigate whether a particular theory of a possible
phonological interpretation of the script in the Voynich manuscript can be used
to find possible lexical matches in various machine-readable lexicons.

[Stephen Bax](https://stephenbax.net/?page_id=11) in 2014 proposed some
phonological values for various Voynich characters, based on identifications of
plant and star names in some of the illustrated pages. [Derek
Vogt](https://www.youtube.com/channel/UC-sW5dOlDxxu0EgdNn2pMaQ/videos) has
elaborated on this work and proposed a more extensive phonological scheme. In
addition, he has analyzed the phonological inventory of the scheme and proposed
that the language of the Voynich manuscript is based on some variety of Romani.

At present, the Enochian software tool can take arbitrary lines from the
[Reed-Landini-Stolfi
Interlinear](http://www.ic.unicamp.br/~stolfi/voynich/98-12-28-interln16e6/)
transcription of the Voynich manuscript, encode each word as a sequence of
vectors in phonological feature space, and then search the
[RomLex](http://romani.uni-graz.at/romlex/) lexicon of Romani and the
[Shabda-Sagara Sanskrit
dictionary](http://www.sanskrit-lexicon.uni-koeln.de/scans/csldoc/dictionaries/shs.html),
using dynamic time warping to look for the closest phonological sequence
matches.
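The matching step described above can be sketched roughly as follows. This is a minimal Python illustration of textbook dynamic time warping over sequences of feature vectors, not the project's actual C# implementation; the function names (`euclidean`, `dtw_distance`, `closest_entries`), the Euclidean distance, and the toy two-dimensional vectors are all assumptions made for the example.

```python
from math import sqrt


def euclidean(a, b):
    """Distance between two feature vectors of equal length (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Classic dynamic time warping cost between two sequences of vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: step in seq_a only, step in seq_b only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def closest_entries(query, lexicon, k=3):
    """Rank lexicon entries (word -> vector sequence) by DTW cost to the query."""
    ranked = sorted(lexicon.items(), key=lambda item: dtw_distance(query, item[1]))
    return [word for word, _ in ranked[:k]]


# Toy data: two "segments" encoded as hypothetical 2-dimensional vectors.
A, B = (1, 0), (0, 1)
toy_lexicon = {"ab": [A, B], "ba": [B, A], "aab": [A, A, B]}
best = closest_entries([A, B], toy_lexicon, k=2)
```

Because warping lets one segment align with several, a word with a repeated segment (`"aab"`) costs the same as the exact match here, which is the kind of tolerance for slightly dissimilar sequences the search relies on.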

You can see a sample of this kind of flow in the
[voynich.json](https://github.com/chalcolith/enochian/blob/master/samples/voynich.json)
flow configuration. This flow reads the RomLex lexicon and the specified lines
of the Voynich transcription and produces an HTML file containing a report on
the possible phonological matches.

### Status

Current results are inconclusive. Possible matches for words meaning "sun",
"moon", "house", and "sky" appear on the first page of the Voynich manuscript.
These may suggest astrological content, but much more work needs to be done.

You may see the results of a recent test run of the software for the Voynich Manuscript
[here](http://chalcolith.github.io/enochian/index.html).

### Roadmap

The RomLex lexicon has fewer than 30,000 entries, many of which are duplicates,
due to the lexicon containing data from multiple Romani dialects. This means it
does not provide very conclusive results on its own.

The Shabda-Sagara dictionary also has fewer than 30,000 entries.

## General Functionality

At the most general level, the Enochian library provides a system for
configuring and running "flows" of arbitrary data transformations. This is
implemented by the
[Flow](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Flow/Flow.cs)
class, which contains a
[FlowContainer](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Flow/FlowContainer.cs)
which can have a number of
[FlowStep](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Flow/FlowStep.cs)
objects (which can themselves be containers).

When you iterate over the enumerable returned by `FlowStep.GetOutputs()`, each
step grabs an output from its previous sibling, calls its own `Process()`
method on it, and returns the resulting output. If you implement only
`FlowStep.Process()`, or if you implement `FlowStep.GetOutputs()` using `yield
return`, the flow is evaluated lazily: it will only process as many
items as are needed to return one output.
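The pull-based behaviour described above can be illustrated with Python generators, which play a role similar to C# `yield return`. This is only a sketch: the classes below (`FlowStep`, `Source`, `Upper`) and the `processed` counter are hypothetical stand-ins written for this example and do not reproduce the library's actual C# classes or signatures.

```python
class FlowStep:
    """Hypothetical Python analogue of a flow step (not the library's C# API)."""

    def __init__(self, previous=None):
        self.previous = previous  # the sibling step this one pulls from
        self.processed = 0        # counts items handled, for demonstration only

    def process(self, item):
        """Transform one input item; the default passes it through unchanged."""
        return item

    def get_outputs(self):
        """Pull items one at a time from the previous step and process them."""
        for item in self.previous.get_outputs():
            self.processed += 1
            yield self.process(item)


class Source(FlowStep):
    """First step in a flow: yields items from an in-memory list."""

    def __init__(self, items):
        super().__init__()
        self.items = items

    def get_outputs(self):
        for item in self.items:
            self.processed += 1
            yield item


class Upper(FlowStep):
    """Example transformation that only overrides process()."""

    def process(self, item):
        return item.upper()


source = Source(["qokedy", "daiin", "chedy"])
upper = Upper(previous=source)
outputs = upper.get_outputs()
first = next(outputs)  # pulls exactly one item through the whole chain
```

After the single `next()` call, both `source.processed` and `upper.processed` are 1: the remaining items are not touched until something asks for them.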

## Linguistic Resources

In order to do phonological analysis, the Enochian library provides a way to
specify a phonological feature set (see
[features.json](https://github.com/chalcolith/enochian/blob/master/resources/encodings/features.json)
for an example using a pretty standard set of phonological features). The
[FeatureSet](https://github.com/chalcolith/enochian/blob/master/source/Enochian/Text/FeatureSet.cs)
class is used to load and use these feature sets.

You can also define text "encodings". These take input strings in Unicode and
produce sequences of vectors in the multi-dimensional space defined by the
phonological feature set. A single phonological segment consists of an
`N`-dimensional vector, where `N` is the number of features in your feature set.
If a particular feature has a `+` value for that segment, its corresponding
vector element will be `1`; if it has a `-` value, its vector element will be
`-1`. If the feature is unspecified, its vector element will be `0`.
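As a rough illustration of this mapping, the sketch below turns a segment's feature specification into a vector. The four feature names, the `segment_vector` helper, and the sample specification for /t/ are assumptions chosen for the example rather than the contents of `features.json`, and the sketch assumes that a `-` value maps to `-1`.

```python
# Hypothetical, shortened feature set; a real one would have many more features.
FEATURES = ["syllabic", "consonantal", "voice", "nasal"]

# Assumed mapping: "+" becomes 1, "-" becomes -1, anything else is unspecified (0).
VALUE_MAP = {"+": 1, "-": -1}


def segment_vector(spec, features=FEATURES):
    """Build an N-dimensional vector from a dict of feature name -> '+' or '-'."""
    return [VALUE_MAP.get(spec.get(name), 0) for name in features]


def encode(segments, table, features=FEATURES):
    """Encode a sequence of segment symbols as a sequence of feature vectors."""
    return [segment_vector(table[symbol], features) for symbol in segments]


# Illustrative specifications; "nasal" is deliberately left unspecified for /t/.
table = {
    "t": {"syllabic": "-", "consonantal": "+", "voice": "-"},
    "a": {"syllabic": "+", "consonantal": "-", "voice": "+", "nasal": "-"},
}
vectors = encode(["t", "a"], table)
```

Here `N` is 4, so each segment becomes a 4-element list, and the unspecified `nasal` feature of /t/ comes out as `0`.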

## Lexicons

The system includes several lexicons:

### CMU Pronouncing Dictionary

This is used for testing the underlying assumption behind the project, that we
can find slightly dissimilar phonological sequences in a lexicon by means of
dynamic time warping. The
[english_test.json](https://github.com/chalcolith/enochian/blob/master/samples/english_test.json)
file contains a sample flow that compares a defective encoding of English text with
the CMU dictionary to produce matches for English words. Running this flow
demonstrates that the process is capable of finding many such valid matches.

### RomLex

This is a dictionary of words in various Romani dialects. The database is only
available via the web, so there is a project `RomlexScraper` that scrapes the
web interface to assemble a complete version of the lexicon.

### Shabda-Sagara

This is a 19th-century dictionary of classical Sanskrit.
