https://github.com/moarc/brilldecode
Convert Brill's Encyclopaedia of Islam (and possibly others) into proper Unicode
- Host: GitHub
- URL: https://github.com/moarc/brilldecode
- Owner: Moarc
- License: gpl-3.0
- Created: 2020-02-04T09:59:00.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2025-01-04T09:53:47.000Z (16 days ago)
- Last Synced: 2025-01-04T11:06:27.057Z (16 days ago)
- Language: Python
- Size: 51.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
## What is Brillcode?
A very apt name [given](https://jhmccloskey.tripod.com/ei/index.htm) to the encoding we're dealing with.
The Brill *Encyclopædia of Islam* CD-ROM edition (2003) uses a custom font in which regular characters are replaced with the glyphs they need for transcription.
The articles themselves are encoded in Win-1252, and the font is switched to this special font (Baskerville for Brill 02) as needed, using CSS.
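To make the mechanism concrete, here is a minimal sketch of the kind of table-driven conversion this implies: decode the bytes as Win-1252, then swap each character for the one the custom font actually draws in its place. The table entries and the function name `brill_to_unicode` are made up for illustration and are not taken from the script; the real pairs have to be read off the font itself.

```python
# Illustrative only: these pairs are invented for the example. The real table
# has to be built by comparing each Win-1252 character with the glyph that
# the "Baskerville for Brill 02" font draws for it.
EXAMPLE_TABLE = {
    "\u00e2": "\u0101",  # assumed: the slot of "â" is drawn as "ā"
    "\u00ee": "\u012b",  # assumed: the slot of "î" is drawn as "ī"
    "\u00f4": "\u1e25",  # assumed: the slot of "ô" is drawn as "ḥ"
}

# str.maketrans accepts a dict of single characters mapped to strings.
_TRANSLATION = str.maketrans(EXAMPLE_TABLE)


def brill_to_unicode(raw: bytes) -> str:
    """Decode a Win-1252 fragment set in the custom font and remap it."""
    text = raw.decode("cp1252")
    # Characters missing from the table pass through unchanged.
    return text.translate(_TRANSLATION)


if __name__ == "__main__":
    # b"\xe2" is "â" in Win-1252; under the example table it becomes "ā".
    print(brill_to_unicode(b"Isl\xe2m"))
```

Since the font is only switched on where needed, a function like this should only be fed the text that is actually styled with the special font; everything else is plain Win-1252 and just needs decoding.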
With modern stuff like webfonts the custom font could be included for machines that don't have it installed, but copy-pasting from the articles would still result in a mess, and it just doesn't feel right, so I wrote a shitty script to convert them into proper Unicode. (It recently became a bit less shitty.)

## Further goals
- [x] iterating over the entire encyclopedia
- [ ] properly handling all the links
- [ ] popup figures
- [ ] converting Greek text
- [ ] (possibly) reading directly from the CD or an image of the CD.
- [x] output to something like [slob](https://github.com/itkach/slob)

## Postscript
I'm not sure whether the Ba00/Ba01 fonts include anything other than regular Win-1252 (it didn't seem so to me: the Ö's etc. are displayed properly with other fonts). If they do, I'll create another conversion table for those too.