Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/justxuewei/mdict-analysis
https://github.com/justxuewei/mdict-analysis
Last synced: 21 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/justxuewei/mdict-analysis
- Owner: justxuewei
- Created: 2020-03-30T02:09:26.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-03-30T02:09:36.000Z (almost 5 years ago)
- Last Synced: 2024-11-09T05:33:33.347Z (3 months ago)
- Language: Python
- Size: 68.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.rst
Awesome Lists containing this project
README
An Analysis of MDX/MDD File Format
==================================MDict is a multi-platform open dictionary
which are both questionable. It is not available for every platform, e.g. OS X, Linux.
Its dictionary file format is not open. But this has not hindered its popularity,
and many dictionaries have been created for it.This is an attempt to reveal MDX/MDD file format, so that my favorite dictionaries,
created by MDict users, could be used elsewhere.MDict Files
===========
MDict stores the dictionary definitions, i.e. (key word, explanation) in MDX file and
the dictionary reference data, e.g. images, pronunciations, stylesheets in MDD file.
Although holding different contents, these two file formats share the same structure.MDX File Format
===============
.. image:: MDX.svgMDD File Format
===============
.. image:: MDD.svgExample Programs
================readmdict.py
------------
readmdict.py is an example implementation in Python. This program can read/extract mdx/mdd files... note:: python-lzo is required to read mdx files created with engine 1.2.
Get Windows version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-lzoIt can be used as a command line tool. Suppose one has oald8.mdx and oald8.mdd::
$ python readmdict.py -x oald8.mdx
This will creates *oald8.txt* dictionary file and creates a folder *data* for images, pronunciation audio files.
On Windows, one can also double click it and select the file in the popup dialog.
Or as a module::
In [1]: from readmdict import MDX, MDD
Read MDX file and print the first entry::
In [2]: mdx = MDX('oald8.mdx')
In [3]: items = mdx.items()
In [4]: items.next()
Out[4]:
('A',
'.........')
``mdx`` is an object having all info from a MDX file. ``items`` is an iterator producing 2-item tuples.
Of each tuple, the first element is the entry text and the second is the explanation. Both are UTF-8 encoded strings.Read MDD file and print the first entry::
In [5]: mdd = MDD('oald8.mdd')
In [6]: items = mdd.items()
In [7]: items = mdd.next()
Out[7]:
(u'\\pic\\accordion_concertina.jpg',
'\xff\xd8\xff\xe0\x00\x10JFIF...........')``mdd`` is an object having all info from a MDD file. ``items`` is an iterator producing 2-item tuples.
Of each tuple, the first element is the file name and the second element is the corresponding file content.
The file name is encoded in UTF-8. The file content is a plain bytes array.Acknowledge
===========
The file format gets fully disclosed by https://github.com/zhansliu/writemdict.
The encryption part is taken into this project.