https://github.com/gitbruno/docxicml
A semantical DOCX to ICML converter
https://github.com/gitbruno/docxicml
converter docx icml
Last synced: about 1 year ago
JSON representation
A semantical DOCX to ICML converter
- Host: GitHub
- URL: https://github.com/gitbruno/docxicml
- Owner: GitBruno
- License: bsd-2-clause
- Created: 2016-10-12T01:06:28.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2018-03-03T05:12:51.000Z (over 8 years ago)
- Last Synced: 2025-02-10T22:34:09.314Z (over 1 year ago)
- Topics: converter, docx, icml
- Language: XSLT
- Homepage:
- Size: 4.39 MB
- Stars: 12
- Watchers: 3
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
DOCX to ICML converter
======================
docxicml is designed to convert MS Word (DOCX) documents to Adobe InDesign (ICML). It aims to produce clean files using semantic information only.
This converter ignores all non-semantical info like font names and colours. It will however keep track of unstyled italics, bolds and page breaks. Unlike [Pandoc](https://pandoc.org/), docxicml assumes styles are applied semantically and therefore tracks all style references.
This package is standing on the shoulders of [Python-Mammoth](https://github.com/mwilliamson/python-mammoth) it generates a dynamic style map and transform the HTML to ICML using a XSLT stylesheet.
Contents
--------
1. [Usage](#usage)
2. [Supported Elements](#supported-elements)
3. [Dependencies](#dependencies)
5. [Installation](#installation)
5. [Limitations](#limitations)
6. [Getting Help](#getting-help)
Usage
-----
Convert a word document (`docx`) to `xhtml` and `icml` with the following command:
docxicml source.docx
The newly generated files will be at the same location as source document:
source.docx
source.xhtml
source.icml
Supported Elements
------------------
The following elements are supported:
* Paragraph Styles
* Character Styles
* Bold and italic
* Strikethrough and Underlines
* Superscript and Subscript
* Headings
* Ordered and Unordered Lists
* Tables (Including headers and footers)
* Footnotes and endnotes (Yet to be implemented)
* Line, Column and Page Breaks
* Hyperlinks (Yet to be implemented)
* Images (Only embedded EMF)
Dependencies
------------
docxicml requires [Java 6](https://www.java.com/en/download/) or later. (It uses [SaxonHE](https://sourceforge.net/projects/saxon/files/Saxon-HE/) for [XSLT 2.0](https://www.w3.org/TR/xslt20/) transformations.)
Installation
------------
make install
Limitations
-----------
As it stands, there is room for improvements. We need to finalise implementation of all [elements](#supported-elements) listed above. It might be a good idea to port this to Javascript so we can run it with easy on a wide variety of systems without installing the Java runtime. Both XSLT processor and Mammoth have Javascript implementations: [mammoth.js](https://github.com/mwilliamson/mammoth.js), [Saxon-JS](www.saxonica.com/saxon-js). It would be usefull to be able to round-trip the files.
Getting Help
------------
Bugs and feature requests are tracked with [GitHub Issue Tracker](http://github.com/gitbruno/docxicml/issues).
[](https://opensource.org/licenses/BSD-2-Clause)