https://github.com/transpect/split-docx
this module helps with splitting docx files
- Host: GitHub
- URL: https://github.com/transpect/split-docx
- Owner: transpect
- License: bsd-2-clause
- Created: 2023-10-26T11:01:42.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-05-08T10:04:06.000Z (about 1 year ago)
- Last Synced: 2025-07-13T03:44:55.443Z (12 months ago)
- Language: XProc
- Size: 14.6 KB
- Stars: 0
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# split-docx
This module helps with splitting docx files. It is just a library; to run it standalone, you need to make it part of a transpect project (see below).
* The main pipeline supports splitting on part and chapter headings. Their styles can be given as options `chapter-regex` and `part-regex`.
* Example invocation to split all docx files in a directory on paragraphs with style 'heading 1':
`calabash/calabash.sh -o result=file:///C:/…/result.xml split-docx/xpl/split-main.xpl dir=path-to-docx-dir chapter-regex="heading\s*1" debug=yes debug-dir-uri=file:///…/debug`
You can provide a customer-specific `split-docx/conf.xsl` file to change the target file names of the split chunks. For this, the pipeline must be invoked with a params input port.
The subpipelines can also be used to split on bookmarks etc.
## Setup for Standalone Invocation
You need to set up a complete transpect project:
* Create a project folder, say, split-docx-frontend, and change to it.
* `git clone https://github.com/transpect/calabash-frontend calabash --recurse-submodules`
* `git clone https://github.com/transpect/cascade`
* `git clone https://github.com/transpect/docx2hub`
* `git clone https://github.com/transpect/docx_modify-lib`
* `git clone https://github.com/transpect/split-docx`
* `git clone https://github.com/transpect/xproc-util`
* `git clone https://github.com/transpect/xslt-util`
* Create a directory `xmlcatalog` and, inside it, an XML catalog file called `catalog.xml` with this content:
```xml
```
*
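The clone steps above can be collected into a small script. The following is a minimal sketch and not part of the repository: the function names `run` and `setup_split_docx` and the `DRY_RUN` variable are assumptions made for illustration. It assumes it is started from inside the project folder, only prints the commands by default, and leaves writing `xmlcatalog/catalog.xml` to you.

```shell
#!/bin/sh
# Sketch: collect the clone steps listed above into one function.
# DRY_RUN (hypothetical variable) defaults to 1, so the commands are only
# printed. Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
BASE="https://github.com/transpect"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_split_docx() {
  # The calabash frontend is cloned into ./calabash, including submodules.
  run git clone "$BASE/calabash-frontend" calabash --recurse-submodules
  # The remaining modules are cloned under their own names.
  for repo in cascade docx2hub docx_modify-lib split-docx xproc-util xslt-util; do
    run git clone "$BASE/$repo"
  done
  # Directory for the XML catalog; catalog.xml itself is not generated here.
  run mkdir -p xmlcatalog
}

setup_split_docx
```

Running it with `DRY_RUN=0` from an empty project folder performs the clones; keeping the default lets you review the commands first.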