Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arxiv/zzzarchived_arxiv-pandoc
Pandoc-based executable. Currently extracts text from documents
https://github.com/arxiv/zzzarchived_arxiv-pandoc
haskell nix
Last synced: about 4 hours ago
JSON representation
Pandoc-based executable. Currently extracts text from documents
- Host: GitHub
- URL: https://github.com/arxiv/zzzarchived_arxiv-pandoc
- Owner: arXiv
- License: mit
- Archived: true
- Created: 2020-06-16T21:35:08.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-09-17T21:52:32.000Z (about 4 years ago)
- Last Synced: 2024-11-06T17:12:53.926Z (about 4 hours ago)
- Topics: haskell, nix
- Language: Haskell
- Homepage:
- Size: 20.5 KB
- Stars: 4
- Watchers: 4
- Forks: 3
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- Changelog: ChangeLog.md
- License: LICENSE
Awesome Lists containing this project
README
# arxiv-pandoc
[Pandoc](https://pandoc.org/)-based executable for converting
LaTeX-based arXiv documents to text.## Usage
`arxiv-pandoc-static` emits the extracted text to `stdout`.
If given no arguments, `arxiv-pandoc-static` assumes it is in a directory with
the extracted source files.```
arxiv-pandoc-static > out.txt
```However, the path to the extracted directory can also be given, e.g.
```
arxiv-pandoc-static /path/to/some_arXiv_doc > out.txt
```### Export LANG
If you encounter an error along the lines of
`: commitAndReleaseBuffer: invalid argument (invalid character)`,
you [may need to](https://github.com/simonmichael/hledger/blob/0751536d255cddc80513283f57f6c8f9f85f85d5/hledger/hledger.info#L914-L919)
set `LANG=C.UTF-8` before running, e.g.:```
LANG=C.UTF-8 /path/to/arxiv-pandoc-static > out.txt
```or just `export LANG=C.UTF-8` once.
## Building
### Docker
See the *Packaging* section below on how to get the appropriate docker
container running. Once int he container:```
hpack
cabal build
```### Nix
First, load the necessary build tools:
```
$ nix-shell -p haskellPackages.hpack haskellPackages.Cabal cabal2nix
```Then, build the project
```
$ hpack
$ cabal2nix --shell . > package-shell.nix
$ nix-shell package-shell.nix
$ cabal build arxiv-pandoc-exe
```
We currently omit doing a full `cabal build` in Nix due to the
static build having problems there.## Installing
E.g.
```
cabal install --overwrite-policy=always --installdir=$HOME/.local/bin
```## Packaging
### Docker
Build the image if necessary:
```
docker build -t arxiv-pandoc .
```(Set `DHUB_PREFIX` as below to use the locally built docker image.)
```
DHUB_PREFIX="" ./run-docker-env.sh bash
cabal clean
cabal install --install-method=copy --enable-executable-static --overwrite-policy=always --installdir=./bin
````/bin` should now contain two executables, one statically linked,
the other dynamically linked.To avoid permission issues, run `cabal clean` again from the Docker
environment after the build has succeeded.### packaging with nix-bundle (WIP)
In addition to the build instructions:
Add `nix-bundle` to the initial list of packages to load.
```
cabal2nix . > package.nix
```Test the build of the package if desired using `nix-build` directly:
```
nix-build -E "with import {}; pkgs.haskell.packages.ghc865.callPackage ./package.nix {}"
```You should be able to see where the binary is located in the filesystem
now, e.g.:```
/nix/store/2sk6g3qgd7zmii11jl7gaw09c9y17cmv-arxiv-pandoc-0.1.0.0/
```The next step [currently fails](https://github.com/matthewbauer/nix-bundle/issues/55 ):
```
$ nix-bundle '(pkgs.haskell.packages.ghc865.callPackage ./package.nix {})' /bin/arxiv-pandoc-exe
```### Packaging with static-haskell-nix (WIP)
I haven't tried [static-haskell-nix](https://github.com/nh2/static-haskell-nix) yet, partly as it looks like it
would require manual editing of the generated `package.nix` (though
at some point such support might be integrated).