https://github.com/ancatmara/text-reuse-test
Testing Tracer and Passim on medieval Irish & Welsh law texts
- Host: GitHub
- URL: https://github.com/ancatmara/text-reuse-test
- Owner: ancatmara
- License: gpl-3.0
- Created: 2021-03-16T23:54:35.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-08T21:07:39.000Z (about 3 years ago)
- Last Synced: 2024-01-24T12:06:00.855Z (10 months ago)
- Topics: early-irish, irish, natural-language-processing, nlp, passim, preprocessing, text-reuse, tracer, tutorial, welsh
- Language: JavaScript
- Homepage:
- Size: 24 MB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## Tracer
### System requirements
* Any OS (Windows/OS X/Linux)
* Java 8 — not higher!
* Apache Ant (a build tool)

**NB!** Compile Tracer with the plain `ant` command only. Any flags (such as `verbose`) will later result in an *“Unable to access jarfile”* error.
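
Since the Java version matters here, it can help to check it before building. The following is a minimal Python sketch, not part of Tracer: the function names are hypothetical, and it assumes that `java -version` prints a line containing `version "..."` (to stderr), as common JDK builds do.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major, minor = match.group(1), match.group(2)
    # Java 8 and older report themselves as "1.8.0_292"; newer ones as "11.0.2".
    if major == "1" and minor is not None:
        return int(minor)
    return int(major)


def installed_java_major():
    """Return the major version of the `java` on PATH, or None if unavailable."""
    try:
        # `java -version` writes its report to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    print("Java major version:", major)
    if major != 8:
        print("This README asks for Java 8; switch versions before running ant.")
```

If the script reports anything other than 8, point `JAVA_HOME` (or your version manager) at a Java 8 installation before compiling.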
### Download & compilation
`git clone http://vcs.etrap.eu/tracer-framework/tracer.git`
`cd tracer`
`ant`
### Running
`java -Xmx600m -Deu.etrap.medusa.config.ClassConfig=conf/tracer_config.xml -jar tracer.jar > FILENAME.OUT`
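
Tracer reads its corpus from one `.txt` file, so several source texts usually have to be merged first. The sketch below shows one way to do that in Python. It assumes the four-column, tab-separated layout (ID, text segment, date, source) described on the manual's data-format page; verify the layout there before relying on it. The function names, the ID numbering scheme and the `NULL` date placeholder are illustrative assumptions, not something Tracer prescribes in this README.

```python
from pathlib import Path


def to_tracer_rows(segments, source, first_id=1000001, date="NULL"):
    """Turn text segments into tab-separated rows: ID, text, date, source."""
    rows = []
    for segment in segments:
        # Tabs and line breaks inside a segment would break the column layout.
        clean = " ".join(segment.split())
        if clean:  # skip segments that are empty after cleaning
            rows.append(f"{first_id + len(rows)}\t{clean}\t{date}\t{source}")
    return rows


def build_corpus(texts):
    """texts: mapping of source name -> list of segments. Returns all rows."""
    lines = []
    for number, (source, segments) in enumerate(texts.items(), start=1):
        # Assumed ID scheme: every source gets its own block of one million IDs.
        first_id = number * 1_000_000 + 1
        lines.extend(to_tracer_rows(segments, source, first_id=first_id))
    return lines


def write_corpus(texts, out_path):
    """Write all sources into the single file that Tracer will read."""
    lines = build_corpus(texts)
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

A call such as `write_corpus({"text_a": [...], "text_b": [...]}, "corpus.txt")` (hypothetical names) produces one file that can then be placed in the corpus folder.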
* Your data must be a single `.txt` file in `tracer/data/corpora/*some-folder*`
* Check and update `tracer/conf/tracer_config.xml` before running
### Manual
* [Download & installation](https://gfranzini.gitbooks.io/tracer/content/manual/download-and-installation.html)
* [Troubleshooting](https://gfranzini.gitbooks.io/tracer/content/support/troubleshooting/)
* [Data format](https://gfranzini.gitbooks.io/tracer/content/manual/corpus-preparation.html)
* [Configuration](https://gfranzini.gitbooks.io/tracer/content/manual/configuration/)
* [Output](https://gfranzini.gitbooks.io/tracer/content/manual/results-and-computed-files.html)
## Passim
### Requirements
* Java 8 — not higher!
* Scala
* sbt (a build tool for JVM languages)
* Apache Spark
  * Comes with Java 11
  * Depends on Hadoop, whose native libraries are not supported on OS X, so it won’t run on a Mac

**NB!** Build Passim with `sbt` before installing Apache Spark. Spark comes with Java 11, which causes build conflicts when compiling Passim with `sbt`.
### Download & compilation
`git clone https://github.com/dasmiq/passim.git`
`cd passim`
`build/sbt package`
This should produce a runnable .jar in `target/scala_*/passim*.jar`.
### Running
`passim "{input.json,directory-of-json-files,some*.json.bz2}" output`
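
The input files named in the command hold one JSON record per line. As a minimal sketch of how such a file could be prepared in Python, the snippet below writes records with the `id`, `series` and `text` fields that Passim's documentation describes as its defaults. The function names and sample documents are hypothetical, and the grouping of documents into series is an assumption to adapt to your own corpus.

```python
import json


def passim_records(documents):
    """documents: iterable of (doc_id, series, text) tuples."""
    for doc_id, series, text in documents:
        yield {"id": doc_id, "series": series, "text": text}


def to_json_lines(documents):
    """Serialise documents as JSON lines, one record per line."""
    # ensure_ascii=False keeps accented Irish and Welsh characters readable.
    return "".join(
        json.dumps(record, ensure_ascii=False) + "\n"
        for record in passim_records(documents)
    )


if __name__ == "__main__":
    # Hypothetical sample documents; replace with your own texts.
    sample = [
        ("irish-001", "irish", "Cáin ..."),
        ("welsh-001", "welsh", "Cyfraith ..."),
    ]
    with open("input.json", "w", encoding="utf-8") as handle:
        handle.write(to_json_lines(sample))
```

The resulting `input.json` can be passed as the first argument of the `passim` command above.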
### Docs
[GitHub repo](https://github.com/dasmiq/passim)