https://github.com/slub/esmarc

marc21 -> rdf mapping tool

Host: GitHub
URL: https://github.com/slub/esmarc
Owner: slub
License: apache-2.0
Created: 2020-01-28T13:22:24.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-12-14T15:02:15.000Z (over 1 year ago)
Last Synced: 2025-01-25T07:08:54.977Z (6 months ago)
Topics: json-ld, json-ld-context, marc21, python3, rdf, rdflib
Language: Python
Size: 319 KB
Stars: 1
Watchers: 7
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Installation

run:
```
pip3 install . --user
```

# esmarc.py

esmarc is a Python3 tool that reads line-delimited MARC21 JSON from an Elasticsearch index, applies a mapping, and writes the output to a directory with one file per mapping type.

Dependencies:
- python3-elasticsearch
- efre-lod-elasticsearch-tools
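
To illustrate the "one file per mapping type" output described above, here is a minimal sketch. It does not perform any MARC21 mapping. It only shows how already-mapped, line-delimited JSON records could be split into per-type files. The field name `@type`, the `.ldj` file extension, and the function name `split_by_type` are assumptions made for this example and are not taken from esmarc itself.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_type(lines, out_dir, type_field="@type"):
    """Append each record to <out_dir>/<type>.ldj and return counts per type.

    `type_field` and the ".ldj" suffix are assumptions for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the input stream
        record = json.loads(line)
        # Records without the type field end up in "unknown.ldj".
        kind = str(record.get(type_field, "unknown"))
        with open(out_dir / f"{kind}.ldj", "a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
        counts[kind] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        '{"@type": "persons", "name": "A"}',
        '{"@type": "works", "name": "B"}',
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(split_by_type(sample, tmp))
```

Opening the file once per record keeps the sketch short. A real tool would more likely keep one open handle per type.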

run:

```
$ esmarc.py
-h, --help show this help message and exit
-host HOST hostname or IP-Address of the ElasticSearch-node to use. If None we try to read ldj from stdin.
-port PORT Port of the ElasticSearch-node to use, default is 9200.
-type TYPE ElasticSearch Type to use
-index INDEX ElasticSearch Index to use
-id ID map single document, given by id
-help print this help
-prefix PREFIX Prefix to use for output data
-debug Dump processed Records to stdout (mostly used for debug-purposes)
-server SERVER use http://host:port/index/type/id?pretty syntax. overwrites host/port/index/id/pretty.
-pretty output tabbed json
-w W how many processes to use
-idfile IDFILE path to a file with IDs to process
-query QUERY prefilter the data based on an elasticsearch-query

```
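
The `-server` option takes a single URL of the form `http://host:port/index/type/id?pretty` in place of the separate options. The following is a rough sketch of how such a URL could be split into its parts using only the standard library. The function name `parse_server_url` and the returned dictionary layout are assumptions for illustration and may differ from what esmarc does internally.

```python
from urllib.parse import parse_qs, urlparse


def parse_server_url(url, default_port=9200):
    """Split http://host:port/index/type/id?pretty into its components."""
    parsed = urlparse(url)
    parts = [part for part in parsed.path.split("/") if part]
    # Pad with None so that missing path segments stay unset.
    index, doc_type, doc_id = (parts + [None, None, None])[:3]
    query = parse_qs(parsed.query, keep_blank_values=True)
    return {
        "host": parsed.hostname,
        "port": parsed.port or default_port,
        "index": index,
        "type": doc_type,
        "id": doc_id,
        "pretty": "pretty" in query,
    }


if __name__ == "__main__":
    # "myindex", "mytype" and "42" are placeholder values.
    print(parse_server_url("http://localhost:9200/myindex/mytype/42?pretty"))
```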

# entityfacts-bot.py

entityfacts-bot.py is a Python3 program that enriches ("links") your data with additional identifiers from Entity Facts. The prerequisite is that your records have a field containing the GND identifier.

It connects to an Elasticsearch node and outputs the enriched data, which can be written back to the index using esbulk.
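
The enrichment step can be pictured roughly as follows. This is a sketch under assumptions, not the script's actual code: it assumes the GND identifier is stored as a `https://d-nb.info/gnd/...` URI in a `sameAs` field, and it takes the Entity Facts lookup as a caller-supplied function (`lookup`, hypothetical) so that no particular HTTP API is implied.

```python
GND_PREFIX = "https://d-nb.info/gnd/"


def as_list(value):
    """Normalise a missing, scalar, or list value to a new list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def find_gnd_id(record, field="sameAs"):
    """Return the first GND identifier found in `field`, or None."""
    for value in as_list(record.get(field)):
        if isinstance(value, str) and value.startswith(GND_PREFIX):
            return value[len(GND_PREFIX):]
    return None


def enrich(record, lookup, field="sameAs"):
    """Return (record, changed); `lookup(gnd_id)` yields additional URIs.

    `lookup` is a hypothetical stand-in for the Entity Facts request.
    """
    gnd_id = find_gnd_id(record, field)
    if gnd_id is None:
        return record, False
    known = as_list(record.get(field))
    new_links = [uri for uri in lookup(gnd_id) if uri not in known]
    if not new_links:
        return record, False
    enriched = dict(record)  # shallow copy, the input record stays untouched
    enriched[field] = known + new_links
    return enriched, True
```

Passing the lookup in as a parameter makes the logic easy to test with a stub function instead of a live service.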

## Usage

```
./entityfacts-bot.py
-h, --help show this help message and exit
-host HOST hostname or IP-Address of the ElasticSearch-node to use, default is localhost.
-port PORT Port of the ElasticSearch-node to use, default is 9200.
-index INDEX ElasticSearch Search Index to use
-type TYPE ElasticSearch Search Index Type to use
-id ID retrieve single document (optional)
-searchserver SEARCHSERVER use http://host:port/index/type/id?pretty. overwrites host/port/index/id/pretty
-stdin get data from stdin
-pipeline output every record (even if not enriched) to put this script into a pipeline

```
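
The effect of `-pipeline` can be sketched as a filter over line-delimited JSON: without the flag only enriched records are emitted, with it every record passes through. The names `process` and `demo_enrich` below are hypothetical, and the toy enrichment merely stands in for the real lookup.

```python
import json


def process(lines, enrich, pipeline=False):
    """Yield JSON lines; unenriched records are kept only if pipeline=True."""
    for line in lines:
        if not line.strip():
            continue  # ignore empty input lines
        record = json.loads(line)
        enriched, changed = enrich(record)
        if changed or pipeline:
            yield json.dumps(enriched)


def demo_enrich(record):
    """Toy enrichment: flag records that carry a 'gnd' key (assumption)."""
    if "gnd" in record:
        return {**record, "linked": True}, True
    return record, False


if __name__ == "__main__":
    data = ['{"gnd": "1"}', '{"name": "x"}']
    for out in process(data, demo_enrich, pipeline=True):
        print(out)
```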

## Requirements

python3-elasticsearch

e.g. on Ubuntu:
```
sudo apt-get install python3-elasticsearch
```

# wikidata.py

wikidata.py is a Python3 program that enriches ("links") your data with the Wikidata identifier from Wikidata. The prerequisite is that your records have a field containing the GND identifier. Support for other identifiers is planned.

It connects to an Elasticsearch node and outputs the enriched data, which can be written back to the index using esbulk.
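
One way to resolve a GND identifier to a Wikidata item is a SPARQL query on the Wikidata property P227 (GND ID). Whether wikidata.py works exactly this way is an assumption. The sketch below only builds the query string and reads item URIs from a response in the standard SPARQL JSON results format. It sends no network request, and the function names are hypothetical.

```python
import re


def build_query(gnd_id):
    """Build a SPARQL query that selects items with the given GND ID (P227)."""
    # Reject anything that is not a plain identifier to avoid query injection.
    if not re.fullmatch(r"[0-9A-Za-z-]+", gnd_id):
        raise ValueError(f"unexpected GND identifier: {gnd_id!r}")
    return 'SELECT ?item WHERE { ?item wdt:P227 "%s" . }' % gnd_id


def parse_items(result):
    """Extract item URIs from a SPARQL JSON results document."""
    bindings = result.get("results", {}).get("bindings", [])
    return [b["item"]["value"] for b in bindings if "item" in b]


if __name__ == "__main__":
    print(build_query("118540238"))
    # Illustrative response in the SPARQL JSON results shape.
    response = {
        "results": {
            "bindings": [
                {"item": {"type": "uri",
                          "value": "http://www.wikidata.org/entity/Q5879"}}
            ]
        }
    }
    print(parse_items(response))
```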

## Usage

```
./wikidata.py
-h, --help show this help message and exit
-host HOST hostname or IP-Address of the ElasticSearch-node to use, default is localhost.
-port PORT Port of the ElasticSearch-node to use, default is 9200.
-index INDEX ElasticSearch Search Index to use
-type TYPE ElasticSearch Search Index Type to use
-id ID retrieve single document (optional)
-stdin get data from stdin
-pipeline output every record (even if not enriched) to put this script into a pipeline
-server SERVER use http://host:port/index/type/id?pretty. overwrites host/port/index/id/pretty
```
