Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fair-research/bdbag
Big Data Bag Utilities
https://github.com/fair-research/bdbag
bagit checksum data-integrity file-container file-transfer hashing identifiers profile-validation pypi python research-object ro-profiles
Last synced: 29 days ago
JSON representation
Big Data Bag Utilities
- Host: GitHub
- URL: https://github.com/fair-research/bdbag
- Owner: fair-research
- License: apache-2.0
- Created: 2016-03-28T19:16:33.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-06-28T20:18:20.000Z (7 months ago)
- Last Synced: 2024-11-07T04:19:47.737Z (3 months ago)
- Topics: bagit, checksum, data-integrity, file-container, file-transfer, hashing, identifiers, profile-validation, pypi, python, research-object, ro-profiles
- Language: Python
- Homepage: https://fair-research.org
- Size: 3.05 MB
- Stars: 50
- Watchers: 13
- Forks: 22
- Open Issues: 7
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
- awesome-opendata-software - bdbag - The bdbag utilities are a collection of software programs for working with BagIt packages that conform to the BDBag and Bagit/RO profiles. (Tools / Data packaging)
README
# bdbag
[![Build Status](https://app.travis-ci.com/fair-research/bdbag.svg)](https://app.travis-ci.com/fair-research/bdbag)
[![Coverage Status](https://coveralls.io/repos/github/fair-research/bdbag/badge.svg?branch=master)](https://coveralls.io/github/fair-research/bdbag?branch=master)
[![PyPi Version](https://img.shields.io/pypi/v/bdbag.svg)](https://pypi.python.org/pypi/bdbag)
[![PyPi Downloads](https://static.pepy.tech/badge/bdbag)](https://pepy.tech/project/bdbag)
[![PyPi Wheel](https://img.shields.io/pypi/wheel/bdbag.svg)](https://pypi.python.org/pypi/bdbag)
[![Anaconda-Server Badge](https://anaconda.org/conda-forge/bdbag/badges/version.svg)](https://anaconda.org/conda-forge/bdbag)
[![Anaconda-Server Badge](https://anaconda.org/conda-forge/bdbag/badges/downloads.svg)](https://anaconda.org/conda-forge/bdbag)
[![Python Versions](https://img.shields.io/pypi/pyversions/bdbag.svg)](https://pypi.python.org/pypi/bdbag)
[![License](https://img.shields.io/pypi/l/bdbag.svg)](http://www.apache.org/licenses/LICENSE-2.0)## Big Data Bag Utilities
The `bdbag` utilities are a collection of software programs for working with
[BagIt](https://datatracker.ietf.org/doc/draft-kunze-bagit/) packages that conform to the BDBag and Bagit/RO profiles.The `bdbag` [profiles](https://github.com/fair-research/bdbag/tree/master/profiles) specify the use of the `fetch.txt`
file, require serialization, and specify what manifests must be provided with a `bdbag`.The `bdbag` utilities incorporate functions from various other Python-based `bagit` components (such as the
[Bagit-Python](https://github.com/LibraryOfCongress/bagit-python) bag creation utility and the
[Bagit-Profiles-Validator](https://github.com/ruebot/bagit-profiles)
utility) and wraps them in a single, easy to use software package with additional features.Enhanced bag support includes:
* Update-in-place functionality for existing bags.
* Automatic archiving and extraction of bags using ZIP, TAR, and TGZ formats.
* Automatic generation of file manifest entries and `fetch.txt` for remote files via configuration file.
* Automatic file retrieval based on the contents of a bag's `fetch.txt` file with multiple protocol support. Transport
handlers for `http(s)`,`ftp`,`s3`,`gs`, and `globus` are provided, along with an extensibility mechanism for adding
externally developed transports.
* Built-in `bagit-profile` validation.
* Built-in support for creation of bags with [Bagit/RO profile](https://github.com/ResearchObject/bagit-ro) compatibility.An experimental Graphical User Interface (GUI) for `bdbag` can be found [here](https://github.com/fair-research/bdbag_gui).
#### Technical Papers
["I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets"](https://zenodo.org/record/820878) explains the motivation for BDBags and the related Minid construct, provides details on design and implementation, and gives examples of use.
["Reproducible big data science: A case study in continuous FAIRness"](https://www.biorxiv.org/content/early/2018/02/27/268755) presents a data analysis use case in which BDBags and Minids are used to capture a transcription factor binding site analysis.
### Python Dependencies
* Python 2.7 is the minimum Python version required. Please note however that Python 2.7 is officially end-of-life and ongoing compatibility between Python 2.7 and `bdbag` (and its dependencies) is not officially supported and cannot be guaranteed.
* Python 3, versions 3.8 through 3.12 are the current officially supported releases.### Installation
The latest `bdbag` release is available on PyPi and can be installed using `pip`:```sh
pip install bdbag
```Note that the above command will install `bdbag` with only the minimal dependencies required to run.
If you wish to install `bdbag` with the extra fetch transport handler support provided by `boto` (for AWS S3)
and `globus` (for Globus Transfer) packages, use the following command:
```sh
pip install bdbag[boto,globus]
```### Installation from Source
You can use `pip` to install `bdbag` directly from GitHub:```sh
sudo pip install git+https://github.com/fair-research/bdbag
```
or:
```sh
pip install --user git+https://github.com/fair-research/bdbag
```You can also [download](https://github.com/fair-research/bdbag/archive/master.zip) the current `bdbag` source code from GitHub or
alternatively clone the source from GitHub if you have *git* installed:```sh
git clone https://github.com/fair-research/bdbag
```
From the root of the `bdbag` source code directory execute the following command:
```sh
sudo pip install .
```
or:
```sh
pip install --user .
```
Note that if you want to install the extra dependencies from a local source directory you would use the following command:
```sh
pip install .[boto,globus]
```### Testing
The unit tests can be run by invoking the following command from the root of the `bdbag` source code directory:
```sh
python setup.py test
```### Usage
This software can be used from the command-line environment by running the `bdbag` script. For detailed usage
instructions, see the [CLI Guide](./doc/cli.md).### Configuration
Some components of the `bdbag` software can be configured via JSON-formatted configuration files.
See the [Configuration Guide](./doc/config.md) for further details.### Application Programming Interface
It is also possible to use `bdbag` from within other Python programs via an API.
See the [API Guide](./doc/api.md) for further details.### Utilities
A CLI utility module is provided for various ancillary tasks commonly involved with authoring **bdbags**.
See the [Utility Guide](./doc/utils.md) for further details.### Change Log
The change log is located [here](CHANGELOG.md).