https://github.com/climateimpactlab/datafs
An abstraction layer for data storage systems
command-line-tool data-version-control file-management python storage
- Host: GitHub
- URL: https://github.com/climateimpactlab/datafs
- Owner: ClimateImpactLab
- License: mit
- Created: 2016-11-19T03:32:09.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2022-12-26T20:36:15.000Z (about 3 years ago)
- Last Synced: 2025-09-10T03:53:20.493Z (5 months ago)
- Topics: command-line-tool, data-version-control, file-management, python, storage
- Language: Python
- Homepage:
- Size: 920 KB
- Stars: 6
- Watchers: 5
- Forks: 2
- Open Issues: 50
Metadata Files:
- Readme: README.rst
- Contributing: CONTRIBUTING.rst
- License: LICENSE
=========================================
DataFS Data Management System
=========================================
.. image:: https://img.shields.io/pypi/v/datafs.svg
    :target: https://pypi.python.org/pypi/datafs

.. image:: https://travis-ci.org/ClimateImpactLab/DataFS.svg?branch=master
    :target: https://travis-ci.org/ClimateImpactLab/DataFS?branch=master

.. image:: https://coveralls.io/repos/github/ClimateImpactLab/DataFS/badge.svg?branch=master
    :target: https://coveralls.io/github/ClimateImpactLab/DataFS?branch=master

.. image:: https://readthedocs.org/projects/datafs/badge/?version=latest
    :target: https://datafs.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status

.. image:: https://pyup.io/repos/github/ClimateImpactLab/DataFS/shield.svg
    :target: https://pyup.io/repos/github/ClimateImpactLab/DataFS/
    :alt: Updates

.. image:: https://api.codacy.com/project/badge/Grade/5e095453424840e092e71c42b8ad8b52
    :alt: Codacy Badge
    :target: https://www.codacy.com/app/delgadom/DataFS?utm_source=github.com&utm_medium=referral&utm_content=ClimateImpactLab/DataFS&utm_campaign=badger

.. image:: https://badges.gitter.im/DataFS/Lobby.svg
    :alt: Join the chat at https://gitter.im/DataFS/Lobby
    :target: https://gitter.im/DataFS/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

DataFS is a package manager for data. It manages file versions, dependencies, and metadata for individual use or large organizations.
Configure and connect to a metadata Manager and multiple data Services using a specification file, and you'll be sharing, tracking, and using your data in seconds.
* Free software: MIT license
* Documentation: https://datafs.readthedocs.io.
Features
--------
* Explicit version and metadata management for teams
* Unified read/write interface across file systems
* Easily create out-of-the-box configuration files for users
* Track data dependencies and usage logs
* Use DataFS from Python or from the command line
* Permissions handled by managers & services, giving you control over user access
Usage
-----
First, configure an API. Don't worry, it's not too bad. Check out the quickstart to follow along.
We'll assume we already have an API object created and attached to a service called "local". Once you have this, you can start using DataFS to create and use archives.
.. code-block:: bash

    $ datafs create my_new_data_archive --description "a test archive"
    created versioned archive
    $ echo "initial file contents" > my_file.txt
    $ datafs update my_new_data_archive my_file.txt
    $ datafs cat my_new_data_archive
    initial file contents

Versions are tracked explicitly. Bump versions on write, and read old versions
if desired.
.. code-block:: bash

    $ echo "updated contents" > my_file.txt
    $ datafs update my_new_data_archive my_file.txt --bumpversion minor
    uploaded data to . version bumped 0.0.1 --> 0.1.
    $ datafs cat my_new_data_archive
    updated contents
    $ datafs cat my_new_data_archive --version 0.0.1
    initial file contents

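To make the bump semantics concrete, here is a minimal sketch of how a ``major``/``minor``/``patch`` bump could be computed. The ``bump_version`` helper is hypothetical and is not DataFS's own implementation; dropping a zero patch component when printing is an assumption made only to match the ``0.0.1 --> 0.1`` message above.

```python
def bump_version(version, part):
    """Return `version` with the given part bumped ('major', 'minor' or 'patch')."""
    numbers = [int(piece) for piece in version.split('.')]
    # Pad short strings such as '0.1' to three components.
    numbers += [0] * (3 - len(numbers))
    major, minor, patch = numbers[:3]

    if part == 'major':
        major, minor, patch = major + 1, 0, 0
    elif part == 'minor':
        minor, patch = minor + 1, 0
    elif part == 'patch':
        patch += 1
    else:
        raise ValueError('unknown version part: {!r}'.format(part))

    # Assumption: a zero patch component is dropped when printing,
    # which matches the "0.0.1 --> 0.1" message shown above.
    if patch == 0:
        return '{}.{}'.format(major, minor)
    return '{}.{}.{}'.format(major, minor, patch)


print(bump_version('0.0.1', 'minor'))  # 0.1
print(bump_version('0.1', 'patch'))    # 0.1.1
```

Bumping a part resets every part to its right, which is why a minor bump of ``0.0.1`` clears the patch number.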
Pin versions using a requirements file to set the default version:
.. code-block:: bash

    $ echo "my_new_data_archive==0.0.1" > requirements_data.txt
    $ datafs cat my_new_data_archive
    initial file contents

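As a rough illustration of what pinning means, the sketch below reads ``name==version`` lines and picks a default version for an archive. Both functions are hypothetical and are not DataFS's parser; support for ``#`` comments and the fallback to ``'latest'`` for unpinned archives are assumptions.

```python
def parse_data_requirements(text):
    """Map archive names to pinned versions from requirements-style text."""
    pins = {}
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment, as in pip requirements files.
        line = raw_line.split('#', 1)[0].strip()
        if not line:
            continue
        if '==' in line:
            name, version = line.split('==', 1)
            pins[name.strip()] = version.strip()
        else:
            # An archive listed without a version is left unpinned.
            pins[line] = None
    return pins


def default_version(pins, archive_name):
    """Return the pinned version, or 'latest' if the archive is not pinned."""
    return pins.get(archive_name) or 'latest'


pins = parse_data_requirements('my_new_data_archive==0.0.1\n')
print(default_version(pins, 'my_new_data_archive'))  # 0.0.1
print(default_version(pins, 'some_other_archive'))   # latest
```

An explicit ``--version`` on the command line would still override the pinned default in this model.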
All of these features are available from (and faster in) Python:
.. code-block:: python

    >>> import datafs
    >>> api = datafs.get_api()
    >>> archive = api.get_archive('my_new_data_archive')
    >>> with archive.open('r', version='latest') as f:
    ...     print(f.read())
    ...
    updated contents

If you have permission to delete archives, it's easy to do. See the administrative tools documentation for tips on setting permissions.
.. code-block:: bash

    $ datafs delete my_new_data_archive
    deleted archive

See the examples for more extensive use cases.
Installation
------------
``pip install datafs``
Additionally, you'll need a manager and services:
Managers:
* MongoDB: ``pip install pymongo``
* DynamoDB: ``pip install boto3``
Services:
* Ready out-of-the-box:
- local
- shared
- mounted
- zip
- ftp
- http/https
- in-memory
* Requiring additional packages:
- AWS/S3: ``pip install boto``
- SFTP: ``pip install paramiko``
- XMLRPC: ``pip install xmlrpclib``
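Before configuring a manager or service, it can help to check which of the optional packages are already importable. The snippet below is a small sketch, not part of DataFS: ``OPTIONAL_PACKAGES`` and ``find_missing`` are hypothetical names, and it covers only the packages listed above whose import name matches the ``pip`` name.

```python
import importlib

# Optional packages named in the lists above, keyed by the feature they enable.
OPTIONAL_PACKAGES = {
    'MongoDB manager': 'pymongo',
    'DynamoDB manager': 'boto3',
    'AWS/S3 service': 'boto',
    'SFTP service': 'paramiko',
}


def find_missing(packages):
    """Return {feature: module} for every module that cannot be imported."""
    missing = {}
    for feature, module_name in packages.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing[feature] = module_name
    return missing


# Print an install hint for each feature whose package is not available.
for feature, module_name in sorted(find_missing(OPTIONAL_PACKAGES).items()):
    print('{}: pip install {}'.format(feature, module_name))
```

Only the packages for the manager and services you actually plan to use need to be installed.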
Requirements
------------
For now, DataFS requires Python 2.7. We're working on Python 3.x support.
Todo
----
See the issues on GitHub to view and add to our todos.
Credits
---------
This package was created by Justin Simcock and Michael Delgado of the Climate Impact Lab. Check us out on GitHub.
Major kudos to the folks at PyFilesystem. Thanks also to audreyr for the wonderful cookiecutter package, and to Pyup, a constant source of inspiration and our silent third contributor.