https://github.com/datajoint/djarchive-client

DataJoint Archive Client
https://github.com/datajoint/djarchive-client

Last synced: about 2 months ago
JSON representation

DataJoint Archive Client

Host: GitHub
URL: https://github.com/datajoint/djarchive-client
Owner: datajoint
Created: 2021-04-09T16:47:03.000Z (about 4 years ago)
Default Branch: master
Last Pushed: 2021-12-20T20:21:08.000Z (over 3 years ago)
Last Synced: 2025-04-05T19:51:13.470Z (3 months ago)
Language: Python
Size: 2.48 MB
Stars: 0
Watchers: 10
Forks: 8
Open Issues: 2
Metadata Files:
- Readme: README
- License: COPYING

Awesome Lists containing this project

README

        .. -*- mode: rst -*-

================

djarchive-client

================

This is 'djarchive-client', a library for interfacing with DataJoint Neuro's

data publication service.

Status: WIP; currently for internal/prototyping use.

Installation & Setup

====================

Installation & Setup instructions are as follows:

  1) Install the package:

     ``$ pip install .``

  2) If applicable, configure datajoint with appropriate dj.config['custom'] 

     values:

      Admin usage expects dj.config['custom'] values for:

        - djarchive.access_key

        - djarchive.secret_key

      Client and admin usage allow overriding dj.config['custom']

      defaults for:

        - djarchive.bucket

        - djarchive.endpoint

      This should only be needed for development or custom server

      configurations.

Usage via 'djarchive' utility script

====================================

The 'djarchive' utility script can be used for CLI interactions with the

data archive. Usage synopsys is described in the following subsections.

Browsing & Retrieving Datasets

------------------------------

The following example shows a minimal usage synopsis to navigate the available

datasets and to download a single dataset.

.. code-block:: sh

  $ djarchive datasets # list datasets

  some-set

  $ djarchive revisions # list all datasets & revisions

  some-set,000

  other-set,001

  $ djarchive revisions some-set # list revisions for dataset 'some-set'

  some-set,000

  $ djarchive download some-set 000 ./some-set-000 # download some-set rev. 000

File Integrity, Download Caching & Error Handling

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The djarchive client provides integrity protection in the form of a

file manifest, called 'djarchive-manifest.csv', which is a CSV list

of file sizes, sha1 hashes, and file subpaths for a given dataset.

By default, files having path components beginning with '.' are not

considered for inclusion within archives. This behavior can be overridden

by setting the dj.config.filename_filter property to a regular expression

containing the desired filtering pattern, or the special string "''",

which is treated as matching the empty string ('^$') and so should never

match a file.

Retrieval of datasets begins with the download of this manifest, which

is then used to direct the remaining download and to verify download

actions complete successfullly.

To allow for resuming of partially downloaded datasets, local files in the

target directory of the download are compared against the manifest. If a

local file matches the manifest contents, it is not re-retrieved, and if

a local file does not match the manifest contents, it is re-retrieved

without confirmation.

Uploading Datasets

------------------

Datasets can be uploaded using the djarchive client using the 'upload'

functionality. In short:

.. code-block:: sh

  $ djarchive upload my-dataset 1 /path/to/my-dataset

If a djarchive manifest is not present in /path/to/my-dataset, it will

be generated as part of the upload process, with the manifest uploaded

as the last step to indicate that the dataset is complete.

If a djarchive manifest is present in /path/to/my-dataset, files will be

checked against the manifest as part of the upload process, and an error

will be signalled if files exist in the dataset folder which are not

tracked in the manifest, or if files do not match the expected manifest

values.

The djarchive manifest file can also be generated as a separate step via

the 'manifest' command:

.. code-block:: sh

  $ djarchive manifest /path/to/my-dataset

Usage via 'djarchive_client' python pacakge

===========================================

The djarchive_client python package contains the logic for interacting

with the archive. Best reference is the code/docstrings; minimal usage

example is as follows:

.. code-block:: python

  >>> from djarchive_client import client

  >>> c = client()

  >>> c.datasets()

  

  >>> list(c.datasets())

  ['some-set']

  >>> c.revisions()

  

  >>> list(c.revisions())

  [('some-set', '000'), ('other-set', '001')]

  >>> list(c.revisions('some-set'))

  [('some-set', '000')]

  >>> c.download('some-set', '000', './some-set-000', create_target=True)

Logging & Output

================

Functions in the djarchive_client library and the djarchive utility

currently do not produce output in the normal case.  In some cases,

such as for viewing per-file download status messages, more detailed

output may be desired.

In the script case, setting ``dj.config['loglevel']`` or the environment

variable ``DJARCHIVE_LOGLEVEl`` to ``DEBUG`` will increase logging

output. Additionally, output can be logged to a file by setting

``dj.config['custom']['logfile']`` to a string containing the path to a

desired logfile.

For library usage, setting the python logging for the ``djarchive_client``

module to ``logging.DEBUG`` will enable more verbose output. See the function

``logsetup`` in ``scripts/djarchive`` for more details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/datajoint/djarchive-client

Awesome Lists containing this project

README