https://github.com/neurodata/boss-export
Export data from BOSS in bulk from within the BOSS AWS environment
- Host: GitHub
- URL: https://github.com/neurodata/boss-export
- Owner: neurodata
- License: apache-2.0
- Created: 2019-05-21T20:32:08.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2019-12-11T19:20:01.000Z (over 6 years ago)
- Last Synced: 2025-01-07T23:27:03.862Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 463 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# boss-export
Export data from the BOSS S3 bucket in bulk, directly to another bucket.
This repo contains the tools needed to convert a BOSS dataset to Neuroglancer precomputed format without going through the BOSS endpoint, by reading the cuboids directly from S3.
It works by publishing messages (cuboid metadata) to an SQS queue. A Lambda function processes those messages, converting the cuboids to Neuroglancer precomputed format and compressing them.
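The two sides of the queue can be sketched as follows. This is a minimal illustration, not the repo's actual code: the function names, the message fields, and the `process` callback are hypothetical, and `sqs_client` is assumed to behave like `boto3.client("sqs")`.

```python
import json
from itertools import islice

SQS_MAX_BATCH = 10  # SQS accepts at most 10 entries per SendMessageBatch call


def chunked(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def publish_cuboids(sqs_client, queue_url, cuboids):
    """Send one message per cuboid metadata dict, in batches of 10.

    Returns the number of entries the service reported as failed.
    """
    failed = 0
    for batch in chunked(cuboids, SQS_MAX_BATCH):
        entries = [
            {"Id": str(i), "MessageBody": json.dumps(cuboid)}
            for i, cuboid in enumerate(batch)
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed += len(response.get("Failed", []))
    return failed


def handler(event, context, process=None):
    """Lambda entry point for an SQS trigger: one record per queued message.

    `process` stands in for the convert/compress/upload step (hypothetical).
    """
    process = process or (lambda cuboid: None)
    records = event.get("Records", [])
    for record in records:
        process(json.loads(record["body"]))
    return len(records)
```

Passing the client and the processing step in as arguments keeps both functions easy to exercise locally with stand-ins before deploying.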
You must have read access to the S3 bucket in the BOSS. Additionally, to compute the S3 keys, you need the database IDs of the collections/experiments/channels in the BOSS that you wish to convert, as those IDs are used to determine the S3 key names.
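To illustrate why the database IDs are required, here is a sketch of deriving a key from them. The layout used (an MD5 prefix followed by `&`-joined fields) and every parameter name are assumptions made for illustration only; check the BOSS source for the authoritative key format before relying on it.

```python
import hashlib


def cuboid_key(coll_id, exp_id, chan_id, resolution, time_sample, cuboid_index):
    """Build an S3 key from numeric database IDs (ASSUMED layout, see above)."""
    fields = (coll_id, exp_id, chan_id, resolution, time_sample, cuboid_index)
    base = "&".join(str(field) for field in fields)
    # A hash prefix spreads keys across the bucket's key space.
    digest = hashlib.md5(base.encode("utf-8")).hexdigest()
    return f"{digest}&{base}"
```

Whatever the real layout is, the point stands: the key embeds the numeric IDs rather than the human-readable names, so the names alone are not enough to locate a cuboid.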
## Deployment notes
### Lambda
In addition to the basic Lambda execution environment permissions, the Lambda role also needs:
- `s3:GetObject` on the BOSS bucket
- `s3:PutObject` on the destination bucket
- `sqs:ReceiveMessage` / `sqs:DeleteMessage` on the SQS queue
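The permissions above could be expressed as an IAM policy document along these lines. This is a sketch: the bucket names and queue ARN are placeholders, and `sqs:GetQueueAttributes` is an addition not listed above, included because Lambda's SQS event source mapping normally requires it.

```python
import json


def build_policy(source_bucket, dest_bucket, queue_arn):
    """Return an IAM policy document (as a dict) for the export Lambda role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read cuboids from the BOSS bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {   # write converted chunks to the destination bucket
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{dest_bucket}/*",
            },
            {   # consume messages from the queue
                "Effect": "Allow",
                "Action": [
                    "sqs:ReceiveMessage",
                    "sqs:DeleteMessage",
                    "sqs:GetQueueAttributes",  # assumption: not in the list above
                ],
                "Resource": queue_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder names; substitute your own buckets and queue.
    policy = build_policy(
        "example-boss-bucket",
        "example-dest-bucket",
        "arn:aws:sqs:us-east-1:123456789012:example-queue",
    )
    print(json.dumps(policy, indent=2))
```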
### SQS
The queue's visibility timeout needs to be at least [6 times](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html) the Lambda function timeout.
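As a worked example of that rule, the sketch below computes the minimum visibility timeout for a given Lambda timeout and applies it to a queue. The function names are hypothetical, and `sqs_client` is assumed to behave like `boto3.client("sqs")`.

```python
LAMBDA_MAX_TIMEOUT = 900  # seconds; the maximum timeout Lambda allows


def min_visibility_timeout(lambda_timeout_seconds, factor=6):
    """Return the smallest visibility timeout (seconds) satisfying the 6x rule."""
    if not 1 <= lambda_timeout_seconds <= LAMBDA_MAX_TIMEOUT:
        raise ValueError("Lambda timeout must be between 1 and 900 seconds")
    return lambda_timeout_seconds * factor


def apply_visibility_timeout(sqs_client, queue_url, lambda_timeout_seconds):
    """Set the queue's VisibilityTimeout attribute from the Lambda timeout."""
    value = min_visibility_timeout(lambda_timeout_seconds)
    # SQS queue attribute values are passed as strings.
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"VisibilityTimeout": str(value)},
    )
    return value
```

For a Lambda timeout of 120 seconds this gives 720 seconds. Even at the 900-second Lambda maximum the result (5400 seconds) stays within the 12-hour limit SQS places on visibility timeouts.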