https://github.com/snakemake/snakemake-lsh-tutorial-data
Data for the Snakemake Google life science executor tutorial
- Host: GitHub
- URL: https://github.com/snakemake/snakemake-lsh-tutorial-data
- Owner: snakemake
- Created: 2021-03-08T15:08:16.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-08-28T08:05:48.000Z (almost 3 years ago)
- Last Synced: 2025-04-01T07:54:03.291Z (over 1 year ago)
- Language: Python
- Size: 4.69 MB
- Stars: 2
- Watchers: 2
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Example data for the Snakemake Google Life Sciences tutorial
This repository hosts the data needed for the [Snakemake Google Life Sciences tutorial](https://snakemake.readthedocs.io/en/stable/executor_tutorial/google_lifesciences.html).
This is not the repository for the normal Snakemake tutorial, which you can find [here](https://github.com/snakemake/snakemake-tutorial-data).
## Uploaders
### Google Cloud Storage
To upload the data to Google Cloud Storage, first export
`GOOGLE_APPLICATION_CREDENTIALS` so that it points to the credentials file for your project:
```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```
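Before starting an upload, it can help to confirm that the variable is visible to Python and points to a real file. The following is a minimal sketch and not part of this repository; `credentials_path` is a hypothetical helper name chosen for illustration.

```python
import os
from pathlib import Path


def credentials_path(env=None):
    """Return the file named by GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical helper for illustration. Raises RuntimeError when the
    variable is unset or does not point to an existing file.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"Credentials file not found: {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Using credentials at {credentials_path()}")
    except RuntimeError as err:
        print(f"Problem: {err}")
```

Passing a dictionary as `env` makes the check easy to try without touching the real environment.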
Then install the `google-cloud-storage` client:
```bash
pip install google-cloud-storage
```
You can then use [upload_google_storage.py](upload_google_storage.py) to upload the content of [data](data)
to a bucket of your choice, optionally under a subfolder in that bucket. The script takes two arguments:
first the bucket name plus an optional path within the bucket
(e.g., `<bucket>/<subfolder>`), then the local directory to upload.
The local path that you provide (e.g., `data/`) is removed from the storage path.
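The path handling described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code in `upload_google_storage.py`: the function name `plan_uploads` and its return shape are hypothetical, and the real script may split its arguments differently.

```python
import os


def plan_uploads(destination, local_dir):
    """Map each file under local_dir to a (local_path, bucket, blob_name) triple.

    destination is a bucket name, optionally followed by "/" and a subfolder.
    Hypothetical helper for illustration only.
    """
    # "snakemake-testing-data/" -> bucket "snakemake-testing-data", no prefix
    bucket, _, prefix = destination.strip("/").partition("/")
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # Drop the local directory prefix from the storage path.
            relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            blob_name = f"{prefix}/{relative}" if prefix else relative
            plan.append((local_path, bucket, blob_name))
    return plan


if __name__ == "__main__":
    for local_path, bucket, blob_name in plan_uploads("snakemake-testing-data/", "data"):
        print(f"{local_path} -> {bucket}/{blob_name}")
```

Computing the full plan before touching the network makes it possible to show the user every source and target pair first.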
The general form is:
```bash
python upload_google_storage.py <bucket>/<subfolder> <folder>
```
For example:
```bash
python upload_google_storage.py snakemake-testing-data/ data/
```
This uploads the contents of `data` (without the `data/` prefix) to
the root of the bucket `snakemake-testing-data`. The bucket does not need to exist beforehand,
but your Google application credentials must have permission
to create it or otherwise interact with it. Here is an example of the client
running. Note that it asks for confirmation (y/n) before proceeding with the upload:
```bash
$ python upload_google_storage.py snakemake-testing-data/ data
Attempting to get or create bucket snakemake-testing-data
Obtained bucket
The following files will be uploaded:
data/genome.fa.pac -> snakemake-testing-data/genome.fa.pac
data/genome.fa.fai -> snakemake-testing-data/genome.fa.fai
data/genome.fa.ann -> snakemake-testing-data/genome.fa.ann
data/genome.fa.bwt -> snakemake-testing-data/genome.fa.bwt
data/genome.fa.sa -> snakemake-testing-data/genome.fa.sa
data/genome.fa -> snakemake-testing-data/genome.fa
data/genome.fa.amb -> snakemake-testing-data/genome.fa.amb
data/samples/B.fastq -> snakemake-testing-data/samples/B.fastq
data/samples/A.fastq -> snakemake-testing-data/samples/A.fastq
data/samples/C.fastq -> snakemake-testing-data/samples/C.fastq
Would you like to proceed? [y]|n: y
Uploading genome.fa.pac to snakemake-testing-data
Uploading genome.fa.fai to snakemake-testing-data
Uploading genome.fa.ann to snakemake-testing-data
Uploading genome.fa.bwt to snakemake-testing-data
Uploading genome.fa.sa to snakemake-testing-data
Uploading genome.fa to snakemake-testing-data
Uploading genome.fa.amb to snakemake-testing-data
Uploading samples/B.fastq to snakemake-testing-data
Uploading samples/A.fastq to snakemake-testing-data
Uploading samples/C.fastq to snakemake-testing-data
```
You should then be able to see your files in Storage! Good job!
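For readers curious how such an upload might look with the `google-cloud-storage` client, here is a minimal sketch. It is an assumption about the general approach and not the repository's actual script: the helper names (`get_or_create_bucket`, `upload_plan`, `main`) are hypothetical. The client calls it relies on (`storage.Client`, `lookup_bucket`, `create_bucket`, `Bucket.blob`, `Blob.upload_from_filename`) are part of the library.

```python
def get_or_create_bucket(client, bucket_name):
    """Return the named bucket, creating it when it does not exist yet."""
    # lookup_bucket returns None (rather than raising) for a missing bucket.
    bucket = client.lookup_bucket(bucket_name)
    if bucket is None:
        bucket = client.create_bucket(bucket_name)
    return bucket


def upload_plan(bucket, plan, confirm=input):
    """Upload (local_path, blob_name) pairs after asking for confirmation.

    Returns the number of files uploaded (0 if the user declines).
    """
    print("The following files will be uploaded:")
    for local_path, blob_name in plan:
        print(f"{local_path} -> {bucket.name}/{blob_name}")
    # An empty answer counts as "yes", mirroring the [y]|n prompt.
    answer = confirm("Would you like to proceed? [y]|n: ").strip().lower()
    if answer not in ("", "y", "yes"):
        return 0
    for local_path, blob_name in plan:
        print(f"Uploading {blob_name} to {bucket.name}")
        bucket.blob(blob_name).upload_from_filename(local_path)
    return len(plan)


def main(bucket_name, plan):
    """Entry point; needs google-cloud-storage and valid credentials."""
    from google.cloud import storage  # imported lazily so the helpers stay testable

    client = storage.Client()
    bucket = get_or_create_bucket(client, bucket_name)
    return upload_plan(bucket, plan)
```

Calling `main("snakemake-testing-data", [("data/genome.fa", "genome.fa")])` would attempt a real upload, so it requires `GOOGLE_APPLICATION_CREDENTIALS` to be set as described earlier.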
