{"id":15017471,"url":"https://github.com/hemberg-lab/scrna.seq.datasets","last_synced_at":"2025-08-20T06:33:04.979Z","repository":{"id":89883307,"uuid":"80215611","full_name":"hemberg-lab/scRNA.seq.datasets","owner":"hemberg-lab","description":"Collection of public scRNA-Seq datasets used by our group","archived":false,"fork":false,"pushed_at":"2021-05-04T16:02:10.000Z","size":779,"stargazers_count":161,"open_issues_count":13,"forks_count":59,"subscribers_count":15,"default_branch":"master","last_synced_at":"2024-05-22T18:14:10.701Z","etag":null,"topics":["aws","dataset","docker","jenkins","mkdocs","openstack","s3-storage","single-cell"],"latest_commit_sha":null,"homepage":"https://hemberg-lab.github.io/scRNA.seq.datasets/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hemberg-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-01-27T14:48:35.000Z","updated_at":"2024-05-12T06:01:02.000Z","dependencies_parsed_at":"2023-06-16T00:00:36.095Z","dependency_job_id":null,"html_url":"https://github.com/hemberg-lab/scRNA.seq.datasets","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hemberg-lab%2FscRNA.seq.datasets","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hemberg-lab%2FscRNA.seq.datasets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hemberg-lab%2FscRNA.seq.datasets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hemberg-lab%2FscRNA.seq.datasets/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hemberg-lab","download_url":"https://codeload.github.com/hemberg-lab/scRNA.seq.datasets/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230400616,"owners_count":18219831,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","dataset","docker","jenkins","mkdocs","openstack","s3-storage","single-cell"],"created_at":"2024-09-24T19:50:31.477Z","updated_at":"2024-12-19T08:08:38.651Z","avatar_url":"https://github.com/hemberg-lab.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Public scRNA-Seq Datasets\n\nThis repository is used to build [scater](http://bioconductor.org/packages/scater/) objects and reports (in continuous integration manner) for various publicly available scRNA-Seq datasets used by our group. This pipeline is implemented using docker containers and cloud computing. The resulting website is available [here](https://hemberg-lab.github.io/scRNA.seq.datasets/). Below are some notes on the pipeline setup.\n\n## Website\n\nThe website is generated using [MkDocs](http://www.mkdocs.org/) generator. Links to S3 storage and data annotations are added manually. If you are creating a pull request and adding new data please add its annotations to one of the files in the `website` folder.\n\n## S3 storage\n\nTo list files on the S3 storage please use this [link](https://scrnaseq-public-datasets.s3.amazonaws.com/index.html).\n\n## Instance setup\n\n### AWS\n\n1. Launch Amazon Linux EC2 instance.\n2. 
Using security groups, open port 8080 on the instance.\n3. Connect to the instance and [install Jenkins](http://sanketdangi.com/post/62715793234/install-configure-jenkins-on-amazon-linux).\n4. Add permission for Jenkins to run Docker:\n```\nsudo usermod -aG docker jenkins\n```\n\nHard reboot your instance after that. Now Jenkins can run Docker images.\n\n5. Install the `s3cmd` utility to upload data to the S3 storage:\n```\nsudo apt-get install s3cmd\n```\n\n6. In Jenkins, export the S3 key ID, secret key and region as environment variables. Use the secret text option provided by Jenkins. Some details are available [here](http://serverfault.com/questions/724730/unable-to-use-aws-cli-in-jenkins-due-to-unable-to-locate-credentials-error).\n\n7. File listing can be set up on the AWS S3 bucket using this [plugin](https://github.com/rufuspollock/s3-bucket-listing).\n\n\n### OpenStack Cloud (Sanger)\n\n1. Launch an Ubuntu Trusty instance (`j1.large` flavour).\n2. Add the instance to the `default`, `cloudforms_icmp_in`, `cloudforms_ssh_in` and `cloudforms_web_in` security groups.\n3. Create an additional security group, `TCP`, with port 8080 (this is needed for Jenkins) and add your instance to this group.\n4. Associate a floating IP address (FLOATING_IP) with your instance.\n5. Log in to the instance:\n```\nssh -i ~/.ssh/your_key.pem ubuntu@FLOATING_IP\n```\n6. On the instance, [install Jenkins](https://jenkins.io/doc/book/installing/#debian-ubuntu).\n\nTo set up Jenkins after installation, go to http://FLOATING_IP:8080\n\n7. On the instance, [install Docker](https://docs.docker.com/engine/installation/linux/ubuntu/). \n\n8. Resolve Docker network issues (Sanger OpenStack problem only):\n```{bash}\nsudo bash -c \"echo '{ \\\"bip\\\": \\\"10.10.0.1/16\\\", \\\"mtu\\\": 1400 }' \u003e /etc/docker/daemon.json\"\n```\n\n9. Add permission for Jenkins to run Docker:\n```\nsudo usermod -aG docker jenkins\n```\n\nHard reboot your instance after that. Now Jenkins can run Docker images.\n\n10. 
Install the `s3cmd` utility to upload data to the S3 storage:\n```\nsudo apt-get install s3cmd\n```\n\n11. In Jenkins, export the S3 key ID, secret key and region as environment variables. Use the secret text option provided by Jenkins. Some details are available [here](http://serverfault.com/questions/724730/unable-to-use-aws-cli-in-jenkins-due-to-unable-to-locate-credentials-error).\n\n## Jenkins build\n\n```\n# build and deploy\nsh deploy.sh $WORKSPACE\n```\n\n## AWS Calculator\n\nTo estimate how much you might spend on AWS, use the [AWS Calculator](https://calculator.s3.amazonaws.com/index.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhemberg-lab%2Fscrna.seq.datasets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhemberg-lab%2Fscrna.seq.datasets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhemberg-lab%2Fscrna.seq.datasets/lists"}