{"id":18448828,"url":"https://github.com/sequana/downsampling","last_synced_at":"2025-06-30T20:04:31.177Z","repository":{"id":58261304,"uuid":"246141112","full_name":"sequana/downsampling","owner":"sequana","description":"down sample NGS data","archived":false,"fork":false,"pushed_at":"2023-12-20T12:43:58.000Z","size":147,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-30T20:03:56.565Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sequana.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-03-09T21:00:18.000Z","updated_at":"2022-08-30T21:43:02.000Z","dependencies_parsed_at":"2024-01-18T00:30:18.874Z","dependency_job_id":"6b043032-8aa3-4d59-ae60-51391f7fd6a5","html_url":"https://github.com/sequana/downsampling","commit_stats":{"total_commits":22,"total_committers":1,"mean_commits":22.0,"dds":0.0,"last_synced_commit":"9f0649405feba0dad3531d2075ca962812ded6ac"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/sequana/downsampling","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fdownsampling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fdownsampling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fdownsampling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fdownsampling/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sequana","download_url":"https://codeload.github.com/sequana/downsampling/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sequana%2Fdownsampling/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262842917,"owners_count":23373165,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T07:17:24.596Z","updated_at":"2025-06-30T20:04:31.121Z","avatar_url":"https://github.com/sequana.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\n.. image:: https://badge.fury.io/py/sequana-downsampling.svg\n     :target: https://pypi.python.org/pypi/sequana_downsampling\n\n.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg\n    :target: http://joss.theoj.org/papers/10.21105/joss.00352\n    :alt: JOSS (journal of open source software) DOI\n\n.. image:: https://github.com/sequana/downsampling/actions/workflows/main.yml/badge.svg\n   :target: https://github.com/sequana/downsampling/actions/workflows/main.yaml \n\n\nThis is the **downsampling** pipeline from the `Sequana \u003chttps://sequana.readthedocs.org\u003e`_ project.\n\n:Overview: downsample NGS data sets\n:Input: a set of FastQ or FASTA files \n:Output: a set of downsampled files\n:Status: production\n:Citation(sequana): Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352\n:Citation(pipeline): \n    .. 
image:: https://zenodo.org/badge/DOI/10.5281/zenodo.4047837.svg\n       :target: https://doi.org/10.5281/zenodo.4047837\n\n\n\nInstallation\n~~~~~~~~~~~~\n\nYou must install Sequana first::\n\n    pip install sequana\n\nThen, just install this package::\n\n    pip install sequana_downsampling\n\n\nUsage\n~~~~~\n\n::\n\n    sequana_downsampling --help\n    sequana_downsampling --input-directory DATAPATH\n    sequana_downsampling --downsampling-method random --downsampling-max-entries 100\n    sequana_downsampling --downsampling-method random_pct --downsampling-percent 10 --downsampling-input-format fasta --input-pattern \"whatever*fasta\"\n\nNote that the current implementation handles FastQ files (gzipped or not) and\nFastA files (uncompressed only).\n\n\nThis creates a directory with the pipeline and configuration file. You will then need \nto execute the pipeline::\n\n    cd downsampling\n    sh downsampling.sh  # for a local run\n\nThis launches a Snakemake pipeline. If you are familiar with Snakemake, you can \nretrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::\n\n    snakemake -s downsampling.rules --configfile config.yaml --cores 4 --stats stats.txt\n\nOr use the `sequanix \u003chttps://sequana.readthedocs.io/en/master/sequanix.html\u003e`_ interface.\n\nExample with a set of gzipped FastQ files in the current directory::\n\n    sequana_downsampling --run --downsampling-method random_pct \n    cd downsampling\n    make clean\n\nThis will create a directory called **downsampling**, and randomly select 10% of\nthe input reads for each file with extension .fastq.gz in the current directory.\nSince **--run** is used, the pipeline is executed automatically. The following\ncommands enter the directory and call the Makefile, which cleans the\ndirectory of temporary files.\n\nRequirements\n~~~~~~~~~~~~\n\nThis pipeline requires the following executable(s):\n\n- sequana\n- pigz\n\n.. .. 
image:: https://raw.githubusercontent.com/sequana/downsampling/master/sequana_pipelines/downsampling/dag.png\n\n\nDetails\n~~~~~~~~~\n\nThis pipeline runs **downsampling** in parallel on the input FastQ or FastA files (paired or not). If paired, the one-to-one mapping is conserved.\n\nIt can take as input a set of FastQ files or FastA files. By\ndefault, the pipeline randomly selects 1000 entries from each input file.\nYou can change this number using the --downsampling-max-entries option. If you\nprefer to select a percentage of the entries instead, you can change the\ndownsampling method as follows::\n\n    --downsampling-method random_pct\n\nand change the value if needed (default is 10%)::\n\n    --downsampling-percent 20\n\nNote that input FastQ files can be gzipped. Output files are gzipped. FastA input\nfiles must be uncompressed for now.\n\n\n\nRules and configuration details\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nHere is the `latest documented configuration file \u003chttps://raw.githubusercontent.com/sequana/downsampling/master/sequana_pipelines/downsampling/config.yaml\u003e`_\nto be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file. \n\n\nChangelog\n~~~~~~~~~\n\n========= ====================================================================\nVersion   Description\n========= ====================================================================\n0.8.5     * cope with R1/R2 paired data properly. 
Improved make file\n0.8.4     * add missing MANIFEST to include missing requirements.txt\n0.8.3     * comply with new API from sequana_pipetools 0.2.4\n0.8.2     * add a --run option to execute the pipeline directly\n0.8.1     * fix input and N in the random selection\n0.8.0     **First release.**\n========= ====================================================================\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsequana%2Fdownsampling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsequana%2Fdownsampling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsequana%2Fdownsampling/lists"}