{"id":13703472,"url":"https://github.com/shenwei356/easy_qsub","last_synced_at":"2026-01-29T13:14:59.252Z","repository":{"id":28375839,"uuid":"31889881","full_name":"shenwei356/easy_qsub","owner":"shenwei356","description":"Easily submitting multiple PBS jobs or running local jobs in parallel.  Multiple input files supported.","archived":false,"fork":false,"pushed_at":"2023-05-24T00:49:56.000Z","size":28,"stargazers_count":28,"open_issues_count":1,"forks_count":8,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-13T10:37:59.326Z","etag":null,"topics":["batch","cluster","cluster-files","qsub","submit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shenwei356.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2015-03-09T09:40:50.000Z","updated_at":"2024-03-13T05:58:27.000Z","dependencies_parsed_at":"2024-01-05T23:04:00.566Z","dependency_job_id":"208f1e51-5fb6-4513-865c-6b5b30073b03","html_url":"https://github.com/shenwei356/easy_qsub","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Feasy_qsub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Feasy_qsub/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Feasy_qsub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenwei356%2Feasy_qsub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shenwei356","download_url":"https://codeload.github.com/shenwei356/easy_qsub/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252458374,"owners_count":21751025,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch","cluster","cluster-files","qsub","submit"],"created_at":"2024-08-02T21:00:55.326Z","updated_at":"2026-01-29T13:14:59.217Z","avatar_url":"https://github.com/shenwei356.png","language":"Python","funding_links":[],"categories":["Data Processing"],"sub_categories":["Command Line Utilities"],"readme":"# easy_qsub\n\nEasily submit multiple PBS jobs or run local jobs in parallel. Multiple input files are supported.\n\n## Submitting PBS jobs\n\neasy_qsub submits PBS jobs using a script template, so you don't have to edit PBS scripts repeatedly.\n\nDefault template (```~/.easy_qsub/default.pbs```):\n\n```\n#PBS -S /bin/bash\n#PBS -N $name\n#PBS -q $queue\n#PBS -l ncpus=$ncpus\n#PBS -l mem=$mem\n#PBS -l walltime=$walltime\n#PBS -V\n\ncd $$PBS_O_WORKDIR\necho run on node: $$HOSTNAME \u003e\u00262\n\n$cmd\n```\nGenerated PBS scripts are saved in ```/tmp/easy_qsub-user```.\nIf a job is submitted successfully, its PBS script is moved to the current directory;\notherwise, the script is removed.\n\n## Support for multiple inputs\n\nInspired by [qtask](https://github.com/mbreese/qtask), **multiple inputs** are supported\n(see example 2). If \"{}\" appears in a command, it is replaced\nwith the current filename. 
Four formats are supported.\nFor example, for a file named \"a/b/read_1.fq.gz\":\n\nformat    |description                 |result\n:---------|:---------------------------|:---------------\n{}        |full path                   |a/b/read_1.fq.gz\n{%}       |basename                    |read_1.fq.gz\n{^.fq.gz} |remove suffix from full path|a/b/read_1\n{%^.fq.gz}|remove suffix from basename |read_1\n\n\n## Running local jobs in parallel\n\nCommands can also be run locally, in parallel with option ```-lp``` or serially with option ```-ls```.\nThis makes it **easy to switch between a cluster and a local machine**.\n\n\n## Best partner: ```cluster_files```\n\nA newer version of ```cluster_files``` is available at https://github.com/shenwei356/cluster_files\n\nTo make the best use of multiple-input support, the script ```cluster_files``` is provided to\ncluster files into multiple directories by creating symbolic links or moving files (see examples 3 and 4).\n**It's useful for programs that take a directory as input.**\n\nAnother use case is applying different jobs to the same dataset. 
A poor directory structure for this would be:\n\n    datasets/\n    ├── A\n    ├── A.stage1\n    ├── A.stage2\n    ├── B\n    ├── B.stage1\n    └── B.stage2\n\nA more flexible structure can be organized with `cluster_files`.\nRather than changing the original directory structure, using symbolic links is clearer and more flexible.\n\n    datasets\n    ├── A\n    └── B\n    datasets.stage1\n    ├── A\n    └── B\n    datasets.stage2\n    ├── A\n    └── B\n\n\n\n## Examples\n\n1) Submit a single job\n\n        easy_qsub 'ls -lh'\n\n2) Submit multiple jobs, running fastqc on many fq.gz files\n\n        easy_qsub -n 8 -m 2GB 'mkdir -p QC/{%^.fq.gz}.fastqc; zcat {} | fastqc -o QC/{%^.fq.gz}.fastqc stdin' *.fq.gz\n\n    For files read_1.fq.gz and read_2.fq.gz, the executed commands are:\n\n        mkdir -p QC/read_1.fastqc; zcat read_1.fq.gz | fastqc -o QC/read_1.fastqc stdin\n        mkdir -p QC/read_2.fastqc; zcat read_2.fq.gz | fastqc -o QC/read_2.fastqc stdin\n\n    Dry run with -vv (only print the commands, without creating scripts or submitting jobs):\n\n        easy_qsub -n 8 -m 2GB 'mkdir -p QC/{%^.fq.gz}.fastqc; zcat {} | fastqc -o QC/{%^.fq.gz}.fastqc stdin' *.fq.gz -vv\n\n3) Suppose a directory ```rawdata``` contains **paired files** as below.\n\n        $ tree rawdata\n        rawdata\n        ├── A2_1.fq.gz\n        ├── A2_1.unpaired.fq.gz\n        ├── A2_2.fq.gz\n        ├── A2_2.unpaired.fq.gz\n        ├── A3_1.fq.gz\n        ├── A3_1.unpaired.fq.gz\n        ├── A3_2.fq.gz\n        ├── A3_2.unpaired.fq.gz\n        └── README.md\n\n\n    And I have a program ```script.py``` that takes a directory as input and does something\n    with the **paired files**. 
It is invoked like this: ```script.py dirA```.\n\n    Processing the whole ```rawdata``` directory in one job is slow, because A2_\\*.fq.gz and then A3_\\*.fq.gz are handled one after another.\n    Instead, we can split the ```rawdata``` directory into multiple directories (clustering files by their prefix)\n    and submit one job per directory.\n\n        cluster_files -p '(.+?)_\\d\\.fq\\.gz$' rawdata -o rawdata.cluster\n\n        tree rawdata.cluster/\n        rawdata.cluster/\n        ├── A2\n        │   ├── A2_1.fq.gz -\u003e ../../rawdata/A2_1.fq.gz\n        │   └── A2_2.fq.gz -\u003e ../../rawdata/A2_2.fq.gz\n        └── A3\n            ├── A3_1.fq.gz -\u003e ../../rawdata/A3_1.fq.gz\n            └── A3_2.fq.gz -\u003e ../../rawdata/A3_2.fq.gz\n\n        easy_qsub 'script.py {}' rawdata.cluster/*\n\n    Another example (e.g., for assemblers that can also handle unpaired reads):\n\n        cluster_files -p '(.+?)_\\d.*\\.fq\\.gz$' rawdata -o rawdata.cluster2\n\n        tree rawdata.cluster2\n        rawdata.cluster2\n        ├── A2\n        │   ├── A2_1.fq.gz -\u003e ../../rawdata/A2_1.fq.gz\n        │   ├── A2_1.unpaired.fq.gz -\u003e ../../rawdata/A2_1.unpaired.fq.gz\n        │   ├── A2_2.fq.gz -\u003e ../../rawdata/A2_2.fq.gz\n        │   └── A2_2.unpaired.fq.gz -\u003e ../../rawdata/A2_2.unpaired.fq.gz\n        └── A3\n            ├── A3_1.fq.gz -\u003e ../../rawdata/A3_1.fq.gz\n            ├── A3_1.unpaired.fq.gz -\u003e ../../rawdata/A3_1.unpaired.fq.gz\n            ├── A3_2.fq.gz -\u003e ../../rawdata/A3_2.fq.gz\n            └── A3_2.unpaired.fq.gz -\u003e ../../rawdata/A3_2.unpaired.fq.gz\n\n\n4) Another example (complex directory structure)\n\n        tree rawdata2\n        rawdata2\n        ├── OtherDir\n        │   └── abc.fq.gz.txt\n        ├── S1\n        │   ├── A2_1.fq.gz\n        │   ├── A2_1.unpaired.fq.gz\n        │   ├── A2_2.fq.gz\n        │   ├── A2_2.unpaired.fq.gz\n        │   ├── A4_1.fq.gz\n        │   └── A4_2.fq.gz\n        └── S2\n            ├── A3_1.fq.gz\n            ├── A3_1.unpaired.fq.gz\n            ├── A3_2.fq.gz\n            └── A3_2.unpaired.fq.gz\n\n        cluster_files -p '(.+?)_\\d\\.fq\\.gz$' rawdata2/\n\n        tree rawdata2.cluster/\n        rawdata2.cluster/\n        ├── A2\n        │   ├── A2_1.fq.gz -\u003e ../../rawdata2/S1/A2_1.fq.gz\n        │   └── A2_2.fq.gz -\u003e ../../rawdata2/S1/A2_2.fq.gz\n        ├── A3\n        │   ├── A3_1.fq.gz -\u003e ../../rawdata2/S2/A3_1.fq.gz\n        │   └── A3_2.fq.gz -\u003e ../../rawdata2/S2/A3_2.fq.gz\n        └── A4\n            ├── A4_1.fq.gz -\u003e ../../rawdata2/S1/A4_1.fq.gz\n            └── A4_2.fq.gz -\u003e ../../rawdata2/S1/A4_2.fq.gz\n\n        cluster_files -p '(.+?)_\\d\\.fq\\.gz$' rawdata2/ -k -f  # -k: keep original dir structure; -f: overwrite existing output directory\n\n        tree rawdata2.cluster/\n        rawdata2.cluster/\n        ├── S1\n        │   ├── A2\n        │   │   ├── A2_1.fq.gz -\u003e ../../../rawdata2/S1/A2_1.fq.gz\n        │   │   └── A2_2.fq.gz -\u003e ../../../rawdata2/S1/A2_2.fq.gz\n        │   └── A4\n        │       ├── A4_1.fq.gz -\u003e ../../../rawdata2/S1/A4_1.fq.gz\n        │       └── A4_2.fq.gz -\u003e ../../../rawdata2/S1/A4_2.fq.gz\n        └── S2\n            └── A3\n                ├── A3_1.fq.gz -\u003e ../../../rawdata2/S2/A3_1.fq.gz\n                └── A3_2.fq.gz -\u003e ../../../rawdata2/S2/A3_2.fq.gz\n\n\n## Installation\n\n`easy_qsub` and `cluster_files` are each a single Python script that uses only the standard library.\nThey work with Python 2.7 or later, including Python 3.\n\nYou can simply save the scripts [easy_qsub](https://raw.githubusercontent.com/shenwei356/easy_qsub/master/easy_qsub)\nand [cluster_files](https://raw.githubusercontent.com/shenwei356/easy_qsub/master/cluster_files)\nto a directory in your PATH, e.g. ```/usr/local/bin```, and make them executable (```chmod +x```).\n\nAlternatively, clone the repository and copy the scripts:\n\n    git clone https://github.com/shenwei356/easy_qsub.git\n    cd easy_qsub\n    sudo cp easy_qsub cluster_files /usr/local/bin\n\n## Usage\n\neasy_qsub\n\n```\nusage: easy_qsub [-h] [-lp | -ls] [-N NAME] [-n NCPUS] [-m MEM] [-q 
QUEUE]\n                 [-w WALLTIME] [-t TEMPLATE] [-o OUTFILE] [-v]\n                 command [files [files ...]]\n\nEasily submitting PBS jobs with script template. Multiple input files\nsupported.\n\npositional arguments:\n  command               command to submit\n  files                 input files\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -lp, --local_p        run commands locally, parallelly\n  -ls, --local_s        run commands locally, serially\n  -N NAME, --name NAME  job name\n  -n NCPUS, --ncpus NCPUS\n                        cpu number [logical cpu number]\n  -m MEM, --mem MEM     memory [5gb]\n  -q QUEUE, --queue QUEUE\n                        queue [batch]\n  -w WALLTIME, --walltime WALLTIME\n                        walltime [30:00:00:00]\n  -t TEMPLATE, --template TEMPLATE\n                        script template\n  -o OUTFILE, --outfile OUTFILE\n                        output script\n  -v, --verbose         verbosely print information. -vv for just printing\n                        command not creating scripts and submitting jobs\n\nNote: if \"{}\" appears in a command, it will be replaced with the current\nfilename. More format supported: \"{%}\" for basename, \"{^suffix}\" for clipping\n\"suffix\", \"{%^suffix}\" for clipping suffix from basename. See more:\nhttps://github.com/shenwei356/easy_qsub\n```\n\ncluster_files\n\n```\nusage: cluster_files [-h] [-o OUTDIR] [-p PATTERN] [-k] [-m] [-f] indir\n\nclustering files by regular expression [V3.0]\n\npositional arguments:\n  indir                 source directory\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -o OUTDIR, --outdir OUTDIR\n                        out directory [\u003cindir\u003e.cluster]\n  -p PATTERN, --pattern PATTERN\n                        pattern (regular expression) of files in indir. if not\n                        given, it will be the longest common substring of the\n                        files. 
GROUP (parenthese) should be in the regular\n                        expression. Captured group will be the cluster name.\n                        e.g. \"(.+?)_\\d\\.fq\\.gz\"\n  -k, --keep            keep original dir structure\n  -m, --mv              moving files instead of creating symbolic links\n  -f, --force           force file overwriting, i.e. deleting existed out\n                        directory\n\n```\n\n## Copyright\n\nCopyright (c) 2015-2017, Wei Shen (shenwei356@gmail.com)\n\n[MIT License](https://github.com/shenwei356/easy_qsub/blob/master/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshenwei356%2Feasy_qsub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshenwei356%2Feasy_qsub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshenwei356%2Feasy_qsub/lists"}