{"id":22391824,"url":"https://github.com/datngu/deepfarm","last_synced_at":"2025-03-26T22:14:27.794Z","repository":{"id":269888763,"uuid":"908748667","full_name":"datngu/DeepFARM","owner":"datngu","description":"DeepFARM","archived":false,"fork":false,"pushed_at":"2025-02-15T23:10:21.000Z","size":24,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-16T00:18:20.027Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datngu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-26T21:36:54.000Z","updated_at":"2025-02-15T23:10:23.000Z","dependencies_parsed_at":"2024-12-27T03:53:16.562Z","dependency_job_id":null,"html_url":"https://github.com/datngu/DeepFARM","commit_stats":null,"previous_names":["datngu/deepfarm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datngu%2FDeepFARM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datngu%2FDeepFARM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datngu%2FDeepFARM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datngu%2FDeepFARM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datngu","download_url":"https://codeload.github.com/datngu/DeepFARM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"git
hub","repositories_count":245743436,"owners_count":20665093,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-05T04:13:59.928Z","updated_at":"2025-03-26T22:14:27.788Z","avatar_url":"https://github.com/datngu.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepFARM Pipeline\n\nThis pipeline is part of the manuscript: \"Sequence-based chromatin activity modeling and regulatory impact prediction of genetic variants in farmed animals using deep learning\". The computational pipeline leverages the power of [Nextflow](https://www.nextflow.io/) for scalable and reproducible workflows.  \n\n## Repository\nGitHub: [DeepFARM](https://github.com/datngu/DeepFARM)\n\n## Author\n- **Dat T Nguyen**  \n- Contact: ndat\u003c@\u003eutexas.edu  \n\n---\n\n## Features\n- Supports multiple learning rates and hyperparameter tuning.\n- Built-in reproducibility and scalability via Nextflow and Singularity.  
\n\n## Quick Start\n\n### Requirements\n- **Modules**:\n  - `Nextflow/21.03`\n  - `singularity/rpm`\n- **Environment variables**:\n  - Set `NXF_SINGULARITY_CACHEDIR` for Singularity.\n  - Provide a valid `TOWER_ACCESS_TOKEN` for Nextflow Tower integration.\n\n### Running the Pipeline\n\nCustomize the command below for your HPC system, and review the settings in `nextflow.config` before running.\n\n```bash\n## load modules\nmodule load Nextflow/21.03\nmodule load singularity/rpm\n\n## export env variables\nexport NXF_SINGULARITY_CACHEDIR=/mnt/users/ngda/software/singularity\nexport TOWER_ACCESS_TOKEN=\u003cyour_access_token\u003e\n\n## run the pipeline\nnextflow run main.nf -resume -w work_dir \\\n    --genome /mnt/users/ngda/genomes/cattle/Bos_taurus.ARS-UCD1.2.dna_sm.toplevel.fa \\\n    --chrom 29 \\\n    --val_chrom 21 \\\n    --test_chrom 25 \\\n    --window 200 \\\n    --peaks '/mnt/SCRATCH/ngda/data/Cattle/*.bed' \\\n    -with-tower\n```\n\n## Parameters and default settings\n\n- `params.genome`: Path to the genome file (default: `$baseDir/data/ref/genome.fa`).\n- `params.peaks`: Path or glob pattern matching the peak files (default: `$baseDir/data/peak/*`).\n- `params.outdir`: Output directory (default: `results`).\n- `params.trace_dir`: Directory for trace files (default: `trace_dir`).\n- `params.chrom`: The largest chromosome index used to build the dataset (default: `29`, meaning chromosomes 1, 2, 3, ..., 28, 29 are used).\n- `params.val_chrom`: Chromosome for validation (default: `21`).\n- `params.test_chrom`: Chromosome for testing (default: `25`).\n- `params.window`: Window size for genomic analysis (default: `200`).\n- `params.learning_rates`: Learning rates for model tuning (default: `[1e-3, 1e-4, 5e-4, 5e-5]`).\n\n## Output\n\n- Results will be saved in the specified `--outdir` (default: `results`).\n- Trace files for debugging and performance monitoring will be stored in `params.trace_dir` (default: `trace_dir`).\n\n## Pre-trained models\n\nDanQ model weights for cattle, pig, chicken, and 
salmon are available at:\n\nhttps://github.com/datngu/data/releases/download/v.0.0.4/cattle_DanQ.h5\n\nhttps://github.com/datngu/data/releases/download/v.0.0.4/chicken_DanQ.h5\n\nhttps://github.com/datngu/data/releases/download/v.0.0.4/pig_DanQ.h5\n\nhttps://github.com/datngu/data/releases/download/v.0.0.4/salmon_DanQ.h5\n\n\n\n## Citation\n\nTBA\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatngu%2Fdeepfarm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatngu%2Fdeepfarm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatngu%2Fdeepfarm/lists"}