{"id":31913464,"url":"https://github.com/alejandrogzi/noel","last_synced_at":"2025-10-13T18:50:08.863Z","repository":{"id":203898730,"uuid":"710650853","full_name":"alejandrogzi/noel","owner":"alejandrogzi","description":"GTF/GFF per gene non-overlapping exon length calculator ","archived":false,"fork":false,"pushed_at":"2024-01-02T23:52:47.000Z","size":752,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-09T03:12:25.721Z","etag":null,"topics":["exon","gene-annotation","gff","gtf","length","non-overlapping"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alejandrogzi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-10-27T06:34:38.000Z","updated_at":"2025-08-30T08:32:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"710c8ca9-b30e-4bcc-b3c4-2b7680dfbd32","html_url":"https://github.com/alejandrogzi/noel","commit_stats":null,"previous_names":["alejandrogzi/coco","alejandrogzi/noel"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/alejandrogzi/noel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alejandrogzi%2Fnoel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alejandrogzi%2Fnoel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alejandrogzi%2Fnoel/releases","manifests_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alejandrogzi%2Fnoel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alejandrogzi","download_url":"https://codeload.github.com/alejandrogzi/noel/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alejandrogzi%2Fnoel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279016622,"owners_count":26085853,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["exon","gene-annotation","gff","gtf","length","non-overlapping"],"created_at":"2025-10-13T18:49:31.334Z","updated_at":"2025-10-13T18:50:08.858Z","avatar_url":"https://github.com/alejandrogzi.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"![version-badge](https://img.shields.io/badge/version-0.2.0-green)\n![Crates.io](https://img.shields.io/crates/v/noel)\n![GitHub](https://img.shields.io/github/license/alejandrogzi/noel?color=blue)\n\n\n# noel\n\nAn extremely fast GTF/GFF per gene Non-Overlapping Exon Length calculator (noel) written in Rust.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg width=600 align=\"center\" src=\"./supp/overview.png\"\u003e\n\u003c/p\u003e\n\nTakes in a GTF/GFF file and outputs a .txt file with non-overlapping 
exon lengths. \n\n## Usage\n``` rust\nUsage: noel --i \u003cGTF/GFF\u003e --o \u003cOUTPUT\u003e\n\nArguments:\n    --i \u003cGTF/GFF\u003e: GTF/GFF file\n    --o \u003cOUTPUT\u003e: .txt file\n\nOptions:\n    --help: print help\n    --version: print version\n```\n\n#### crate: [https://crates.io/crates/noel](https://crates.io/crates/noel)\n\n\n## Installation\nto install noel on your system follow these steps:\n1. download rust: `curl https://sh.rustup.rs -sSf | sh` on unix, or go [here](https://www.rust-lang.org/tools/install) for other options\n2. run `cargo install noel` (make sure `~/.cargo/bin` is in your `$PATH` before running it)\n3. use `noel` with the required arguments\n\n## Build\nto build noel from this repo, do:\n\n1. get rust (as described above)\n2. run `git clone https://github.com/alejandrogzi/noel.git \u0026\u0026 cd noel`\n3. run `cargo run --release \u003cGTF/GFF\u003e \u003cOUTPUT\u003e` (arguments are positional, so you do not need to specify --i/--o)\n\n## Library\nto include noel as a library and use it within your project follow these steps:\n1. include `noel = \"0.2.0\"` or `noel = \"*\"` under `[dependencies]` in your `Cargo.toml` file or just run `cargo add noel` from the command line\n2. the library name is `noel`, to use it just write:\n\n    ``` rust\n    use noel::{noel, noel_reader}; \n    ```\n    or \n    ``` rust\n    use noel::*;\n    ```\n3. invoke\n    ``` rust\n    // `input` is a PathBuf pointing to the GTF/GFF file\n    let exons: HashMap\u003cString, Vec\u003c(u32, u32)\u003e\u003e = noel_reader(\u0026input)?;\n    let lengths: Vec\u003c(String, u32)\u003e = noel(exons);\n    ```\n4. 
you will end up with a vector of (gene_id, length) pairs, one per gene\n    ```text\n    [(\"ENSG00000261469\", 533), (\"ENSG00000150990\", 6908), (\"ENSG00000136490\", 4751), (\"ENSG00000290760\", 801)]\n    ```\n\n## Benchmark\n\nThere are a handful of open-source tools and scripts to calculate non-overlapping exon lengths, namely: Kooi [1], Sun [2], and Slowikowski [3, 4] scripts, and gtftools (-l flag) [5]. The Non-Overlapping Exon Length calculator (NOEL; referred to simply as \"noel\") is introduced as a novel tool that outperforms the aforementioned software in both run time and memory usage. \n\nTo assess the efficiency of noel and test the capabilities of other available scripts/tools, I used run times and memory usage estimates, based on 5 consecutive runs. This evaluation focused on two major gene annotation formats: GTF and GFF. It is worth noting, however, that only 3 tools are capable of handling GFF files: Slowikowski, Sun* (modified as described below) and noel. Before any batch of runs, I first modified each script to accept command-line arguments. Additionally, I further edited Sun's script to handle GFF inputs by changing a regex pattern. No performance-related changes or breaking structural modifications were applied.\n\nLastly, to evaluate the output consistency of the top-ranked tools (Sun, gtftools and noel), three species were used: *Homo sapiens* (GRCh38, GENCODE 44), *Canis lupus familiaris* (ROS_Cfam_1.0, Ensembl 110), and *Mus musculus* (GRCm39, GENCODE M33). \n\n\u003cp align=\"center\"\u003e\n    \u003cimg width=550 align=\"center\" src=\"./supp/time.png\"\u003e\n\u003c/p\u003e\n\nThe diverse methodologies to calculate non-overlapping exon lengths led to noticeable differences in run times. 
The Kooi and Slowikowski scripts ranked last with GTF files (\u003e250s for GENCODE 44), as did the Slowikowski script with GFF files (~300s for GENCODE 44), whereas Sun, gtftools and noel were the most efficient options (\u003c50s for GENCODE 44). Among these top-ranked tools, noel holds a clear lead over its competitors. For GTF files, noel achieves noticeably faster computation times when compared to gtftools (x4.3 faster; 4.2s vs 17.9s) and to Sun's script (x10.9 faster; 4.2s vs 45.7s). For GFF3 files, noel performs the calculations x12.6 faster than Sun's script (3.9s vs 49.7s).\n\n\u003cp align=\"center\"\u003e\n    \u003cimg width=550 align=\"center\" src=\"./supp/mem.png\"\u003e\n\u003c/p\u003e\n\nA similar pattern is seen when examining memory usage estimates based on GTF files. Three distinct groups of tools can be identified: high-memory-consuming tools (Sun, Slowikowski, and Kooi), tools with moderate memory usage (gtftools), and the most memory-efficient option (noel). Here, noel exhibited a significantly lower memory usage when compared to gtftools (x9.1 less; 42.9 Mb vs 391.8 Mb) and to Kooi (x73.1 less; 42.9 Mb vs 3.1 Gb). With GFF files, on the other hand, noel achieved a striking 146.1-fold reduction in memory usage compared to Slowikowski (62,700 genes).  \n\n\n\u003cp align=\"center\"\u003e\n    \u003cimg width=550 align=\"center\" src=\"./supp/corr.png\"\u003e\n\u003c/p\u003e\n\nThe comparison of output from the top-ranked tools (Sun, gtftools, and noel) yielded consistently paired estimates for each species, resulting in a high correlation (R = 0.99). Notably, both noel and Sun's script demonstrated a one-to-one correspondence for every gene in all tested annotation models. In contrast, gtftools failed to process a fraction of the genes, with a slight deficiency in the human and mouse models (0.05% and 0.06%, respectively), and a more substantial shortfall in the dog model (26%). 
Furthermore, noel outperformed the other tools, significantly improving runtime efficiency in both the mouse and dog models, with a speedup of at least 2.3 times.\n\nBased on this comparative analysis between existing scripts/software to calculate non-overlapping exonic lengths and noel, it is evident that this tool represents a significant improvement. These findings unveil the potential of noel as a valuable resource to provide a fast and efficient way to automate non-overlapping exon length calculations.\n\n## References\n\n[1] https://www.biostars.org/p/83901/\n\n[2] https://gist.github.com/jsun/aeca04ee2c5b5cc53ad795b660edd6c3\n\n[3] https://gist.github.com/slowkow/8101481\n\n[4] https://gist.github.com/slowkow/8101509#file-coding_lengths-py\n\n[5] Hong-Dong Li, Cui-Xiang Lin, Jiantao Zheng, GTFtools: a software package for analyzing various features of gene models, Bioinformatics, Volume 38, Issue 20, 15 October 2022, Pages 4806–4808, https://doi.org/10.1093/bioinformatics/btac561\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falejandrogzi%2Fnoel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falejandrogzi%2Fnoel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falejandrogzi%2Fnoel/lists"}