{"id":38514776,"url":"https://github.com/ulelab/clippy","last_synced_at":"2026-01-17T06:27:14.656Z","repository":{"id":37097803,"uuid":"281914900","full_name":"ulelab/clippy","owner":"ulelab","description":"A wrapper around scipy \"find_peaks\" function to enable peak calling of CLIP data.","archived":false,"fork":false,"pushed_at":"2025-06-23T13:33:15.000Z","size":93194,"stargazers_count":2,"open_issues_count":4,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-23T14:37:37.795Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ulelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-23T09:57:16.000Z","updated_at":"2025-06-23T13:33:18.000Z","dependencies_parsed_at":"2023-01-22T09:15:27.096Z","dependency_job_id":null,"html_url":"https://github.com/ulelab/clippy","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/ulelab/clippy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulelab%2Fclippy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulelab%2Fclippy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulelab%2Fclippy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulelab%2Fclippy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ulelab","download_url":"https://codeload.github.com/ulelab/clippy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulelab%2Fclippy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28502265,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T04:31:57.058Z","status":"ssl_error","status_checked_at":"2026-01-17T04:31:45.816Z","response_time":85,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-17T06:27:13.794Z","updated_at":"2026-01-17T06:27:14.621Z","avatar_url":"https://github.com/ulelab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Clippy - Interactive, intuitive peak calling for CLIP data\n\nA wrapper around scipy \"[find_peaks](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html)\" function to enable peak calling of CLIP data.\n\n![A dumb joke](readme_assets/smallerclippy.png)\n\n## Concept\n\nUsing the annotation provided, crosslinks over each gene are smoothed using a rolling mean. The window can be decided by the user. For each gene the mean of the smoothed signal is taken (red line) and the mean + (mean * adjustment factor) (green line) is taken. The mean is used to define the minimum height of a peak. The mean + (mean * adjustment factor) is taken to define the minimum prominence of a peak. Please see [here](https://en.wikipedia.org/wiki/Topographic_prominence#:~:text=The%20prominence%20of%20a%20peak,or%20key%20saddle%2C%20or%20linking) for the definition of topographical prominence. Essentially this parameter limits shallow peaks being called in regions where there is a clearly more prominent peak. \n\nIn the image below, you can see the positions the algorithm picks out in this gene as peaks.\n\n![Graph of the gene PMT2 generated by Clippy's interactive plotting mode](readme_assets/pmt2_default.png)\n\nClippy also has an interactive mode, allowing users to tune the parameters to their protein of interest.\n\n![Animated GIF of Clippy's interactive plotting mode](readme_assets/pmt2_demo.gif)\n\n## Installation\n\nClippy is now available to install from Bioconda:\n```\nconda create --name clippy python=3.8\nconda activate clippy\nconda install -c bioconda -c conda-forge clippy \n```\nIf you are having problems solving the environment try switching to mamba solver:\n```\nconda config --set solver libmamba\n```\nUsing newer versions of bedtools \u003e2.26.0 will break Clippy.\n\n## Run test data\n\nTo start the interactive parameter search server for the test yeast data, run:\n\n```\nclippy --input_bed tests/data/crosslinkcounts.bed --output_prefix OUTPUT_PREFIX \\\n       --annotation tests/data/annot.gff --genome_file tests/data/genome.fa.fai -int\n```\n\n## Usage\n\n```\nusage: clippy [-h] [-v] -i INPUT_BED -o OUTPUT_PREFIX -a ANNOTATION -g GENOME_FILE\n              [-n [WINDOW_SIZE]] [-w [WIDTH]] [-x [MIN_PROM_ADJUST]] [-mx [MIN_HEIGHT_ADJUST]]\n              [-mg [MIN_GENE_COUNTS]] [-mb [MIN_PEAK_COUNTS]] [-alt [ALT_FEATURES]]\n              [-up [UPSTREAM_EXTENSION]] [-down [DOWNSTREAM_EXTENSION]] [-nei]\n              [-inter [INTERGENIC_PEAK_THRESHOLD]] [-t [THREADS]] [-cf [CHUNKSIZE_FACTOR]] [-int]\n\nCall CLIP peaks.\n\nrequired arguments:\n  -i INPUT_BED, --input_bed INPUT_BED\n                        bed file containing cDNA counts at each crosslink position\n  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX\n                        prefix for output files\n  -a ANNOTATION, --annotation ANNOTATION\n                        gtf annotation file\n  -g GENOME_FILE, --genome_file GENOME_FILE\n                        genome file containing chromosome lengths. Also known as a FASTA index\n                        file, which usually ends in .fai. This file is used by BEDTools for\n                        genomic operations\n\noptional peak size arguments:\n  Control the size of the peaks called\n\n  -n [WINDOW_SIZE], --window_size [WINDOW_SIZE]\n                        rolling mean window size [DEFAULT 10]\n  -w [WIDTH], --width [WIDTH]\n                        proportion of prominence to calculate peak widths at. Smaller values will\n                        give narrow peaks and large values will give wider peaks [DEFAULT 0.4]\n\noptional peak filtering arguments:\n  Control how peaks are filtered\n\n  -x [MIN_PROM_ADJUST], --min_prom_adjust [MIN_PROM_ADJUST]\n                        adjustment for minimum prominence threshold, calculated as this value\n                        multiplied by the mean [DEFAULT 1.0]\n  -mx [MIN_HEIGHT_ADJUST], --min_height_adjust [MIN_HEIGHT_ADJUST]\n                        adjustment for the minimum height threshold, calculated as this value\n                        multiplied by the mean [DEFAULT 1.0]\n  -mg [MIN_GENE_COUNTS], --min_gene_counts [MIN_GENE_COUNTS]\n                        minimum cDNA counts per gene to look for peaks [DEFAULT 5]\n  -mb [MIN_PEAK_COUNTS], --min_peak_counts [MIN_PEAK_COUNTS]\n                        minimum cDNA counts per broad peak [DEFAULT 5]\n\noptional annotation arguments:\n  Control how the gene annotation is interpreted and used\n\n  -alt [ALT_FEATURES], --alt_features [ALT_FEATURES]\n                        A list of alternative GTF features to set individual thresholds on in the\n                        comma-separated format \u003calt_feature_name\u003e-\u003cgtf_key\u003e-\u003csearch_pattern\u003e\n  -up [UPSTREAM_EXTENSION], --upstream_extension [UPSTREAM_EXTENSION]\n                        upstream extension added to gene models [DEFAULT 0]\n  -down [DOWNSTREAM_EXTENSION], --downstream_extension [DOWNSTREAM_EXTENSION]\n                        downstream extension added to gene models [DEFAULT 0]\n  -nei, --no_exon_info  Turn off individual exon and intron thresholds\n  -inter [INTERGENIC_PEAK_THRESHOLD], --intergenic_peak_threshold [INTERGENIC_PEAK_THRESHOLD]\n                        Intergenic peaks are called by first creating intergenic regions and\n                        calling peaks on the regions as though they were genes. The regions are\n                        made by expanding intergenic crosslinks and merging the result. This\n                        parameter is the threshold number of summed cDNA counts required to\n                        include a region. If set to zero, the default, no intergenic peaks will\n                        be called. When using this mode, the intergenic regions used will be\n                        output as a GTF file. [DEFAULT 0]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -v, --version         show program's version number and exit\n  -t [THREADS], --threads [THREADS]\n                        number of threads to use. [DEFAULT 1]\n  -cf [CHUNKSIZE_FACTOR], --chunksize_factor [CHUNKSIZE_FACTOR]\n                        A factor used to control the number of jobs given to a thread at a time.\n                        A larger number reduces the number of jobs per chunk. Only increase if\n                        you experience crashes [DEFAULT 16]\n  -int, --interactive   starts a Dash server to allow for interactive parameter tuning\n```\n\n*A note on annotation gff*\n\nThe code only requires that you have a feature labelled \"gene\" in the 3rd column of your gff, and assumes that the 9th column of your gff will uniquely identify your genes and contain some kind of gene name or ID.\n\n## Developer Functions\n\nIf you plan to contribute to the Clippy code we have some helpful functions for development. To run the automated testing, use:\n\n```\npytest --cov=clip -k \"not profiling and not web\"\n```\n\nYou might be interested in how long certain functions take to run. To run the profiling code:\n\n```\npytest -k profiling --profile-svg\npython -m gprof2dot -f pstats prof/get_the_peaks.out | dot -Tpdf -o prof/get_the_peaks.pdf\n```\n\n### Test data generation\n\nTest data for the automated testing was generated as follows.\n\nFor CEP295 data:\n\n```\ncat tests/data/gencode.v38.annotation.gtf | \\\n    awk '{if($1==\"chr11\" \u0026\u0026 93661682\u003c=$4 \u0026\u0026 $5\u003c=93730358){print($0)}}' \u003e \\\n    tests/data/gencode.v38.cep295.gtf\n\ngunzip -c tests/data/tardbp-egfp-hek293-2-20201021-ju_mapped_to_genome_reads_single.bed.gz | \\\n    awk '{if($1==\"chr11\" \u0026\u0026 93661682\u003c=$2 \u0026\u0026 $3\u003c=93730358){print($0)}}' \u003e tests/data/cep295.bed\n```\n\nFor RBFOX2 data:\n\n```\ncat tests/data/gencode.v38.annotation.gtf | \\\n    awk '{if($1==\"chr19\" \u0026\u0026 10000000\u003c=$4 \u0026\u0026 $5\u003c=11000000){print($0)}}' \u003e \\\n    tests/data/gencode.v38.chr19_10M_11M.gtf\n\ngunzip -c tests/data/HepG2_RBFOX2.xl.bed.gz | \\\n    awk '{if($1==\"chr19\" \u0026\u0026 10000000\u003c=$2 \u0026\u0026 $3\u003c=11000000){print($0)}}' \u003e \\\n    tests/data/rbfox2_chr19_10M_11M.bed\n```\n\n## Authors\n\nCharlotte Capitanchik - charlotte.capitanchik@crick.ac.uk\n\nMarc Jones - marc.jones@crick.ac.uk\n\n## Publications\nClippy has been used in multiple publications:\n\nKuret, K., Amalietti, A. G., Jones, D. M., Capitanchik, C., \u0026 Ule, J. (2022). Positional motif analysis reveals the extent of specificity of protein-RNA interactions observed by CLIP. Genome biology, 23(1), 1-34.\n\nVarier, R. A., Sideri, T., Capitanchik, C., Manova, Z., Calvani, E., Rossi, A., ... \u0026 van Werven, F. (2022). m6A reader Pho92 is recruited co-transcriptionally and couples translation efficacy to mRNA decay to promote meiotic fitness in yeast. eLife\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulelab%2Fclippy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fulelab%2Fclippy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulelab%2Fclippy/lists"}