{"id":20425016,"url":"https://github.com/databio/bedshift_analysis","last_synced_at":"2026-05-31T22:31:31.191Z","repository":{"id":86287768,"uuid":"357922910","full_name":"databio/bedshift_analysis","owner":"databio","description":null,"archived":false,"fork":false,"pushed_at":"2021-07-09T11:51:24.000Z","size":5722,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-11T10:16:11.375Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/databio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-14T13:53:04.000Z","updated_at":"2021-12-19T02:25:41.000Z","dependencies_parsed_at":"2023-03-13T09:28:29.948Z","dependency_job_id":null,"html_url":"https://github.com/databio/bedshift_analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/databio/bedshift_analysis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedshift_analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedshift_analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedshift_analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedshift_analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/databio","download_url":"https://codeload.github.com/dat
abio/bedshift_analysis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/databio%2Fbedshift_analysis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33752286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T07:12:02.283Z","updated_at":"2026-05-31T22:31:31.175Z","avatar_url":"https://github.com/databio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bedshift analysis\n\n## Contents of the repository\n\nThis repository contains results from [Bedshift: perturbation of genomic interval sets](https://doi.org/10.1101/2020.11.11.378554), a paper describing the application of the [bedshift](http://bedshift.databio.org) tool to explore the behavior of region set similarity metrics on simulated data.\n\nThere are 4 experiment folders. 
Each of these is organized as a [PEP](http://pep.databio.org) that includes a CSV file containing bedshift parameter sets:\n\n- [pep_demo](pep_demo) provides a small demo that runs quickly, for testing purposes.\n- [pep_basic](pep_basic) provides 36 parameter sets to test combinatorial perturbations on a single input file.\n- [pep_main](pep_main) runs the same parameter sets, but across 3 different input files.\n- [pep_universe](pep_universe) runs the same parameter sets with the original file from pep_basic, but using 3 different universes.\n\nFor each experiment, you will find a table of perturbations, with one row per perturbation, and a configuration file. For example, in the `pep_main` project, look at [sample_table.csv](/pep_main/sample_table.csv); the [config file](/pep_main/config.yaml) points to this table.\n\nThis repository also contains 2 pipeline interface files: [piface_bedshift.yaml](piface_bedshift.yaml) describes how to run `bedshift` on the specified BED file with the listed perturbation parameters, and [piface_similarity_scores.yaml](piface_similarity_scores.yaml) runs our included similarity score calculation script (in [src](/src)).\n\n## Setup and configuration\n\n### Refgenie setup\n\nTo obtain the chrom sizes file with refgenie, run:\n\n```\npip install refgenie\nexport REFGENIE=\"refgenie_config.yaml\"\nrefgenie init -c $REFGENIE\nrefgenie pull hg38/fasta\n```\n\n### Environment setup\n\n```\nexport CODE=/path/to/directory\n```\n\nHere, `/path/to/directory` is the directory that contains this repository, `bedshift_analysis`.\n\n## Demo\n\n### Run bedshift\n\nStart with the PEP in the [pep_demo](/pep_demo) folder. Everything needed to run it is stored in this repository, and it is a short example that confirms everything is set up correctly. Run it like this:\n\n```\nlooper run pep_demo/project_config.yaml\n```\n\nOr, to use bulker, run `looper run pep_demo/project_config.yaml -p bulker_local`. 
This will produce an output folder in `results/pep_demo/bedshifted_regions`.\n\n### Aggregate bedshift scores\n\nAfter this is complete, you can aggregate the results and generate the scores with `runp` like this:\n\n```\nlooper runp project_config.yaml\n```\n\nOr `looper runp project_config.yaml -p bulker_local`. This will create score files in `pep_demo/results/scores`.\n\n### Make plots\n\nOnce the scores are generated, you can reproduce the plots from the paper by following the R script in [src/plot_summary_results.R](src/plot_summary_results.R).\n\n## Real projects\n\nThere are 3 real projects:\n\n- `pep_basic` is a basic run, which tests a single BED file against a single universe.\n- `pep_main` is the main project, which tests 3 BED files using just 1 universe.\n- `pep_universe` tests how different universes behave, using 1 BED file and 3 different universes.\n\nAll projects can be run the same way.\n\n### Download data\n\nRun `./src/download_data.sh` to download all the data required to run all projects.\n\nThe BED files will be downloaded from bedbase:\n\n- CTCF TF ChIP-seq on human HCT116: http://bedbase.org/bedsplash/713f58a6497a9168a326123919672ebe\n- H3K4me3 Histone ChIP-seq on human GM12864: http://bedbase.org/bedsplash/0f84fea95b736ec99914bc66e74ab6e0\n- DNase-seq on human stromal cell of bone marrow: http://bedbase.org/bedsplash/c75ea5133f825d779a02be41a529342e\n\nThe universes are downloaded from SCREEN:\n\n- https://screen.encodeproject.org/\n\nYou can see more information about the universes at [bedbase.org](http://bedbase.org):\n\n1. GRCh38-ccREs (primary universe, http://bedbase.org/bedsplash/f31d4aa5e499637f28a338d5768e4ad5)\n2. DNase-H3K4me3 (http://bedbase.org/bedsplash/87d8e916fc910254aa61d5a1611622b7)\n3. CTCF-only (http://bedbase.org/bedsplash/fd007af75d7d0ef5c2e76d8e94916dcf)\n4. PLS (http://bedbase.org/bedsplash/d0b72e9adfeaf3ec37f303f16ad36bc4)\n\n### Run bedshift\n\nHere are some example looper run 
commands:\n\n```\nlooper run pep_main/project_config.yaml -p local\nlooper run pep_universe/project_config.yaml -p local\nlooper run pep_universe/project_config.yaml --lumpn 10\nlooper run pep_universe/project_config.yaml --lumpn 10 --sel-attr trial --sel-incl 0 1 2\n```\n\n### Aggregate results\n\n```\nlooper runp pep_universe/project_config.yaml -l 1 -p local\nlooper runp pep_main/project_config.yaml\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Fbedshift_analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatabio%2Fbedshift_analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatabio%2Fbedshift_analysis/lists"}