{"id":18598236,"url":"https://github.com/cmdoret/hic_scrambler","last_synced_at":"2025-05-16T14:11:09.030Z","repository":{"id":47044605,"uuid":"168626164","full_name":"cmdoret/HiC_scrambler","owner":"cmdoret","description":"Introducing structural variations in Hi-C contact maps to try detecting and reversing them.","archived":false,"fork":false,"pushed_at":"2021-09-15T23:09:43.000Z","size":6618,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-17T23:47:43.321Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmdoret.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-02-01T01:52:11.000Z","updated_at":"2021-10-28T09:53:08.000Z","dependencies_parsed_at":"2022-09-24T21:30:43.917Z","dependency_job_id":null,"html_url":"https://github.com/cmdoret/HiC_scrambler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmdoret%2FHiC_scrambler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmdoret%2FHiC_scrambler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmdoret%2FHiC_scrambler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmdoret%2FHiC_scrambler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmdoret","download_url":"https://codeload.github.com/cmdoret/HiC_scrambler/tar.gz/refs/heads/master","host":{"name":"GitHub","url"
:"https://github.com","kind":"github","repositories_count":254544159,"owners_count":22088808,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-07T01:31:45.548Z","updated_at":"2025-05-16T14:11:09.010Z","avatar_url":"https://github.com/cmdoret.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hi-C scrambled maps generator\n\n\u003e This is a WIP. Here is the state of the different features:\n\n* [x] Boilerplate for editing genomes and generating matrices.\n* [x] Storing SV positions and windows.\n* [X] Storing pairs of whole maps before and after scrambling\n* [ ] Implementing all SV types (only inversions and deletions for now)\n* [ ] Generating features from [BAM](https://samtools.github.io/hts-specs/SAMv1.pdf) alignments\n\nThis repo contains a program to generate scrambled Hi-C maps. The program starts from an input genome and Hi-C library (reads) and introduces structural variants into the genome. 
Structural variants (SVs) are large-scale alterations of the sequence, including:\n\n* Deletion: Chunk of sequence removed\n* Insertion: New chunk of sequence introduced\n* Inversion: Chunk of sequence flipped\n* Translocation: Chunk of sequence moved from one place to another\n* Duplication: Chunk of sequence copied to a different position\n\nThese alterations can happen sequentially and be superimposed on each other, resulting in \"complex events\".\n\nThe simplest approach to generating scrambled maps would be to directly reorder rows / columns of the matrix, but this would not accurately replicate the read-alignment artifacts visible around actual SVs.\n\n## Setup\n\nTo install the Python dependencies, you can use the requirements.txt file as follows:\n\n```bash\npip install --user -r requirements.txt\n```\n\nTo install the project as a local package, run:\n\n```bash\npip install --user -e .\n```\n\n## Usage\n\nThe pipeline requires a genome (FASTA format) and a Hi-C library (FASTQ format).\n\nA JSON configuration file is provided to define profiles. These profiles dictate the types of SV to generate and their properties (size, frequency, ...).\n\n## Output\n\nThe pipeline will generate an output directory containing multiple files.\nEach run operates on a random subset of the input genome. It has its own subdirectory containing the matrix before and after scrambling, as well as the list of SVs applied and zooms on the affected regions.\n\nThe root output directory will contain the following files combining all runs:\n* `x.npy`: A 3D NumPy array (npy file) containing windows around each SV as well as windows around random positions without SV, in equal proportions (50/50).\n* `y.npy`: A 1D NumPy array containing the label of each window, with the following encoding: 0=no SV, 1=INV, 2=DEL, 3=INS.\n* `truth.npy`: A 3D NumPy array containing the matrix of each run's random region before scrambling.\n* `scrambled.npy`: Same, but after scrambling. 
Each map is zero-padded at the bottom and right to retain the same dimensions despite deletions.\n\n## Test dataset\n\nThe test dataset consists of Hi-C reads from a hybrid S. cerevisiae - S. paradoxus library. Reads were mapped to chromosome 4 using bowtie2 with the very-fast-local preset, and all reads mapping to the chromosome were extracted.\n\nThe matrices were then generated using those reads. The \"original\" matrix does not contain any structural variation.\n\nAll matrices were generated with hicstuff using the following parameters:\n`hicstuff pipeline -t 12 -P original -e 1000 -f genome.fa -m aligned_for.fq.gz aligned_rev.fq.gz -o output`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmdoret%2Fhic_scrambler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmdoret%2Fhic_scrambler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmdoret%2Fhic_scrambler/lists"}