{"id":17338220,"url":"https://github.com/lcwong0928/hotspot-prediction","last_synced_at":"2025-03-27T08:13:34.097Z","repository":{"id":129689237,"uuid":"484808322","full_name":"lcwong0928/hotspot-prediction","owner":"lcwong0928","description":"A machine learning tool that uses DNA sequence and epigenetic data to enhance recombination hotspot predictions.","archived":false,"fork":false,"pushed_at":"2022-04-23T17:30:35.000Z","size":2731,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-01T13:13:09.389Z","etag":null,"topics":["epigenetics","machine-learning","recombination-hotspot-prediction"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lcwong0928.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-23T17:07:18.000Z","updated_at":"2022-04-23T17:20:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"753b6be8-e42f-426f-839b-1ad497ac03d2","html_url":"https://github.com/lcwong0928/hotspot-prediction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcwong0928%2Fhotspot-prediction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcwong0928%2Fhotspot-prediction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lcwong0928%2Fhotspot-prediction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/lcwong0928%2Fhotspot-prediction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lcwong0928","download_url":"https://codeload.github.com/lcwong0928/hotspot-prediction/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245806458,"owners_count":20675298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["epigenetics","machine-learning","recombination-hotspot-prediction"],"created_at":"2024-10-15T15:37:33.315Z","updated_at":"2025-03-27T08:13:34.076Z","avatar_url":"https://github.com/lcwong0928.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Epigenetic Data Boosts the Accuracy of Recombination Hotspot Prediction by Machine Learning Models\n\n## Introduction\n\nGenetic recombination plays an integral role in generating genetic diversity in a population, but the mechanisms of the\nprocesses governing double-strand break (DSB) formation and subsequent ligation remain poorly understood. Recent\nadvances in machine learning as applied to genetic data have demonstrated an ability to predict the location of\nrecombination hotspots in the genome based on raw DNA sequences. However, these models neglect potential contributing\nfactors from epigenetic marks and chromatin structure. 
Specifically, H3K4me3 and H3K36me3 are known to be correlated\nwith the activity of PRDM9, a zinc finger protein that plays a role in determining sites of recombination in humans and\nmice, and open chromatin structure is required for the activity of the DSB-forming protein, Spo11. Furthermore, some\ncorrelation may exist between hotspot regions and single nucleotide polymorphism (SNP) density. We demonstrate, using simple classification models, that\nthe accuracy of hotspot prediction is significantly improved with the inclusion of ChIP-Seq epigenomic data, DNase\nhypersensitivity data, and SNP density data. A similar trend was observed in our deep\nlearning model, a hybrid deep convolutional and recurrent neural network trained with the new datasets as\nadded features. This allowed us to produce a comprehensive predictive model for locations of hotspots in the human\ngenome. Concurrently, we utilized the Gibbs sampling motif discovery technique in an attempt to discover binding motifs\nfor Spo11 and PRDM9. These results combined will help shed light on the mechanisms of recombination and set the stage\nfor better-informed GWAS and linkage analysis studies.\n\n## Links\n\n[Report](https://github.com/lcwong0928/hotspot-prediction/blob/main/results/report.pdf) \\\n[Presentation](https://github.com/lcwong0928/hotspot-prediction/blob/main/results/presentation.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flcwong0928%2Fhotspot-prediction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flcwong0928%2Fhotspot-prediction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flcwong0928%2Fhotspot-prediction/lists"}