{"id":20074705,"url":"https://github.com/greenelab/snorkeling","last_synced_at":"2025-05-05T21:32:16.083Z","repository":{"id":54739993,"uuid":"78030624","full_name":"greenelab/snorkeling","owner":"greenelab","description":"Extracting biomedical relationships from literature with Snorkel 🏊","archived":false,"fork":false,"pushed_at":"2021-02-01T14:25:37.000Z","size":342324,"stargazers_count":58,"open_issues_count":7,"forks_count":17,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-05-02T06:00:23.649Z","etag":null,"topics":["analysis","dataset","hetnet","machine-learning","methodology","nlp","script","snorkel","text-mining","tool","workflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/greenelab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-BSD.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-04T16:03:06.000Z","updated_at":"2024-05-02T06:00:23.650Z","dependencies_parsed_at":"2022-08-14T01:20:35.409Z","dependency_job_id":null,"html_url":"https://github.com/greenelab/snorkeling","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fsnorkeling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fsnorkeling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fsnorkeling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/greenelab%2Fsnorkeling/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/greenelab","down
load_url":"https://codeload.github.com/greenelab/snorkeling/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224470634,"owners_count":17316704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","dataset","hetnet","machine-learning","methodology","nlp","script","snorkel","text-mining","tool","workflow"],"created_at":"2024-11-13T14:53:42.000Z","updated_at":"2024-11-13T14:53:42.754Z","avatar_url":"https://github.com/greenelab.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Snorkeling\n\nThis repository stores data and code to scale up the extraction of biomedical relationships (e.g. Disease-Gene associations, Compounds binding to Genes, Gene-Gene interactions) from PubMed abstracts. \n\n## Deprecation Note\n\nAn updated version of this project can be found at: [greenelab/snorkeling-full-text](https://github.com/greenelab/snorkeling-full-text). New changes are made in that repository.\n\n## Quick Synopsis\nThis work uses a subset of [Hetionet v1](https://doi.org/cdfk) (bolded in the resource schema below), which is a heterogeneous network that contains pharmacological and biological information in the form of nodes and edges. 
\nThis network was made from publicly available data, which is usually populated via manual curation.\nManual curation is time consuming and difficult to scale as the rate of publications continues to rise.\nA recently introduced \"[Data Programming](https://arxiv.org/abs/1605.07723v3)\" paradigm can circumvent this issue by generating large annotated datasets quickly.\nThis paradigm combines distant supervision with simple rules and heuristics written as labeling functions to automatically annotate large datasets.\nUnfortunately, it takes a significant amount of time and effort to write a useful labeling function.\nWe therefore aimed to speed up this process by re-using labeling functions across edge types.\nRead the full paper [here](https://greenelab.github.io/text_mined_hetnet_manuscript/).\n\n![Highlighted edges used in Hetionet v1](https://raw.githubusercontent.com/greenelab/text_mined_hetnet_manuscript/3a040e78114208417d2b1784ae558fb323eabe01/content/images/figures/hetionet/metagraph_highlighted_edges.png \"Metagraph of Hetionet v1\")\n\n## Directories\n\nDescribed below are the main folders for this project. \nBy convention, the folder names are based on the schema shown above. 
\n\n| Name | Description |\n| ---- | ---- | \n| [compound_disease](https://github.com/greenelab/snorkeling/tree/master/compound_disease) | Head folder that contains all relationships compounds and diseases may share |\n| [compound_gene](https://github.com/greenelab/snorkeling/tree/master/compound_gene) | Head folder that contains all relationships compounds and genes may share | \n| [disease_gene](https://github.com/greenelab/snorkeling/tree/master/disease_gene) | Head folder that contains all relationships diseases and genes may share |\n| [gene_gene](https://github.com/greenelab/snorkeling/tree/master/gene_gene) | Head folder that contains all relationships genes may share with each other |\n| [dependency_cluster](https://github.com/greenelab/snorkeling/tree/master/dependency_cluster) | This folder contains preprocessed results from the [\"A global network of biomedical relationships derived from text\"](https://zenodo.org/record/1495808#.XUmlR_wpBrk) paper. |\n| [figures](https://github.com/greenelab/snorkeling/tree/master/figures) | This folder contains figures for this work |\n| [modules](https://github.com/greenelab/snorkeling/tree/master/modules/utils) | This folder contains helper scripts that this work uses |\n| [playground](https://github.com/greenelab/snorkeling/tree/master/playground) | This folder contains old code written to test and understand the snorkel package. |\n\n## Installing/Setting Up The Conda Environment\n\nSnorkeling uses [conda](http://conda.pydata.org/docs/intro.html) as a Python package manager. Before moving on to the instructions below, please make sure to have it installed. 
[Download conda here.](https://www.continuum.io/downloads)\n  \nOnce everything has been installed, type the following command in the terminal: \n\n```bash\nconda env create --file environment.yml\n``` \n\nYou can activate the environment by using the following command: \n\n```bash\nsource activate snorkeling\n```  \n\n_Note_: If you want to leave the environment, just enter the following command:\n\n```bash\nsource deactivate \n```\n\n## License\n\nThis repository is dual licensed as [BSD 3-Clause](LICENSE-BSD.md) and [CC0 1.0](LICENSE-CC0.md), meaning any repository content can be used under either license. This licensing arrangement ensures source code is available under an [OSI-approved License](https://opensource.org/licenses/alphabetical), while non-code content, such as figures, data, and documentation, is maximally reusable under a public domain dedication.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fsnorkeling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgreenelab%2Fsnorkeling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgreenelab%2Fsnorkeling/lists"}