{"id":22246718,"url":"https://github.com/lucacappelletti94/mesh","last_synced_at":"2026-01-27T10:07:09.746Z","repository":{"id":264968140,"uuid":"870580573","full_name":"LucaCappelletti94/mesh","owner":"LucaCappelletti94","description":"Python package helping to work with the MESH dataset.","archived":false,"fork":false,"pushed_at":"2024-10-11T09:26:51.000Z","size":39,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-27T02:12:03.897Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LucaCappelletti94.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-10T09:50:42.000Z","updated_at":"2024-10-11T09:26:55.000Z","dependencies_parsed_at":"2024-11-27T02:12:05.405Z","dependency_job_id":"9070909d-7b2f-4289-a59e-08e47b19e518","html_url":"https://github.com/LucaCappelletti94/mesh","commit_stats":null,"previous_names":["lucacappelletti94/mesh"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fmesh","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fmesh/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fmesh/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LucaCappelletti94%2Fmesh/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/LucaCappelletti94","download_url":"https://codeload.github.com/LucaCappelletti94/mesh/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227860752,"owners_count":17830871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-03T05:30:14.558Z","updated_at":"2026-01-27T10:07:04.717Z","avatar_url":"https://github.com/LucaCappelletti94.png","language":"Python","readme":"# MESH\n\nPython package helping to work with the [MESH dataset](https://www.ncbi.nlm.nih.gov/mesh/). This package is currently primarily focused on [the chemicals and drugs category of the MESH dataset](https://www.ncbi.nlm.nih.gov/mesh/1000068) and integrates the associated [PubChem database](https://pubchem.ncbi.nlm.nih.gov/) SMILES and InChI keys.\n\n## Installation\n\nAt this moment, the package is not available on PyPI. To install it, you can clone the repository and install it using `pip`:\n\n```bash\npip install .\n```\n\n## Usage\n\nThe package provides two main functionalities: downloading a pre-built MESH dataset and generating a custom MESH dataset. Once you have the dataset, you can use the `Dataset` class to work with it.\n\n### Downloading a pre-built MESH dataset\n\nThis package can build a custom MESH dataset, but because building the dataset requires resources, we also provide pre-built datasets which [we host on Zenodo](). 
The structure of any of the hosted tarballs is as follows:\n\n```\nmesh_chemistry_2024.tar.gz\n├── chemicals.csv\n├── descriptors.csv\n├── chemicals_to_descriptors.csv\n├── mesh_dag.csv\n└── metadata.json\n```\n\nWhere (examples of these files are shown further below):\n- `chemicals.csv` contains information about chemicals and drugs.\n- `descriptors.csv` contains information about descriptors.\n- `chemicals_to_descriptors.csv` contains the relationships between chemicals and descriptors.\n- `mesh_dag.csv` contains the Directed Acyclic Graph (DAG) of the MESH dataset.\n- `metadata.json` contains metadata about the dataset.\n\nTo download a pre-built dataset, you can use the following code:\n\n```python\nfrom mesh import Dataset\n\ndataset = Dataset.load(\"mesh_chemistry_2024\")\n```\n\nFind the available rasterized datasets [on Zenodo]().\n\nHere are some statistics for the rasterized MESH datasets, all created with the same settings described in the next section:\n\n| Version name | Number of nodes | Number of edges | Number of chemicals  | Number of descriptors |\n|--------------|-----------------|-----------------|----------------------|-----------------------|\n| MESH 2024    | 334220          | 367694          | 323679               | 10542                 |\n| MESH 2023    | 332999          | 365801          | 322591               | 10409                 |\n| MESH 2022    | 330106          | 364653          | 319739               | 10367                 |\n| MESH 2021    | 328884          | 363505          | 318391               | 10325                 |\n\n### Generating a custom MESH dataset\n\nThe package provides a `Dataset` class that allows you to work with the MESH dataset. The dataset is built using the `DatasetSettings` class, which allows you to specify which parts of the dataset you want to include. 
The `ChemicalsAndDrugsSettings` class allows you to specify which parts of the chemicals and drugs category you want to include.\n\nParticularly helpful is the ability to include SMILES and InChI keys for the chemicals and drugs. This is done by calling the `include_smiles` and `include_inchi_keys` methods of the `ChemicalsAndDrugsSettings` class.\n\n```python\nfrom mesh.settings import DatasetSettings, ChemicalsAndDrugsSettings\nfrom mesh import Dataset\n\n\ndef build_mesh_chemistry_2024() -\u003e Dataset:\n    \"\"\"Build MESH 2024 dataset.\"\"\"\n    # First, we need to define the settings for the dataset.\n    cad: ChemicalsAndDrugsSettings = (\n        ChemicalsAndDrugsSettings()\n        # In this case, we are including all of the submodules of\n        # categories of chemicals and drugs.\n        .include_all_submodules()\n        # We also want to include SMILES, which we obtain from the\n        # PubChem database.\n        .include_smiles()\n        # Analogously, we want to include InChI keys, which we obtain\n        # from the PubChem database.\n        .include_inchi_keys()\n    )\n    settings = (\n        # We are using the MESH 2024 version.\n        DatasetSettings(version=2024)\n        # We want to retrieve data only regarding chemicals and drugs.\n        .include_chemicals_and_drugs(cad)\n        # And we want to print the progress of the dataset retrieval.\n        .set_verbose(True)\n    )\n    # Now, we build the dataset. This will download the necessary files\n    # and rasterize the dataset.\n    dataset = Dataset.build(settings)\n    return dataset\n\n\nif __name__ == \"__main__\":\n    # We build the MESH 2024 dataset.\n    mesh_chemistry_2024: Dataset = build_mesh_chemistry_2024()\n    # And we save it to disk.\n    mesh_chemistry_2024.save(\"mesh_chemistry_2024\", tarball=False)\n```\n\n#### Resulting CSVs\n\nThe resulting CSVs will be saved in the `mesh_chemistry_2024` directory. 
The directory will contain the following CSVs:\n\n##### `chemicals.csv`\n\n|unique_identifier|name|compound_id|substance_id|smiles|inchi|inchikey|\n|---|---|---|---|---|---|---|\n|C000002|bevonium|31800.0|500762995.0|C[N+]1(CCCCC1COC(=O)C(C2=CC=CC=C2)(C3=CC=CC=C3)O)C|InChI=1S/C22H28NO3/c1-23(2)16-10-9-15-20(23)17-26-21(24)22(25,18-11-5-3-6-12-18)19-13-7-4-8-14-19/h3-8,11-14,20,25H,9-10,15-17H2,1-2H3/q+1|UHUMRJKDOOEQIG-UHFFFAOYSA-N|\n|C000009|N-acetylglucosaminylasparagine|123826.0|500203198.0|CC(=O)N[C@@H]1[C@H]([C@@H]([C@H](O[C@H]1NC(=O)C[C@@H](C(=O)O)N)CO)O)O|InChI=1S/C12H21N3O8/c1-4(17)14-8-10(20)9(19)6(3-16)23-11(8)15-7(18)2-5(13)12(21)22/h5-6,8-11,16,19-20H,2-3,13H2,1H3,(H,14,17)(H,15,18)(H,21,22)/t5-,6+,8+,9+,10+,11+/m0/s1|YTTRPBWEMMPYSW-HRRFRDKFSA-N|\n|C000011|5-(n-acetaminophenylazo)-8-oxyquinoline|114081.0|484035752.0|CC(=O)NC1=CC=C(C=C1)N=NC2=C3C=CC=NC3=C(C=C2)O|InChI=1S/C17H14N4O2/c1-11(22)19-12-4-6-13(7-5-12)20-21-15-8-9-16(23)17-14(15)3-2-10-18-17/h2-10,23H,1H3,(H,19,22)|DKRPSSOODLBKPQ-UHFFFAOYSA-N|\n|C000015|N-acetyl-L-arginine|67427.0|500710457.0|CC(=O)N[C@@H](CCCN=C(N)N)C(=O)O|InChI=1S/C8H16N4O3/c1-5(13)12-6(7(14)15)3-2-4-11-8(9)10/h6H,2-4H2,1H3,(H,12,13)(H,14,15)(H4,9,10,11)/t6-/m0/s1|SNEIUMQYRCDYCH-LURJTMIESA-N|\n|C000020|N-acetylneuraminoyllactose| |489852514.0| |
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |                           |\n|C000021          |acetylnovadral                         |           |            |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                             |                           |\n\n\n##### `descriptors.csv`\n\n| unique_identifier | name                              | compound_id    | substance_id   | smiles                                                                                                        | inchikey                         |\n|-------------------|-----------------------------------|----------------|----------------|---------------------------------------------------------------------------------------------------------------|----------------------------------|\n| D000001           | Calcimycin                        | 139593372.0    | 500766157.0    | C[C@@H]1CCC2([C@H](C[C@@H]([C@@H](O2)C(C)C(=O)C3=CC=CN3)C)C)O[C@@H]1CC4=NC5=C(O4)C=CC(=C5C(=O)O)NC             | HIYAVKIYRIFSCZ-LGHBZWQHSA-N      |\n| D000002           | Temefos                           | 5392.0         | 500974612.0    | COP(=S)(OC)OC1=CC=C(C=C1)SC2=CC=C(C=C2)OP(=S)(OC)OC                                                            | WWJZWCUNLNYYAU-UHFFFAOYSA-N      |\n| D000017           | ABO Blood-Group System            |                |                |                                                                                                               |                                  |\n| D000019           | Abortifacient Agents              |                |                |                                                                                                               |                                  |\n| D000020           | Abortifacient Agents, Nonsteroidal |                |                |                                                                                                               |       
                           |\n| D000021           | Abortifacient Agents, Steroidal   |                |                |                                                                                                               |                                  |\n| D000036           | Abrin                             |                | 486451862.0    |                                                                                                               |                                  |\n| D000040           | Abscisic Acid                     | 5702609.0      | 500195639.0    | CC1=CC(=O)CC([C@]1(/C=C/C(=C/C(=O)O)/C)O)(C)C                                                                  | JLIDBLDQVAYHNE-IBPUIESWSA-N      |\n\n\n\n##### `chemicals_to_descriptors.csv`\n\n| chemical | descriptor |\n|----------|------------|\n| C000002  | D001561    |\n| C000006  | D061389    |\n| C000009  | D000117    |\n| C000011  | D015125    |\n| C000015  | D001120    |\n| C000020  | D007785    |\n\n\n##### `mesh_dag.csv`\n\n| parent  | child      |\n|---------|------------|\n| D000001 | D000095662 |\n| D000001 | D001583    |\n| D000002 | D063086    |\n| D000017 | D001789    |\n| D000019 | D012102    |\n| D000020 | D000019    |\n| D000021 | D000019    |\n\n##### `metadata.json`\n\n```json\n{\n    \"version\": {\n        \"version\": 2024,\n        \"descriptors\": \"https://nlmpubs.nlm.nih.gov/projects/mesh/2024/asciimesh/20240101/d2024.bin\",\n        \"chemicals\": \"https://nlmpubs.nlm.nih.gov/projects/mesh/2024/asciimesh/20240101/c2024.bin\"\n    },\n    \"roots\": [\n        {\n            \"root\": \"Chemicals and Drugs\",\n            \"included_codes\": [\n                \"D01\",\n                \"D02\",\n                \"D03\",\n                \"D04\",\n                \"D05\",\n                \"D06\",\n                \"D08\",\n                \"D09\",\n                \"D10\",\n                \"D12\",\n                \"D13\",\n                
\"D20\",\n                \"D23\",\n                \"D25\",\n                \"D26\",\n                \"D27\"\n            ],\n            \"include_smiles\": true\n        }\n    ],\n    \"downloads_directory\": \"downloads\"\n}\n```\n\n### To NetworkX\n\nSince the MESH dataset is a Directed Acyclic Graph (DAG), you can convert it to a NetworkX graph by calling the `to_networkx` method of the `Dataset` class.\n\n```python\nimport networkx as nx\n\n# We convert the MESH dataset to a NetworkX graph.\ngraph: nx.DiGraph = mesh_chemistry_2024.to_networkx()\n\n# Now, we can use the NetworkX graph as we would any other NetworkX graph.\n# Printing the graph reports its type, node count, and edge count\n# (`nx.info` was removed in NetworkX 3.0).\nprint(graph)\n```\n\nIn this case, the output will be:\n\n```\nDiGraph with 334220 nodes and 367694 edges\n```\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucacappelletti94%2Fmesh","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flucacappelletti94%2Fmesh","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flucacappelletti94%2Fmesh/lists"}