{"id":20690252,"url":"https://github.com/merck/matcher","last_synced_at":"2025-07-25T03:04:14.122Z","repository":{"id":60582873,"uuid":"526592063","full_name":"Merck/matcher","owner":"Merck","description":"Matcher is a tool for understanding how chemical structure optimization problems have been solved. Matcher enables deep control over searching structure/activity relationships (SAR) derived from large datasets, and takes the form of an accessible web application with simple deployment. Matcher is built around the mmpdb platform.","archived":false,"fork":false,"pushed_at":"2024-02-21T15:13:53.000Z","size":7775,"stargazers_count":54,"open_issues_count":0,"forks_count":10,"subscribers_count":12,"default_branch":"main","last_synced_at":"2024-09-26T02:01:31.672Z","etag":null,"topics":["chemistry","docker-compose","drug-discovery","search-algorithm","search-engine","web-application"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Merck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-19T12:07:20.000Z","updated_at":"2024-08-12T10:13:02.000Z","dependencies_parsed_at":"2023-02-09T12:31:53.715Z","dependency_job_id":"7861639a-381d-4f24-bfe9-382682d736b6","html_url":"https://github.com/Merck/matcher","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2Fmatcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2Fmatcher/t
ags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2Fmatcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2Fmatcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Merck","download_url":"https://codeload.github.com/Merck/matcher/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224979003,"owners_count":17401803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemistry","docker-compose","drug-discovery","search-algorithm","search-engine","web-application"],"created_at":"2024-11-16T23:12:24.427Z","updated_at":"2024-11-16T23:12:25.130Z","avatar_url":"https://github.com/Merck.png","language":"Python","readme":"![matcher search example](https://github.com/Merck/matcher/blob/main/frontend/examples/1.png?raw=True)\n[Publication: Matcher: An Open-Source Application for Translating Large Structure/Property Data Sets into Insights for Drug Design](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.3c00015)\u003cbr\u003e\n[Free preprint of above publication](https://chemrxiv.org/engage/chemrxiv/article-details/63586c15aca19850f7e53e55)\n\nMatcher is a tool for understanding how chemical structure optimization problems have been solved.\n\nMatcher enables deep control over searching structure/activity relationships (SAR) derived from large datasets, and takes the form of an accessible web application with simple deployment.\n\nMatcher is built around the [mmpdb](https://github.com/rdkit/mmpdb) platform for matched 
molecular pair (MMP) analysis. Matcher extends the mmpdb data model, introduces new search algorithms, provides a backend API for executing queries and fetching results, and provides a frontend user interface.\n\n# Table of Contents\n\n0. [System Requirements](#system_requirements)\n1. [Quick Start](#quick_start)\n2. [Run Example Query](#run_example_query)\n3. [Data Included](#data_included)\n4. [Use Different Data](#use_different_data)\n5. [Metadata Information](#metadata_info)\n6. [OpenAPI for Backend](#backend_OpenAPI)\n7. [Using mmpdb Commands](#mmpdb_commands)\n\n# System Requirements \u003ca id=\"system_requirements\"\u003e\u003c/a\u003e\n\n- Computer with x86 processor (i.e. most desktop/laptop computers except Apple M-series)\n    - ARM processors are not yet supported, due to certain dependencies\n\n# Quick Start \u003ca id=\"quick_start\"\u003e\u003c/a\u003e\n\nClone this repository, and the mmpdb submodule (with `git submodule init` and `git submodule update`).\n\nNavigate to the parent matcher directory, then execute:\n\n```\ndocker-compose up\n```\n\nThree containers will be launched:\n\n1. database container, which contains the PostgreSQL database\n2. backend container, which is controlled through a FastAPI accessible at localhost:8001\n3. frontend container, which hosts a Dash app through a Flask API, accessible at localhost:8000\n\nThe rate-determining step is for the backend container to use mmpdb commands to process input data, and write resulting MMP data to the database. This process is complete when the below output is observed in the docker-compose console:\n\n```\nINFO:uvicorn.error:Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)\n```\n\nAfter the above is observed, in a web browser (Google Chrome or Microsoft Edge), navigate to localhost:8000. 
The Matcher query interface should load after a few seconds.\n\n# Run Example Query \u003ca id=\"run_example_query\"\u003e\u003c/a\u003e\n\nMatcher query logic can be learned by example: click the \"Run Example Search\" link at the top of the matcher query frontend page (localhost:8000), which leads to localhost:8000/examples.\n\nSeveral example inputs/outputs will be displayed. Upon clicking on an example, a new tab will open with the live Matcher interface, containing appropriately-populated input. Output results will load below the input.\n\nImportant: The example queries are only guaranteed to work with the example data provided in this package, because these queries specify properties that are present in the example data. If new data is used which has property names differing from the example data's property names, an exception will be thrown in the client-side JavaScript layer, and the example queries will not execute when loaded.\n\n# Data Included \u003ca id=\"data_included\"\u003e\u003c/a\u003e\n\nData is present in the backend/initialize_db directory.\n\n\u003cstrong\u003eQuick Start data (default)\u003c/strong\u003e: Data filenames begin with \"quick_start\". Contains 1089 ChEMBL compounds, the minimum to fully reproduce queries described in [our publication](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.3c00015).\n\n\u003cstrong\u003eRapidly test/debug the deployment\u003c/strong\u003e: Data filenames begin with \"test\". Contains 16 ChEMBL compounds, a subset of the Quick Start 1078 compounds, for the purpose of rapid testing during development or troubleshooting. All example queries work, but return only a few results.\n\n\u003cstrong\u003eFull ChEMBL dataset\u003c/strong\u003e: Data filenames begin with \"ChEMBL_CYP3A4_hERG\". Contains 20267 ChEMBL compounds having CYP3A4 inhibition and/or hERG inhibition data, which were included with the [mmpdb publication](https://pubs.acs.org/doi/10.1021/acs.jcim.8b00173). 
A superset of the Quick Start data.\n\n# Use Different Data \u003ca id=\"use_different_data\"\u003e\u003c/a\u003e\n\nThe default input compound/property dataset is intentionally very small, so that the containers will initialize quickly for demo purposes.\n\nTo use arbitrary data, follow the below steps.\n\nAs an example, we illustrate how to use a \"medium-size\" dataset containing 20,267 compounds taken from the [mmpdb publication](https://pubs.acs.org/doi/10.1021/acs.jcim.8b00173), and referenced in [our publication](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.3c00015).\n\n1. Add raw data to the matcher/backend/initialize_db directory. Two files are required, a third file is optional. All files must begin with the same identifier: `your_dataset_name`, which in this example is `ChEMBL_CYP3A4_hERG`:\n    * **Required**: File containing compound SMILES and compound IDs.\n        * For this example, ChEMBL_CYP3A4_hERG_structures.smi is already included.\u003cbr\u003e\u003c/br\u003e\n    * **Required**: File containing compound IDs and property values.\n        * For this example, ChEMBL_CYP3A4_hERG_props.txt is already included.\u003cbr\u003e\u003c/br\u003e\n    * **Optional**: File containing metadata about the compound property data (whether the data is log transformed, the units, and how the data should be displayed to users).\n        * For this example, ChEMBL_CYP3A4_hERG_metadata.csv is already included. If you do not wish to provide metadata, edit out the `--metadata` argument from this line of code in `entrypoint.sh`: `conda run --no-capture-output -n matcher-api python $MMPDB_DIR/mmpdb.py loadprops -p \"${properties}\" --metadata \"${metadata}\" \"$postgres_schema\\$postgres\" \u0026\u0026 \\`. 
If no metadata file is provided, then by default, property labels and data will be displayed to users exactly as provided in the above property value file, and changes in a property between two compounds will be treated as differences (B - A).\n\u003cbr\u003e\u003c/br\u003e\n\n2. Edit matcher/backend/entrypoint.sh by setting `DATASET=your_dataset_name`, using `your_dataset_name` from step 1 above.\n\n\u003cbr\u003e\u003c/br\u003e\n\n3. Recreate the containers:\n    * Navigate to the parent directory (matcher/), and execute the following commands (note that using the --volumes flag deletes your previous matcher DB data):\n\n```\ndocker-compose down --volumes \u0026\u0026 \\\ndocker-compose build \u0026\u0026 \\\ndocker-compose up --force-recreate\n```\n\nThis time, around 20 minutes will be required to build the database (depending on the computer), due to the larger number of compounds in the input data as compared to the original Quick Start data.\n\n# Metadata Information \u003ca id=\"metadata_info\"\u003e\u003c/a\u003e\n\nOptionally, property metadata can be passed to the mmpdb loadprops command, for \nthe purpose of customizing how data is displayed to end users in matcher's web UI.\n\nIf metadata is not provided, then all data will be displayed as it exists in\nthe database, and all changes will be calculated and displayed as deltas (B-A).\n\nHere is an example property metadata table (values should be tab-separated):\n\n```\n  property_name base  unit  display_name  display_base  display_unit  change_displayed\n  hERG_pIC50  negative_log  M hERG_IC50_uM  raw uM  fold-change\n```\n\nExample:\n\n  % mmpdb loadprops --properties hERG.csv --metadata hERG_metadata.csv 'database$5432$postgres'\n\nIn the above case, hERG data is provided in the --properties file as a \nnegative log of molarity, but we provide additional metadata\nwhich causes the data to appear to users in the matcher web UI as micromolar IC50 values.\nThis is the most common base/unit conversion use case; not 
all conversions are supported.\n\nMetadata is stored in columns within the property_name table, and therefore\nis easy to modify (albeit manually with SQL) even after the database is built.\n\nSupported metadata options are below. Any other value, or a * value, will be\nconverted to the default. None means that the column value will be NULL in the property_name table.\n\n```\n  base [default=raw]: raw, log, negative_log \n  unit [default=None]: M, uM\n  display_name [default=property_name]: characters other than *\n  display_base [default=raw]: raw\n  display_unit [default=None]: uM, M\n  change_displayed [default=delta]: delta, fold-change\n```\n\n# OpenAPI for Backend \u003ca id=\"backend_OpenAPI\"\u003e\u003c/a\u003e\n\nMatcher's backend API can be used for querying and gathering results, independently of the frontend, if desired.\n\nThe backend API endpoints are documented in backend/openapi.json. This documentation can be viewed when the matcher application is running, at localhost:8001/docs.\n\n# Using mmpdb Commands \u003ca id=\"mmpdb_commands\"\u003e\u003c/a\u003e\n\nThe matcher database is an extended mmpdb database, and is backward-compatible with mmpdb commands.\n\nFor example, to run `mmpdb transform` with matcher's database, outputting results to `results.csv` within your local directory:\n\nFirst launch matcher as described in [Quick Start](#quick_start), then run this command:\n\n```\ndocker exec -it \\\n\"$(docker ps | grep 'matcher_backend' | awk '{ print $1 }')\" \\\nconda run -n matcher-api \\\npython /opt/mmpdb/mmpdb.py transform --smiles 'O=C1NC2=C(C=NC(OC)=N2)N=C1' 'public$postgres' \u003e results.csv\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmerck%2Fmatcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmerck%2Fmatcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmerck%2Fmatcher/lists"}