{"id":18524306,"url":"https://github.com/mdanalysis/rdkitconverter-benchmark","last_synced_at":"2026-01-23T20:39:31.600Z","repository":{"id":39878617,"uuid":"471816709","full_name":"MDAnalysis/RDKitConverter-benchmark","owner":"MDAnalysis","description":"Benchmark for the RDKitConverter's inferring of bond orders and charges","archived":false,"fork":false,"pushed_at":"2024-08-07T18:17:13.000Z","size":78787,"stargazers_count":3,"open_issues_count":2,"forks_count":0,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-10-29T17:32:42.677Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MDAnalysis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-19T21:29:47.000Z","updated_at":"2024-08-07T18:10:40.000Z","dependencies_parsed_at":"2024-04-15T07:34:27.556Z","dependency_job_id":"4590fb69-3609-4e2b-9b81-768e29c69b86","html_url":"https://github.com/MDAnalysis/RDKitConverter-benchmark","commit_stats":{"total_commits":34,"total_committers":1,"mean_commits":34.0,"dds":0.0,"last_synced_commit":"8b6e4f57dd9bc58f19952576dc7c3ed16ec80815"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDAnalysis%2FRDKitConverter-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDAnalysis%2FRDKitConverter-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDAnalysis%2FRDKitConve
rter-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDAnalysis%2FRDKitConverter-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MDAnalysis","download_url":"https://codeload.github.com/MDAnalysis/RDKitConverter-benchmark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239242066,"owners_count":19605946,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T17:40:31.635Z","updated_at":"2025-10-31T23:30:21.965Z","avatar_url":"https://github.com/MDAnalysis.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RDKitConverter benchmark\n\nThis repository benchmarks the ability of MDAnalysis' `RDKitConverter` to infer bond orders and charges from molecules with all hydrogens explicit.\n\nTo cite this repository, please use the following DOI:\n\n[![DOI](https://zenodo.org/badge/471816709.svg)](https://zenodo.org/badge/latestdoi/471816709)\n\n## Results\n\n![current accuracy](https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2FMDAnalysis%2FRDKitConverter-benchmark%2Fmain%2Fresults%2Fbadge.json)\n\n| Description | Value |\n| --- | --- |\n| **MDAnalysis version** | 2.4.3 |\n| **Accuracy** | 99.19% |\n| **Number of molecules fetched** | 2,372,174 |\n| **Number of molecules processed** | 2,166,327 |\n| **Number of molecules failed** | 17,577 |\n\nDetails on the benchmark can also be found [here](results/results.json).\n\nThe **interactive list of molecules** 
currently failing can be accessed [here](https://raw.githack.com/MDAnalysis/RDKitConverter-benchmark/main/results/failed_molecules.html) (click on a molecule's image to zoom in).\n\nFailing **scaffolds** can be accessed [here](https://raw.githack.com/MDAnalysis/RDKitConverter-benchmark/main/results/failed_scaffolds.html). The scaffold network used to\ncreate this file can be viewed [here](https://raw.githack.com/MDAnalysis/RDKitConverter-benchmark/main/results/scaffold_network.html).\n\n## Instructions\n\nRunning the benchmark requires conda (or mamba) on a Linux machine.\n\nStart by cloning this repository:\n```shell\ngit clone https://github.com/MDAnalysis/RDKitConverter-benchmark.git\ncd RDKitConverter-benchmark\n```\n\nThen install the Python dependencies with `make install`:\n```shell\n# to speed things up you can use mamba:\nmake install CONDA=mamba\n```\nThis will create a separate conda environment called `rdkitconverter`.\n\nFinally, run the benchmark:\n```shell\nmake\n```\n\nRun `make help` to get a list of available commands.\n\nThe results are available in the `results/` directory:\n- `results.json`, a JSON file listing all the necessary information\n- `failed_molecules.smi`, a SMILES file containing the molecules that failed the test\n- `failed_molecules.html`, an interactive table displaying the failed molecules\n\n## Methods\n\nThe benchmark will fetch ChEMBL 33 as an SDF file and process the molecules as follows:\n- Discard molecules that could not be read or sanitized by RDKit\n- Keep only the largest fragment\n- Keep only molecules with 2 to 50 heavy atoms\n- Discard molecules with radicals\n- Drop duplicate molecules based on their InChIKey\n\nOnce the data is fetched and standardized, the benchmark can be run. The benchmark will start by preparing a \"reduced\" version of the molecule by adding explicit hydrogen atoms and removing bond orders and formal charges. 
This is done to mimic the minimal information available in most topology files for MD simulations.\n\nThe RDKitConverter might give different results depending on the order of atoms in the molecule. For that reason, the benchmark will enumerate reordered versions of the molecule so that each atom appears in the first position once.  \nThis is done by reading a SMILES of the molecule rooted at the given atom, so that the other atoms of the molecule are reordered in a realistic way.\n\nFinally, the reordered \"reduced\" molecule goes through the MDAnalysis code responsible for inferring bond orders and formal charges.\n\nDuring the enumeration of reordered molecules, if any of the inferred molecules fails to match the original molecule or one of its resonance structures, the whole test fails for that molecule.\n\nPrototype code:\n```python\ndef benchmark_mol(reference_mol):\n    # add explicit hydrogens, then strip bond orders and formal charges\n    reduced_mol = remove_bond_orders_and_charges(reference_mol)\n    # try every reordering, with each atom in the first position once\n    for mol in enumerate_reordered_mol(reduced_mol):\n        mol = infer_bond_orders_and_charges(mol)\n        valid = is_same_mol_or_resonance_structure(mol, reference_mol)\n        # a single mismatch fails the whole test for this molecule\n        if not valid:\n            return \"FAILED\"\n    return \"SUCCESS\"\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdanalysis%2Frdkitconverter-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmdanalysis%2Frdkitconverter-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdanalysis%2Frdkitconverter-benchmark/lists"}