{"id":27950590,"url":"https://github.com/arhs99/rdkit-cpp-utils","last_synced_at":"2025-07-16T20:35:37.695Z","repository":{"id":285954155,"uuid":"959816472","full_name":"Arhs99/rdkit-cpp-utils","owner":"Arhs99","description":"rdkit scripts for C++","archived":false,"fork":false,"pushed_at":"2025-04-03T14:01:54.000Z","size":30,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-07T16:15:04.001Z","etag":null,"topics":["cheminformatics","chemistry","rdkit"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Arhs99.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-03T11:57:30.000Z","updated_at":"2025-04-04T00:04:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"60848a1e-771c-41ec-9ead-f300c9c3b743","html_url":"https://github.com/Arhs99/rdkit-cpp-utils","commit_stats":null,"previous_names":["arhs99/rdkit-cpp-utils"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Arhs99/rdkit-cpp-utils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arhs99%2Frdkit-cpp-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arhs99%2Frdkit-cpp-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arhs99%2Frdkit-cpp-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arhs99%2Frdkit-cpp-utils/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/Arhs99","download_url":"https://codeload.github.com/Arhs99/rdkit-cpp-utils/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Arhs99%2Frdkit-cpp-utils/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265538611,"owners_count":23784621,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cheminformatics","chemistry","rdkit"],"created_at":"2025-05-07T16:15:03.282Z","updated_at":"2025-07-16T20:35:37.688Z","avatar_url":"https://github.com/Arhs99.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# rdkit-cpp-utils\n\n## Description\nrdkit is the go-to toolkit for anything related to cheminformatics and data science applications in chemistry. It is mostly used from Python, which shouldn't be underestimated performance-wise. Still, for large amounts of data, getting some help from C++, the native language of rdkit, makes sense. A couple of good starting points are https://www.rdkit.org/docs/GettingStartedInC%2B%2B.html and https://github.com/iwatobipen/rdkit_cpp/tree/main. The aim of this repo is to collect some standard functionality in C++ for cases where increased efficiency is required.\n\n## Installation\nThis is the hard and less fun part. This is what worked for me:\n1. Compile **rdkit** from source following https://greglandrum.github.io/rdkit-blog/posts/2023-03-17-setting-up-a-cxx-dev-env2.html. One can also look at https://github.com/rdkit/rdkit/tree/master/.azure-pipelines. 
After a successful set-up you should have a conda environment and all the environment variables set as described in the links.\n2. Clone the repository and activate the conda environment that was created in Step 1:\n   ```\n   git clone https://github.com/Arhs99/rdkit-cpp-utils.git\n   cd rdkit-cpp-utils\n   conda activate py310_rdkit_build\n   ```\n3. Environment variables: at least ```RDBASE``` and ```LD_LIBRARY_PATH``` should be set; if not:\n```\nexport LD_LIBRARY_PATH=${RDBASE}/lib:${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}\n```\n4. Create a ```build``` directory and, from inside it, run the following to compile and link:\n```\ncmake ..\nmake\n```\nThis should create the executable ```morganfp```.\n\n## Usage\nOne can use either ```morganfp``` or ```fingerprint.py``` to calculate Morgan fingerprints from an SDF file and store them as NumPy array files (```*.npy```). Additional arguments are the fingerprint radius and the number of bits.\n### C++ example\nRun as:\n```\n./morganfp ../data/ChEMBL_set.sdf  ../data/arr_cpp.npy 3 2048\n```\n\n### Python example\n```\npython fingerprints.py data/ChEMBL_set.sdf  data/arr_py.npy 3 2048\n```\n## Acknowledgment\nThe files ```cnpy.cpp``` and ```cnpy.h```, used for loading/saving C++ data as NumPy ```.npy``` files, were copied from the **cnpy** library: https://github.com/rogersce/cnpy\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farhs99%2Frdkit-cpp-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farhs99%2Frdkit-cpp-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farhs99%2Frdkit-cpp-utils/lists"}