{"id":44527378,"url":"https://github.com/photophys/structure_clustering","last_synced_at":"2026-02-18T17:01:10.655Z","repository":{"id":257784374,"uuid":"860306276","full_name":"photophys/structure_clustering","owner":"photophys","description":"Python package to cluster molecular structures into groups of similar ones.","archived":false,"fork":false,"pushed_at":"2026-02-13T16:34:28.000Z","size":42,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-14T00:35:35.534Z","etag":null,"topics":["chemistry","clustering","graphs","molecules"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/photophys.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-09-20T07:37:25.000Z","updated_at":"2026-02-13T16:33:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"ed7f8be3-34d0-438c-8b3b-72c06ebe580d","html_url":"https://github.com/photophys/structure_clustering","commit_stats":null,"previous_names":["photophys/structure_clustering"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/photophys/structure_clustering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/photophys%2Fstructure_clustering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/photophys%2Fstructure_clustering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/photophys%2Fstructure_clustering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/photophys%2Fstructure_clustering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/photophys","download_url":"https://codeload.github.com/photophys/structure_clustering/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/photophys%2Fstructure_clustering/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29587066,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T16:55:40.614Z","status":"ssl_error","status_checked_at":"2026-02-18T16:55:37.558Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemistry","clustering","graphs","molecules"],"created_at":"2026-02-13T18:13:19.297Z","updated_at":"2026-02-18T17:01:10.646Z","avatar_url":"https://github.com/photophys.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# structure_clustering \u0026ndash; Cluster Molecular Structures Into Groups of Similar Ones\n\n**structure_clustering** is a Python package to cluster molecular structures into groups of similar ones. 
Our approach involves analysing the intermolecular distances to represent each structure's connectivity as an undirected, vertex-labelled graph. It then uses graph isomorphism to identify structures that belong to the same group. The package offers a command-line interface for clustering a multi-XYZ file or can be used within your Python code.\n\n\u003cimg src=\"https://github.com/user-attachments/assets/fef206d6-e039-49ce-911d-627068841853\" width=\"50%\" /\u003e[^1]\n\n[^1]: The figure shows exemplary clusters from Ag⁺(H₂O)₄ structures.\n\n## Installation\n\nYou can install structure_clustering via pip:\n\n```bash\npip install structure_clustering\n```\n\nPrebuilt wheels are available for most platforms (Windows, Linux, macOS). If you prefer to compile and build the wheel yourself, ensure that the [Boost Graph Library](https://www.boost.org/doc/libs/release/libs/graph/doc/index.html) is installed system-wide.\n\nTo upgrade to the latest available version, run:\n\n```bash\npip install structure_clustering --upgrade\n```\n\n## Using the Command-Line Interface\n\nYou can invoke the structure_clustering script using the `structure_clustering` command.\n\n\u003cdetails\u003e\n  \u003csummary\u003eUse this method if the command does not work\u003c/summary\u003e\n\nOn some systems, scripts installed via pip are not added to the system's `PATH`. 
You can either [add](https://stackoverflow.com/a/70680333/17726525) them to your `PATH`, or run the script directly by invoking `python3 -m structure_clustering`.\n\n\u003c/details\u003e\n\n```bash\nusage: structure_clustering \u003cxyz_file\u003e [--config CONFIG] [--output OUTPUT] [--disconnected]\n\nCluster molecular structures into groups.\n\npositional arguments:\n  xyz_file         path of the multi-xyz-file containing the structures\n\noptions:\n  --config CONFIG  path of the config TOML file\n  --output OUTPUT  path of the resulting output file, defaults to \u003cxyz_file\u003e.sc.dat\n  --disconnected   if you want to include disconnected graphs\n  -h, --help       show this help message and exit\n```\n\nFor example, to cluster an xyz file:\n\n```bash\nstructure_clustering my_structures.xyz\n```\n\nTo specify a custom distance for recognising O-H connectivity (see the next section), use a TOML config file:\n\n```bash\nstructure_clustering my_structures.xyz --config sc_config.toml\n```\n\nIn both cases, a file named `my_structures.xyz.sc.dat` will be created, which you can import at \u003ca href=\"https://photophys.github.io/cluster-vis/\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/photophys/MOLGA.jl/refs/heads/main/docs/src/assets/logo.svg\" height=\"15px\" /\u003e https://photophys.github.io/cluster-vis/\u003c/a\u003e to visualise the results of your clustering process.\n\nThe terminal output will look like this:\n\n```\nLoading configuration from demo_config.toml\nUsing covalent radius of 1.59 for Ag\nUsing pair distance of 2.3 for O-H\nClustering does not include disconnected graphs\n\nUsing 437 structures from structures.xyz\nClustering finished \u003cstructure_clustering._core.Result object at 0x7f7c949c37b0\u003e\n  14 clusters (total 318 structures)\n  13 unique single structures\n  132 (30.21%) structures sorted out (305 remaining)\n  cluster size: Avg=22.7 Med=4.5 Q1=2.2 Q3=23.5\n  connections/structure: Avg=12.2 Med=12.0 Q1=12.0 
Q3=12.0 (all 437)\n  connections/structure: Avg=12.4 Med=12.0 Q1=12.0 Q3=12.0 (remaining 305)\nWriting output file to structures.xyz.sc.dat ...\n\n🚀 Open https://photophys.github.io/cluster-vis/ to visualize your results\n```\n\n## Configuration File\n\nYou can use a TOML file to control the parameters of the command-line interface. The `[covalent]` section allows you to override the algorithm's default covalent radii. In the `[pair]` section, you can specify a maximum distance for specific pairs of elements, such as O-H.\n\n```toml\n[covalent]\nHe = 0.9\nAg = 1.59\n\n[pair]\nO-H = 2.3\n\n[options]\nonly_connected_graphs = true\n```\n\nAll settings are optional. Distances are given in angstroms. Element symbols are case-sensitive. If you specify `only_connected_graphs` in the config file, it overrides the `--disconnected` command-line switch.\n\n## Example Code\n\n### Simple Example\n\n```py\nimport structure_clustering\nfrom structure_clustering import Structure, Atom\n\nsc_machine = structure_clustering.Machine()\n\nsc_machine.setCovalentRadius(1, 0.42)  # change hydrogen covalent radius to 0.42\nsc_machine.addPairDistance(8, 1, 2.3)  # extend max distance for O-H pairs to 2.3 Ang\n\nsc_machine.setOnlyConnectedGraphs(True)  # only include fully connected graphs (default)\n\n# you will need some structures\npopulation = structure_clustering.import_multi_xyz(\"structs.xyz\")\n\n# you can also create your structures programmatically\nstructure = Structure()\nstructure.addAtom(Atom(8, -1.674872668, 0.0, -0.984966492))\nstructure.addAtom(Atom(1, -1.674872668, 0.759337, -0.388923492))\nstructure.addAtom(Atom(1, -1.674872668, -0.759337, -0.388923492))\npopulation += [structure]  # add this structure to our population\n\nsc_result = sc_machine.cluster(population)\n\nprint(\"clusters\", sc_result.clusters)\nprint(\"singles\", sc_result.singles)\n\n# Output (indices from the original structure list):\n# clusters [[0, 11], [1, 2, 4, 6, 12, 13, 14, 15, 19], [3, 17, 18, 23]]\n# singles [9, 16, 
22]\n```\n\n### Use Structure Hashing to Keep Track of Clusters Across Multiple Program Runs\n\nGraphs do not have a natural ordering of vertices. [Weisfeiler-Lehman](https://en.wikipedia.org/wiki/Weisfeiler_Leman_graph_isomorphism_test) (WL) refinement creates a canonical, order-independent description of a graph’s structure.\n\n1. Start with simple labels (element names, not unique).\n2. Repeatedly update each label using:\n   - the current label of the vertex\n   - the [multiset](https://en.wikipedia.org/wiki/Multiset) of neighbor labels\n3. After several iterations, vertices with different local structures almost always\n   have different labels.\n\nAssuming you have already clustered your structures, you have access to the following properties and methods:\n\n```py\nstructures = sc_result.structures\n\nstructure = structures[5]  # as example\nprint(\"num atoms\", structure.numAtoms)\nprint(\"first atomic number\", structure.getAtom(0).atomic_number)\nprint(\"first atom pos x\", structure.getAtom(0).position.x)\nprint(\"num connections\", structure.numConnections)\nprint(\"num fragments\", structure.numFragments)\nprint(\"hash\", structure.getHash())\nprint(\"atom indices for first fragment\", structure.getFragmentAtomIndices(0))\nprint(\"atom indices for second fragment\", structure.getFragmentAtomIndices(1))\n```\n\nThe output will look like this:\n\n```\nnum atoms 13\nfirst atomic number 8\nfirst atom pos x 2.026548\nnum connections 11\nnum fragments 2\nhash 0504d8ff3dc965c0\natom indices for first fragment [0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12]\natom indices for second fragment [8, 9]\n```\n\nExample structure with index `5`:\n![Structure Clustering example with two fragments](https://github.com/user-attachments/assets/8b3560e8-c334-4ac5-beac-e7ee47e2633d)\n\n## License\n\nThe structure_clustering package is licensed under the MIT License. 
See the [LICENSE file](LICENSE) for more details.\n\n## Contribute\n\nLocal development requires a C++ compiler, CMake, and Python with `setuptools`.\n\nTo compile only the C++ code with CMake, run:\n\n```bash\nmkdir build\ncd build\ncmake ..\ncmake --build .\n```\n\nFor the full build process (Python and C++), a Python virtual environment is highly recommended. Most systems will not allow installation without one.\n\n_This tutorial assumes a WSL environment, but all WSL commands can also be executed on most other Linux systems._\n\nStart from the project root folder (no `build` folder required).\n\nCreate a virtual environment inside the WSL filesystem (outside of the mounted Windows filesystem, otherwise performance will be very poor):\n\n```bash\npython -m venv ~/venvs/structure_clustering_dev\n```\n\nActivate the virtual environment:\n\n```bash\nsource ~/venvs/structure_clustering_dev/bin/activate\n```\n\nThen install the package with:\n\n```bash\npip install .\n```\n\nYou can now iteratively change the code (either C++ or Python files) and test it using a Python script executed from the same virtual environment (most easily from the project folder). Rerun `pip install .` after each change so that the installed package reflects it.\n\nReminder: If you add a new method or property, you must also expose it in the `main.cpp` pybind11 definitions.\n\nPushing to the main branch will trigger the GitHub Actions workflow, which builds the Python wheels for a matrix of platforms and Python versions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphotophys%2Fstructure_clustering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphotophys%2Fstructure_clustering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphotophys%2Fstructure_clustering/lists"}