https://github.com/basf/autoadsorbate

Chemical intuition for surface science in a package.
atomistic-modelling atomistic-simulations conformer-generation dataset-generation smiles-strings surface-detection


{
"cells": [
{
"cell_type": "markdown",
"id": "5d816139-ae3f-4ad3-ae7a-f802e7615344",
"metadata": {},
"source": [
"## dev cells"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "689bcf2b-0278-425e-ba07-fa931abeb6f1",
"metadata": {},
"outputs": [],
"source": [
"from ase.visualize.plot import plot_atoms\n",
"\n",
"from autoadsorbate import Fragment, Surface"
]
},
{
"cell_type": "markdown",
"id": "afbf84ab-f714-4c5f-9a0b-9187123e709c",
"metadata": {},
"source": [
"## autoadsorbate"
]
},
{
"cell_type": "markdown",
"id": "d99a8a08-42a9-4ee9-97d4-37de91697265",
"metadata": {},
"source": [
"Initial structures for heterogeneous catalysis are traditionally built by hand. This package aims to offer an automated alternative.\n",
"\n",
"To simulate reactive behavior at surfaces, we first need clear definitions of the structures of interest:\n",
"\n",
"- __Fragment__: \n",
"    - Molecules - species that exist in their corresponding geometries __even when isolated from the surface__.\n",
"    - Reactive species - species that exist in their corresponding geometries __only when attached to the surface__.\n",
"- __Surface__:\n",
"    - The definition of the surface is simple - every atom of the slab that can be in contact with an intermediate is considered a surface atom. The surface is a collection of such atoms.\n",
"    - Every atom of the surface is a \"top\" site.\n",
"    - When two \"top\" sites are close (close in its literal meaning) to each other, they form a \"bridge\" site.\n",
"    - When three \"top\" sites are close (close in its literal meaning) to each other, they form a \"3-fold\" site.\n",
"    - etc.\n",
"- __Active Site__:\n",
"    - A collection of one or more sites that can facilitate a chemical transformation is called an active site.\n",
"    - A \"top\" site can be an active site only for Eley-Rideal transformations.\n",
"    - All other transformations require that at least one intermediate binds through at least two sites. All involved sites compose an active site.\n",
"- __Intermediate__:\n",
"    - Intermediates are fragments bound to an active site."
]
},
{
"cell_type": "markdown",
"id": "43a757a1-5a02-4a8d-9737-0252c6471a1d",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "7ae68f87-e920-4a7b-b9f8-0315a4b3a6a4",
"metadata": {},
"source": [
"The idea was to keep the package as light as possible, so it is built on ase and rdkit, along with a few basic Python packages (pandas, numpy, etc.)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b7dac144-3013-4cf3-9144-6b38b40cec0d",
"metadata": {},
"outputs": [],
"source": [
"# from autoadsorbate.autoadsorbate import Surface, Fragment\n",
"# from ase.io import read, write\n",
"# from ase.visualize import view\n",
"# from ase.visualize.plot import plot_atoms\n",
"# import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "e2e51886-9b71-4954-b2a9-d6846b5707ff",
"metadata": {},
"source": [
"### Fragment"
]
},
{
"cell_type": "markdown",
"id": "e7abcb94-3e04-4fef-aa8e-fee2d54df704",
"metadata": {},
"source": [
"#### Molecules"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "99f5dc2e-5778-44e8-8dd4-808faee575e1",
"metadata": {},
"outputs": [],
"source": [
"f = Fragment(input=\"COC\", to_initialize=5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b88aae20-0f80-4acd-beb8-ced4b73d321d",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import docs_plot_conformers\n",
"\n",
"conformer_trajectory = f.conformers\n",
"fig = docs_plot_conformers(conformer_trajectory)"
]
},
{
"cell_type": "markdown",
"id": "d89ea086-b0b5-4010-ae64-5e84ff39c63b",
"metadata": {},
"source": [
"Notice that the orientation of the fragment is arbitrary. We could simply paste these structures on a surface of some material, but it would be difficult to quantify the quality of the initial random guesses and hence how many structures we need to sample. We would then have to run dynamic simulations to probe for local minima and check which minima are the most stable.\n",
"\n",
"In the case of DME (dimethyl ether), we can use our knowledge of chemistry to simplify the problem. Since the O atom bridging the two methyl groups has two \"lone electron pairs,\" we can use a simple trick: replacing one of the lone pairs with a marker atom (let's use Cl)."
]
},
{
"cell_type": "markdown",
"id": "75d91009-08a3-4810-9ea2-3bdd3b410959",
"metadata": {},
"source": [
"\n",
"Notice that we had to make two adjustments to the SMILES string:\n",
"- to be able to replace the lone pair with a marker we must \"trick\" the valence of the O atom, and reshuffle the SMILES string so that the marker comes first (for easy bookkeeping)\n",
"    - ```COC``` original\n",
"    - ```CO(Cl)C``` add Cl instead of the O lone pair (this is not a valid SMILES: O would have three bonds)\n",
"    - ```C[O+](Cl)C``` trick to make the valence work\n",
"    - ```Cl[O+](C)C``` rearrange so that the SMILES string starts with the marker (for easy bookkeeping)"
]
},
{
"cell_type": "markdown",
"id": "d1ccd594-03ad-4406-b963-22cc2f8ee287",
"metadata": {},
"source": [
"This can also be done with a function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f7799f1-58be-4d0b-8253-079df53a0d85",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import get_marked_smiles\n",
"\n",
"marked_smile = get_marked_smiles([\"COC\"])[0]\n",
"marked_smile"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e32efc5-795c-4d7f-84b5-530042c8163e",
"metadata": {},
"outputs": [],
"source": [
"f = Fragment(input=\"Cl[O+](C)(C)\", to_initialize=5)\n",
"len(f.conformers)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "688e74c5-6400-4c27-8da5-a5d4eab3832d",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import docs_plot_conformers\n",
"\n",
"conformer_trajectory = f.conformers\n",
"fig = docs_plot_conformers(conformer_trajectory)"
]
},
{
"cell_type": "markdown",
"id": "048c6751-60ea-4692-a6c6-914e06e76a89",
"metadata": {},
"source": [
"Now we can use the marker atom to orient our molecule:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b6c6673-7b75-41ac-91ce-6d3f98e5540a",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import docs_plot_sites\n",
"\n",
"oriented_conformer_trajectory = [f.get_conformer(i) for i, _ in enumerate(f.conformers)]\n",
"fig = docs_plot_conformers(oriented_conformer_trajectory)"
]
},
{
"cell_type": "markdown",
"id": "0fa177ac-1f08-4488-8715-18a449cb1859",
"metadata": {},
"source": [
"We can also easily remove the marker:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08cd4bd3-85bf-4fd0-874f-55d3087f81b6",
"metadata": {},
"outputs": [],
"source": [
"clean_conformer_trajectory = [atoms[1:] for atoms in oriented_conformer_trajectory]\n",
"fig = docs_plot_conformers(clean_conformer_trajectory)"
]
},
{
"cell_type": "markdown",
"id": "dc13f155-13a0-437e-8a4e-f0ce7b9ab832",
"metadata": {},
"source": [
"#### Reactive species "
]
},
{
"cell_type": "markdown",
"id": "7586f51e-ab4a-4535-a6f9-7d1434d3d2bf",
"metadata": {},
"source": [
"##### Methoxy"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "adbae14c-f656-4659-b9cf-e2ad3e0d5188",
"metadata": {},
"outputs": [],
"source": [
"f = Fragment(input=\"ClOC\", to_initialize=5)\n",
"oriented_conformer_trajectory = [f.get_conformer(i) for i, _ in enumerate(f.conformers)]\n",
"fig = docs_plot_conformers(oriented_conformer_trajectory)"
]
},
{
"cell_type": "markdown",
"id": "ee45e5bc-04f4-43a1-bf22-8364e45e5dfa",
"metadata": {},
"source": [
"##### Methyl"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f13b7559-9242-4f0a-995a-3c60e4051386",
"metadata": {},
"outputs": [],
"source": [
"f = Fragment(input=\"ClC\", to_initialize=5)\n",
"oriented_conformer_trajectory = [f.get_conformer(i) for i, _ in enumerate(f.conformers)]\n",
"fig = docs_plot_conformers(oriented_conformer_trajectory)"
]
},
{
"cell_type": "markdown",
"id": "5dea02c1-8d39-46eb-8992-f3126fd70946",
"metadata": {},
"source": [
"##### Fragments with more than one binding mode (e.g. 1,2-PDO)"
]
},
{
"cell_type": "markdown",
"id": "ac40c5c3-da5b-46cc-bc5a-ff14b8bfc16c",
"metadata": {},
"source": [
"Bound through a single site:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "806d656e-48c3-4cba-afde-c9c6cb045a10",
"metadata": {},
"outputs": [],
"source": [
"f = Fragment(input=\"Cl[OH+]CC(O)C\", to_initialize=5)\n",
"oriented_conformer_trajectory = [f.get_conformer(i) for i, _ in enumerate(f.conformers)]\n",
"fig = docs_plot_conformers(oriented_conformer_trajectory)"
]
},
{
"cell_type": "markdown",
"id": "2910a638-60bc-40be-8684-f62623cd89f2",
"metadata": {},
"source": [
"Coordinated through both hydroxyl groups:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "306e3fb4-e542-409c-8a7f-fdce06b27440",
"metadata": {},
"outputs": [],
"source": [
"f = Fragment(input=\"S1S[OH+]CC([OH+]1)C\", to_initialize=5)\n",
"oriented_conformer_trajectory = [f.get_conformer(i) for i, _ in enumerate(f.conformers)]\n",
"fig = docs_plot_conformers(oriented_conformer_trajectory)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9afb592-cf60-4e5b-9e6d-c7501b5fcf75",
"metadata": {},
"outputs": [],
"source": [
"from ase.io import read\n",
"f = Fragment(input = read('my_slab_with_adsorbate.traj'), to_initialize = 5)\n",
"oriented_conformer_trajectory = [f.get_conformer(i) for i, _ in enumerate(f.conformers)]"
]
},
{
"cell_type": "markdown",
"id": "6bcb273a-56e1-4969-ba19-a30753730388",
"metadata": {},
"source": [
"### Surface"
]
},
{
"cell_type": "markdown",
"id": "ad9b99ac-ee89-4142-b041-c41fb1860408",
"metadata": {},
"source": [
"First we need a slab: an arrangement of atoms that contains the boundary between the material in question and another phase (e.g. gas, fluid, or another material). We can read one we prepared earlier (```ase.io.read('path_to_file')```), or we can use ase to construct a new slab:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13950877-b3a8-4ad6-b35e-15610a4e7b5b",
"metadata": {},
"outputs": [],
"source": [
"from ase.build import fcc111\n",
"\n",
"slab = fcc111(\"Cu\", (4, 4, 4), periodic=True, vacuum=10)"
]
},
{
"cell_type": "markdown",
"id": "685babf1-0388-4bd5-9273-d1fcafa7fc39",
"metadata": {},
"source": [
"Now we can initialize the Surface object, which associates the constructed slab (ase.Atoms) with the additional information required for placing Fragments.\n",
"We can view which atoms are in the surface:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6808f6f8-9a73-4534-96e8-742f0e5f0498",
"metadata": {},
"outputs": [],
"source": [
"s = Surface(slab)\n",
"plot_atoms(s.view_surface(return_atoms=True))"
]
},
{
"cell_type": "markdown",
"id": "529f95c0-4c02-4023-bb37-b38083037b07",
"metadata": {},
"source": [
"All site information is available as a pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "45a4abf2-0012-4de2-bde0-208ce9f15124",
"metadata": {},
"outputs": [],
"source": [
"s.site_df.head()"
]
},
{
"cell_type": "markdown",
"id": "99aee59b-38fe-45d4-84ca-67c5534bf213",
"metadata": {},
"source": [
"or in dict form:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89541ad9-e859-447e-a69e-fdc1982dad67",
"metadata": {},
"outputs": [],
"source": [
"s.site_dict.keys()"
]
},
{
"cell_type": "markdown",
"id": "fae84996-b0a3-49db-87f8-d118669a1782",
"metadata": {},
"source": [
"Each site can be retrieved as an ase.Atoms object, with useful information stored in its ase.Atoms.info dictionary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "63771bd4-debf-4a8c-9b5c-9b15769138fd",
"metadata": {},
"outputs": [],
"source": [
"site_atoms = s.view_site(0, return_atoms=True)\n",
"site_atoms.info"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0115fe50-0e06-4a2b-9391-12edca635e5c",
"metadata": {},
"outputs": [],
"source": [
"fig = docs_plot_sites(s)"
]
},
{
"cell_type": "markdown",
"id": "3f43d66e-beee-46c4-9773-339835ec8565",
"metadata": {},
"source": [
"We can keep only the symmetry-unique sites like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bd3ed03-b0df-4404-a9d1-7e36136d8984",
"metadata": {},
"outputs": [],
"source": [
"s.sym_reduce()\n",
"s.site_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ecfc8ed1-e632-46b8-ad85-e67d3df26db2",
"metadata": {},
"outputs": [],
"source": [
"plot_atoms(s.view_surface(return_atoms=True))"
]
},
{
"cell_type": "markdown",
"id": "5ae799f7-e30c-44b8-a669-3cababcb3c8e",
"metadata": {},
"source": [
"## Making surrogate SMILES automatically"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "02b246fa-fc48-4c8c-9d6b-26b6e0086796",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import _example_config\n",
"\n",
"_example_config"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a874270-94d6-4be3-bde2-30e90e746743",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import construct_smiles\n",
"\n",
"config = {\n",
"    \"backbone_info\": {\"C\": 0, \"O\": 0, \"N\": 2},\n",
"    \"allow_intramolec_rings\": True,\n",
"    \"ring_marker\": 2,\n",
"    \"side_chain\": [\"(\", \")\"],\n",
"    \"brackets\": [\"[\", \"]\", \"H+]\", \"H2+]\", \"H3+]\"],\n",
"    \"make_labeled\": True,\n",
"}\n",
"\n",
"smiles = construct_smiles(config)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "382a82eb-d4cb-4eb9-a03c-3f166c62ef00",
"metadata": {},
"outputs": [],
"source": [
"smiles"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a0e9320-7c9c-49ef-bea3-cc3d7d6853f7",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import Fragment\n",
"\n",
"trj = []\n",
"for s in smiles:\n",
"    try:\n",
"        f = Fragment(s, to_initialize=1)\n",
"        a = f.get_conformer(0)\n",
"        trj.append(a)\n",
"    except Exception:\n",
"        # skip SMILES for which no conformer could be generated\n",
"        pass\n",
"\n",
"lst = [z for z in zip([a.get_chemical_formula() for a in trj], trj)]\n",
"lst.sort(key=lambda tup: tup[0])\n",
"trj = [a[1] for a in lst]\n",
"len(trj)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a734dcdd-74ca-4aa8-b3d3-ff7595d3d00a",
"metadata": {},
"outputs": [],
"source": [
"from autoadsorbate import get_drop_snapped\n",
"\n",
"xtrj = get_drop_snapped(trj, d_cut=1.5)\n",
"len(xtrj)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe8e6b91-c450-4b0a-9479-98a5fc2167a4",
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from ase import Atoms\n",
"from ase.visualize.plot import plot_atoms\n",
"\n",
"fig, axs = plt.subplots(3, 11, figsize=[10, 5], dpi=100)\n",
"\n",
"for i, ax in enumerate(axs.flatten()):\n",
"    try:\n",
"        platoms = xtrj[i].copy()\n",
"\n",
"    except IndexError:\n",
"        # fewer structures than axes: plot a dummy atom as a placeholder\n",
"        platoms = Atoms(\"X\", positions=[[0, 0, 0]])\n",
"\n",
"    # recolor the marker atoms (Cl, S) so they stand out in the plot\n",
"    for atom in platoms:\n",
"        if atom.symbol in [\"Cl\", \"S\"]:\n",
"            atom.symbol = \"Ga\"\n",
"    plot_atoms(platoms, rotation=(\"-90x,0y,0z\"), ax=ax)\n",
"    ax.set_axis_off()\n",
"    ax.set_xlim(-1, 5)\n",
"    ax.set_ylim(-0.5, 5.5)\n",
"\n",
"fig.set_layout_engine(layout=\"tight\")"
]
},
{
"cell_type": "markdown",
"id": "418ee1c3-7c01-4601-89dc-cfb8316e47bb",
"metadata": {},
"source": [
"## Fully automatic - populate Surface with Fragment"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "decbf165-1a48-49a0-b16b-8c63e1d439d5",
"metadata": {},
"outputs": [],
"source": [
"from ase.build import fcc211\n",
"\n",
"from autoadsorbate import Fragment, Surface\n",
"\n",
"slab = fcc211(symbol=\"Cu\", size=(6, 3, 3), vacuum=10)\n",
"s = Surface(slab, touch_sphere_size=2.7)\n",
"s.sym_reduce()\n",
"\n",
"fragments = [\n",
"    Fragment(\"S1S[OH+]CC(N)[OH+]1\", to_initialize=20),\n",
"    Fragment(\"Cl[OH+]CC(=O)[OH+]\", to_initialize=5),\n",
"]\n",
"\n",
"out_trj = []\n",
"for fragment in fragments:\n",
"    out_trj += s.get_populated_sites(\n",
"        fragment,\n",
"        site_index=\"all\",\n",
"        sample_rotation=True,\n",
"        mode=\"heuristic\",\n",
"        conformers_per_site_cap=5,\n",
"        overlap_thr=1.6,\n",
"        verbose=True,\n",
"    )\n",
"    print(\"out_trj \", len(out_trj))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48db2ef6-4216-4688-a2bb-f24a989a5ad3",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}