An open API service indexing awesome lists of open source software.

https://github.com/nbehrnd/xyz2mol_b

based on a structure provided as .xyz file, attempt the generation of a .sdf block
https://github.com/nbehrnd/xyz2mol_b

computational-chemistry conversion rdkit sdf structure xyz

Last synced: 2 months ago
JSON representation

based on a structure provided as .xyz file, attempt the generation of a .sdf block

Awesome Lists containing this project

README

          

Code style: black

# background

Based on an algorithm by Kim and Kim[^1], Jan Jensen implemented
xyz2mol[^2] relying on RDKit to convert .xyz files into .sdf including
assignment of bond orders different than one. As presented in 2020
([recording](https://www.youtube.com/watch?v=HD6IpXMVKeo), [pdf
slides](https://github.com/rdkit/UGM_2020/blob/master/Presentations/JanJensen.pdf)),
the approach works best on small molecules which do not carry
organometallic bonds. With release of RDKit 2022.09.3, this
functionality became available in RDKit,[^3] and was presented in Greg
Landrum's blog,[^4] too.

This repository provides a moderator script to provide this
functionality until RDKit as packaged by DebiChem (for Debian, Ubuntu,
etc) would catch up (cf. notes in repology[^5]) however can be "handy"
for a rapid conversion if one forgot the required syntax in RDKit.

If you get a copy by cloning / downloading a .zip archive from GitHub,
resolve the dependencies with

``` shell
pip -r install requirements.txt
```

For an easier systemwide deployment, the [release
page](https://github.com/nbehrnd/xyz2mol_b/releases) provides a platform
independent Python wheel which in turn resolves dependencies like RDKit
and numpy from the PyPI.

# intended use

By default, the submitted structure in the input .xyz file is presumed
to be balanced and overall neutral (see for instance `ethane.xyz` in
subfolder `tests`). Call e.g.,

``` shell
$ cat ethane.xyz
8
charge 0, ethane
C -4.58735 0.92696 0.00000
C -3.11050 0.92696 0.00000
H -4.93786 1.78883 0.58064
H -4.93786 -0.00682 0.45608
H -4.93786 0.99888 -1.03672
H -2.75999 0.85505 1.03672
H -2.75998 1.86075 -0.45608
H -2.75998 0.06509 -0.58064
$ xyz2mol_b ethane.xyz

RDKit 3D

8 7 0 0 0 0 0 0 0 0999 V2000
-4.5873 0.9270 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.1105 0.9270 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.9379 1.7888 0.5806 H 0 0 0 0 0 0 0 0 0 0 0 0
-4.9379 -0.0068 0.4561 H 0 0 0 0 0 0 0 0 0 0 0 0
-4.9379 0.9989 -1.0367 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.7600 0.8550 1.0367 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.7600 1.8607 -0.4561 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.7600 0.0651 -0.5806 H 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0
3 1 1 0
4 1 1 0
5 1 1 0
6 2 1 0
7 2 1 0
8 2 1 0
M END

```

which can be redirected into a permanent record. For input structures
like the acetate anion (file `acetate.xyz`, again sub folder `tests`),
or the tetramethylammonium cation (file `NMe4_cation.xyz`, a successful
conversion requires a user assigned overall charge indicated by either
flag `--charge` (or `-c`). This can be combined freely with the optional
flag `--x3` to report the structure in the syntax of sdf (V3000)

``` shell
$ xyz2mol_b acetate.xyz -c -1

RDKit 3D

7 6 0 0 0 0 0 0 0 0999 V2000
-4.7169 0.8992 0.0571 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.2490 0.9840 -0.2283 C 0 0 0 0 0 0 0 0 0 0 0 0
-5.0417 1.7438 0.6786 H 0 0 0 0 0 0 0 0 0 0 0 0
-5.0171 -0.0221 0.5634 H 0 0 0 0 0 0 0 0 0 0 0 0
-5.2108 0.9687 -0.9121 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.6591 2.0570 -0.3402 O 0 0 0 0 0 0 0 0 0 0 0 0
-2.6341 -0.1870 -0.4868 O 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0
3 1 1 0
4 1 1 0
5 1 1 0
6 2 2 0
7 2 1 0
M CHG 1 7 -1
M END

```

and

``` shell
$ xyz2mol_b NMe4_cation.xyz -c 1 --x3

RDKit 3D

0 0 0 0 0 0 0 0 0 0999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 17 16 0 0 0
M V30 BEGIN ATOM
M V30 1 N 1.068110 0.072580 0.009700 0 CHG=1
M V30 2 C 0.562820 -1.304500 0.392100 0
M V30 3 C 0.562820 1.092290 1.011060 0
M V30 4 C 0.562800 0.429910 -1.374090 0
M V30 5 C 2.583990 0.072570 0.009700 0
M V30 6 H 0.937390 -1.544150 1.392220 0
M V30 7 H -0.531490 -1.284440 0.386550 0
M V30 8 H 0.937390 -2.025520 -0.341190 0
M V30 9 H 0.937400 2.078230 0.718560 0
M V30 10 H -0.531490 1.077450 0.996480 0
M V30 11 H 0.937390 0.817740 2.002150 0
M V30 12 H 0.937410 1.425510 -1.631870 0
M V30 13 H 0.937400 -0.316360 -2.081690 0
M V30 14 H -0.531470 0.424720 -1.353950 0
M V30 15 H 2.929140 -0.187130 1.015360 0
M V30 16 H 2.929140 -0.668510 -0.718050 0
M V30 17 H 2.929140 1.073360 -0.268230 0
M V30 END ATOM
M V30 BEGIN BOND
M V30 1 1 2 1
M V30 2 1 3 1
M V30 3 1 4 1
M V30 4 1 5 1
M V30 5 1 6 2
M V30 6 1 7 2
M V30 7 1 8 2
M V30 8 1 9 3
M V30 9 1 10 3
M V30 10 1 11 3
M V30 11 1 12 4
M V30 12 1 13 4
M V30 13 1 14 4
M V30 14 1 15 5
M V30 15 1 16 5
M V30 16 1 17 5
M V30 END BOND
M V30 END CTAB
M END

```

Note the context of the structures you submit. As one can check with
file `C9H11.xyz` in the `tests` subfolder, one input file can lead to
both a .sdf file of a cation (`--charge 1`), and anion (`-c
-1`).[^6]

[^1]: Kim, Y and Kim, W. Y. Universal Structure Conversion Method for
Organic Molecules: From Atomic Connectivity to Three-Dimensional
Geometry. *Bull. Korean Chem. Soc.* **2015**, *36*, 1769-1777, [doi
10.1002/bkcs.10334](https://doi.org/10.1002/bkcs.10334).

[^2]:

[^3]:

[^4]:

[^5]:

[^6]: The same Hill formula equally applies to the neutral
2-phenyl-2-propyl radical, PubChem [CID
140141](https://pubchem.ncbi.nlm.nih.gov/compound/140141).