https://github.com/rasbt/smilite
A Python module to retrieve and compare SMILE strings of chemical compounds from the free ZINC online database
python smile-string sqlite-database zinc zinc-online-database
- Host: GitHub
- URL: https://github.com/rasbt/smilite
- Owner: rasbt
- License: gpl-3.0
- Created: 2014-02-28T17:48:30.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2020-07-26T03:17:28.000Z (almost 5 years ago)
- Last Synced: 2024-12-10T05:42:53.156Z (5 months ago)
- Topics: python, smile-string, sqlite-database, zinc, zinc-online-database
- Language: Python
- Homepage:
- Size: 557 KB
- Stars: 76
- Watchers: 7
- Forks: 33
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.txt
- License: LICENSE.txt
# smilite
smilite is a Python module to download and analyze SMILES strings (Simplified Molecular-Input Line-entry System) of chemical compounds from ZINC (a free database of commercially-available compounds for virtual screening, [http://zinc.docking.org](http://zinc.docking.org)).
Now supports both Python 3.x and Python 2.x.
#### Sections
• [Installation](#installation)
• [Simple command line online query scripts](#simple_cmd_scripts)
- [lookup_zincid.py](#lookup_zincid)
- [lookup_smile_str.py](#lookup_smile_str)
• [CSV file command line scripts](#csv_scripts)
- [gen_zincid_smile_csv.py (downloading SMILES)](#gen_zincid)
- [comp_smile_strings.py (checking for duplicates within 1 file)](#comp_smile)
- [comp_2_smile_files.py (checking for duplicates across 2 files)](#comp_2_smile)
• [SQLite file command line scripts](#sqlite_scripts)
- [lookup_single_id.py](#lookup1id)
- [lookup_smile.py](#lookupsmile)
- [add_to_sqlite.py](#add_to_sqlite)
- [sqlite_to_csv.py](#sqlite_to_csv)
• [Changelog](#changelog)
# Installation
You can use the following command to install smilite:
`pip install smilite`
or
`easy_install smilite`
Alternatively, you can download the package manually from the Python Package Index [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite), unzip it, navigate into the package directory, and use the command:
`python3 setup.py install`
# Simple command line online query scripts
If you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/cmd_line_online_query_scripts` dir.
### lookup_zincid.py
Retrieves the SMILES string and simplified SMILES string for a given ZINC ID from the online ZINC database. It uses [ZINC12](http://zinc.docking.org) as the default backend; passing the additional command-line argument `zinc15` switches to the [ZINC15](http://zinc15.docking.org) database instead.
**Usage:**
`[shell]>> python3 lookup_zincid.py ZINC_ID [zinc12/zinc15]`
**Example (retrieve data from ZINC):**
`[shell]>> python3 lookup_zincid.py ZINC01234567 zinc15`
**Output example:**
ZINC01234567
C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
Where
- 1st row: ZINC ID
- 2nd row: SMILES string
- 3rd row: simplified SMILES string
### lookup_smile_str.py
Retrieves the corresponding ZINC IDs for a given SMILES string from the online ZINC database.
**Usage:**
`[shell]>> python3 lookup_smile_str.py SMILE_str`
**Example (retrieve data from ZINC):**
`[shell]>> python3 lookup_smile_str.py "C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"`
**Output example:**
ZINC01234567
ZINC01234568
ZINC01242053
ZINC01242055
# CSV file command line scripts
If you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/csv_scripts` dir.
### gen_zincid_smile_csv.py (downloading SMILES)
Generates a ZINC_ID,SMILE_STR CSV file from an input file of ZINC IDs. The input file should consist of 1 column with 1 ZINC ID per row. [ZINC12](http://zinc.docking.org) is used as the default backend; passing the additional command-line argument `zinc15` switches to the [ZINC15](http://zinc15.docking.org) database instead.
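The file handling described above can be sketched roughly as follows. This is only an illustration, not the script's actual code: `write_zincid_smiles_csv` and its `lookup` argument are hypothetical names, and `lookup` stands in for whatever function queries ZINC for a SMILES string.

```python
import csv


def write_zincid_smiles_csv(in_path, out_path, lookup):
    """Read one ZINC ID per row and write ZINC_ID,SMILE_STR rows.

    `lookup` is a hypothetical callable (ZINC ID -> SMILES string) that
    stands in for the online ZINC query; it is not part of smilite.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Skip empty lines so a trailing newline does not produce a bogus ID.
            if not row or not row[0].strip():
                continue
            zinc_id = row[0].strip()
            writer.writerow([zinc_id, lookup(zinc_id)])
```

Injecting the lookup as an argument keeps the sketch testable offline, for example by passing the `get` method of a small dictionary that maps ZINC IDs to SMILES strings.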
**Usage:**
`[shell]>> python3 gen_zincid_smile_csv.py in.csv out.csv [zinc12/zinc15]`
**Example:**
`[shell]>> python3 gen_zincid_smile_csv.py ../examples/zinc_ids.csv ../examples/zid_smiles.csv zinc15`
**Screen Output:**
Downloading SMILES
0% 100%
[########## ] | ETA[sec]: 106.525
**Input example file format:**

[zinc_ids.csv](https://raw.github.com/rasbt/smilite/master/examples/zinc_ids.csv)
**Output example file format:**

[zid_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles.csv)
### comp_smile_strings.py (checking for duplicates within 1 file)
Compares SMILES strings within a 2-column CSV file (ZINC_ID,SMILE_string) to identify duplicates. Generates a new CSV file that lists the number of duplicates in the 3rd column and the ZINC IDs of the identified duplicates in the 4th through nth columns.
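The duplicate check can be pictured as grouping ZINC IDs by their SMILES string. The sketch below is an assumption about the general approach, not the script's real implementation; `find_duplicate_smiles` and its `key` parameter are hypothetical names. The `key` function is where an optional simplification step (as enabled by the `simplify` argument) could plug in.

```python
from collections import defaultdict


def find_duplicate_smiles(rows, key=lambda smiles: smiles):
    """Return one tuple per input row: (zinc_id, smiles, n_duplicates, *duplicate_ids).

    `rows` is a list of (zinc_id, smiles) pairs. `key` maps a SMILES string to
    the value used for comparison (identity by default).
    """
    # First pass: collect all ZINC IDs that share the same comparison key.
    ids_by_key = defaultdict(list)
    for zinc_id, smiles in rows:
        ids_by_key[key(smiles)].append(zinc_id)

    # Second pass: for each row, list every other ID with the same key.
    result = []
    for zinc_id, smiles in rows:
        duplicates = [other for other in ids_by_key[key(smiles)] if other != zinc_id]
        result.append((zinc_id, smiles, len(duplicates), *duplicates))
    return result
```

Grouping in a dictionary first avoids comparing every pair of rows, which matters once the input file holds more than a few thousand compounds.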
**Usage:**
`[shell]>> python3 comp_smile_strings.py in.csv out.csv [simplify]`
**Example 1:**
`[shell]>> python3 comp_smile_strings.py ../examples/zinc_smiles.csv ../examples/comp_smiles.csv`
**Input example file format:**

[zid_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles.csv)
**Output example file format 1:**

[comp_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_smiles.csv)
Where
- 1st column: ZINC ID
- 2nd column: SMILES string
- 3rd column: number of duplicates
- 4th-nth column: ZINC IDs of duplicates
**Example 2:**
`[shell]>> python3 comp_smile_strings.py ../examples/zid_smiles.csv ../examples/comp_simple_smiles.csv simplify`
**Output example file format 2:**
[comp_simple_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_simple_smiles.csv)
### comp_2_smile_files.py (checking for duplicates across 2 files)
Compares SMILES strings between 2 input CSV files, where each file consists of rows with 2 columns (ZINC_ID,SMILE_string), to identify duplicate SMILES strings across both files.
Generates a new CSV file with the ZINC IDs of identified duplicates listed in the 4th through nth columns.
**Usage:**
`[shell]>> python3 comp_2_smile_files.py in1.csv in2.csv out.csv [simplify]`
**Example:**
`[shell]>> python3 comp_2_smile_files.py ../examples/zid_smiles2.csv ../examples/zid_smiles3.csv ../examples/comp_2_files.csv`
**Input example file 1:**

[zid_smiles2.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles2.csv)
**Input example file 2:**

[zid_smiles3.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles3.csv)
**Output example file format:**

[comp_2_files.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_2_files.csv)
Where:
- 1st column: name of the origin file
- 2nd column: ZINC ID
- 3rd column: SMILES string
- 4th-nth column: ZINC IDs of duplicates
# SQLite file command line scripts
If you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/sqlite_scripts` dir.
### lookup_single_id.py
Retrieves the SMILES string and simplified SMILES string for a given ZINC ID from a previously built smilite SQLite database or from the online ZINC database.
**Usage:**
`[shell]>> python3 lookup_single_id.py ZINC_ID [sqlite_file]`
**Example 1 (retrieve data from a smilite SQLite database):**
`[shell]>> python3 lookup_single_id.py ZINC01234567 ~/Desktop/smilite_db.sqlite`
**Example 2 (retrieve data from the ZINC online database):**
`[shell]>> python3 lookup_single_id.py ZINC01234567`
**Output example:**
ZINC01234567
C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
Where
- 1st row: ZINC ID
- 2nd row: SMILES string
- 3rd row: simplified SMILES string
### lookup_smile.py
Retrieves the ZINC ID(s) for a given SMILES string or simplified SMILES string from a previously built smilite SQLite database.
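A lookup of this kind boils down to a single parameterized query. The sketch below uses Python's standard `sqlite3` module; the table name `smilite` is an assumption, and the column names are borrowed from the CSV header `ZINC_ID,SMILE,SIMPLE_SMILE` shown in this README, so the real database schema may differ. `lookup_smile` here is a hypothetical helper, not the script itself.

```python
import sqlite3


def lookup_smile(conn, smile, simplified=False):
    """Return the ZINC IDs whose (simplified) SMILES string equals `smile`.

    Assumes a table named `smilite` with columns ZINC_ID, SMILE, SIMPLE_SMILE.
    """
    # The column name comes from a fixed pair of literals, never from user
    # input, so formatting it into the query text is safe here.
    column = "SIMPLE_SMILE" if simplified else "SMILE"
    query = f"SELECT ZINC_ID FROM smilite WHERE {column} = ? ORDER BY ZINC_ID"
    return [row[0] for row in conn.execute(query, (smile,))]


# Small in-memory demonstration with two stereoisomers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE smilite (ZINC_ID TEXT PRIMARY KEY, SMILE TEXT, SIMPLE_SMILE TEXT)"
)
conn.executemany(
    "INSERT INTO smilite VALUES (?, ?, ?)",
    [
        ("ZINC01234567", "C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O",
         "CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O"),
        ("ZINC01234568", "C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O",
         "CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O"),
    ],
)
print(lookup_smile(conn, "CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O", simplified=True))
```

Searching by the simplified string returns both stereoisomers, while searching by the full SMILES string returns only the exact match.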
**Usage:**
`[shell]>> python3 lookup_smile.py sqlite_file SMILE_STRING [simplify]`
**Example 1 (search for SMILES string):**
`[shell]>> python3 lookup_smile.py ~/Desktop/smilite.sqlite "C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"`
**Example 2 (search for simplified SMILES string):**
`[shell]>> python3 lookup_smile.py ~/Desktop/smilite.sqlite "CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O" simple`
**Output example:**
ZINC01234567
C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O
CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
Where
- 1st row: ZINC ID
- 2nd row: SMILES string
- 3rd row: simplified SMILES string
### add_to_sqlite.py
Reads ZINC IDs from a CSV file and looks up the SMILES strings and simplified SMILES strings in the ZINC online database. Writes those SMILES strings to a smilite SQLite database. A new database will be created if it doesn't exist yet.
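The "create if missing, then insert" behaviour maps naturally onto SQLite. The following is a sketch under assumptions: the table name `smilite`, the column names, and the choice to skip IDs that are already stored are all hypothetical, and `add_records` is not a function from the package. The online lookup step is left out; `records` is assumed to already hold the retrieved strings.

```python
import sqlite3


def add_records(db_path, records):
    """Insert (zinc_id, smile, simple_smile) tuples and return the row count.

    sqlite3.connect() creates the database file if it does not exist yet;
    the table itself is created on first use.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS smilite "
                "(ZINC_ID TEXT PRIMARY KEY, SMILE TEXT, SIMPLE_SMILE TEXT)"
            )
            # Assumption: an ID that is already present is left untouched.
            conn.executemany(
                "INSERT OR IGNORE INTO smilite VALUES (?, ?, ?)", records
            )
            return conn.execute("SELECT COUNT(*) FROM smilite").fetchone()[0]
    finally:
        conn.close()
```

Making `ZINC_ID` the primary key means running the import twice on the same CSV file does not duplicate rows.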
**Usage:**
`[shell]>> python3 add_to_sqlite.py sqlite_file csv_file`
**Example:**
`[shell]>> python3 add_to_sqlite.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_ids.csv`
**Input CSV file example format:**
ZINC01234567
ZINC01234568
...
An example of the smilite SQLite database contents after successful insertion is shown in the image below.
### sqlite_to_csv.py
Writes contents of an SQLite smilite database to a CSV file.
**Usage:**
`[shell]>> python3 sqlite_to_csv.py sqlite_file csv_file`
**Example:**
`[shell]>> python3 sqlite_to_csv.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_smiles.csv`
**Output CSV file example format:**
ZINC_ID,SMILE,SIMPLE_SMILE
ZINC01234568,C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
ZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O
An example of the CSV file contents opened in a spreadsheet program is shown in the image below.
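In the CSV rows above, both stereoisomers share one `SIMPLE_SMILE` value. The transformation can be approximated as shown below. This is a guess reconstructed only from the example strings in this README, not the package's actual function: `simplify_smiles_sketch` is a hypothetical name, and the real implementation may strip additional characters.

```python
def simplify_smiles_sketch(smiles):
    """Approximate the simplified SMILES seen in the README examples.

    Assumption: brackets, chirality marks (@), explicit hydrogens (H),
    positive charges (+), and triple bonds (#) are dropped, and aromatic
    lowercase atoms are uppercased.
    """
    dropped = set("[]@H+#")
    kept = "".join(ch for ch in smiles if ch not in dropped)
    return kept.upper()


print(simplify_smiles_sketch("C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"))
print(simplify_smiles_sketch("C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O"))
```

Note that a character filter this naive would also mangle element symbols that contain `H` or a lowercase letter, so it should be read as an illustration of the idea rather than a general-purpose routine.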
# Changelog
**VERSION 2.3.1 (07/25/2020)**
- Fix bug to allow `zinc15` option in gen_zincid_smile_csv.py script
**VERSION 2.3.0 (06/10/2020)**
- Fixes ZINC URL in `lookup_smile_str.py`
- Adds an optional command line parameter (with arguments `zinc15` or `zinc12`) for `lookup_smile_str.py`
**VERSION 2.2.0**
* Provides an optional command line argument (zinc15) to use ZINC15 as a backend for downloading SMILES
**VERSION 2.1.0**
* Functions and scripts to fetch ZINC IDs corresponding to a SMILES string query
**VERSION 2.0.1**
* Progress bar for add_to_sqlite.py
**VERSION 2.0.0**
* added SQLite features
**VERSION 1.3.0**
* added script and module function to compare SMILES strings across 2 files.
**VERSION 1.2.0**
* added Python 2.x support
**VERSION 1.1.1**
* PyPrind dependency fix
**VERSION 1.1.0**
* added a progress bar (PyPrind) to `generate_zincid_smile_csv()` function