{"id":15640493,"url":"https://github.com/rasbt/smilite","last_synced_at":"2025-08-21T02:31:50.259Z","repository":{"id":14578151,"uuid":"17294464","full_name":"rasbt/smilite","owner":"rasbt","description":"A Python module to retrieve and compare SMILE strings of chemical compounds from the free ZINC online database","archived":false,"fork":false,"pushed_at":"2020-07-26T03:17:28.000Z","size":570,"stargazers_count":76,"open_issues_count":3,"forks_count":33,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-12-10T05:42:53.156Z","etag":null,"topics":["python","smile-string","sqlite-database","zinc","zinc-online-database"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rasbt.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.txt","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-02-28T17:48:30.000Z","updated_at":"2024-07-04T07:25:52.000Z","dependencies_parsed_at":"2022-09-19T09:51:44.108Z","dependency_job_id":null,"html_url":"https://github.com/rasbt/smilite","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fsmilite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fsmilite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fsmilite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rasbt%2Fsmilite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rasbt","download_url":"https://codeload.github.com/rasbt/smilite/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230479864,"owners_count":18232630,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","smile-string","sqlite-database","zinc","zinc-online-database"],"created_at":"2024-10-03T11:35:56.980Z","updated_at":"2024-12-19T18:17:50.348Z","avatar_url":"https://github.com/rasbt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# smilite\n\nsmilite is a Python module to download and analyze SMILES strings (Simplified Molecular-Input Line-entry System) of chemical compounds from ZINC (a free database of commercially-available compounds for virtual screening, [http://zinc.docking.org](http://zinc.docking.org)).  
\nNow supports both Python 3.x and Python 2.x.\n\n\n![](https://raw.github.com/rasbt/smilite/master/images/smilite_overview.png)  \n\n#### Sections\n\n• [Installation](#installation)  \n• [Simple command line online query scripts](#simple_cmd_scripts)  \n      - [lookup_zincid.py](#lookup_zincid)  \n      - [lookup_smile_str.py](#lookup_smile_str)  \n• [CSV file command line scripts](#csv_scripts)  \n      - [gen_zincid_smile_csv.py (downloading SMILES)](#gen_zincid)  \n      - [comp_smile_strings.py (checking for duplicates within 1 file)](#comp_smile)  \n      - [comp_2_smile_files.py (checking for duplicates across 2 files)](#comp_2_smile)  \n• [SQLite file command line scripts](#sqlite_scripts)  \n      - [lookup_single_id.py](#lookup1id)  \n      - [lookup_smile.py](#lookupsmile)  \n      - [add_to_sqlite.py](#add_to_sqlite)  \n      - [sqlite_to_csv.py](#sqlite_to_csv)  \n• [Changelog](#changelog)  \n\n\u003ca name=\"installation\"\u003e\u003c/a\u003e\n\n# Installation\n\nYou can use either of the following commands to install smilite:  \n`pip install smilite`  \nor  \n`easy_install smilite`\n\nAlternatively, you can download the package manually from the Python Package Index [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite), unzip it, navigate into the package directory, and run:\n\n`python3 setup.py install`\n\n\u003ca name=\"simple_cmd_scripts\"\u003e\u003c/a\u003e\n\n# Simple command line online query scripts\n\nIf you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/cmd_line_online_query_scripts` directory.\n\n\u003ca name=\"lookup_zincid\"\u003e\u003c/a\u003e\n\n### lookup_zincid.py\n\nRetrieves the SMILES string and simplified SMILES string for a given ZINC ID  \nfrom the online ZINC database. 
It uses [ZINC12](http://zinc.docking.org) as the default backend; with the additional command-line argument `zinc15`, the [ZINC15](http://zinc15.docking.org) database is used instead.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 lookup_zincid.py ZINC_ID [zinc12/zinc15]`  \n\n**Example (retrieve data from ZINC):**  \n`[shell]\u003e\u003e python3 lookup_zincid.py ZINC01234567 zinc15`  \n\n**Output example:**\n\n\u003cpre\u003eZINC01234567\nC[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O\nCC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O\n\u003c/pre\u003e\n\nWhere  \n- 1st row: ZINC ID  \n- 2nd row: SMILES string  \n- 3rd row: simplified SMILES string\n\n\u003ca name=\"lookup_smile_str\"\u003e\u003c/a\u003e\n\n### lookup_smile_str.py\n\nRetrieves the corresponding ZINC IDs for a given SMILES string  \nfrom the online ZINC database. \n\n**Usage:**  \n`[shell]\u003e\u003e python3 lookup_smile_str.py SMILE_str`  \n\n**Example (retrieve data from ZINC):**  \n`[shell]\u003e\u003e python3 lookup_smile_str.py \"C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O\"`  \n\n**Output example:**\n\n\u003cpre\u003eZINC01234567\nZINC01234568\nZINC01242053\nZINC01242055\u003c/pre\u003e\n\n\u003ca name=\"csv_scripts\"\u003e\u003c/a\u003e\n\n# CSV file command line scripts\n\nIf you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/csv_scripts` directory.\n\n\u003ca name=\"gen_zincid\"\u003e\u003c/a\u003e\n\n### gen_zincid_smile_csv.py (downloading SMILES)\n\nGenerates a ZINC_ID,SMILE_STR CSV file from an input file of ZINC IDs. The input file should consist of a single column with one ZINC ID per row. 
[ZINC12](http://zinc.docking.org) is used as the default backend; with the additional command-line argument `zinc15`, the [ZINC15](http://zinc15.docking.org) database is used instead.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 gen_zincid_smile_csv.py in.csv out.csv [zinc12/zinc15]`\n\n**Example:**  \n`[shell]\u003e\u003e python3 gen_zincid_smile_csv.py ../examples/zinc_ids.csv ../examples/zid_smiles.csv zinc15`\n\n**Screen output:**\n\n\u003cpre\u003eDownloading SMILES\n0%                          100%\n[##########                    ] | ETA[sec]: 106.525 \u003c/pre\u003e\n\n**Input example file format:**  \n![](https://raw.github.com/rasbt/smilite/master/images/zinc_ids.png)  \n[zinc_ids.csv](https://raw.github.com/rasbt/smilite/master/examples/zinc_ids.csv)\n\n**Output example file format:**  \n![](https://raw.github.com/rasbt/smilite/master/images/zid_smiles.png)  \n[zid_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles.csv)\n\n\u003ca name=\"comp_smile\"\u003e\u003c/a\u003e\n\n### comp_smile_strings.py (checking for duplicates within 1 file)\n\nCompares SMILES strings within a 2-column CSV file (ZINC_ID,SMILE_string) to identify duplicates. 
Generates a new CSV file with the number of duplicates in the 3rd column and the ZINC IDs of identified duplicates in the 4th-nth columns.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 comp_smile_strings.py in.csv out.csv [simplify]`\n\n**Example 1:**  \n`[shell]\u003e\u003e python3 comp_smile_strings.py ../examples/zid_smiles.csv ../examples/comp_smiles.csv`\n\n**Input example file format:**  \n![](https://raw.github.com/rasbt/smilite/master/images/zid_smiles.png)  \n[zid_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles.csv)\n\n**Output example file format 1:**  \n![](https://raw.github.com/rasbt/smilite/master/images/comp_smiles.png)  \n[comp_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_smiles.csv)\n\nWhere  \n- 1st column: ZINC ID  \n- 2nd column: SMILES string  \n- 3rd column: number of duplicates  \n- 4th-nth column: ZINC IDs of duplicates\n\n**Example 2:**  \n`[shell]\u003e\u003e python3 comp_smile_strings.py ../examples/zid_smiles.csv ../examples/comp_simple_smiles.csv simplify`\n\n**Output example file format 2:**  \n![](https://raw.github.com/rasbt/smilite/master/images/comp_simple_smiles.png)  \n[comp_simple_smiles.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_simple_smiles.csv)\n\n\u003ca name=\"comp_2_smile\"\u003e\u003c/a\u003e\n\n### comp_2_smile_files.py (checking for duplicates across 2 files)\n\nCompares SMILES strings between 2 input CSV files, each consisting of rows with 2 columns (ZINC_ID,SMILE_string), to identify duplicate SMILES strings across both files.  
\nGenerates a new CSV file with the ZINC IDs of identified duplicates listed in the 4th-nth columns.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 comp_2_smile_files.py in1.csv in2.csv out.csv [simplify]`\n\n**Example:**  \n`[shell]\u003e\u003e python3 comp_2_smile_files.py ../examples/zid_smiles2.csv ../examples/zid_smiles3.csv ../examples/comp_2_files.csv`\n\n**Input example file 1:**  \n![](https://raw.github.com/rasbt/smilite/master/images/zid_smiles2.png)  \n[zid_smiles2.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles2.csv)\n\n**Input example file 2:**  \n![](https://raw.github.com/rasbt/smilite/master/images/zid_smiles3.png)  \n[zid_smiles3.csv](https://raw.github.com/rasbt/smilite/master/examples/zid_smiles3.csv)\n\n**Output example file format:**  \n![](https://raw.github.com/rasbt/smilite/master/images/comp_2_files.png)  \n[comp_2_files.csv](https://raw.github.com/rasbt/smilite/master/examples/comp_2_files.csv)\n\nWhere:  \n- 1st column: name of the origin file  \n- 2nd column: ZINC ID  \n- 3rd column: SMILES string  \n- 4th-nth column: ZINC IDs of duplicates\n\n\u003ca name=\"sqlite_scripts\"\u003e\u003c/a\u003e\n\n# SQLite file command line scripts\n\nIf you downloaded the smilite package from [https://pypi.python.org/pypi/smilite](https://pypi.python.org/pypi/smilite) or [https://github.com/rasbt/smilite](https://github.com/rasbt/smilite), you can use the command line scripts I provide in the `scripts/sqlite_scripts` directory.\n\n\u003ca name=\"lookup1id\"\u003e\u003c/a\u003e\n\n### lookup_single_id.py\n\nRetrieves the SMILES string and simplified SMILES string for a given ZINC ID  \nfrom a previously built smilite SQLite database or from the online ZINC database.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 lookup_single_id.py ZINC_ID [sqlite_file]`  \n\n**Example 1 (retrieve data from a smilite SQLite database):**  \n`[shell]\u003e\u003e python3 lookup_single_id.py ZINC01234567 ~/Desktop/smilite_db.sqlite`  \n\n**Example 2 (retrieve data 
from the ZINC online database):**  \n`[shell]\u003e\u003e python3 lookup_single_id.py ZINC01234567`  \n\n**Output example:**\n\n\u003cpre\u003eZINC01234567\nC[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O\nCC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O\n\u003c/pre\u003e\n\nWhere  \n- 1st row: ZINC ID  \n- 2nd row: SMILES string  \n- 3rd row: simplified SMILES string\n\n\u003ca name=\"lookupsmile\"\u003e\u003c/a\u003e\n\n### lookup_smile.py\n\nRetrieves the ZINC ID(s) for a given SMILES string or simplified SMILES string from a previously built smilite SQLite database.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 lookup_smile.py sqlite_file SMILE_STRING [simplify]`  \n\n**Example 1 (search for SMILES string):**  \n`[shell]\u003e\u003e python3 lookup_smile.py ~/Desktop/smilite.sqlite \"C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O\"`  \n\n**Example 2 (search for simplified SMILES string):**  \n`[shell]\u003e\u003e python3 lookup_smile.py ~/Desktop/smilite.sqlite \"CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O\" simple`  \n\n**Output example:**\n\n\u003cpre\u003eZINC01234567\nC[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O\nCC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O\n\u003c/pre\u003e\n\nWhere  \n- 1st row: ZINC ID  \n- 2nd row: SMILES string  \n- 3rd row: simplified SMILES string\n\n\u003ca name=\"add_to_sqlite\"\u003e\u003c/a\u003e\n\n### add_to_sqlite.py\n\nReads ZINC IDs from a CSV file, looks up their SMILES strings and simplified SMILES strings in the ZINC online database, and writes them to a smilite SQLite database. A new database is created if it does not exist yet.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 add_to_sqlite.py sqlite_file csv_file`  \n\n**Example:**  \n`[shell]\u003e\u003e python3 add_to_sqlite.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_ids.csv`  \n\n**Input CSV file example format:**\n\n\u003cpre\u003eZINC01234567\nZINC01234568\n...\n\u003c/pre\u003e\n\nAn example of the smilite SQLite database contents after successful insertion is shown in the image below. 
![https://raw.github.com/rasbt/smilite/master/images/add_to_sqlite_1.png](https://raw.github.com/rasbt/smilite/master/images/add_to_sqlite_1.png)\n\n\u003ca name=\"sqlite_to_csv\"\u003e\u003c/a\u003e\n\n### sqlite_to_csv.py\n\nWrites the contents of a smilite SQLite database to a CSV file.\n\n**Usage:**  \n`[shell]\u003e\u003e python3 sqlite_to_csv.py sqlite_file csv_file`  \n\n**Example:**  \n`[shell]\u003e\u003e python3 sqlite_to_csv.py ~/Desktop/smilite.sqlite ~/Desktop/zinc_smiles.csv`  \n\n**Output CSV file example format:**\n\n\u003cpre\u003eZINC_ID,SMILE,SIMPLE_SMILE\nZINC01234568,C[C@@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O\nZINC01234567,C[C@H]1CCCC[NH+]1CC#CC(c2ccccc2)(c3ccccc3)O,CC1CCCCN1CCCC(C2CCCCC2)(C3CCCCC3)O\n\u003c/pre\u003e\n\nAn example of the CSV file contents opened in a spreadsheet program is shown in the image below. ![https://raw.github.com/rasbt/smilite/master/images/sqlite_to_csv_2.png](https://raw.github.com/rasbt/smilite/master/images/sqlite_to_csv_2.png)\n\n\n\n\u003ca name=\"changelog\"\u003e\u003c/a\u003e\n\n# Changelog\n\n**VERSION 2.3.1 (07/25/2020)**\n\n- Fixes a bug to allow the `zinc15` option in the gen_zincid_smile_csv.py script \n\n**VERSION 2.3.0 (06/10/2020)**\n\n- Fixes ZINC URL in `lookup_smile_str.py`\n- Adds an optional command line parameter (with arguments `zinc15` or `zinc12`) for `lookup_smile_str.py`\n\n**VERSION 2.2.0**\n\n*   Provides an optional command line argument (zinc15) to use ZINC15 as a backend for downloading SMILES\n\n**VERSION 2.1.0**\n\n*   Functions and scripts to fetch ZINC IDs corresponding to a SMILES string query\n\n**VERSION 2.0.1**\n\n*   Progress bar for add_to_sqlite.py\n\n**VERSION 2.0.0**\n\n*   added SQLite features\n\n**VERSION 1.3.0**\n\n*   added script and module function to compare SMILES strings across 2 files.\n\n**VERSION 1.2.0**\n\n*   added Python 2.x support\n\n**VERSION 1.1.1**\n\n*   PyPrind dependency fix\n\n**VERSION 1.1.0**\n\n*   added a progress bar 
(PyPrind) to `generate_zincid_smile_csv()` function","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frasbt%2Fsmilite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frasbt%2Fsmilite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frasbt%2Fsmilite/lists"}