https://github.com/HzaCode/ChemInformant
⚗️ An all-in-one solution for chemical property retrieval from PubChem.
api-client batch-processing caching cas cheminformatics chemistry cli compound-search data-validation dataframe drug-discovery iupac molecular-descriptors molecular-weight pandas pubchem python rest-api smiles sql
- Host: GitHub
- URL: https://github.com/HzaCode/ChemInformant
- Owner: HzaCode
- License: mit
- Created: 2024-01-21T12:03:01.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-10-15T03:33:37.000Z (3 months ago)
- Last Synced: 2025-10-26T17:14:05.240Z (3 months ago)
- Topics: api-client, batch-processing, caching, cas, cheminformatics, chemistry, cli, compound-search, data-validation, dataframe, drug-discovery, iupac, molecular-descriptors, molecular-weight, pandas, pubchem, python, rest-api, smiles, sql
- Language: Python
- Homepage: https://hezhiang.com/cheminformant_real.html
- Size: 12.1 MB
- Stars: 7
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Zenodo: .zenodo.json
Awesome Lists containing this project
- awesome-python-chemistry - ChemInformant - High-throughput PubChem client for batch queries with caching, validation, rate-limit-aware retries, and a simple CLI. (Database Wrappers / Force Fields)
README

# ChemInformant
*A Robust Data Acquisition Engine for the Modern Scientific Workflow*
---
**ChemInformant** is a data acquisition engine for the [PubChem](https://pubchem.ncbi.nlm.nih.gov/) database, built for scientific workflows in Python. It manages network requests (caching, rate limiting, retries), validates data at runtime, and returns analysis-ready results, giving computational chemistry projects a dependable data source.
---
### ✨ Key Features
* **Analysis-Ready Pandas/SQL Output:** The core API (`get_properties`) returns either a clean Pandas DataFrame or a direct SQL output, eliminating data wrangling boilerplate and enabling immediate integration with both the Python data science ecosystem and modern database workflows.
* **Automated Network Reliability:** Built-in persistent caching, rate limiting, and automatic retries keep workflows running through transient network errors and PubChem rate limits. API pagination (`ListKey`) is handled transparently for large queries, so complete result sets arrive without manual intervention.
* **Flexible & Fault-Tolerant Input:** Natively accepts mixed lists of identifiers (names, CIDs, SMILES) and intelligently handles any invalid inputs by flagging them with a clear status in the output, ensuring a single bad entry never fails an entire batch operation.
* **A Dual API for Simplicity and Power:** Offers a clear `get_<property>()` convenience layer (for example `get_weight()` or `get_cas()`) for quick lookups, backed by the `get_properties` engine for high-performance batch operations.
* **Guaranteed Data Integrity:** Employs Pydantic v2 models for rigorous, runtime data validation when using the object-based API, preventing malformed or unexpected data from corrupting your analysis pipeline.
* **Terminal-Ready CLI Tools:** Includes `chemfetch` and `chemdraw` for rapid data retrieval and 2D structure visualization directly from your terminal, perfect for quick lookups without writing a script.
* **Modern and Actively Maintained:** Built on a contemporary tech stack for long-term consistency and compatibility, providing a reliable alternative to older or less frequently updated libraries.
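
The caching and retry behavior described in the feature list can be pictured with a small standalone sketch. This is not ChemInformant's implementation: `fetch_with_retry`, its parameters, and the `flaky_lookup` stand-in are hypothetical names used only to illustrate the general pattern of "check the cache, otherwise retry with exponential backoff". The cache here is a plain in-memory dict, whereas the library describes its own cache as persistent.

```python
import time


def fetch_with_retry(fetch, key, cache, retries=3, base_delay=0.5, sleep=time.sleep):
    """Return cache[key] if present; otherwise call fetch(key), retrying on ConnectionError."""
    if key in cache:
        return cache[key]  # cache hit: no network call at all
    attempt = 0
    while True:
        try:
            value = fetch(key)
        except ConnectionError:
            if attempt >= retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 0.5 s, 1 s, 2 s, ...
            attempt += 1
        else:
            cache[key] = value
            return value


# Hypothetical stand-in for a network call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_lookup(name):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network error")
    return {"name": name, "cid": 2244}


cache = {}
delays = []  # record the backoff delays instead of actually sleeping
first = fetch_with_retry(flaky_lookup, "aspirin", cache, sleep=delays.append)
second = fetch_with_retry(flaky_lookup, "aspirin", cache, sleep=delays.append)
print(first, delays, attempts["count"])
```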
---
### 📦 Installation
Install the library from PyPI:
```bash
pip install ChemInformant
```
To include plotting capabilities for use with the tutorial, install the `[plot]` extra:
```bash
pip install "ChemInformant[plot]"
```
---
### 🚀 Quick Start
Retrieve multiple properties for multiple compounds, directly into a Pandas DataFrame, in a single function call:
```python
import ChemInformant as ci
# 1. Define your identifiers
identifiers = ["aspirin", "caffeine", 1983] # 1983 is paracetamol's CID
# 2. Specify the properties you need
properties = ["molecular_weight", "xlogp", "cas"]
# 3. Call the core function
df = ci.get_properties(identifiers, properties)
# 4. Save the results to an SQL database
ci.df_to_sql(df, "sqlite:///chem_data.db", "results", if_exists="replace")
# 5. Analyze your results!
print(df)
```
**Output:**
```
  input_identifier   cid status  molecular_weight  xlogp       cas
0          aspirin  2244     OK            180.16    1.2   50-78-2
1         caffeine  2519     OK            194.19   -0.1   58-08-2
2             1983  1983     OK            151.16    0.5  103-90-2
```
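
Once the results are in SQLite, they can be queried with the standard-library `sqlite3` module. The sketch below assumes that the `results` table uses the DataFrame column names as shown in the output above, and that the `sqlite:///chem_data.db` URL resolves to a file named `chem_data.db` in the working directory; neither detail is confirmed here. To stay runnable without a network connection, it builds an in-memory stand-in table from the rows printed above instead of opening the real file.

```python
import sqlite3

# Stand-in rows copied from the Quick Start output above.
rows = [
    ("aspirin", 2244, "OK", 180.16, 1.2, "50-78-2"),
    ("caffeine", 2519, "OK", 194.19, -0.1, "58-08-2"),
    ("1983", 1983, "OK", 151.16, 0.5, "103-90-2"),
]

# In practice you would connect to the file instead: sqlite3.connect("chem_data.db")
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    "input_identifier TEXT, cid INTEGER, status TEXT, "
    "molecular_weight REAL, xlogp REAL, cas TEXT)"
)
conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)", rows)

# Keep only successfully resolved compounds heavier than 160 g/mol.
heavy = conn.execute(
    "SELECT input_identifier, molecular_weight FROM results "
    "WHERE status = 'OK' AND molecular_weight > ? "
    "ORDER BY molecular_weight DESC",
    (160,),
).fetchall()
conn.close()

print(heavy)
```

Filtering on `status = 'OK'` in the query keeps any unresolved identifiers out of downstream analysis.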
#### Convenience API Cheatsheet
| Function | Description |
| -------------------------- | ------------------------------------------------------------- |
| `get_weight(id)` | Molecular weight *(float)* |
| `get_formula(id)` | Molecular formula *(str)* |
| `get_cas(id)` | CAS Registry Number *(str)* |
| `get_iupac_name(id)` | IUPAC name *(str)* |
| `get_canonical_smiles(id)` | Canonical SMILES with Canonical→Connectivity fallback *(str)* |
| `get_isomeric_smiles(id)` | Isomeric SMILES with Isomeric→SMILES fallback *(str)* |
| `get_xlogp(id)` | XLogP (calculated hydrophobicity) *(float)* |
| `get_synonyms(id)` | List of synonyms *(List\[str])* |
| `get_compound(id)` | Full, validated **`Compound`** object (Pydantic v2 model) |
*Note: This table shows key convenience functions for demonstration. ChemInformant provides **22 convenience functions** in total, covering molecular descriptors, mass properties, stereochemistry, and more.*
*All functions accept a **CID, name, or SMILES** and return `None`/`[]` on failure.*
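
Because the convenience functions signal failure by returning `None` or `[]` rather than raising, callers need to check the result explicitly. The following sketch shows one way to separate resolved identifiers from failed ones. `split_by_result` and `fake_get_weight` are hypothetical helpers introduced for illustration; in real use you would pass a function such as `ci.get_weight` as the `lookup` argument.

```python
def split_by_result(lookup, identifiers):
    """Call lookup on each identifier; separate real results from None/[] failures."""
    found = {}
    missing = []
    for identifier in identifiers:
        value = lookup(identifier)
        if value is None or value == []:
            missing.append(identifier)  # failure marker as described in the note above
        else:
            found[identifier] = value
    return found, missing


# Hypothetical offline stand-in for ci.get_weight, using values from the Quick Start output.
_FAKE_WEIGHTS = {"aspirin": 180.16, "caffeine": 194.19}


def fake_get_weight(identifier):
    return _FAKE_WEIGHTS.get(identifier)  # returns None for unknown identifiers


found, missing = split_by_result(
    fake_get_weight, ["aspirin", "not-a-compound", "caffeine"]
)
print(found)
print(missing)
```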
ChemInformant also includes handy command-line tools for quick lookups directly from your terminal:
* **`chemfetch`**: Fetches properties for one or more compounds.
```bash
chemfetch aspirin --props "cas,molecular_weight,iupac_name"
```
* **`chemdraw`**: Renders the 2D structure of a compound.
```bash
chemdraw aspirin
```
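
If you want to call the CLI from a Python script, for example inside a larger pipeline, a thin wrapper around `subprocess` is enough. This sketch uses only the invocation shown above (one identifier plus `--props` with a comma-separated list). `chemfetch_argv` is a hypothetical helper, and the format of the tool's output is not assumed.

```python
import shutil
import subprocess


def chemfetch_argv(identifier, props):
    """Build the argument list for: chemfetch <identifier> --props "<a,b,c>"."""
    return ["chemfetch", identifier, "--props", ",".join(props)]


argv = chemfetch_argv("aspirin", ["cas", "molecular_weight", "iupac_name"])

if shutil.which("chemfetch"):
    # Passing a list (not a shell string) avoids quoting problems with compound names.
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    print(completed.stdout)
else:
    print("chemfetch not found on PATH; would run:", " ".join(argv))
```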
---
### 📚 Documentation & Examples
For a deep dive, please see our detailed guides:
* **➡️ Online Documentation:** The **[official documentation site](https://hezhiang.com/ChemInformant)** contains complete API references, guides, and usage examples. **This is the most comprehensive resource.**
* **➡️ Interactive User Manual:** Our [**Jupyter Notebook Tutorial**](examples/ChemInformant_User_Manual_v1.0.ipynb) provides a complete, end-to-end walkthrough. This is the best place to start for a hands-on experience.
* **➡️ Performance Benchmarks:** Run integrated benchmarks with `pytest tests/test_benchmarks.py --benchmark-only` to see the performance advantages of batching and caching.
#### 📖 Additional Resources & Use Cases
* **[Basic Usage Guide](https://hezhiang.com/ChemInformant/basic_usage.html)** - Quick start examples for common tasks
* **[Advanced Usage Guide](https://hezhiang.com/ChemInformant/advanced_usage.html)** - Complex workflows and batch processing
* **[Caching Guide](https://hezhiang.com/ChemInformant/caching_guide.html)** - Optimize performance with intelligent caching
* **[CLI Tools Documentation](https://hezhiang.com/ChemInformant/cli.html)** - Complete reference for `chemfetch` and `chemdraw`
* **[API Reference](https://hezhiang.com/ChemInformant/api/cheminfo_api.html)** - Full function documentation with examples
---
### 🤔 Why ChemInformant?
> ChemInformant's core mission is to serve as a high-performance data backbone for the Python cheminformatics ecosystem. As a software package that has undergone rigorous peer review by both the [Journal of Open Source Software (JOSS)](https://doi.org/10.21105/joss.08341) and [pyOpenSci](https://github.com/pyOpenSci/software-submission/issues/254), it delivers clean, validated, and analysis-ready Pandas DataFrames. This enables researchers to effortlessly pipe PubChem data into powerful toolkits like RDKit, Scikit-learn, or custom machine learning models, transforming multi-step data acquisition and wrangling tasks into single, elegant lines of code.
>
> A detailed comparison with other existing tools is provided in our [JOSS paper](https://github.com/HzaCode/ChemInformant/blob/main/paper/paper.md). For the story and the "why" behind the code, we've shared our thoughts in a post on the [official pyOpenSci website](https://www.pyopensci.org/).
### 🤝 Contributing
Contributions are welcome! For guidelines on how to get started, please read our [contributing guide](https://github.com/HzaCode/ChemInformant/blob/main/CONTRIBUTING.md). You can [open an issue](https://github.com/HzaCode/ChemInformant/issues) to report bugs or suggest features, or [submit a pull request](https://github.com/HzaCode/ChemInformant/pulls) to contribute code.
### 📄 License
This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.
### 📑 Citation
```bibtex
@article{He2025,
doi = {10.21105/joss.08341},
url = {https://doi.org/10.21105/joss.08341},
year = {2025},
publisher = {The Open Journal},
volume = {10},
number = {112},
pages = {8341},
author = {He, Zhiang},
title = {ChemInformant: A Robust and Workflow-Centric Python Client for High-Throughput PubChem Access},
journal = {Journal of Open Source Software}
}
```