{"id":29890032,"url":"https://github.com/prospero1988/logd_predictor","last_synced_at":"2026-05-19T14:10:00.659Z","repository":{"id":306911769,"uuid":"869705201","full_name":"Prospero1988/logD_predictor","owner":"Prospero1988","description":"Prediction of CHI logD from ¹H/¹³C NMR spectra and molecular fingerprints using ML and deep learning.","archived":false,"fork":false,"pushed_at":"2025-07-28T10:19:19.000Z","size":106690,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-28T12:18:14.557Z","etag":null,"topics":["13c-nmr","1h-nmr","cheminformatics","cheminformatics-software","chilogd","deep-learning","drug-discovery","fingerprint","gui","logd","machine-learning","neural-networks","nmr","nmr-data","nmr-spectroscopy","optuna","pytorch","qspr","rdkit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Prospero1988.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-10-08T18:37:57.000Z","updated_at":"2025-07-28T10:19:22.000Z","dependencies_parsed_at":"2025-07-28T12:29:40.350Z","dependency_job_id":null,"html_url":"https://github.com/Prospero1988/logD_predictor","commit_stats":null,"previous_names":["prospero1988/logd_predictor"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Prospero1988/logD_predictor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prospero1988%2FlogD_predictor","tags_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/Prospero1988%2FlogD_predictor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prospero1988%2FlogD_predictor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prospero1988%2FlogD_predictor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Prospero1988","download_url":"https://codeload.github.com/Prospero1988/logD_predictor/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prospero1988%2FlogD_predictor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268126599,"owners_count":24200291,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-31T02:00:08.723Z","response_time":66,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["13c-nmr","1h-nmr","cheminformatics","cheminformatics-software","chilogd","deep-learning","drug-discovery","fingerprint","gui","logd","machine-learning","neural-networks","nmr","nmr-data","nmr-spectroscopy","optuna","pytorch","qspr","rdkit"],"created_at":"2025-07-31T22:23:31.489Z","updated_at":"2025-10-10T11:06:29.775Z","avatar_url":"https://github.com/Prospero1988.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp align=\"center\"\u003e\u003cimg src=\"logD_predictor_bin/img/LOGO.png\", width=\"400px\"/\u003e\u003c/p\u003e\n\n# 
logD Predictor\n\n**logD Predictor** is a graphical software platform designed for accurate prediction of the **CHI logD** (Chromatographic Hydrophobicity Index) of chemical compounds — a chromatographic surrogate that is **experimentally and statistically equivalent to traditional logD** for comparative and modeling purposes. It leverages machine learning (ML) and deep neural network (DNN) models trained on **¹H and ¹³C NMR spectral representations**, as well as on **RDKit-derived molecular fingerprints**. LogD simulations can be carried out using individual ¹H and ¹³C spectra, or via a hybrid approach (¹H | ¹³C) that fuses these spectra into a single vector representation for prediction — **the hybrid method consistently delivers the best results**.\n\nUnlike traditional cheminformatics tools, **logD Predictor integrates simulated NMR spectra** as compact, information-rich descriptors, providing a physicochemically grounded alternative to conventional fingerprint encodings. The software supports both **single-input** and **ensemble-based predictions**, offering flexibility for exploratory analysis and robust screening workflows.\n\nAll models were trained on datasets of **over 1200 real-world compounds** synthesized in **medicinal chemistry and drug discovery pipelines**, ensuring high applicability to pharmaceutically relevant chemical space. Through extensive hyperparameter optimization using the [Optuna](https://optuna.org/) framework, the best models consistently achieved **RMSE \u003c 0.6** and **Q² \u003e 0.7** across multiple pH conditions — outperforming many traditional QSPR approaches.\n\n**logD Predictor** combines scientific rigor with ease of use: the entire pipeline — from SMILES to logD — is operated via an intuitive GUI, requiring no coding skills from the user. 
Predictions are visualized, summarized, and exported with just a few clicks, making the tool suitable for both research and production settings in computational chemistry.\n\n---\n\n### 📖 Associated Research \u0026 Citation\n\n**For more detailed information, check the original Open Access research paper:**\n\nLeniak, A.; Pietruś, W.; Kurczab, R. From NMR to AI: Fusing 1H and 13C Representations \nfor Enhanced QSPR Modeling. J Chem Inf Model 2025. \n[https://doi.org/10.1021/acs.jcim.5c01791](https://doi.org/10.1021/acs.jcim.5c01791).\n\n**If you use this software in your research, please cite our publication.**\n\n---\n\n## 📑 Table of Contents\n- [Key Features](#-key-features)\n- [Repository Structure](#-repository-structure)\n- [Installation](#️-installation)\n- [Running the Application](#-running-the-application)\n- [Input File Format](#-input-file-format)\n- [Prediction Options](#-prediction-options-via-gui)\n- [Preview of the Interface](#-preview-of-the-interface)\n- [Examples of Working Program](#-examples-of-working-program)\n- [Related Projects](#-related-projects)\n- [Troubleshooting](#-troubleshooting)\n- [License](#-license)\n\n---\n\n## 💡 Key Features\n\n- Spectral-based prediction using theoretical **¹H and ¹³C NMR** vectors or their **fused variant**\n- Optional prediction from RDKit molecular fingerprints\n- GUI-based interface for input, model selection, and result visualization\n- Multi-model ensemble predictions with averaging across ML/DNN models\n- Full support for Java-based NMR spectrum simulation via NMRshiftDB2\n- **Low prediction error (RMSE \u003c 0.6) and high R² (\u003e 0.7) correlation coefficients across models**\n- Models fully optimized via Optuna-based hyperparameter tuning\n- Automatic chart generation and result export with customizable verbosity\n- **Trained on compounds (more than 1200) synthesized in real-world drug discovery pipelines**\n\n---\n\n## 🗂 Repository Structure\n\n```\nlogD_predictor/\n│\n├── logD_predictor_bin/      
           # Core processing and GUI logic\n│   ├── bucket.py                       # Buckets NMR spectra into predefined ranges\n│   ├── csv_checker.py                  # Verifies input CSV structure, format, separators, decimal markers\n│   ├── custom_header.py                # Adds consistent headers for bucketed NMR spectra\n│   ├── concatenator.py                 # Concatenates the ¹H and ¹³C single-modal vectors into a single fused bimodal vector\n│   ├── fp_generator.py                 # Generates RDKit molecular fingerprints (e.g. ECFP4)\n│   ├── gen_mols.py                     # Converts SMILES strings to .mol files for NMR prediction\n│   ├── logD_predictor.py               # Main GUI logic handler; manages file I/O and prediction logic\n│   ├── merger.py                       # Merges bucketed ¹H and ¹³C spectra into a combined matrix\n│   ├── model_query.py                  # Prediction engine that queries saved models and returns logD values\n│   ├── predictor.py                    # Launches Java-based NMR spectrum prediction (via CDK .jar)\n│   ├── CNN_predict.py                  # Predicts using CNN-based neural networks\n│   ├── DNN_predict.py                  # Predicts using MLP-based deep networks\n│   ├── SVR_predict.py                  # Loads and runs SVR models from joblib\n│   ├── XGB_predict.py                  # Loads and runs XGBoost models from joblib\n│   ├── install_modules.py              # Called by INSTALL.pyw to install required Python libraries\n│   ├── install_text.txt                # Text displayed during GUI-based installation\n│   ├── input_example.csv               # Example SMILES input file for testing GUI\n│   └── joblib_models/                  # Directory to hold pre-trained model files (user must supply)\n│\n├── Prediction_Results/                # Automatically generated output folder for logs, plots, and CSVs\n│\n├── INSTALL.pyw                        # GUI-based Python library installer\n├── 
START.pyw                          # Main launcher for the logD Predictor GUI\n├── conda_environment.yml              # Conda environment definition file (created with `conda env export`)\n├── README.md                          # This documentation file\n├── RUN_LOG_FILE.log                   # Runtime log generated by the application\n```\n\n---\n\n## ⚙️ Installation\n\n### ✅ Option 1: Native Python (Windows 11)\n\n1. Ensure that **Python ≥ 3.12** is installed on your system. You can download the latest version from [https://www.python.org](https://www.python.org).\n2. Double-click `INSTALL.pyw` – it will install all required Python packages using `pip`.\n3. Download and install the **Java SDK** (tested on version 23). Ensure `java` and `javac` are accessible in your PATH.\n4. Download and install **Open Babel**, and add it to your PATH.\n5. Download the model archive:  \n   [joblib_models.rar](https://sourceforge.net/projects/logd-predictor/files/joblib_models.rar/download)  \n   - Extract and place the folder `joblib_models/` into `logD_predictor_bin/`.\n\n---\n\n### ✅ Option 2: Conda Environment\n\n1. Use the provided environment file to create your Conda environment:\n   ```bash\n   conda env create -f conda_environment.yml\n   conda activate predictor_logD\n   ```\n2. Download and install the **Java SDK** (tested on version 23). Ensure `java` and `javac` are accessible in your PATH.\n3. Download the model archive:  \n   [joblib_models.rar](https://sourceforge.net/projects/logd-predictor/files/joblib_models.rar/download)  \n   - Extract and place the folder `joblib_models/` into `logD_predictor_bin/`.\n\n---\n\n## 🚀 Running the Application\n\n- **With native Python**: Double-click `START.pyw`\n- **With Conda**:\n  Use `cd` to navigate to the directory containing `START.pyw`. 
For example:\n\n  ```bash\n  cd D:\\Git\\logD_predictor\n  ```\n  Make sure you're running the script from the root directory of the project.\n  Then activate the Conda environment and start logD Predictor.\n  ```bash\n  conda activate predictor_logD\n  python START.pyw\n  ```\n\n---\n\n## 📄 Input File Format\n\nThe input should be a `.csv` file containing SMILES strings. Use the GUI's **\"Open Input File Example\"** button to see the required format. Example:\n\n```csv\nID;SMILES\nMol01;CC(=O)Oc1ccccc1C(=O)O\nMol02;CCN(CC)CCOC(=O)c1ccc(C#N)cc1\n...\n```\n\n- Columns must be **semicolon-separated (`;`)**.\n- Headers must remain unchanged.\n- The first column is the molecule ID; the second column is the SMILES string.\n\n---\n\n## 🧪 Prediction Options (via GUI)\n\nAfter launching the graphical interface using `START.pyw`, the following configuration options are available:\n\n### 🧬 Select Representation\nChoose the input data representation used by the predictive models:\n- **Hybrid ¹H|¹³C** – use the hybrid bispectral representation. Gives the best prediction results, but is the slowest method.\n- **Proton (¹H)** – use ¹H NMR spectra. Fast, but less accurate.\n- **Carbon (¹³C)** – use ¹³C NMR spectra. 
Slower, more accurate.\n- **RDKit FP** – use molecular fingerprints generated from SMILES, a benchmark for testing and comparison.\n\n### 🧠 Available Models\nSpecify which machine learning models to include in the prediction:\n- **SVR** – Support Vector Regression models\n- **XGBoost** – Gradient boosting tree models\n- **DNN** – Multilayer Perceptrons (MLPs)\n- **CNN** – Convolutional Neural Networks\n- Selecting multiple options enables ensemble prediction and result averaging.\n\n### ⚙️ Execution Options\nFine-tune runtime behavior of the program:\n- **Quiet mode** – suppresses non-essential output messages (enabled by default)\n- **Debug mode** – saves all temporary files, including intermediate MOLs and spectrum predictions\n- **Show models** – prints detailed performance metrics (RMSE, MAE, R²) for each model after execution\n- **Generate charts** – creates visual summaries of predicted logD values with standard deviations\n\nEach option is accompanied by helpful tooltips in the GUI for ease of configuration. After selecting the CSV file and desired settings, simply click **Start Prediction** to begin the analysis.\n\n---\n\n## 🖼 Preview of the Interface\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"logD_predictor_bin/img/IMG/logD_predictor_GUI.png\" width=\"400\"/\u003e\n\u003c/p\u003e\n\n---\n\n## 🖥 Examples of Working Program\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"logD_predictor_bin/img/IMG/logD_predictor_console_1.png\" width=\"600\"/\u003e\u003c/p\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"logD_predictor_bin/img/IMG/1H13C_summary_results_plot.png\" width=\"800\"/\u003e\u003c/p\u003e\n\n---\n\n## 🔗 Related Projects\n\n- [Demiurge (NMR processing backend)](https://github.com/Prospero1988/Demiurge)\n- [Main NMR-AI Project](https://github.com/Prospero1988/NMR-AI_part3)\n\n---\n\n## 🛠 Troubleshooting\n\nIf you encounter any problems during installation or usage, feel free to contact the author.  
\nI will gladly assist with any technical issues related to environment setup, execution, or interpretation of results.\n\n---\n\n## 📜 License\n\nThis project is released under the **MIT License**.  \nAll scripts, models, and GUI tools are provided **free of charge** for academic and non-commercial use.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprospero1988%2Flogd_predictor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprospero1988%2Flogd_predictor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprospero1988%2Flogd_predictor/lists"}