{"id":21678158,"url":"https://github.com/pnnl/cactus","last_synced_at":"2025-04-12T05:14:08.897Z","repository":{"id":238651036,"uuid":"794341052","full_name":"pnnl/cactus","owner":"pnnl","description":"LLM Agent that leverages cheminformatics tools to provide informed responses.","archived":false,"fork":false,"pushed_at":"2024-10-19T04:18:00.000Z","size":19808,"stargazers_count":34,"open_issues_count":2,"forks_count":9,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-12T05:14:00.575Z","etag":null,"topics":["cheminformatics","chemistry","foundation-models","llm","llm-agent","nlp","science"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pnnl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-30T23:52:57.000Z","updated_at":"2025-04-07T22:26:22.000Z","dependencies_parsed_at":"2024-07-24T22:09:58.237Z","dependency_job_id":null,"html_url":"https://github.com/pnnl/cactus","commit_stats":null,"previous_names":["pnnl/cactus"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fcactus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fcactus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fcactus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pnnl%2Fcactus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/pnnl","download_url":"https://codeload.github.com/pnnl/cactus/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248519556,"owners_count":21117761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cheminformatics","chemistry","foundation-models","llm","llm-agent","nlp","science"],"created_at":"2024-11-25T14:26:43.773Z","updated_at":"2025-04-12T05:14:08.874Z","avatar_url":"https://github.com/pnnl.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CACTUS 🌵 | Chemistry Agent Connecting Tool Usage to Science\n\n[![arXiv](https://img.shields.io/badge/arXiv-2405.00972-b31b1b.svg)](https://arxiv.org/abs/2405.00972)\n[![License](https://img.shields.io/badge/License-BSD_2--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)\n[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![Rye](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/rye/main/artwork/badge.json\n)](https://rye.astral.sh/)\n\n\n[![Spaces](https://img.shields.io/badge/Open_in_HF_Spaces-yellow?style=for-the-badge\u0026logo=huggingface\u0026logoColor=black)](https://huggingface.co/spaces/PNNL/cactus-demo)\n\n\n# Introduction \n\nCACTUS is an innovative tool-augmented language model designed to assist researchers and chemists in various chemistry-related tasks. 
By integrating state-of-the-art language models with a suite of powerful cheminformatics tools, CACTUS provides an intelligent and efficient solution for exploring chemical space, predicting molecular properties, and accelerating drug discovery. Just as the cactus thrives in the harsh desert environment, adapting to limited resources and extreme conditions, CACTUS has been implemented by Pacific Northwest National Laboratory (PNNL) Scientists to navigate the complex landscape of chemical data and extract valuable insights.\n\n\u003cimg width=\"1000\" alt=\"Cactus_header\" src=\"assets/workflow_diagram_V2_white_bkg.png\"\u003e \n\n# Preprint Available [here](https://arxiv.org/abs/2405.00972)\n\n# Demo (API-only) on HuggingFace Spaces [here](https://huggingface.co/spaces/PNNL/cactus-demo)\n\n## Running Cactus 🏃\n\nGetting started with Cactus is as simple as:\n\n```python\nfrom cactus.agent import Cactus\n\nModel = Cactus(model_name=\"google/gemma7b\", model_type=\"vllm\")\nModel.run(\"What is the molecular weight of the smiles: OCC1OC(O)C(C(C1O)O)O\")\n```\n\n## Installation 💻\n\nTo install `cactus`:\n\n```bash\npip install git+https://github.com/pnnl/cactus.git\n```\n\nThe default `PyTorch` version is compiled for `cuda` 12.1 (or cpu for non-cuda systems). If you want to install for an older version of `cuda`, you should install from source and edit the `pyproject.toml` file at the `[[tool.rye.sources]]` section before installing. But be aware `vllm` may not work properly for older versions of `PyTorch`.\n\nNote: `cactus` currently only supports Python versions `3.10`-`3.12`. 
Ensure you are using one of these versions before installation.\n\nAlternatively for development, you can install in an editable configuration using:\n\n```bash\ngit clone https://github.com/pnnl/cactus.git\ncd cactus\npython -m pip install -e .\n```\n\nor install using `rye` by running:\n\n```bash\ngit clone https://github.com/pnnl/cactus.git\ncd cactus\nrye sync\n```\n\n## Benchmarking 📊\n\nWe provide scripts for generating lists of benchmarking questions to evaluate the performance of the CACTUS agent.\n\nThese scripts are located in the `benchmark` directory.\n\nTo build the dataset used in the paper, we can run:\n\n```bash\npython benchmark_creation.py\n```\n\nThis will generate a readable dataset named `QuestionsChem.csv` for use with the `Cactus` agent.\n\n## Models Tested\n\nFor this application we are benchmarking the following models:\n\n| Model        | model_name                         |\n|--------------|------------------------------------|\n| `llama2-7b`  | `meta-llama/Llama-2-7b-hf`         |\n| `llama3-8b`  | `meta-llama/Meta-Llama-3-8B`       |\n| `mistral-7b` | `mistralai/Mistral-7B-v0.1`        |\n| `gemma-7b`   | `google/gemma-7b-it`               |\n| `falcon-7b`  | `tiiuae/falcon-7b`                 |\n| `MPT-7b`     | `mosaicml/mpt-7b`                  |\n| `Phi-2`      | `microsoft/phi-2`                  |\n| `Phi-3`      | `microsoft/Phi-3-mini-4k-instruct` |\n| `OLMo-1b`    | `allenai/OLMo-1B`                  |\n\nThese models were selected based on their strong performance in natural language tasks and their potential for adaptation to domain-specific applications.\n\n## Tools Available\n\nFor the initial release, we have simple cheminformatics tools available:\n| Tool Name                 | Tool Usage                                           |\n|---------------------------|------------------------------------------------------|\n| `calculate_molwt`         | Calculate Molecular weight                           |\n| `calculate_logp`   
       | Calculate the Partition Coefficient                  |\n| `calculate_tpsa`          | Calculate the Topological Polar Surface Area         |\n| `calculate_qed`           | Calculate the Quantitative Estimate of Drug-likeness |\n| `calculate_sa`            | Calculate the Synthetic Accessibility                |\n| `calculate_bbb_permeant`  | Calculate Blood Brain Barrier Permeance              |\n| `calculate_gi_absorption` | Calculate the Gastrointestinal Absorption            |\n| `calculate_druglikeness`  | Calculate druglikeness based on Lipinski's Rule of 5 |\n| `brenk_filter`            | Calculate if molecule passes the Brenk Filter        |\n| `pains_filter`            | Calculate if molecule passes the PAINS Filter        |\n\n⚠️ Notice: These tools currently expect a SMILES string as input. Tools for conversion between identifiers are available but not yet working as intended; a fix is coming soon.\n\n## Future Directions\n\nWe are continuously working on improving CACTUS and expanding its capabilities for molecular discovery. Some of our planned features include:\n\n    🧬 Integration with physics-based models for 3D structure prediction and analysis\n    🔧 Support for advanced machine learning techniques (e.g., graph neural networks)\n    🎯 Enhanced tools for target identification and virtual screening    \n    📜 Improved interpretability and explainability of the model's reasoning process\n\nWe welcome contributions from the community and are excited to collaborate with researchers and developers to further advance the field of AI-driven drug discovery.\n\n## Citation \n\nIf you use CACTUS in your research, please cite our preprint: \n```bibtex\n@article{mcnaughton2024cactus,\n    title={CACTUS: Chemistry Agent Connecting Tool-Usage to Science},\n    author={Andrew D. McNaughton and Gautham Ramalaxmi and Agustin Kruel and Carter R. Knutson and Rohith A. 
Varikoti and Neeraj Kumar},\n    year={2024},\n    eprint={2405.00972},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpnnl%2Fcactus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpnnl%2Fcactus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpnnl%2Fcactus/lists"}