{"id":13699040,"url":"https://github.com/lamalab-org/chemlift","last_synced_at":"2025-10-06T17:04:22.985Z","repository":{"id":198165420,"uuid":"664518883","full_name":"lamalab-org/chemlift","owner":"lamalab-org","description":"Language-interfaced fine-tuning for chemistry ","archived":false,"fork":false,"pushed_at":"2023-11-30T10:47:50.000Z","size":235,"stargazers_count":45,"open_issues_count":11,"forks_count":8,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-06T17:04:03.333Z","etag":null,"topics":["chemistry","few-shot-learning","fine-tuning","hacktoberfest","llm","materials"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lamalab-org.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-07-10T06:54:07.000Z","updated_at":"2025-09-28T13:22:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"e9ecf592-6435-4a66-8414-2a4c34e09834","html_url":"https://github.com/lamalab-org/chemlift","commit_stats":{"total_commits":35,"total_committers":1,"mean_commits":35.0,"dds":0.0,"last_synced_commit":"ac6754fb4b8ad222b56b9413b2bee69ff50eb7a0"},"previous_names":["lamalab-org/chemlift"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/lamalab-org/chemlift","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamalab-org%2Fchemlift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamalab-org%2Fchemlift/tags","
releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamalab-org%2Fchemlift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamalab-org%2Fchemlift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lamalab-org","download_url":"https://codeload.github.com/lamalab-org/chemlift/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lamalab-org%2Fchemlift/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278646783,"owners_count":26021512,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-06T02:00:05.630Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemistry","few-shot-learning","fine-tuning","hacktoberfest","llm","materials"],"created_at":"2024-08-02T19:00:56.669Z","updated_at":"2025-10-06T17:04:22.955Z","avatar_url":"https://github.com/lamalab-org.png","language":"Jupyter Notebook","funding_links":[],"categories":["Language Models"],"sub_categories":[],"readme":"\u003c!--\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/lamalab-org/chemlift/raw/main/docs/source/logo.png\" height=\"150\"\u003e\n\u003c/p\u003e\n--\u003e\n\n\u003ch1 align=\"center\"\u003e\n  chemlift\n\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca 
href=\"https://github.com/lamalab-org/chemlift/actions/workflows/tests.yml\"\u003e\n        \u003cimg alt=\"Tests\" src=\"https://github.com/lamalab-org/chemlift/workflows/Tests/badge.svg\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pypi.org/project/chemlift\"\u003e\n        \u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/chemlift\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://pypi.org/project/chemlift\"\u003e\n        \u003cimg alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/chemlift\" /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/lamalab-org/chemlift/blob/main/LICENSE\"\u003e\n        \u003cimg alt=\"PyPI - License\" src=\"https://img.shields.io/pypi/l/chemlift\" /\u003e\n    \u003c/a\u003e\n    \u003ca href='https://chemlift.readthedocs.io/en/latest/?badge=latest'\u003e\n        \u003cimg src='https://readthedocs.org/projects/chemlift/badge/?version=latest' alt='Documentation Status' /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://codecov.io/gh/lamalab-org/chemlift/branch/main\"\u003e\n        \u003cimg src=\"https://codecov.io/gh/lamalab-org/chemlift/branch/main/graph/badge.svg\" alt=\"Codecov status\" /\u003e\n    \u003c/a\u003e  \n    \u003ca href=\"https://github.com/cthoyt/cookiecutter-python-package\"\u003e\n        \u003cimg alt=\"Cookiecutter template from @cthoyt\" src=\"https://img.shields.io/badge/Cookiecutter-snekpack-blue\" /\u003e \n    \u003c/a\u003e\n    \u003ca href='https://github.com/psf/black'\u003e\n        \u003cimg src='https://img.shields.io/badge/code%20style-black-000000.svg' alt='Code style: black' /\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/lamalab-org/chemlift/blob/main/.github/CODE_OF_CONDUCT.md\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/Contributor%20Covenant-2.1-4baaaa.svg\" alt=\"Contributor Covenant\"/\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nChemical language interfaced predictions using large 
language models.\n\n## 💪 Getting Started\n\nWith ChemLIFT, you can use large language models to make predictions on chemical data.\nTwo approaches are available:\n\n- **Few-shot learning**: Provide a few labeled examples in the prompt along with the points you want to predict, and the model infers the property of interest from those examples.\n- **Fine-tuning**: Fine-tune a large language model on a dataset of your choice and use it to make predictions.\n\nFine-tuning updates the weights of the model, while few-shot learning does not.\n\n### Few-shot learning\n\n```python\nfrom chemlift.icl.fewshotclassifier import FewShotClassifier\nfrom langchain.llms import OpenAI\n\nllm = OpenAI()\nfsc = FewShotClassifier(llm, property_name='bandgap')\n\n# Provide a few labeled examples (used in the prompt; no weights are updated)\nfsc.fit(['ethane', 'propane', 'butane'], [0, 1, 0])\n\n# Predict labels for a few more molecules\nfsc.predict(['pentane', 'hexane', 'heptane'])\n```\n\n### Fine-tuning\n\n```python\nfrom chemlift.finetuning.classifier import ChemLIFTClassifierFactory\n\nmodel = ChemLIFTClassifierFactory('property name',\n                                    model_name='EleutherAI/pythia-1b-deduped').create_model()\n\n# X and y are placeholders for your own training inputs and labels\nmodel.fit(X, y)\nmodel.predict(X)\n```\n\n## 🚀 Installation\n\n\u003c!-- Uncomment this section after your first ``tox -e finish``\nThe most recent release can be installed from\n[PyPI](https://pypi.org/project/chemlift/) with:\n\n```shell\n$ pip install chemlift\n```\n--\u003e\n\nThe most recent code and data can be installed directly from GitHub with:\n\n```bash\n$ pip install git+https://github.com/lamalab-org/chemlift.git\n```\n\n## 👐 Contributing\n\nContributions, whether filing an issue, making a pull request, or forking, are appreciated. 
See\n[CONTRIBUTING.md](https://github.com/lamalab-org/chemlift/blob/main/.github/CONTRIBUTING.md) for more information on getting involved.\n\n## 👋 Attribution\n\n### ⚖️ License\n\nThe code in this package is licensed under the MIT License.\n\n\n### 📖 Citation\n\nIf you use this package, please cite:\n\n```\n@article{Jablonka_2023,\n    doi = {10.26434/chemrxiv-2023-fw8n4},\n    url = {https://doi.org/10.26434%2Fchemrxiv-2023-fw8n4},\n    year = 2023,\n    month = {feb},\n    publisher = {American Chemical Society ({ACS})},\n    author = {Kevin Maik Jablonka and Philippe Schwaller and Andres Ortega-Guerrero and Berend Smit},\n    title = {Is {GPT}-3 all you need for low-data discovery in chemistry?}\n}\n```\n\n\n\n### 🎁 Support\n\nThe work of the LAMALab is supported by the Carl Zeiss Foundation.\n\nThe work was also supported by the MARVEL National Centre for Competence in Research, funded by the Swiss National Science Foundation (grant agreement ID 51NF40-182892). We also acknowledge support from the USorb-DAC Project, which is funded by a grant from The Grantham Foundation for the Protection of the Environment to RMI’s climate tech accelerator program, Third Derivative. 
\n\n## 🛠️ For Developers\n\n\u003cdetails\u003e\n  \u003csummary\u003eSee developer instructions\u003c/summary\u003e\n\nThis final section of the README is for those who want to get involved by contributing code.\n\n### Development Installation\n\nTo install in development mode, use the following:\n\n```bash\n$ git clone https://github.com/lamalab-org/chemlift.git\n$ cd chemlift\n$ pip install -e .\n```\n\n### 🥼 Testing\n\nAfter cloning the repository and installing `tox` with `pip install tox`, the unit tests in the `tests/` folder can be\nrun reproducibly with:\n\n```shell\n$ tox\n```\n\nAdditionally, these tests are automatically re-run with each commit in a [GitHub Action](https://github.com/lamalab-org/chemlift/actions?query=workflow%3ATests).\n\n### 📖 Building the Documentation\n\nThe documentation can be built locally using the following:\n\n```shell\n$ git clone https://github.com/lamalab-org/chemlift.git\n$ cd chemlift\n$ tox -e docs\n$ open docs/build/html/index.html\n```\n\nThe documentation build automatically installs the package as well as the `docs`\nextra specified in the [`setup.cfg`](setup.cfg). `sphinx` plugins\nlike `texext` can be added there. 
Additionally, they need to be added to the\n`extensions` list in [`docs/source/conf.py`](docs/source/conf.py).\n\n### 📦 Making a Release\n\nAfter installing the package in development mode and installing\n`tox` with `pip install tox`, the commands for making a new release are contained within the `finish` environment\nin `tox.ini`. Run the following from the shell:\n\n```shell\n$ tox -e finish\n```\n\nThis script does the following:\n\n1. Uses [Bump2Version](https://github.com/c4urself/bump2version) to remove the `-dev` suffix from the version number in `setup.cfg`,\n   `src/chemlift/version.py`, and [`docs/source/conf.py`](docs/source/conf.py)\n2. Packages the code in both a tar archive and a wheel using [`build`](https://github.com/pypa/build)\n3. Uploads to PyPI using [`twine`](https://github.com/pypa/twine). Be sure to have a `.pypirc` file configured to avoid the need for manual input at this\n   step\n4. Pushes to GitHub. You'll need to create a GitHub release for the commit where the version was bumped.\n5. Bumps the version to the next patch. If you made big changes and want to bump the minor version instead, you can\n   run `tox -e bumpversion -- minor` afterwards.\n\u003c/details\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flamalab-org%2Fchemlift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flamalab-org%2Fchemlift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flamalab-org%2Fchemlift/lists"}