https://github.com/lilyminium/qca-datasets-report
- Host: GitHub
- URL: https://github.com/lilyminium/qca-datasets-report
- Owner: lilyminium
- License: mit
- Created: 2024-07-09T04:02:57.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-07-25T16:34:20.000Z (11 months ago)
- Last Synced: 2025-02-01T05:17:51.432Z (5 months ago)
- Language: Python
- Size: 31.9 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# qca-datasets-report
**Note: this repo is under development and support is not guaranteed.**
This repo lets you search the corpus of
[qca-dataset-submission](https://github.com/openforcefield/qca-dataset-submission)
for existing data.

## Searching
In a [GitHub discussion](https://github.com/lilyminium/qca-datasets-report/discussions/categories/search-qca),
either start a **new Discussion** or comment on an existing one. A new Discussion must be
labelled with `botsearch` to trigger the search.

The command *must* include a valid SMARTS `--pattern`.
```
botsearch --pattern '[#15:1]-[#16:2]'
```

Optionally, limit the specifications searched (note: only the 'default' specification has been downloaded for now).
```
botsearch --pattern '[#15:1]-[#16:2]' --spec 'default'
```

Optionally, limit the datasets searched (multiple datasets are supported).
```
botsearch --pattern '[#15:1]-[#16:2]' --dataset 'SMIRNOFF Coverage Set 1' --dataset 'OpenFF Optimization Set 1'
```

Optionally, limit the dataset type searched.
```
botsearch --pattern '[#15:1]-[#16:2]' --type 'optimization'
botsearch --pattern '[#15:1]-[#16:2]' --type 'torsiondrive'
```

Optionally, limit the search to a subset of data, e.g. the Sage 2.2.0 training set. This uses combinations of QCA IDs put together in the `combinations/` directory.
```
botsearch --pattern '[#15:1]-[#16:2]' --combination 'sage-2.2.0'
```

A GitHub Action will then start searching for matching molecules.
The record IDs are saved as an artifact.
If the number of matched molecules is below a certain limit (up to 300), the molecules
are rendered as images and returned.
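
To see how the options above combine, here is a minimal sketch of splitting a `botsearch` command into its parts with Python's standard library. It is not the repo's actual parser: the names `parse_command` and `_RaisingParser` are hypothetical, and it assumes that only `--dataset` may be repeated and that `--spec`, `--type` and `--combination` each take a single value. It does not check that the pattern is valid SMARTS.

```python
import argparse
import shlex


class _RaisingParser(argparse.ArgumentParser):
    """Hypothetical helper: raise ValueError instead of exiting on bad input."""

    def error(self, message):
        raise ValueError(message)


def parse_command(text: str) -> argparse.Namespace:
    """Parse one `botsearch ...` line as it might appear in a Discussion."""
    # shlex honours the single quotes around the SMARTS pattern and dataset names.
    tokens = shlex.split(text)
    if not tokens or tokens[0] != "botsearch":
        raise ValueError("command must start with 'botsearch'")

    parser = _RaisingParser(prog="botsearch", add_help=False)
    parser.add_argument("--pattern", required=True)   # SMARTS pattern (mandatory)
    parser.add_argument("--spec")                     # assumption: single value
    parser.add_argument("--dataset", action="append") # may be given several times
    parser.add_argument("--type", dest="dataset_type")
    parser.add_argument("--combination")
    return parser.parse_args(tokens[1:])


args = parse_command(
    "botsearch --pattern '[#15:1]-[#16:2]' "
    "--dataset 'SMIRNOFF Coverage Set 1' --dataset 'OpenFF Optimization Set 1'"
)
print(args.pattern)
print(args.dataset)
```

In this sketch, a command without `--pattern` raises `ValueError`, mirroring the requirement that every search includes a pattern.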