https://github.com/ur-whitelab/nmrdata
# Data for NMR GNN
This repository contains the parsing scripts and data used for our [GNN chemical shift predictor model](https://github.com/ur-whitelab/nmrgnn).
## Install
```sh
pip install nmrgnn-data
```
## Working in Python
Here's an example of how to load and work with the data in Python. The records
are loaded as a TensorFlow dataset ([read more here](https://www.tensorflow.org/api_docs/python/tf/data/Dataset)), but can be iterated over in a `for` loop:
```py
import nmrdata
dataset = nmrdata.load_records('data/metabolite-records.tfrecord')
for record in dataset:
    # get single record
    break
print(record.keys())
```
output:
```
dict_keys(['natoms', 'nneigh', 'features', 'nlist', 'positions', 'peaks', 'mask', 'name', 'class', 'index'])
```
Access positions as a NumPy array:
```py
record['positions'].numpy()
```
output:
```
array([[ 0.83740795, 0.09760247, 0.2959486 ],
[-0.562893 , 0.00262405, -0.00434441],
[-1.0725924 , -0.37873718, 0.9061929 ],
[-0.75536764, -0.72710234, -0.8159687 ],
[-1.0367495 , 0.9557108 , -0.27988592],
[ 1.2855262 , -0.8334997 , 0.10487328],
[ 1.3046683 , 0.8834019 , -0.20681578]], dtype=float32)
```
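Because the positions are a plain `(natoms, 3)` array, ordinary NumPy operations apply to them. As a minimal sketch (not part of `nmrdata`), the snippet below computes all pairwise interatomic distances for the coordinates printed above. The helper name `pairwise_distances` is hypothetical, and the distances are in whatever units the record stores.

```python
import numpy as np

# Coordinates copied from the example output above (shape: natoms x 3).
positions = np.array(
    [
        [0.83740795, 0.09760247, 0.2959486],
        [-0.562893, 0.00262405, -0.00434441],
        [-1.0725924, -0.37873718, 0.9061929],
        [-0.75536764, -0.72710234, -0.8159687],
        [-1.0367495, 0.9557108, -0.27988592],
        [1.2855262, -0.8334997, 0.10487328],
        [1.3046683, 0.8834019, -0.20681578],
    ],
    dtype=np.float32,
)


def pairwise_distances(xyz):
    """Return the (natoms, natoms) matrix of Euclidean distances."""
    # Broadcasting builds every difference vector xyz[i] - xyz[j].
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.linalg.norm(diff, axis=-1)


dist = pairwise_distances(positions)
print(dist.shape)  # (7, 7)
```

The resulting matrix is symmetric with zeros on the diagonal, so `dist[i, j]` gives the separation between atoms `i` and `j`.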
Get chemical shifts
```py
record['peaks'].numpy()
```
output:
```
array([0. , 0. , 2.59, 2.59, 2.59, 0. , 0. ], dtype=float32)
```
## Numpy Error
If you see this error:
```py
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
```
Try reinstalling numpy:
```sh
pip uninstall -y numpy && pip install numpy
```
## Parsing Scripts
To install with the parsing functionality, run:
```sh
conda install -c omnia openmm
pip install nmrgnn-data[parse]
```
## Working with Data
All commands below can have additional information printed using the `--help` argument.
### Find pairs
Find pairs of atoms with chemical shifts that are neighbors and sort them based on distance.
```sh
nmrdata find-pairs structure-test.tfrecords-data.tfrecord ALA-H ALA-N
```
### Count Names
Get class/atom name counts:
```sh
nmrdata count-names structure-test.tfrecords-data.tfrecord
```
### Validate
Check that records are consistent with embeddings
```sh
nmrdata validate-embeddings structure-test.tfrecords-data.tfrecord
```
Check that neighbor lists are consistent with embeddings
```sh
nmrdata validate-nlist structure-test.tfrecords-data.tfrecord
```
Check that peaks are reasonable (no NaNs, no extreme values, no bad masks)
```sh
nmrdata validate-peaks structure-test.tfrecords-data.tfrecord
```
### Output Labels
To extract labels ordered by PDB and residue:
```sh
nmrdata write-peak-labels test-structure-shift-data.tfrecord test-structure-shift-record-info.txt labels.txt
```
## Making New Data
See commands `nmrparse shiftml`, `nmrparse metabolites`, `nmrparse shiftx` which are parsers for various databases.
### From RefDB Files
This requires a pickled Python object called `data.pb` to be in the directory. It is
a list of `dict`s, each containing `pdb_file` (path to PDB), `pdb` (PDB ID), `corr` (path to `.corr` file), and `chain` (which chain).
`chain` can be `_` to indicate that the first chain should be used.
```sh
nmrparse parse-refdb directory name --pdb_filter exclude_ids.txt
```
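The `data.pb` file described above can be produced with Python's standard `pickle` module. The following is a minimal sketch rather than the project's own tooling: the helper `write_index`, the PDB IDs, and the file paths are all hypothetical placeholders, and a temporary directory stands in for the real data directory. Whether paths should be absolute or relative to that directory is not stated here, so check `nmrparse parse-refdb --help` before relying on this layout.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical entries: the IDs and paths are placeholders, not real data.
entries = [
    {"pdb_file": "pdbs/1abc.pdb", "pdb": "1ABC", "corr": "corr/1abc.corr", "chain": "A"},
    # "_" asks for the first chain, as described above.
    {"pdb_file": "pdbs/2xyz.pdb", "pdb": "2XYZ", "corr": "corr/2xyz.corr", "chain": "_"},
]

REQUIRED_KEYS = {"pdb_file", "pdb", "corr", "chain"}


def write_index(directory, records):
    """Pickle a list of dicts to <directory>/data.pb after checking their keys."""
    for rec in records:
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"record {rec!r} is missing keys: {sorted(missing)}")
    path = Path(directory) / "data.pb"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(records, f)
    return path


# A temporary directory stands in for the real data directory.
index_path = write_index(tempfile.mkdtemp(), entries)
print(index_path.name)  # data.pb
```

The directory that holds `data.pb` is then the one passed as the `directory` argument to `nmrparse parse-refdb`.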
## Citation
Please cite [Predicting Chemical Shifts with Graph Neural Networks](https://pubs.rsc.org/en/content/articlehtml/2021/sc/d1sc01895g):
```bibtex
@article{yang2021predicting,
title={Predicting chemical shifts with graph neural networks},
author={Yang, Ziyue and Chakraborty, Maghesree and White, Andrew D},
journal={Chemical science},
volume={12},
number={32},
pages={10802--10809},
year={2021},
publisher={Royal Society of Chemistry}
}
```