https://github.com/andrewtarzia/enzyme_screen
Molecular size calculations for enzyme reaction screening in metal--organic frameworks
- Host: GitHub
- URL: https://github.com/andrewtarzia/enzyme_screen
- Owner: andrewtarzia
- License: mit
- Created: 2020-09-08T15:25:24.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2021-02-10T22:15:04.000Z (over 5 years ago)
- Last Synced: 2025-07-26T11:41:26.176Z (11 months ago)
- Topics: enzymes, mofs, molecular-size, screening
- Language: Jupyter Notebook
- Homepage:
- Size: 10.5 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# enzyme_screen
Scripts and functions for extracting and analysing biochemical reactions.
Author: Andrew Tarzia
Email: andrew.tarzia@gmail.com or atarzia@ic.ac.uk
This work was produced in the final year of my PhD at the University of Adelaide under the supervision of A/Prof David Huang and Prof Christian Doonan.
Previously at: https://bitbucket.org/andrewtarzia/psp_source/src/master/
A Jupyter notebook that runs through the molecular size calculation from a SMILES string is available:
[Launch the notebook on Binder](https://mybinder.org/v2/gh/andrewtarzia/enzyme_screen/master?filepath=examples%2Fcalculate_molecular_size.ipynb)
The molecular size calculation code is also available in a refactored form on GitHub and through PyPI: https://github.com/andrewtarzia/mol-ellipsize
## Installation
* Tested on Ubuntu 18.04 using conda and pip
* Install Anaconda in the standard way (Python 3.7.3)
* Packages required beyond those that come with conda:
    * RDKit:
        * `conda install -c conda-forge rdkit`
        * Version: 2019.09.2.0
    * chemcost:
        * Python code written by Steven Bennett for the extraction of purchasability from the ZINC15 database.
        * Follow the instructions found here: https://github.com/stevenbennett96/chemcost
        * Only required for `molecule_population.py`
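
A quick way to confirm that the extra packages are importable before running the workflow is sketched below. The import name `chemcost` is an assumption based on the repository name; adjust it if the package installs under a different name.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "chemcost" as an import name is assumed, not confirmed by this README.
    missing = missing_packages(["rdkit", "chemcost"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Only `molecule_population.py` needs chemcost, so a missing chemcost does not block the other scripts.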
## Workflow
### Collecting database from KEGG
* Download the br08201 JSON file from [the KEGG library](https://www.genome.jp/kegg-bin/get_htext?query=08201&htext=br08902.keg)
    * Used version as of May12_2020 of br08201: Enzymatic reactions
* Run `util/split_KEGG.py` in the working directory to produce:
    * `_ECtop.json`: A dictionary of all reactions for all ECs
    * `_EClist.txt`: A list of all ECs to iterate through
* Update `data/param_file.txt` with the location of these files.
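
The splitting step can be pictured with the minimal sketch below. It is not the code in `util/split_KEGG.py`. It assumes the downloaded JSON is a nested tree of `name`/`children` nodes in which a node whose label has four dot-separated fields is an EC number and each leaf name starts with a reaction ID. The function names and the toy tree are illustrative only.

```python
import json


def collect_reactions(node, ec_to_reactions, current_ec=None):
    """Walk a name/children tree and group leaf reaction IDs by EC number."""
    fields = node.get("name", "").split()
    label = fields[0] if fields else ""
    children = node.get("children")
    if children is None:
        # Leaf node: assumed to start with a reaction ID such as "R00754".
        if current_ec is not None and label:
            ec_to_reactions.setdefault(current_ec, []).append(label)
        return
    # A full EC number has four dot-separated fields, e.g. "1.1.1.1".
    if label.count(".") == 3:
        current_ec = label
    for child in children:
        collect_reactions(child, ec_to_reactions, current_ec)


def split_tree(tree):
    """Return (dictionary of EC -> reaction IDs, sorted list of ECs)."""
    ec_top = {}
    collect_reactions(tree, ec_top)
    return ec_top, sorted(ec_top)


# Toy tree mimicking the assumed layout of the br08201 JSON file.
toy_tree = {
    "name": "br08201",
    "children": [
        {"name": "1. Oxidoreductase reactions", "children": [
            {"name": "1.1  Acting on the CH-OH group of donors", "children": [
                {"name": "1.1.1  With NAD+ or NADP+ as acceptor", "children": [
                    {"name": "1.1.1.1", "children": [
                        {"name": "R00623  Primary alcohol + NAD+ <=> Aldehyde + NADH + H+"},
                        {"name": "R00754  Ethanol + NAD+ <=> Acetaldehyde + NADH + H+"},
                    ]},
                ]},
            ]},
        ]},
    ],
}

ec_top, ec_list = split_tree(toy_tree)
print(json.dumps(ec_top, indent=2))
print("\n".join(ec_list))
```

In practice the two results would be written to `_ECtop.json` and `_EClist.txt` respectively.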
### Parameter testing
* All parameter screens in the supporting information of DOI: **awaiting** are run in `param_screening.py`
* `data/test_molecules.txt` contains the required molecular information
* The range of parameters to test is defined within `param_screening.py`; the original values are set in `data/param_file.txt`
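
One plausible shape for such a screen is to vary each parameter in turn while holding the others at their original values. The sketch below shows that pattern only. The parameter names and values are hypothetical placeholders, not those used in `param_screening.py` or `data/param_file.txt`.

```python
# Hypothetical defaults standing in for the values in data/param_file.txt.
DEFAULTS = {"vdw_scale": 1.0, "grid_spacing": 0.5, "n_conformers": 100}

# Hypothetical ranges to screen; each parameter is varied on its own.
RANGES = {
    "vdw_scale": [0.8, 0.9, 1.0],
    "grid_spacing": [0.3, 0.5, 0.7],
}


def one_at_a_time(defaults, ranges):
    """Yield (name, value, params) with one parameter changed per setting."""
    for name, values in ranges.items():
        for value in values:
            params = dict(defaults)  # copy so the defaults stay untouched
            params[name] = value
            yield name, value, params


for name, value, params in one_at_a_time(DEFAULTS, RANGES):
    print(f"{name}={value}: {params}")
```

Each yielded `params` dictionary would then be passed to the size calculation for every molecule in `data/test_molecules.txt`.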
### Reaction collection and analysis
* `RS_collection.py`
    * Iterates through the provided EC and reaction files to collect reaction systems
    * Also collects unique molecules into the molecule database
    * Currently only implements the API for KEGG
    * To be run in the directory with reactions
* `molecule_population.py`
    * Trivial parallelisation done using `utils/molecule_splitter.py`
    * Takes the `_unopt.mol` file of all collected molecules:
        * Optimises them using ETKDG -> `_opt.mol`
        * Calculates their properties -> `_prop.json`
        * Calculates the molecule size of N conformers -> `_size.csv`
    * To be run in the directory with molecules
    * Produces some plots of chemical space
* `chemical_space_plot.py`
    * Iterates through all collected molecules and produces various chemical space plots
    * To be run in the directory with molecules
* `RS_analysis.py`
    * `molecule_population.py` must be run before this point!
        * Unanalysed molecules result in skipped reactions
    * Populates the properties of each reaction system based on the properties of its constituent molecules (in the molecule database)
    * To be run in the directory with reactions
    * Outputs all properties to `rs_properties.csv`
* `screening.py`
    * Produces the plots and screening of all reaction systems seen in DOI: **awaiting**
    * Multiple cases are defined within the script to look at specific EC numbers or system types
        * case = production for plots in DOI:
    * To be run in the directory with reactions
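
The dependency between `molecule_population.py` and `RS_analysis.py` can be illustrated with the sketch below: a reaction system is skipped when any of its molecules has no computed size, otherwise a property is derived from its components. The data, the column names and the choice of the largest component size as the derived property are assumptions for illustration, not the contents of `rs_properties.csv`.

```python
import csv
import io


def analyse_reaction_systems(reactions, molecule_sizes):
    """Return (rows, skipped): property rows and IDs of skipped systems."""
    rows, skipped = [], []
    for rs_id, components in reactions.items():
        # Any molecule without a computed size causes the system to be skipped.
        if any(mol not in molecule_sizes for mol in components):
            skipped.append(rs_id)
            continue
        sizes = [molecule_sizes[mol] for mol in components]
        rows.append({
            "rs_id": rs_id,
            "n_components": len(sizes),
            "max_size": max(sizes),  # hypothetical reaction-level property
        })
    return rows, skipped


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["rs_id", "n_components", "max_size"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical molecule database (name -> size) and reaction systems.
molecule_sizes = {"ethanol": 4.1, "acetaldehyde": 4.0, "glucose": 5.2}
reactions = {
    "RS1": ["ethanol", "acetaldehyde"],
    "RS2": ["glucose", "unanalysed_molecule"],
}

rows, skipped = analyse_reaction_systems(reactions, molecule_sizes)
print(to_csv(rows))
print("Skipped:", skipped)
```

Here `RS2` is skipped because one of its molecules is missing from the molecule database, which mirrors why the population step must be complete first.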
## Examples
* `biomin_screening.py`
    * A script used to produce Figure **XX** in DOI:
    * Analyses a list of molecules that have been tested for enzyme@ZIF-8 reactions
* `examples/calculate_molecular_size.ipynb`
    * Jupyter notebook that runs a user through calculating the size of any molecule
* `examples/screen_new_reactions.ipynb`
    * Jupyter notebook that runs through the screening process exemplified in the paper to search for new reactions
* `visualise_ellipsoid_steps.py`
    * Allows the user to visualise the step-wise calculation of the minimum-volume enclosing ellipsoid
* `visualise_reaction_system.py`
    * Allows the user to print the properties of a reaction system
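
For readers unfamiliar with minimum-volume enclosing ellipsoids, the sketch below fits one to a cloud of points using Khachiyan's algorithm with NumPy. This is a common way to solve the problem. Whether the scripts here use this exact algorithm, and how they build the point cloud from a molecule, is not stated in this README, so treat the function name, tolerance and test points as assumptions.

```python
import numpy as np


def enclosing_ellipsoid(points, tol=1e-3):
    """Fit a minimum-volume enclosing ellipsoid with Khachiyan's algorithm.

    Returns (A, centre) such that (x - centre)^T A (x - centre) <= 1
    holds, up to the tolerance, for every input point x.
    """
    P = np.asarray(points, dtype=float)
    n_points, dim = P.shape
    Q = np.vstack([P.T, np.ones(n_points)])  # lifted points, (dim + 1, N)
    u = np.full(n_points, 1.0 / n_points)    # weights on the points
    err = tol + 1.0
    while err > tol:
        X = (Q * u) @ Q.T
        # M[i] is the i-th diagonal entry of Q^T X^-1 Q.
        M = np.sum(Q * (np.linalg.inv(X) @ Q), axis=0)
        j = int(np.argmax(M))
        step = (M[j] - dim - 1.0) / ((dim + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        err = np.linalg.norm(new_u - u)
        u = new_u
    centre = P.T @ u
    A = np.linalg.inv((P.T * u) @ P - np.outer(centre, centre)) / dim
    return A, centre


def ellipsoid_diameters(A):
    """Return the principal diameters of the ellipsoid in ascending order."""
    eigenvalues = np.linalg.eigvalsh(A)
    return np.sort(2.0 / np.sqrt(eigenvalues))


# Corners of a 4 x 2 x 1 box centred on the origin (illustrative points only).
corners = np.array([
    [sx * 2.0, sy * 1.0, sz * 0.5]
    for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)
])
A, centre = enclosing_ellipsoid(corners)
print("centre:", centre)
print("diameters:", ellipsoid_diameters(A))
```

For a box, the enclosing ellipsoid has semi-axes equal to the half-edge lengths multiplied by the square root of 3, which gives a simple sanity check on the output.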