https://github.com/katherinemossdeveloper/the-georgia-project
Study of the cropped images in the OpenCrystalData dataset on Kaggle
- Host: GitHub
- URL: https://github.com/katherinemossdeveloper/the-georgia-project
- Owner: KatherineMossDeveloper
- License: mit
- Created: 2025-04-08T17:58:20.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-11T10:33:51.000Z (12 months ago)
- Last Synced: 2025-06-11T11:46:16.861Z (12 months ago)
- Topics: antibiotic, binary-classification, cephalexin, crystallization, imagenet, images, inference, machine-learning, mettler-toledo, phenylglycine, python-3, resnet
- Language: Python
- Homepage:
- Size: 9.4 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## Content.
[Quick start](#quick-start) •
[Model comparison](#model-comparison) •
[Slow start](#slow-start) •
[Contributions](#contributions) •
[Known issues](#known-issues) •
[Contact info](#contact-info)
## Quick start.
1. Set up the code for this project.
   - Download the [source code ZIP file](https://github.com/KatherineMossDeveloper/The-Georgia-Project/archive/refs/tags/v1.6.0.zip) or the [source code TAR file](https://github.com/KatherineMossDeveloper/The-Georgia-Project/archive/refs/tags/v1.6.0.tar.gz) and extract it.
   - Set up a Python environment, if you don't already have one. I used PyCharm, ver. 2023.2.4, Community Edition.
   - Install dependencies as needed. I used Python 3.8, TensorFlow 2.10.1, and Keras 2.10.0.
2. Get the data from Kaggle.
   - Download [OpenCrystalData Crystal Impurity Detection](https://www.kaggle.com/datasets/opencrystaldata/cephalexin-reactive-crystallization?resource=download) and extract it.
   - Edit `GAsplitDataIntoTrainValidandTest.py` so that the `folder_prefix` variable points to the data on your PC.
   - Run `GAsplitDataIntoTrainValidandTest.py` to split the data into training, validation, and testing sets.
3. Run the training.
   - Edit `GAmain.py` so that the `folder_prefix` variable points to the OpenCrystalData on your PC.
   - Run `GAmain.py` to train the model.
   - Check the results in the `\GAdeliverables` folder (where you extracted the data).
4. Play time.
   - Download the [HDF5 weights file](https://github.com/KatherineMossDeveloper/The-Georgia-Project/releases/download/v1.6.0/GAweights.h5) or the [ONNX weights file](https://github.com/KatherineMossDeveloper/The-Georgia-Project/releases/download/v1.6.0/GAweights.onnx) to the `\images_testing` folder.
   - Edit `GA_visualize.py` so that the `folder_prefix` variable points to the Georgia Project code.
   - Run `GA_visualize.py` to label the images in the `\images_testing` folder and create plots about them.
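
The data split in step 2 can be sketched as follows. This is not the code in `GAsplitDataIntoTrainValidandTest.py`; it is a minimal, standard-library illustration of a 70/25/5 train/validation/test split, the ratio listed in the model comparison table. The function name `split_files`, the fixed seed, and the demo file names are assumptions made for the example.

```python
import random
from pathlib import Path
from typing import Dict, List, Sequence


def split_files(files: Sequence[Path], train_pct: int = 70, val_pct: int = 25,
                seed: int = 0) -> Dict[str, List[Path]]:
    """Shuffle files reproducibly and cut them into train/val/test lists.

    Whatever is left after the train and validation cuts becomes the test set.
    """
    if train_pct <= 0 or val_pct <= 0 or train_pct + val_pct >= 100:
        raise ValueError("train_pct and val_pct must be positive and sum to less than 100")
    shuffled = sorted(files)               # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(shuffled)  # private RNG; global random state is untouched
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


if __name__ == "__main__":
    # Hypothetical file names standing in for one class folder of cropped images.
    demo = [Path(f"CEX_{i:04d}.png") for i in range(3200)]
    parts = split_files(demo)
    print({name: len(items) for name, items in parts.items()})
```

Splitting each class folder separately with the same percentages keeps the classes balanced across the three sets, which is one reasonable way to apply the ratio; the project script may organize this differently.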
## Model comparison.
The table below compares the settings reported in the published paper (middle column) with the choices I made for this project (right column).
|                         |Salami et al. paper     |This project               |
|-------------------------|------------------------|---------------------------|
|framework                |MATLAB                  |TensorFlow/Keras (Python)  |
|model type               |ResNet-18, ResNet-50    |ResNet-101                 |
|optimization method      |SGDM                    |Keras SGD (momentum 0.9)   |
|learning rate            |1 × 10⁻⁴                |1 × 10⁻¹                   |
|training data            |3200–3600 in each class |(same)                     |
|train/val./test %        |70/25/5%                |(same)                     |
|minibatch size           |32–64                   |64                         |
|validation frequency     |10–50                   |1                          |
|added dropout layers     |(did not comment)       |2                          |
|trainable ImageNet layers|(did not comment)       |made last 10% trainable    |
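
As a rough illustration of how the right-hand column could map onto Keras code, here is a minimal sketch. It is not the contents of `GAmain.py`. The input size, the dropout rate, the 256-unit dense layer, the placement of the two dropout layers, and the reading of "last 10%" as the last 10% of backbone layers by count are all assumptions, and the names `n_trainable_layers` and `build_model` are made up for the example.

```python
def n_trainable_layers(total_layers: int, percent: int = 10) -> int:
    """Number of layers at the end of the backbone to leave trainable (rounded down)."""
    if total_layers < 0 or not 0 <= percent <= 100:
        raise ValueError("total_layers must be >= 0 and percent must be in 0..100")
    return total_layers * percent // 100


def build_model(input_shape=(224, 224, 3), dropout_rate=0.5, trainable_percent=10):
    """Assemble an ImageNet-pretrained ResNet-101 with a binary classification head.

    TensorFlow is imported here so the helper above can be used without it.
    input_shape and dropout_rate are assumed values, not taken from the project.
    """
    import tensorflow as tf

    backbone = tf.keras.applications.ResNet101(
        weights="imagenet", include_top=False, input_shape=input_shape)

    # Freeze everything except the last `trainable_percent` of the backbone layers.
    total = len(backbone.layers)
    cutoff = total - n_trainable_layers(total, trainable_percent)
    for index, layer in enumerate(backbone.layers):
        layer.trainable = index >= cutoff

    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(dropout_rate),           # first added dropout layer
        tf.keras.layers.Dense(256, activation="relu"),   # assumed head width
        tf.keras.layers.Dropout(dropout_rate),           # second added dropout layer
        tf.keras.layers.Dense(1, activation="sigmoid"),  # single output for two classes
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training with a minibatch size of 64 and validating every epoch would then look something like `model.fit(train_ds, validation_data=val_ds, validation_freq=1)`, where `train_ds` and `val_ds` are hypothetical datasets batched at 64.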
## Slow start.
This project was inspired by a research paper: Salami, H., McDonald, M. A., Bommarius, A. S., Rousseau, R. W., & Grover, M. A. (2021). [In Situ Imaging Combined with Deep Learning for Crystallization Process Monitoring: Application to Cephalexin Production](https://doi.org/10.1021/acs.oprd.1c00136). *Organic Process Research & Development*, 25, 1670–1679.
The authors of the paper trained ResNet models, initialized with ImageNet weights, on the OpenCrystalData dataset. The models perform binary classification of crystal images, labeling each one as either CEX (cephalexin, the antibiotic and the desired product) or PG (phenylglycine, an unwanted impurity). This project recreates their work.
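
To make the binary classification step concrete, the sketch below turns a model's sigmoid outputs into CEX/PG labels and tallies how they compare with the true labels. It assumes the output is the probability of PG and uses a 0.5 threshold; both are assumptions for illustration, and the function names are hypothetical rather than taken from the project code.

```python
from collections import Counter
from typing import Iterable, Sequence

# Assumption for this sketch: the sigmoid output is the probability of PG.
LABELS = ("CEX", "PG")


def to_label(probability: float, threshold: float = 0.5) -> str:
    """Map one sigmoid output to a class name."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return LABELS[1] if probability >= threshold else LABELS[0]


def confusion_counts(true_labels: Sequence[str], probabilities: Iterable[float],
                     threshold: float = 0.5) -> Counter:
    """Count (true label, predicted label) pairs."""
    predicted = [to_label(p, threshold) for p in probabilities]
    if len(predicted) != len(true_labels):
        raise ValueError("true_labels and probabilities must have the same length")
    return Counter(zip(true_labels, predicted))


def accuracy(counts: Counter) -> float:
    """Fraction of pairs where the predicted label matches the true label."""
    total = sum(counts.values())
    correct = sum(n for (true, pred), n in counts.items() if true == pred)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up probabilities, not results from the project.
    counts = confusion_counts(["CEX", "CEX", "PG", "PG"], [0.1, 0.8, 0.9, 0.4])
    print(dict(counts), accuracy(counts))
```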
Here is the Georgia Project's detailed documentation.
[Go to the main doc file](docs/maindoc.md)
## Contributions.
If you find an issue or would like to suggest an improvement to the code or documentation, please open an issue on the Issues tab of the project page. If you like this project, leave a star.
## Known issues.
None.
## Contact info.
For more details about this project, feel free to reach out to me at katherinemossdeveloper@gmail.com or through my account on [LinkedIn](https://www.linkedin.com/pub/katherine-moss/3/b49/228).
I am in the U.S. Eastern time zone.
[back to top](#content)