https://github.com/eshansurendra/cell_anomaly_detection_using_autoencoders
This repository offers a TensorFlow-based anomaly detection system for cell images using adversarial autoencoders, capable of identifying anomalies even in contaminated datasets. Check out our code, pretrained models, and papers for more details.
anomaly-detection autoencoder deep-learning h5 keras kernel-density-estimation machine-learning neural-network tensorflow
- Host: GitHub
- URL: https://github.com/eshansurendra/cell_anomaly_detection_using_autoencoders
- Owner: eshansurendra
- License: mit
- Created: 2024-03-14T08:24:18.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-17T12:56:22.000Z (over 1 year ago)
- Last Synced: 2025-04-09T13:48:00.432Z (10 months ago)
- Topics: anomaly-detection, autoencoder, deep-learning, h5, keras, kernel-density-estimation, machine-learning, neural-network, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 2.63 MB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: docs/README.md
- License: LICENSE
# Cell Anomaly Detection using Autoencoders
This repository provides an implementation of an anomaly detection system for cell images using autoencoders. The project draws inspiration from the paper "Robust Anomaly Detection in Images using Adversarial Autoencoders" by Laura Beggel, Michael Pfeiffer, and Bernd Bischl.






## Project Structure
The project focuses on detecting anomalies in images using autoencoder neural networks. An autoencoder learns to reconstruct normal images and can classify images as anomalies when the reconstruction error exceeds a certain threshold. The code in this repository implements an autoencoder-based anomaly detection method using TensorFlow.
### Overview

The project addresses a fundamental challenge in anomaly detection using autoencoders, particularly when the training set contains outliers. Continued training of autoencoders tends to reduce the reconstruction error of outliers, thereby degrading anomaly detection performance. To mitigate this issue, an adversarial autoencoder architecture is employed, which imposes a prior distribution on the latent representation, typically placing anomalies into low likelihood regions. By utilizing the likelihood model, potential anomalies can be identified and rejected during training, resulting in a more robust anomaly detector.
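The rejection idea described above can be illustrated without any deep-learning code. The following is a minimal sketch, assuming every training sample already has a likelihood score from the latent model and that a fixed fraction of the lowest-scoring samples is dropped in each round. The names `reject_low_likelihood` and `reject_fraction` are hypothetical and do not come from this repository.

```python
def reject_low_likelihood(samples, scores, reject_fraction=0.1):
    """Return the samples kept after dropping the lowest-scoring fraction.

    `samples` and `scores` are parallel sequences; a higher score means
    the sample looks more "normal" under the likelihood model.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    n_reject = int(len(samples) * reject_fraction)
    # Indices ordered from lowest to highest likelihood score
    order = sorted(range(len(samples)), key=lambda i: scores[i])
    rejected = set(order[:n_reject])
    # Preserve the original ordering of the surviving samples
    return [s for i, s in enumerate(samples) if i not in rejected]


kept = reject_low_likelihood(
    ["a", "b", "c", "d", "e"],
    [0.9, 0.1, 0.8, 0.7, 0.95],
    reject_fraction=0.2,
)
print(kept)
```

In a training loop, a step like this would run between epochs so that suspected outliers stop contributing to the reconstruction loss.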
### Architecture

The architecture is described as VGG16-based: a stack of 3x3 convolutions and 2x2 max-pooling layers, adapted to encode and decode images for anomaly detection. It has the following parts:
- **Encoder:** The encoder follows the VGG16 pattern with the fully connected layers removed. In the code below it consists of three convolution and pooling stages, so a 128x128 RGB input (the size used in `check_anomaly`) is reduced to a 16x16x16 feature map. This condensed representation captures the essential features of the input images.
- **Decoder:** The decoder mirrors the encoder in reverse: it takes the encoded feature map and upsamples it back to an image. Reconstruction quality matters because the reconstruction error is one of the two anomaly criteria.
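```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

SIZE = 128  # input height/width; check_anomaly resizes images to 128x128

# Encoder: three conv + pooling stages, each halving the spatial resolution
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(SIZE, SIZE, 3)))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), padding='same'))
model.add(Conv2D(16, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2), padding='same'))
```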
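```python
# Decoder: mirrors the encoder, doubling the spatial resolution at each stage
model.add(Conv2D(16, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(UpSampling2D((2, 2)))
# Assumed output layer: map back to 3 channels in [0, 1] to match the normalised input
model.add(Conv2D(3, (3, 3), activation='sigmoid', padding='same'))
# Assumed training setup: MSE reconstruction loss (a metric is listed so evaluate() returns a list)
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mse'])
```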
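To make the shapes concrete, the sketch below traces feature-map sizes through the convolution, pooling, and upsampling stages listed above, using plain Python rather than TensorFlow. The helper `trace_shapes` is a hypothetical name introduced for illustration, and the 128x128 input size is taken from the resize call in `check_anomaly`.

```python
def trace_shapes(size=128, channels=3):
    """Trace (height, width, channels) through the encoder and decoder stacks."""
    shape = (size, size, channels)
    shapes = [("input", shape)]
    # Encoder: a 'same'-padded 3x3 conv keeps height/width and sets the channel count
    for filters in (64, 32, 16):
        shape = (shape[0], shape[1], filters)
        shapes.append((f"conv{filters}", shape))
        # 2x2 pooling with 'same' padding halves each side, rounding up
        shape = (-(-shape[0] // 2), -(-shape[1] // 2), filters)
        shapes.append(("pool", shape))
    latent = shape
    # Decoder: conv sets the channel count, 2x2 upsampling doubles each side
    for filters in (16, 32, 64):
        shape = (shape[0] * 2, shape[1] * 2, filters)
        shapes.append((f"conv{filters}+up", shape))
    return latent, shape, shapes


latent, decoded, _ = trace_shapes(128)
print(latent)   # (16, 16, 16)
print(decoded)  # (128, 128, 64)
```

The latent feature map flattens to 16 * 16 * 16 = 4096 values per image. The last upsampling stage yields 64 channels, which is why a final convolution back to 3 channels is needed to produce an image that can be compared with the input.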
- **Kernel Density Estimation (KDE):** Between the encoder and the decoder, Kernel Density Estimation (KDE) is used to estimate how likely an image is to belong to the 'good' class. The KDE is fitted on the latent vectors of the training images, so it models where normal images lie in the latent space. A new image is then scored by the density at its own latent vector.
- Method: Build a separate encoder network that reuses the trained weights of the autoencoder above. Its output is the compressed (latent) representation of the input image, which is then used to compute the KDE.
```python
# Rebuild the encoder half as a standalone model
encoder_model = Sequential()
encoder_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(SIZE, SIZE, 3)))
encoder_model.add(MaxPooling2D((2, 2), padding='same'))
encoder_model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
encoder_model.add(MaxPooling2D((2, 2), padding='same'))
encoder_model.add(Conv2D(16, (3, 3), activation='relu', padding='same'))
encoder_model.add(MaxPooling2D((2, 2), padding='same'))
# Copy the trained weights of the Conv2D layers (indices 0, 2, 4); pooling layers have none.
# set_weights avoids relying on the legacy `weights=` constructor argument.
for i in (0, 2, 4):
    encoder_model.layers[i].set_weights(model.layers[i].get_weights())
encoder_model.summary()
```
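As a dependency-free illustration of what the density score measures, the sketch below implements a Gaussian KDE log-density in plain Python. It is not the repository's implementation: the `score_samples` call used in `check_anomaly` suggests a scikit-learn-style estimator, but that is an assumption, and the function name `gaussian_kde_log_density`, the bandwidth, and the toy 2-D vectors are all hypothetical.

```python
import math


def gaussian_kde_log_density(train_vectors, query, bandwidth=0.2):
    """Log-density of `query` under a Gaussian KDE fitted on `train_vectors`."""
    dim = len(query)
    n = len(train_vectors)
    # Log of the Gaussian kernel normalisation constant in `dim` dimensions
    log_norm = -0.5 * dim * math.log(2 * math.pi) - dim * math.log(bandwidth)
    # Log kernel value contributed by each training vector
    log_kernels = []
    for vec in train_vectors:
        sq_dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        log_kernels.append(-0.5 * sq_dist / bandwidth ** 2)
    # log-sum-exp for numerical stability, then average over training vectors
    peak = max(log_kernels)
    log_sum = peak + math.log(sum(math.exp(k - peak) for k in log_kernels))
    return log_norm + log_sum - math.log(n)


# Toy "latent vectors" of normal images, clustered near the origin
normal_latents = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.05], [0.05, -0.1]]
near = gaussian_kde_log_density(normal_latents, [0.0, 0.0])
far = gaussian_kde_log_density(normal_latents, [3.0, 3.0])
print(near > far)
```

A vector close to the training cluster receives a higher log-density than one far away, which is the property the density threshold relies on. If `kde` is a scikit-learn `KernelDensity`, its `score_samples` method also returns log-densities, so the density threshold is on a log scale.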
- **Anomaly Detection:** Anomalies are determined based on two criteria:
1. **KDE Value:** If the KDE of an image's latent representation is below a certain threshold, the image is considered an anomaly. This threshold is set based on the density distribution of the training images. Images with latent representations that lie far from the high-density regions (where most training images lie) are flagged as anomalies.
2. **Reconstruction Error:** Additionally, if the reconstruction error (the difference between the original image and its reconstructed image from the decoder) exceeds a predefined threshold, the image is also classified as an anomaly.
This dual-criterion approach helps in robustly identifying images that do not conform to the learned distribution of 'normal' images, either through significant deviations in their latent space positioning or through poor reconstruction quality.
```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

def check_anomaly(img_path):
    # Both thresholds are dataset-specific; set them from the density and
    # reconstruction-error values measured on known good and bad images.
    density_threshold = 2500
    reconstruction_error_threshold = 0.004

    img = Image.open(img_path)
    img = np.array(img.resize((128, 128), Image.Resampling.LANCZOS))
    plt.imshow(img)
    img = img / 255.
    img = img[np.newaxis, :, :, :]  # add a batch dimension

    # Flatten the latent feature map to one vector, as expected by the fitted KDE
    encoded_img = encoder_model.predict(img)
    encoded_img = encoded_img.reshape(1, -1)
    density = kde.score_samples(encoded_img)[0]

    reconstruction = model.predict(img)
    # [0] selects the loss; evaluate() returns a list when the model was compiled with metrics
    reconstruction_error = model.evaluate(reconstruction, img, batch_size=1)[0]

    if density < density_threshold or reconstruction_error > reconstruction_error_threshold:
        print("The image is an anomaly")
    else:
        print("The image is NOT an anomaly")
```
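The thresholds are described as being set from the score distribution of the training images. One possible way to do that is sketched below, assuming a "mean plus or minus k standard deviations" rule. That rule, the helper names `calibrate_thresholds` and `is_anomaly`, and the sample scores are illustrative assumptions rather than the procedure used in this repository.

```python
import statistics


def calibrate_thresholds(densities, recon_errors, k=3.0):
    """Derive thresholds from scores measured on known-normal images.

    Assumed rule: flag densities more than k standard deviations below the
    mean, and reconstruction errors more than k standard deviations above it.
    """
    density_threshold = statistics.mean(densities) - k * statistics.stdev(densities)
    error_threshold = statistics.mean(recon_errors) + k * statistics.stdev(recon_errors)
    return density_threshold, error_threshold


def is_anomaly(density, recon_error, density_threshold, error_threshold):
    """Dual criterion: low latent density OR high reconstruction error."""
    return density < density_threshold or recon_error > error_threshold


# Hypothetical scores collected from a handful of normal images
normal_densities = [2900.0, 2950.0, 3000.0, 3050.0, 3100.0]
normal_errors = [0.0020, 0.0022, 0.0025, 0.0028, 0.0030]
d_thr, e_thr = calibrate_thresholds(normal_densities, normal_errors)

print(is_anomaly(3000.0, 0.0024, d_thr, e_thr))  # typical scores
print(is_anomaly(1200.0, 0.0024, d_thr, e_thr))  # very low density
```

Using `or` means either criterion alone is enough to flag an image, which favours catching anomalies over avoiding false alarms. A larger `k` makes both thresholds more permissive.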
## Repository Structure
- **`docs`:** Contains documentation files related to the project.
- **`notebook`:** Holds Jupyter notebook files.
- **`pretrained_models`:** Contains pretrained models saved in various formats.
- **`scripts`:** Holds Python scripts used in the project.
- **`src`:** Contains the source code files of the project.
  - [main.py](/src/main.py): Python script orchestrating the workflow by calling functions from other modules.
  - [data.py](/src/data.py): Module handling data loading.
  - [train.py](/src/train.py): Module dealing with model building and training.
  - [evaluate.py](/src/evaluate.py): Module including functions for calculating the density and reconstruction errors.
  - [visualize.py](/src/visualize.py): Module containing the function for plotting the training and validation loss.
## Prerequisites
Install the required packages using:
```bash
pip install -r requirements.txt
```
## How to Use
**Step 1: Download Dataset**
Download the [Malaria Cell Images Dataset](https://data.lhncbc.nlm.nih.gov/public/Malaria/cell_images.zip).
After downloading, unzip the dataset and place it in the appropriate directory.
**Step 2: Data Preparation**
Split the downloaded dataset into training and testing sets using the `split.py` script.
```bash
python scripts/split.py
```
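For readers who want to see what a reproducible train/test split involves, here is a minimal sketch. It only partitions a list of file names and is not the contents of `scripts/split.py`, which may work differently (for example by copying files into separate directories). The function `split_files`, the 20% test fraction, and the seed are assumptions.

```python
import random


def split_files(file_names, test_fraction=0.2, seed=42):
    """Shuffle file names reproducibly and split them into train and test lists."""
    names = sorted(file_names)          # sort first so the result ignores input order
    random.Random(seed).shuffle(names)  # seeded shuffle for reproducibility
    n_test = int(round(len(names) * test_fraction))
    return names[n_test:], names[:n_test]


files = [f"cell_{i:03d}.png" for i in range(10)]
train_files, test_files = split_files(files)
print(len(train_files), len(test_files))
```

Fixing the seed keeps the split stable across runs, so evaluation results stay comparable after retraining.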
**Step 3: Training the Model**
Train the autoencoder model using the training data.
```bash
python src/train.py
```
**Step 4: Evaluating the Model**
Evaluate the model to calculate density and reconstruction errors.
```bash
python src/evaluate.py
```
**Step 5: Visualizing the Results**
Visualize the training and validation loss.
```bash
python src/visualize.py
```
**Step 6: Running the Full Pipeline**
You can run the entire pipeline using the `main.py` script.
```bash
python src/main.py
```
## Pretrained Models
Pretrained models are provided in the `pretrained_models` directory. You can load and use these models directly without training:
* `cell_anomaly_detection.h5`
* `cell_anomaly_detection_encoder.h5`
* `my_model.keras`
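A saved Keras model can typically be restored with `load_model`. The sketch below wraps that call in a small helper. `load_pretrained` is a hypothetical name, and the mapping of file names to the full autoencoder and the encoder is inferred from the names only.

```python
from pathlib import Path

PRETRAINED_DIR = Path("pretrained_models")


def load_pretrained(file_name, model_dir=PRETRAINED_DIR):
    """Load a saved Keras model from the pretrained models directory."""
    path = Path(model_dir) / file_name
    if not path.is_file():
        raise FileNotFoundError(f"No pretrained model at {path}")
    # Imported lazily so the path check works even without TensorFlow installed
    from tensorflow.keras.models import load_model
    return load_model(str(path))


# Assumed usage, run from the repository root:
# autoencoder = load_pretrained("cell_anomaly_detection.h5")
# encoder = load_pretrained("cell_anomaly_detection_encoder.h5")
```

Models saved with one TensorFlow/Keras version do not always load cleanly in another, so matching the versions in `requirements.txt` is the safer route.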
## Reference
This project is based on the paper:
- Laura Beggel, Michael Pfeiffer, Bernd Bischl. "Robust Anomaly Detection in Images using Adversarial Autoencoders" [arXiv:1901.06355 [cs.LG]](https://arxiv.org/abs/1901.06355)
The detailed methodology and experimental results are available in the paper included in the `docs` directory.
## Contributing
Contributions are welcome!
- **Bug Fixes:** If you find any bugs or issues, feel free to create an issue or submit a pull request.
- **Feature Enhancements:** If you have ideas for new features or improvements, don't hesitate to share them.
## License
This project is licensed under the [MIT License](LICENSE).