https://github.com/axegon/tf2_malaria_cell_detection

Malaria cell detection with TensorFlow 2.0
opencv python3 tensorflow

Host: GitHub
URL: https://github.com/axegon/tf2_malaria_cell_detection
Owner: axegon
License: mit
Created: 2019-03-24T15:54:18.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2024-06-17T22:44:24.000Z (about 2 years ago)
Last Synced: 2025-04-25T16:46:00.037Z (about 1 year ago)
Topics: opencv, python3, tensorflow
Language: Python
Size: 1.32 MB
Stars: 6
Watchers: 2
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md

# Malaria cell detection with TensorFlow 2.0

### Introduction

This model was originally written for a talk, but since the meetup was delayed considerably, I'm open-sourcing it.

In addition, the code was slightly modified to work with the new TensorFlow 2.0.

The data set is provided by the U.S. National Library of Medicine. It consists of 27,558 images of individual
cells: 13,779 infected and 13,779 uninfected. The provided images come in various sizes, ranging from
90 x 90 pixels all the way up to 320 x 240 pixels.

Special thanks to [Dr. Julian Rayner](https://www.sanger.ac.uk/people/directory/rayner-julian), who kindly shed a lot of light on the subject and cleared many doubts.

There is plenty of variation in the images. On close examination, they show infections ranging from early stages
to late stages, in which the cells become physically deformed.

### Installation

You should have two command line tools installed:

* wget
* unzip

I advise you to use a virtual environment. All the required packages can be found in `requirements.txt` or
`requirements-gpu.txt`. Alternatively, you can use `setup.sh`, which should pick the right one for you.

**IMPORTANT NOTES**

TensorFlow 2.0 requires CUDA 10.0 and cuDNN 7.4.1. Installing those can break existing TensorFlow 1.x setups
built against older CUDA versions. Use at your own risk.

**ONLY PYTHON 3.6+ IS SUPPORTED!**

I've become a big fan of the f-string syntax, and I would like to avoid maintaining code that needs additional
work to run on older versions.
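Since f-strings are a syntax error on interpreters older than 3.6, a version check only helps if it lives in a module that avoids them. The following is a minimal sketch of such a guard, not code from this repository; `is_supported` and `require_supported_python` are hypothetical names.

```python
import sys

MINIMUM_PYTHON = (3, 6)  # f-strings were introduced in Python 3.6


def is_supported(version_info=None):
    """Return True if the interpreter is recent enough for f-strings."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


def require_supported_python():
    """Exit with a readable message on interpreters older than 3.6."""
    if not is_supported():
        # str.format keeps this module itself parseable on old interpreters.
        sys.exit("Python {}.{}+ is required".format(*MINIMUM_PYTHON))
```

Calling `require_supported_python()` at the top of an entry-point script, before importing any module that uses f-strings, would turn a confusing `SyntaxError` into a clear message.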

### Running the model

Once you've trained the model (or downloaded the pre-trained one), you can run it with:

```
$ python run.py --model=/path/to/model.h5 \
        --image=/path/to/directory/of/images/or/single/image
```

Use the `--help` flag for more options.
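The `--image` argument accepts either a single file or a directory of images. How `run.py` resolves it internally is not shown here; the sketch below only illustrates one way such an argument could be expanded into a list of files. The helper `collect_images` and the extension list are assumptions made for illustration.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed set of image types


def collect_images(path):
    """Return a sorted list of image files for a file or directory path."""
    target = Path(path)
    if target.is_file():
        # A single image: nothing to expand.
        return [target]
    if target.is_dir():
        # A directory: keep only regular files with a known image extension.
        return sorted(
            p for p in target.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    raise FileNotFoundError(f"No such file or directory: {target}")
```

Sorting the result keeps the processing order stable between runs, which makes the predictions easier to compare.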

The JSON model description must be in the same directory as the pre-trained weights and have the same name as the
weights file, with a `.json` extension:

```
pretrained
├── model_2019-03-18T23:31:49.432520_0.9594166874885559_0.11796065587550401.h5
└── model_2019-03-18T23:31:49.432520_0.9594166874885559_0.11796065587550401.json
```
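The naming rule above can be checked before loading anything. This is a minimal sketch under the assumption that the weights file ends in `.h5`; `json_description_path` and `check_model_files` are hypothetical helpers, not functions from the repository.

```python
from pathlib import Path


def json_description_path(weights_path):
    """Return the path where the JSON model description is expected."""
    # with_suffix replaces only the final extension (".h5"), so the other
    # dots inside the file name are left untouched.
    return Path(weights_path).with_suffix(".json")


def check_model_files(weights_path):
    """Raise FileNotFoundError if the weights or the JSON file is missing."""
    weights = Path(weights_path)
    description = json_description_path(weights)
    for required in (weights, description):
        if not required.is_file():
            raise FileNotFoundError(f"Missing model file: {required}")
    return weights, description
```

Failing early with the name of the missing file is usually easier to debug than an error raised deep inside the model-loading code.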

### Model architecture

![alt text](images/model_arch.png "Model architecture")

```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 139, 139, 32)      896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 69, 69, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 67, 67, 64)        18496
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 65, 65, 64)        36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 64)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 30, 30, 128)       73856
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 28, 28, 128)       147584
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 26, 26, 128)       147584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 128)       0
_________________________________________________________________
flatten_1 (Flatten)          (None, 21632)             0
_________________________________________________________________
dropout_1 (Dropout)          (None, 21632)             0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               5538048
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 257
=================================================================
Total params: 6,029,441
Trainable params: 6,029,441
Non-trainable params: 0
_________________________________________________________________
```
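The shapes and parameter counts in the summary can be reproduced with plain arithmetic. The sketch below assumes a 141 x 141 x 3 input, 3 x 3 kernels with stride 1 and no padding, and 2 x 2 pooling. These values are inferred from the numbers in the summary rather than taken from the training code, so treat them as assumptions.

```python
def conv2d(shape, filters, kernel=3):
    """Unpadded, stride-1 convolution: returns (output_shape, params)."""
    height, width, channels = shape
    out_shape = (height - kernel + 1, width - kernel + 1, filters)
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return out_shape, params


def max_pool(shape, pool=2):
    """Non-overlapping pooling with floor division: returns the output shape."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def dense(units_in, units_out):
    """Fully connected layer: returns the parameter count."""
    return units_in * units_out + units_out


def summarize(input_shape=(141, 141, 3)):
    """Walk the layer stack and return (flattened_size, total_params)."""
    shape, total = input_shape, 0
    # Each entry is either a filter count (Conv2D) or "pool" (MaxPooling2D).
    for layer in (32, "pool", 64, 64, "pool", 128, 128, 128, "pool"):
        if layer == "pool":
            shape = max_pool(shape)
        else:
            shape, params = conv2d(shape, layer)
            total += params
    flattened = shape[0] * shape[1] * shape[2]
    units_in = flattened
    for units_out in (256, 256, 1):
        total += dense(units_in, units_out)
        units_in = units_out
    return flattened, total
```

Under these assumptions `summarize()` returns `(21632, 6029441)`, matching the flatten size and the total parameter count listed above. Most of the parameters sit in the first dense layer.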

The network is a stripped-down, simplified version of VGG16 with some tweaks, namely binary
cross-entropy as the loss and the Adam optimizer.
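For reference, binary cross-entropy on a single output unit can be written out in a few lines. This is an illustrative pure-Python version of the formula, not the implementation used during training (the framework computes the loss internally); the clipping constant `eps` is an assumption chosen here to avoid `log(0)`.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and probabilities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    total = 0.0
    for label, prob in zip(y_true, y_pred):
        # Clip so that log(0) never occurs for over-confident predictions.
        prob = min(max(prob, eps), 1.0 - eps)
        total += -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
    return total / len(y_true)
```

Confident correct predictions contribute a loss close to zero, while confident wrong ones are penalised heavily, which suits a two-class problem with one output probability.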

### Training

First, download the data. The data set is just under 400 MB compressed and can be downloaded with the shell script
in the repository:

```
$ bash download.sh
```

If you want to train the model on your own, I would advise using a CUDA-powered GPU. Training on a CPU would take
several hours (6 to 8, give or take, depending on your CPU). On a GPU (Gigabyte GeForce GTX 1080 Ti 11GB GDDR5X)
the time goes down to around 40 minutes.

You can tune the training parameters via the `config.yaml` file. **Before you start training, make sure you create
the directory defined under `output_models`. The training script will not do that for you.**

```
$ python train.py --config=config.yaml
```
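Because the training script does not create the output directory, it has to exist before the command above is run. A minimal sketch of creating it from Python follows; `ensure_output_dir` is a hypothetical helper, and the directory name passed to it should match whatever `output_models` is set to in your `config.yaml`.

```python
from pathlib import Path


def ensure_output_dir(directory):
    """Create `directory` (and any missing parents) if it does not exist."""
    path = Path(directory)
    # exist_ok=True makes the call safe to repeat before every training run.
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, `ensure_output_dir("models/")` would cover a configuration whose `output_models` value is `models/`.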

### Pre-trained model

The pre-trained model can be downloaded from [here](https://www.dropbox.com/s/0bw5u7a0q2oh57s/pretrained_model.zip).

**Training accuracy**

![alt text](/images/epoch_acc.svg "Training accuracy")

**Training loss**

![alt text](/images/epoch_loss.svg "Training loss")

**Sample results**

Random sample of uninfected cells, never used in the training:

![uninfected](/images/Uninfected.png "Uninfected")

Random sample of parasitized cells, never used in the training:

![parasitized](/images/Parasitized.png "Parasitized")

**UPDATE**

I got a message from a user (who asked to remain anonymous) who trained the same network on a completely different data set, the [Chest X-Ray Pneumonia dataset](http://academictorrents.com/details/7208a86910cc518ae8feaa9021bf7f8565b97644), and got similar results (over 90% accuracy) with just 1000 images per class. His parameters were set as follows:

```
batch_size: 60
epochs: 35
image_size: [150, 150]
labels: [NORMAL, PNEUMONIA]
output_models: models/
plot_training_data: true
random_state: 42
test_size: 0.3
train_images: chest_xray/train/
train_size: 1000
validation_steps: 15
steps_per_epoch: 60
```

The models he trained are available [here](https://www.dropbox.com/s/2e2au2osagkiu4t/pneumonia.zip).
