https://github.com/idaraabasiudoh/svm_cell_classification

This repository contains code for classifying cell samples using Support Vector Machine (SVM) with Scikit-learn.

- Host: GitHub
- URL: https://github.com/idaraabasiudoh/svm_cell_classification
- Owner: idaraabasiudoh
- License: MIT
- Created: 2024-08-27T16:30:09.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-27T16:43:06.000Z (almost 2 years ago)
- Last Synced: 2025-04-07T07:52:46.875Z (about 1 year ago)
- Topics: machine-learning, python3, scikit-learn, svm-classifier
- Language: Jupyter Notebook
- Size: 1.27 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
  - License: LICENSE


# SVM Classification of Cell Samples

This repository contains code for classifying cell samples using a Support Vector Machine (SVM) with scikit-learn. The dataset describes each cell sample with a set of numeric features; the project preprocesses the data, trains an SVM model, and evaluates its performance with the confusion matrix, F1 score, and Jaccard index.

## Table of Contents

- [Installation](#installation)

- [Dataset](#dataset)

- [Feature Selection](#feature-selection)

- [Modeling](#modeling)

- [Evaluation](#evaluation)

- [Results](#results)

- [Contributing](#contributing)

- [License](#license)

- [Acknowledgments](#acknowledgments)

## Installation

To run the code in this repository, you will need to have Python installed along with the following libraries:

```bash
pip install scikit-learn==0.23.1
pip install pandas
pip install matplotlib
pip install requests
```

## Dataset

The dataset used in this project is a collection of cell samples that includes features such as `Clump Thickness`, `Uniformity of Cell Size`, `Uniformity of Cell Shape`, `Marginal Adhesion`, `Single Epithelial Cell Size`, `Bare Nuclei`, `Bland Chromatin`, `Normal Nucleoli`, and `Mitoses`. The target variable (`Class`) indicates whether the cells are benign (2) or malignant (4).

### Downloading the Dataset

The dataset can be downloaded using the following code:

```python
import requests

url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%203/data/cell_samples.csv'

response = requests.get(url, timeout=30)
response.raise_for_status()  # fail on an HTTP error instead of saving an error page

with open('cell_samples.csv', 'wb') as file:
    file.write(response.content)
```

## Feature Selection

The following features are selected for training the model:

- `Clump Thickness`

- `Uniformity of Cell Size`

- `Uniformity of Cell Shape`

- `Marginal Adhesion`

- `Single Epithelial Cell Size`

- `Bare Nuclei`

- `Bland Chromatin`

- `Normal Nucleoli`

- `Mitoses`

Data preprocessing includes handling missing values in the `Bare Nuclei` column.
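A minimal sketch of this preprocessing, assuming the missing `Bare Nuclei` entries arrive as non-numeric placeholders (such as `?`) so that pandas reads the column as strings. The `FEATURES` names follow the list above and may not match the actual CSV headers, and the small `toy` DataFrame is hypothetical stand-in data, not the real dataset. Dropping incomplete rows is only one option; imputing a value is another. The `test_size` and `random_state` values are illustrative choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names taken from the feature list; adjust to the real CSV headers
FEATURES = [
    'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape',
    'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei',
    'Bland Chromatin', 'Normal Nucleoli', 'Mitoses',
]

def preprocess(cell_df):
    """Drop rows whose 'Bare Nuclei' is not numeric, then return (X, y) arrays."""
    numeric_bare = pd.to_numeric(cell_df['Bare Nuclei'], errors='coerce')  # '?' -> NaN
    mask = numeric_bare.notnull()
    cell_df = cell_df[mask].copy()
    cell_df['Bare Nuclei'] = numeric_bare[mask].astype(int)
    X = cell_df[FEATURES].to_numpy()
    y = cell_df['Class'].astype(int).to_numpy()  # 2 = benign, 4 = malignant
    return X, y

# Hypothetical toy data: every feature is 1..10, and two 'Bare Nuclei' values are '?'
toy = pd.DataFrame({name: list(range(1, 11)) for name in FEATURES})
toy['Bare Nuclei'] = ['1', '10', '?', '2', '1', '?', '3', '10', '1', '5']
toy['Class'] = [2, 4, 2, 2, 2, 4, 2, 4, 2, 4]

X, y = preprocess(toy)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
print(X.shape, X_train.shape, X_test.shape)
```

With the real file, `pd.read_csv('cell_samples.csv')` would take the place of `toy`.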

## Modeling

The model is built using the Support Vector Machine (SVM) algorithm with an RBF kernel:

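```python
from sklearn import svm

# X_train / y_train: training split of the selected features and the Class labels
clf = svm.SVC(kernel='rbf')
clf.fit(X_train, y_train)

# Predictions on the held-out test split, used by the evaluation code below
yhat = clf.predict(X_test)
```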
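The RBF (Gaussian) kernel scores the similarity of two samples as `exp(-gamma * ||x - x'||^2)`, so nearby samples score close to 1 and distant ones close to 0. The sketch below computes it by hand and checks the result against scikit-learn's `rbf_kernel`. Since scikit-learn 0.22, `SVC` defaults to `gamma='scale'`, defined as `1 / (n_features * X.var())`, which is reproduced here on two made-up 9-feature samples (illustrative values, not rows from the dataset).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Two illustrative samples on the dataset's 1-10 feature scale
a = np.array([4, 1, 1, 3, 2, 1, 3, 1, 1], dtype=float)
b = np.array([8, 7, 6, 5, 7, 10, 4, 6, 2], dtype=float)

X = np.vstack([a, b])
# gamma='scale' as documented for SVC: 1 / (n_features * variance of all entries)
gamma_scale = 1.0 / (X.shape[1] * X.var())

manual = rbf(a, b, gamma_scale)
library = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma_scale)[0, 0]
print(manual, library)

# A sample compared with itself always has kernel value 1
print(rbf(a, a, gamma_scale))
```

A larger `gamma` makes the similarity fall off faster with distance, which gives a more flexible (and more overfitting-prone) decision boundary.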

## Evaluation

The model is evaluated using the following metrics:

- **Confusion Matrix**: Displays the true positive, false positive, true negative, and false negative counts.

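- **F1 Score**: The harmonic mean of precision and recall, computed per class; the per-class scores can then be combined, for example as an average weighted by each class's number of samples.

- **Jaccard Index**: The size of the intersection of the predicted and actual labels divided by the size of their union; for a binary problem with a chosen positive class, this equals TP / (TP + FP + FN).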
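To make the F1 definition concrete, the sketch below computes per-class F1 as `2 * P * R / (P + R)` from scikit-learn's precision and recall, then takes the support-weighted average and compares it with `f1_score(..., average='weighted')`. The labels are made up for illustration and are not results from this project.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration: 2 = benign, 4 = malignant
y_true = [2, 2, 2, 4, 4, 2, 4, 2]
y_pred = [2, 2, 4, 4, 2, 2, 4, 2]

def f1_by_hand(label):
    """Harmonic mean of precision and recall, treating `label` as the positive class."""
    p = precision_score(y_true, y_pred, pos_label=label)
    r = recall_score(y_true, y_pred, pos_label=label)
    return 2 * p * r / (p + r)

per_class = {label: f1_by_hand(label) for label in (2, 4)}
support = {label: y_true.count(label) for label in (2, 4)}  # samples per actual class
weighted = sum(per_class[lab] * support[lab] for lab in (2, 4)) / len(y_true)

print(per_class)  # about 0.8 for class 2 and 0.667 for class 4
print(weighted, f1_score(y_true, y_pred, average='weighted'))  # both about 0.75
```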

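The confusion matrix can be visualized with a `plot_confusion_matrix` helper function. This is a custom helper that takes a precomputed matrix, not scikit-learn's `sklearn.metrics.plot_confusion_matrix`, which takes a fitted estimator and has a different signature: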

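```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Rows are actual labels, columns are predicted labels, in the order [2, 4]
cnf_matrix = confusion_matrix(y_test, yhat, labels=[2, 4])

# Plot the non-normalized confusion matrix with the custom helper
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=['Benign(2)', 'Malignant(4)'], normalize=False, title='Confusion matrix')
plt.show()
```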
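The `plot_confusion_matrix` helper is not defined in this README. As a swapped-in alternative, scikit-learn 0.22 and later (which includes the pinned 0.23.1) provides `ConfusionMatrixDisplay`, which draws a precomputed matrix. A sketch on made-up labels, using matplotlib's non-interactive Agg backend so it also runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to show plots interactively
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Made-up labels for illustration: 2 = benign, 4 = malignant
y_test = [2, 2, 2, 4, 4, 2, 4, 2]
yhat = [2, 2, 4, 4, 2, 2, 4, 2]

# Rows are actual labels, columns are predicted labels, in the order [2, 4]
cnf_matrix = confusion_matrix(y_test, yhat, labels=[2, 4])

disp = ConfusionMatrixDisplay(confusion_matrix=cnf_matrix,
                              display_labels=['Benign(2)', 'Malignant(4)'])
disp.plot(cmap='Blues', values_format='d')  # integer counts in each cell
disp.ax_.set_title('Confusion matrix')
print(cnf_matrix)
```

With an interactive backend, `plt.show()` displays the figure; `disp.figure_.savefig(...)` writes it to a file instead.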

## Results

- **Jaccard Index**: The Jaccard index for the model is computed as follows:

```python
from sklearn.metrics import jaccard_score

# pos_label=2 treats benign as the positive class
print(jaccard_score(y_test, yhat, pos_label=2))
```
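For binary labels, `jaccard_score` returns `TP / (TP + FP + FN)` for the class named by `pos_label`. The sketch below derives the same number from the confusion matrix for both choices of positive class; `jaccard_from_cm` is a hypothetical helper written for illustration, and the labels are made up rather than taken from this project.

```python
from sklearn.metrics import confusion_matrix, jaccard_score

# Made-up labels for illustration: 2 = benign, 4 = malignant
y_test = [2, 2, 2, 4, 4, 2, 4, 2]
yhat = [2, 2, 4, 4, 2, 2, 4, 2]

# Rows are actual labels, columns are predicted labels, in the order [2, 4]
cm = confusion_matrix(y_test, yhat, labels=[2, 4])

def jaccard_from_cm(cm, pos_index):
    """Hypothetical helper: TP / (TP + FP + FN) for the class at `pos_index`."""
    tp = cm[pos_index, pos_index]
    fp = cm[:, pos_index].sum() - tp  # predicted positive, actually the other class
    fn = cm[pos_index, :].sum() - tp  # actually positive, predicted the other class
    return tp / (tp + fp + fn)

print(jaccard_from_cm(cm, 0), jaccard_score(y_test, yhat, pos_label=2))  # benign
print(jaccard_from_cm(cm, 1), jaccard_score(y_test, yhat, pos_label=4))  # malignant
```

The two choices of `pos_label` give different scores on the same predictions, so it is worth reporting which class was treated as positive.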

## Contributing

Contributions are welcome! If you have any improvements or suggestions, please feel free to create a pull request or raise an issue.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Acknowledgments

This project is part of the IBM Developer Skills Network's machine learning course. Special thanks to the course creators for providing the dataset and the initial framework for this project.
