https://github.com/pabsan-0/image-clustering
K-means clustering of images with a VGG-16 featurizer
opencv pytorch repos-ml sklearn
- Host: GitHub
- URL: https://github.com/pabsan-0/image-clustering
- Owner: pabsan-0
- Created: 2022-06-17T13:29:44.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-06-19T08:12:17.000Z (about 3 years ago)
- Last Synced: 2025-03-17T07:44:55.091Z (over 1 year ago)
- Topics: opencv, pytorch, repos-ml, sklearn
- Language: Python
- Homepage:
- Size: 3.15 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Image clustering
Simple K-means clustering of VGG-16 featurized images.

## Requirements
- Install through pip: `$ pip3 install -r requirements.txt`
- The following are also needed; make sure their versions do not conflict:
  - PyTorch
  - OpenCV

Optionally, use Docker to run a `pytorch` container: `docker compose run pytorch`. You still need to install the dependencies from `requirements.txt`.
## Usage
- Run featurization & clustering with:
```
$ python3 main.py --n-clusters 5 --no-use-cache samples/*.jpg
```
- An image-cluster preview is stored in `cache/clusters.png`.
- The output is written to `cache/groups/*.txt`, one file per cluster, each listing the paths of the images assigned to that cluster. These lists can be piped into standard tools, for example:
```
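# Verify that the listed files exist
cat cache/groups/0.txt | xargs ls
# Copy the images of cluster 0 into the current directory
cat cache/groups/0.txt | xargs cp -t .
# Or symlink them into the current directory instead of copying
cat cache/groups/0.txt | xargs realpath | xargs ln -st .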
```
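The same post-processing can be scripted in Python. The following is a minimal sketch, assuming each `cache/groups/*.txt` file holds one image path per line as described above. The helper name `copy_groups` and the output layout (one subdirectory per cluster) are hypothetical and not part of this repo.

```python
import shutil
from pathlib import Path


def copy_groups(groups_dir, out_dir):
    """Copy every listed image into out_dir/<cluster id>/ (hypothetical helper).

    Returns a dict mapping each cluster id to the number of files copied.
    """
    copied = {}
    for group_file in sorted(Path(groups_dir).glob("*.txt")):
        target = Path(out_dir) / group_file.stem  # e.g. out_dir/0 for 0.txt
        target.mkdir(parents=True, exist_ok=True)
        count = 0
        for line in group_file.read_text().splitlines():
            entry = line.strip()
            # Skip blank lines and paths that no longer exist
            if entry and Path(entry).is_file():
                shutil.copy2(entry, target / Path(entry).name)
                count += 1
        copied[group_file.stem] = count
    return copied
```

Calling `copy_groups("cache/groups", "sorted")` would then populate `sorted/0/`, `sorted/1/`, and so on. Images that share a file name within one cluster would overwrite each other in this sketch.
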
By default, intermediate files containing the list of input images and their features are stored on disk. When the very same set of images is featurized again, the cached features are loaded instead of being recomputed, which saves time.
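
One way such a cache can work is sketched below. This is not the repo's implementation: the file names `images.json` and `features.pkl`, the helper `load_or_compute`, and the `featurize` callable are all hypothetical. The idea is to store the input list next to the features and to reuse the features only when the list matches exactly. The `use_cache` and `save_cache` parameters mirror what `--no-use-cache` and `--no-save-cache` would switch off.

```python
import json
import pickle
from pathlib import Path


def load_or_compute(images, featurize, cache_dir="cache",
                    use_cache=True, save_cache=True):
    """Return features for `images`, reusing cached ones when the inputs match."""
    cache_dir = Path(cache_dir)
    list_file = cache_dir / "images.json"   # hypothetical file name
    feat_file = cache_dir / "features.pkl"  # hypothetical file name
    images = [str(p) for p in images]

    # Reuse the cache only if it was built from the very same image list
    if use_cache and list_file.is_file() and feat_file.is_file():
        if json.loads(list_file.read_text()) == images:
            return pickle.loads(feat_file.read_bytes())

    features = featurize(images)  # the expensive step, e.g. a CNN forward pass
    if save_cache:
        cache_dir.mkdir(parents=True, exist_ok=True)
        list_file.write_text(json.dumps(images))
        feat_file.write_bytes(pickle.dumps(features))
    return features
```

Comparing the full path list keeps the check simple, but it would not notice an image whose contents changed while its path stayed the same.
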
## Scripts
```
$ python3 main.py --help
usage: main.py [-h] [-n N_CLUSTERS] [--no-save-cache] [--no-use-cache] [--no-preview] ...

positional arguments:
  images                Images to cluster, accepts a glob

optional arguments:
  -h, --help            show this help message and exit
  -n N_CLUSTERS, --n-clusters N_CLUSTERS
                        K means number of clusters
  --no-save-cache       Do not cache features
  --no-use-cache        Do not load features from cache
  --no-preview          Do not build a preview image
```
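For reference, a parser accepting the flags shown in this help text could be declared roughly as follows. This is a sketch reconstructed from the help output only, not the code in `main.py`: the function name `build_parser`, the `nargs="+"` choice for `images`, and the default of 5 clusters are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumption: one or more paths; the shell expands globs such as samples/*.jpg
    parser.add_argument("images", nargs="+",
                        help="Images to cluster, accepts a glob")
    # Assumption: placeholder default, the real one is not shown in the help text
    parser.add_argument("-n", "--n-clusters", type=int, default=5,
                        help="K means number of clusters")
    parser.add_argument("--no-save-cache", action="store_true",
                        help="Do not cache features")
    parser.add_argument("--no-use-cache", action="store_true",
                        help="Do not load features from cache")
    parser.add_argument("--no-preview", action="store_true",
                        help="Do not build a preview image")
    return parser
```

With this sketch, `build_parser().parse_args(["-n", "3", "--no-preview", "a.jpg"])` yields `n_clusters=3`, `no_preview=True`, and `images=["a.jpg"]`.
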
## Known issues
- `RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling 'cublasCreate(handle)'`
  - This error appears at random; rerunning usually fixes it.
  - If it persists, try decreasing the batch size in the dataloader function.
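
The "rerun, then shrink the batch" advice can also be automated. The sketch below is an illustration only: `run_with_smaller_batches` and the `featurize(images, batch_size)` callable are hypothetical names, not functions from this repo. It relies on the error surfacing as a `RuntimeError`, as in the message above, and whether a retry inside the same process succeeds may depend on the state of the CUDA context.

```python
def run_with_smaller_batches(featurize, images, batch_size=32, min_batch_size=1):
    """Call featurize(images, batch_size), halving batch_size on RuntimeError."""
    while True:
        try:
            return featurize(images, batch_size)
        except RuntimeError:
            if batch_size <= min_batch_size:
                raise  # nothing left to shrink, surface the original error
            # Retry with half the batch size, never going below the minimum
            batch_size = max(min_batch_size, batch_size // 2)
```
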
## Diving deeper
- https://datagen.tech/guides/computer-vision/vgg16/
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
- https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html