https://github.com/mreliptik/dmfinalproject
Final project for Data Mining course : Using OPTICS on 2 datasets
clustering datamining optics-clustering python
- Host: GitHub
- URL: https://github.com/mreliptik/dmfinalproject
- Owner: MrEliptik
- License: mit
- Created: 2019-01-09T21:07:25.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T01:32:28.000Z (over 3 years ago)
- Last Synced: 2025-04-02T05:09:32.374Z (about 1 year ago)
- Topics: clustering, datamining, optics-clustering, python
- Language: Python
- Size: 92.3 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# DMFinalProject
This project uses the OPTICS clustering algorithm to cluster footballers' faces and diabetic patients' data.
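## Requirements
Install the development version of scikit-learn (0.21.dev0) with:
pip install git+https://github.com/scikit-learn/scikit-learn.git
This step is not needed if OPTICS is already part of the stable release you have installed (stable scikit-learn has included it since version 0.21).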
Install the rest of the requirements with:
pip install -r requirements.txt
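
To decide whether the development install is needed, one option is to check whether the installed scikit-learn already exposes `OPTICS`. The snippet below is a minimal sketch using only the standard library; the helper name `optics_available` is hypothetical and not part of this repository.

```python
import importlib
import importlib.util


def optics_available() -> bool:
    """Return True if the installed scikit-learn exposes sklearn.cluster.OPTICS."""
    # find_spec returns None when the top-level package is not installed.
    if importlib.util.find_spec("sklearn") is None:
        return False
    cluster = importlib.import_module("sklearn.cluster")
    return hasattr(cluster, "OPTICS")


print("OPTICS available:", optics_available())
```

If this prints `False`, install scikit-learn from source as described above, or upgrade to a stable release that ships OPTICS.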
## Getting Started
### File structure
**Datasets**
*dataset_diabetes*
- diabetic_data.csv : data from 130 US hospitals, from 1999 to 2008 [1](https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008)
- IDs_mapping.csv : mapping for admission type, discharge disposition and admission source
*footballers* : 124 photos of footballers (Neymar Jr., Lionel Messi, Cristiano Ronaldo, Luis Suarez, and Mohamed Salah)
*Predict* : 5 photos of footballers to use for prediction (Neymar Jr., Lionel Messi, Cristiano Ronaldo, Luis Suarez, and Mohamed Salah)
**Ressources**
- footballers_encodings.pickle : Encodings of the footballers after using *encode_faces.py*
- footballers_predict_encodings.pickle : Encodings of the footballers for prediction after using *encode_faces.py*
- GUFD_encodings.pickle : Encodings of the GUFD photos after using *encode_faces.py*
- shape_predictor_68_face_landmarks.dat : Used to extract facial features in *encode_features.py*
- *encode_faces.py* : used to encode the face in an image as a 128-d vector
- *encode_features.py* : used to encode the facial features of a face as a 7-d vector
- *faces_clustering.py* : used to cluster the footballers' faces
- *diabetic_clustering.py* : used to cluster the diabetic patients' data
- *similarity_clustering.py* : used to cluster the GUFD dataset based on facial feature similarities
- *optics.py* : contains the clustering and predicting methods
- *requirements.txt* : file listing all the Python package requirements
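
To illustrate the kind of clustering the scripts above perform, here is a minimal sketch of running scikit-learn's `OPTICS` on 128-d vectors. It is not the code from *optics.py*: the function name `cluster_encodings`, the `min_samples` value, and the synthetic data standing in for real face encodings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import OPTICS


def cluster_encodings(encodings, min_samples=5):
    """Cluster row vectors with OPTICS; a label of -1 marks noise points."""
    # Hypothetical helper: parameters are illustrative, not the project's settings.
    model = OPTICS(min_samples=min_samples)
    return model.fit_predict(encodings)


# Synthetic stand-in for face encodings: 3 tight groups of 40 vectors in 128-d.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 128))
synthetic = np.vstack([c + 0.01 * rng.normal(size=(40, 128)) for c in centers])

labels = cluster_encodings(synthetic)
n_clusters = len(set(labels.tolist()) - {-1})
print("clusters found:", n_clusters)
print("noise points:", int((labels == -1).sum()))
```

With real data, the synthetic array would be replaced by the encodings loaded from the pickle files listed above, whose exact layout is not documented here.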
## Authors
* **Victor MEUNIER** - *DMFinalProject* - [MrEliptik](https://github.com/MrEliptik)
## License
This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details.