{"id":23190946,"url":"https://github.com/pspanoudakis/data-mining-techniques-projects","last_synced_at":"2025-04-05T06:42:55.112Z","repository":{"id":178448445,"uuid":"623637159","full_name":"pspanoudakis/Data-Mining-Techniques-Projects","owner":"pspanoudakis","description":"Data Visualization 📊 Clustering and Classification 🗂️ techniques on Customer 🛍️ \u0026 Book 📖 datasets","archived":false,"fork":false,"pushed_at":"2024-01-03T17:42:42.000Z","size":10458,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-10T14:28:55.885Z","etag":null,"topics":["customer-personality-analysis","goodreads-data","matplotlib","numpy","pandas","python-typing","seaborn","sklearn-classifier","sklearn-clustering","wordcloud"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pspanoudakis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-04T19:16:27.000Z","updated_at":"2024-07-15T12:14:51.000Z","dependencies_parsed_at":"2024-01-03T18:47:48.459Z","dependency_job_id":null,"html_url":"https://github.com/pspanoudakis/Data-Mining-Techniques-Projects","commit_stats":null,"previous_names":["pspanoudakis/data-mining-techniques-projects"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pspanoudakis%2FData-Mining-Techniques-Projects","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pspanoudakis%2FData-Mining-Techniques-Projects/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pspanoudakis%2FData-Mining-Techniques-Projects/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pspanoudakis%2FData-Mining-Techniques-Projects/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pspanoudakis","download_url":"https://codeload.github.com/pspanoudakis/Data-Mining-Techniques-Projects/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247299790,"owners_count":20916186,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["customer-personality-analysis","goodreads-data","matplotlib","numpy","pandas","python-typing","seaborn","sklearn-classifier","sklearn-clustering","wordcloud"],"created_at":"2024-12-18T12:15:49.443Z","updated_at":"2025-04-05T06:42:55.092Z","avatar_url":"https://github.com/pspanoudakis.png","language":"Jupyter Notebook","readme":"# Data Mining Techniques Projects\nThis is a series of projects for the Spring 2023 **Data Mining Techniques** course at [DIT@UoA](https://www.di.uoa.gr/en).\n\n## Project 1 - Customer Personality Analysis\nGiven a dataset that describes the customers of a company, we try to draw conclusions about:\n- The profile of the customers who tend to spend more\n- The campaign channels that bring in the most revenue\n- The purchase channels that bring in the most revenue\n\nTo reach such conclusions, we use common data mining techniques:\n- Data **preprocessing \u0026 cleaning**\n- **Generation** of new data features from the given ones\n- Elimination of **outliers**\n- Data **Visualization**, e.g. using heatmaps, histograms and bar plots\n- **Principal Component Analysis**, to reduce the number of features used for cluster extraction\n- Cluster extraction using **Agglomerative Clustering** \u0026 **K-Means**\n\n## Project 2 - Book Recommendation \u0026 Classification\nGiven a [Goodreads](https://www.goodreads.com/) books dataset:\n- We visualize our data and draw conclusions using the techniques mentioned in Project 1. We also place emphasis on extensive Pandas `DataFrame` manipulation to collect various metrics and statistics on our data.\n- We develop a **Book Recommendation System** which can recommend similar books from the dataset, given a specific book ID:\n    - We vectorize the description of each book using **TF-IDF**\n    - The recommender efficiently calculates the pairwise **cosine similarity** of all book descriptions (see [`Pairwise Calculator`](https://github.com/pspanoudakis/Data-Mining-Techniques-Projects/blob/master/project2/modules/recommender.py#L19))\n    - We can then query the recommender for the books most similar to a given one\n- We develop a **Book Genre Classifier**, which estimates the genre of a book from its description:\n    - We vectorize each description as the mean of the [**Word2Vec**](https://radimrehurek.com/gensim/models/word2vec.html) vectors of its words, to create the training \u0026 test data\n    - We use scikit-learn base classifiers, such as **Naive Bayes**, **Random Forest** and **Support Vector Classifier**, to perform **K-Fold Cross-Validation**, calculate metrics (accuracy, F-score, precision \u0026 recall) and measure the performance of our classifier.\n\n## Technologies \u0026 Tools used for development\n- **Pandas** \u0026 **NumPy**\n- **matplotlib** \u0026 **seaborn**\n- **scikit-learn**\n- VS Code \u0026 Google Colab\n\n## Repository content\nBoth projects include the following:\n- A `hw[x].pdf` file describing the corresponding project tasks in detail\n- `.ipynb` and `.py` implementation files\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpspanoudakis%2Fdata-mining-techniques-projects","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpspanoudakis%2Fdata-mining-techniques-projects","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpspanoudakis%2Fdata-mining-techniques-projects/lists"}