{"id":31389156,"url":"https://github.com/quantum-software-development/1-datamining_main_repository","last_synced_at":"2026-02-08T21:31:08.787Z","repository":{"id":309604861,"uuid":"1036912485","full_name":"Quantum-Software-Development/1-DataMining_Main_Repository","owner":"Quantum-Software-Development","description":"data mining, focusing on unsupervised learning methods (clustering, PCA, dictionary learning, anomaly detection) applied to real-world projects for third-sector organizations. Results are shared publicly in open repositories and community platforms.","archived":false,"fork":false,"pushed_at":"2025-09-17T15:48:58.000Z","size":13900,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-17T17:54:54.182Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/Quantum-Software-Development/specialized-consulting-data-mining","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Quantum-Software-Development.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"Quantum-Software-Development","Custom":"https://github.com/sponsors/Quantum-Software-Development/card"}},"created_at":"2025-08-12T19:11:16.000Z","updated_at":"2025-09-17T15:49:01.000Z","dependencies_parsed_at":"2025-08-22T02:31:08.099Z","dependency_job_id":"4d1da7c1-56e1-4693-8529-19b9c233a74e","html_url":"https://github.com/Quantum-Softw
are-Development/1-DataMining_Main_Repository","commit_stats":null,"previous_names":["quantum-software-development/specialized-consulting-data-mining","quantum-software-development/main-repo-consulting-data-mining","quantum-software-development/main--data-miningepo-consulting","quantum-software-development/1-main_datamining_repository","quantum-software-development/1-datamining_main_repository"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Quantum-Software-Development/1-DataMining_Main_Repository","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F1-DataMining_Main_Repository","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F1-DataMining_Main_Repository/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F1-DataMining_Main_Repository/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F1-DataMining_Main_Repository/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Quantum-Software-Development","download_url":"https://codeload.github.com/Quantum-Software-Development/1-DataMining_Main_Repository/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F1-DataMining_Main_Repository/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277446623,"owners_count":25819183,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-28T02:00:08.834Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":t
rue,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-28T23:59:19.441Z","updated_at":"2026-02-08T21:31:08.778Z","avatar_url":"https://github.com/Quantum-Software-Development.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/Quantum-Software-Development","https://github.com/sponsors/Quantum-Software-Development/card"],"categories":[],"sub_categories":[],"readme":"\u003cbr\u003e\n\n**\\[[🇧🇷 Português](README.pt_BR.md)\\] \\[**[🇺🇸 English](README.md)**\\]**\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n#  \u003cp align=\"center\"\u003e 1- [Data Mining]() /  [Main Repository]()\n\n\n\n\u003c!-- ======================================= Start DEFAULT HEADER ===========================================  --\u003e\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n[**Institution:**]() Pontifical Catholic University of São Paulo (PUC-SP)  \n[**School:**]() Faculty of Interdisciplinary Studies  \n[**Program:**]() Humanistic AI and Data Science\n[**Semester:**]() 2nd Semester 2025  \nProfessor:  [***Professor Doctor in Mathematics Daniel Rodrigues da Silva***](https://www.linkedin.com/in/daniel-rodrigues-048654a5/)\n\n\u003cbr\u003e\u003cbr\u003e\n\n#### \u003cp align=\"center\"\u003e [![Sponsor Quantum Software Development](https://img.shields.io/badge/Sponsor-Quantum%20Software%20Development-brightgreen?logo=GitHub)](https://github.com/sponsors/Quantum-Software-Development)\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\u003c!--Confidentiality statement --\u003e\n\n#\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\u003e [!IMPORTANT]\n\u003e \n\u003e ⚠️ Heads Up\n\u003e\n\u003e * Projects and deliverables may be made [publicly available]() whenever possible.\n\u003e * The course emphasizes 
[**practical, hands-on experience**]() with real datasets to simulate professional consulting scenarios in the fields of **Data Analysis and Data Mining** for partner organizations and institutions affiliated with the university.\n\u003e * All activities comply with the [**academic and ethical guidelines of PUC-SP**]().\n\u003e * Any content not authorized for public disclosure will remain [**confidential**]() and securely stored in [private repositories]().  \n\u003e\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n#\n\n\u003c!--END--\u003e\n\n\n\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\n\n\u003c!-- PUC HEADER GIF\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/0d6324da-9468-455e-b8d1-2cce8bb63b06\" /\u003e\n--\u003e\n\n\n\u003c!-- video presentation --\u003e\n\n\n##### 🎶 Prelude Suite no.1 (J. S. Bach) - [Sound Design Remix]()\n\nhttps://github.com/user-attachments/assets/4ccd316b-74a1-4bae-9bc7-1c705be80498\n\n####  📺 For better resolution, watch the video on [YouTube.](https://youtu.be/_ytC6S4oDbM)\n\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\n\u003e [!TIP]\n\u003e \n\u003e * #### **If you’d like to explore the Full Statistics Materials from the 1st year (not only the review), you can visit the complete repository** [**Here**](https://github.com/FabianaCampanari/PracticalStats-PUCSP-2024). \u003cbr\u003e\n\u003e\n\u003e\n\n\n\n\u003c!-- =======================================END DEFAULT HEADER ===========================================  --\u003e\n\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\n\n## Table of Contents\n\n\u003cbr\u003e\n\n\n1. 
[Course Overview](#course-overview)\n   - I - [class_1 - Introduction and Assessment](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/a98512aa9dc2525446a3ffb236d06cbfb16d1f43/class_1-Introduction)\n   - II - [class_2 - Introduction - Data Mining With Python](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/a98512aa9dc2525446a3ffb236d06cbfb16d1f43/class_2%20-%20Introduction%20-%20Data%20Mining%20With%20Python)\n   - III - [class_3 - Stats Review](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/a98512aa9dc2525446a3ffb236d06cbfb16d1f43/class_3%20-%20Stats%20Review)\n   - IV - [Data Cleaning by Zahra Amini](https://github.com/Quantum-Software-Development/1-DataMining_Main_Repository/blob/cb4075948c0ae9f90ead385d620147daf0641f7c/Data%20Cleaning%20by%20Zahra%20Amini%20.pdf)\n2. [Objectives](#objectives)\n3. [Syllabus](#syllabus)\n4. [Weekly Schedule](#weekly-schedule)\n5. [Tools and Technologies](#tools-and-technologies)\n6. [Installation and Setup](#installation-and-setup)\n7. [Assessment](#assessment)\n8. [Bibliography](#bibliography)\n   - [Basic Bibliography](#basic-bibliography)\n   - [Complementary Bibliography](#complementary-bibliography)\n9. [Notes](#notes)\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n##  [Course Overview]()\n\n\u003cbr\u003e\n\n\nThis course introduces [**data mining techniques**]() with a focus on [**unsupervised learning methods**](), including:\n\n- Clustering algorithms (K-Means, Affinity Propagation, Mean-Shift)\n- Principal Component Analysis (PCA)\n- Dictionary Learning\n- Novelty and outlier detection\n\nStudents will work on [**practical projects**]() inspired by real-world problem-solving in third-sector organizations. 
Final deliverables will be shared in **open repositories** and made available to the broader community, schools, libraries, and non-profits.\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## [Objectives]()\n\nEnable students to **plan, conduct, and complete a research project** applying key **data mining concepts, algorithms, and methodologies**.\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## [Syllabus]()\n\n\u003cbr\u003e\n\n- Fundamentals of Data Mining\n- Data cleaning and preparation\n- Predictive analysis\n- Clustering methods (K-Means, Affinity Propagation, Mean-Shift)\n- Principal Component Analysis (PCA)\n- Dictionary Learning\n- Novelty and outlier detection\n- Application of concepts to real-world consulting scenarios\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n[Statistical Review - Stats Measures - Mean - Median - Mode - Variance]()\n\nhttps://github.com/Quantum-Software-Development/7-DataMining-Regression-Techniques-Data-Integration\n\n##  [Weekly Schedule]()\n\n\u003cbr\u003e\n\n| [Week]() | [Repos]() | [Methodology]() | [Tools]() |\n|------|-------|-------------|-------|\n| 1     | [Course introduction](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/d737ff164c6b4d6e580d5ba6e95c54ac604f7ea4/class_1-Introduction) | Active methodology | – |\n| 2  | [Statistical Review  - Stats Measures - Mean - Median - Mode - Variance](https://github.com/Quantum-Software-Development/2-DataMining_Statistical_Measures) | Active methodology | Python |\n| 3  | [Statistical Review - Variation Measures and Standard Deviation](https://github.com/Quantum-Software-Development/3-DataMining_VariationMeasures_Standard-Deviation) | Active methodology | Python |\n| 4     | [Data Mining - Concepts - Exploratory Analysis](https://github.com/Quantum-Software-Development/4-DataMining_Concepts_ExploratoryAnalysis) | Active methodology | Python - R |\n| 5   | [Data Cleaning - Preparation - Anomalies 
(Outliers)](https://github.com/Quantum-Software-Development/5-DataMining_DataCleaning_Preparation_Anomalies_Outlier) | Active methodology | Python |\n| 6     | [Data Mining - Pre Processing](https://github.com/Quantum-Software-Development/6-DataMining_Pre-Processing) | Active methodology | Python |\n| 7     | [Regression Techniques with Data Integration](https://github.com/Quantum-Software-Development/7-DataMining-Regression-Techniques-Data-Integration) | Active methodology | Python |\n| 8     | [Predictive  K-Means Clustering  Data and Figures Analysis](https://github.com/Quantum-Software-Development/8-DataMining-KMeans-Non-Hierarchical-Clustering) | Active methodology | Python |\n| 9     | [* Project 1 – K-Means Clustering Repository Presentation](https://github.com/Quantum-Software-Development/9-DataMining_Project_1_K-Means_Clustering_Presentation) | Active methodology | Python |\n| 10    | [Clustering Mean Shift](https://github.com/Quantum-Software-Development/10-DataMining_MeanShift) | Active methodology | Python |\n| 11    | [Affinity Propagation](https://github.com/Quantum-Software-Development/11-DataMining_Affinity_Propagation_Algorithm) | Active methodology  | Python |\n| 12    | [* Project 2 – Clustering Algorithms Exploration and Comparison- K-Means - Mean Shift - Affinity Propagation](https://github.com/Quantum-Software-Development/12-DataMining_Project_2_-Clustering_Comparison_KMeans_MeanShift-_AffinityPropagation) | Active methodology | Python |\n| 13    | [Principal Component Analysis (PCA) and Isolation Forest Algorithms](https://github.com/Quantum-Software-Development/13-DataMining_PCA_IsolationForest-Guide) | Active methodology | Python |\n| 14    | [DBSCAN and Spectral Clustering](https://github.com/Quantum-Software-Development/14-DataMining_DBSCAN_and_Spectral-Clustering) | Active methodology | Python |\n| 15    | [* Project 3 – Clustering Algorithms Exploration and Comparison- K-Means - Mean Shift - - 
Dbscan](https://github.com/Quantum-Software-Development/15-DataMining_Project_3_-Clustering_Comparison_KMeans_MeanShift_DBSCAN) | Active methodology | Python |\n| 16    | [Dictionary-Based Feature Grouping for LLM/AI Pipelines](https://github.com/Quantum-Software-Development/16-DataMining_llm-tabular-preprocessing-dict-groups) | Active methodology | Python |\n| 17    | **P2 Exam** | Written (Individual) | – |\n| 18    | **P3 Exam \u0026 Grade Closure** | Written (Individual) | – |\n| 19     | Final grade submission | – | – |\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n##  [Tools and Technologies]()\n\n\u003cbr\u003e\n\n- **Programming Language:** Python  \n- **Libraries:** NumPy, Pandas, Scikit-learn, Matplotlib, Seaborn  \n- **Environment:** Jupyter Notebook or other Python IDEs\n\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n##  Installation and Setup\n\n\u003cbr\u003e\n\nFollow these steps to set up your local environment for the course projects:\n\n\u003cbr\u003e\n\n[1](). **Clone the repository**\n\n\n```\ngit clone https://github.com/\u003cusername\u003e/\u003crepository-name\u003e.git\ncd \u003crepository-name\u003e\n```\n\n\n\u003cbr\u003e\n\n\n[2](). **Create a virtual environment** (recommended), then run the activation command that matches your operating system\n\n\n```\npython -m venv venv\nsource venv/bin/activate   # Mac/Linux\nvenv\\Scripts\\activate      # Windows\n```\n\n\n\u003cbr\u003e\n\n\n[3](). **Install dependencies**\nMake sure `pip` is updated:\n```\n\npip install --upgrade pip\n\n```\nThen install the required packages:\n```\n\npip install -r requirements.txt\n\n```\n*(If `requirements.txt` is not provided, install manually:)*  \n```\n\npip install numpy pandas scikit-learn matplotlib seaborn jupyter\n```\n\n\n\u003cbr\u003e\n\n\n[4](). **Run Jupyter Notebook**\n   \n```\njupyter notebook\n```\n\n\n\u003cbr\u003e\n\n\n[5](). 
**Open course notebooks** and start practicing.\n\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n##  I - [Introduction and Assessment](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/86d9d9fbc56efdd0b8e377955c1c7abf8879b775/class_1-Introduction)\n\n\u003cbr\u003e\n\n\n| Exam | Date | Format | Weight |\n|------|------|--------|--------|\n| **P1** | 01/10/2025 | Written – Individual | Arithmetic mean |\n| **P2** | 19/11/2025 | Written – Individual | Arithmetic mean |\n| **P3** | Substitution exam | Written – Individual | Replaces lowest score |\n\n\u003cbr\u003e\n\n[**Final Grade:**]() Arithmetic mean of assessments.\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n## II - [class_2 - Introduction - Data Mining With Python](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/86d9d9fbc56efdd0b8e377955c1c7abf8879b775/class_2%20-%20Introduction%20-%20Data%20Mining%20With%20Python)\n\n\u003cbr\u003e\n\n☞ [Access Booklet](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/blob/81e2951f73c87cf7c4396a36d48be92384b7b720/class_1-%20Introduction%20-%20Data%20Mining%20With%20Python/Book%20-%20Introd%20to%20Data%20Mining%20With%20Python.pdf)\n\n\n\u003cbr\u003e\n\n## [Example 1]()\n\n\u003cbr\u003e\n\n\nThe following sample lists the number of minutes that 60 cable TV users watched content from their package in the last two hours. 
Construct a frequency distribution with 8 classes and build a histogram.\n\n\u003cbr\u003e\n\n\n[Data]():\n\n```\n20, 55, 5, 64, 78, 49, 91, 87, 18, 83, 33, 39, 30, 31, 59, 85, 102, 24, 27, 28,\n92, 108, 98, 67, 85, 109, 48, 19, 32, 69, 24, 59, 6, 49, 116, 37, 92, 43, 101, 60,\n55, 107, 25, 33, 57, 25, 17, 49, 24, 101, 14, 45, 73, 120, 91, 2, 11, 47, 21, 38\n```\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Step 1](): Determine Range and Number of Classes\n\n- Minimum value: 2\n- Maximum value: 120\n- Number of classes ($k$): 8 (given)\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Step 2](): Calculate Class Width\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n$$\n\\huge\nw = \\left\\lceil \\frac{\\text{max} - \\text{min}}{k} \\right\\rceil = \\left\\lceil \\frac{120 - 2}{8} \\right\\rceil = \\lceil 14.75 \\rceil = 15\n$$\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n### [Step 3](): Construct Class Intervals (from minimum value)\n\n| Class Interval | Explanation |\n| :-- | :-- |\n| 2 - 16 | Starts from minimum 2 |\n| 17 - 31 | 16 + 1 to 31 |\n| 32 - 46 | Next range |\n| 47 - 61 | Next range |\n| 62 - 76 | Next range |\n| 77 - 91 | Next range |\n| 92 - 106 | Next range |\n| 107 - 121 | Covers maximum 120 |\n\n\u003cbr\u003e\n\n### [Step 4](): Frequency Distribution Table\n\n\u003cbr\u003e\n\n| Class Interval | Frequency |\n| :--: | :--: |\n| 2 - 16 | 5 |\n| 17 - 31 | 14 |\n| 32 - 46 | 8 |\n| 47 - 61 | 11 |\n| 62 - 76 | 4 |\n| 77 - 91 | 7 |\n| 92 - 106 | 6 |\n| 107 - 121 | 5 |\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Step 5](): Calculate Midpoints for Each Class\n\n\u003cbr\u003e\n\n$$\n\\Huge\nx_i = \\frac{\\text{Lower limit} + \\text{Upper limit}}{2}\n$$\n\n\u003cbr\u003e\u003cbr\u003e\n\n| Class Interval | Midpoint ($x_i$) |\n| :-- | :-- |\n| 2 - 16 | 9 |\n| 17 - 31 | 24 |\n| 32 - 46 | 39 |\n| 47 - 61 | 54 |\n| 62 - 76 | 69 |\n| 77 - 91 | 84 |\n| 92 - 106 | 99 |\n| 107 - 121 | 114 |\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Step 6](): Calculate Mean Using Frequency and Midpoints\n\n\u003cbr\u003e\n\n### 
[Mean](): ($\\bar{x}$) is calculated by:\n\n\u003cbr\u003e\u003cbr\u003e\n\n$$\n\\Huge\n\\bar{x} = \\frac{\\sum f_i x_i}{\\sum f_i}\n$$\n\n\u003cbr\u003e\u003cbr\u003e\n\n### [Where](): $f_i$ = frequency, $x_i$ = [Midpoint]().\n\n\n\u003cbr\u003e\n\n### [Calculate each product]():\n\n\u003cbr\u003e\n\n| Class Interval | $f_i$ | $x_i$ | $f_i \\times x_i$ |\n| :-- | :-- | :-- | :-- |\n| 2 - 16 | 5 | 9 | 45 |\n| 17 - 31 | 14 | 24 | 336 |\n| 32 - 46 | 8 | 39 | 312 |\n| 47 - 61 | 11 | 54 | 594 |\n| 62 - 76 | 4 | 69 | 276 |\n| 77 - 91 | 7 | 84 | 588 |\n| 92 - 106 | 6 | 99 | 594 |\n| 107 - 121 | 5 | 114 | 570 |\n\n\u003cbr\u003e\n\n### [Sum frequencies](): $5 + 14 + 8 + 11 + 4 + 7 + 6 + 5$ = [60]() (matches the sample size of 60)\n\n### [Sum of products](): $45 + 336 + 312 + 594 + 276 + 588 + 594 + 570$ = [3315]()\n\n\u003cbr\u003e\n\n### [Calculate mean]():\n\n\u003cbr\u003e\u003cbr\u003e\n\n$$\n\\huge\n\\bar{x} = \\frac{3315}{60} = 55.25\n$$\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Step 7](): Histogram, Bar Plot and Time Series Frequency Distribution Over Time\n\n- Construct a histogram, a bar plot, and a time series plot with class intervals on the x-axis and frequencies on the y-axis.\n- Each bar height corresponds to the frequency of the class.\n\n\n\u003cbr\u003e\n\n☞ [Access Code](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/blob/a61b0572e5bca4d6f06b0187722f8ef97214c0a4/class_1-%20Introduction%20-%20Data%20Mining%20With%20Python/Code/DataMining_1.ipynb)\n\n☞ [Access Dataset](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/blob/01b6e27e588c3b830561385f14bd0d246f55049d/class_1-%20Introduction%20-%20Data%20Mining%20With%20Python/Banks%20Dataset/banco.csv)\n\n☞ [Access Plots](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/a61b0572e5bca4d6f06b0187722f8ef97214c0a4/class_1-%20Introduction%20-%20Data%20Mining%20With%20Python/Plots)\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n### [Frequency Analysis and 
Time Series Visualization]()\n\nThis notebook demonstrates how to perform frequency analysis on a CSV dataset, visualize results with histograms and bar plots, and create a time series chart using Python.\n\n\u003cbr\u003e\n\n###  [1](). Import Libraries\n\n```python\n# Import required libraries\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n```\n\n\u003cbr\u003e\n\n###  [2](). Load Dataset\n\n```python\n# Load CSV file (semicolon-separated).\n# Placeholder path: replace it with the location of your dataset (for example, the linked banco.csv)\ndf = pd.read_csv('path/to/your_dataset.csv', sep=';')\n\n# Select only the \"day\" column\ndf1 = df['day']\n```\n\n\u003cbr\u003e\n\n###  [3](). Calculate Frequencies\n\n\n```python\n# Calculate absolute frequency (ascending order)\nfreq_abs = df1.value_counts(ascending=True)\n\n# Calculate relative frequency (normalized, 3 decimal places)\nfreq_rel = df1.value_counts(normalize=True).round(3)\n\n# Create a DataFrame with both measures (rows are aligned on the \"day\" value)\ndf_freq = pd.DataFrame({\n    'Absolute Frequency': freq_abs,\n    'Relative Frequency': freq_rel\n})\n\n# Display the frequency table (display() is provided by Jupyter/IPython)\ndisplay(df_freq)\n```\n\n\u003cbr\u003e\n\n###  [4]().  Histogram (Dark Theme)\n\n```python\n# Create figure and axes with dark background\nplt.style.use('seaborn-v0_8-darkgrid')\nfig, ax = plt.subplots(figsize=(16, 4))\nfig.patch.set_facecolor('black')\nax.set_facecolor('black')\n\n# Plot histogram\nsns.histplot(df1, color='turquoise', ax=ax)\n\n# Customize labels and ticks (white text so it stays readable on the black background)\nplt.xlabel(\"Values\", color='white')\nplt.ylabel(\"Frequency\", color='white')\nplt.title(\"Frequency Distribution\", color='white')\nplt.tick_params(axis='x', colors='white')\nplt.tick_params(axis='y', colors='white')\n\n# Show plot\nplt.show()\n```\n\n\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"1307\" height=\"386\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/48b994b0-6bf8-425d-8bc8-ecd7395c45c5\" /\u003e\n\n\u003cbr\u003e\n\n###  [5](). 
Bar Plot (Dark Theme)\n\n```python\n# Create figure and axes\nplt.style.use('seaborn-v0_8-darkgrid')\nfig, ax = plt.subplots(figsize=(10, 6))\nfig.patch.set_facecolor('black')\nax.set_facecolor('black')\n\n# Bar plot of absolute frequency\ndf_freq['Absolute Frequency'].plot(kind='bar', color=\"turquoise\", ax=ax)\n\n# Customize labels and ticks (white text so it stays readable on the black background)\nplt.xlabel(\"Values\", color='white')\nplt.ylabel(\"Frequency\", color='white')\nplt.title(\"Frequency Distribution\", color='white')\nplt.xticks(rotation=0, color='white')\nplt.yticks(color='white')\n\n# Show plot\nplt.show()\n```\n\n\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"842\" height=\"540\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/6c28b3bf-1940-44e7-a4b3-80c03a736919\" /\u003e\n\n\u003cbr\u003e\n\n\n###  [6](). Time Series Preparation\n\n\n```python\n# Inspect available columns\nprint(df.columns)\n\n# Create a new DataFrame for time series analysis\ndf_time_series = df[['day', 'month']].copy()\n\n# Add dummy year (if year column is missing)\ndf_time_series['year'] = 2022\n\n# Convert to strings for concatenation\ndf_time_series['day'] = df_time_series['day'].astype(str)\ndf_time_series['year'] = df_time_series['year'].astype(str)\n\n# Create \"date\" column in dd-MMM-yyyy format\n# (the %b directive expects \"month\" to hold abbreviated month names such as \"may\")\ndf_time_series['date'] = df_time_series['day'] + '-' + df_time_series['month'] + '-' + df_time_series['year']\ndf_time_series['date'] = pd.to_datetime(df_time_series['date'], format='%d-%b-%Y')\n\n# Set \"date\" as index\ndf_time_series = df_time_series.set_index('date')\n\n# Count occurrences per day\ndaily_counts = df_time_series.groupby(df_time_series.index).size()\n\n# Display first rows\ndisplay(daily_counts.head())\n```\n\n\u003cbr\u003e\n\n###  [7](). 
Time Series Plot (Dark Theme)\n\n\n```python\n# Set plot style\nplt.style.use('seaborn-v0_8-darkgrid')\nfig, ax = plt.subplots(figsize=(16, 6))\nfig.patch.set_facecolor('black')\nax.set_facecolor('black')\n\n# Plot time series\nplt.plot(daily_counts, color='turquoise')\n\n# Customize labels and ticks\nplt.title(\"Frequency Distribution Over Time\", color='white')\nplt.xlabel(\"Date\", color='white')\nplt.ylabel(\"Frequency\", color='white')\nplt.tick_params(axis='x', colors='white')\nplt.tick_params(axis='y', colors='white')\n\n# Show plot\nplt.show()\n```\n\n\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"1307\" height=\"540\" alt=\"Image\" src=\"https://github.com/user-attachments/assets/319298c8-04a3-4335-80e1-4b0c92fde027\" /\u003e\n\n\u003cbr\u003e\n\n### [Summary]()\n\nDummy Year: 2022 was used when year column was missing.\n\nVisualizations: Histograms, bar plots, and time series chart.\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## III - [class_3- Stats Review](https://github.com/Quantum-Software-Development/specialized-consulting-data-mining/tree/86d9d9fbc56efdd0b8e377955c1c7abf8879b775/class_3%20-%20Stats%20Review)\n\n\u003cbr\u003e\n\n\u003e [!TIP]\n\u003e \n\u003e [Access](https://github.com/Quantum-Software-Development/2_3-DataMining_Statistical_Review)  Class_3\n\u003e \n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n## IV - [class_4- Data Mining - Concepts - Exploratory Analysis]()\n\n\u003cbr\u003e\n\n\u003e [!TIP]\n\u003e \n\u003e [Access](https://github.com/Quantum-Software-Development/4-DataMining_Concepts_ExploratoryAnalysis)  Class_4\n\u003e \n\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## V - [class_5- Data Cleaning - Preparation - Anomalies(Outliers)]()\n\n\u003cbr\u003e\n\n\u003e [!TIP]\n\u003e \n\u003e [Access](https://github.com/Quantum-Software-Development/5-DataMining_DataCleaning_Preparation_Anomalies_Outlier)  Class_5\n\u003e \n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## VI - [class_6- Data Mining - Pre 
Processing]()\n\n\u003cbr\u003e\n\n\u003e [!TIP]\n\u003e \n\u003e [Access](https://github.com/Quantum-Software-Development/6-DataMining_Pre-Processing)  Class_6\n\u003e \n\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## VII - [class_7- Normalization](https://github.com/Quantum-Software-Development/1-DataMining_Main_Repository/tree/b555158e64f626fda67229fcc80bff665090c876/class_7-Normalization_Code)\n\n\u003cbr\u003e\n\n\u003e [!TIP]\n\u003e \n\u003e [Access]()  Class_7\n\u003e\n\u003e ⚠️ Coming Soon\n\u003e \n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## VIII - [class_8 - KMeans_NonHierarchical_Clustering](https://github.com/Quantum-Software-Development/1-DataMining_Main_Repository/tree/cd4d463e1745f2778db4d69e7faade4bfbc00c05/class_8-KMeans_NonHierarchical_Clustering)\n\n\u003cbr\u003e\n\n\u003e [!TIP]\n\u003e \n\u003e [Access](https://github.com/Quantum-Software-Development/1-DataMining_Main_Repository/tree/cd4d463e1745f2778db4d69e7faade4bfbc00c05/class_8-KMeans_NonHierarchical_Clustering)   Class_8 - KMeans_NonHierarchical_Clustering\n\u003e\n\u003e ⚠️ Coming Soon\n\u003e \n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n## IX - [class_8 - KMeans_NonHierarchical_Clustering](https://github.com/Quantum-Software-Development/1-DataMining_Main_Repository/tree/cd4d463e1745f2778db4d69e7faade4bfbc00c05/class_8-KMeans_NonHierarchical_Clustering)\n\n\u003cbr\u003e\n\n\u003e [!TIP]\n\u003e \n\u003e [Access]()  Class_8\n\u003e\n\u003e ⚠️ Coming Soon\n\u003e \n\n\u003cbr\u003e\u003cbr\u003e\n\n\u003cbr\u003e\u003cbr\u003e\n\u003cbr\u003e\u003cbr\u003e\n\u003cbr\u003e\u003cbr\u003e\n\u003cbr\u003e\u003cbr\u003e\n\u003cbr\u003e\u003cbr\u003e\n\u003cbr\u003e\u003cbr\u003e\n\n\n\u003c!-- ========================== Bibliography ====================  --\u003e\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## [Bibliography]()\n\n[1](). **Castro, L. N. \u0026 Ferrari, D. G.** (2016). 
*Introduction to Data Mining: Basic Concepts, Algorithms, and Applications*. Saraiva.\n\n[2](). **Ferreira, A. C. P. L. et al.** (2024). *Artificial Intelligence – A Machine Learning Approach*. 2nd Ed. LTC.\n\n[3](). **Larson \u0026 Farber** (2015). *Applied Statistics*. Pearson.\n\n\n\u003cbr\u003e\n\n### [Complementary Bibliography]()\n\n- THOMAS, C. *Data Mining*. IntechOpen, 2018.  \n- HUTTER, F.; KOTTHOFF, L.; VANSCHOREN, J. *Automated Machine Learning: Methods, Systems, Challenges*. Springer Nature, 2019.  \n- NETTO, A.; MACIEL, F. *Python para Data Science e Machine Learning Descomplicado*. Alta Books, 2021.  \n- RUSSELL, S. J.; NORVIG, P. *Artificial Intelligence: A Modern Approach*. GEN LTC, 2022.  \n- SUD, K.; ERDOGMUS, P.; KADRY, S. *Introduction to Data Science and Machine Learning*. IntechOpen, 2020.\n\n\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n      \n\u003c!-- ======================================= Bibliography Portugues ===========================================  --\u003e\n\n\u003c!--\n\n## [Bibliography]()\n\n\n[1](). **Castro, L. N. \u0026 Ferrari, D. G.** (2016). *Introdução à mineração de dados: conceitos básicos, algoritmos e aplicações*. Saraiva.\n\n[2](). **Ferreira, A. C. P. L. et al.** (2024). *Inteligência Artificial - Uma Abordagem de Aprendizado de Máquina*. 2nd Ed. LTC.\n\n[3](). **Larson \u0026 Farber** (2015). *Estatística Aplicada*. Pearson.\n\n\n\u003cbr\u003e\u003cbr\u003e\n--\u003e\n\n\u003c!-- ======================================= Start Footer ===========================================  --\u003e\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## 💌 [Let the data flow... 
Ping Me!](mailto:fabicampanari@proton.me)\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n#### \u003cp align=\"center\"\u003e  🛸๋ My Contacts [Hub](https://linktr.ee/fabianacampanari)\n\n\n\u003cbr\u003e\n\n### \u003cp align=\"center\"\u003e \u003cimg src=\"https://github.com/user-attachments/assets/517fc573-7607-4c5d-82a7-38383cc0537d\" /\u003e\n\n\n\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e  ────────────── 🔭⋆ ──────────────\n\n\n\u003cp align=\"center\"\u003e ➣➢➤ \u003ca href=\"#top\"\u003eBack to Top \u003c/a\u003e\n\n\u003c!--\n\u003cp align=\"center\"\u003e  ────────────── ✦ ──────────────\n--\u003e\n\n\n\n\u003c!-- Programmers and artists are the only professionals whose hobby is their profession.\"\n\n\" I love people who are committed to transforming the world \"\n\n\" I'm big fan of those who are making waves in the world! \"\n\n##### \u003cp align=\"center\"\u003e( Rafael Lain ) \u003c/p\u003e   --\u003e\n\n#\n\n###### \u003cp align=\"center\"\u003e Copyright 2025 Quantum Software Development. Code released under the [MIT License](https://github.com/Quantum-Software-Development/Math/blob/3bf8270ca09d3848f2bf22f9ac89368e52a2fb66/LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantum-software-development%2F1-datamining_main_repository","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantum-software-development%2F1-datamining_main_repository","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantum-software-development%2F1-datamining_main_repository/lists"}