{"id":31389155,"url":"https://github.com/quantum-software-development/2-datamining_statistical_measures","last_synced_at":"2025-10-09T04:37:26.122Z","repository":{"id":309762585,"uuid":"1037470953","full_name":"Quantum-Software-Development/2-DataMining_Statistical_Measures","owner":"Quantum-Software-Development","description":"2- DataMining  - Statistical Review - Stats  Measures- Mean - Median - Mode - Variance","archived":false,"fork":false,"pushed_at":"2025-09-17T15:38:58.000Z","size":4706,"stargazers_count":1,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-28T23:59:22.961Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Quantum-Software-Development.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"Quantum-Software-Development","Custom":"https://github.com/sponsors/Quantum-Software-Development/card"}},"created_at":"2025-08-13T16:11:33.000Z","updated_at":"2025-09-17T15:39:00.000Z","dependencies_parsed_at":"2025-08-31T04:11:44.556Z","dependency_job_id":"e2e7ad3d-2000-4ce3-97ed-9c35e24cdfe0","html_url":"https://github.com/Quantum-Software-Development/2-DataMining_Statistical_Measures","commit_stats":null,"previous_names":["quantum-software-development/intro-data-mining-python-class1","quantum-software-development/class_2-and-3-intro-data-mining-python","quantum-software-develo
pment/class_2-and-3-data-mining-python","quantum-software-development/2-3-data-mining-review-statistical-methods","quantum-software-development/2-3-datamining_statistical_review","quantum-software-development/2-datamining_statistical_measures"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Quantum-Software-Development/2-DataMining_Statistical_Measures","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F2-DataMining_Statistical_Measures","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F2-DataMining_Statistical_Measures/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F2-DataMining_Statistical_Measures/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F2-DataMining_Statistical_Measures/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Quantum-Software-Development","download_url":"https://codeload.github.com/Quantum-Software-Development/2-DataMining_Statistical_Measures/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Quantum-Software-Development%2F2-DataMining_Statistical_Measures/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000705,"owners_count":26082921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-28T23:59:17.776Z","updated_at":"2025-10-09T04:37:26.092Z","avatar_url":"https://github.com/Quantum-Software-Development.png","language":"Jupyter Notebook","readme":"\n\u003cbr\u003e\n\n**\\[[🇧🇷 Português](README.pt_BR.md)\\] \\[**[🇺🇸 English](README.md)**\\]**\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n# \u003cp align=\"center\"\u003e  2- [Data Mining]() / [Statistic Review - Stats Measures - Mean - Median - Mode - Variance]()\n\n\n\n\u003c!-- =======================================END DEFAULT HEADER ===========================================  --\u003e\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n[**Institution:**]() Pontifical Catholic University of São Paulo (PUC-SP)  \n[**School:**]() Faculty of Interdisciplinary Studies  \n[**Program:**]() Humanistic AI and Data Science\n[**Semester:**]() 2nd Semester 2025  \nProfessor:  [***Professor Doctor in Mathematics Daniel Rodrigues da Silva***](https://www.linkedin.com/in/daniel-rodrigues-048654a5/)\n\n\u003cbr\u003e\u003cbr\u003e\n\n#### \u003cp align=\"center\"\u003e [![Sponsor Quantum Software Development](https://img.shields.io/badge/Sponsor-Quantum%20Software%20Development-brightgreen?logo=GitHub)](https://github.com/sponsors/Quantum-Software-Development)\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\u003c!--Confidentiality statement --\u003e\n\n#\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\u003e [!IMPORTANT]\n\u003e \n\u003e ⚠️ Heads Up\n\u003e\n\u003e * Projects and deliverables may be made [publicly available]() whenever possible.\n\u003e * The course emphasizes [**practical, hands-on experience**]() with real datasets to simulate professional consulting scenarios in the fields of **Data Analysis and Data Mining** for partner organizations and institutions affiliated with the university.\n\u003e * All 
activities comply with the [**academic and ethical guidelines of PUC-SP**]().\n\u003e * Any content not authorized for public disclosure will remain [**confidential**]() and securely stored in [private repositories]().  \n\u003e\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n#\n\n\u003c!--END--\u003e\n\n\n\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\n\n\u003c!-- PUC HEADER GIF\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/user-attachments/assets/0d6324da-9468-455e-b8d1-2cce8bb63b06\" /\u003e\n--\u003e\n\n\n\u003c!-- video presentation --\u003e\n\n\n##### 🎶 Prelude Suite no.1 (J. S. Bach) - [Sound Design Remix]()\n\nhttps://github.com/user-attachments/assets/4ccd316b-74a1-4bae-9bc7-1c705be80498\n\n####  📺 For better resolution, watch the video on [YouTube.](https://youtu.be/_ytC6S4oDbM)\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\u003e [!TIP]\n\u003e \n\u003e  This repository is a review of the Statistics course from the undergraduate program Humanities, AI and Data Science at PUC-SP.\n\u003e\n\u003e   ### ☞ **Access Data Mining [Main Repository](https://github.com/Quantum-Software-Development/1-Main_DataMining_Repository)**\n\u003e \n\u003e  If you’d like to explore the full materials from the 1st year (not only the review), you can visit the complete repository [here](https://github.com/FabianaCampanari/PracticalStats-PUCSP-2024).\n\u003e\n\u003e\n\n\n\n\u003c!-- =======================================END DEFAULT HEADER ===========================================  --\u003e\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n##  [Overview]()\n\n\u003cbr\u003e\n\nThis repository contains materials and examples for the **Introduction to Data Mining with Python Class 1** course, focusing on fundamental statistical concepts and data analysis techniques essential for data mining applications.\n\n\u003cbr\u003e\n\n## Repository Structure\n\n```\n├── data/                 # Sample datasets\n├── notebooks/           # Jupyter notebooks with examples\n├── 
scripts/             # Python scripts for analysis\n├── images/              # Generated plots and visualizations\n└── docs/                # Additional documentation\n```\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## Getting Started\n\n### Prerequisites:\n\n- Python 3.7+\n- Required libraries: pandas, numpy, matplotlib, seaborn, scikit-learn\n\n\u003cbr\u003e\n\n### Installation:\n```bash\npip install pandas numpy matplotlib seaborn scikit-learn jupyter\n```\n\n\u003cbr\u003e\n\n### Quick Start:\n\n\u003cbr\u003e\n\n```python\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# Load sample data\ndata = [50, 40, 41, 17, 11, 7, 22, 44, 28, 21, 19, 23, 37, 51, 54, 42, 86,\n        41, 78, 56, 72, 56, 17, 7, 69, 30, 80, 56, 29, 33, 46, 31, 39, 20,\n        18, 29, 34, 59, 73, 77, 36, 39, 30, 62, 54, 67, 39, 31, 53, 44]\n\n# Create histogram\nplt.figure(figsize=(10, 6))\nplt.hist(data, bins=7, edgecolor='black')\nplt.title('Internet Usage Distribution')\nplt.xlabel('Minutes Online')\nplt.ylabel('Frequency')\nplt.show()\n\n# Calculate statistics\nprint(f\"Mean: {np.mean(data):.2f}\")\nprint(f\"Median: {np.median(data):.2f}\")\nprint(f\"Standard Deviation: {np.std(data):.2f}\")\n```\n\n\u003cbr\u003e\u003cbr\u003e\n\n## Key Learning Outcomes\n\nAfter completing this course, students will be able to:\n\n1. **Construct and interpret frequency distributions** from raw data\n2. **Create various types of histograms** and understand their relationship to frequency distributions\n3. **Identify and handle outliers** in datasets\n4. **Analyze distribution shapes** and their implications\n5. **Calculate and interpret central tendency measures**\n6. **Apply statistical concepts** to data mining problems\n7. 
**Use Python tools** for statistical analysis and visualization\n\n\u003cbr\u003e\n\n## Important Notes\n\n- **Outliers require careful consideration** - they may represent valuable insights or data quality issues\n- **Histogram bins should be chosen thoughtfully** - too few may hide patterns, too many may create noise\n- **Frequency distributions are fundamental** to understanding data structure before applying advanced data mining techniques\n- **Visual analysis complements numerical statistics** for comprehensive data understanding\n\n\u003cbr\u003e\n\n*This material is part of the Introduction to Data Mining with Python course, focusing on fundamental statistical concepts essential for effective data analysis and mining.*\n\n\u003cbr\u003e\n\n## [Class_1 Content]()\n\n### Syllabus (Ementa)\n\n- **Descriptive Statistics Review**\n- **Data Mining Concepts**\n- **Exploratory Data Analysis**\n- **Predictive Analysis**\n- **Clustering**\n- **Association Rules**\n\n\u003cbr\u003e\n\n### [Assessment Criteria]()\n\n- Minimum 75% attendance required\n- Final grade ≥ 5.0\n- Formula: MF = (N₁ + N₂)/2, where Nᵢ = (Pᵢ + Aᵢ)/2\n  - Pᵢ = Project grade for semester i\n  - Aᵢ = Activity/exam grade for semester i\n\n\u003cbr\u003e\n\n## [Key Topics Covered]()\n\n\u003cbr\u003e\n\n### [1](). Frequency Distribution\n\nA **frequency distribution** is a table that shows classes or intervals of data with a count of the number of entries in each class. It's fundamental for understanding data patterns and is the foundation for creating histograms.\n\n\u003cbr\u003e \n\n#### [Components]():\n\n- **Class limits**: Lower and upper boundaries of each class\n- **Class size**: The width of each class interval\n- **Frequency (f)**: Number of data entries in each class\n- **Relative frequency**: Proportion of data in each class (f/n)\n- **Cumulative frequency**: Sum of frequencies up to a given class\n\n\u003cbr\u003e \n\n#### Construction Steps:\n1. 
Decide the number of classes (typically 5-20)\n2. Calculate class size: (max - min) / number of classes, rounded up to the next convenient number\n3. Determine class limits\n4. Count frequencies for each class\n5. Calculate additional measures (relative, cumulative frequencies)\n\n\u003cbr\u003e\u003cbr\u003e \n\n### 2. Histograms and Their Relationship to Frequency Distributions\n\n\n**Histograms are directly related to frequency distributions** - they are the graphical representation of frequency distribution tables.\n\n\u003cbr\u003e \n\n#### Key Characteristics:\n\n- **Bar chart** representing frequency distribution\n- **Horizontal axis**: Quantitative data values (class boundaries)\n- **Vertical axis**: Frequencies or relative frequencies\n- **Consecutive bars must touch** (unlike regular bar charts)\n- **Class boundaries**: Numbers that separate classes without gaps\n\n\u003cbr\u003e \n\n#### Types of Histograms:\n\n1. **Frequency Histogram**: Shows absolute frequencies\n2. **Relative Frequency Histogram**: Shows proportions/percentages\n3. **Frequency Polygon**: Line graph emphasizing continuous change\n\n\u003cbr\u003e\u003cbr\u003e \n\n### 3. 
Outliers in Histograms  \n\n**Outliers are, by definition, few in number** and can point to several different phenomena:\n\n\u003cbr\u003e\n\n#### What Outliers May Indicate:\n\n- **Data entry errors** (typing mistakes)\n- **Measurement errors**\n- **Fraudulent activities**\n- **Genuine extreme values**\n- **Equipment malfunctions**\n\n\u003cbr\u003e\n\n#### Impact on Histograms:\n- **Produce short, isolated bars** (sparse representation)\n- **Create gaps** in the distribution\n- **Skew the overall pattern**\n- **Affect central tendency measures**\n- **May require special handling** in analysis\n\n\u003cbr\u003e\n\n#### Outlier Detection in Histograms:\n- Visible as **isolated bars** far from main distribution\n- **Large gaps** between bars\n- **Extremely tall or short bars** at distribution extremes\n- **Asymmetric patterns** in otherwise normal distributions\n\n\u003cbr\u003e\u003cbr\u003e\n\n### 4. Distribution Shapes\n\nUnderstanding distribution shapes helps identify data characteristics:\n\n#### Symmetric Distribution:\n\n- Mean ≈ Median ≈ Mode\n- Bell-shaped or uniform patterns\n- Equal spread on both sides\n\n\u003cbr\u003e\n\n#### Left-Skewed (Negatively Skewed):\n\n- Mean \u003c Median \u003c Mode\n- Tail extends to the left\n- Few extremely low values\n\n\u003cbr\u003e\n\n#### Right-Skewed (Positively Skewed):\n\n- Mode \u003c Median \u003c Mean\n- Tail extends to the right\n- Few extremely high values\n\n\u003cbr\u003e\n\n#### Uniform Distribution:\n\n- All classes have equal frequencies\n- Rectangular shape in histogram\n\n\u003cbr\u003e\u003cbr\u003e\n\n### 5. Central Tendency Measures\n\n#### Mean (μ or x̄):\n- Sum of all values divided by count\n- Most affected by outliers\n- Uses all data points\n\n#### Median:\n- Middle value when data is ordered (the average of the two middle values when the count is even)\n- Less affected by outliers\n- Robust measure\n\n#### Mode:\n- Most frequently occurring value\n- May not exist or may be multiple\n- Good for categorical data\n\n\u003cbr\u003e\u003cbr\u003e\n\n### 6. 
Practical Applications\n\n#### Data Mining Context:\n- **Pattern Recognition**: Identifying data distributions\n- **Anomaly Detection**: Finding outliers\n- **Data Quality Assessment**: Checking for errors\n- **Feature Engineering**: Understanding variable distributions\n- **Model Selection**: Choosing appropriate algorithms based on data distribution\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n#### Python Implementation Examples:\n\n\u003cbr\u003e\n\n```python\nimport matplotlib.pyplot as plt\nimport numpy as np\n\n# Create frequency distribution\ndef create_frequency_distribution(data, num_classes=7):\n    min_val, max_val = min(data), max(data)\n    class_size = (max_val - min_val) / num_classes\n    \n    # Define class boundaries\n    boundaries = [min_val + i * class_size for i in range(num_classes + 1)]\n    \n    # Count frequencies: each class is [lower, upper), except the last,\n    # which also includes the maximum so that no value is dropped\n    frequencies = []\n    for i in range(num_classes):\n        is_last = i == num_classes - 1\n        count = sum(1 for x in data\n                    if boundaries[i] \u003c= x \u003c boundaries[i+1]\n                    or (is_last and x == max_val))\n        frequencies.append(count)\n    \n    return boundaries, frequencies\n\n# Create histogram\ndef plot_histogram(data, title=\"Frequency Distribution\", bins=7):\n    plt.figure(figsize=(10, 6))\n    plt.hist(data, bins=bins, edgecolor='black', alpha=0.7)\n    plt.title(title)\n    plt.xlabel('Values')\n    plt.ylabel('Frequency')\n    plt.grid(True, alpha=0.3)\n    plt.show()\n```\n\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\n## [Example 1]() - Finding the Mean of a Frequency Distribution\n\n\n### [Step-by-Step Instructions]()\n\n### [In Words \\\u0026 In Symbols]()\n\n\u003cbr\u003e\n\n\n| In Words | In Symbols |\n| :-- | :-- |\n| 1. Find the midpoint of each class. | $x = \\frac{\\text{lower limit} + \\text{upper limit}}{2}$ |\n| 2. Multiply each midpoint by its class frequency and sum the results. | $\\sum (x \\cdot f)$ |\n| 3. Find the sum of all frequencies. | $n = \\sum f$ |\n| 4. Calculate the mean by dividing the sum from step 2 by step 3. 
| $\\bar{x} = \\frac{\\sum (x \\cdot f)}{n}$ |\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Example](): Finding the Mean of a Frequency Distribution\n\nUse the frequency distribution below to approximate the average number of minutes that a sample of internet users spent connected in their last session.\n\n\u003cbr\u003e\n\n| Class | Midpoint ($x$) | Frequency ($f$) |\n| :-- | :--: | :--: |\n| 7 – 18 | 12.5 | 6 |\n| 19 – 30 | 24.5 | 10 |\n| 31 – 42 | 36.5 | 13 |\n| 43 – 54 | 48.5 | 8 |\n| 55 – 66 | 60.5 | 5 |\n| 67 – 78 | 72.5 | 6 |\n| 79 – 90 | 84.5 | 2 |\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Let's compute the products and their sum]():\n\n\n| Class | Midpoint ($x$) | Frequency ($f$) | $x \\cdot f$ |\n| :-- | :-- | :-- | :-- |\n| 7 – 18 | 12.5 | 6 | 75.0 |\n| 19 – 30 | 24.5 | 10 | 245.0 |\n| 31 – 42 | 36.5 | 13 | 474.5 |\n| 43 – 54 | 48.5 | 8 | 388.0 |\n| 55 – 66 | 60.5 | 5 | 302.5 |\n| 67 – 78 | 72.5 | 6 | 435.0 |\n| 79 – 90 | 84.5 | 2 | 169.0 |\n| **Total** |  | **50** | **2089.0** |\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n### [Therefore, the mean is]():\n\n\u003cbr\u003e\u003cbr\u003e\n\n$$\n\\Huge\n\\bar{x} = \\frac{\\sum (x \\cdot f)}{n} = \\frac{2089}{50} \\approx 41.8 \\text{ minutes}\n$$\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n```latex\n\\Huge\n\\bar{x} = \\frac{\\sum (x \\cdot f)}{n} = \\frac{2089}{50} \\approx 41.8 \\text{ minutes}\n```\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Shapes of Distributions]()\n\n\u003cbr\u003e\n\n### Symmetrical Distribution\n\n- A vertical line can be drawn at the middle of the graph, and the halves are nearly identical.\n\n\n\u003cbr\u003e\n\n\n### [Uniform Distribution]() (Rectangular)\n\n- All entries have equal or nearly equal frequencies.\n- The distribution is 
symmetric.\n\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Left-Skewed Distribution]() (Negatively Skewed)\n\n- The \"tail\" of the graph extends more to the left.\n- The mean is to the left of the median.\n\n\n\u003cbr\u003e\n\n\n### [Right-Skewed Distribution]() (Positively Skewed)\n\n- The \"tail\" of the graph extends more to the right.\n- The mean is to the right of the median.\n\n\n\u003cbr\u003e\n\n\n### [Finding the Weighted Mean]()\n\nSometimes each value contributes to the mean with a different \"weight\"; the result is a weighted mean.\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## [Example 2]() \n\n\u003cbr\u003e\n\n### [A student's grade is determined based on 5 sources]():\n\n- 50% for the average of exams\n- 15% for the midterm exam\n- 20% for the final exam\n- 10% for computer lab work\n- 5% for homework\n\n\u003cbr\u003e\n\n### [Suppose the student's grades are]():\n\n- Exam average: 86\n- Midterm: 96\n- Final Exam: 82\n- Lab: 98\n- Homework: 100\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n### [Weighted Mean Calculation Table]()\n\n| Source | Grade ($x$) | Weight ($w$) | $x \\cdot w$ |\n| :-- | :--: | :--: | :--: |\n| Exam Average | 86 | 0.50 | 43.0 |\n| Midterm | 96 | 0.15 | 14.4 |\n| Final Exam | 82 | 0.20 | 16.4 |\n| Lab | 98 | 0.10 | 9.8 |\n| Homework | 100 | 0.05 | 5.0 |\n| **Sum** |  | **1** | **88.6** |\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n$$\n\\Huge\n\\bar{x} = \\frac{\\sum (x \\cdot w)}{\\sum w} = \\frac{88.6}{1} = 88.6\n$$\n\n\n\u003cbr\u003e\n\n```latex\n\\Huge\n\\bar{x} = \\frac{\\sum (x \\cdot w)}{\\sum w} = \\frac{88.6}{1} = 88.6\n```\n\n\n\u003cbr\u003e\u003cbr\u003e\n\nSo, the student did [**not**]() get an [A (minimum required is 90)]().\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## [Mean of Grouped Data]()\n\nThe mean of a frequency distribution is calculated as:\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n$$\n\\Huge\n\\bar{x} = \\frac{\\sum (x \\cdot f)}{n}\n$$\n\n\n\u003cbr\u003e\n\n\n```latex\n\\Huge\n\\bar{x} = \\frac{\\sum (x \\cdot 
f)}{n}\n```\n\n\u003cbr\u003e\u003cbr\u003e\n\n\nWhere [x]() is the class midpoint and [f]() is the frequency of the class.\n\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n\u003c!-- ========================== [Bibliographr ====================  --\u003e\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n## [Bibliography]()\n\n[1](). **Castro, L. N. \u0026 Ferrari, D. G.** (2016). *Introduction to Data Mining: Basic Concepts, Algorithms, and Applications*. Saraiva.\n\n[2](). **Ferreira, A. C. P. L. et al.** (2024). *Artificial Intelligence – A Machine Learning Approach*. 2nd Ed. LTC.\n\n[3](). **Larson \u0026 Farber** (2015). *Applied Statistics*. Pearson.\n\n\n\u003cbr\u003e\u003cbr\u003e\n\n      \n\u003c!-- ======================================= Bibliography Portugues ===========================================  --\u003e\n\n\u003c!--\n\n## [Bibliography]()\n\n\n[1](). **Castro, L. N. \u0026 Ferrari, D. G.** (2016). *Introdução à mineração de dados: conceitos básicos, algoritmos e aplicações*. Saraiva.\n\n[2](). **Ferreira, A. C. P. L. et al.** (2024). *Inteligência Artificial - Uma Abordagem de Aprendizado de Máquina*. 2nd Ed. LTC.\n\n[3](). **Larson \u0026 Farber** (2015). *Estatística Aplicada*. Pearson.\n\n\n\u003cbr\u003e\u003cbr\u003e\n--\u003e\n\n\u003c!-- ======================================= Start Footer ===========================================  --\u003e\n\n\n\n\n\n## 💌 [Let the data flow... 
Ping Me!](mailto:fabicampanari@proton.me)\n\n\u003cbr\u003e\u003cbr\u003e\n\n\n\n#### \u003cp align=\"center\"\u003e  🛸๋ My Contacts [Hub](https://linktr.ee/fabianacampanari)\n\n\n\u003cbr\u003e\n\n### \u003cp align=\"center\"\u003e \u003cimg src=\"https://github.com/user-attachments/assets/517fc573-7607-4c5d-82a7-38383cc0537d\" /\u003e\n\n\n\n\n\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\n\n\u003cp align=\"center\"\u003e  ────────────── 🔭⋆ ──────────────\n\n\n\u003cp align=\"center\"\u003e ➣➢➤ \u003ca href=\"#top\"\u003eBack to Top \u003c/a\u003e\n\n\u003c!--\n\u003cp align=\"center\"\u003e  ────────────── ✦ ──────────────\n--\u003e\n\n\n\n\u003c!-- Programmers and artists are the only professionals whose hobby is their profession.\"\n\n\" I love people who are committed to transforming the world \"\n\n\" I'm big fan of those who are making waves in the world! \"\n\n##### \u003cp align=\"center\"\u003e( Rafael Lain ) \u003c/p\u003e   --\u003e\n\n#\n\n###### \u003cp align=\"center\"\u003e Copyright 2025 Quantum Software Development. Code released under the [MIT License.](https://github.com/Quantum-Software-Development/Math/blob/3bf8270ca09d3848f2bf22f9ac89368e52a2fb66/LICENSE)\n\n\n\n\n\n\n\n\n\n\n","funding_links":["https://github.com/sponsors/Quantum-Software-Development","https://github.com/sponsors/Quantum-Software-Development/card"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantum-software-development%2F2-datamining_statistical_measures","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquantum-software-development%2F2-datamining_statistical_measures","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquantum-software-development%2F2-datamining_statistical_measures/lists"}