{"id":15595046,"url":"https://github.com/simonneutert/wine_quality_data","last_synced_at":"2025-05-09T00:52:16.612Z","repository":{"id":49388542,"uuid":"105995107","full_name":"simonneutert/wine_quality_data","owner":"simonneutert","description":null,"archived":false,"fork":false,"pushed_at":"2017-12-15T12:16:26.000Z","size":5422,"stargazers_count":3,"open_issues_count":0,"forks_count":5,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-09T00:52:09.623Z","etag":null,"topics":["data-mining","data-science","data-visualization","jupyter","jupyter-notebooks","pandas","plotting","python","python-3-6","python3","wine","wine-quality"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simonneutert.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-06T11:01:49.000Z","updated_at":"2022-04-03T04:02:31.000Z","dependencies_parsed_at":"2022-07-26T13:32:02.460Z","dependency_job_id":null,"html_url":"https://github.com/simonneutert/wine_quality_data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonneutert%2Fwine_quality_data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonneutert%2Fwine_quality_data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonneutert%2Fwine_quality_data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simonneutert%2Fwine_quality_data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/simonneutert","download_url":"https://codeload.github.com/simonneutert/wine_quality_data/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253171233,"owners_count":21865289,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-mining","data-science","data-visualization","jupyter","jupyter-notebooks","pandas","plotting","python","python-3-6","python3","wine","wine-quality"],"created_at":"2024-10-03T00:42:47.105Z","updated_at":"2025-05-09T00:52:16.589Z","avatar_url":"https://github.com/simonneutert.png","language":"Jupyter Notebook","readme":"\n# Wine Quality Data Analysis\n\nhttps://archive.ics.uci.edu/ml/datasets/Wine+Quality\n\nCitation Request:\n  This dataset is public available for research. The details are described in [Cortez et al., 2009].\n  Please include this citation if you plan to use this database:\n\n  P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.\n  Modeling wine preferences by data mining from physicochemical properties.\n  In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.\n\n  Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016\n                [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf\n                [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib\n\n1. Title: Wine Quality\n\n2. Sources\n   Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009\n\n3. Past Usage:\n\n  P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. 
Reis.\n  Modeling wine preferences by data mining from physicochemical properties.\n  In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.\n\n  In the above reference, two datasets were created, using red and white wine samples.\n  The inputs include objective tests (e.g. pH values) and the output is based on sensory data\n  (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality\n  between 0 (very bad) and 10 (very excellent). Several data mining methods were applied to model\n  these datasets under a regression approach. The support vector machine model achieved the\n  best results. Several metrics were computed: MAD, confusion matrix for a fixed error tolerance (T),\n  etc. Also, we plot the relative importances of the input variables (as measured by a sensitivity\n  analysis procedure).\n\n4. Relevant Information:\n\n   The two datasets are related to red and white variants of the Portuguese \"Vinho Verde\" wine.\n   For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009].\n   Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables\n   are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).\n\n   These datasets can be viewed as classification or regression tasks.\n   The classes are ordered and not balanced (e.g. there are much more normal wines than\n   excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent\n   or poor wines. Also, we are not sure if all input variables are relevant. So\n   it could be interesting to test feature selection methods.\n\n5. Number of Instances: red wine - 1599; white wine - 4898.\n\n6. Number of Attributes: 11 + output attribute\n\n   Note: several of the attributes may be correlated, thus it makes sense to apply some sort of\n   feature selection.\n\n7. 
Attribute information:\n\n   For more information, read [Cortez et al., 2009].\n\n   Input variables (based on physicochemical tests):\n   1 - fixed acidity\n   2 - volatile acidity\n   3 - citric acid\n   4 - residual sugar\n   5 - chlorides\n   6 - free sulfur dioxide\n   7 - total sulfur dioxide\n   8 - density\n   9 - pH\n   10 - sulphates\n   11 - alcohol\n   Output variable (based on sensory data):\n   12 - quality (score between 0 and 10)\n\n8. Missing Attribute Values: None\n\n# content of the current directory:\n\n\n```python\n% ls\n```\n\n    Readme.md                    winequality-red.csv\r\n    Wine Quality Analysis.ipynb  winequality-white.csv\r\n    Wine+Quality+Analysis.html   winequality.names.txt\r\n\n\n\n```python\n# import libs\n%matplotlib inline\nimport pandas as pd\nimport seaborn as sns; sns.set(style=\"whitegrid\", palette=\"muted\")\nimport numpy as np\nimport matplotlib.pyplot as plt\n```\n\n### Create DataFrames for white and red wines\n\n\n```python\nwhite_wine_df = pd.read_csv('winequality-white.csv', sep=\";\")\nred_wine_df = pd.read_csv('winequality-red.csv', sep=\";\")\n```\n\n### DataFrames for red and white wines combined\n\n\n```python\nww = white_wine_df.loc[:]\nww[\"color\"] = \"white\"\nrw = red_wine_df.loc[:]\nrw[\"color\"] = \"red\"\nwine_df = pd.concat([ww, rw], ignore_index=True)\n```\n\n# Data\n\n\n```python\nwhite_wine_df.head()\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003efixed acidity\u003c/th\u003e\n      \u003cth\u003evolatile acidity\u003c/th\u003e\n      \u003cth\u003ecitric acid\u003c/th\u003e\n      \u003cth\u003eresidual sugar\u003c/th\u003e\n      \u003cth\u003echlorides\u003c/th\u003e\n      \u003cth\u003efree sulfur dioxide\u003c/th\u003e\n      \u003cth\u003etotal sulfur dioxide\u003c/th\u003e\n      \u003cth\u003edensity\u003c/th\u003e\n      
\u003cth\u003epH\u003c/th\u003e\n      \u003cth\u003esulphates\u003c/th\u003e\n      \u003cth\u003ealcohol\u003c/th\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      \u003cth\u003ecolor\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e0\u003c/th\u003e\n      \u003ctd\u003e7.0\u003c/td\u003e\n      \u003ctd\u003e0.27\u003c/td\u003e\n      \u003ctd\u003e0.36\u003c/td\u003e\n      \u003ctd\u003e20.7\u003c/td\u003e\n      \u003ctd\u003e0.045\u003c/td\u003e\n      \u003ctd\u003e45.0\u003c/td\u003e\n      \u003ctd\u003e170.0\u003c/td\u003e\n      \u003ctd\u003e1.0010\u003c/td\u003e\n      \u003ctd\u003e3.00\u003c/td\u003e\n      \u003ctd\u003e0.45\u003c/td\u003e\n      \u003ctd\u003e8.8\u003c/td\u003e\n      \u003ctd\u003e6\u003c/td\u003e\n      \u003ctd\u003ewhite\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e1\u003c/th\u003e\n      \u003ctd\u003e6.3\u003c/td\u003e\n      \u003ctd\u003e0.30\u003c/td\u003e\n      \u003ctd\u003e0.34\u003c/td\u003e\n      \u003ctd\u003e1.6\u003c/td\u003e\n      \u003ctd\u003e0.049\u003c/td\u003e\n      \u003ctd\u003e14.0\u003c/td\u003e\n      \u003ctd\u003e132.0\u003c/td\u003e\n      \u003ctd\u003e0.9940\u003c/td\u003e\n      \u003ctd\u003e3.30\u003c/td\u003e\n      \u003ctd\u003e0.49\u003c/td\u003e\n      \u003ctd\u003e9.5\u003c/td\u003e\n      \u003ctd\u003e6\u003c/td\u003e\n      \u003ctd\u003ewhite\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e2\u003c/th\u003e\n      \u003ctd\u003e8.1\u003c/td\u003e\n      \u003ctd\u003e0.28\u003c/td\u003e\n      \u003ctd\u003e0.40\u003c/td\u003e\n      \u003ctd\u003e6.9\u003c/td\u003e\n      \u003ctd\u003e0.050\u003c/td\u003e\n      \u003ctd\u003e30.0\u003c/td\u003e\n      \u003ctd\u003e97.0\u003c/td\u003e\n      \u003ctd\u003e0.9951\u003c/td\u003e\n      \u003ctd\u003e3.26\u003c/td\u003e\n      \u003ctd\u003e0.44\u003c/td\u003e\n      
\u003ctd\u003e10.1\u003c/td\u003e\n      \u003ctd\u003e6\u003c/td\u003e\n      \u003ctd\u003ewhite\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e3\u003c/th\u003e\n      \u003ctd\u003e7.2\u003c/td\u003e\n      \u003ctd\u003e0.23\u003c/td\u003e\n      \u003ctd\u003e0.32\u003c/td\u003e\n      \u003ctd\u003e8.5\u003c/td\u003e\n      \u003ctd\u003e0.058\u003c/td\u003e\n      \u003ctd\u003e47.0\u003c/td\u003e\n      \u003ctd\u003e186.0\u003c/td\u003e\n      \u003ctd\u003e0.9956\u003c/td\u003e\n      \u003ctd\u003e3.19\u003c/td\u003e\n      \u003ctd\u003e0.40\u003c/td\u003e\n      \u003ctd\u003e9.9\u003c/td\u003e\n      \u003ctd\u003e6\u003c/td\u003e\n      \u003ctd\u003ewhite\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e4\u003c/th\u003e\n      \u003ctd\u003e7.2\u003c/td\u003e\n      \u003ctd\u003e0.23\u003c/td\u003e\n      \u003ctd\u003e0.32\u003c/td\u003e\n      \u003ctd\u003e8.5\u003c/td\u003e\n      \u003ctd\u003e0.058\u003c/td\u003e\n      \u003ctd\u003e47.0\u003c/td\u003e\n      \u003ctd\u003e186.0\u003c/td\u003e\n      \u003ctd\u003e0.9956\u003c/td\u003e\n      \u003ctd\u003e3.19\u003c/td\u003e\n      \u003ctd\u003e0.40\u003c/td\u003e\n      \u003ctd\u003e9.9\u003c/td\u003e\n      \u003ctd\u003e6\u003c/td\u003e\n      \u003ctd\u003ewhite\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\n\n\n```python\nred_wine_df.head()\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003efixed acidity\u003c/th\u003e\n      \u003cth\u003evolatile acidity\u003c/th\u003e\n      \u003cth\u003ecitric acid\u003c/th\u003e\n      \u003cth\u003eresidual sugar\u003c/th\u003e\n      \u003cth\u003echlorides\u003c/th\u003e\n      \u003cth\u003efree sulfur dioxide\u003c/th\u003e\n      \u003cth\u003etotal sulfur 
dioxide\u003c/th\u003e\n      \u003cth\u003edensity\u003c/th\u003e\n      \u003cth\u003epH\u003c/th\u003e\n      \u003cth\u003esulphates\u003c/th\u003e\n      \u003cth\u003ealcohol\u003c/th\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      \u003cth\u003ecolor\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e0\u003c/th\u003e\n      \u003ctd\u003e7.4\u003c/td\u003e\n      \u003ctd\u003e0.70\u003c/td\u003e\n      \u003ctd\u003e0.00\u003c/td\u003e\n      \u003ctd\u003e1.9\u003c/td\u003e\n      \u003ctd\u003e0.076\u003c/td\u003e\n      \u003ctd\u003e11.0\u003c/td\u003e\n      \u003ctd\u003e34.0\u003c/td\u003e\n      \u003ctd\u003e0.9978\u003c/td\u003e\n      \u003ctd\u003e3.51\u003c/td\u003e\n      \u003ctd\u003e0.56\u003c/td\u003e\n      \u003ctd\u003e9.4\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003ered\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e1\u003c/th\u003e\n      \u003ctd\u003e7.8\u003c/td\u003e\n      \u003ctd\u003e0.88\u003c/td\u003e\n      \u003ctd\u003e0.00\u003c/td\u003e\n      \u003ctd\u003e2.6\u003c/td\u003e\n      \u003ctd\u003e0.098\u003c/td\u003e\n      \u003ctd\u003e25.0\u003c/td\u003e\n      \u003ctd\u003e67.0\u003c/td\u003e\n      \u003ctd\u003e0.9968\u003c/td\u003e\n      \u003ctd\u003e3.20\u003c/td\u003e\n      \u003ctd\u003e0.68\u003c/td\u003e\n      \u003ctd\u003e9.8\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003ered\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e2\u003c/th\u003e\n      \u003ctd\u003e7.8\u003c/td\u003e\n      \u003ctd\u003e0.76\u003c/td\u003e\n      \u003ctd\u003e0.04\u003c/td\u003e\n      \u003ctd\u003e2.3\u003c/td\u003e\n      \u003ctd\u003e0.092\u003c/td\u003e\n      \u003ctd\u003e15.0\u003c/td\u003e\n      \u003ctd\u003e54.0\u003c/td\u003e\n      \u003ctd\u003e0.9970\u003c/td\u003e\n      \u003ctd\u003e3.26\u003c/td\u003e\n      
\u003ctd\u003e0.65\u003c/td\u003e\n      \u003ctd\u003e9.8\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003ered\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e3\u003c/th\u003e\n      \u003ctd\u003e11.2\u003c/td\u003e\n      \u003ctd\u003e0.28\u003c/td\u003e\n      \u003ctd\u003e0.56\u003c/td\u003e\n      \u003ctd\u003e1.9\u003c/td\u003e\n      \u003ctd\u003e0.075\u003c/td\u003e\n      \u003ctd\u003e17.0\u003c/td\u003e\n      \u003ctd\u003e60.0\u003c/td\u003e\n      \u003ctd\u003e0.9980\u003c/td\u003e\n      \u003ctd\u003e3.16\u003c/td\u003e\n      \u003ctd\u003e0.58\u003c/td\u003e\n      \u003ctd\u003e9.8\u003c/td\u003e\n      \u003ctd\u003e6\u003c/td\u003e\n      \u003ctd\u003ered\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e4\u003c/th\u003e\n      \u003ctd\u003e7.4\u003c/td\u003e\n      \u003ctd\u003e0.70\u003c/td\u003e\n      \u003ctd\u003e0.00\u003c/td\u003e\n      \u003ctd\u003e1.9\u003c/td\u003e\n      \u003ctd\u003e0.076\u003c/td\u003e\n      \u003ctd\u003e11.0\u003c/td\u003e\n      \u003ctd\u003e34.0\u003c/td\u003e\n      \u003ctd\u003e0.9978\u003c/td\u003e\n      \u003ctd\u003e3.51\u003c/td\u003e\n      \u003ctd\u003e0.56\u003c/td\u003e\n      \u003ctd\u003e9.4\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n      \u003ctd\u003ered\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\n\n\n```python\n# Index.equals compares the column labels element by element\nassert white_wine_df.columns.equals(red_wine_df.columns)\n\",\".join(list(white_wine_df.columns))\n```\n\n\n\n\n    'fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,color'\n\n\n\n### Test for null values and check datatypes\n\n\n```python\nassert white_wine_df.notnull().all().all()\nwhite_wine_df.info()\n```\n\n    \u003cclass 'pandas.core.frame.DataFrame'\u003e\n    RangeIndex: 4898 entries, 0 to 4897\n    Data 
columns (total 13 columns):\n    fixed acidity           4898 non-null float64\n    volatile acidity        4898 non-null float64\n    citric acid             4898 non-null float64\n    residual sugar          4898 non-null float64\n    chlorides               4898 non-null float64\n    free sulfur dioxide     4898 non-null float64\n    total sulfur dioxide    4898 non-null float64\n    density                 4898 non-null float64\n    pH                      4898 non-null float64\n    sulphates               4898 non-null float64\n    alcohol                 4898 non-null float64\n    quality                 4898 non-null int64\n    color                   4898 non-null object\n    dtypes: float64(11), int64(1), object(1)\n    memory usage: 497.5+ KB\n\n\n__no null values in white wine dataframe found__\n\n\n```python\nassert red_wine_df.notnull().all().all()\nred_wine_df.info()\n```\n\n    \u003cclass 'pandas.core.frame.DataFrame'\u003e\n    RangeIndex: 1599 entries, 0 to 1598\n    Data columns (total 13 columns):\n    fixed acidity           1599 non-null float64\n    volatile acidity        1599 non-null float64\n    citric acid             1599 non-null float64\n    residual sugar          1599 non-null float64\n    chlorides               1599 non-null float64\n    free sulfur dioxide     1599 non-null float64\n    total sulfur dioxide    1599 non-null float64\n    density                 1599 non-null float64\n    pH                      1599 non-null float64\n    sulphates               1599 non-null float64\n    alcohol                 1599 non-null float64\n    quality                 1599 non-null int64\n    color                   1599 non-null object\n    dtypes: float64(11), int64(1), object(1)\n    memory usage: 162.5+ KB\n\n\n__no null values in red wine dataframe found__\n\n### All datatypes are numeric.\n\n# Build categoricals\n\n\n```python\n# this can be crucial :)\nwhite_wine_df[\"color\"] = 
white_wine_df[\"color\"].astype(\"category\")\nred_wine_df[\"color\"] = red_wine_df[\"color\"].astype(\"category\")\nwine_df[\"color\"] = wine_df[\"color\"].astype(\"category\")\n```\n\n# Means\n\n### White Wines:\n\n\n```python\nwhite_wine_df.mean()\n```\n\n\n\n\n    fixed acidity             6.854788\n    volatile acidity          0.278241\n    citric acid               0.334192\n    residual sugar            6.391415\n    chlorides                 0.045772\n    free sulfur dioxide      35.308085\n    total sulfur dioxide    138.360657\n    density                   0.994027\n    pH                        3.188267\n    sulphates                 0.489847\n    alcohol                  10.514267\n    quality                   5.877909\n    dtype: float64\n\n\n\n### Red Wines:\n\n\n```python\nred_wine_df.mean()\n```\n\n\n\n\n    fixed acidity            8.319637\n    volatile acidity         0.527821\n    citric acid              0.270976\n    residual sugar           2.538806\n    chlorides                0.087467\n    free sulfur dioxide     15.874922\n    total sulfur dioxide    46.467792\n    density                  0.996747\n    pH                       3.311113\n    sulphates                0.658149\n    alcohol                 10.422983\n    quality                  5.636023\n    dtype: float64\n\n\n\n#### Differences between red and white wine means that are greater than 1.0\n\n\n```python\nmean_diff = white_wine_df.mean() - red_wine_df.mean()\nmean_diff_abs = mean_diff.apply(lambda x: abs(x))\nmean_diff[mean_diff_abs \u003e= 1.0]\n```\n\n\n\n\n    fixed acidity           -1.464850\n    residual sugar           3.852609\n    free sulfur dioxide     19.433163\n    total sulfur dioxide    91.892865\n    dtype: float64\n\n\n\n# Distribution of Quality\n\n\n```python\nsns.countplot(data=white_wine_df, x=\"quality\")\nsns.plt.title(\"White Wines\")\n```\n\n\n\n\n    \u003cmatplotlib.text.Text at 
0x1112b1ac8\u003e\n\n\n\n\n![png](images/output_29_1.png)\n\n\n\n```python\nwhite_wine_df.quality.describe()\n```\n\n\n\n\n    count    4898.000000\n    mean        5.877909\n    std         0.885639\n    min         3.000000\n    25%         5.000000\n    50%         6.000000\n    75%         6.000000\n    max         9.000000\n    Name: quality, dtype: float64\n\n\n\n\n```python\nx = sns.countplot(data=red_wine_df, x=\"quality\")\nsns.plt.title(\"Red Wines\")\n```\n\n\n\n\n    \u003cmatplotlib.text.Text at 0x11144de80\u003e\n\n\n\n\n![png](images/output_31_1.png)\n\n\n\n```python\nred_wine_df.quality.describe()\n```\n\n\n\n\n    count    1599.000000\n    mean        5.636023\n    std         0.807569\n    min         3.000000\n    25%         5.000000\n    50%         6.000000\n    75%         6.000000\n    max         8.000000\n    Name: quality, dtype: float64\n\n\n\n# What may be important for a high quality rating?\n\n__To find out, the percentual mean differences for low quality to high quality wines over the total mean are calculated, resulting in percentual changes.__\n\n## For white wines:\n\n\n```python\nx = white_wine_df.groupby([\"quality\"]).mean()\nlower_quals = x.loc[:4].mean()\nhigher_quals = x.loc[7:].mean()\nww_perc_means = (higher_quals - lower_quals) / white_wine_df.mean() * 100\nww_perc_means\n```\n\n\n\n\n    alcohol                 14.068993\n    chlorides              -38.372538\n    citric acid              7.758984\n    density                 -0.254610\n    fixed acidity           -6.235608\n    free sulfur dioxide    -10.177344\n    pH                       1.934262\n    quality                       NaN\n    residual sugar          -8.100321\n    sulphates                1.999155\n    total sulfur dioxide   -18.439304\n    volatile acidity       -27.979143\n    dtype: float64\n\n\n\n### Comparing low quality means to high quality ones, the following attributes differ more than 5 per 
cent:\n\n\n```python\nww_perc_means[abs(ww_perc_means) \u003e 5]\n```\n\n\n\n\n    alcohol                 14.068993\n    chlorides              -38.372538\n    citric acid              7.758984\n    fixed acidity           -6.235608\n    free sulfur dioxide    -10.177344\n    residual sugar          -8.100321\n    total sulfur dioxide   -18.439304\n    volatile acidity       -27.979143\n    dtype: float64\n\n\n\n### Comparing low quality means to high quality ones, the following attributes differ more than 10 per cent:\n\n\n```python\nww_perc_means[abs(ww_perc_means) \u003e 10]\n```\n\n\n\n\n    alcohol                 14.068993\n    chlorides              -38.372538\n    free sulfur dioxide    -10.177344\n    total sulfur dioxide   -18.439304\n    volatile acidity       -27.979143\n    dtype: float64\n\n\n\n## For red wines:\n\n\n```python\nx = red_wine_df.groupby([\"quality\"]).mean()\nlower_quals = x.loc[:4].mean()\nhigher_quals = x.loc[7:].mean()\nrw_perc_means = (higher_quals - lower_quals) / red_wine_df.mean() * 100\nrw_perc_means\n```\n\n\n\n\n    alcohol                 16.023546\n    chlorides              -38.955960\n    citric acid             77.707371\n    density                 -0.134937\n    fixed acidity            7.811538\n    free sulfur dioxide     12.783852\n    pH                      -3.345302\n    quality                       NaN\n    residual sugar          -0.609713\n    sulphates               26.028988\n    total sulfur dioxide     7.875629\n    volatile acidity       -71.161438\n    dtype: float64\n\n\n\n### Comparing low quality means to high quality ones, the following attributes differ more than 5 per cent:\n\n\n```python\nrw_perc_means[abs(rw_perc_means) \u003e 5]\n```\n\n\n\n\n    alcohol                 16.023546\n    chlorides              -38.955960\n    citric acid             77.707371\n    fixed acidity            7.811538\n    free sulfur dioxide     12.783852\n    sulphates               26.028988\n    total sulfur 
dioxide     7.875629\n    volatile acidity       -71.161438\n    dtype: float64\n\n\n\n### Comparing low quality means to high quality ones, the following attributes differ more than 10 per cent:\n\n\n```python\nrw_perc_means[abs(rw_perc_means) \u003e 10]\n```\n\n\n\n\n    alcohol                16.023546\n    chlorides             -38.955960\n    citric acid            77.707371\n    free sulfur dioxide    12.783852\n    sulphates              26.028988\n    volatile acidity      -71.161438\n    dtype: float64\n\n\n\n# Attributes to take a closer look at:\n\n* Alcohol\n* Chlorides\n* Citric Acid\n* Sulphates\n* Sulfur Dioxides\n* Volatile Acidity\n\n# Sulfur Dioxides and Quality\n\n### White Wines\n\n\n```python\nfig, (ax1, ax2) = plt.subplots(1,2)\nfig.set_size_inches(14.5, 4.5)\nfig.dpi = 300\nsns.stripplot(data=white_wine_df, x=\"quality\", y=\"total sulfur dioxide\", jitter=True, ax=ax1)\nsns.stripplot(data=white_wine_df, x=\"quality\", y=\"free sulfur dioxide\", jitter=True, ax=ax2)\n```\n\n\n\n\n    \u003cmatplotlib.axes._subplots.AxesSubplot at 0x114f6d4e0\u003e\n\n\n\n\n![png](images/output_49_1.png)\n\n\n\n```python\nhigh_qual_ww_tsd_mean = white_wine_df[white_wine_df[\"quality\"] \u003e= 7][\"total sulfur dioxide\"].mean()\nhigh_qual_ww_tsd_mean = format(high_qual_ww_tsd_mean, '.1f')\nprint(f\"The mean for higher quality white wines (quality \u003e= 7) is {high_qual_ww_tsd_mean}\")\n```\n\n    The mean for higher quality white wines (quality \u003e= 7) is 125.2\n\n\n### Interpretation White Wines\n\nBoth plots suggest that higher quality white wines tend to contain less sulfur dioxide, both total and free.\n\n### Red Wines\n\n\n```python\nfig, (ax1, ax2) = plt.subplots(1,2)\nfig.set_size_inches(14.5, 4.5)\nfig.dpi = 300\nsns.stripplot(data=red_wine_df, x=\"quality\", y=\"total sulfur dioxide\", jitter=True, ax=ax1)\nsns.stripplot(data=red_wine_df, x=\"quality\", y=\"free sulfur dioxide\", jitter=True, ax=ax2)\n```\n\n\n\n\n    
\u003cmatplotlib.axes._subplots.AxesSubplot at 0x114e00cc0\u003e\n\n\n\n\n![png](images/output_53_1.png)\n\n\n\n```python\nhigh_qual_rw_tsd_mean = red_wine_df[red_wine_df[\"quality\"] \u003e= 7][\"total sulfur dioxide\"].mean()\nhigh_qual_rw_tsd_mean = format(high_qual_rw_tsd_mean, '.1f')\nprint(f\"The mean for higher quality red wines (quality \u003e= 7) is {high_qual_rw_tsd_mean}\")\n```\n\n    The mean for higher quality red wines (quality \u003e= 7) is 34.9\n\n\n### Interpretation Red Wines\n\nRed wines show much lower concentrations of sulfur dioxides than white wines. Additionally, there seems to be no direct correlation between sulfur dioxide concentration and perceived quality.\n\n# Conclusion: Sulfur Dioxides and Quality\n\nHigh quality white wines (quality \u003e= 7) have a mean total sulfur dioxide value of about 125. High quality red wines (quality \u003e= 7) have a mean total sulfur dioxide value of about 35.\n\n# Sulphates and Quality\n\n\n```python\nsns.stripplot(data=wine_df, x=\"quality\", y=\"sulphates\", jitter=True, hue=\"color\", split=True)\n```\n\n\n\n\n    \u003cmatplotlib.axes._subplots.AxesSubplot at 0x117849240\u003e\n\n\n\n\n![png](images/output_58_1.png)\n\n\n## Alcohol in Wine\n\n### White Wine\n\n\n```python\nwhite_wine_df.groupby(\"quality\")[\"alcohol\"].describe()\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003ecount\u003c/th\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      \u003cth\u003estd\u003c/th\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003cth\u003e25%\u003c/th\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003cth\u003e75%\u003c/th\u003e\n      \u003cth\u003emax\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n  
    \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e3\u003c/th\u003e\n      \u003ctd\u003e20.0\u003c/td\u003e\n      \u003ctd\u003e10.345000\u003c/td\u003e\n      \u003ctd\u003e1.224089\u003c/td\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e9.55\u003c/td\u003e\n      \u003ctd\u003e10.45\u003c/td\u003e\n      \u003ctd\u003e11.00\u003c/td\u003e\n      \u003ctd\u003e12.6\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e4\u003c/th\u003e\n      \u003ctd\u003e163.0\u003c/td\u003e\n      \u003ctd\u003e10.152454\u003c/td\u003e\n      \u003ctd\u003e1.003217\u003c/td\u003e\n      \u003ctd\u003e8.4\u003c/td\u003e\n      \u003ctd\u003e9.40\u003c/td\u003e\n      \u003ctd\u003e10.10\u003c/td\u003e\n      \u003ctd\u003e10.75\u003c/td\u003e\n      \u003ctd\u003e13.5\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e5\u003c/th\u003e\n      \u003ctd\u003e1457.0\u003c/td\u003e\n      \u003ctd\u003e9.808840\u003c/td\u003e\n      \u003ctd\u003e0.847065\u003c/td\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e9.20\u003c/td\u003e\n      \u003ctd\u003e9.50\u003c/td\u003e\n      \u003ctd\u003e10.30\u003c/td\u003e\n      \u003ctd\u003e13.6\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e6\u003c/th\u003e\n      \u003ctd\u003e2198.0\u003c/td\u003e\n      \u003ctd\u003e10.575372\u003c/td\u003e\n      \u003ctd\u003e1.147776\u003c/td\u003e\n      \u003ctd\u003e8.5\u003c/td\u003e\n      \u003ctd\u003e9.60\u003c/td\u003e\n      \u003ctd\u003e10.50\u003c/td\u003e\n      \u003ctd\u003e11.40\u003c/td\u003e\n      \u003ctd\u003e14.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      
\u003cth\u003e7\u003c/th\u003e\n      \u003ctd\u003e880.0\u003c/td\u003e\n      \u003ctd\u003e11.367936\u003c/td\u003e\n      \u003ctd\u003e1.246536\u003c/td\u003e\n      \u003ctd\u003e8.6\u003c/td\u003e\n      \u003ctd\u003e10.60\u003c/td\u003e\n      \u003ctd\u003e11.40\u003c/td\u003e\n      \u003ctd\u003e12.30\u003c/td\u003e\n      \u003ctd\u003e14.2\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e8\u003c/th\u003e\n      \u003ctd\u003e175.0\u003c/td\u003e\n      \u003ctd\u003e11.636000\u003c/td\u003e\n      \u003ctd\u003e1.280138\u003c/td\u003e\n      \u003ctd\u003e8.5\u003c/td\u003e\n      \u003ctd\u003e11.00\u003c/td\u003e\n      \u003ctd\u003e12.00\u003c/td\u003e\n      \u003ctd\u003e12.60\u003c/td\u003e\n      \u003ctd\u003e14.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e9\u003c/th\u003e\n      \u003ctd\u003e5.0\u003c/td\u003e\n      \u003ctd\u003e12.180000\u003c/td\u003e\n      \u003ctd\u003e1.013410\u003c/td\u003e\n      \u003ctd\u003e10.4\u003c/td\u003e\n      \u003ctd\u003e12.40\u003c/td\u003e\n      \u003ctd\u003e12.50\u003c/td\u003e\n      \u003ctd\u003e12.70\u003c/td\u003e\n      \u003ctd\u003e12.9\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\n\n### Red Wine\n\n\n```python\nred_wine_df.groupby(\"quality\")[\"alcohol\"].describe()\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003ecount\u003c/th\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      \u003cth\u003estd\u003c/th\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003cth\u003e25%\u003c/th\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003cth\u003e75%\u003c/th\u003e\n      \u003cth\u003emax\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      
\u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e3\u003c/th\u003e\n      \u003ctd\u003e10.0\u003c/td\u003e\n      \u003ctd\u003e9.955000\u003c/td\u003e\n      \u003ctd\u003e0.818009\u003c/td\u003e\n      \u003ctd\u003e8.4\u003c/td\u003e\n      \u003ctd\u003e9.725\u003c/td\u003e\n      \u003ctd\u003e9.925\u003c/td\u003e\n      \u003ctd\u003e10.575\u003c/td\u003e\n      \u003ctd\u003e11.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e4\u003c/th\u003e\n      \u003ctd\u003e53.0\u003c/td\u003e\n      \u003ctd\u003e10.265094\u003c/td\u003e\n      \u003ctd\u003e0.934776\u003c/td\u003e\n      \u003ctd\u003e9.0\u003c/td\u003e\n      \u003ctd\u003e9.600\u003c/td\u003e\n      \u003ctd\u003e10.000\u003c/td\u003e\n      \u003ctd\u003e11.000\u003c/td\u003e\n      \u003ctd\u003e13.1\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e5\u003c/th\u003e\n      \u003ctd\u003e681.0\u003c/td\u003e\n      \u003ctd\u003e9.899706\u003c/td\u003e\n      \u003ctd\u003e0.736521\u003c/td\u003e\n      \u003ctd\u003e8.5\u003c/td\u003e\n      \u003ctd\u003e9.400\u003c/td\u003e\n      \u003ctd\u003e9.700\u003c/td\u003e\n      \u003ctd\u003e10.200\u003c/td\u003e\n      \u003ctd\u003e14.9\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e6\u003c/th\u003e\n      \u003ctd\u003e638.0\u003c/td\u003e\n      \u003ctd\u003e10.629519\u003c/td\u003e\n      \u003ctd\u003e1.049639\u003c/td\u003e\n      \u003ctd\u003e8.4\u003c/td\u003e\n      \u003ctd\u003e9.800\u003c/td\u003e\n      \u003ctd\u003e10.500\u003c/td\u003e\n      \u003ctd\u003e11.300\u003c/td\u003e\n      
\u003ctd\u003e14.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e7\u003c/th\u003e\n      \u003ctd\u003e199.0\u003c/td\u003e\n      \u003ctd\u003e11.465913\u003c/td\u003e\n      \u003ctd\u003e0.961933\u003c/td\u003e\n      \u003ctd\u003e9.2\u003c/td\u003e\n      \u003ctd\u003e10.800\u003c/td\u003e\n      \u003ctd\u003e11.500\u003c/td\u003e\n      \u003ctd\u003e12.100\u003c/td\u003e\n      \u003ctd\u003e14.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e8\u003c/th\u003e\n      \u003ctd\u003e18.0\u003c/td\u003e\n      \u003ctd\u003e12.094444\u003c/td\u003e\n      \u003ctd\u003e1.224011\u003c/td\u003e\n      \u003ctd\u003e9.8\u003c/td\u003e\n      \u003ctd\u003e11.325\u003c/td\u003e\n      \u003ctd\u003e12.150\u003c/td\u003e\n      \u003ctd\u003e12.875\u003c/td\u003e\n      \u003ctd\u003e14.0\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\n\n### Plotting Alcohol to Quality\n\n\n```python\nsns.barplot(data=wine_df, x=\"quality\", y=\"alcohol\", hue=\"color\")\n```\n\n\n\n\n    \u003cmatplotlib.axes._subplots.AxesSubplot at 0x114e342e8\u003e\n\n\n\n\n![png](images/output_65_1.png)\n\n\n\n```python\nsns.lmplot(data=wine_df, x=\"quality\", y=\"alcohol\", hue=\"color\")\n```\n\n\n\n\n    \u003cseaborn.axisgrid.FacetGrid at 0x118bd3160\u003e\n\n\n\n\n![png](images/output_66_1.png)\n\n\n#### Alcohol to Quality relation for wines rated 7 or higher\n\n\n```python\nhq_wines = wine_df[wine_df.quality \u003e= 7]\nsns.lmplot(data=hq_wines, x=\"quality\", y=\"alcohol\", hue=\"color\")\nplt.title(\"Quality \u003e= 7\")\n```\n\n\n\n\n    \u003cmatplotlib.text.Text at 0x118fe82e8\u003e\n\n\n\n\n![png](images/output_68_1.png)\n\n\n## Heatmap Alcohol to Quality\n\n\n```python\nheat_table = wine_df[[\"quality\", \"alcohol\"]].copy()\nheat_table[\"alcohol\"] = heat_table.alcohol.apply(func=lambda x: round(x * 2) / 2)\nheat_table = heat_table.groupby([\"quality\", 
\"alcohol\"])[\"alcohol\"].count().reset_index(name='counts')\nsns.heatmap(heat_table.pivot(index=\"quality\", columns=\"alcohol\", values=\"counts\"))\n```\n\n\n\n\n    \u003cmatplotlib.axes._subplots.AxesSubplot at 0x118e0eb70\u003e\n\n\n\n\n![png](images/output_70_1.png)\n\n\n# Chlorides\n\n\n```python\nsns.barplot(data=wine_df, hue=\"color\", x=\"quality\", y=\"chlorides\")\n```\n\n\n\n\n    \u003cmatplotlib.axes._subplots.AxesSubplot at 0x1192a8c18\u003e\n\n\n\n\n![png](images/output_72_1.png)\n\n\n__On average, the lower a wine's chloride content, the higher its quality.__\n\n## Chlorides and Alcohol\n\n\n```python\ng = sns.PairGrid(wine_df[[\"alcohol\", \"chlorides\", \"quality\"]], hue=\"quality\")\ng = g.map_diag(plt.hist)\ng = g.map_offdiag(plt.scatter)\ng = g.add_legend()\n```\n\n\n![png](images/output_75_0.png)\n\n\n## Acids\n\n\n```python\nsns.pairplot(white_wine_df[[\"volatile acidity\", \"citric acid\", \"quality\"]], hue=\"quality\")\n```\n\n\n\n\n    \u003cseaborn.axisgrid.PairGrid at 0x1194682b0\u003e\n\n\n\n\n![png](images/output_77_1.png)\n\n\n## Bringing the relevant attributes together\n\n\n```python\nsns.pairplot(white_wine_df[[\"volatile acidity\", \"citric acid\", \"quality\", \"free sulfur dioxide\", \"chlorides\"]], hue=\"quality\")\n```\n\n\n\n\n    \u003cseaborn.axisgrid.PairGrid at 0x11a544128\u003e\n\n\n\n\n![png](images/output_79_1.png)\n\n\n\n```python\nwhite_wine_df[[\"volatile acidity\", \"citric acid\", \"quality\"]].groupby(\"quality\").describe(percentiles=[])\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth colspan=\"6\" halign=\"left\"\u003ecitric acid\u003c/th\u003e\n      \u003cth colspan=\"6\" halign=\"left\"\u003evolatile acidity\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003ecount\u003c/th\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      
\u003cth\u003estd\u003c/th\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003cth\u003emax\u003c/th\u003e\n      \u003cth\u003ecount\u003c/th\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      \u003cth\u003estd\u003c/th\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003cth\u003emax\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e3\u003c/th\u003e\n      \u003ctd\u003e20.0\u003c/td\u003e\n      \u003ctd\u003e0.336000\u003c/td\u003e\n      \u003ctd\u003e0.081460\u003c/td\u003e\n      \u003ctd\u003e0.21\u003c/td\u003e\n      \u003ctd\u003e0.345\u003c/td\u003e\n      \u003ctd\u003e0.47\u003c/td\u003e\n      \u003ctd\u003e20.0\u003c/td\u003e\n      \u003ctd\u003e0.333250\u003c/td\u003e\n      \u003ctd\u003e0.140827\u003c/td\u003e\n      \u003ctd\u003e0.17\u003c/td\u003e\n      \u003ctd\u003e0.26\u003c/td\u003e\n      \u003ctd\u003e0.640\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e4\u003c/th\u003e\n      \u003ctd\u003e163.0\u003c/td\u003e\n      \u003ctd\u003e0.304233\u003c/td\u003e\n      \u003ctd\u003e0.163857\u003c/td\u003e\n      \u003ctd\u003e0.00\u003c/td\u003e\n      \u003ctd\u003e0.290\u003c/td\u003e\n      \u003ctd\u003e0.88\u003c/td\u003e\n      \u003ctd\u003e163.0\u003c/td\u003e\n      \u003ctd\u003e0.381227\u003c/td\u003e\n      
\u003ctd\u003e0.173463\u003c/td\u003e\n      \u003ctd\u003e0.11\u003c/td\u003e\n      \u003ctd\u003e0.32\u003c/td\u003e\n      \u003ctd\u003e1.100\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e5\u003c/th\u003e\n      \u003ctd\u003e1457.0\u003c/td\u003e\n      \u003ctd\u003e0.337653\u003c/td\u003e\n      \u003ctd\u003e0.140814\u003c/td\u003e\n      \u003ctd\u003e0.00\u003c/td\u003e\n      \u003ctd\u003e0.320\u003c/td\u003e\n      \u003ctd\u003e1.00\u003c/td\u003e\n      \u003ctd\u003e1457.0\u003c/td\u003e\n      \u003ctd\u003e0.302011\u003c/td\u003e\n      \u003ctd\u003e0.100066\u003c/td\u003e\n      \u003ctd\u003e0.10\u003c/td\u003e\n      \u003ctd\u003e0.28\u003c/td\u003e\n      \u003ctd\u003e0.905\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e6\u003c/th\u003e\n      \u003ctd\u003e2198.0\u003c/td\u003e\n      \u003ctd\u003e0.338025\u003c/td\u003e\n      \u003ctd\u003e0.119325\u003c/td\u003e\n      \u003ctd\u003e0.00\u003c/td\u003e\n      \u003ctd\u003e0.320\u003c/td\u003e\n      \u003ctd\u003e1.66\u003c/td\u003e\n      \u003ctd\u003e2198.0\u003c/td\u003e\n      \u003ctd\u003e0.260564\u003c/td\u003e\n      \u003ctd\u003e0.088142\u003c/td\u003e\n      \u003ctd\u003e0.08\u003c/td\u003e\n      \u003ctd\u003e0.25\u003c/td\u003e\n      \u003ctd\u003e0.965\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e7\u003c/th\u003e\n      \u003ctd\u003e880.0\u003c/td\u003e\n      \u003ctd\u003e0.325625\u003c/td\u003e\n      \u003ctd\u003e0.079183\u003c/td\u003e\n      \u003ctd\u003e0.01\u003c/td\u003e\n      \u003ctd\u003e0.310\u003c/td\u003e\n      \u003ctd\u003e0.74\u003c/td\u003e\n      \u003ctd\u003e880.0\u003c/td\u003e\n      \u003ctd\u003e0.262767\u003c/td\u003e\n      \u003ctd\u003e0.091106\u003c/td\u003e\n      \u003ctd\u003e0.08\u003c/td\u003e\n      \u003ctd\u003e0.25\u003c/td\u003e\n      \u003ctd\u003e0.760\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      
\u003cth\u003e8\u003c/th\u003e\n      \u003ctd\u003e175.0\u003c/td\u003e\n      \u003ctd\u003e0.326514\u003c/td\u003e\n      \u003ctd\u003e0.085439\u003c/td\u003e\n      \u003ctd\u003e0.04\u003c/td\u003e\n      \u003ctd\u003e0.320\u003c/td\u003e\n      \u003ctd\u003e0.74\u003c/td\u003e\n      \u003ctd\u003e175.0\u003c/td\u003e\n      \u003ctd\u003e0.277400\u003c/td\u003e\n      \u003ctd\u003e0.108029\u003c/td\u003e\n      \u003ctd\u003e0.12\u003c/td\u003e\n      \u003ctd\u003e0.26\u003c/td\u003e\n      \u003ctd\u003e0.660\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e9\u003c/th\u003e\n      \u003ctd\u003e5.0\u003c/td\u003e\n      \u003ctd\u003e0.386000\u003c/td\u003e\n      \u003ctd\u003e0.082037\u003c/td\u003e\n      \u003ctd\u003e0.29\u003c/td\u003e\n      \u003ctd\u003e0.360\u003c/td\u003e\n      \u003ctd\u003e0.49\u003c/td\u003e\n      \u003ctd\u003e5.0\u003c/td\u003e\n      \u003ctd\u003e0.298000\u003c/td\u003e\n      \u003ctd\u003e0.057619\u003c/td\u003e\n      \u003ctd\u003e0.24\u003c/td\u003e\n      \u003ctd\u003e0.27\u003c/td\u003e\n      \u003ctd\u003e0.360\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\n\n\n```python\nwhite_wine_df[[\"quality\", \"free sulfur dioxide\", \"chlorides\"]].groupby(\"quality\").describe(percentiles=[])\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth colspan=\"6\" halign=\"left\"\u003echlorides\u003c/th\u003e\n      \u003cth colspan=\"6\" halign=\"left\"\u003efree sulfur dioxide\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003ecount\u003c/th\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      \u003cth\u003estd\u003c/th\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003cth\u003emax\u003c/th\u003e\n      
\u003cth\u003ecount\u003c/th\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      \u003cth\u003estd\u003c/th\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003cth\u003emax\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003e\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003e3\u003c/th\u003e\n      \u003ctd\u003e20.0\u003c/td\u003e\n      \u003ctd\u003e0.054300\u003c/td\u003e\n      \u003ctd\u003e0.046468\u003c/td\u003e\n      \u003ctd\u003e0.022\u003c/td\u003e\n      \u003ctd\u003e0.041\u003c/td\u003e\n      \u003ctd\u003e0.244\u003c/td\u003e\n      \u003ctd\u003e20.0\u003c/td\u003e\n      \u003ctd\u003e53.325000\u003c/td\u003e\n      \u003ctd\u003e69.420776\u003c/td\u003e\n      \u003ctd\u003e5.0\u003c/td\u003e\n      \u003ctd\u003e33.5\u003c/td\u003e\n      \u003ctd\u003e289.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e4\u003c/th\u003e\n      \u003ctd\u003e163.0\u003c/td\u003e\n      \u003ctd\u003e0.050098\u003c/td\u003e\n      \u003ctd\u003e0.025888\u003c/td\u003e\n      \u003ctd\u003e0.013\u003c/td\u003e\n      \u003ctd\u003e0.046\u003c/td\u003e\n      \u003ctd\u003e0.290\u003c/td\u003e\n      \u003ctd\u003e163.0\u003c/td\u003e\n      \u003ctd\u003e23.358896\u003c/td\u003e\n      \u003ctd\u003e20.391349\u003c/td\u003e\n      \u003ctd\u003e3.0\u003c/td\u003e\n      \u003ctd\u003e18.0\u003c/td\u003e\n      \u003ctd\u003e138.5\u003c/td\u003e\n    
\u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e5\u003c/th\u003e\n      \u003ctd\u003e1457.0\u003c/td\u003e\n      \u003ctd\u003e0.051546\u003c/td\u003e\n      \u003ctd\u003e0.026496\u003c/td\u003e\n      \u003ctd\u003e0.009\u003c/td\u003e\n      \u003ctd\u003e0.047\u003c/td\u003e\n      \u003ctd\u003e0.346\u003c/td\u003e\n      \u003ctd\u003e1457.0\u003c/td\u003e\n      \u003ctd\u003e36.432052\u003c/td\u003e\n      \u003ctd\u003e18.145991\u003c/td\u003e\n      \u003ctd\u003e2.0\u003c/td\u003e\n      \u003ctd\u003e35.0\u003c/td\u003e\n      \u003ctd\u003e131.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e6\u003c/th\u003e\n      \u003ctd\u003e2198.0\u003c/td\u003e\n      \u003ctd\u003e0.045217\u003c/td\u003e\n      \u003ctd\u003e0.020453\u003c/td\u003e\n      \u003ctd\u003e0.015\u003c/td\u003e\n      \u003ctd\u003e0.043\u003c/td\u003e\n      \u003ctd\u003e0.255\u003c/td\u003e\n      \u003ctd\u003e2198.0\u003c/td\u003e\n      \u003ctd\u003e35.650591\u003c/td\u003e\n      \u003ctd\u003e15.735679\u003c/td\u003e\n      \u003ctd\u003e3.0\u003c/td\u003e\n      \u003ctd\u003e34.0\u003c/td\u003e\n      \u003ctd\u003e112.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e7\u003c/th\u003e\n      \u003ctd\u003e880.0\u003c/td\u003e\n      \u003ctd\u003e0.038191\u003c/td\u003e\n      \u003ctd\u003e0.010697\u003c/td\u003e\n      \u003ctd\u003e0.012\u003c/td\u003e\n      \u003ctd\u003e0.037\u003c/td\u003e\n      \u003ctd\u003e0.135\u003c/td\u003e\n      \u003ctd\u003e880.0\u003c/td\u003e\n      \u003ctd\u003e34.125568\u003c/td\u003e\n      \u003ctd\u003e13.244737\u003c/td\u003e\n      \u003ctd\u003e5.0\u003c/td\u003e\n      \u003ctd\u003e33.0\u003c/td\u003e\n      \u003ctd\u003e108.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e8\u003c/th\u003e\n      \u003ctd\u003e175.0\u003c/td\u003e\n      \u003ctd\u003e0.038314\u003c/td\u003e\n      \u003ctd\u003e0.013164\u003c/td\u003e\n      
\u003ctd\u003e0.014\u003c/td\u003e\n      \u003ctd\u003e0.036\u003c/td\u003e\n      \u003ctd\u003e0.121\u003c/td\u003e\n      \u003ctd\u003e175.0\u003c/td\u003e\n      \u003ctd\u003e36.720000\u003c/td\u003e\n      \u003ctd\u003e16.203675\u003c/td\u003e\n      \u003ctd\u003e6.0\u003c/td\u003e\n      \u003ctd\u003e35.0\u003c/td\u003e\n      \u003ctd\u003e105.0\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e9\u003c/th\u003e\n      \u003ctd\u003e5.0\u003c/td\u003e\n      \u003ctd\u003e0.027400\u003c/td\u003e\n      \u003ctd\u003e0.007436\u003c/td\u003e\n      \u003ctd\u003e0.018\u003c/td\u003e\n      \u003ctd\u003e0.031\u003c/td\u003e\n      \u003ctd\u003e0.035\u003c/td\u003e\n      \u003ctd\u003e5.0\u003c/td\u003e\n      \u003ctd\u003e33.400000\u003c/td\u003e\n      \u003ctd\u003e13.427584\u003c/td\u003e\n      \u003ctd\u003e24.0\u003c/td\u003e\n      \u003ctd\u003e28.0\u003c/td\u003e\n      \u003ctd\u003e57.0\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\n\n# Final Conclusion\n\nNo single attribute is strong enough to define a high-quality wine, as the figures show. For a wine to score high, however, it can help to have the acid and sulfur dioxide values all within a certain range.\n\n## Best vs. Worst\n\nWhen comparing the best (8, 9) vs. 
worst (3, 4) wines, we can see that they overlap considerably.\n\n\n```python\nqual3 = white_wine_df[white_wine_df[\"quality\"] == 3]\nqual4 = white_wine_df[white_wine_df[\"quality\"] == 4].copy()\nqual4.quality = 3\nqual8 = white_wine_df[white_wine_df[\"quality\"] == 8]\nqual9 = white_wine_df[white_wine_df[\"quality\"] == 9].copy()\nqual9.quality = 8\nwhite_wine_sample = pd.concat([qual3, qual4, qual8, qual9], ignore_index=True)\nwhite_wine_sample\nsns.pairplot(white_wine_sample[[\"volatile acidity\", \"citric acid\", \"quality\"]], hue=\"quality\")\n```\n\n\n\n\n    \u003cseaborn.axisgrid.PairGrid at 0x11cc89ef0\u003e\n\n\n\n\n![png](images/output_83_1.png)\n\n\n\n```python\nsns.pairplot(white_wine_sample[[\"volatile acidity\", \"citric acid\", \"quality\", \"sulphates\", \"chlorides\"]], hue=\"quality\")\n```\n\n\n\n\n    \u003cseaborn.axisgrid.PairGrid at 0x11d07cb00\u003e\n\n\n\n\n![png](images/output_84_1.png)\n\n\n# Seems like labs can't measure a wine's inner spirit (yet).\n\n__But if you have to pick a wine based only on specs, I would suggest white wines close to these values:__\n\n\n```python\nqual8[[\"quality\", \"chlorides\", \"alcohol\", \"citric acid\", \"sulphates\"]].describe(percentiles=[])\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      \u003cth\u003echlorides\u003c/th\u003e\n      \u003cth\u003ealcohol\u003c/th\u003e\n      \u003cth\u003ecitric acid\u003c/th\u003e\n      \u003cth\u003esulphates\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003ecount\u003c/th\u003e\n      \u003ctd\u003e175.0\u003c/td\u003e\n      \u003ctd\u003e175.000000\u003c/td\u003e\n      \u003ctd\u003e175.000000\u003c/td\u003e\n      \u003ctd\u003e175.000000\u003c/td\u003e\n      \u003ctd\u003e175.000000\u003c/td\u003e\n    
\u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.038314\u003c/td\u003e\n      \u003ctd\u003e11.636000\u003c/td\u003e\n      \u003ctd\u003e0.326514\u003c/td\u003e\n      \u003ctd\u003e0.486229\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003estd\u003c/th\u003e\n      \u003ctd\u003e0.0\u003c/td\u003e\n      \u003ctd\u003e0.013164\u003c/td\u003e\n      \u003ctd\u003e1.280138\u003c/td\u003e\n      \u003ctd\u003e0.085439\u003c/td\u003e\n      \u003ctd\u003e0.147073\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.014000\u003c/td\u003e\n      \u003ctd\u003e8.500000\u003c/td\u003e\n      \u003ctd\u003e0.040000\u003c/td\u003e\n      \u003ctd\u003e0.250000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.036000\u003c/td\u003e\n      \u003ctd\u003e12.000000\u003c/td\u003e\n      \u003ctd\u003e0.320000\u003c/td\u003e\n      \u003ctd\u003e0.460000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emax\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.121000\u003c/td\u003e\n      \u003ctd\u003e14.000000\u003c/td\u003e\n      \u003ctd\u003e0.740000\u003c/td\u003e\n      \u003ctd\u003e0.950000\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\n\n__and red wines close to these values:__\n\n\n\n```python\nrqual8 = red_wine_df[red_wine_df[\"quality\"] \u003e= 8]\nrqual8[[\"quality\", \"chlorides\", \"alcohol\", \"citric acid\", \"sulphates\"]].describe(percentiles=[])\n```\n\n\n\n\n\u003cdiv\u003e\n\u003ctable border=\"1\" class=\"dataframe\"\u003e\n  \u003cthead\u003e\n    \u003ctr style=\"text-align: right;\"\u003e\n      
\u003cth\u003e\u003c/th\u003e\n      \u003cth\u003equality\u003c/th\u003e\n      \u003cth\u003echlorides\u003c/th\u003e\n      \u003cth\u003ealcohol\u003c/th\u003e\n      \u003cth\u003ecitric acid\u003c/th\u003e\n      \u003cth\u003esulphates\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003cth\u003ecount\u003c/th\u003e\n      \u003ctd\u003e18.0\u003c/td\u003e\n      \u003ctd\u003e18.000000\u003c/td\u003e\n      \u003ctd\u003e18.000000\u003c/td\u003e\n      \u003ctd\u003e18.000000\u003c/td\u003e\n      \u003ctd\u003e18.000000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emean\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.068444\u003c/td\u003e\n      \u003ctd\u003e12.094444\u003c/td\u003e\n      \u003ctd\u003e0.391111\u003c/td\u003e\n      \u003ctd\u003e0.767778\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003estd\u003c/th\u003e\n      \u003ctd\u003e0.0\u003c/td\u003e\n      \u003ctd\u003e0.011678\u003c/td\u003e\n      \u003ctd\u003e1.224011\u003c/td\u003e\n      \u003ctd\u003e0.199526\u003c/td\u003e\n      \u003ctd\u003e0.115379\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emin\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.044000\u003c/td\u003e\n      \u003ctd\u003e9.800000\u003c/td\u003e\n      \u003ctd\u003e0.030000\u003c/td\u003e\n      \u003ctd\u003e0.630000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e50%\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.070500\u003c/td\u003e\n      \u003ctd\u003e12.150000\u003c/td\u003e\n      \u003ctd\u003e0.420000\u003c/td\u003e\n      \u003ctd\u003e0.740000\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003emax\u003c/th\u003e\n      \u003ctd\u003e8.0\u003c/td\u003e\n      \u003ctd\u003e0.086000\u003c/td\u003e\n      
\u003ctd\u003e14.000000\u003c/td\u003e\n      \u003ctd\u003e0.720000\u003c/td\u003e\n      \u003ctd\u003e1.100000\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonneutert%2Fwine_quality_data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonneutert%2Fwine_quality_data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonneutert%2Fwine_quality_data/lists"}